
Bucardo replication workarounds for extremely large Postgres updates

Bucardo is very good at replicating data among Postgres databases (as well as replicating to other things, such as MariaDB, Oracle, and Redis!). However, sometimes you need to work outside the normal flow of a trigger-based replication system such as Bucardo. One such scenario is when a lot of changes need to be made to your replicated tables. And by a lot, I mean many millions of rows. When this happens, it may be faster and easier to find an alternate way to replicate those changes.

When a change is made to a table that is being replicated by Bucardo, a trigger fires and stores the primary key of the changed row in a "delta" table. The Bucardo daemon then comes along, gathers a list of all rows that were changed since the last time it checked, and pushes those rows to the other databases in the sync (a named replication set). Although all of this is done in a fast and efficient manner, there is a bit of overhead that adds up when, for example, updating 650 million rows in one transaction.
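As a concrete illustration (using the demo databases that are set up just below; the delta table name encodes the schema and table of the replicated table), you can watch a delta row appear right after an update:

## Peek at the delta table for public.pgbench_accounts on a source database,
## make a single-row change, then look again - the count should go up by one
## (assuming the "fiveway" sync built below is installed and Bucardo has not
## yet swept the row away):
$ psql alpha -Atc 'select count(*) from bucardo.delta_public_pgbench_accounts'
$ psql alpha -Atc 'update pgbench_accounts set abalance = abalance + 1 where aid = 1'
$ psql alpha -Atc 'select count(*) from bucardo.delta_public_pgbench_accounts'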

The first and best solution is to simply hand-apply all the changes yourself to every database you are replicating to. By disabling the Bucardo triggers first, you can prevent Bucardo from even knowing, or caring, that the changes have been made.

To demonstrate this, let's have Bucardo replicate among five pgbench databases, called A, B, C, D, and E. Databases A, B, and C will be sources; D and E are just targets. Our replication looks like this: ( A <=> B <=> C ) => (D, E). First, we create all the databases and populate them:

## Create a new cluster for this test, and use port 5950 to minimize impact
$ initdb --data-checksums btest
$ echo port=5950 >> btest/postgresql.conf
$ pg_ctl start -D btest -l logfile

## Create the main database and install the pgbench schema into it
$ export PGPORT=5950
$ createdb alpha
$ pgbench alpha -i --foreign-keys

## Replicated tables need a primary key, so we need to modify things a little:
$ psql alpha -c 'alter table pgbench_history add column hid serial primary key'

## Create the other four databases as exact copies of the first one:
$ for dbname in beta gamma delta epsilon; do createdb $dbname -T alpha; done

Now that those are done, let's install Bucardo, teach it about these databases, and create a sync to replicate among them as described above.

$ bucardo install --batch
$ bucardo add dbs A,B,C,D,E dbname=alpha,beta,gamma,delta,epsilon dbport=5950
$ bucardo add sync fiveway tables=all dbs=A:source,B:source,C:source,D:target,E:target

## Tweak a few default locations to make our tests easier:
$ echo -e "logdest=.\npiddir=." > .bucardorc

At this point, we have five databases all ready to go, and Bucardo is set up to replicate among them. Let's do a quick test to make sure everything is working as it should.

$ bucardo start
Checking for existing processes
Starting Bucardo

$ for db in alpha beta gamma delta epsilon; do psql $db -Atc "select '$db',sum(abalance) from pgbench_accounts";done | tr "\n" " "
alpha|0 beta|0 gamma|0 delta|0 epsilon|0

$ pgbench alpha
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
latency average: 0.000 ms
tps = 60.847066 (including connections establishing)
tps = 62.877481 (excluding connections establishing)

$ for db in alpha beta gamma delta epsilon; do psql $db -Atc "select '$db',sum(abalance) from pgbench_accounts";done | tr "\n" " "
alpha|6576 beta|6576 gamma|6576 delta|6576 epsilon|6576

$ pgbench beta
starting vacuum...end.
...
tps = 60.728681 (including connections establishing)
tps = 62.689074 (excluding connections establishing)
$ for db in alpha beta gamma delta epsilon; do psql $db -Atc "select '$db',sum(abalance) from pgbench_accounts";done | tr "\n" " "
alpha|7065 beta|7065 gamma|7065 delta|7065 epsilon|7065

Let's imagine that the bank discovered a huge financial error, and needed to increase the balance of every account created in the last two years by 25 dollars. Let's further imagine that this involved 650 million customers. That UPDATE will take a very long time, but it will suffer even more because each update will also fire a Bucardo trigger, which in turn will write to another "delta" table. Then Bucardo will have to read in 650 million rows from the delta table and (on every other database in the sync) apply those changes by deleting 650 million rows and then COPYing over the correct values. This is one situation where you want to sidestep your replication and handle things yourself. There are three solutions to this. The easiest, as mentioned, is to simply make all the changes yourself and prevent Bucardo from worrying about it.

The basic plan is to apply the updates to all the databases in the sync at once, while using the session_replication_role feature to prevent the triggers from firing. Of course, this will prevent *all* of the triggers on the table from firing. If there are some non-Bucardo triggers that must fire during this update, you may wish to temporarily set them as ALWAYS triggers.
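For example, switching a non-Bucardo trigger to ALWAYS (and back afterwards) looks like this; the trigger name here is hypothetical and not part of the pgbench schema:

## Make a hypothetical audit trigger fire even while session_replication_role
## is set to 'replica', then restore the default behavior when we are done:
$ psql alpha -c 'ALTER TABLE pgbench_accounts ENABLE ALWAYS TRIGGER audit_accounts_trg'
$ psql alpha -c 'ALTER TABLE pgbench_accounts ENABLE TRIGGER audit_accounts_trg'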

Solution one: manual copy

## First, stop Bucardo. Although not necessary, the databases are going to be busy enough
## that we don't need to worry about Bucardo at the moment.
$ bucardo stop
Creating ./fullstopbucardo ... Done

## In real-life, this query should get run in parallel across all databases,
## which would be on different servers:
$ QUERY='UPDATE pgbench_accounts SET abalance = abalance + 25 WHERE aid > 78657769;'

$ for db in alpha beta gamma delta epsilon; do psql $db -Atc "SET session_replication_role='replica'; $QUERY"; done | tr "\n" " "
UPDATE 83848570 UPDATE 83848570 UPDATE 83848570 UPDATE 83848570 UPDATE 83848570 

## For good measure, confirm Bucardo did not try to replicate all those rows:
$ bucardo kick fiveway
Kicked sync fiveway

$ grep Totals log.bucardo
(11144) [Mon May 16 23:08:57 2016] KID (fiveway) Totals: deletes=36 inserts=28 conflicts=0
(11144) [Mon May 16 23:09:02 2016] KID (fiveway) Totals: deletes=38 inserts=29 conflicts=0
(11144) [Mon May 16 23:09:22 2016] KID (fiveway) Totals: deletes=34 inserts=27 conflicts=0
(11144) [Tue May 16 23:15:08 2016] KID (fiveway) Totals: deletes=10 inserts=7 conflicts=0
(11144) [Tue May 16 23:59:00 2016] KID (fiveway) Totals: deletes=126 inserts=73 conflicts=0

Solution two: truncate the delta

As a second solution, what about the case where a junior DBA makes all those updates on one of the source databases without disabling triggers? When this happens, you will probably find that your databases are all backed up, waiting for Bucardo to handle the giant replication job. If the rows that changed constitute most of the total rows in the table, your best bet is to simply copy the entire table. You will also need to stop the Bucardo daemon, and prevent it from trying to replicate those rows when it starts up by cleaning out the delta table. As a first step, stop the main Bucardo daemon, and then forcibly stop any active Bucardo processes:

$ bucardo stop
Creating ./fullstopbucardo ... Done

$ pkill -15 Bucardo

Now to clean out the delta table. In this example, the junior DBA updated the "beta" database, so we look there. We may go ahead and truncate it because we are going to copy the entire table after that point.

## The delta tables follow a simple naming format. Make sure it is the correct one
$ psql beta -Atc 'select count(*) from bucardo.delta_public_pgbench_accounts'
650000000
## Yes, this must be the one!

## Truncates are dangerous; be extra careful from this point forward
$ psql beta -Atc 'truncate table bucardo.delta_public_pgbench_accounts'

The delta table will continue to accumulate changes as applications update the table, but that is okay - we got rid of the 650 million rows. Now we know that beta has the canonical information, and we need to get it to all the others. As before, we use session_replication_role. However, we also need to ensure that nobody else will try to add rows before our COPY gets in there, so if you have active source databases, pause your applications. Or simply shut them out for a while via pg_hba.conf! Once that is done, we can copy the data until all databases are identical to "beta":
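A rough sketch of the pg_hba.conf approach (the role name is hypothetical, and the exact rule depends on how your applications connect) might look like this:

## Add a reject rule near the top of pg_hba.conf on each source database; entries
## are matched top-down, so it must come before any rule that would allow the
## connection. Remove the line and reload again once the copy is finished.
##
##   host    all    appuser    0.0.0.0/0    reject
##
## Tell Postgres to re-read the file:
$ psql beta -Atc 'select pg_reload_conf()'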

$ ( echo "SET session_replication_role='replica'; TRUNCATE TABLE pgbench_accounts; " ; pg_dump beta --section=data -t pgbench_accounts ) | psql alpha -1 --set ON_ERROR_STOP=on
SET
ERROR:  cannot truncate a table referenced in a foreign key constraint
DETAIL:  Table "pgbench_history" references "pgbench_accounts".
HINT:  Truncate table "pgbench_history" at the same time, or use TRUNCATE ... CASCADE.

Aha! Note that we used the --foreign-keys option when creating the pgbench tables above. We will need to remove the foreign key, or simply copy both tables together. Let's do the latter:

$ ( echo "SET session_replication_role='replica'; TRUNCATE TABLE pgbench_accounts, pgbench_history; " ; pg_dump beta --section=data -t pgbench_accounts \
  -t pgbench_history) | psql alpha -1 --set ON_ERROR_STOP=on
SET
TRUNCATE TABLE
SET
SET
SET
SET
SET
SET
SET
SET
COPY 100000
COPY 10
 setval 
--------
     30
(1 row)
## Do the same for the other databases:
$ for db in gamma delta epsilon; do \
 ( echo "SET session_replication_role='replica'; TRUNCATE TABLE pgbench_accounts, pgbench_history; " ; pg_dump beta --section=data -t pgbench_accounts \
  -t pgbench_history) | psql $db -1 --set ON_ERROR_STOP=on ; done

Note: if your tables have a lot of constraints or indexes, you may want to disable those to speed up the COPY. Or even turn fsync off. But that's the topic of another post.
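That other post aside, here is a very rough sketch of the idea, using a hypothetical secondary index (leave the primary key in place, as the sync needs it):

## Rough sketch only: drop a (hypothetical) secondary index before the big COPY,
## relax fsync while it runs, then put everything back afterwards.
$ psql gamma -c 'DROP INDEX IF EXISTS pgbench_accounts_bid_idx'
$ psql gamma -c 'ALTER SYSTEM SET fsync = off'
$ psql gamma -c 'SELECT pg_reload_conf()'

## ... run the TRUNCATE + COPY shown above ...

$ psql gamma -c 'CREATE INDEX pgbench_accounts_bid_idx ON pgbench_accounts (bid)'
$ psql gamma -c 'ALTER SYSTEM SET fsync = on'
$ psql gamma -c 'SELECT pg_reload_conf()'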

Solution three: delta excision

Our final solution is a variant on the last one. As before, the junior DBA has done a mass update of one of the databases involved in the Bucardo sync. But this time, you decide it should be easier to simply remove the deltas and apply the changes manually. As before, we shut down Bucardo. Then we determine the timestamp of the mass change by checking the delta table closely:

$ psql beta -Atc 'select txntime, count(*) from bucardo.delta_public_pgbench_accounts group by 1 order by 2 desc limit 4'
2016-05-26 23:23:27.252352-04|65826965
2016-05-26 23:23:22.460731-04|80
2016-05-07 23:20:46.325105-04|73
2016-05-26 23:23:33.501002-04|69

Now we want to carefully excise those deltas. With that many rows, it is quicker to save/truncate/copy than to do a delete:

$ psql beta
beta=# BEGIN;
BEGIN
## To prevent anyone from firing the triggers that write to our delta table
beta=# LOCK TABLE pgbench_accounts;
LOCK TABLE
## Copy all the delta rows we want to save:
beta=# CREATE TEMP TABLE bucardo_store_deltas ON COMMIT DROP AS SELECT * FROM bucardo.delta_public_pgbench_accounts WHERE txntime <> '2016-05-26 23:23:27.252352-04';
SELECT 1885
beta=# TRUNCATE TABLE bucardo.delta_public_pgbench_accounts;
TRUNCATE TABLE
## Repopulate the delta table with our saved edits
beta=# INSERT INTO bucardo.delta_public_pgbench_accounts SELECT * FROM bucardo_store_deltas;
INSERT 0 1885
## This will remove the temp table
beta=# COMMIT;
COMMIT

Now that the deltas are removed, we want to emulate what caused them on all the other servers. Note that this query is a contrived one that may lend itself to concurrency issues. If you go this route, make sure your query will produce the exact same results on all the servers.

## As in the first solution above, this should ideally run in parallel
$ QUERY='UPDATE pgbench_accounts SET abalance = abalance + 25 WHERE aid > 78657769;'

## Unlike before, we do NOT run this against beta
$ for db in alpha gamma delta epsilon; do psql $db -Atc "SET session_replication_role='replica'; $QUERY"; done | tr "\n" " "
UPDATE 837265 UPDATE 837265 UPDATE 837265 UPDATE 837265 

## Now we can start Bucardo up again
$ bucardo start
Checking for existing processes
Starting Bucardo

That concludes the solutions for when you have to make a LOT of changes to your database. How do you know when the number of changes is large enough to warrant one of the workarounds presented here? Generally, you can simply let Bucardo run - you will know, when everything crawls to a halt, that perhaps trying to insert 465 million rows at once was a bad idea. :)
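If you would rather not find out the hard way, one cheap early-warning check (not covered above) is to watch the delta tables themselves, since any backlog shows up there first:

## A quick backlog check across the source databases; a count in the millions
## means Bucardo has an enormous sync ahead of it:
$ for db in alpha beta gamma; do \
    psql $db -Atc "select '$db', count(*) from bucardo.delta_public_pgbench_accounts"; done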

A caching, resizing, reverse proxying image server with Nginx

While working on a complex project, we had to set up a caching, reverse proxying image server with the ability to automatically resize any cached image on the fly.

Looking around on the Internet, we discovered an amazing blog post describing how Nginx could do that with a neat Image Filter module capable of resizing, cropping and rotating images, creating an Nginx-only solution.

What we wanted


What we wanted to achieve in our test configuration was to have a URL like:

http://www.example.com/image/<width>x<height>/<URL>

...that would retrieve the image at:

https://upload.wikimedia.org/<URL>

...then resize it on the fly, cache it and serve it.

Our setup ended up being almost the same as the one in that blog post, with some slight differences.

Requirements installation


First, as the post points out, the Image Filter module is not installed by default on many Linux distributions. Since we're using Nginx's official repositories, it was just a matter of installing the nginx-module-image-filter package and restarting the service.
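On a yum-based system pointed at the official Nginx repository, that boils down to something like the following (a sketch; package and service names may differ on your distribution, and if the module is packaged as a dynamic module you will also need a load_module directive in nginx.conf):

$ sudo yum install nginx-module-image-filter
$ sudo systemctl restart nginx

## If needed, near the top of nginx.conf:
## load_module modules/ngx_http_image_filter_module.so;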

Cache Storage configuration


Continuing to follow the post's instructions, we set up the cache in our main http section, tuning each parameter to fit our specific needs. We wanted 10MB of storage space for keys and 100MB for actual images, which will be removed after not being accessed for 40 days. The main configuration entry was then:

proxy_cache_path /tmp/nginx_cache levels=1:2 keys_zone=nginx_cache:10M max_size=100M inactive=40d;

This went straight in the http section of nginx.conf.

Caching Proxy configuration


Next, we configured our front facing virtual host. In our case, we needed the reverse proxy to live within an already existing site, and that's why we chose the /image/ path prefix.

server {
      ...
  
      location /image/ {
          proxy_pass http://127.0.0.1:20000;
          proxy_cache nginx_cache;
          proxy_cache_key "$proxy_host$uri$is_args$args";
      }
  
      location / {
          # other locations we may need for the site.
          root /var/www/whatever;
      }
  
  }

Every URL starting with /image/ would be served from the cache if present; otherwise it would be proxied to our Resizing Server and cached, with entries expiring after 40 days without access, as configured above.

Resizing Server configuration


We then configured the resizing server, using a regexp to extract the width, height and URL of the image we desire.

The server will proxy the request to https://upload.wikimedia.org/ looking for the image, resize it, and then serve it back to the Caching Proxy. We preferred to keep things simple and tidy, since unlike the original blog post we didn't actually need any AWS-related configuration.

server {
      ...
  
      location ~ ^/image/([0-9]+)x([0-9]+)/(.+) {
          image_filter_buffer 20M; # Will return 415 if image is bigger than this
          image_filter_jpeg_quality 75; # Desired JPG quality
          image_filter_interlace on; # For progressive JPG
  
          image_filter resize $1 $2;
  
          proxy_pass https://upload.wikimedia.org/$3;
      }
  
  }

Note that we may also use image_filter resize and crop options, should we need different results than just resizing.

Testing the final result


You should now be able to fire up your browser and access a URL like:

http://www.example.com/image/150x150/wikipedia/commons/0/01/Tiger.25.jpg

...and enjoy your caching, resizing, reverse proxying image server.
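If you prefer to test from the command line, a quick curl check against the example URL above also works; the second request should come back noticeably faster once the cache is warm:

## Fetch the resized image twice and compare the timings:
$ curl -s -o /dev/null -w '%{http_code} %{size_download} bytes in %{time_total}s\n' \
    'http://www.example.com/image/150x150/wikipedia/commons/0/01/Tiger.25.jpg'
$ curl -s -o /dev/null -w '%{http_code} %{size_download} bytes in %{time_total}s\n' \
    'http://www.example.com/image/150x150/wikipedia/commons/0/01/Tiger.25.jpg'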

Optionally securing access to your image server


As this was not a public server, we didn't use any security mechanism to validate the request.

The original blog post, though, reports a very simple and clever way to prevent abuse from unauthorized access, using the Secure Link module.

To access your server you will now need to add an auth parameter to the request, with a secure token that can easily be calculated as a base64-encoded MD5 hash.

This is the simple Bash command we used to test it:

echo -n '/image/150x150/wikipedia/commons/0/01/Tiger.25.jpg your_secret' | openssl md5 -binary | openssl base64 | tr +/ -_ | tr -d =

...and the resulting URL would be:

http://www.example.com/image/150x150/wikipedia/commons/0/01/Tiger.25.jpg?auth=TwcXg954Rhkjt1RK8IO4jA

Conclusions


Thanks to Charles Leifer for explaining his findings so well and giving us a smooth path with only minor tweaks to make our project work.

How to Add Labels to a Dimple JS Line Chart

I was recently working on a project that was using DimpleJS, which the docs describe as "An object-oriented API for business analytics powered by d3". I was using it to create a variety of graphs, some of which were line graphs. The client had requested that the line graph display the y-value of each point on the graph. This is easily accomplished with bar graphs in Dimple; however, it's not so easily done with line graphs.

I had spent some time Googling to find what others had done to add this functionality but could not find it anywhere. So, I read the documentation where they add labels to a bar graph, and "tweaked" it like so:

var s = myChart.addSeries(null, dimple.plot.line);
.
.
.
/* Add prices to the line chart */
s.afterDraw = function (shape, data) {
  // Get the drawn shape as a d3 selection (not used below, but often handy)
  var shapeSelection = d3.select(shape);
  var i = 0;
  _.forEach(data.points, function(point) {
    var rect = {
      x: parseFloat(point.x),
      y: parseFloat(point.y)
    };
    // Add a text label for the value; "svg" is assumed to be the d3 selection
    // returned by dimple.newSvg() when the chart was created
    if (data.markerData[i] != undefined) {
      svg.append("text")
        .attr("x", rect.x)
        .attr("y", rect.y - 10)
        // Centre align
        .style("text-anchor", "middle")
        .style("font-size", "10px")
        .style("font-family", "sans-serif")
        // Format the number
        .text(data.markerData[i].y);
    }
    i++;
  });
};

Some styling still needs to be done, but you can see that the y-values are now placed on the line graph. We are using lodash on this project, but if you do not want to use lodash, just replace the _.forEach call with a native data.points.forEach (or any other loop) and this technique should plug right in for you.

If you're reading this it's likely you've run into the same or similar issue and I hope this helps you!

Randomized Queries in Ruby on Rails

I was recently asked about options for displaying a random set of items from a table using Ruby on Rails. The request was complicated by the fact that the technology stack hadn't been completely decided on, and one of the items still up in the air was the database. I've had experience with a project where the decision was made to switch from MySQL to PostgreSQL; during the switch, a sizable number of hand-constructed queries stopped functioning and had to be manually translated before they would work again. Learning from that experience, I favor avoiding handwritten SQL in my Rails queries when possible. This precludes the option of using built-in database functions like rand() or random().

With the goal set in mind, I decided to look around to find out what other people were doing to solve similar requests. While perusing various suggested implementations, I noticed a lot of comments along the lines of “Don’t use this approach if you have a large data set.” or “this handles large data sets, but won’t always give a truly random result.”

These comments and the variety of solutions got me thinking about evaluating based not only on what database is in use, but what the dataset is expected to look like. I really enjoyed the mental gymnastics and thought others might as well.

Let’s pretend we’re working on an average project. The table we’ll be pulling from has several thousand entries and we want to pull back something small like 3-5 random records. The most common solution offered based on the research I performed works perfectly for this situation.

records_desired = 3
count = [OurObject.count, 1].max
offsets = records_desired.times.inject([]) do |memo|
  memo << rand(count)
end
# uniq! returns nil when it removes nothing, so dedupe explicitly instead
offsets.uniq!
while offsets.size < records_desired && offsets.size < count do
  offsets << rand(count)
  offsets.uniq!
end
offsets.collect { |offset| OurObject.offset(offset).first }

Analyzing this approach, we're looking at minimal processing time and a total of four queries: one to determine the total count and three more to fetch each of our objects individually. Seems perfectly reasonable.

What happens if our client needs 100 random records at a time? The processing is still probably within tolerances, but 101 queries? I say no, unless our table is full of Dalmatians! Let's see if we can tweak things to be more large-set friendly.

records_desired = 100
count = [OurObject.count - records_desired, 1].max
offset = rand(count)
OurObject.limit(records_desired).offset(offset)

How’s this look? Very minimal processing and only 2 queries. Fantastic! But is this result going to appear random to an observer? I think it’s highly possible that you could end up with runs of related looking objects (created at similar times or all updated recently). When people say they want random, they often really mean they want unrelated. Is this solution close enough for most clients? I would say it probably is. But I can imagine the possibility that for some it might not be. Is there something else we can tweak to get a more desirable sampling without blowing processing time sky-high? After a little thought, this is what I came up with.

records_desired = 100
count = records_desired * 3
offset = rand([OurObject.count - count, 1].max)
ids = OurObject.limit(count).offset(offset).pluck(:id)
OurObject.find(ids.sample(records_desired))

While this approach may not truly provide more random results from a mathematical perspective, by assembling a larger subset and pulling randomly from inside it, I think you may be able to more closely achieve the feel of what people expect from randomness if the previous method seemed to return too many similar records for your needs.

Sketchfab on the Liquid Galaxy

For the last few weeks, our developers have been working on syncing our Liquid Galaxy with Sketchfab. Our integration makes use of the Sketchfab API to synchronize multiple instances of Sketchfab in the immersive and panoramic environment of the Liquid Galaxy. The Liquid Galaxy already has so many amazing capabilities, and to be able to add Sketchfab to our portfolio is very exciting for us! Sketchfab, known as the “YouTube for 3D files,” is the leading platform to publish and find 3D and VR content. Sketchfab integrates with all major 3D creation tools and publishing platforms, and is the 3D publishing partner of Adobe Photoshop, Facebook, Microsoft HoloLens and Intel RealSense. Given that Sketchfab can sync with almost any 3D format, we are excited about the new capabilities our integration provides.

Sketchfab content can be deployed onto the system in minutes! Users from many industries use Sketchfab, including architecture, hospitals, museums, gaming, design, and education. There is a natural overlap between the Liquid Galaxy and Sketchfab, as members of all of these industries utilize the Liquid Galaxy for its visually stunning and immersive atmosphere.

We recently had Alban Denoyel, cofounder of Sketchfab, in our office to demo Sketchfab on the Liquid Galaxy. We're happy to report that Alban loved it! He told us about new features that are going to be coming out on Sketchfab soon. These features will automatically roll out to Sketchfab content on the Liquid Galaxy system, and will make the Liquid Galaxy's 3D modeling capabilities even more compelling.

We’re thrilled with how well Sketchfab works on our Liquid Galaxy as is, but we’re in the process of making it even more impressive. Some Sketchfab models take a bit of time to load (on their website and on our system), so our developers are working on having models load in the background so they can be activated instantaneously on the system. We will also be extending our Sketchfab implementation to make use of some of the features already present on Sketchfab's excellent API, including displaying model annotations and animating the models.

You can view a video of Sketchfab content on the Liquid Galaxy below. If you'd like to learn more, you can call us at 212-929-6923, or contact us here.

Gem Dependency Issues with Rails 5 Beta

The third-party gem ecosystem is one of the biggest selling points of Rails development, but the addition of a single line to your project's Gemfile can introduce literally dozens of new dependencies. A compatibility issue in any one of those gems can bring your development to a halt, and the transition to a new major version of Rails requires even more caution when managing your gem dependencies.

In this post I'll illustrate this issue by showing the steps required to get rails_admin (one of the two most popular admin interface gems for Rails) up and running even partially on a freshly-generated Rails 5 project. I'll also identify some techniques for getting unreleased and forked versions of gems installed as stopgap measures to unblock your development while the gem ecosystem catches up to the new version of Rails.

After installing the current beta3 version of Rails 5 with gem install rails --pre and creating a Rails 5 project with rails new, I decided to address the first requirement of my application, an admin interface, by installing the popular Rails Admin gem. The rubygems page for rails_admin shows that its most recent release, 0.8.1 from mid-November 2015, lists Rails 4 as a requirement. And indeed, trying to install rails_admin 0.8.1 in a Rails 5 app via bundler fails with a dependency error:

Resolving dependencies...
Bundler could not find compatible versions for gem "rails":
In snapshot (Gemfile.lock):
rails (= 5.0.0.beta3)

In Gemfile:
rails (< 5.1, >= 5.0.0.beta3)

rails_admin (~> 0.8.1) was resolved to 0.8.1, which depends on
rails (~> 4.0)

I took a look at the GitHub page for rails_admin and noticed that recent commits make reference to Rails 5, which is an encouraging sign that its developers are working on adding compatibility with Rails 5. Looking at the gemspec in the master branch on GitHub shows that the rails_admin gem dependency has been broadened to include both Rails 4 and 5, so I updated my app's Gemfile to install rails_admin directly from the master branch on GitHub:

gem 'rails_admin', github: 'sferik/rails_admin'

This solved the above dependency of rails_admin on Rails 4 but revealed some new issues with gems that rails_admin itself depends on:

Resolving dependencies...
Bundler could not find compatible versions for gem "rack":
In snapshot (Gemfile.lock):
rack (= 2.0.0.alpha)

In Gemfile:
rails (< 5.1, >= 5.0.0.beta3) was resolved to 5.0.0.beta3, which depends on
actionmailer (= 5.0.0.beta3) was resolved to 5.0.0.beta3, which depends on
actionpack (= 5.0.0.beta3) was resolved to 5.0.0.beta3, which depends on
rack (~> 2.x)

rails_admin was resolved to 0.8.1, which depends on
rack-pjax (~> 0.7) was resolved to 0.7.0, which depends on
rack (~> 1.3)

rails (< 5.1, >= 5.0.0.beta3) was resolved to 5.0.0.beta3, which depends on
actionmailer (= 5.0.0.beta3) was resolved to 5.0.0.beta3, which depends on
actionpack (= 5.0.0.beta3) was resolved to 5.0.0.beta3, which depends on
rack-test (~> 0.6.3) was resolved to 0.6.3, which depends on
rack (>= 1.0)

rails_admin was resolved to 0.8.1, which depends on
sass-rails (< 6, >= 4.0) was resolved to 5.0.4, which depends on
sprockets (< 4.0, >= 2.8) was resolved to 3.6.0, which depends on
rack (< 3, > 1)

This bundler output shows a conflict where Rails 5 depends on rack 2.x while rails_admin's rack-pjax dependency depends on rack 1.x. I ended up resorting to a Google search which led me to the following issue in the rails_admin repo: https://github.com/sferik/rails_admin/issues/2532

Installing rack-pjax from GitHub:

gem 'rack-pjax', github: 'afcapel/rack-pjax', branch: 'master'

resolves the rack dependency conflict, and bundle install now completes without error. Things are looking up! At least until you try to run rails g rails_admin:install and are presented with this mess:

/Users/patrick/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/actionpack-5.0.0.beta3/lib/action_dispatch/middleware/stack.rb:108:in `assert_index': No such middleware to insert after: ActionDispatch::ParamsParser (RuntimeError)
from /Users/patrick/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/actionpack-5.0.0.beta3/lib/action_dispatch/middleware/stack.rb:80:in `insert_after'

This error is more difficult to understand, especially given the fact that the culprit (the remotipart gem) is not actually mentioned anywhere in the error. Thankfully, commenters on the above-mentioned rails_admin issue #2532 were able to identify the remotipart gem as the source of this error and provide a link to a forked version of that gem which allows rails_admin:install to complete successfully (albeit with some functionality still not working).

In the end, my Gemfile looked something like this:

gem 'rails_admin', github: 'sferik/rails_admin'
# Use github rack-pjax to fix dependency versioning issue with Rails 5
# https://github.com/sferik/rails_admin/issues/2532
gem 'rack-pjax', github: 'afcapel/rack-pjax'
# Use forked remotipart until following issues are resolved
# https://github.com/JangoSteve/remotipart/issues/139
# https://github.com/sferik/rails_admin/issues/2532
gem 'remotipart', github: 'mshibuya/remotipart', ref: '3a6acb3'

A total of three unreleased versions of gems, including the forked remotipart gem that breaks some functionality, just to get rails_admin installed and up and running enough to start working with. And some technical debt in the form of comments about follow-up tasks to revisit the various gems as they have new versions released for Rails 5 compatibility.

This process has been a reminder that when working in a Rails 4 app it's easy to take for granted the ability to install gems and have them 'just work' in your application. When dealing with pre-release versions of Rails, don't be surprised when you have to do some investigative work to figure out why gems are failing to install or work as expected.

My experience has also underscored the importance of understanding all of your application's gem dependencies and having some awareness of their developers' intentions when it comes to keeping their gems current with new versions of Rails. As a developer it's in your best interest to minimize the amount of dependencies in your application, because adding just one gem (which turns out to have a dozen of its own dependencies) can greatly increase the potential for encountering incompatibilities.