A few PostgreSQL tricks

We ran into a couple of interesting situations recently, and used some helpful tricks to solve them, which of course should be recorded for posterity.

Unlogged tables


One of our customers needed a new database, created as a copy of an existing one but with some data obscured for privacy reasons. We could have done this with a view or probably any of several other techniques, but in this case, given the surrounding infrastructure, a new database, refreshed regularly from the original, was the simplest method. Except that with this new database, the regular binary backups became too large for the backup volume in the system. Since it seemed silly to re-provision the backup system (and talk the client into paying for it) to accommodate data we could throw away and recalculate at any time, we chose unlogged tables as an alternative.

"Unlogged," in this case, means changes to this table aren't written in WAL logs. This makes for better performance, but also means if the database crashes, these tables can't be recovered in the usual way. As a side effect, it also means these tables aren't copied via WAL-based replication, so the table won't show up in a hot standby system, for instance, nor will the table appear in a system restored from a WAL-based backup (pg_dump will still find them). Unlogged tables wouldn't give our application much of a performance boost in this case — the improved performance applies mostly to queries that modify the data, and ours were meant to be read-only. But before this change, the regular refresh process generated all kinds of WAL logs, and now they've all disappeared. The backups are therefore far smaller, and once again fit within the available space. Should the server crash, we'll have a little more work to do, regenerating these tables from their sources, but that's a scripted process and simple to execute.

Stubborn vacuum


Another fairly new customer has a database under a heavy and very consistent write load. We've had to make autovacuum very aggressive to keep up with bloat in several tables. When the vacuum process happens to clean all the tuples from the end of a table file, it tries to shrink the file and reclaim disk space, but it has to obtain a brief exclusive lock to do it. If it can't get one [1], it gives up, and emits a log message you'll see if you are vacuuming in verbose mode:

INFO:  "some_big_table": stopping truncate due to conflicting lock request

Note that though the log message calls this process "truncating", it should not be confused with the "TRUNCATE TABLE" command, which (locks permitting) would reclaim quite a bit more disk space than we want it to. Anyway, when the shrinking operation succeeds, there is no log message, so if VACUUM VERBOSE doesn't say anything about "stopping truncate", it's because it was able to get its lock and shrink the table, or the table didn't need it in the first place. Because of this database's tendency to bloat, we'd like vacuum to be able to shrink tables regularly, but the query load is such that for some tables, it never gets the chance. We're working on mitigating that, but in the meantime, one stop-gap solution is to run VACUUM VERBOSE in a tight loop until you don't see one of those "stopping truncate" messages. In our case we do it like this:

#!/bin/bash

# Give up on any single table after this much time
timeout="8m"
problematic_tables="
one_bloated_table
another_bloated_table
and_one_more_bloated_table
"
my_database="foo"

for table in $problematic_tables; do
    echo "Vacuuming $table"
    # Vacuum repeatedly until the "stopping truncate" message no longer
    # appears, meaning the truncation finally got its lock (or wasn't needed)
    ( timeout $timeout bash <<VACUUM
        while :; do
            vacuumdb -v -t $table $my_database 2>&1 | grep "stopping truncate" || break
        done
VACUUM
    ) || echo "Timed out on table $table"
done

This script iterates through a list of tables we'd like to shrink, and vacuums each repeatedly, as quickly as possible, until the vacuum process fails to emit a "stopping truncate" message, or it finds it has spent eight minutes [2] trying. Of course this whole technique is only useful in a few limited cases, but for our purposes we've found it helpful for managing bloat while we continue to work on the query patterns to reduce locking overall.
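
If you want to confirm that the loop is actually reclaiming space, one simple check is to compare a table's on-disk size before and after a pass, for example with one of the tables from the script above:

SELECT pg_size_pretty(pg_relation_size('one_bloated_table'));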

[1] In version 9.0 and before, it simply tries to obtain the lock once. In 9.1 and later versions, it tries every few milliseconds for up to five seconds to obtain the lock.
[2] There's nothing magic about eight minutes; it just works out well enough for our purposes.

Shrink XFS partition: Almost possible with LVM

If you happen to have reached this page because you're trying to shrink an XFS filesystem, let's put things straight: sorry, that's not possible.

But before you go away, you should know there's still hope. I'd like to show you the workaround I used to avoid reinstalling a Red Hat Enterprise Linux 7 or CentOS 7 VM, relying on on-the-fly XFS dump/restore and LVM capabilities, both standard choices in a regular RHEL/CentOS 7 setup.

First of all, let's clarify the situation I found myself in. For various reasons I had a CentOS 7 VM with everything already configured and working, installed only a few days earlier to test new software we're evaluating.

The VM itself is hosted on a dedicated server we manage on our own, so I had a certain degree of freedom in what I could do without paying any additional fees. You may not be in this same situation, but you can probably try a similar solution for little money if you're using an hourly-billed VPS provider.

The problem was that, even if everything was working and configured, the virtual hard disk device attached to the machine was too big and on the wrong storage area of the virtualization hosting server.

There was also another minor glitch: the VM was using an old virtualization device driver (IDE-based) instead of the newer VIRTIO one. Since I knew that CentOS 7 is capable of using VIRTIO-based devices, I also took the chance to fix this.

Unfortunately, an XFS filesystem cannot be shrunk at the moment (and for the foreseeable future), so what I needed to do was:

  1. add a new virtual device correctly using VIRTIO and stored in the right storage area of the virtualization host server
  2. migrate all the filesystems to the new virtual device
  3. set the VM OS to be working from the newly-built partition
  4. dismiss the old virtual device

In my specific case this translated to connecting to the virtualization host server, creating a new LVM logical volume to host the virtual disk device for the VM, and then adding the new virtual device to the VM configuration. Unfortunately, in order to have the VM see the new virtual device, I had to shut the VM down.

While connected to the virtualization host server I also downloaded the latest ISO of SysRescueCD, a Linux distribution specialized in data rescue, and attached it to the VM. I'm using this particular distro since it's one of the few that ship the XFS dump/restore tools on the live ISO.

Now the VM was ready to be booted with the SysRescueCD Live OS and then I could start working my way through all the needed fixes. If you're doing something similar, of course please make sure you have offsite backups and have double-checked that they're readable before doing anything else.

First of all, inspect your dmesg output to find out which is the source virtual device and which is the new target virtual device. In my case the source was /dev/sda and the target was /dev/vda.

dmesg | less

Then create a partition on the new device for /boot (eg: /dev/vda1), of type Linux and the same size as the source /boot partition (eg: /dev/sda1), and dedicate all the remaining space to a new LVM-type partition (eg: /dev/vda2).

fdisk /dev/vda
# [create /boot and LVM partitions]

You could also mount and copy the /boot files, or re-create them entirely if you need to change the /boot partition size. Since I kept /boot exactly the same size, I could simply use ddrescue (a more verbose variant of the classic Unix dd).

ddrescue /dev/sda1 /dev/vda1

The next step is supposed to migrate the MBR, and it should just work, but in my case the boot phase kept failing, so I also needed to reinstall the bootloader via the CentOS 7 rescue system (not covered in this tutorial, but briefly mentioned near the end of the article).

ddrescue -i0 -s512 /dev/sda /dev/vda

Then create the target LVM volumes.

pvcreate /dev/vda2
vgcreate fixed_centos_VG /dev/vda2
lvcreate -L 1G -n swap fixed_centos_VG
lvcreate -l 100%FREE -n root fixed_centos_VG
vgchange -a y fixed_centos_VG

Create the target XFS filesystem.

mkfs.xfs -L root /dev/fixed_centos_VG/root

And then set up the swap volume.

mkswap /dev/fixed_centos_VG/swap

Next create the needed mountpoints and mount the old source and the new empty filesystems.

mkdir /mnt/disk_{wrong,fixed}
mount /dev/fixed_centos_VG/root /mnt/disk_fixed
vgchange -a y centos_VG
mount /dev/centos_VG/root /mnt/disk_wrong

Now here's the real XFS magic. We'll use xfsdump and xfsrestore to copy the filesystem content (files, directories, special files) without having to care about file permissions, types, extended ACLs, or anything else. Plus, since this only moves the contents of the filesystem, the target partition doesn't need to be the same size, and the copy won't take as long as duplicating the entire block device, because only the space actually in use has to be processed.

xfsdump -J - /mnt/disk_wrong | xfsrestore -J - /mnt/disk_fixed

If you want a more verbose output, leave out the -J option. After the process is done, be sure to carefully verify that everything is in place in the new partition.

ls -lhtra /mnt/disk_fixed/

Then unmount the disks and deactivate the LVM VGs.

umount /mnt/disk_{wrong,fixed}
vgchange -a n centos_VG
vgchange -a n fixed_centos_VG

At this point, in order to avoid changing anything inside the virtualized OS (fstab, grub, and so on), let's remove the old VG and rename the new one to the name the old one had.

vgremove centos_VG
pvremove /dev/sda2
vgrename {fixed_,}centos_VG

You should now be able to shut down the VM again, detach the old disk, and start the new VM, which will be using the new, smaller virtual device.

If the boot phase keeps failing, boot the CentOS installation media in rescue mode and, after chroot-ing inside your installation, run grub2-install /dev/vda (targeting your new main device) to reinstall grub.

Only after everything is working as expected should you detach the old, unneeded device from the VM and remove it from the virtualization host server.

Social Innovation Summit 2014



In November Josh Ausborne and I set up a Liquid Galaxy at the Sofitel Hotel in Redwood City for the 2014 Social Innovation Summit. Prior to the event the End Point content team worked together with Chris Busselle and Sasha Buscho from Google.org to create presentations featuring 10 grantee organizations.

With the Liquid Galaxy we were able to feature “Street View” panoramas of people enjoying the High Line in New York City, penguins standing on the shoreline for Penguin Foundation, and seals swimming underwater for Conservation International. The touchscreen and space navigator control device enabled users to view 360 degrees of the landscape as if they had been teleported to each location.


I was thrilled to see the Google.org team in front of the system, sharing with fellow philanthropists the larger narrative associated with each project. This highlights one of the many strengths of the Liquid Galaxy: the opportunity to share, explore, and collaborate in real time, in front of an immersive array of screens.


(Credit: Chris Busselle, https://twitter.com/chrisatgoogle/media)



One of the prepared presentations highlighted a data collaboration with the Polaris Project to fight human trafficking. With Google Earth, Chris Busselle identified a location in India that shows brick-firing kilns. Chris navigated through this particular landscape and shared how these bricks are manufactured by enslaved human beings. This was yet another revelation for me, as I recognized how a satellite image overlaid in a virtual environment can be a touchpoint for a narrative which spans the globe.

As you might guess, it was a pleasure to work with the Google.org team (we even got a hug at the end!). The team’s passion for their work and the impact Google.org is having through technology are undeniable. I am excited to recognize this event as yet another proof of principle for how the Liquid Galaxy and Google Earth act as tools for awareness which can inspire positive change in the “real” world.


Getting realtime output using Python Subprocess

The Problem

When I launch a long-running Unix process within a Python script, it waits until the process is finished, and only then do I get the complete output of my program. This is annoying if I'm running a process that takes a while to finish. I also want to capture the output and display it in a nice manner with clear formatting.

Using the subprocess and shlex library

Python has a “batteries included” philosophy. I have used two standard libraries to solve this problem.
import subprocess 
import shlex 
  • subprocess - Works with additional processes
  • shlex - Lexical analysis of shell-style syntaxes

subprocess.Popen

To run a process and read all of its output, set the stdout value to PIPE and call communicate().
import subprocess
process = subprocess.Popen(['echo', 'Hello stdout'], stdout=subprocess.PIPE)
stdout = process.communicate()[0]
print 'STDOUT:{}'.format(stdout)
The above script will wait for the process to complete and then display the output. So now we are going to read stdout line by line and display it in the console until the process completes.
output = process.stdout.readline()
This will read a line from the stdout.
process.poll()
The poll() method will return
  • the exit code if the process is completed.
  • None if the process is still running.
while True:
    output = process.stdout.readline()
    if output == '' and process.poll() is not None:
        break
    if output:
        self.logger.info(output.strip())
rc = process.poll()
The above will loop, reading stdout line by line and checking the return code, and display the output in real time.
I had one more problem: parsing the shell command to pass to Popen when shell=False is set. Below is an example command:
rsync -avzXH --delete --exclude=*.swp --exclude=**/drivers.ini /media/lgisos/lg.iso root@42-a:/isodevice
To split the string using shell-like syntax, I used the shlex library's split method.

Here is what the final code looks like:

def run_command(self, command):
    process = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE)
    while True:
        output = process.stdout.readline()
        if output == '' and process.poll() is not None:
            break
        if output:
            self.logger.info(output.strip())
    rc = process.poll()
    return rc

Postgres session_replication_role - Bucardo and Slony's powerful ally

One of the lesser known Postgres parameters is also one of the most powerful: session_replication_role. In a nutshell, it allows you to completely bypass all triggers and rules for as long as you need to. This was invented to allow replication systems to bypass all foreign keys and user triggers, but it can also be used to greatly speed up bulk loading and updating.


(Triggerfish picture by Shayne Thomas)

The problem with disabling triggers

Once upon a time, there were two replication systems, Slony and Bucardo, that both shared the same problem: triggers (and rules) on a "target" table could really mess things up. In general, when you are replicating table information, you only want to replicate the data itself, and avoid any side effects. In other words, you need to prevent any "post-processing" of the data, which is what rules and triggers may do. The disabling of those was done in a fairly standard, but very ugly, way: updating the system catalogs for the tables in question to trick Postgres into thinking that there were no rules or triggers. Here's what such SQL looks like in the Bucardo source code:

$SQL = q{
    UPDATE pg_class
    SET    reltriggers = 0, relhasrules = false
    WHERE  (
};
$SQL .= join "OR\n"
    => map { "(oid = '$_->{safeschema}.$_->{safetable}'::regclass)" }
      grep { $_->{reltype} eq 'table' }
      @$goatlist;
$SQL .= ')';

This had a number of bad side effects. First and foremost, updating the system catalogs is never a recommended step. While it is *possible*, it is certainly discouraged. Because access to the system catalogs does not follow strict MVCC rules, odd things can sometimes happen. Another problem is that editing the system catalogs causes locking issues, as well as bloat on the system tables themselves. Yet another problem is that it was tricky to get this right; the format of the system catalogs changes over time, so your code would need to have alternate paths for disabling and enabling triggers depending on the version of Postgres in use. Finally, the size of the SQL statements needed grew with the number of tables to be replicated: in other words, you had to specifically disable and enable each table. All in all, quite a mess.

The solution to disabling triggers

The solution was to get away from editing the system catalogs altogether, and provide a cleaner way to temporarily disable all triggers and rules on tables. Jan Wieck, the inventor of Slony, wrote a new user parameter and named it "session_replication_role". As you can tell by the name, this is a session-level setting. In other words, only the current session will see the effects of setting this, and it will last as long as your session does (which is basically as long as you are connected to the database). This setting applies to all tables, and can be used to instruct Postgres to not worry about triggers or rules at all. So the new code becomes:

$SQL = q{SET session_replication_role TO 'replica'};

Much cleaner, eh? (you may see session_replication_role abbreviated as s_r_r or simply srr, but Postgres itself needs it spelled out). You might have noticed that we are setting it to 'replica', and not 'on' and 'off'. The actual way this parameter works is to specify which types of triggers should be fired. Previous to this patch, triggers were all of one type, and the only characteristic they could have was "enabled" or "disabled". Now, a trigger can have one of four states: origin, always, replica, or disabled (stored in the 'tgenabled' field of the pg_trigger table as 'O', 'A', 'R', or 'D'). By default, all triggers that are created are of type 'origin'. This applies to the implicitly created system triggers used by foreign keys as well. Thus, when session_replication_role is set to replica, only triggers of the type 'replica' will fire - and not the foreign key enforcing ones. If you really need a user trigger to fire on a replica (aka target) table, you can adjust that trigger to be of type replica. Note that this trigger will *only* fire when session_replication_role is set to replica, and thus will be invisible in day to day use.

Once the replication is done, session_replication_role can be set back to the normal setting like so:

$SQL = q{SET session_replication_role TO 'origin'};

You can also set it to DEFAULT, which in theory could be different from origin as one can set the default session_replication_role to something other than origin inside of the postgresql.conf file. However, it is much cleaner to always specify the exact role you want; I have not come across a use case that required changing the default from origin.

This feature only exists in Postgres 8.3 or later. As such, Bucardo still contains the old system catalog manipulation code, as it supports versions older than 8.3, but it uses session_replication_role whenever possible. Slony always uses one or the other, as it made session_replication_role a backwards-incompatible change in its major version. Thus, to replicate versions of Postgres before 8.3, you need to use the older Slony 1.2.

There are some good use cases other than a replication system for using this feature. The most common is simply bulk loading or bulk updating when you do not want the effects of the triggers, or simply do not want the performance hit. Remember that system triggers are disabled as well, so use this with care (this is one of the reasons you must be a superuser to change the session_replication_role parameter).
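
As a sketch of that bulk-loading use case (run as a superuser; the table and file names are hypothetical), note that using SET LOCAL keeps the change scoped to the transaction, so the session cannot accidentally stay in replica mode afterwards:

BEGIN;
-- SET LOCAL reverts automatically at COMMIT or ROLLBACK
SET LOCAL session_replication_role = 'replica';
-- Foreign keys and user triggers are NOT enforced while this is in effect
COPY big_import_table FROM '/tmp/big_import.csv' WITH (FORMAT csv);
COMMIT;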

What if you are not a superuser and need to disable triggers and/or rules? You could create a wrapper function that runs as a superuser. The big downside to that is the all-or-nothing nature of session_replication_role. Once it is changed, it is changed for *everything*, so handing that power to a normal user could be dangerous. My colleague Mark Johnson came up with another great solution: a function that runs as the superuser, and does the old-style system catalog manipulations, but uses an ingenious foreign key trick to ensure that the matching "enable" function *must* be run. Great for fine-grained control of table triggers.
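
Here is a minimal sketch of the simple, all-or-nothing wrapper approach mentioned above (not Mark's foreign key trick): a SECURITY DEFINER function owned by a superuser, with EXECUTE granted only to the role that needs it. The function and role names are hypothetical:

-- Must be created by a superuser
CREATE OR REPLACE FUNCTION enter_replica_mode() RETURNS void
LANGUAGE plpgsql SECURITY DEFINER AS $$
BEGIN
    -- is_local = false, so the setting lasts for the rest of the session
    PERFORM set_config('session_replication_role', 'replica', false);
END;
$$;

REVOKE ALL ON FUNCTION enter_replica_mode() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION enter_replica_mode() TO bulk_loader;

You would also want a matching function to switch back to 'origin', which is exactly the step Mark's foreign key trick forces callers not to forget.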

You might wonder about the other setting, "local". It's used mostly to have a third type of trigger, for times when you don't want normal (e.g. origin) triggers to fire but don't want the triggers to fire in replica mode either. Slony uses these when it does some of its DDL trickery; peruse the Slony documentation for more details.

Postgres will also show you what state a trigger is in when you are viewing a table using the "backslash-d" command inside of psql. Here are some examples. Remember that psql never shows "system-level" triggers, but they are there, as we shall see below. First, let's create two test tables linked by a foreign key, and a trigger with supporting function that raises a simple notice when fired:

greg=# create table foo (id int primary key);
CREATE TABLE
greg=# create table bar(id int primary key, fooid int references foo(id));
CREATE TABLE
greg=# insert into foo values (1),(2),(3);
INSERT 0 3
greg=# insert into bar values (10,1), (11,2);
INSERT 0 2

greg=# create function alertalert() returns trigger language plpgsql AS $$ BEGIN RAISE NOTICE 'cookie dough'; RETURN null; END $$;
CREATE FUNCTION

greg=# create trigger mytrig after update on foo for each statement execute procedure alertalert();
CREATE TRIGGER

Now that those are setup, let's see what psql shows us about each table:

greg=# \d foo
      Table "public.foo"
 Column |  Type   | Modifiers 
--------+---------+-----------
 id     | integer | not null
Indexes:
    "foo_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "bar" CONSTRAINT "bar_fooid_fkey" FOREIGN KEY (fooid) REFERENCES foo(id)
Triggers:
    mytrig AFTER UPDATE ON foo FOR EACH STATEMENT EXECUTE PROCEDURE alertalert()

greg=# \d bar
      Table "public.bar"
 Column |  Type   | Modifiers 
--------+---------+-----------
 id     | integer | not null
 fooid  | integer | 
Indexes:
    "bar_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "bar_fooid_fkey" FOREIGN KEY (fooid) REFERENCES foo(id)

Everything looks good. Let's see the trigger in action:

greg=# update foo set id=id;
NOTICE:  cookie dough
UPDATE 3

Although the output of psql only shows a single trigger on the foo table, there are actually two others, created by the foreign key, which help to enforce the foreign key relationship. We can see them by looking at the pg_trigger table:


greg=# select tgname, tgenabled, tgisinternal, tgconstraint from pg_trigger where tgrelid::regclass::text = 'foo';
            tgname            | tgenabled | tgisinternal | tgconstraint 
------------------------------+-----------+--------------+--------------
 RI_ConstraintTrigger_a_73776 | O         | t            |        45313
 RI_ConstraintTrigger_a_57179 | O         | t            |        45313
 mytrig                       | O         | f            |            0
(3 rows)

We can see that they are internal triggers (which prevents psql from showing them), and that they have an associated constraint. Let's make sure these triggers are doing their job by causing one of them to fire and complain that the underlying constraint is being violated:

## Try and fail to delete id 1, which is being referenced by the table bar:
greg=# delete from foo where id = 1;
ERROR:  update or delete on table "foo" violates foreign key constraint "bar_fooid_fkey" on table "bar"
DETAIL:  Key (id)=(1) is still referenced from table "bar".
## Check the name of the constraint referenced above by pg_trigger:
greg=# select conname, contype from pg_constraint where oid = 45313;
    conname     | contype 
----------------+---------
 bar_fooid_fkey | f

Time to demonstrate the power and danger of the session_replication_role attribute. First let's set it to 'replica' and verify that all triggers fail to fire. We should be able to perform the "illegal" deletion we tried before, and an update should fail to raise any notice at all:

greg=# show session_replication_role;
 session_replication_role 
--------------------------
 origin
greg=# set session_replication_role = 'replica';
SET
greg=# delete from foo where id = 1;
DELETE 1
greg=# update foo set id=id;
UPDATE 2
greg=# show session_replication_role;
 session_replication_role 
--------------------------
 replica

Let's force our trigger to fire by setting it to replica:

greg=# alter table foo enable replica trigger mytrig;
ALTER TABLE
greg=# \d foo
      Table "public.foo"
 Column |  Type   | Modifiers 
--------+---------+-----------
 id     | integer | not null
Indexes:
    "foo_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "bar" CONSTRAINT "bar_fooid_fkey" FOREIGN KEY (fooid) REFERENCES foo(id)
Triggers firing on replica only:
    mytrig AFTER UPDATE ON foo FOR EACH STATEMENT EXECUTE PROCEDURE alertalert()
greg=# set session_replication_role = 'replica';
SET
greg=# update foo set id=id;
NOTICE:  cookie dough
UPDATE 2

So what is the consequence of the above DELETE command? The foreign key relationship is now a lie, as there are rows in bar that do not point to a row in foo!

greg=# select * from bar where not exists (select 1 from foo where fooid = foo.id);
 id | fooid 
----+-------
 10 |     1
(1 row)

Ouch! This shows why session_replication_role is such a dangerous tool (indeed, this is the primary reason it is only allowed to be changed by superusers). If you find yourself reaching for this tool, don't. But if you really have to, double-check everything twice, and make sure you always change it back to 'origin' as soon as possible.

Elastic Beanstalk Whenever

I recently got the opportunity to pick up development on a Ruby on Rails application that was originally setup to run on AWS using their Elastic Beanstalk deployment tools. One of our first tasks was to move some notification hooks out of the normal workflow into scripts and schedule those batch scripts using cron.

Historically, I've had extremely good luck with Whenever. In my previous endeavors I've utilized Capistrano, which Whenever integrates with seamlessly. Given how simple it was to integrate Whenever with Capistrano, I anticipated a similar experience dealing with Elastic Beanstalk. While the integration was not as seamless as with Capistrano, I did manage to make it work.

My first stumbling block was finding documentation on how to do after or post hooks. I managed to find this forum post and this blog post which helped me out a lot. The important detail is that there is a "post" directory to go along with "pre" and "enact", but it's not present by default, so it can be easy to miss.

I used Marcin's delayed_job config as a base. The first thing I had to address was an apparent change in Elastic Beanstalk's configuration structure. Marcin's config has

  . /opt/elasticbeanstalk/support/envvars
but that file doesn't exist on the system I was working on. With a small amount of digging, I found:
  . /opt/elasticbeanstalk/containerfiles/envvars
in one of the other ebextensions. Inspecting that file showed a definition for, and exportation of, $EB_CONFIG_APP_CURRENT, suggesting this is a similar file just stored in a different location now.

Another change that appears to have occurred since Marcin developed his config is that directories will be created automatically if they don't already exist when adding a file in the files section of the config. That allows us to remove the entire commands section to simplify things.

That left me with a config that looked like:

files:
  "/opt/elasticbeanstalk/hooks/appdeploy/post/99_update_cron.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #! /usr/bin/env bash
      . /opt/elasticbeanstalk/containerfiles/envvars
      su -c "cd $EB_CONFIG_APP_CURRENT; bundle exec whenever --update-cron" - $EB_CONFIG_APP_USER

This command completed successfully, but on staging the cron jobs failed to run. The reason for that was an environment mismatch: the runner entries inside the cron commands weren't receiving a RAILS_ENV or other type of environment directive, so they were defaulting to production and failing when no database was found.

After some grepping I was able to find a definition for RACK_ENV in:

/opt/elasticbeanstalk/containerfiles/envvars.d/sysenv
Making use of it, I came up with this final version:
files:
  "/opt/elasticbeanstalk/hooks/appdeploy/post/99_update_cron.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #! /usr/bin/env bash
      . /opt/elasticbeanstalk/containerfiles/envvars
      . /opt/elasticbeanstalk/containerfiles/envvars.d/sysenv
      su -c "cd $EB_CONFIG_APP_CURRENT; bundle exec whenever --update-cron --set='environment=$RACK_ENV'" - $EB_CONFIG_APP_USER

CentOS 7 on Hetzner server with more than 2 TB disk

We use a variety of hosting providers for ourselves and our clients, including Hetzner. They provide good servers for a great price, have decent support, and we've been happy with them for our needs.

Recently I was given the task of building out a new development server for one of our clients, and we wanted it to be set up identically to another one of their servers but with CentOS 7. I placed the order for the hardware with Hetzner and then began the procedure for installing the OS.

Hetzner provides a scripted install process that you can kick off after booting the machine into rescue mode. I followed this process and selected CentOS 7 and proceeded through the whole process without a problem. After rebooting the server and logging in to verify everything, I noticed that the disk space was capped at 2 TB, even though the machine had two 3 TB drives in it (in hardware RAID 1). I looked at the partitions and found the partition table was "msdos". Ah ha!

At this point painful memories of running into this problem before hit me. I reviewed our notes of what we had done last time, and felt it was worth a shot even though this time I was dealing with CentOS 7. I went through the steps up to patching anaconda and then found that anaconda for CentOS 7 is newer and the files are different. I couldn't find any files that care about the partition table type, so I didn't patch anything.

I then tried to run the CentOS 7 install as-is. This only got me so far because I then ran into trouble with NetworkManager timing out and not starting.

A screenshot of the CentOS 7 installer (anaconda) failing, similar to what I was seeing.

Baffled, I looked into what may have been causing the trouble and discovered that the network was not set up at all and it looked as if no network interfaces existed. WHAT?? At this point I dug through dmesg and found that the network interfaces did indeed exist but udevd had renamed them. Ugh!

Many new Linux distributions are naming network interfaces based on their physical connection to the system: those embedded on the motherboard get named em1, em2, etc. Apparently I missed the memo on this one, as I was still expecting eth0, eth1, etc. And from all indications, so was NetworkManager because it could not find the network interfaces!

Rather than spend more time going down this route, I decided to change gears and look to see if there was any way to patch the Hetzner install scripts to use a GPT partition table with my install instead of msdos. I found and read through the source code for their scripts and soon stumbled on something that just might solve my problem. In the file /root/.oldroot/nfs/install/functions.sh I found mention of a config variable FORCE_GPT. If this is set to "1" then it will try to use a GPT partition table unless it thinks the OS won't like it, and it thinks that CentOS won't like it (no matter the version). But if you set FORCE_GPT to "2" it will use a GPT partition table no matter what. This config setting just needs to be added to the file you edit where you list out your partitions and LVM volumes.

FORCE_GPT 2                                                                                                      

PART /boot ext3 512M                                                                                             
PART lvm   vg0  all                                                                                              
                                                                                                                 
LV  vg0  swap swap   swap  32G                                                                                   
LV  vg0  root  /     ext4 100G                                                                                   
LV  vg0  home  /home ext4 400G                                                                                   

I then ran the installer script, added the secret config option, and... Bingo! It worked perfectly! No need to patch anything or install manually. And now we have a CentOS 7 server with the full 3 TB of disk space usable.

(parted) print                                                            
Model: DELL PERC H710 (scsi)
Disk /dev/sda: 3000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: pmbr_boot

Number  Start   End     Size    File system  Name  Flags
 3      1049kB  2097kB  1049kB                     bios_grub
 1      2097kB  539MB   537MB   ext3
 2      539MB   3000GB  2999GB                     lvm