
Merging JSONs in PostgreSQL

PostgreSQL's JSON support is great; however, one thing is missing, and it will still be missing in the next PostgreSQL release: there is no easy way to manipulate the values stored in a JSON field. Fortunately, there is a workaround that lets you do anything you want: use an external procedural language.

Merging JSONs

Sometimes you need to update one JSON value with values from another, or you just need to change one field inside a JSON value. There is no easy way to do it in pure PostgreSQL, but with the help of an external language it becomes trivial.

Let’s use simple JSONs:

WITH j AS (
  SELECT
    '{"a":42, "b":"test"}'::JSON a,
    '{"a":1764, "x":"test x"}'::JSON b
)
SELECT a, b
FROM j;
          a           |             b
----------------------+--------------------------
 {"a":42, "b":"test"} | {"a":1764, "x":"test x"}

Let’s assume I have value a stored in a table, and I want to update it with values from b. So I want to do something like:

UPDATE data SET j = something(j, b) WHERE id = 10;

The question is: how to merge those JSONs.

Merging

For merging I’ve written a simple plpython function:

CREATE FUNCTION merge_json(left JSON, right JSON)
RETURNS JSON AS $$
  import simplejson as json
  # Parse both values, overlay right onto left, and return the merged result
  l, r = json.loads(left), json.loads(right)
  l.update(r)
  return json.dumps(l)
$$ LANGUAGE PLPYTHONU;

This function merges the "right" argument into the "left" one, overwriting values for any keys present in both.

This way, if we use this function on the JSONs above (together with the pretty-print function from the previous post), we get:

WITH j AS (
  SELECT
    '{"a":42, "b":"test"}'::JSON a,
    '{"a":1764, "x":"test x"}'::JSON b
)
SELECT
  pp_json(a) as "a",
  pp_json(b) as "b",
  pp_json(merge_json(a, b)) as "a+b"
FROM j;

        a        |         b         |        a + b      
-----------------+-------------------+-------------------
 {              +| {                +| {                +
     "a": 42,   +|     "a": 1764,   +|     "a": 1764,   +
     "b": "test"+|     "x": "test x"+|     "b": "test", +
 }               | }                 |     "x": "test x"+
                 |                   | }

So it is pretty easy to manipulate JSON using plpython when the standard PostgreSQL operators and functions are not sufficient. I'm sure that the release after 9.3 will have much wider JSON support, so hopefully functions like the above won't be needed.

Pretty Printing JSONs in PostgreSQL

PostgreSQL has great support for the JSON type, as I wrote recently. It also has some operators for converting data to and from JSON, and the JSON type itself is great for ensuring that the JSON stored in the database is always valid.

Pretty Printing JSON

The Problem

JSON values can be quite complicated and deeply nested. Stored and printed as plain strings, they are hard to read; pretty-printing them increases readability considerably. Let's use a sample JSON like:

{"a":42, "d":{"a":10, "b":[1,2,3], "c":"x2"}, "x":"test", "p":[1,2,3,4,5]}

I think it would be much more readable in this form:

 {                
     "a": 42,     
     "d": {       
         "a": 10, 
         "b": [   
             1,   
             2,   
             3    
         ],       
         "c": "x2"
     },           
     "p": [       
         1,       
         2,       
         3,       
         4,       
         5        
     ],           
     "x": "test"  
 }

The Solution

To generate this kind of format, I created a very simple Python function:

CREATE FUNCTION pp_json(j JSON, sort_keys BOOLEAN = TRUE, indent TEXT = '    ')
RETURNS TEXT AS $$
  import simplejson as json
  return json.dumps(json.loads(j), sort_keys=sort_keys, indent=indent)
$$ LANGUAGE PLPYTHONU;

It uses Python's simplejson module to parse the JSON string into a dictionary. The dictionary is then converted back to JSON, this time in a much nicer format.

The function arguments allow you to sort the JSON keys and to set the string used for indentation (for example, a number of spaces).

This function can be used as:

x=# SELECT pp_json('{"a":42, "d":{"a":10, "b":[1,2,3], "c":"x2"}, "x":"test", "p":[1,2,3,4,5]}');
      pp_json      
-------------------
 {                +
     "a": 42,     +
     "d": {       +
         "a": 10, +
         "b": [   +
             1,   +
             2,   +
             3    +
         ],       +
         "c": "x2"+
     },           +
     "p": [       +
         1,       +
         2,       +
         3,       +
         4,       +
         5        +
     ],           +
     "x": "test"  +
 }

Logging Pretty JSON

Sometimes you don't want the formatted JSON returned from a function, but rather written to the logs. For this I have another function:

CREATE FUNCTION pp_log_json(j JSON, sort_keys BOOLEAN = TRUE, indent TEXT = '    ')
RETURNS VOID AS $$
BEGIN
  RAISE NOTICE '%', pp_json(j, sort_keys, indent);
END;
$$ LANGUAGE PLPGSQL;

It uses the previous function to format the JSON and raises a notice, which will end up in the logs, depending on your PostgreSQL logging settings.
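
For example (assuming both functions above are installed), calling it from psql raises a notice with the formatted JSON, which also lands in the server log:

SELECT pp_log_json('{"a":42, "b":"test"}');
NOTICE:  {
    "a": 42,
    "b": "test"
}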

GNU Screen logtstamp string

A short note on GNU Screen configuration:

You can add configuration to ~/.screenrc or another configuration file named by -c filename upon invocation, and among the many options are some to enable logging what happens in the screen windows. This is useful when using screen as a reattachable daemonizer.

Consider this configuration:

logfile path/to/screen-output.%Y%m%d.log
logfile flush 1
logtstamp on
logtstamp after 5
log on

That works nicely. With logfile we specify the name of the logfile, using some % escapes as per "STRING ESCAPES" in the manpage to put the date in the logfile name.

With logfile flush 1 we request that every 1 second the output be flushed to the log, making it easier to follow with tail -f.

logtstamp on turns on timestamping which writes a timestamp to the log after a default 2 minutes of inactivity. We shorten that to 5 seconds with logtstamp after 5.

Finally, log on turns on the logging.

Now, what if we want to customize the timestamp? The default looks like this:

-- 0:process-name -- time-stamp -- Jul/24/13  9:09:56 --

The manpage says this can be customized with logtstamp string ..., where the default is ‘-- %n:%t -- time-stamp -- %M/%d/%y %c:%s --\n’.

The manpage earlier says that arguments may be surrounded by single or double quotes. Doing so with the default shown doesn't work, because a literal \n shows up in the logfile.

The solution I worked out by trial and error is that you must double-quote the string and use an octal escape value \012. Single-quoting that will output a literal backslash 0 1 2, and \n simply is not a recognized escape. Thus our final configuration directive is:

logtstamp string "-- time-stamp -- %Y-%m-%d %0c:%s --\012"

which results in output like:

-- time-stamp -- 2013-07-24 09:59:35 --

If you use more than one screen window with this configuration, all windows' output will go into the same logfile. Use the %n escape string to include a window number in the logfile name if you'd like them kept separate.

has_many filter in RailsAdmin

I enjoyed using RailsAdmin's record filtering abilities until one day I needed to find all the orders containing a specific product.

class Order < ActiveRecord::Base
  has_many :orders_products
  has_many :products, :through => :orders_products
end

The following valid piece of RailsAdmin configuration did not break anything but did not work either:

RailsAdmin.config do |config|
  config.model Order do
    list do
      field :products do
        searchable :name 
      end
    end
  end
end

The reason is that only belongs_to associations are enabled for searching, as stated in the "Field searching" section of the documentation:

(3) Belongs_to associations : will be searched on their foreign_key (:team_id)
or on their label if label is not virtual (:name, :title, etc.)

Benoit Bénézech, creator of RailsAdmin, confirmed this as well:

has_many are not added to the include for perf reason. That means that AR won't find the :programs table

We only had a few has_many fields configured across the project, so I decided to look into the source code and see if the limitation can be bypassed.

The MainController class in RailsAdmin invokes the "get_collection" method to fetch the records for the list action. It defines the "associations" variable, which is used to generate the SQL query for the filters. I reopened the class in config/initializers/rails_admin_main_controller.rb:

module RailsAdmin
  MainController.class_eval do
    def get_collection(model_config, scope, pagination)
      associations = model_config.list.fields
        .select {|f| (f.type == :belongs_to_association || f.type == :has_many_association) && !f.polymorphic? }
        .map {|f| f.association[:name] } 
      options = {}
      options = options.merge(:page => (params[:page] || 1).to_i,
        :per => (params[:per] || model_config.list.items_per_page)) if pagination
      options = options.merge(:include => associations) unless associations.blank?
      options = options.merge(get_sort_hash(model_config))
      options = options.merge(:query => params[:query]) if params[:query].present?
      options = options.merge(:filters => params[:f]) if params[:f].present?
      options = options.merge(:bulk_ids => params[:bulk_ids]) if params[:bulk_ids]
      objects = model_config.abstract_model.all(options, scope)
    end
  end
end

Take a look at the very first statement of the method. It used to be:

associations = model_config.list.fields.
  select {|f| f.type == :belongs_to_association && !f.polymorphic? }

I allowed :has_many_association to pass through as well... and it turned out that RailsAdmin search works perfectly with it!

The SQL output for the Orders list action in RailsAdmin is below:

SELECT * FROM "orders" LEFT OUTER JOIN "orders_products" 
ON "orders"."id" = "orders_products"."order_id" 
LEFT OUTER JOIN "products" ON "products"."id" = "orders_products.product_id"
WHERE "orders"."id" IN (5469, 5448, 5447, 5436, 5428, 5384, 5007, 4960...)
AND (((products.name ILIKE '%BEER%')))
ORDER BY order.id desc

There is a more elegant solution to the same problem in the form of a pull request that never got accepted into the official RailsAdmin repo. Use it at your own risk and watch the performance closely.

PostgreSQL Autovacuum Issues In EOL Postgres


We recently had a web application shut down and start throwing PostgreSQL errors such as the following:

ERROR:  database is shut down to avoid wraparound data loss in database "postgres"
HINT:  Stop the postmaster and use a standalone backend to vacuum database "postgres"

This is of course the dreaded error message that occurs when you get close to the transaction wraparound point, and PostgreSQL refuses to continue to run in server mode.

This is a situation which occurs when vacuuming is not run regularly (or at all, which is surprising considering that autovacuum has been enabled by default since PostgreSQL 8.3), so it is rare to see in actual usage.

The particular installation we were looking at was an older one, running PostgreSQL 8.1, which had been included as the base PostgreSQL version with RHEL 5.  (To stave off the inevitable comments: yes, upgrading is a good idea, considering 8.1 has been End-Of-Life'd for years now.  This isn't the point of this article.)

After running postgres in single-user mode and running VACUUM FULL on all of the databases, I started the cluster back up and investigated why we had run into the wraparound issue.

Using psql, I verified that the autovacuum setting was off (the immediate source of the wraparound issue). However, when I went to enable it in the postgresql.conf file, I saw that the file already contained "autovacuum = on". This contradicted my expectations: based on the pg_settings view, this setting was coming from the config file, and this was the only instance of the directive in the file, so it was clearly not being overwritten.

Turning to the documentation for the autovacuum setting, it turns out that in addition to autovacuum being enabled, you also need stats_start_collector (which was already enabled) and stats_row_level (which was disabled here). After ensuring these were both enabled, I restarted the cluster and verified that the autovacuum setting had the expected value.
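
For reference, the relevant postgresql.conf lines on this 8.1 installation ended up looking roughly like this (on 8.3 and later, the two stats settings were replaced by track_counts and are no longer needed):

autovacuum = on
stats_start_collector = on    # required by autovacuum on 8.1
stats_row_level = on          # required by autovacuum on 8.1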

Just hoping to save someone some time if they have to deal with an older version of PostgreSQL and run into this same issue.


Speedier column additions in PostgreSQL

Say you want to add a column to a large table, e.g.,

ALTER TABLE transactions ADD COLUMN email_sent BOOLEAN DEFAULT FALSE;

You really do want new rows to start out with the column set to false (if no value is supplied when the row is created). However, you also want all existing rows to be true, so you innocently run:

UPDATE transactions SET email_sent = TRUE;

This is a great time for a coffee break, or a trip to the post office, or maybe (if you're a telecommuter like me), a stroll out to weed the garden. Unfortunately for all those side activities, you really didn't have to take the long way.

BEGIN;
ALTER TABLE transactions ADD COLUMN email_sent BOOLEAN DEFAULT TRUE;
ALTER TABLE transactions ALTER COLUMN email_sent SET DEFAULT FALSE;
COMMIT;

This is a lot faster: the column is created with the appropriate value for all existing rows, then the default for new rows is switched, and it all happens inside a transaction so you know it gets done atomically.
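
If you want to convince yourself it worked, a quick check (a sketch against the example transactions table above) shows that existing rows carry TRUE while the default for newly inserted rows is FALSE:

SELECT email_sent, count(*) FROM transactions GROUP BY email_sent;

SELECT column_default
  FROM information_schema.columns
 WHERE table_name = 'transactions' AND column_name = 'email_sent';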

Kamelopard update -- Panoramic camera simulation, and splines have returned

A few days ago I pushed Kamelopard version 0.0.12 to RubyGems. This version includes a couple big items. The first of these is a new implementation of the spline code that was removed a while ago. As I mentioned in a previous blog post, this original implementation was built in anticipation of an API that never materialized. The new version is built on the same API discussed in the previous post I mentioned, modified to support multidimensional functions. More information about these splines is available on the Google Code wiki for Liquid Galaxy.

The other big new feature is the ability to simulate multiple-camera panoramic images, which we're calling "multicam". It has its own Google Code wiki page as well, but I wanted to describe it in greater detail, because there's a fair bit of 3D geometry involved that seemed blog-worthy. First, though, it's important to understand the goal. In a Liquid Galaxy, each instance of Google Earth displays the view from one virtual "camera". One display's camera points exactly where you tell it to point; the others point to one side or the other, based on a few settings in a Google Earth configuration file. When placed side-by-side in the right order, these displays form a single panoramic image. Google Earth itself figures out where these displays' cameras should point, but for some applications, we wanted to be able to calculate those display angles and position the cameras on our own. For instance, it would sometimes be nice to pre-record a particular tour and play it back on a Liquid Galaxy as a simple video. For a seven-screen galaxy, we'll need seven different video files, each with the same movements to and from the same geographic locations, but each with a slightly different camera orientation.

Camera orientation is controlled by KML AbstractView elements, of which there are two varieties: Camera, and LookAt. The former tells the camera its position and orientation exactly; the latter describes it in terms of another point. For now, multicam only supports Camera objects, because LookAt is somewhat more complicated.

A Camera object gives the camera's latitude, longitude, and altitude, and three angles called heading, tilt, and roll. The camera initially points straight down, positioned so that north is up. When orienting the camera, Google Earth first rotates around the Z axis, which extends through the lens of the camera, by the heading amount. Then it rotates around the X axis, which extends through the sides of the camera, by the amount given by the tilt angle. Finally, it rotates around the Z axis again, by the roll amount. The Y axis always points in whatever direction the camera thinks is "up".

With that background, we're ready to tackle the problem at hand. We'll consider our cameras as shown in the drawing. Camera 0 is the master view, positively numbered cameras go off to the master camera's right, and negatively numbered cameras to the left. Each camera has the same view angle. We want an API that will accept the master camera's orientation, the number of the camera we're interested in, and the view angle of each camera, and return the orientation for the camera number we gave. We'll consider each camera's orientation in terms of two vectors; the first will tell us what direction the camera is pointing, and the second, the up direction. I'll call these the "camera" vector and the "up" vector. The first step will be to find these two vectors for the camera we want, and the second will be to translate that back into a Camera object.

Getting the position of these vectors for any given camera is a simple application of rotation matrices. In the Kamelopard code, there's a function defined for rotating around each of the three axes. Here's the example for the X axis; the others are all similar:

def self.rot_x(a)
   # Convert the angle to radians
   a = a * Math::PI / 180.0
   return Matrix[[1, 0, 0], [0, Math.cos(a), -1 * Math.sin(a)], [0, Math.sin(a), Math.cos(a)]]
end

So we start with a camera pointed at the ground, and we multiply it by a rotation matrix, so that it points in the direction our camera would, before the heading, tilt, and roll rotations are applied. When we perform those rotations, we'll end up with a vector pointed in the right direction for our camera. This code does just that.

# The camera vector is [0,0,1] rotated around the Y axis the amount
# of the camera angle
camera = rot_y(cam_angle) * Vector[0,0,1]

# The up vector is the same for all cameras
up = Vector[0,1,0]
matrix = rot_z(heading) * rot_x(tilt) * rot_z(roll)
(h, t, r) = vector_to_camera(matrix * camera, matrix * up)

The last line of this snippet calculates the camera and up vectors, and passes them to the vector_to_camera() function, which takes care of the second step in the process: converting these vectors back into a usable heading, tilt, and roll. Two operations fundamental to linear algebra become important here. Both take two vectors as input. The dot product returns the product of the two vectors' magnitudes and the cosine of the angle between them; we'll use it here to find the angle between two vectors. The cross product returns a normal vector, that is, a vector perpendicular to both input vectors.
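
As a quick illustration (this is not Kamelopard's actual code), recovering the angle between two vectors from the dot product looks like this in Ruby:

require 'matrix'

# The dot product equals |a| * |b| * cos(angle), so dividing by the two
# magnitudes and taking the arccosine gives the angle back, here in degrees.
def angle_between(a, b)
  Math.acos(a.inner_product(b) / (a.r * b.r)) * 180.0 / Math::PI
end

angle_between(Vector[0, 0, 1], Vector[0, 1, 1])  # => ~45.0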

First we want to calculate the heading, which we can find by calculating the angle between two planes. The first plane is defined by the camera vector and the Z axis, and the second is the plane of the Y and Z axes. To find the angle between two planes, we find the angle between their normal vectors. The normal vector of the YZ plane is simply the X axis; the normal vector for the first plane is the cross product of the camera vector and the Z axis. The dot product lets us find the angle between these two vectors, which is our heading.

Tilt is simply the angle between the camera vector and the original Z axis, but roll is a bit harder. To find it, we transform the original Y axis -- the original "up" vector -- using the heading and tilt calculated previously. We then find the angle between it and the current "up" vector, again using the dot product.

These calculations underlie a simple API, which takes a view for the original camera, the number of the camera we're interested in, and either the camera angle or the total number of cameras. Like this, for instance:

view_hash = {:latitude => otp[:latitude], :longitude => lo,
    :altitudeMode => otp[:altitudeMode], :altitude => otp[:altitude],
    :roll => otp[:roll], :timestamp => Kamelopard::TimeStamp.new('1921-07-29'),
    :heading => otp[:heading], :tilt => otp[:tilt]}
v = Kamelopard::Multicam.get_camera_view(make_view_from(view_hash), camera, nil, CamCount)

VIM - Tabs and Splits

Vim is my go-to code editor these days. After many years of using different editors, I've settled on vim as my editor of choice. There are some things I've done to make using vim more enjoyable and usable, and this blog post is dedicated to some of those things that I use every day.

Tabs

I love using tabs in vim. I know there are some who are opposed to tabs, but I find them invaluable. The default shortcuts for manipulating tabs are a little cumbersome, which I believe deters many users. Here are some modifications that I added to my vimrc.

nmap <silent> tt :tabnew<CR>
nmap <silent> [g :tabprevious<CR>
nmap <silent> ]g :tabnext<CR>
nmap <silent> [G :tabrewind<CR>
nmap <silent> ]G :tablast<CR>

First, I'm using nmap here, which maps these keys only in normal mode. Next, I use <silent>, which keeps my editor clean of any distractions while performing the task. I find that double-tap shortcuts (see more below) like tt work really well for normal mode, and I love their simplicity: double-tap t and you have a new tab. The bracket navigation is something that I've stolen from Tim Pope's vim-unimpaired plugin. g and G work for me, but you can use whatever you like.

Splits

I believe most vim users use splits often. At first, I found that splitting my current buffer was also cumbersome using the default method in vim (:split and :vsplit). So utilizing the double-tap method I used for tabs, I created my own shortcuts:

nmap <silent> vv :vsp<CR>
nmap <silent> ss :sp<CR>

Once you have splits, navigating between them can be a little bit of a pain as well. Here's my optimization:

map <C-h> <C-w>h
map <C-j> <C-w>j
map <C-k> <C-w>k
map <C-l> <C-w>l

After using vim's splitting capability for a while, I noticed I didn't always like where the split occurred. Sometimes above, sometimes to the left. To ensure that you have consistency, try these settings:

set splitright " When splitting vertically, split to the right
set splitbelow " When splitting horizontally, split below

Using splits and tabs gives you a lot of flexibility in managing the code you are currently working on. Adding just a few shortcuts and optimizations to your vimrc can really speed up your workflow and help you navigate your code more quickly.

One more thing

Want a quick way to get to your vimrc?

:e $MYVIMRC

SFTP virtual users with ProFTPD and Rails: Part 2

In Part 1 of "SFTP virtual users with ProFTPD and Rails", I introduced ProFTPD's virtual users and presented my annotated proftpd.conf that I used to integrate virtual users with a Rails application. Here in Part 2, I'll show how we generate virtual user credentials, how we display them to the user, as well as our SftpUser ActiveRecord model that does the heavy lifting.

Let's start at the top with the SFTP credentials UI. Our app's main workflow actually has users doing most of their uploads through a sweet Plupload widget. So, by default, the SFTP functionality is hidden behind a simple button sitting to the right of the Plupload widget:

The user can click that button to open the SFTP UI, or the Plupload widget will open it automatically if the user tries to upload a file through the widget that is too big. Either way, a jQuery UI function slides the SFTP UI open. Before the jQuery call is made, an Ajax request asks for a new SFTP virtual user username and password. When that request returns, we populate those two textboxes; at that point, the virtual user exists as a new row in the sftp_users database table, and the user can log in with those credentials and upload their files.

Let's go look at our shiny new virtual user in the sftp_users table:

-[ RECORD 1 ]-+-----------------------------------
id            | 1
user_id       | 3
username      | user13A84C76
passwd        | {sha1}5kT0WDb/5H6C8M92dTeiKQO0Kg0=
uid           |
sftp_group_id |
homedir       | /home/sftp/uploads/3/user13A84C76
shell         |
created_at    | 2013-07-08 17:24:24.780753

The id and user_id fields are ignored by ProFTPD (they're just standard primary/foreign key integers.) The username can be anything, so we just use "user" plus a random-enough hex string. The way we configured ProFTPD in Part 1 means our virtual users have absolutely nothing to do with normal system users, so the uid, sftp_group_id and shell fields can be empty.

The homedir has three components: "/home/sftp/uploads" must exist and be writable by the effective UID that is running your Rails app. The directory named "3" is the value from the user_id field. (Adding that additional level lets us associate these virtual users with actual web app users should we need to debug, etc.) Lastly, each *virtual* user gets its own directory underneath that. In this example, ProFTPD will create the "3" and "3/user13A84C76" directories when that virtual user logs in. The format of the value in the passwd field is highly dependent upon your proftpd.conf settings for SQLAuthTypes. The format shown here matches the proftpd.conf shown in Part 1 of this article, if you're a copying and pasting type. The created_at field is ignored by ProFTPD, but it's handy info to have.

Let's look at the relevant methods implemented by our SftpUser ActiveRecord model.

  def hash_password(password)
    "{sha1}" + Base64.strict_encode64(Digest::SHA1.digest(password))
  end

  def generate_username
    'user' + SecureRandom.hex(4).upcase
  end

  def generate_password
    SecureRandom.hex(8).downcase
  end

Admittedly, there's no deep wizardry going on there. The critical piece is that the password gets hashed in the format that ProFTPD expects. Our hash_password method handles that for us here. (You can do whatever you want for the username and the raw password.) As a caveat, upgrading from Ruby 1.8.7 to 1.9.3 required replacing the Base64.encode64 call with Base64.strict_encode64.

Ok, that's the end of Part 2. We now have a new virtual user with credentials that can be used to login via SFTP. We've displayed those credentials to the user using jQuery UI and Ajax. Each virtual user gets its own chrooted home directory that is created only if and when they log in. The next steps in the workflow will be discussed in Part 3: How the user signals they are done uploading via SFTP, using Resque to make the Rails app process the files uploaded by the user, preventing upload overlap and re-entrant problems, and cleaning up after a virtual user's files are successfully processed offline by the Rails app.

Challenges in testing Dancer 2.0 apps

I've been dabbling in Dancer, and I managed to put together a moderately complex re-hosting of a web application with Dancer 1.0, including a preliminary package of unit tests via Test::More.

Spoiler alert: I don't yet have a solution, but I thought maybe this blog post would organize my thoughts to where someone else might peek in and spot my problem.

A bit of introduction follows for those who may not have been down this path before.

Testing a web application can take one of two general approaches:

  1. Test the underlying code by calling it directly, one program to another, with no web server involved. This is pretty straightforward, although you may have to rig up some replacements for the environment (such as if your code expects CGI parameters, or reacts to things in the web server environment such as cookies, remote IP address, etc.). In any case, you have to recognize that you are now testing the code logic, not the actual interaction of your code as a subsystem in the web server.
  2. Test the web application in its "native environment", by issuing requests to its associated web server and examining the responses (as web pages, JSON, what-have-you). This is much preferred, as it will catch all sorts of subtle bugs that the first approach misses when you haven't exactly reproduced the environment (for instance, "correctly" spelling HTTP_REFERRER in your rigged-up environment, whereas the misspelled HTTP_REFERER is actually in use).

The module Dancer2::Test provides a small set of routines for supporting this second approach. For instance, you can verify that your routes are set up correctly via "route_exists":

route_exists [ GET => '/index.html' ], "We can get the home page";

This behaves like the various test-helper routines in Test::More. If the test succeeds (in this case, a route is defined that matches the path), great; if not, an error message is reported using the string parameter.

You can also verify that the correct page has been delivered without errors:

my $response = dancer_response GET => '/something';
is $response->{status}, 200, '/something returns OK';
like $response->{content}, qr{Something from the /something page};

dancer_response is the preferred way to go about this, because it captures status, headers, and content in one operation, saving you time but also avoiding the potential for disturbing the state of your application by subsequent requests.

Now, this article's title starts with "Challenge", so I should let the other shoe drop. The Dancer2::Test setup doesn't seem to have a way to preserve "state" on the server between requests. So if your application supports, for instance, a "login" page and a "modify my account" page, you don't really have a way to test the latter page, as you can't remain logged in.

Use Rubygems.org To Find Github Source For Gems

If you want to look at the source for a gem on GitHub, make sure to go to Rubygems.org first and find the gem's GitHub project through rubygems' page for that particular gem. The reason for this is that there are a lot of forks on GitHub, and you may end up finding the source of a fork that is dead. Rubygems.org is guaranteed to have the right path to the gem you are installing via "gem install mynewgem".

Most of the profile pages for a gem on Rubygems.org have a link for the "home page" and "source code" of a gem, and these usually link to the GitHub page of the gem's source. This trick isn't 100% reliable, as not every gem's source is located on GitHub, but it works about 90% of the time. In the case where you can't find a gem's source through Rubygems.org, try "gem unpack <mynewgem>", which will give you the source locally.

Proxmox and the fun maze of IPv6 connectivity

While working on a Proxmox machine setup, and specifically on its IPv6 connectivity, I found a problem: after a reboot I always kept getting net.ipv6.conf.all.forwarding and all the related variables set to 0, which caused lots of IPv6 network connectivity issues on the guests.

While brainstorming with a colleague on this, we discovered in the boot logs these messages, which are quite indicative of something going wrong at boot:

# less /var/log/boot.0
[..]
Mon Jul  8 18:38:59 2013: Setting kernel variables ...sysctl: cannot stat /proc/sys/net/ipv6/conf/all/forwarding: No such file or directory
Mon Jul  8 18:38:59 2013: sysctl: cannot stat /proc/sys/net/ipv6/conf/default/forwarding: No such file or directory
Mon Jul  8 18:38:59 2013: sysctl: cannot stat /proc/sys/net/ipv6/conf/default/autoconf: No such file or directory
Mon Jul  8 18:38:59 2013: sysctl: cannot stat /proc/sys/net/ipv6/conf/default/accept_dad: No such file or directory
Mon Jul  8 18:38:59 2013: sysctl: cannot stat /proc/sys/net/ipv6/conf/default/accept_ra: No such file or directory
Mon Jul  8 18:38:59 2013: sysctl: cannot stat /proc/sys/net/ipv6/conf/default/accept_ra_defrtr: No such file or directory
Mon Jul  8 18:38:59 2013: sysctl: cannot stat /proc/sys/net/ipv6/conf/default/accept_ra_rtr_pref: No such file or directory
Mon Jul  8 18:38:59 2013: sysctl: cannot stat /proc/sys/net/ipv6/conf/default/accept_ra_pinfo: No such file or directory
Mon Jul  8 18:38:59 2013: sysctl: cannot stat /proc/sys/net/ipv6/conf/default/accept_source_route: No such file or directory
Mon Jul  8 18:38:59 2013: sysctl: cannot stat /proc/sys/net/ipv6/conf/default/accept_redirects: No such file or directory
Mon Jul  8 18:38:59 2013: sysctl: cannot stat /proc/sys/net/ipv6/conf/all/autoconf: No such file or directory
Mon Jul  8 18:38:59 2013: sysctl: cannot stat /proc/sys/net/ipv6/conf/all/accept_dad: No such file or directory
Mon Jul  8 18:38:59 2013: sysctl: cannot stat /proc/sys/net/ipv6/conf/all/accept_ra: No such file or directory
Mon Jul  8 18:38:59 2013: sysctl: cannot stat /proc/sys/net/ipv6/conf/all/accept_ra_defrtr: No such file or directory
Mon Jul  8 18:38:59 2013: sysctl: cannot stat /proc/sys/net/ipv6/conf/all/accept_ra_rtr_pref: No such file or directory
Mon Jul  8 18:38:59 2013: sysctl: cannot stat /proc/sys/net/ipv6/conf/all/accept_ra_pinfo: No such file or directory
Mon Jul  8 18:38:59 2013: sysctl: cannot stat /proc/sys/net/ipv6/conf/all/accept_source_route: No such file or directory
Mon Jul  8 18:38:59 2013: sysctl: cannot stat /proc/sys/net/ipv6/conf/all/accept_redirects: No such file or directory
Mon Jul  8 18:38:59 2013: done.
[..]

The next step would be either to crawl through the "inextricable" maze of the Proxmox (PVE) boot initrd image and probably come up with a fix there, or to find a quick way to deal with this cleanly without touching the boot process.

Since it was all due to sysctl being called too early in the boot process, before the IPv6 module was loaded, calling it again later would suffice. So I simply added the following line to /etc/network/interfaces:

iface eth0 inet6 static
address    YOUR:IPV6:IS:HERE
netmask    64
up ip -6 route add default via fe80::1 dev eth0
up sysctl -p # <------ ADDED THIS LINE TO FIX IPv6 CONNECTIVITY ISSUES

And there it goes. Reboot once again to verify and you should be all set.

Automating checking for new versions of PostgreSQL


Hubble image via Flickr user castielstar

It is important to run the latest revision of the major branch of Postgres you are using. While the pgsql-announce mailing list is often touted as a canonical way to become aware of new releases, a better method is to use the check_postgres program, which will let you know if the version you are running needs to be upgraded or not. An example usage:


$ check_postgres.pl --action=new_version_pg --dbhost=iroh
POSTGRES_NEW_VERSION_PG CRITICAL:  Please upgrade to version 9.2.4 of Postgres. You are running 9.2.2

Postgres version numbers come in three sections. The first two indicate the "major version" of Postgres that you are running. The third number indicates the revision. You always want to be using the highest revision available for your version. In the example above, Postgres version 9.2 is being used, and it should be upgraded from revision 2 to revision 4. A change in the revision number is known as a minor release; these are only done for important reasons, such as security or important bug fixes. Read the versioning policy page for more information.

When a new version of PostgreSQL is made, there are two general ways of communicating this fact: the pgsql-announce mailing list, and the versions.rss file on the postgresql.org web site. While the mailing list is low volume, it is not ideal as it contains posts about conferences, and about other Postgres-related software. A better solution is to track the versions.rss file. You could simply subscribe to it, but this will only tell you when the file has been changed. The check_postgres program parses this file and compares the latest revision to the version of Postgres that you are using.

To use it, simply call check_postgres and pass new_version_pg as the action argument, as well as telling check_postgres which PostgreSQL instance you are checking. For example, to check that the Postgres running on your internal host "zuko" is up to date, just run:

check_postgres.pl --action=new_version_pg --dbhost=zuko

Here is what the default output looks like, for both a matching and a non-matching version:

$ check_postgres.pl --action=new_version_pg --dbhost=appa
POSTGRES_NEW_VERSION_PG CRITICAL:  Please upgrade to version 9.2.4 of Postgres. You are running 9.2.2

$ check_postgres.pl --action=new_version_pg --dbhost=toph
POSTGRES_NEW_VERSION_PG OK:  Version 9.2.4 is the latest for Postgres

Those examples are very Nagios specific, of course, as evidenced by those uppercase strings at the beginning of the output. If you are using Nagios, it's a good idea to run this, perhaps once a day or more often. If you are not using Nagios, you can make the output a little cleaner with the --simple argument:

$ check_postgres.pl --action=new_version_pg --dbhost=azula
Please upgrade to version 9.2.4 of Postgres. You are running 9.2.2

$ check_postgres.pl --action=new_version_pg --dbhost=sokka
Version 9.2.4 is the latest for Postgres

One quick and simple trick is to make this into a cron job and add the --quiet argument, which prevents all output if the check was "OK". In this way, cron will only send output (e.g. mail someone) when a new revision has been released. A cron example:

## Twice a day, check that we are running the latest Postgres:
0 7,18 * * * check_postgres.pl --action=new_version_pg --dbhost=cabbage --quiet

Once this alarm goes off, you should strive to update your clusters as soon as possible. If you are using a packaging system, then it may be as simple as relying on it to do the right thing, e.g. "yum update postgresql". If you are installing from source, you will need the new tarball and can simply "make install" onto the existing Postgres, and then restart it. Always check the release notes for the new revision - once in a blue moon the update requires some other action, such as a reindex of certain types of indexes.
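
For a source install, a minor-version upgrade boils down to something like this (a sketch; use the same configure options and installation prefix as your original build, and adjust the paths for your system):

tar xzf postgresql-9.2.4.tar.gz
cd postgresql-9.2.4
./configure --prefix=/usr/local/pgsql
make
make install
pg_ctl -D /usr/local/pgsql/data restart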

The check_postgres program can verify versions of some other programs as well, including Bucardo, tail_n_mail, and check_postgres itself.

CouchDB pagination with couchdb-python

I've been working with couchdb-python v0.8 to meet some of a project's CouchDB needs and ran into an unfortunate shortcoming.

Retrieving database rows is as easy as:

for row in db.view(mapping_function):
    print row.key

however, all the rows will be loaded into memory. For small databases this is not a problem, but lucky me, I'm dealing with a 2.5 million+ document database. Long story short: kaboom!

There is an excellent blog post by Marcus Brinkmann that details the memory issue and also provides a pagination solution. However, couchdb-python v0.9 (released 2013-04-25) does provide its own solution: the iterview method. If you've been able to manage small databases with v0.8 but are anticipating larger and larger databases, be sure to upgrade!
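
With 0.9, the loop above can be rewritten to use iterview, which pages through the view in batches instead of loading everything into memory at once (a sketch; the server URL, database name, and view name are hypothetical):

import couchdb

couch = couchdb.Server('http://localhost:5984/')
db = couch['big_database']

# Fetch 1000 rows per request; only one batch is held in memory at a time.
for row in db.iterview('myapp/by_key', batch=1000):
    print row.key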

WAVE: Evaluating Web Accessibility

It's been far too long since I wrote a blog article! I've been a bit preoccupied:


My [cute] preoccupation.

I thought I'd start with a short blog article to get back into it. Recently, the H2O project has been keeping me busy. One interesting resource I've been using for the project is WAVE. WAVE is a free web accessibility evaluation tool, which in our case helps us evaluate the H2O web application for its use in an academic setting by users with disabilities. The tool helps to evaluate accessibility against Section 508, WCAG, and more.

To use the tool, you simply visit http://wave.webaim.org/ and type in the URL that you want evaluated:


WAVE homepage

The results show warnings and alerts, as well as feedback regarding other structural markup, all overlaid on the page. The results are meant to serve as a resource for making decisions regarding accessibility. Some web application developers may be more inclined to follow the guidelines than others.


Example WAVE output, analysis of End Point's home page.