Welcome to End Point’s blog

Ongoing observations by End Point people

Rails transposing day and month after upgrading Ruby 1.8.7

If you're wondering why your month and day are being transposed when saved in your database, you're likely:

  • Using a text field input for dates (likely with some JavaScript date picker)
  • Using American style date formatting (mm/dd/YYYY)
  • Upgrading from Ruby 1.8.7

If you meet these criteria, you'll find that American style dates get parsed incorrectly in Ruby 1.9+ because of Ruby 1.9.x's new date parsing strategy. Unbelievably, this change effectively does away with American style date parsing in Ruby 1.9.x, and Rails has happily followed suit!

american_date to the rescue!

After trying and failing to restore American style date parsing using timeliness, delocalize, and i18n_alchemy, I found american_date. If you look at the implementation, it is straightforward, and it restores backwards compatibility with a single line in your Gemfile. Enjoy the return to sanity!
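To see the change in behavior for yourself, here's a quick check (a minimal sketch; the date value is arbitrary):

```ruby
require 'date'

# Under Ruby 1.9+, Date.parse treats "xx/xx/yyyy" as dd/mm/yyyy:
d = Date.parse('07/04/2013')
d.month  # => 4 -- April, not July!
d.day    # => 7

# With gem 'american_date' in your Gemfile, the same string
# parses as July 4th again.
```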

With Ruby 1.8.7 going EOL this month, and Rails 2.3.x and older losing support for even severe security issues, it's time to bust out your upgrade-foo for those old Rails apps. Of course, this is a tried and true topic with many resources, most notably Railscast #225. Good luck with your upgrades!

Custom 500 error page while using nginx proxying

I was working with our customer Paper Source to set up a 500 error page that looked like the rest of the site when I ran into something interesting. I went through the nginx configuration and added this line to allow for a custom 500 error page, just like I had done for the custom 404 error page.

error_page   500  =  /cgi-bin/paper/500.html;

What I noticed when I forced the site to create an Internal Server Error was that I was still getting the ugly normal Apache version of the 500 error page. It seemed like nginx was ignoring the error_page directive. I did some searching and found out that you have to use the proxy_intercept_errors directive.

proxy_intercept_errors on;

This directive allows nginx to recognize the 500 error code being returned from Apache and run its own directives to display the right page.
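Putting the two directives together, the relevant part of the nginx configuration looked something like this (the 404 path is illustrative, following the same convention as the 500 page):

```nginx
# Intercept error responses coming back from the proxied Apache backend
proxy_intercept_errors on;

# Serve our custom pages instead of the backend's defaults
error_page 404 /cgi-bin/paper/404.html;
error_page 500 = /cgi-bin/paper/500.html;
```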

Debugging obscure Postgres problems with strace

One of the nice things about being a Postgres consultant is the sheer variety of interesting problems you get to solve. Here's one that recently popped up, and a walkthrough of how I solved it. One of our clients had this strange error pop up when they were trying to start Postgres:

FATAL:  too many private dirs demanded

This is a very rare and esoteric error. A peek at the source code showed that this error only appears in src/backend/storage/file/fd.c, like so:

DIR *
AllocateDir(const char *dirname)
{
    DIR        *dir;

    DO_DB(elog(LOG, "AllocateDir: Allocated %d (%s)",
               numAllocatedDescs, dirname));

    /*
     * The test against MAX_ALLOCATED_DESCS prevents us from overflowing
     * allocatedDescs[]; the test against max_safe_fds prevents AllocateDir
     * from hogging every one of the available FDs, which'd lead to infinite
     * looping.
     */
    if (numAllocatedDescs >= MAX_ALLOCATED_DESCS ||
        numAllocatedDescs >= max_safe_fds - 1)
        elog(ERROR, "too many private dirs demanded");

So it appeared as if we ran into some sort of safety valve that was meant to bail out before too many directories were opened. A strange error to suddenly have appear (this client's Postgres worked just fine a few days ago - luckily, this was not on a production system!).

The client was using Postgres 8.3. In version 9.1, the source code was changed to give a much less mysterious message, including outputting the name of the troublesome directory in question as a debugging clue:

    if (numAllocatedDescs >= MAX_ALLOCATED_DESCS ||
        numAllocatedDescs >= max_safe_fds - 1)
        elog(ERROR, "exceeded MAX_ALLOCATED_DESCS while trying to open directory \"%s\"",
             dirname);

However, I had no such clue. What to do? This was definitely a job for the strace program. Its job is to show a trace of all system calls that a process is making. In this way, you can see at a very low level what a particular program is doing. In this case, the program was PostgreSQL, or to be precise, the "postmaster" program.

While it's possible to have strace attach to an already running process, that was not possible in this case as Postgres errored out immediately after being invoked. The invocation looked like this:

pg_ctl start -D /var/lib/pgsql/data -l /tmp/postgres.startup.log

To run strace, we can simply add "strace" to the start of the command above. However, this will dump the system calls to the screen for the pg_ctl command. We need a few flags to make things easier.

The first flag is "-f", which tells strace to follow forked processes. Without this, we would simply strace pg_ctl itself - and we need to strace the postmaster process instead. As we want to be able to look at the output in an editor, we also add the "-o" flag to send all strace output to an output file. We also take the opportunity to upgrade "-f" to "-ff", which tells strace to send each forked process to a separate file. Very handy, that. Finally, we add a "-t" flag, which prepends each line with a timestamp. Not strictly needed in this case, but it's a nice flag to always use. The final command looked like this:

strace -o gregtest -ff -t pg_ctl start -D /var/lib/pgsql/data -l /tmp/postgres.startup.log

When I actually ran the command, pg_ctl did *not* return as before. Examining the created files showed me why:

-rw-r--r--  1 postgres postgres     8383 Jun  7 13:44 gregtest.26664
-rw-r--r--  1 postgres postgres     9372 Jun  7 13:44 gregtest.26668
-rw-r--r--  1 postgres postgres     3191 Jun  7 13:44 gregtest.26667
-rw-r--r--  1 postgres postgres     3053 Jun  7 13:44 gregtest.26669
-rw-r--r--  1 postgres postgres    10628 Jun  7 13:44 gregtest.26666
-rw-r--r--  1 postgres postgres 17311855 Jun  7 13:44 gregtest.26670

The first PID (26664) is pg_ctl itself, and the rest are the various postgres processes that are created. What we need is the main postmaster one, which is obvious in this case: not only is it the highest PID, but the file size is very large (and growing fast). A peek inside the data directory confirmed that the main postmaster PID is indeed 26670.

Why did pg_ctl not return, and not give the expected error? The reason is that the strace process adds enough overhead that it takes a lot longer to reach the "too many private dirs demanded" error. As I suspected this error was related to entering an infinite loop of file openings, that delay makes sense.

Taking a peek at the tail end of the giant strace output gives us the solution right away. Repeating over and over were calls like this:

19:37:38 open("/usr/share/zoneinfo/zoneinfo/zoneinfo/zoneinfo", 

The actual call was 16 directories deep; I'm just showing three levels for brevity! So the problem was definitely with /usr/share/zoneinfo. A look at the file system showed that /usr/share/zoneinfo was a directory which contained a symlink named "zoneinfo" inside of it. Where did that symlink point to?

lrwxrwxrwx  1 root root    19 Jun  3 05:45 zoneinfo -> /usr/share/zoneinfo

Whoops! That explains the problem. Postgres was trying to walk through all the subdirectories of /usr/share/zoneinfo. Unfortunately, this meant an infinite loop, as the zoneinfo symlink kept leading back to the same directory.
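You can reproduce and detect this kind of symlink loop in a scratch directory; GNU find's -L option follows symlinks and reports the cycle instead of recursing forever (the paths here are made up for the demo):

```shell
mkdir -p /tmp/zdemo/zoneinfo
ln -sf /tmp/zdemo/zoneinfo /tmp/zdemo/zoneinfo/zoneinfo

# find -L follows the symlink and warns about the file system loop
find -L /tmp/zdemo 2>&1 | grep -i loop
```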

So the solution was to simply remove that faulty symlink. Once that was done, Postgres started without a hitch. Once again, the invaluable strace saves the day.

Spree's New Release Policy

Spree has recently updated its documentation regarding contributions and has included (maybe for the first time?) an official Release Policy. This is an important step forward for the Spree community so that developers can communicate to clients the potential costs and benefits when upgrading Spree, understand how well Spree supports older releases, and gauge the overall speed of the "upgrade treadmill".

Deprecation Warnings

Deprecation warnings are to be added in patch releases (i.e. 2.0.2) and the code being deprecated will only be removed in minor versions. For example, if a deprecation warning is added in 2.0.2, 2.0.3 will still contain the same deprecation warning but in 2.1.0 the deprecation warning and the code will be gone.
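The pattern looks roughly like this in plain Ruby (a hypothetical sketch; these method names are not Spree's actual API):

```ruby
class Order
  def item_total
    42
  end

  # Deprecated in a patch release (e.g. 2.0.2); both the warning and the
  # method itself would be removed in the next minor release (2.1.0).
  def line_item_total
    warn "[DEPRECATION] `line_item_total` is deprecated; use `item_total` instead."
    item_total
  end
end
```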

Deprecation warnings are very helpful for developers, but without a robust test suite exercising your application, it's easy for deprecation warnings to go unnoticed. A strong test suite coupled with deprecation warnings helps you manage your client's expectations about how upgrades can affect your Spree customizations and extensions.

Master Branch Receives All Patches

Master branch receives all patches, including new features and breaking API changes (with deprecation warnings, if necessary).

No surprises here; the master branch is for developers and should not be used for production. If you think you've encountered a bug in Spree, make sure to create a failing test against master to make sure it hasn't been resolved by any existing patches. Read more about filing an issue for additional details.

Current Stable Release Branch

One branch "back" from master (currently 2-0-stable) receives patches for all bugs and security issues, as well as modifications to recently added features (for example, split shipments). Breaking API changes should be avoided, but if one is unavoidable, a deprecation warning MUST be provided before the change takes place.

In keeping with Spree's history of very aggressive refactoring, breaking API changes are still permitted in the current stable branch. If you're looking for a truly stable release of Spree, you'll need to look two stable branches behind master.

Two Releases Behind Master

The branch two "back" from master (currently 1-3-stable) receives patches for major and minor issues, and for security problems. Absolutely no new features, tweaks, or API changes.

In my opinion, this is the only branch that should be considered for use in production. With the API locked down and a greater chance of most bugs worked out while it was the current release branch, the "two-back" stable branch is a safe bet that's going to have the most up-to-date feature set.

Three and Four Releases Behind Master

The branch three back from master (currently 1-2-stable) receives patches for major issues and security problems. The severity of an issue is determined by the person investigating it. Absolutely no features, tweaks, or API changes. Branches four and more "back" from master (currently 1-1-stable and earlier) receive patches only for security issues, as people are still using these branches and may not be able, or may not wish, to upgrade to the latest version of Spree.

It's nice to see a fairly strong commitment to accepting security patches, although if we look at this in absolute terms, the 1.1.x release was put into security-patches-only mode after just 13 months. Considering that the 1-1-stable branch is 2,127 commits behind the 1-3-stable branch (!!), it's clear that Spree is now formalizing its very aggressive release culture.

Managing the Upgrade Treadmill

As stated previously, a strong test suite is the best tool available for determining which upstream updates affect your Spree customizations. Coupled with deprecation warnings, it becomes a fairly straightforward process to identify breaking changes, create estimates for fixes, and communicate these costs to clients. Following the stated guides for customizing Spree is also recommended. When visiting the Spree guides section on customization, you'll first be taken to the sub-section for authentication. Make sure to expand the Customization menu on your left to see additional guidance for customizing internationalization, views, JavaScript, stylesheet and image assets, business logic, and checkout.

Another important suggestion helpfully submitted by my colleague Mike Farmer was to never simply add gem 'spree' to your Gemfile. You should be very conscious about what version of Spree you want to use, and make sure you specify it in your Gemfile.
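In other words, pin the version explicitly in your Gemfile (the version number here is just an example):

```ruby
# Gemfile
gem 'spree', '2.0.3'      # exact version
# or, at most:
gem 'spree', '~> 2.0.3'   # allow only patch-level updates
```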

Making use of a Unix Pipe

Developing in a Unix-based environment has many wonderful advantages. Thanks to Gary Bernhardt of DestroyAllSoftware Screencasts, I've recently discovered a new use for the Unix pipe. A pipe in Unix does exactly what you might think it would do by its name: send something in one side and watch it come out the other. If you've done much in the shell, you've probably piped output from one command to another. Here's an example:

$ cat foo.txt | grep bar

This command takes the output of cat and sends it to the input of grep. Pipes used in this way can yield very powerful commands in the shell.

There is another kind of pipe in Unix: the named pipe. A named pipe, or FIFO (First In, First Out), works similarly to a command-line pipe. You put stuff in one end and it comes out the other. To create a named pipe, you use the mkfifo command.

$ mkfifo my_fifo
$ ls -l
prw-rw-r--  1 mikefarmer mikefarmer      0 Jun  5 21:22 my_fifo

Notice the "p" at the beginning of the file list. The "p" designates this file as a named pipe to the system. To try out our pipe, we will, in one terminal, listen for anything coming out of the pipe. Then in another terminal we will send some text to the pipe. To listen to the pipe we will be using cat to just display whatever comes into the pipe.

$ cat my_fifo

Notice that when this runs, it blocks while it waits for something to come through the pipe. Now let's push some text through the pipe. In another terminal window, I'll just echo some text to the pipe.

$ echo "hello world" > my_fifo

As soon as I press enter on this command, I see that the other terminal outputs "hello world" and then exits.
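The same two-terminal exchange can be scripted in a single shell with a background reader, which also shows the blocking behavior (the file names are arbitrary):

```shell
mkfifo my_fifo
cat my_fifo > output.txt &    # reader blocks until data arrives
echo "hello world" > my_fifo  # writer unblocks the reader
wait                          # reader exits once the writer closes the pipe
cat output.txt                # prints: hello world
```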

Now that we have a basic understanding of how the pipe works, we can set it up to run some commands. I'm going to throw my listener into a shell script that will create our pipe and start listening to and executing anything that comes out of it.

  if [ ! -p commands ]; then
    mkfifo commands
  fi

  while true; do
    sh -c "clear && $(cat commands)"
  done

The first three lines create the pipe. The last three lines set up the listener. Remember how the process exited after the first command was sent? Putting the cat call into a loop lets us receive commands repeatedly. I also added a clear call to clear my screen between commands. Now to test it out:

$ sh ./setup_listener.sh

In another terminal:

$ echo 'ls -l' > commands

You'll notice that the command clears the screen in the other terminal and displays the output of the command ls -l. Nice!

Now let's put this to some practical use. I have a Rails application with some Minitest tests, and a small script that I've put together to run them all. Here's the content of my test runner:

#!/usr/bin/env ruby
# run_all_tests.rb

$LOAD_PATH.unshift 'test'

files = Dir.glob('test/**/*_test.rb')
files.each { |file| require file.sub(/^test\/|\.rb$/, '') }

The script is a simple Ruby script that adds the test directory to the LOAD_PATH and then requires every file in the test directory ending with "_test.rb". I make this script executable with chmod +x. Then to run it, I just call:

$ ./run_all_tests.rb

Simple. To run individual tests, I just run:

$ minitest test/test_foo.rb

With these two commands in mind, I can now put together everything I need for my pipe. First I'm going to vertically split my screen. (You can use tmux or whatever tool you'd like; I like iTerm's simple split screen for this.) In the terminal on the right, I'm going to start up my listener just like I did above. In the terminal on the left, I'm going to start up vim and bring up my test file. To execute my test, I'll just use vim's :! command to run the test command in the shell, using % as a placeholder for the current file name in my active buffer.

:!echo "minitest %" > commands

If I've done everything right, my test will run on the terminal on the right and then wait for my next command. W00T! Now I'm going to run all my tests:

:!echo "./run_all_tests.rb" > commands

Immediately upon pressing enter, all my tests are running on the right pane! Hooray!

To make things a little easier in vim, I setup some key mappings for running the tests.

:nmap <leader>g :w\|:silent !echo "minitest %" > commands<cr>
:nmap <leader>G :w\|:silent !echo "./run_all_tests.rb" > commands<cr>

Now, I don't necessarily want these shortcuts all the time, so I'm going to add them to a file called setup_test_shortcuts.vim. To activate them I just run :source setup_test_shortcuts.vim. Now I have a simple shortcut in vim for running my tests!

If you are using zeus then you will need to modify your shortcuts to look like this:

nmap <leader>g :w\|:silent !echo "zeus test %" > commands<cr>
nmap <leader>G :w\|:silent !echo "zeus test ./run_all_tests.rb" > commands<cr>

Using Unix to help me in my workflow always brings a big smile to my face as I see the gains in productivity. Simple Unix concepts continue to blow my mind at their practicality and underlying simplicity.

Railsbridge NYC

Last week I attended a Ruby on Rails workshop hosted by Railsbridge NYC. The organization promotes diversity in tech by introducing web development concepts to a community of technology professionals and enthusiasts. It's geared towards women, but open to all.


As the website describes, the workshop consists of an Installfest (about a 2-hour event), followed by a full day of learning the next day. The Installfest is a series of step-by-step instructions to install Ruby, Rails, and the other tools you need for your particular OS. The detailed instructions made the entire process effortless, and most students were able to install all the tools without major issues.

An Introduction to Ruby and Rails

The next day was the actual workshop, beginning promptly at 9am. The skill level of the women in attendance varied; some were programmers proficient in other languages, while others, like myself, were new to programming. After a short welcome presentation, the organizers divided the group into two, with the "more advanced" students heading off to their own room for a slightly more quick-paced session.

I remained in the beginner group, which was significantly larger. The day was divided into two parts: the first focused on Ruby, and the second on Rails. The curriculum was quite dense, and we weren't able to cover all the concepts due to time constraints. However, what I found most valuable was that the organizers and volunteers helped foster a no-judgment atmosphere where all our "beginner" questions were answered, and no one felt lost for too long. The volunteers took the time to clear up any and all doubts, even about topics that were not covered in the curriculum. And although we were unable to get to all the topics, a lot of follow-up resources were shared among the group, including a Meetup called NYC Ruby Women. At the end of the day, each of us even received a certificate of completion!


It was great to meet other newbie Ruby developers who were going through some of the same beginner pains that I was. The workshop was a positive experience, and gave me the push I periodically need to keep myself immersed in technology. If you are interested in learning more about Ruby and Rails, or just meeting fellow technologists active in the supportive Ruby community, consider attending a Railsbridge workshop near you.

Installing PostgreSQL without Root

PostgreSQL can be installed using installers prepared for your operating system. However, that way you depend on the installation settings chosen by the package maintainers, installation requires root privileges, and on some machines programmers are not allowed to have those. What's more, you usually cannot install a PostgreSQL beta version this way.

The only way to install Postgres without root privileges, in your home directory, is to compile it from source. That's not very difficult.

Download Sources

First of all you need to download the sources. I use GitHub for getting the latest sources: there is a Postgres GitHub mirror. I clone that, but you could just download a zip file.

Unpack it somewhere, and you have the Postgres sources you need.

Install Needed Software

For compiling Postgres you will need some libraries and programs. The complete list can be found in the Postgres documentation.

I’m using Ubuntu; the packages I use for compiling Postgres are:

  • gcc – C compiler
  • libreadline6, libreadline6-dev – readline support
  • zlib1g, zlib1g-dev – compression library used internally by Postgres
  • libpython2.7, libpython2.7-dev – for compiling with plPython support

If you are using a different system, or a different system/Postgres version, then your packages/libraries may be named differently.


Now enter the directory where your sources are and run the command below to configure them:

./configure --prefix=$HOME/postgres/ --with-python PYTHON=/usr/bin/python2.7

The --prefix parameter shows the path where Postgres will be installed.

The --with-python parameter enables compiling with plpython support.

PYTHON parameter points to current python binary installation.

The configure command should finish without any errors. If you get errors, most probably you are missing some needed libraries.


If configure succeeded, you can compile the sources. It is simple:

make -j 4

The -j parameter sets the maximum number of compilation jobs to run at the same time.

My computer has 4 cores and I want to use all of them, so compilation time will be much shorter. On my laptop, compiling with 4 cores takes 1 minute 26 seconds; using one core takes almost 4 times longer.


If compilation ended without errors, you can install the database. Installation copies all the files into the directory given by the --prefix parameter. To install, just run:

make install

This should create four directories in ~/postgres/:

* bin
* include
* lib
* share

Create database

I’m going to keep the database in the same directory as the installed files. The data will be in the ~/postgres/data directory. For this I need to use the initdb program: not the one installed at the system level, but the one from the ~/postgres/bin/ directory:

~/postgres/bin/initdb -D ~/postgres/data/

Your output should look like this:

The files belonging to this database system will be owned by user "szymon".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.

creating directory /home/szymon/postgres/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
creating configuration files ... ok
creating template1 database in /home/szymon/postgres/data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok
syncing data to disk ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    /home/szymon/postgres/bin/postgres -D /home/szymon/postgres/data/
or
    /home/szymon/postgres/bin/pg_ctl -D /home/szymon/postgres/data/ -l logfile start

Take a look at the last couple of lines. They show the exact commands needed to run Postgres with this data directory.


Let’s run it. I will use the first command; the log lines will be printed to the console:

/home/szymon/postgres/bin/postgres -D /home/szymon/postgres/data/
LOG:  database system was shut down at 2013-06-05 11:08:10 CEST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started


Connect to the database. One important note: you should use the psql program you’ve just installed, not the system one:

~/postgres/bin/psql --version
psql (PostgreSQL) 9.3beta1

The PostgreSQL installation uses my system username, szymon, as the Postgres admin name. The default database created in Postgres is postgres. To log in to this database I use:

~/postgres/bin/psql -U szymon postgres
psql (9.3beta1)
Type "help" for help.

Let’s also check the database version:

postgres=# select version();
                                                   version
---------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.3beta1 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3, 64-bit
(1 row)

And check the data directory:

postgres=# show data_directory ;
       data_directory
----------------------------
 /home/szymon/postgres/data
(1 row)


All configuration files are stored in the data directory: ~/postgres/data. If you want to run multiple Postgres versions on the same machine, they must have different data directories and different ports. The port number can be changed in ~/postgres/data/postgresql.conf.
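For example, a second installation could be pointed at a different port by editing that file (5433 here is just an example):

```
# ~/postgres/data/postgresql.conf
port = 5433
```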

Deleting Installation

If you want to delete the installation, nothing could be easier: just stop Postgres if it is running and delete the whole ~/postgres/ directory.

If you want to delete all the data, just stop Postgres if it is running and delete the ~/postgres/data directory. Then you can run initdb again to get a brand new database.


I know there are many ways of installing and compiling Postgres in a local directory, and that more advanced tweaks and automation are possible. I also have my own makefile with some default values.

My goal here was to show that it is very easy, and that there is nothing too difficult about it for programmers: no need to ask for root privileges for installation, or even for starting and stopping the database.

This method also works if you need an older Postgres version and there are no packages for your system, you don’t want to mess with existing packages, or the packages require some old libraries you cannot install.

Converting root filesystem from ext3 to ext4 on CentOS and RHEL 5.9

Here's a quick explanation of the procedure to convert the root (/) filesystem on RHEL and CentOS 5.9 from ext3 to ext4, for systems where ext4 wasn't available at install time.

Note that this is not a configuration Red Hat supports, but it works fine. (I believe you cannot convert the /boot filesystem to ext4 on standard RHEL/CentOS 5 because its GRUB can't handle it, but you can convert all the other filesystems.)

Ideally do this only on a fairly freshly-installed system you don't mind losing. Back everything up first unless this is a system you don't mind destroying! This is a risky operation and (ahem) things can go wrong.

You'll need direct console or KVM access to the server. You can do without that if you can remount -o ro / but that usually won't work with sshd or other daemons that keep files open on the / filesystem.

You will of course need to adapt the current kernel version and root filesystem block device path in the examples below.

Now, to live dangerously:

  • yum -y install e4fsprogs
  • Edit /etc/fstab so that the / filesystem is mounted as ext4 (which works with the existing ext3 filesystem as well). If you're using a battery-backed RAID controller you may want to add the nobarrier mount option. See man mount to read about what that choice entails.
  • mkinitrd --with=ext4 --with=ext3 -f /boot/initrd-2.6.18-348.el5.img 2.6.18-348.el5
  • shutdown -r now
  • Boot into single-user mode: use GRUB to edit the linux arguments adding " single" to the end
  • fsck.ext3 -pf /dev/vg0/lv_root
  • tune4fs -O extents,uninit_bg,dir_index /dev/vg0/lv_root
  • fsck.ext4 -yfD /dev/vg0/lv_root
  • shutdown -r now
  • Allow it to boot normally into multi-user mode.
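For reference, the edited /etc/fstab root entry would end up looking something like this (the device path matches the examples above; the options are illustrative):

```
/dev/vg0/lv_root  /  ext4  defaults  1 1
```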

There are lots of articles out there about converting from ext3 to ext4, but none I found that covered RHEL/CentOS 5 specifically and contained all the needed steps and actually worked. This is based on info from here for Debian and here for CentOS.

PostgreSQL Functional Indexes

PostgreSQL has a great feature called “functional indexes”. A normal index just stores the sorted values of some field. That is great for searching, as the values are already sorted.

You can create an index with a simple query like:

CREATE INDEX i_test ON test (i);

It will store all values of column i from table test. This index can be used with a query like:

SELECT * FROM test WHERE i < 100 ORDER BY i;

Functional Indexes

There is also a feature I especially like: an index can store whatever values you want; they don’t need to be values taken directly from the table. You can use values calculated from the table columns. They will be sorted, so searching with such indexes will be pretty fast.

Creating such an index is simple:

CREATE INDEX i_test_lower_i ON test (lower(i));

The main rule is: this index can be used if the query contains the same function call, for example:

SELECT * FROM test WHERE lower(i) = 'aaa';


Let’s check something more complicated. My test table looks like:

CREATE TABLE test(t timestamp);

I filled this table with sample data; we need a reasonably large number of rows:

INSERT INTO test(t) SELECT generate_series(now() - '1 year'::interval, now(), '1 minute');

This way there are over 500,000 rows.

I need to get two row sets from the database. First I will get the rows with dates from the last 10 days. Then I will get all rows with dates from the current year.

The Last 10 Days

I can get the rows with dates from the last 10 days like:

postgres=# explain analyze select t from test where t::date > (now() - '10 days'::interval)::date;
                                                  QUERY PLAN                                                   
 Seq Scan on test  (cost=0.00..14152.02 rows=175200 width=8) (actual time=265.640..272.701 rows=13558 loops=1)
   Filter: ((t)::date > ((now() - '10 days'::interval))::date)
   Rows Removed by Filter: 512043
 Total runtime: 273.152 ms
(4 rows)

For speeding this up I will create an index storing sorted dates:

CREATE INDEX i_test_t ON test((t::date));

With this index the plan is much better and the query is much faster:

postgres=# explain analyze select t from test where t::date > (now() - '10 days'::interval)::date;
                                                       QUERY PLAN                                                        
 Index Scan using i_test_t on test  (cost=0.43..459.23 rows=13817 width=8) (actual time=0.083..8.337 rows=13558 loops=1)
   Index Cond: ((t)::date > ((now() - '10 days'::interval))::date)
 Total runtime: 9.990 ms
(3 rows)

This index will also be used when you want to sort the results using the same values as stored in index:

postgres=# explain analyze select t from test where t::date > (now() - '10 days'::interval)::date order by t::date asc;
                                                        QUERY PLAN                                                        
 Index Scan using i_test_t on test  (cost=0.43..493.78 rows=13817 width=8) (actual time=0.080..13.479 rows=13558 loops=1)
   Index Cond: ((t)::date > ((now() - '10 days'::interval))::date)
 Total runtime: 15.833 ms
(3 rows)

And even when you sort backwards:

postgres=# explain analyze select t from test where t::date > (now() - '10 days'::interval)::date order by t::date desc;
                                                            QUERY PLAN                                                             
 Index Scan Backward using i_test_t on test  (cost=0.43..493.78 rows=13817 width=8) (actual time=0.053..13.847 rows=13558 loops=1)
   Index Cond: ((t)::date > ((now() - '10 days'::interval))::date)
 Total runtime: 16.230 ms
(3 rows)

All Rows for This Year

You can get the current year from a timestamp or date value with the extract() function.

postgres=# SELECT extract(year from '2013-01-01'::date);
 date_part 
-----------
      2013
(1 row)

postgres=# SELECT extract(year from now());
 date_part 
-----------
      2013
(1 row)

This can be used to get all the rows with dates in the current year:

select t from test where extract(year from t) = extract(year from now());

Check the plan:

postgres=# explain analyze select t from test where extract(year from t) = extract(year from now());
                                                  QUERY PLAN                                                  
 Seq Scan on test  (cost=0.00..12838.02 rows=2628 width=8) (actual time=136.349..235.307 rows=223670 loops=1)
   Filter: (date_part('year'::text, t) = date_part('year'::text, now()))
   Rows Removed by Filter: 301931
 Total runtime: 242.259 ms
(4 rows)

It doesn't look too good. Let's use a functional index. This time I need to store the year number:

CREATE INDEX i_test_t_year ON test (extract(year from t));

The plan should improve, so the query should be faster now:

postgres=# explain analyze select t from test where extract(year from t) = extract(year from now());
                                                            QUERY PLAN                                                            
 Index Scan using i_test_t_year on test  (cost=0.43..7426.77 rows=225448 width=8) (actual time=0.052..51.715 rows=223670 loops=1)
   Index Cond: (date_part('year'::text, t) = date_part('year'::text, now()))
 Total runtime: 60.969 ms
(3 rows)


As you can see, functional indexes can be used to store calculated values which you can then search on. In other database engines this is achieved with additional columns storing those values (kept up to date by triggers or by the application) or with materialized views.
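For comparison, here is a sketch of the trigger-maintained-column approach mentioned above (the column, function, and trigger names are hypothetical); it keeps the year in a regular column that can be indexed directly:

```sql
-- Hypothetical sketch: store the computed year in a real column,
-- kept up to date by a trigger, and index that column instead.
ALTER TABLE test ADD COLUMN t_year integer;

CREATE FUNCTION set_t_year() RETURNS trigger AS $$
BEGIN
    NEW.t_year := extract(year from NEW.t);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_set_t_year
    BEFORE INSERT OR UPDATE ON test
    FOR EACH ROW EXECUTE PROCEDURE set_t_year();

CREATE INDEX i_test_t_year_col ON test (t_year);
```

The functional index gets the same result without the extra column or the trigger maintenance.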

As usual, there are also some downsides to this solution. The index needs to be updated each time you change the data, which means inserts, updates, and deletes will be slower (though not necessarily noticeably so). Another thing worth remembering is that those indexes are stored on disk, so each index means more disk operations. It also means a longer time for restoring dumps: the index data is not stored in a database dump; only the index definition is stored, and the index needs to be recreated when the dump is loaded into the database.

All this means that you should check whether an index you want to create is really needed; unused indexes are just a waste of resources which could be put to better use.
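To find candidates for removal, the statistics collector can help; a query along these lines against the standard pg_stat_user_indexes view shows how often each index has been scanned, with rarely used indexes at the top:

```sql
-- Indexes with low idx_scan counts (since the last stats reset)
-- are candidates for closer inspection and possible removal.
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
```

Keep in mind the counters only reflect activity since statistics were last reset, so check over a representative period.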

Creating custom button graphics in Android

In the Android timesheet app I'm working on, I have a scrollable layout of RadioButtons for the user to pick how much time they've spent on a project (see my earlier blog post about it), and for that I use custom button graphics to make it look nice. So, I'm going to show you how to do that with 9-patch PNGs and selector XML.

First, what's a 9-patch PNG? A 9-patch is a special PNG image where you specify regions that can be stretched to make room for text. Android will automatically resize a 9-patch to best fit whatever contents you give it. The tool you need to create a 9-patch image is included in the Android SDK Tools, so download that if you haven't already. More information about 9-patch images can be found here.

Okay! I've got custom button graphics (72x72 for HDPI screens), drawn in the Gimp and saved in my project's res/drawable-hdpi/ folder as button_selected.png and button_unselected.png:

To convert it to a 9-patch, browse to the tools/ directory of the Android SDK and run draw9patch. This will open a window with a graphical editor on the left, and a button preview on the right. The editor window is for specifying which parts of the image will be stretched. The top and left edges show this, and the right and bottom edges show which parts of the image can contain the text you put in the button.

When you've finished with draw9patch, save the images in the same place, but with .9.png as the file extension (in this case, res/drawable-hdpi/button_selected.png will be res/drawable-hdpi/button_selected.9.png). Make sure to delete the old images, because Android R (the generated resource class) doesn't use file extensions, so it can't tell the difference between our two image types.

Now, let's try making a button with our custom graphics. Add a Button to your Activity XML, like so:


 <RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"  
   xmlns:tools="http://schemas.android.com/tools"  
   tools:context=".MainActivity" >  
     <Button  
       android:layout_width="wrap_content"  
       android:layout_height="wrap_content"  
       android:background="@drawable/button_selected"  
       android:text="@string/hello_world" />  
 </RelativeLayout>  

Now, when we run it, it looks like this:

So we have our custom background working, but it's the same (red) whether or not you're pushing it. To use different images for different states, we can use selector XML. With ours, we just have two images, so it's simple:


 <?xml version="1.0" encoding="utf-8"?>  
 <selector xmlns:android="http://schemas.android.com/apk/res/android" >  
   <!-- When selected, use this image -->  
   <item android:drawable="@drawable/button_selected"  
     android:state_pressed="true" />  
   <!-- When not selected, use this image -->  
   <item android:drawable="@drawable/button_unselected"  
     android:state_pressed="false" />  
 </selector>  

Now, instead of pointing our Button at an image directly, we can reference the selector XML. If the selector was saved as, say, res/drawable/button_selector.xml, the Button's background attribute becomes android:background="@drawable/button_selector".
And when we run it, it looks great (assuming you like bright red)!

This project is on GitHub, if you'd like to download it and try some stuff.

JSConf US — Day 2

Choose Your Own Adventure

For day two of JSConf, the organizers decided to try something new. Rather than a day of scheduled talks, attendees could choose from a variety of activities. There was kayaking, golf, segway tours, and scavenger hunts, or you could just hang out at the pool. There was also the opportunity to hack on NodeBots or Node Copters. While the outdoor stuff sounded awesome, I opted to hack on and play with the nodecopters.

Node Copter

With the help of nodejitsu, Chris Williams was able to bring 50 Parrot AR Drone quadcopters for teams to play with and hack on with JavaScript. Felix Geisendorfer and a contingent of volunteers showed us how to fly the quadcopters (with either an iOS or Android device) and how they could be controlled with JavaScript code. Felix is the author of the ar-drone node.js module which makes this possible. With a laptop connected to the quadcopter, the following code is all you need to have it take off, rotate a little bit, perform a flip and then land:

var arDrone = require('ar-drone');
var client = arDrone.createClient();

client.takeoff();

client
  .after(5000, function() {
    this.clockwise(0.5);
  })
  .after(3000, function() {
    this.animate('flipLeft', 15);
  })
  .after(1000, function() {
    this.stop();
    this.land();
  });

The Challenge

I worked in a team with Ryan Seddon and John Buckley on a challenge set out at the beginning of the day by Felix. We used OpenCV (the computer vision library) initially, and later the on-board compass data, to have the nodecopter take off, locate a marked line on the floor, fly to the end of the line, and then land in the prescribed location. We were close several times but did not successfully complete the challenge in the end. This was disappointing, but we still had a blast working on it throughout the day.


At the end of the day, many of the teams demonstrated what they had come up with. This was a lot of fun to watch. Typically, each nodecopter operates as a wifi access point. To control the nodecopter, you connect your computer to that wireless network and then gain control. This works well but has the limitation of only operating one nodecopter at a time. Several teams worked on hacks to enable multiple nodecopters to fly at the same time. This short clip I filmed shows one such squadron of nodecopters flying in unison.

Many thanks to Felix and his team, Chris for bringing in so many nodecopters (and batteries), and to nodejitsu for sponsoring this great event!

Window functions in action

Image by Wikimedia user Ardfern

Yesterday I ran into a nice practical application of a number of slightly unusual SQL features, in particular, window functions. PostgreSQL has had window functions for quite a while now (since version 8.4, in fact, the oldest version still officially supported), but even though they're part of the SQL standard, window functions aren't necessarily a feature people use every day. As a bonus, I also threw in some common table expressions (also known as CTEs, also a SQL standard feature), to help break up what could have been a more confusing, complex query.

A client of ours noticed a problem in some new code they were working on. It was possible for users to submit duplicate orders to the system in quick succession, by double-clicking or something similar. This was fixed in the code easily enough, but we needed to clean up the duplicate orders in the database. Which meant we had to find them. We defined a group of duplicates as all orders involving the same line items, with one of a set of possible status codes, created in an interval of less than five minutes by the same user.

This discussion of the time interval between two different records should immediately signal "window functions" (or possibly a self-join, but window functions are much easier in this case). A window function takes a set of rows and lets you chop them up into subsets, processing the subsets in various ways. In this case, we want to take all the rows in the orders table with a particular status value, group them by the customer who placed the order as well as by the items in the order, and then evaluate each of those groups.

As might be expected, the items associated with an order are in a different table from the orders themselves. There are probably several different ways I could have compared the items; I chose to accumulate all the items in an order into an array, and compare the resulting arrays (I imagine this would be awfully slow if orders had many different items attached to them, but it wasn't a problem in this instance). An item consists of its ID value, and an integer quantity; I created a composite type, so I could compare these values as an array.

CREATE TYPE order_item AS (
   item_id integer,
   quantity integer
);
Now I need a query that will gather all the items on an order into an array. I'll use the array_agg() aggregate to do that. While I'm at it, I'll also filter out order status codes I don't want. One important note here is that later, I'll be comparing the results of array_agg() with each other, and array element ordering will matter. So I need to sort the rows as they're being aggregated. Fortunately we've been able to do that easily since version 9.0, with ordered aggregates (NB! many of the features in use in this post are part of the SQL standard; ordered aggregates are a PostgreSQL-specific extension).

SELECT o.id, o.created_at, o.user_id,
    array_agg((item_id, quantity)::order_item ORDER BY item_id, quantity) AS order_items
FROM
    orders o
    JOIN order_items oi
        ON (o.id = oi.order_id)
WHERE o.status >= 200 AND o.status <= 400
GROUP BY o.id, o.created_at, o.user_id;

The result, taken from a sample data set I created for demonstration purposes, is this:

 id |         created_at         | user_id |        order_items         
  4 | 2013-06-04 17:56:22.857817 |       3 | {"(2,3)"}
  5 | 2013-06-04 17:57:11.099472 |       1 | {"(1,11)","(2,1)","(3,2)"}
  1 | 2013-06-04 17:56:16.017938 |       1 | {"(1,10)","(2,1)","(3,2)"}
  2 | 2013-06-04 17:56:19.27393  |       1 | {"(1,10)","(2,1)","(3,2)"}
  3 | 2013-06-04 17:56:21.137858 |       2 | {"(1,1)"}
(5 rows)

Now I need to compare various rows in this list, and that's where window functions come in. A call to a window function looks like any other function call, except that it is followed by a "window definition", which describes the window of rows the function will operate on: how rows are grouped into windows, and optionally, how rows within a window are sorted. Here's a query I used to get started.

SELECT
    id, created_at,
    first_value(created_at) OVER (PARTITION BY user_id ORDER BY created_at)
FROM orders;

The OVER and subsequent parenthetical expression is the window clause. This one tells PostgreSQL to group all the orders by the user ID that created them, and sort them in ascending order of their creation time. In future iterations I'll need to partition by the items in the order as well, but we're keeping it simple for this query. The first_value() function is the actual window function, and returns the expression it is passed, evaluated in terms of the first row in the window. In this case the sort order is important; without it, there is no guarantee which of the window's rows is considered "first". There are lots of other window functions available, documented here.
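As an aside, the five-minute interval from our duplicate definition could itself be computed with another window function, lag(), which evaluates an expression against the previous row in the window; here's a sketch against the same orders table:

```sql
-- For each order, how long after the user's previous order was it placed?
-- lag() returns NULL for the first row of each window.
SELECT
    id, created_at,
    created_at - lag(created_at)
        OVER (PARTITION BY user_id ORDER BY created_at) AS gap_from_previous
FROM orders;
```

Rows where gap_from_previous comes out under five minutes would be duplicate candidates.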

It's time to start combining these queries together. For this, I like to use common table expressions, which essentially let me define a named view for purposes of just this query. My first such expression will gather the items associated with an order into an array. The second part of the query will use window functions to compare the results of the first part with each other.

WITH order_items_array AS (
    SELECT o.id, o.created_at, o.user_id,
        array_agg((item_id, quantity)::order_item ORDER BY item_id, quantity) AS order_items
    FROM
        orders o
        JOIN order_items oi
            ON (o.id = oi.order_id)
    WHERE o.status >= 200 AND o.status <= 400
    GROUP BY o.id, o.created_at, o.user_id
)
SELECT
    id, user_id, created_at,
    first_value(id) OVER user_order_partition AS first_id,
    first_value(created_at) OVER user_order_partition AS first_created
FROM order_items_array
WINDOW user_order_partition AS (PARTITION BY user_id, order_items ORDER BY created_at);

This gives these results:

 id | user_id |         created_at         | first_id |       first_created        
  1 |       1 | 2013-06-04 17:56:16.017938 |        1 | 2013-06-04 17:56:16.017938
  2 |       1 | 2013-06-04 17:56:19.27393  |        1 | 2013-06-04 17:56:16.017938
  5 |       1 | 2013-06-04 17:57:11.099472 |        5 | 2013-06-04 17:57:11.099472
  3 |       2 | 2013-06-04 17:56:21.137858 |        3 | 2013-06-04 17:56:21.137858
  4 |       3 | 2013-06-04 17:56:22.857817 |        4 | 2013-06-04 17:56:22.857817
(5 rows)

This shows the order ID, user ID, and creation timestamp, as well as the ID and creation timestamp of the first order in the window each row belongs to. Note that in order to avoid having to retype the same long window definition twice, I used an alternate syntax whereby I created a named window definition, and referred to that name. Anyway, you can see that the first two rows have the same first_id value; this means they're duplicates, and we want to get rid of one of them. You'll have to trust me that that's the only duplicated order in my sample database; suffice it to say that these results are, in fact, correct. I'll decide (because it turns out to be easier this way) to keep the earliest of the duplicate orders, so from the results above, I can see that I want to remove order ID 2. It would be nice, though, to have a list of just the order IDs I need to remove without any other information. Even better, a list of SQL commands to run to remove them. Like this:

WITH order_items_array AS (
    SELECT o.id, o.created_at, o.user_id,
        array_agg((item_id, quantity)::order_item ORDER BY item_id, quantity) AS order_items
    FROM
        orders o
        JOIN order_items oi
            ON (o.id = oi.order_id)
    WHERE o.status >= 200 AND o.status <= 400
    GROUP BY o.id, o.created_at, o.user_id
), order_duplicates AS (
    SELECT
        id, user_id, created_at,
        first_value(id) OVER user_order_partition AS first_id,
        first_value(created_at) OVER user_order_partition AS first_created
    FROM order_items_array
    WINDOW user_order_partition AS (PARTITION BY user_id, order_items ORDER BY created_at)
)
SELECT 'DELETE FROM orders WHERE id = ' || id || ';' FROM order_duplicates
WHERE first_id != id;

...which returns this one result:

 DELETE FROM orders WHERE id = 2;
(1 row)

So by combining common table expressions, ordered aggregates, composite types, arrays, and window functions, we've successfully cleaned up this database. Until we find another application bug...

JSConf US 2013 — Day One

Room With A View

I attended JSConf in Amelia Island, FL last week. As you can see, the venue was pretty spectacular, and the somewhat remote location lent itself very well to the vision set by the conference organizers. Many developers, myself included, often find the line between work and play blurring because there is much work to be done, many new open source projects to check out, and constant advancements in browser technology, programming languages, you name it. Keeping up with it all is fun but can be challenging at times. While the talks were amazing, the focus and ethos of JSConf as I experienced it was more about people and building on the incredible community we have. I highly recommend attending meetups or conferences in your field if possible.

Without further ado, I've written about some of the talks I attended. Enjoy!

Day One

Experimenting With WebRTC

Remy Sharp presented the first talk of the day about his experience building a Google Chrome Experiment with WebRTC and the peer-to-peer communication API. The game (headshots) was built to work specifically on Chrome for Android Beta. Because WebRTC is so young, the libraries supporting it (Peer.js, easyRTC, SimpleWebRTC) are very young as well and developing quickly. He found that "Libraries are good when you're fumbling, bad when stuff doesn't work". The "newness" of WebRTC ate much more time than he had anticipated. Remy demoed the game and it looked like good fun. If you're interested in checking out headshots further, the source code is up on GitHub.

JavaScript Masterclass

Angelina Fabbro gave a talk about levelling up your skills as a developer. She first broke everyone's heart by telling us "we're not special" and that nobody is a natural born programmer. She presented some data (studies, etc.) to support her point and argued that early exposure to the practice of logical thinking and practicing programming can make you seem like a natural.

She described several ways to know you're not a beginner:

  • You can use the fundamentals in any language
  • You are comfortable writing code from scratch
  • You peek inside the libraries you use
  • You feel like your code is mediocre and you're unsure what to do about it
...and ways to know you're not an expert:
  • You don't quite grok (understand) all the code you read
  • You can't explain what you know
  • You aren't confident debugging
  • You rely on references/docs too much

“Welcome to the ambiguous zone of intermediate-ness”.

Angelina suggested several ways to improve your skills. Ask "why?" obsessively, teach or speak at an event, work through a suggested curriculum, have opinions, seek mentorship, write in another language for a while etc. One book she specifically recommended was Secrets of the JavaScript Ninja by John Resig.

JavaScript is Literature is JavaScript

Angus Croll from Twitter presented a hilariously entertaining talk in which he refactored some JavaScript functions in the literary styles of Hemingway, Shakespeare, Bolaño, Kerouac, James Joyce and other literary figures. The talk was inspired by a blog post he'd written and had the entire conference hall erupting with laughter throughout.


Learning New Words

Andrew Dupont continued in the literary, language-oriented vein, giving a talk which drew a parallel between olde English purists who did not want to adopt any "new" words and the differing views surrounding the ECMAScript 6 specification process. Dupont's talk was very thought-provoking, especially in light of the resistance to some proposals in the works (e.g. ES6 modules). Check out his slides; the video will also be published in the future.


Flight

Dan Webb spoke about Flight, a client-side framework developed by the team at Twitter and released this past January. They have been using it to run lots of twitter.com and most of TweetDeck as well. Webb got a laugh out of the room when he recalled the first comment on Hacker News after Flight was announced: "Why not use Angular?". The motivation behind Flight's design was to make a "system that's easy to think about". This aim was achieved by decoupling components (entirely self-contained).

A Flight component is just a JavaScript object with a reference to a DOM node. The component can be manipulated, trigger events and listen to events (from other components). Applications are constructed by attaching Flight components to pieces of the DOM. Keeping the components independent in this way makes testing very easy. Complexity can be layered on but does not require any additional mental overhead. Dan suggested that Flight's design and architecture would work very well with Web Components and Polymer in future.

DevOps engineer job opening (remote)

End Point continues to grow! We are looking for a full-time, salaried DevOps engineer to work on projects with our internal server hosting team and our external clients. If you like to figure out and solve problems, if you take responsibility for getting a job done well without intensive oversight, please read on.

What is in it for you?

  • Work from your home office
  • Flexible full-time work hours
  • Health insurance benefit
  • 401(k) retirement savings plan
  • Annual bonus opportunity
  • Ability to move without being tied to your job location
  • Collaborate with a team that knows their stuff

What you will be doing:

  • Remotely set up and maintain Linux servers (mostly RHEL/CentOS, Debian, and Ubuntu), with custom software written mostly in Ruby, Python, Perl, and PHP
  • Audit and improve security, backups, reliability, monitoring (with Nagios etc.)
  • Support developer use of major language ecosystems: Perl's CPAN, Python PyPI (pip/easy_install), Ruby gems, PHP PEAR/PECL, etc.
  • Automate provisioning with Ansible, Chef, Puppet, etc.
  • Use open source tools and contribute back as opportunity arises
  • Use your desktop platform of choice: Linux, Mac OS X, Windows

What you will need:

  • Professional experience with Linux system administration, networking, iptables, Apache or nginx web servers, SSL, DNS
  • A customer-centered focus
  • Strong verbal and written communication skills
  • Experience directing your own work, and working from home
  • Ability to learn new technologies
  • Willingness to shift work time to evening and weekend hours on occasion

Bonus points for experience:

  • Packaging software for RPM, Yum, and apt/dpkg
  • Managing Amazon Web Services, Rackspace Cloud, Heroku, or other cloud hosting services
  • Working with PostgreSQL, MySQL, Cassandra, CouchDB, MongoDB, or other databases
  • Complying with or auditing for PCI and other security standards
  • Using load balancers, virtualization (kvm, Xen, VirtualBox, VMware), FC or iSCSI SAN storage
  • With JavaScript, HTML/CSS, Java/JVM, Node.js, etc.
  • Contributing to open source projects

About us

End Point is a 17-year-old Internet consulting company based in New York City, with 35 full-time employees working mostly remotely from home offices. We serve over 200 clients ranging from small family businesses to large corporations, using a variety of open source technologies. Our team is made up of strong ecommerce, database, and system administration talent, working together using ssh, Screen and tmux, IRC, Google+ Hangouts, Skype, and good old phones.

How to apply

Please email us an introduction to apply. Include a resume and your GitHub or other URLs that would help us get to know you. We look forward to hearing from you!

CSS Conf 2013 — When Bootstrap Attacks!

Cssconf 2013

I attended the inaugural CSS Conf last week at Amelia Island, Florida. The conference was organized by Nicole Sullivan, Brett Stimmerman, Jonathan Snook, and Paul Irish and put on with help from a host of volunteers. The talks were presented in a single track style on a wide range of CSS-related topics; there was something interesting for everyone working in this space. I really enjoyed the conference, learned lots and had great discussions with a variety of people hacking on interesting things with CSS. In the coming days I will be blogging about some of the talks I attended and sharing what I learned, so stay tuned!

When Bootstrap Attacks

Pamela Fox had the opening slot and spoke about the experiences and challenges she faced when upgrading Bootstrap to V2 in a large web app (Coursera). What she initially thought would be a quick project turned into a month-long "BOOTSTRAPV2ATHON". Bootstrap styles were used throughout the project in dozens of PHP, CSS, and JavaScript files. The fact that Bootstrap uses generic CSS class names like "alert", "btn", "error", etc. made it very difficult to grep through the codebase for them. The Bootstrap classes were also used as hooks by the project's JavaScript application code.

Lessons Learned

Fox offered some tips for developers facing a similar situation. The first was to prefix the Bootstrap CSS classes (e.g. .tbs-alert) in order to decouple Bootstrap from the customizations in your project. Some requests have been made to the Bootstrap team on this front, but the issue has not been addressed yet. In the meantime, devs can add a task to their build step (e.g. Grunt, the asset pipeline in Rails, etc.) to automate the addition of prefixes to each of the CSS classes.

Another tip is to avoid using Bootstrap CSS classes directly. Instead, use the "extend" functionality in your preprocessor (Sass, Less, Stylus etc) of choice. For example:

  .ep-btn {
    @extend .btn;

    &:hover {
      @extend .btn:hover;
    }
  }

This way your project can extend the Bootstrap styles while keeping your customizations separate and not closely coupled to the framework.

The same logic should also be applied to the JavaScript in your project. Rather than using the Bootstrap class names as hooks in your JavaScript code, use a prefix (e.g. js-btn) or use HTML5 data attributes. Separating the hooks used for CSS styles from those used in JavaScript is very helpful when upgrading or swapping out a client-side framework like Bootstrap.

Test All Of The Things

Pamela wrapped up the talk by explaining how testing front end code would ease the pain of upgrading a library next time. There are many testing libraries available today which address some of these concerns. She mentioned mocha, Chai, jsdom, and Selenium, which all look very helpful. In addition to testing front end code, she offered up the idea of "diffing your front end" in a visual way. This concept was very interesting to someone who ensures designs are consistent across a wide array of browsers and devices on a daily basis.

Needle is a tool which allows you to do this automatically. Once you develop a test case, you can run Needle to view a visual diff of your CSS changes. I think this is an excellent idea. Pamela also noted that the combination of Firefox screenshots and Kaleidoscope could be used manually in much the same way.

Many thanks to Pamela for sharing this! The slides for this talk can be viewed here and the talk was recorded so the video will also be available sometime soon.

PostgreSQL as NoSQL with Data Validation

PostgreSQL is a relational database with many great features. There are also many so-called NoSQL databases; some of them, like CouchDB, are document databases. In CouchDB a document is automatically enhanced with an "_id" field if one is not present. When you want to retrieve a document, you use this "_id" field; it behaves exactly like the primary key from relational databases. PostgreSQL stores data in tables' rows, while CouchDB stores data as JSON documents. On one hand CouchDB seems like a great solution, as you can have all the different data from different PostgreSQL tables in just one JSON document. This flexibility comes at the cost of no constraints on the data structure, which can be really appealing at first and really frustrating when you have a huge database and some of the documents contain bad values or are missing some fields.

PostgreSQL 9.3 comes with great features which can turn it into a NoSQL database, with full transaction support, storing JSON documents with constraints on the fields data.

Simple Example

I will show how to do it using a very simple example of a table with products. Each product has a name, description, an ID number, price, currency, and the number of products we have in stock.

PostgreSQL Version

The simple table in PostgreSQL can look like:

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name TEXT,
    description TEXT,
    price DECIMAL(10,2),
    currency TEXT,
    in_stock INTEGER
);
This table allows us to insert products like:

INSERT INTO products (name, description, price, currency, in_stock) VALUES ('shoes', 'blue shoes', 12.34, 'dollars', 5); 

Unfortunately the above table also allows for adding rows missing some important information:

INSERT INTO products (name, description, price, currency, in_stock) VALUES ('', null, -20, 'handa', -42); 

This should be fixed by adding constraints in the database. Assume that we always want a unique, non-empty name, a non-empty description, non-negative price and in_stock, and a currency that is always dollars. The table with such constraints is:

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    description TEXT NOT NULL,
    price DECIMAL(10,2) NOT NULL,
    currency TEXT NOT NULL,
    in_stock INTEGER NOT NULL,
    CHECK (length(name) > 0),
    CHECK (description IS NOT NULL AND length(description) > 0),
    CHECK (price >= 0.0),
    CHECK (currency = 'dollars'),
    CHECK (in_stock >= 0)
);

Now all the operations, like adding or modifying a row, which violate any of those constraints, just fail. Let's check:

postgres=# INSERT INTO products (name, description, price, currency, in_stock) VALUES ('shoes', 'blue shoes', 12.34, 'dollars', 5);
postgres=# INSERT INTO products (name, description, price, currency, in_stock) VALUES ('shoes', 'blue shoes', 12.34, 'dollars', -1);
ERROR:  new row for relation "products" violates check constraint "products_in_stock_check"
DETAIL:  Failing row contains (2, shoes, blue shoes, 12.34, dollars, -1). 

NoSQL Version

In CouchDB, the row inserted into the above table would just be a JSON document looking like this:

{
    "id": 1,
    "name": "shoes",
    "description": "blue_shoes",
    "price": 12.34,
    "currency": "dollars",
    "in_stock": 5
}

The Trivial Solution

In PostgreSQL we can store this JSON as a row in the products table:

CREATE TABLE products (
    data TEXT
);
This works like most NoSQL databases: no checks, no errors about bad fields. As a result, you can modify the data any way you want; the problem begins when your application expects the price to be a number and gets a string there, or there is no price at all.

Validate JSON

CouchDB validates JSON before saving the document into the database. In PostgreSQL 9.2 there is a nice type for that, named JSON. The JSON type can store only valid JSON; validation is performed before converting a value into this type.

Let's change the definition of the table to:

CREATE TABLE products (
    data JSON
);

We can insert correct JSON into this table:

postgres=# INSERT INTO products(data) VALUES('{
    "id": 1,
    "name": "shoes",
    "description": "blue_shoes",
    "price": 12.34,
    "currency": "dollars",
    "in_stock": 5
}');
postgres=# SELECT * FROM products;
              data               
---------------------------------
 {                               +
     "id": 1,                    +
     "name": "shoes",            +
     "description": "blue_shoes",+
     "price": 12.34,             +
     "currency": "dollars",      +
     "in_stock": 5               +
 }
(1 row) 

This works; inserting invalid JSON, however, ends with an error:

postgres=# INSERT INTO products(data) VALUES('{
    "id": 1,
    "name": "shoes",
    "description": "blue_shoes",
    "price": 12.34,
    "currency": "dollars",
    "in_stock": 5,
}');
ERROR:  invalid input syntax for type json
LINE 1: INSERT INTO products(data) VALUES('{
DETAIL:  Expected string, but found "}".
CONTEXT:  JSON data, line 5: ...,
    "currency": "dollars",
    "in_stock": 5,
}

The formatting problem can be hard to notice (I've added a comma after the last field, which JSON doesn't allow).

Validating Fields

OK, so we have a solution which looks almost like the first, native PostgreSQL solution: we have data which validates as JSON. That doesn't mean the data is sensible.

Let's add checks for validating the data.

In PostgreSQL 9.3, which has not been released yet, there are some new great features for manipulating JSON values. There are defined operators for the JSON type, which give you easy access to the fields and values.

I will use only one operator, '->>', but you can find more in the PostgreSQL documentation.
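As a quick sketch of how the extraction operators differ (run against the products table above): '->' returns a json value, while '->>' returns text, which you can then cast.

```sql
SELECT data->'price' FROM products;                  -- json value
SELECT data->>'price' FROM products;                 -- same field, as text
SELECT (data->>'price')::decimal * 2 FROM products;  -- cast the text before arithmetic
```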

I also need to validate the types of the fields, including the id field. In the native solution, Postgres checks this automatically thanks to the column type definitions. I am also going to use a different syntax for the checks, because I want to name each one: when a named constraint fails, it is much easier to spot the problem with a specific field than to search through the whole huge JSON.

The table with the constraints looks like this:

CREATE TABLE products (
    data JSON,
    CONSTRAINT validate_id CHECK ((data->>'id')::integer >= 1 AND (data->>'id') IS NOT NULL ),
    CONSTRAINT validate_name CHECK (length(data->>'name') > 0 AND (data->>'name') IS NOT NULL ),
    CONSTRAINT validate_description CHECK (length(data->>'description') > 0  AND (data->>'description') IS NOT NULL ),
    CONSTRAINT validate_price CHECK ((data->>'price')::decimal >= 0.0 AND (data->>'price') IS NOT NULL),
    CONSTRAINT validate_currency CHECK (data->>'currency' = 'dollars' AND (data->>'currency') IS NOT NULL),
    CONSTRAINT validate_in_stock CHECK ((data->>'in_stock')::integer >= 0 AND (data->>'in_stock') IS NOT NULL )
);

The "->>" operator allows me to get the value of a specific field from the JSON, check whether it exists, and validate it.

Let's add a JSON without a description:

postgres=# INSERT INTO products(data) VALUES('{
    "id": 1,
    "name": "d", 
    "price": 1.0,
    "currency": "dollars",
    "in_stock": 5
}');
ERROR:  new row for relation "products" violates check constraint "validate_description"
DETAIL:  Failing row contains ({
    "id": 1,
    "name": "d", 
    "price": 1.0,
    "currency...).

There is one more validation left. The id and name fields should be unique. This can be easily done with two indexes:

CREATE UNIQUE INDEX ui_products_id ON products((data->>'id'));
CREATE UNIQUE INDEX ui_products_name ON products((data->>'name')); 

Now when you try to add a JSON document whose id already exists in the database, you will get an error like:

ERROR:  duplicate key value violates unique constraint "ui_products_id"
DETAIL:  Key ((data ->> 'id'::text))=(1) already exists.
ERROR:  current transaction is aborted, commands ignored until end of transaction block 

Id Generation

In NoSQL databases the id field is usually a UUID. This is an identifier generated by algorithms with a vanishingly small chance of producing the same value twice, even when you generate them on different machines. So I'm not going to touch it here.
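If you do want UUIDs on the database side, here is a minimal sketch, assuming the uuid-ossp contrib extension is installed and you have the privileges to enable it:

```sql
-- Enable the extension once per database, then generate random (v4) UUIDs:
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
SELECT uuid_generate_v4();
```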

Searching

You can search the JSON data just as you would search ordinary columns in a table. Let's search for the most expensive product we have in stock:

SELECT * FROM products WHERE in_stock > 0 ORDER BY price DESC LIMIT 1; 

The JSON version is very similar:

SELECT * FROM products WHERE (data->>'in_stock')::integer > 0 ORDER BY (data->>'price')::decimal DESC LIMIT 1; 

This query can be very inefficient: it needs to read all the rows, parse the JSON fields, check the in_stock and price fields, convert them into the proper types, and then sort. The plan of such a query, after filling the table with 100k rows, looks like this:

                                                        QUERY PLAN                                                          
----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=9256.48..9256.48 rows=1 width=32) (actual time=412.911..412.912 rows=1 loops=1)
   ->  Sort  (cost=9256.48..9499.05 rows=97027 width=32) (actual time=412.910..412.910 rows=1 loops=1)
         Sort Key: (((data ->> 'price'::text))::numeric)
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Seq Scan on products  (cost=0.00..8771.34 rows=97027 width=32) (actual time=0.022..375.624 rows=100000 loops=1)
               Filter: (((data ->> 'in_stock'::text))::integer > 0)
 Total runtime: 412.939 ms
(7 rows) 

The "Seq Scan" line means that PostgreSQL needs to read the whole table. The time of 412 ms is not that bad, but can we make it better?

Fortunately PostgreSQL has a great feature: indexes on expressions, also known as functional indexes. Such an index stores the sorted values of an expression, and if the same expression occurs in a query, the index can be used.

The indexes I need are:

CREATE INDEX i_products_in_stock ON products(( (data->>'in_stock')::integer ));
CREATE INDEX i_products_price ON products(( (data->>'price')::decimal )); 

Notice the double parentheses; they are required because the index is defined on an expression rather than a plain column.

The plan now looks a little bit different, after creating indexes and running analyze on the products table:

                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.55 rows=1 width=32) (actual time=0.041..0.041 rows=1 loops=1)
   ->  Index Scan Backward using i_products_price on products  (cost=0.42..13690.06 rows=100000 width=32) (actual time=0.041..0.041 rows=1 loops=1)
         Filter: (((data ->> 'in_stock'::text))::integer > 0)
 Total runtime: 0.062 ms
(4 rows)

So it is over 6,600 times (about 664k percent) faster.

The JSON Advantage

The JSON solution has one nice feature which the native PostgreSQL one hasn't: the application can add its own fields on the fly without altering any table. The JSON field is just text, albeit with validation. A new field won't be checked by the indexes and constraints I've shown you above.

What's more, you can add a constraint for such a field later. This way you get the best of both worlds: an easily changed data model and a consistent JSON structure across the database.
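For example, here is a sketch using a hypothetical new field, "color", which the application started adding on its own (the field name and rule are made up for illustration):

```sql
-- Tolerate documents without the field, but forbid an empty value;
-- ALTER TABLE also validates all existing rows against the new constraint.
ALTER TABLE products
    ADD CONSTRAINT validate_color
    CHECK (data->>'color' IS NULL OR length(data->>'color') > 0);
```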

On the other hand, you could of course add a trigger which checks the JSON against a list of allowed fields before saving it to the database. This way you could prevent the application from adding new fields at all.
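Such a trigger could be sketched like this, using the json_object_keys() function from PostgreSQL 9.3 (the function and trigger names here are my own invention):

```sql
-- Reject any document containing a key outside the known set.
CREATE OR REPLACE FUNCTION check_product_keys() RETURNS trigger AS $$
DECLARE
    key text;
BEGIN
    FOR key IN SELECT json_object_keys(NEW.data) LOOP
        IF key NOT IN ('id', 'name', 'description', 'price', 'currency', 'in_stock') THEN
            RAISE EXCEPTION 'unexpected field in product JSON: %', key;
        END IF;
    END LOOP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER products_check_keys
    BEFORE INSERT OR UPDATE ON products
    FOR EACH ROW EXECUTE PROCEDURE check_product_keys();
```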

Summary

So, I've shown you how you can use PostgreSQL as a simple NoSQL database storing JSON blobs of text. The great advantage over simple NoSQL databases storing blobs is that you can constrain the blobs, so they are always correct, and you shouldn't have any problems parsing and retrieving them from the database.

You can also query the database very easily and with great speed. Ad-hoc queries are really simple, much simpler than the map-reduce queries needed in many NoSQL databases.