Rails: Devise and Email Capitalization

This week, I found a bug for one of our Rails clients that was worth a quick blog post. The client website runs on Rails 3.2.8 with ActiveRecord and PostgreSQL, uses RailsAdmin for an admin interface, Devise for user authentication, and CanCan for user authorization. Before we found the bug, our code looked something like this:

class SomeController < ApplicationController
  def some_method
    user = User.find_or_create_by_email(params[:email])
    # do some stuff with the user provided parameters
    if user.save
      render :json => {}
    else
      render :json => {}, :status => 500
    end
  end
end

It's important to note that the 500 error wasn't reported to the website visitor - there were no visible UI notes to indicate the process had failed. But besides that, this code looks sane, right? We are looking up or creating a user from the provided email, updating the user parameters, and then attempting to save. For the most part, this worked fine, until we came across a situation where the user data was not getting updated properly.

Looking through the logs, I found that the user experiencing the bug was entering mixed caps emails, for example, Steph@endpoint.com. Let's walk through the code in this scenario:

First, a new user is created because there is no user in the system with the exact email Steph@endpoint.com. However, a user does exist in the system tied to steph@endpoint.com.

user = User.find_or_create_by_email(params[:email]) # with "Steph@endpoint.com" 

No problems here:

# do some stuff with the user provided parameters

Below is where the issue is coming up. Devise, our user authentication gem, automatically downcases (lowercases) all emails when they are stored in the database. There is already a user tied to steph@endpoint.com, so user.save fails, a 500 error is thrown, but as an end-user, I don't see anything to indicate that my AJAX call failed.

if user.save

The moral of this story is that it's important to a) understand how plugins manipulate user data automatically (in this case, Devise automatically downcases the email) and b) test a variety of use cases (in this case, we hadn't considered testing mixed-caps emails). Our updated code looks something like this; it downcases emails and, upon failure, logs additional detail for any other unexpected user update failures:

class SomeController < ApplicationController
  def some_method
    user = User.find_or_create_by_email(params[:email].downcase)
    # do some stuff with the user provided parameters
    if user.save
      render :json => {}
    else
      render :json => {}, :status => 500
      Rails.logger.warn "USER ERROR: #{user.errors.full_messages} #{user.attributes.inspect}"
    end
  end
end
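For reference, the automatic downcasing comes from Devise's case-insensitive key handling, which is configurable in the Devise initializer. Here is a minimal sketch of the relevant setting (this was the default at the time; check your own config/initializers/devise.rb):

# config/initializers/devise.rb
Devise.setup do |config|
  # Devise downcases these attributes before validating and saving,
  # which is why "Steph@endpoint.com" is stored as "steph@endpoint.com"
  config.case_insensitive_keys = [:email]
end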

Detecting table rewrites with the ctid column

In a recent article, I mentioned that changing the column definition of a Postgres table will sometimes cause a full table rewrite, but sometimes it will not. The rewrite depends on both the nature of the change and the version of Postgres you are using. So how can you tell for sure if changing a large table will do a rewrite or not? I'll show one method using the internal system column ctid.

Naturally, you do not want to perform this test using your actual table. In this example, we will create a simple dummy table. As long as the column types are the same as your real table, you can determine if the change will do a table rewrite on your version of PostgreSQL.

The aforementioned ctid column represents the physical location of the table's row on disk. This is one of the rare cases in which this column can be useful. The ctid value consists of two numbers: the first is the "page" that the row resides in, and the second is the slot in that page where it resides. To make things confusing, the page numbering starts at 0, while the slot numbering starts at 1, which is why the very first row is always at ctid (0,1). However, the only important information for this example is whether the ctid for the rows has changed or not (which indicates that the physical on-disk data has changed, even if the data inside of it has not!).

Let's create a very simple example table and see what the ctids look like. When Postgres updates a row, it actually marks the current row as deleted and inserts a new row. Thus, there is a "dead" row that eventually needs to be cleaned out (this is how Postgres implements MVCC; other databases do it differently). Dead rows are normally cleaned up by plain VACUUM, but VACUUM FULL goes further and rewrites the entire table, so we'll use that command to force the table to rewrite itself (and thus 'reset' the ctids, as you will see):

postgres=# DROP TABLE IF EXISTS babies;
DROP TABLE

postgres=# CREATE TABLE babies (gender VARCHAR(10), births INTEGER);
CREATE TABLE

postgres=# INSERT INTO babies VALUES ('Girl', 1), ('Boy', 1);
INSERT 0 2

-- Note: the ctid column is never included as part of '*'
postgres=# SELECT ctid, * FROM babies;
 ctid  | gender | births 
-------+--------+--------
 (0,1) | Girl   |      1
 (0,2) | Boy    |      1
(2 rows)

-- Here comes Ivy, another girl:
postgres=# UPDATE babies SET births = births+1 WHERE gender = 'Girl';
UPDATE 1

-- Note that we have a new ctid: slot 3 of page 0
-- The old row at (0,1) is still there, but it is deleted and not visible
postgres=# SELECT ctid, * FROM babies;
 ctid  | gender | births 
-------+--------+--------
 (0,2) | Boy    |      1
 (0,3) | Girl   |      2
(2 rows)

-- The vacuum full removes the dead rows and moves the live rows to the front:
postgres=# VACUUM FULL babies;
VACUUM

-- We are back to the original slots, although the data is reversed:
postgres=# SELECT ctid, * FROM babies;
 ctid  | gender | births 
-------+--------+--------
 (0,1) | Boy    |      1
 (0,2) | Girl   |      2
(2 rows)

That's what a table rewrite will look like - all the dead rows will be removed, and the rows will be rewritten starting at page 0, adding slots until a new page is needed. We know from the previous article and the fine documentation that Postgres version 9.1 is smarter about avoiding table rewrites. Let's try changing the column definition of the table above on version 8.4 and see what happens. Note that we do an update first so that we have at least one dead row.

postgres=# SELECT substring(version() from $$\d+\.\d+$$);
 substring 
-----------
 8.4

postgres=# DROP TABLE IF EXISTS babies;
DROP TABLE

postgres=# CREATE TABLE babies (gender VARCHAR(10), births INTEGER);
CREATE TABLE

postgres=# INSERT INTO babies VALUES ('Girl', 1), ('Boy', 1);
INSERT 0 2

-- No real data change, but does write new rows to disk:
postgres=# UPDATE babies SET gender = gender;
UPDATE 2

postgres=# SELECT ctid, * FROM babies;
 ctid  | gender | births 
-------+--------+--------
 (0,3) | Girl   |      1
 (0,4) | Boy    |      1
(2 rows)

-- Change the VARCHAR(10) to a TEXT:
postgres=# ALTER TABLE babies ALTER COLUMN gender TYPE TEXT;
ALTER TABLE

postgres=# SELECT ctid, * FROM babies;
 ctid  | gender | births 
-------+--------+--------
 (0,1) | Girl   |      1
 (0,2) | Boy    |      1
(2 rows)

We can see from the above that changing from VARCHAR to TEXT in version 8.4 of Postgres does indeed rewrite the table. Now let's see how version 9.1 performs:

postgres=# SELECT substring(version() from $$\d+\.\d+$$);
 substring 
-----------
 9.1

postgres=# DROP TABLE IF EXISTS babies;
DROP TABLE

postgres=# CREATE TABLE babies (gender VARCHAR(10), births INTEGER);
CREATE TABLE

postgres=# INSERT INTO babies VALUES ('Girl', 1), ('Boy', 1);
INSERT 0 2

-- No real data change, but does write new rows to disk:
postgres=# UPDATE babies SET gender = gender;
UPDATE 2

postgres=# SELECT ctid, * FROM babies;
 ctid  | gender | births 
-------+--------+--------
 (0,3) | Girl   |      1
 (0,4) | Boy    |      1
(2 rows)

-- Change the VARCHAR(10) to a TEXT:
postgres=# ALTER TABLE babies ALTER COLUMN gender TYPE TEXT;
ALTER TABLE

postgres=# SELECT ctid, * FROM babies;
 ctid  | gender | births 
-------+--------+--------
 (0,3) | Girl   |      1
 (0,4) | Boy    |      1
(2 rows)

We confirmed that the ALTER TABLE in this particular case does *not* perform a table rewrite when using version 9.1, as we suspected. We can tell because the ctids stayed the same. We could further verify this by doing a VACUUM FULL and showing that there were indeed dead rows left untouched by the ALTER TABLE.
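If you want to double-check without actually running the VACUUM FULL, a rough sketch (assuming the statistics collector is enabled; the counts can lag slightly behind reality) is to ask the statistics views how many dead tuples the table has:

postgres=# SELECT relname, n_live_tup, n_dead_tup
postgres-#   FROM pg_stat_user_tables WHERE relname = 'babies';

A non-zero n_dead_tup after the ALTER TABLE indicates the old row versions were left in place rather than rewritten away.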

Note that this small example works because nothing else is vacuuming the table, as it is too small and transient for autovacuum to care about it. VACUUM FULL is one of three ways a table can get rewritten; the other two are ALTER TABLE and the CLUSTER command. We go through all the trouble above because ALTER TABLE is the only one of the three that *may* rewrite the table - the other two are guaranteed to do so.

This is just one example of the things you can do by viewing the ctid column. It is always nice to know beforehand if a table rewrite is going to occur, as it can be the difference between a query that runs in milliseconds versus hours!

Job Opening: DevOps Engineer

We're looking for a full-time, salaried DevOps engineer to work with our existing hosting and system administration team and consult with our clients on their needs. If you like to figure out and solve problems, and can take responsibility for getting a job done well without intensive oversight, please read on!

What is in it for you?

  • Work from your home office
  • Flexible full-time work hours
  • Health insurance benefit
  • 401(k) retirement savings plan
  • Annual bonus opportunity
  • Ability to move without being tied to your job location

What you will be doing:

  • Remotely set up and maintain Linux servers (mostly RHEL/CentOS, Debian, and Ubuntu), daemons, and custom software written mostly in Ruby, Python, Perl, and PHP
  • Audit and improve security, reliability, backups, monitoring (with Nagios etc.)
  • Support developer use of major language ecosystems: Perl's CPAN, Python PyPI (pip/easy_install), Ruby gems, PHP PEAR/PECL, etc.
  • Automate provisioning with Chef, Puppet, etc.
  • Work with internal and customer systems and staff
  • Use open source tools and contribute back as opportunity arises
  • Use your desktop platform of choice: Linux, Mac OS X, Windows

What you will need:

  • Professional experience with Linux system administration, networking, firewalls, Apache or nginx web servers, SSL, DNS
  • A customer-centered focus
  • Strong verbal and written communication skills
  • Experience directing your own work, and working from home
  • Ability to learn new technologies
  • Willingness to shift work time to evening and weekend hours when required

Bonus points for experience:

  • Packaging software for RPM, Yum, and apt/dpkg
  • Managing Amazon Web Services, Rackspace Cloud, Heroku, or other cloud hosting services
  • Working with PostgreSQL, MySQL, Cassandra, CouchDB, or other databases
  • Complying with or auditing for PCI and other security standards
  • Using load balancers, virtualization (kvm, Xen, VirtualBox, VMware), FC or iSCSI SAN storage
  • With JavaScript, HTML/CSS, Java/JVM, Node.js, etc.
  • Contributing to open source projects

About us

End Point is a 17-year-old Internet consulting company based in New York City, with 31 full-time employees working mostly remotely from home offices. We serve over 200 clients ranging from small family businesses to large corporations, using a variety of open source technologies. Our team is made up of strong ecommerce, database, and system administration talent, working together using ssh, Screen and tmux, IRC, Google+ Hangouts, Skype, and good old phones.

How to apply

Please email us an introduction to jobs@endpoint.com to apply. Include a resume and your GitHub or other URLs that would help us get to know you. We look forward to hearing from you!

PostgreSQL search_path Behaviour

PostgreSQL has a great feature: schemas. You can have one database with multiple schemas, which is a really good way to separate data between different applications. Each application can use its own schema, and applications can also share some schemas between them.

I have noticed that some programmers tend to name their working schema after their user name. This is not a bad idea; however, I once ran into some strange behaviour with this setup.

In the examples below I'm using the user name szymon, connected to the database szymon.

First let's create a simple table and add some values. I will add one row with information about the table name.

# CREATE TABLE a ( t TEXT );
# INSERT INTO a(t) VALUES ('This is table a');

Let's check if the row is where it should be:

# SELECT t FROM a;

        t        
-----------------
 This is table a
(1 row)

Now let's create another schema, named after my user name.

# CREATE SCHEMA szymon;

Let's now create table a in the new schema.

# CREATE TABLE szymon.a ( t TEXT );

So now there are two tables named a, in different schemas.

# SELECT t FROM pg_tables WHERE tablename = 'a';

 schemaname | tablename | tableowner | tablespace | hasindexes | hasrules | hastriggers 
------------+-----------+------------+------------+------------+----------+-------------
 public     | a         | szymon     | \N         | f          | f        | f
 szymon     | a         | szymon     | \N         | f          | f        | f
(2 rows)

I will just add a row similar to the previous one.

# INSERT INTO szymon.a(t) VALUES ('This is table szymon.a');

Let's check the data in the table "szymon.a".

# SELECT t FROM szymon.a;

           t            
------------------------
 This is table szymon.a
(1 row)

OK, now I have all the data prepared to show the quite interesting behaviour. As you can see in the queries above, selecting from table "a" worked when there was only one schema, and selecting from "szymon.a" works as well.

What will happen now when I select data from the table "a"?

# SELECT t FROM a;

           t            
------------------------
 This is table szymon.a
(1 row)

Suddenly PostgreSQL selects data from a different table than it did at the beginning. The reason is the schema search mechanism. PostgreSQL has a configuration setting called "search_path". If you set its value to "x,a,public", then PostgreSQL will look for all table, type, and function names in the schema "x" first. If there is no such table in that schema, it will then look for the table in the next schema, which is "a" in this example.

What's the default value of the search_path variable? You can check its current value with the following query:

# show search_path;

  search_path   
----------------
 "$user",public
(1 row)

The default search path makes PostgreSQL look first in a schema named exactly like the user name you used to log into the database. If no schema matches the user name, or if there were no table "szymon.a", then the "public.a" table would be used.

The problem is even trickier because a plain EXPLAIN doesn't help: it shows only the table name and omits the schema name. The plan for this query therefore looks exactly the same, regardless of which schema is used:

# EXPLAIN SELECT * FROM a;
                      QUERY PLAN                      
------------------------------------------------------
 Seq Scan on a  (cost=0.00..1.01 rows=1 width=32)
(1 row)

For a plan with more information you should use EXPLAIN VERBOSE; the plan then includes the schema name, making it easier to spot that a different schema is being used:

# EXPLAIN VERBOSE SELECT * FROM a;
                         QUERY PLAN                          
-------------------------------------------------------------
 Seq Scan on szymon.a  (cost=0.00..1.01 rows=1 width=32)
   Output: t
(2 rows)
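One way to avoid surprises like this (a sketch of the usual approach, not part of the original setup above) is to set search_path explicitly, so the schema lookup no longer depends on the user name:

# SET search_path TO public;
# SELECT t FROM a;

        t        
-----------------
 This is table a
(1 row)

The setting can also be made persistent for a given role with ALTER ROLE szymon SET search_path TO public;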

Piggybak on Heroku

Several weeks ago, we were contacted through our website with a request for Heroku support on Piggybak. Piggybak is an open source Ruby on Rails ecommerce platform developed and maintained by End Point. Piggybak is similar to many other Rails gems in that it can be installed from Rubygems in any Rails application, and Heroku understands this requirement from the application’s Gemfile. This is a brief tutorial for getting a Rails application up and running with Piggybak. For the purpose of this tutorial, I’ll be using the existing Piggybak demo for deployment, instead of creating a Rails application from scratch.

a) First, clone the existing Piggybak demo. This will be your base application. On your development machine (local or other), you must run bundle install to get all the application’s dependencies.

b) Next, add config.assets.initialize_on_precompile = false to config/application.rb to allow your assets to be compiled without requiring a connection to a local database.

c) Next, compile the assets according to this Heroku article with the command RAILS_ENV=production bundle exec rake assets:precompile. This will generate all the application assets into the public/assets/ directory.

d) Next, add the assets to the repo by removing public/assets/ from .gitignore and committing all modified files. Heroku’s disk read-only limitation prohibits you from writing public/assets/ files on the fly, so this is a necessary step for Heroku deployment. It is not necessary for standard Rails deployments.

e) Next, assuming you have a Heroku account and have installed the Heroku toolbelt, run heroku create to create a new Heroku application.

f) Next, run git push heroku master to push your application to your new Heroku application. This will push the code and install the required dependencies in Heroku.

g) Next, run heroku pg:psql, followed by \i sample.psql to load the sample data to the Heroku application.

h) Finally, run heroku restart to restart your application. You can access your application through a browser by running heroku open.

That should be it. From there, you can manipulate and modify the demo to experiment with Piggybak functionality. The major difference between Heroku deployment and standard deployment is that all your compiled assets must be in the repository because Heroku cannot write them out on the fly. If you plan to deploy the application elsewhere, you will have to make modifications to the repository regarding public/assets.

A full set of commands for this tutorial includes:

# Clone and set up the demo app
git clone git://github.com/piggybak/demo.git
bundle install
# add config.assets.initialize_on_precompile = false
# to config/application.rb

# Precompile assets and add to repository
RAILS_ENV=production bundle exec rake assets:precompile
# edit .gitignore here to stop ignoring public/assets/
git add .
git commit -m "Heroku support commit."

# Deploy to Heroku
heroku create
git push heroku master
heroku pg:psql
>> \i sample.psql
heroku restart
heroku open

Postgres alter column problems and solutions


A common situation for database-backed applications is the need to change the attributes of a column. One can change the data type, or more commonly, only the size limitation, e.g. VARCHAR(32) gets changed to VARCHAR(42). There are a few ways to accomplish this in PostgreSQL, from a straightforward ALTER COLUMN, to replacing VARCHAR with TEXT (plus a table constraint), to some advanced system catalog hacking.

The most common example of such a change is expanding a VARCHAR declaration to allow more characters. For example, your "checksum" column was based on MD5 (at 32 characters), and now needs to be based on Keccak (pronounced "catch-ack"), at 64 characters. In other words, you need a column in your table to change from VARCHAR(32) to VARCHAR(64). The canonical approach is to do this:

ALTER TABLE foobar ALTER COLUMN checksum TYPE VARCHAR(64);

This approach works fine, but it has two huge and interrelated problems: locking and time. This approach locks the table for as long as the command takes to run. And by lock, we are talking about a heavy ACCESS EXCLUSIVE lock that shuts everything else out of the table. If your table is small, this is not an issue. If your table has a lot of data, however, this brings us to the second issue: table rewrite. The above command will cause Postgres to rewrite every single row of the table, which can be a very expensive operation (both in terms of disk I/O and wall clock time). So, a simple ALTER COLUMN solution usually comes at a very high cost for large tables. Luckily, there are workarounds for this problem.

First, some good news: as of version 9.2, there are many operations that will no longer require a full table rewrite. Going from VARCHAR(32) to VARCHAR(64) is one of those operations! Thus, if you are lucky enough to be using version 9.2 or higher of Postgres, you can simply run the ALTER TABLE and have it return almost instantly. From the release notes:

Reduce need to rebuild tables and indexes for certain ALTER TABLE ... ALTER COLUMN TYPE operations (Noah Misch)

Increasing the length limit for a varchar or varbit column, or removing the limit altogether, no longer requires a table rewrite. Similarly, increasing the allowable precision of a numeric column, or changing a column from constrained numeric to unconstrained numeric, no longer requires a table rewrite. Table rewrites are also avoided in similar cases involving the interval, timestamp, and timestamptz types.

However, if you are not yet on version 9.2, or are making an operation not covered above (such as shrinking the size limit of a VARCHAR), your only option to avoid a full table rewrite is the system catalog change below. However, before you jump down there, consider a different option: abandoning VARCHAR altogether.

In the Postgres world, there are few differences between the VARCHAR and TEXT data types. The latter can be thought of as an unbounded VARCHAR, or if you like, a VARCHAR(999999999999). You may also add a check constraint to a table to emulate the limit of a VARCHAR. For example, to convert a VARCHAR(32) column named "checksum" to a TEXT column:

ALTER TABLE foobar ALTER COLUMN checksum TYPE text;
ALTER TABLE foobar ADD CONSTRAINT checksum_length
  CHECK (LENGTH(checksum) <= 32);

The data type change suffers from the same full table rewrite problem as before, but if you are using version 9.1 or newer of Postgres, the change from VARCHAR to TEXT does not do a table rewrite. The creation of the check constraint, however, will scan all of the existing table rows to make sure they meet the condition. While not as costly as a full table rewrite, scanning every single row in a large table will still be expensive. Luckily, version 9.2 of Postgres comes to the rescue again with the addition of the NOT VALID phrase to the check constraint clause. Thus, in newer versions you can avoid the scan entirely by writing:

ALTER TABLE foobar ADD CONSTRAINT checksum_length
  CHECK (LENGTH(checksum) <= 32) NOT VALID;

This is a one-time exception for the constraint, and only applies as the constraint is being created. In other words, despite the name, the constraint is very much valid after it is created. If you want to validate all the rows that you skipped at a later time, you can use the ALTER TABLE .. VALIDATE CONSTRAINT command. This has the double advantage of allowing the check to be delayed until a better time, and taking a much lighter lock on the table than the ALTER TABLE .. ADD CONSTRAINT does.
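For reference, the later validation is a single statement; a sketch, assuming the constraint name used above:

ALTER TABLE foobar VALIDATE CONSTRAINT checksum_length;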

So why would you go through the trouble of switching from your VARCHAR(32) to a TEXT column with a CHECK constraint? There are at least three good reasons.

First, if you are running Postgres 9.2 or better, this means you can change the constraint requirements on the fly, without a table scan - even for the 'non-optimal' situations such as going from 64 characters down to 32. Just drop the old constraint, and add a new one with the NOT VALID clause thrown on it.

Second, the check constraint gives a better error message, and a clearer indication that the limitation was constructed with some thought behind it. Compare these messages:

postgres=# CREATE TABLE river( checksum VARCHAR(4) );
CREATE TABLE

postgres=# INSERT INTO river VALUES ('abcde');
ERROR:  value too long for type character varying(4)

postgres=# CREATE TABLE river( checksum TEXT,
postgres-#   CONSTRAINT checksum_length CHECK (LENGTH(checksum) <= 4) );
CREATE TABLE

postgres=# INSERT INTO river VALUES ('abcde');
ERROR:  new row for relation "river" violates check constraint "checksum_length"
DETAIL:  Failing row contains (abcde).

Third, and most important, you are no longer limited to a single column attribute (maximum length). You can use the constraint to check for many other things as well: minimum size, actual content, regex matching, you name it. As a good example, if we are truly storing checksums, we probably want the hexadecimal Keccak checksums to be *exactly* 64 characters, and not just a maximum length of 64 characters. So, to illustrate the above point about switching constraints on the fly, you could change the VARCHAR(32) to a TEXT and enforce a strict 64-character limit with:

BEGIN;

ALTER TABLE foobar DROP CONSTRAINT checksum_length;

ALTER TABLE foobar ADD CONSTRAINT checksum_length
  CHECK (LENGTH(checksum) = 64) NOT VALID;

COMMIT;

We just introduced a minimum *and* a maximum, something the old VARCHAR could not do. We can constrain it further, as we should only be allowing hexadecimal characters to be stored. Thus, we can also reject any characters other than 0123456789abcdef from being added:

BEGIN;

ALTER TABLE foobar DROP CONSTRAINT checksum_length;

ALTER TABLE foobar ADD CONSTRAINT checksum_valid
  CHECK ( LENGTH(checksum) = 64 AND checksum ~ '^[a-f0-9]*$' ) NOT VALID;

COMMIT;

Since we have already added a regex check, we can reduce the size of the CHECK with a small hit in clarity like so:

BEGIN;

ALTER TABLE foobar DROP CONSTRAINT checksum_length;

ALTER TABLE foobar ADD CONSTRAINT checksum_valid
  CHECK ( checksum ~ '^[a-f0-9]{64}$' ) NOT VALID;

COMMIT;

Back to the other problem, however: how can we avoid a table rewrite when going from VARCHAR(64) to VARCHAR(32), or when stuck on an older version of Postgres that always insists on a table rewrite? The answer is the system catalogs. Please note that any updating to the system catalogs should be done very, very carefully. This is one of the few types of update I will publicly mention and condone. Do not apply this lesson to any other system table or column, as there may be serious unintended consequences.

So, what does it mean to have VARCHAR(32) vs. VARCHAR(64)? As it turns out, there is no difference in the way the actual table data is written. The length limit of a VARCHAR is simply an implicit check constraint, after all, and as such, it is quite easy to change.

Let's create a table and look at some of the important fields in the system table pg_attribute. In these examples we will use Postgres 8.4, but other versions should look very similar - this part of the system catalog rarely changes.

postgres=# CREATE TABLE foobar ( checksum VARCHAR(32) );
CREATE TABLE

postgres=# \x
Expanded display is on.

postgres=# SELECT attname, atttypid::regtype, atttypmod FROM pg_attribute 
postgres=#  WHERE attrelid = 'foobar'::regclass AND attname = 'checksum';
-[ RECORD 1 ]----------------
attname   | checksum
atttypid  | character varying
atttypmod | 36

The important column is atttypmod. It indicates the legal length of this varchar column (whose full legal name is 'character varying', but everyone calls it varchar). In the case of Postgres, atttypmod stores the declared length plus 4 bytes of overhead, so VARCHAR(32) shows up as 36 in the atttypmod column. Thus, if we want to change it to a VARCHAR(64), we add 4 to 64 and get 68. Before we make this change, however, we need to make sure that nothing else will be affected. There are other dependencies to consider, such as views and foreign keys, that you need to keep in mind before making this change. What you should do is carefully check all the dependencies this table has:

postgres=# SELECT c.relname||':'||objid AS dependency, deptype
postgres-#   FROM pg_depend d JOIN pg_class c ON (c.oid=d.classid)
postgres-#   WHERE refobjid = 'foobar'::regclass;
  dependency   | deptype 
---------------+---------
 pg_type:16419 | i

We can see in the above that the only dependency is an entry in the pg_type table - which is a normal thing for all tables and will not cause any issues. Any other entries, however, should give you pause before doing a manual update of pg_attribute. You can use the information returned by the first column of the above query to see exactly what is referencing the table. For example, let's make that column unique, as well as adding a view that uses the table, and then see the effects on the pg_depend table:

postgres=# CREATE UNIQUE INDEX jack ON foobar(checksum);
CREATE INDEX

postgres=# CREATE VIEW martha AS SELECT * FROM foobar;
CREATE VIEW

postgres=# SELECT c.relname||':'||objid AS dependency, deptype
postgres-#   FROM pg_depend d JOIN pg_class c ON (c.oid=d.classid)
postgres-#   WHERE refobjid = 'foobar'::regclass;
   dependency     | deptype 
------------------+---------
 pg_type:16419    | i
 pg_class:16420   | a
 pg_rewrite:16424 | n

The 'i', 'a', and 'n' stand for internal, auto, and normal. They are not too important in this context, but more details can be found in the docs on the pg_depend table. The first column shows us the system table and oid of the dependency, so we can look them up and see what they are:

postgres=# SELECT typname FROM pg_type WHERE oid = 16419;
 typname 
---------
 foobar

postgres=# SELECT relname, relkind FROM pg_class WHERE oid = 16420;
 relname | relkind 
---------+---------
 jack    | i

-- Views require a little redirection as they are implemented via the rules system
postgres=# SELECT relname,relkind FROM pg_class WHERE oid = 
postgres-#   (SELECT ev_class FROM pg_rewrite WHERE oid = 16424);
 relname | relkind 
---------+---------
 martha  | v

postgres=# \d martha
              View "public.martha"
  Column  |         Type          | Modifiers 
----------+-----------------------+-----------
 checksum | character varying(32) | 
View definition:
 SELECT foobar.checksum
   FROM foobar;

So what does all that tell us? It tells us we should look carefully at the index and the view to make sure they will not be affected by the change. In this case, a simple index on the column will not be affected by changing the length, so it (along with the pg_type entry) can be ignored. The view, however, should be recreated so that it records the actual column size.

We are now ready to make the actual change. This would be an excellent time to make a backup of your database. This procedure should be done very carefully - if you are unsure about any of the entries in pg_depend, do not proceed.

First, we are going to start a transaction, lock the table, and drop the view. Then we are going to change the length of the varchar directly, recreate the view, and commit! Here we go:

postgres=# SELECT c.relname||':'||objid AS dependency, deptype
postgres-#   FROM pg_depend d JOIN pg_class c ON (c.oid=d.classid)
postgres-#   WHERE refobjid = 'foobar'::regclass;
   dependency     | deptype 
------------------+---------
 pg_type:16419    | i
 pg_class:16420   | a
 pg_rewrite:16424 | n

postgres=# \d foobar
            Table "public.foobar"
  Column  |         Type          | Modifiers 
----------+-----------------------+-----------
 checksum | character varying(32) | 
Indexes:
    "jack" UNIQUE, btree (checksum)

postgres=# \d martha
            View "public.martha"
  Column  |         Type          | Modifiers 
----------+-----------------------+-----------
 checksum | character varying(32) | 
View definition:
 SELECT foobar.checksum
   FROM foobar;
postgres=# BEGIN;
BEGIN

postgres=# DROP VIEW martha;
DROP VIEW

postgres=# LOCK TABLE pg_attribute IN EXCLUSIVE MODE;
LOCK TABLE

postgres=# UPDATE pg_attribute SET atttypmod = 68
postgres-#   WHERE attrelid = 'foobar'::regclass AND attname = 'checksum';
UPDATE 1

postgres=# COMMIT;
COMMIT

Verify the changes and check out the pg_depend entries:

postgres=# \d foobar
            Table "public.foobar"
  Column  |         Type          | Modifiers 
----------+-----------------------+-----------
 checksum | character varying(64) | 
Indexes:
    "jack" UNIQUE, btree (checksum)

postgres=# CREATE VIEW martha AS SELECT * FROM foobar;
CREATE VIEW

postgres=# \d martha
            View "public.martha"
  Column  |         Type          | Modifiers 
----------+-----------------------+-----------
 checksum | character varying(64) | 
View definition:
 SELECT foobar.checksum
   FROM foobar;

postgres=# SELECT c.relname||':'||objid AS dependency, deptype
postgres-#   FROM pg_depend d JOIN pg_class c ON (c.oid=d.classid)
postgres-#   WHERE refobjid = 'foobar'::regclass;
    dependency    | deptype 
------------------+---------
 pg_type:16419    | i
 pg_class:16420   | a
 pg_rewrite:16428 | n

Success. Both the table and the view are showing the new VARCHAR size, but the data in the table was not rewritten. Note how the final row returned by the pg_depend query changed: we dropped the view and created a new one, resulting in a new row in both pg_class and pg_rewrite, and thus a new OID shown in the pg_rewrite table.

Hopefully this is not something you ever have to perform. The new features of 9.1 and 9.2 that prevent table rewrites and table scanning should go a long way towards that.

Using cec-client to Control HDMI Devices

Maintaining the horde of computers it takes to run Liquid Galaxies in all corners of the globe is a big job. As of November 2012, we're monitoring 154 computers at permanent installations, in addition to keeping our development and testing systems running like the well-oiled machines we want them to be. All that fluff aside, end users never see the fruits of our labor unless the TVs are working as expected! Without methods for getting and setting the status of displays, we are completely blind to what people are actually experiencing in front of a Liquid Galaxy.

Enter HDMI-CEC. CEC is a protocol that allows HDMI-connected devices to control each other in various ways. It has a set of standard features that make it easy for home users with a stack of DVD players or TiVos or other devices to change the active source, put everything on standby, control the volume, and some other handy tricks.

We typically use Samsung TVs, which support CEC under the trade name "Anynet+". To interface between computers and TVs, we use Pulse-Eight's USB-CEC adapters which, in conjunction with libCEC, give us a command-line interface for sending arbitrary commands to the TV.

libCEC is available on apt for Ubuntu users:

$ sudo apt-get install cec-utils

Once installed, we have access to all kinds of fun commands. CEC-O-Matic is a great reference, and will even build command strings for you! Just bear in mind that CEC-O-Matic output has colons separating the octets, while cec-client's "tx" expects spaces.

The syntax of a libCEC "tx" command is like this:

$ echo 'tx <src-dst> <cmd> <args...>' | cec-client <device>

The first octet of the command will be the source and destination. The P8 CEC adapter uses device 1 by default, which is "Recording 1", and a TV is always 0, so when querying the TV our first octet will be "10".

The second octet is a command code. Let's say we want to know what language the TV's menu is set to. On CEC-O-Matic you can find this in the "Supporting Features" tab, "System Information" section, "Get Menu Language," which indicates that the message ID is "91".

Arguments are situational, and many commands will not require any arguments. We'll talk about arguments soon.

The device can be found with the ever-handy:

$ cec-client -l
Found devices: 1

device:              1
com port:            /dev/ttyACM0
firmware version:    2
firmware build date: Thu Aug  2 09:40:28 2012 +0000
type:                Pulse-Eight USB-CEC Adapter

We want to use that COM port, "/dev/ttyACM0".

For example, let's query the menu language of the TV connected to the first CEC adapter.

$ echo 'tx 10 91' | cec-client /dev/ttyACM0
[ . . . ]
DEBUG:   [             492] << requesting power status of 'TV' (0)
TRAFFIC: [             492] << 10:8f
TRAFFIC: [             638] >> 01:90:00
DEBUG:   [             638] >> TV (0) -> Recorder 1 (1): report power status (90)
DEBUG:   [             638] TV (0): power status changed from 'unknown' to 'on'
DEBUG:   [             638] expected response received (90: report power status)
waiting for input
TRAFFIC: [             639] << 10:91
TRAFFIC: [             842] >> 0f:32:65:6e:67
DEBUG:   [             842] >> TV (0) -> Broadcast (F): set menu language (32)
DEBUG:   [             842] TV (0): menu language set to 'eng'

Here you see that, after a bunch of other cruft is resolved, the adapter requests the power status of the TV (10:8f) and the TV reports that it is 'on' (01:90:00). This seems to be the first action of any query. Now, since we received an expected response, it sends the menu language query (10:91). In response the TV sends "0f:32:65:6e:67". What does this mean?

The first octet "0f" means the source is the TV (0) and the destination is "broadcast to all devices in the signal path" (f). The second octet, "32", is the command code for "Set Menu Language". The next three octets "65:6e:67" are ASCII code for "eng" which is the ISO 639-2 Code for "English." So, instead of just telling us what the menu language is, the TV is responding by setting all devices to its language.

In the next two DEBUG lines it reports what I just explained.

At this point, cec-client is idling. Use Ctrl-C to end the process gracefully.

What if we want to change the power setting of the TV? In this case, libCEC has built-in commands to make it a little more intuitive.

$ echo 'standby 0' | cec-client -s /dev/ttyACM0
$ echo 'on 0' | cec-client -s /dev/ttyACM0

The first command will put the TV on standby, the second will turn it on. In this case, we don't need to specify the command source, only the destination of TV (0). Also notice that we used the -s argument to cec-client and it exited as soon as it sent the command. -s is short for --single-command, which sends a command and then exits. We didn't use -s in the above menu language query because it causes cec-client to exit before we get the response back from the TV! For automated cec-client commands, such as automatic screen sleeping at night, -s is quite useful since the process doesn't 'hang' after execution.
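Relatedly, if you just want to ask the TV for its power state from a script, cec-client has a built-in query for that as well; a sketch, assuming the adapter is still at /dev/ttyACM0 and that your build of cec-client includes the pow command:

$ echo 'pow 0' | cec-client -s /dev/ttyACM0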

These are the basics of cec-client. For more information on the app itself, you can consult the cec-client manpage or visit the libcec project on GitHub.

Happy hacking!

Getting Started with the Perl Debugger

The Perl debugger is not an easy system to leap into unprepared, especially if you learned to program in the "modern era", with fancy, helpful GUIs and other such things.

So, for those of us who are old school, and those who aren't but wondering what the fuss is about, here's a very gentle introduction to debugging Perl code interactively.

First, an aside. You may think to yourself, "Hey, command-line scripting is passé; nobody does that any more. All my code runs within a website (maybe a modern MVC framework), so how can I make use of a command-line debugger?"

Well, that's where test-driven development and its related methodologies come in. If you have developed using a Perl test framework, you can use the approach outlined here.
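For example, you can run a single test file under the debugger instead of a standalone script; a sketch, assuming a conventional t/ directory and a hypothetical test file name:

$ perl -d -Ilib t/basic.t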

The debugger is invoked by using the "-d" switch on the command line. You control the execution of your code with the "n" command:

$ perl -d debug.pl

Loading DB routines from perl5db.pl version 1.33
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(debug.pl:1):  my $x = 1;
  DB<1> n
main::(debug.pl:2):  $x = $x + 1;

"n" steps you through the code from line to line. To step into a subroutine call, use "s", e.g.,

$ perl -d debug.pl
...
main::(debug.pl:1):  my $x = fx(1);
  DB<1> s
main::fx(debug.pl:4):  return;

You can switch back and forth between step-over ("n") and step-into ("s") mode; just issue the command you want to use.

Next, let's talk about breakpoints. These are places in the code where you'd like the execution to stop so you can look around and take stock.

You can issue a "temporary" breakpoint with the "c" command:

main::(debug.pl:1): ...
  DB<1> c 23
main::(debug.pl:23): ...

You can set a permanent breakpoint with the "b" command:

main::(debug.pl:1): ...
  DB<1> b 23
  DB<2> c
main::(debug.pl:23): ...
  DB<2> c
main::(debug.pl:23): ...

And note how the "c" command is used to mean "run until breakpoint (or exit)".

If all you could do with the Perl debugger was step through your program, that would be enough. (You could use "print" statements to see what was going on, but it would be awkward to go back and forth between the debugger and your editor.) Of course, we can do more:

main::(debug.pl:23):  $x = get_complex_data_structure($arg);
  DB<1> x $x
0  HASH(0x1e123c8)
   'a' => 1
   'b' => 2
  DB<2> x [1, 2, { b => sin(0.5) } ]
0  ARRAY(0x1e8e7c0)
   0  1
   1  2
   2  HASH(0x19ff298)
      'b' => 0.479425538604203

You can even evaluate complex expressions on the fly! Or invoke your code directly:

  DB<1> x get_complex_data_structure($arg)
0  HASH(0x1e123c8)
   'a' => 1
   'b' => 2

Or set a breakpoint within your code, then invoke it:

  DB<1> b My::Pkg::_routine
  DB<2> x get_complex_data_structure($arg)
My::Pkg::_routine(My/Pkg.pm:99):  ...

I hope this brief introduction whetted your appetite for the debugger. It's a powerful system for exploring unfamiliar code, verifying assumptions about data structures, or tracking down bugs. There are many more debugger commands than I've outlined here, but this should get you started.

Happy debugging!

cPanel no-pty ssh noise removal

We commonly use non-interactive ssh to automate various tasks. This usually involves setting BatchMode=yes in the ~/.ssh/config file or the no-pty option in the ~/.ssh/authorized_keys file, which stops a tty from being assigned for the ssh session so that a job will not wait for interactive input in unexpected places.

When using a RHEL 5 Linux server that has been modified by cPanel, ssh sessions display “stdin: is not a tty” on stderr. For ad-hoc tasks this is merely an annoyance, but for jobs run from cron it means an email is sent because cron didn’t see an empty result from the job and wants an administrator to review the output.

You could quell all output from ssh, but then if any legitimate errors or warnings were sent, you won’t see those. So that is not ideal.

Using bash’s set -v option to trace commands being run on the cPanel server we found that they had modified Red Hat’s stock /etc/bashrc file and added this line:

mesg y

That writes a warning to stderr when there’s no tty because mesg doesn’t make sense in non-interactive environments.

The solution is simple, since we don’t care to hear that warning. We edit that line like this:

mesg y 2>/dev/null
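An alternative (a sketch of a common shell idiom, not what cPanel ships) is to only run mesg when a terminal is actually attached:

tty -s && mesg y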

This is a tip that may only be useful to one or two people ever, if even that many. I hope they enjoy it. :)

Piggybak: Roadmap Status Update

About a month ago, I shared an outline of the current Piggybak Roadmap. Piggybak is an open-source Ruby on Rails ecommerce platform created and maintained by End Point. It is developed as a Rails Engine and is intended to be mounted on an existing Rails application. Over the last month, Tim and I have been busy at work building out features in Piggybak, and completing refactoring that opens the door for better extension and feature development. Here's a summary of the changes that we've finished up.

Real-time Shipping Lookup

One of our Piggybak clients had already integrated USPS and UPS shipping, but we decided to extract this and combine it with FedEx support to offer real-time shipping lookup in Piggybak. This extension leverages Shopify's open-source ActiveShipping Ruby gem. When you are ready to get your Piggybak store up and running, you can include this new extension and configure USPS, UPS, and FedEx real-time shipping lookup immediately.

Installer process

Tim updated the installation process to be more streamlined. The previous installation process was a bit crufty and required changes to your Gemfile, routes, layouts, and precompiled assets. Tim described the installation work in this article.

Rename Variant to Sellable

A minor but notable change that happened in the last month was the renaming of "variant" to "sellable". Any model in a Rails application can now be extended with the class method acts_as_sellable, which allows instances of that model to be managed and purchased as sellable items.
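A minimal sketch of what that looks like in an application model (the model name here is illustrative; check the Piggybak documentation for the options acts_as_sellable accepts):

# app/models/book.rb -- hypothetical sellable model
class Book < ActiveRecord::Base
  acts_as_sellable
end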

Variants Extension

Tied directly to the variant to sellable change, we developed a new extension to provide advanced variant support in Piggybak. The advanced variant data model has similarities to Spree's data model, one that we have observed as a successful feature of Spree. The basic principles are that you assign specific options to sellable items (e.g. size and color), and then you assign option values to those options (e.g. large and small for size, red and blue for color). Then, for each sellable item, you can define many variants, each with a different combination of option values and each with its own SKU, quantity on hand, cart description, and price. The user sees these options on the product detail page and selects option values to add items to the cart.


Advanced product optioning support in Piggybak: In this screenshot, options for frame size and frame finish are provided.
Each variant has individual pricing, quantity on hand, and a description in the cart.

Line Item Rearchitecture

I also spent a good amount of time rearchitecting line item associations to orders, where a line item now represents all monetary items in an order (sellable, payment, tax item, shipment, etc.). This results in a more simplified order total and balance due calculation, as well as allows for extensions to introduce custom line items that are included in order calculations without order processing code changes. This significant change is described in this article.

Piggybak Coupons

The line item rearchitecture work was done in tandem with development of a Piggybak coupon extension. The extension includes support for defining the discount type (percent, dollar, or free shipping), discount amount (for percent and dollar), minimum cart total, expiration date, and allowed number of uses.


Coupon support in Piggybak: Coupon application on the checkout happens via AJAX and
is displayed in the order totals calculations shown in the screenshot.

Gift Certificate Support

Finally, one of the recent extensions completed was development of a gift certificate extension. A gift certificate can be purchased at various increments and applied on the checkout page to an order via an AJAX call. Gift certificates may also be purchased and redeemed in the Piggybak admin.


Gift Certificate support in Piggybak: Gift certificate application on the checkout happens via AJAX and is displayed in the order totals calculations shown in the screenshot. In this case, the gift certificate covers the entire order.

Minor Bug Fixes, Refactoring and Feature Development

Several bug fixes and minor refactorings were applied during development of these features, including but not limited to:

  • attr_accessible updates to support Rails 3 mass assignment attributes
  • Improved inventory management on the admin side
  • Minor refactoring to introduce Proxy Association extensions
  • Removal of jeweler, and a move to the standard Rails engine architecture
  • Support for copying a billing address to the shipping address in the admin
  • Logic to enforce that only one payment method is added at a time via the admin
  • JavaScript-based validation on the checkout
  • Stripe payment gateway support via an extension

What's Next?

If we take a look at the roadmap list from a month ago, we can cross several items off the list:

  • Realtime Shipping with USPS, UPS, and Fedex support
  • Improvement of Piggybak installation process
  • Advanced Product Optioning Support
  • Line Item Rearchitecture to support future work on Gift Certificates, Discounts
  • Gift Certificate, Discount Support
  • Advanced Taxonomy
  • Reviews & Ratings

A few new things have recently been added to the list:

  • Add SSL support in core
  • Create Heroku deployment tutorial
  • Saved cart, Wishlist support
  • Saved address support

Our goal for the immediate future is to focus on development of the most common ecommerce features.

All of the features described in this article are active on the Piggybak demo. Check it out now! And of course, contact End Point today if you'd like to get started on a Ruby on Rails Piggybak project!

How to Make PostgreSQL Query Slow

Some applications can be very vulnerable to long-running queries. When you test an application, it is sometimes useful to have a query that runs for, let's say, 10 minutes. What's more, it should be a normal query, so the application gets its normal results; it should just run for much longer than usual.

PostgreSQL has quite a nice function, pg_sleep, which takes exactly one parameter: the number of seconds this function will wait before returning. You can use it as a normal PostgreSQL function, although on its own that's not very useful:

# SELECT pg_sleep(10);

 pg_sleep 
----------
 
(1 row)

Time: 10072.794 ms

The most interesting usage is adding this function into a query. Let's take this query:

# SELECT schemaname, tablename 
  FROM pg_tables 
  WHERE schemaname <> 'pg_catalog';

Time: 0.985 ms

As you can see, this query is quite fast and returns data in less than 1 ms. Let's now make this query much slower, returning exactly the same data, but only after 15 seconds:

# SELECT schemaname, tablename 
  FROM pg_tables, pg_sleep(15) 
  WHERE schemaname <> 'pg_catalog';

Time: 15002.084 ms

In fact the query execution time is a little longer than 15 seconds: the pg_sleep function waited 15 seconds, but PostgreSQL also had to spend some time on query parsing, execution, and returning the proper data.

I have used this solution many times to simulate a long-running query, without changing the application logic, to check how the application behaves during load peaks.
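A variation I find handy (my own sketch, not part of the original technique above) is to put pg_sleep in the select list instead of the FROM clause; since it is a volatile function, it is evaluated once per output row, so the total delay scales with the number of rows returned:

# SELECT schemaname, tablename, pg_sleep(1)
  FROM pg_tables
  WHERE schemaname <> 'pg_catalog';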

How to Build a Command Line Executable Installer with Rubygems and Thor

Gems for Rails often need the user to do something more for installation than just adding the gem to a Gemfile and running bundle install. Sometimes it's a simple matter of copying over some migration files, and sometimes it's just setting up a config file; most of the time these simple installation steps are best handled with a well-written installation section in the README file. When the installation process is more complex, though, a long README might not be so enticing to the potential gem user. In a world where everyone has a finger on the back button, it's nice to be able to create an installer that lets the user complete complex installation tasks with a one-liner, and that's where an installer built from gem executables and Thor comes in handy.

We wanted to make it easier for new users of Piggybak to get started and decided that an installer was the best way to do that. Creating a command-line executable that is installed by Rubygems is one of those esoteric things that may not be thought of as one of the core strengths of Rubygems and Rails, but it's a bonus to be able to do something like this without a whole lot of fuss.

Creating an installer with Rubygems and Thor:

  1. In your Rails app, create a file in your lib directory defining a class that inherits from Thor; this file will house all of your command-line actions. Thor is already included as a part of Rails, so you don't need to add it to your Gemfile.
  2. Inside your Thor subclass, define methods, which in turn become invokable actions from the command line. Installers usually need to copy files around and execute commands; Thor provides a library that covers the most common cases, which can be added to your class by including Thor::Actions (see the list of included actions). Have a look at the Piggybak installer class, and you'll see that the Thor actions are not too complicated to understand. (A minimal illustrative sketch follows this list.)
  3. Create a bin directory in your Rails directory, and add a file with the name of your executable that starts your Thor class (details below).
  4. Add an "executables" entry for the file in your bin directory to your gemspec file.
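
To make step 2 concrete, here is a minimal sketch of such a class (the file path, template name, and method body are illustrative, not the actual Piggybak installer):

# lib/piggybak/cli.rb -- hypothetical minimal installer class
require 'thor'

module Piggybak
  class CLI < Thor
    include Thor::Actions

    # Thor::Actions resolves relative source paths against source_root
    def self.source_root
      File.expand_path('../templates', __FILE__)
    end

    desc "install", "Install Piggybak into the current Rails application"
    def install
      # copy_file and say are provided by Thor::Actions and Thor::Shell
      copy_file "piggybak.rb", "config/initializers/piggybak.rb"
      say "Piggybak installed", :green
    end
  end
end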

Add a file to your bin folder that starts Thor

The code below shows the file located in the bin directory; it could act as a template for your own executable. The things to note are the ruby shebang line and the require of the piggybak cli class. Finally, at the end of the file, the start method is called on the Thor class.

#!/usr/bin/env ruby

require 'rubygems'

begin
  require 'piggybak/cli'
rescue LoadError => e
  warn 'Could not load "piggybak/cli"'
  exit -1
end

Piggybak::CLI.start

Add an entry to your gemspec for the executable

Rubygems expects the executable to be in a directory called bin, which is in the same directory as the gemspec; if you want to place the executable in a different location, you'll need to specify that in your gemspec with a "bindir" entry. (Check the Rubygems docs for a more detailed explanation.)

spec.executables << 'piggybak'

Once that's done, your gem is ready to go and can be included in the Gemfile of a Rails app. When the gem is installed, Rubygems will place a file in your Ruby bin directory that can be invoked from the command line:

$ piggybak install