Association Extensions in Rails for Piggybak

I recently had a problem with Rails named scopes while working on minor refactoring in Piggybak, an open source Ruby on Rails ecommerce platform that End Point created and maintains. The problem: named scopes were not returning uncommitted (new) records. Named scopes allow you to specify ActiveRecord query conditions and can be combined with joins and includes to query associated data. For example, based on the recent line item rearchitecture, I wanted order.line_items.sellables, order.line_items.taxes, and order.line_items.shipments to return all line items whose line_item_type was sellable, tax, or shipment, respectively. With named scopes, this might look like:

class Piggybak::LineItem < ActiveRecord::Base
  scope :sellables, where(:line_item_type => "sellable")
  scope :taxes, where(:line_item_type => "tax")
  scope :shipments, where(:line_item_type => "shipment")
  scope :payments, where(:line_item_type => "payment")
end

However, while processing an order, any uncommitted or new records would not be returned when using these named scopes. To work around this, I used the Enumerable select method to iterate over the line items, e.g.:

# Reviewing shipments in an order
order.line_items.select { |li| li.line_item_type == "shipment" }.all? { |s| s.shipment.status == "shipped" }

# Get number of new payments
order.line_items.select { |li| li.new_record? && li.line_item_type == "payment" }.size

Association Extensions

I felt that the above workaround was crufty and not very readable, so I sent out a request to my coworkers in hopes that there was a solution for improving the readability and clarity of the code. Kamil confirmed that named scopes do not return uncommitted records, and Tim offered an alternative by suggesting association extensions. An association extension allows you to add new finders, creators, or other methods that are only used as part of the association. After some investigation, I settled on the following code to extend the line_items association:

class Piggybak::Order < ActiveRecord::Base
  has_many :line_items do
    def sellables
      proxy_association.proxy.select { |li| li.line_item_type == "sellable" }
    end
    def taxes
      proxy_association.proxy.select { |li| li.line_item_type == "tax" }
    end
    def shipments
      proxy_association.proxy.select { |li| li.line_item_type == "shipment" }
    end
    def payments
      proxy_association.proxy.select { |li| li.line_item_type == "payment" }
    end
  end
end

The above code allows us to call order.line_items.sellables, order.line_items.taxes, order.line_items.shipments, and order.line_items.payments, which return all new and existing line item records. These custom finder methods are used during order preprocessing, which occurs in an ActiveRecord before_save callback before an order is finalized.
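To show where these finders fit, here is a minimal sketch of such a preprocessing hook; the callback name and the calculations are hypothetical illustrations, not Piggybak's actual implementation:

class Piggybak::Order < ActiveRecord::Base
  before_save :preprocess_line_items

  def preprocess_line_items
    # Unlike the named scopes, these finders also see new (unsaved) records.
    self.subtotal = line_items.sellables.sum(&:price)

    line_items.shipments.each do |line_item|
      # e.g. recalculate shipping costs or verify shipment status here
    end
  end
end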

Dynamic Creation

Of course, the Piggybak code takes this a step further because additional custom line item types can be added to the code via Piggybak extensions (e.g. coupons, gift certificates, adjustments). To address this, association extensions are created dynamically in the Piggybak engine instantiation:

Piggybak::Order.class_eval do
  has_many :line_items do
    Piggybak.config.line_item_types.each do |k, v|
      # k is sellable, tax, shipment, payment, etc.
      define_method "#{k.to_s.pluralize}" do
        proxy_association.proxy.select { |li| li.line_item_type == "#{k}" }
      end
    end
  end
end

Conclusion

The disadvantage of association extensions versus named scopes is that association extensions are not chainable: you cannot chain additional scopes onto the result. For example, a named scope would allow you to query order.line_items.sellables.price_greater_than_50 to return committed line items with a price greater than 50, but this is not possible with association extensions. This is not a limitation in the current code base, but it may become one in the future.
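For a concrete contrast, here is a hedged sketch; price_greater_than_50 is an invented scope used only to illustrate chaining, not part of Piggybak:

class Piggybak::LineItem < ActiveRecord::Base
  scope :sellables, where(:line_item_type => "sellable")
  scope :price_greater_than_50, where("price > 50")
end

# Named scopes chain into a single query (committed records only):
order.line_items.sellables.price_greater_than_50

# The association extension version returns a plain Array, so no further
# scopes can be chained onto it:
order.line_items.sellables.class # => Array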

PostgreSQL auto_explain Module

PostgreSQL has many nice additional modules, usually hidden and not enabled by default. One of them is auto_explain, which can be very helpful for reviewing bad query plans. It automatically logs query plans according to the module's configuration.

This module is very useful for testing. Due to some ORM features, it is hard to repeat exactly the same queries with exactly the same parameters that an ORM generates. Even without an ORM, many applications make a lot of different queries depending on input data, and it can be painful to repeat all the queries from the logs. It's much easier to run the app and let it perform all the queries normally. The only change would be adding a couple of queries right after the application connects to the database.
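For instance, if the application happens to be a Rails app, a hedged sketch of that setup might be an initializer like the one below; the file name and settings are only an illustration, and the auto_explain commands themselves are covered later in this post:

# config/initializers/auto_explain.rb -- development only, illustrative
if Rails.env.development?
  conn = ActiveRecord::Base.connection
  conn.execute("LOAD 'auto_explain'")
  conn.execute("SET auto_explain.log_min_duration = 0")
  conn.execute("SET auto_explain.log_analyze = true")
  # Note: this only affects the connection it runs on; with a connection
  # pool you may want to run these statements on connection checkout instead.
end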

To begin, let's see what my logs look like when I run a simple SELECT 1 query:

2012-10-24 14:55:09.937 CEST 5087e52d.22da 1 [unknown]@[unknown] LOG:  connection received: host=127.0.0.1 port=33004
2012-10-24 14:55:09.947 CEST 5087e52d.22da 2 szymon@szymon LOG:  connection authorized: user=szymon database=szymon
2012-10-24 14:55:10.860 CEST 5087e52d.22da 3 szymon@szymon LOG:  statement: SELECT 1;
2012-10-24 14:55:10.860 CEST 5087e52d.22da 4 szymon@szymon LOG:  duration: 0.314 ms

Your logs can look a little bit different depending on your settings. The settings I use for logging on my development machine are:

log_destination = 'stderr'
logging_collector = on
log_directory = '/var/log/postgresql/'
log_filename = 'postgresql-9.1-%Y-%m-%d_%H%M%S.log'
log_file_mode = 0666
log_rotation_age = 1d
log_rotation_size = 512MB
client_min_messages = notice
log_min_messages = notice
log_min_duration_statement = -1
log_connections = on
log_disconnections = on
log_duration = on
log_line_prefix = '%m %c %l %u@%d '
log_statement = 'all'

The main idea of the above logging configuration is to log every query before execution, so that when a query fails (e.g. because of an out of memory error), it is still logged. In that case the execution time won't be logged, but the query will.

Let's run the simple query SELECT 1/0; which should fail. Then the log entries look like:

2012-10-24 15:00:24.767 CEST 5087e52d.22da 5 szymon@szymon LOG:  statement: SELECT 1/0;
2012-10-24 15:00:24.823 CEST 5087e52d.22da 6 szymon@szymon ERROR:  division by zero
2012-10-24 15:00:24.823 CEST 5087e52d.22da 7 szymon@szymon STATEMENT:  SELECT 1/0;

Enabling auto_explain for the whole PostgreSQL installation is not the best idea; I always enable it only for my session using the following command:

LOAD 'auto_explain';

Now we have to configure the module a little. The main thing is to set the minimum statement execution time that triggers logging; let's set it to 0 so that all queries are explained:

SET auto_explain.log_min_duration = 0;

Now let's create a table for tests:

CREATE TABLE x(t text); 
INSERT INTO x(t) SELECT generate_series(1,10000);

The first query will be quite simple; let's just take the first ten rows.

SELECT * FROM x ORDER BY t LIMIT 10;
2012-10-24 16:21:34.102 CEST 5087f8f8.3fe6 16 szymon@szymon LOG:  statement: SELECT * FROM x ORDER BY t LIMIT 10;
2012-10-24 16:21:34.109 CEST 5087f8f8.3fe6 17 szymon@szymon LOG:  duration: 6.586 ms  plan:
 Query Text: SELECT * FROM x ORDER BY t LIMIT 10;
 Limit  (cost=361.10..361.12 rows=10 width=4)
   ->  Sort  (cost=361.10..386.10 rows=10000 width=4)
         Sort Key: t
         ->  Seq Scan on x  (cost=0.00..145.00 rows=10000 width=4)
2012-10-24 16:21:34.109 CEST 5087f8f8.3fe6 18 szymon@szymon LOG:  duration: 7.285 ms

Another thing we can do with the auto_explain module is use EXPLAIN ANALYZE. First, set:

SET auto_explain.log_analyze = true;

Now PostgreSQL adds the following lines to the logs:

2012-10-24 16:23:22.514 CEST 5087f8f8.3fe6 21 szymon@szymon LOG:  statement: SELECT * FROM x ORDER BY t LIMIT 10;
2012-10-24 16:23:22.522 CEST 5087f8f8.3fe6 22 szymon@szymon LOG:  duration: 8.248 ms  plan:
 Query Text: SELECT * FROM x ORDER BY t LIMIT 10;
 Limit  (cost=361.10..361.12 rows=10 width=4) (actual time=8.214..8.218 rows=10 loops=1)
   ->  Sort  (cost=361.10..386.10 rows=10000 width=4) (actual time=8.211..8.213 rows=10 loops=1)
         Sort Key: t
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Seq Scan on x  (cost=0.00..145.00 rows=10000 width=4) (actual time=0.032..2.663 rows=10000 loops=1)
2012-10-24 16:23:22.522 CEST 5087f8f8.3fe6 23 szymon@szymon LOG:  duration: 8.722 ms

There are other settings, but I didn't find them as useful as the above.

auto_explain is a great module for testing; however, it can be a little painful on production databases when you enable auto_explain.log_analyze, as every query will then be executed with per-node timing collected, regardless of the log_min_duration setting. So a query can be much slower than usual, even if its plan is never logged.

An Encouraging LinuxFest

A few weekends ago I gave a talk at Ohio LinuxFest: Yes, You Can Run Your Business On PostgreSQL. Next Question? (slides freshly posted). The talk isn't as technically oriented as the ones I usually give; it's more inspirational and encouraging. It seemed like a good and reasonable topic, centered around Postgres but applicable to open source in general, and it's something I'd been wanting to get out there for a while.

In a previous life I worked with Microsoft shops a bit more often. You know, companies that use Windows and related software pretty much exclusively. This talk was, more or less, a result of a number of conversations with those companies about open source software and why it's a valid option. I heard a number of arguments against, some reasonable, some pretty far out there, so it felt like it'd be a good thing to gather up all of those that I'd heard over time.

These days I don't interact with those companies so much, so I was a little worried at first that the landscape had changed enough that the talk wouldn't really be useful any more. But after talking with a few people around the conference a day or two before the talk, it was clear there are definitely still companies that don't see the value in open source technologies.

The slides are essentially a rough outline, but I tried to go back and add some of the spoken context. Anyway, enjoy, and hopefully it'll help you get the open source word out.

Postgres system triggers error: permission denied

This mystifying Postgres error popped up for one of my coworkers lately while using Ruby on Rails:

ERROR:  permission denied: "RI_ConstraintTrigger_16410" is a system trigger

On PostgreSQL version 9.2 and newer, the error may look like this:

ERROR:  permission denied: "RI_ConstraintTrigger_a_32778" is a system trigger

ERROR:  permission denied: "RI_ConstraintTrigger_c_32780" is a system trigger

I labelled this as mystifying because, while Postgres' error system is generally well designed and gives clear messages, this one stinks. A better one would be something similar to:

ERROR:  Cannot disable triggers on a table containing foreign keys unless superuser

As you can now guess, this error is caused by a non-superuser trying to disable triggers on a table that is used in a foreign key relationship, via the SQL command:

ALTER TABLE foobar DISABLE TRIGGER ALL;

Because Postgres enforces foreign keys through the use of triggers, and because data integrity is very important to Postgres, one must be a superuser to perform such an action and bypass the foreign keys. (A superuser is a Postgres role that has "do anything" privileges). We'll look at an example of this in action, and then discuss solutions and workarounds.

Note that if you are not a superuser *and* you are not the owner of the table, you will get a much better error message when you try to disable all the triggers:

ERROR:  must be owner of relation foobar

To reproduce the original error, we will create two tables, and then link them together via a foreign key:

postgres=# create user alice;
CREATE ROLE

postgres=# \c postgres alice
You are now connected to database "postgres" as user "alice".

-- Verify that we are not a superuser
postgres=> select usename, usesuper from pg_user where usename = (select current_user);
 usename | usesuper 
---------+----------
 alice   | f

postgres=> create table foo(a int unique);
NOTICE:  CREATE TABLE / UNIQUE will create implicit index "foo_a_key" for table "foo"
CREATE TABLE

postgres=> create table bar(b int);
CREATE TABLE

postgres=> alter table bar add constraint baz foreign key (b) references foo(a);
ALTER TABLE

Let's take a look at both tables, and then try to disable triggers on each one. Because the triggers enforcing the foreign key are internal, they will not show up when we do a \d:

postgres=> \d foo
      Table "public.foo"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
Indexes:
    "foo_a_key" UNIQUE CONSTRAINT, btree (a)
Referenced by:
    TABLE "bar" CONSTRAINT "baz" FOREIGN KEY (b) REFERENCES foo(a)

postgres=> \d bar
      Table "public.bar"
 Column |  Type   | Modifiers 
--------+---------+-----------
 b      | integer | 
Foreign-key constraints:
    "baz" FOREIGN KEY (b) REFERENCES foo(a)


postgres=> alter table foo disable trigger all;
ERROR:  permission denied: "RI_ConstraintTrigger_41047" is a system trigger

postgres=> alter table bar disable trigger all;
ERROR:  permission denied: "RI_ConstraintTrigger_41049" is a system trigger

If we try the same thing as a superuser, we have no problem:

postgres=# \c postgres postgres
You are now connected to database "postgres" as user "postgres".

postgres=# select usename, usesuper from pg_user where usename = (select current_user);
 usename  | usesuper 
----------+----------
 postgres | t

postgres=# alter table foo disable trigger all;
ALTER TABLE

postgres=# alter table bar disable trigger all;
ALTER TABLE

-- Don't forget to re-enable the triggers!

postgres=# alter table foo enable trigger all;
ALTER TABLE

postgres=# alter table bar enable trigger all;
ALTER TABLE

So, this error has happened to you - now what? Well, it depends on exactly what you are trying to do, and how much control over your environment you have. If you are using Ruby on Rails, for example, you may not be able to change anything except the running user. As you may imagine, this is the most obvious solution: become a superuser and run the command, as in the example above.

If you do have the ability to run as a superuser, however, it is usually much easier to adjust session_replication_role. In short, this disables *all* triggers and rules, on all tables, until it is switched back again. Do NOT forget to switch it back! Usage looks like this:

postgres=# \c postgres postgres
You are now connected to database "postgres" as user "postgres".

postgres=# set session_replication_role to replica;
SET

-- Do what you need to do - triggers and rules will not fire!

postgres=# set session_replication_role to default;
SET

Note: while you can use "SET LOCAL" to limit the change to the current transaction, I always feel safer explicitly setting it before and after the changes, rather than relying on the implicit reset at commit or rollback.

It may be that you are simply trying to disable one or more of the "normal" triggers that appear on the table, in which case you can disable the user triggers individually rather than using 'all':

postgres=# \c postgres alice
You are now connected to database "postgres" as user "alice".

postgres=> \d bar
      Table "public.bar"
 Column |  Type   | Modifiers 
--------+---------+-----------
 b      | integer | 
Foreign-key constraints:
    "baz" FOREIGN KEY (b) REFERENCES foo(a)
Triggers:
    trunk AFTER INSERT ON bar FOR EACH STATEMENT EXECUTE PROCEDURE funk()
    vupd BEFORE UPDATE ON bar FOR EACH ROW EXECUTE PROCEDURE verify_update();

postgres=> alter table bar disable trigger trunk;
ALTER TABLE

postgres=> alter table bar disable trigger vupd;
ALTER TABLE

-- Do what you need to do, then:

postgres=> alter table bar enable trigger trunk;
ALTER TABLE

postgres=> alter table bar enable trigger vupd;
ALTER TABLE

Another option for a regular user (in other words, a non super-user) is to remove the foreign key relationship yourself. You cannot disable the trigger, but you can drop the foreign key that created it in the first place. Of course, you have to add it back in as well:

postgres=# \c postgres alice
You are now connected to database "postgres" as user "alice".

postgres=> alter table bar drop constraint baz;
ALTER TABLE

-- Do what you need to do then:

postgres=> alter table bar add constraint baz foreign key (b) references foo(a);
ALTER TABLE

The final solution is to work around the problem. Do you really need to disable triggers on this table? If not, simply don't disable any. Perhaps the action you are ultimately trying to perform (e.g. an update/delete/insert on the table) can be done some other way.

All of these solutions have their advantages and disadvantages. And that's what charts are good for!

Permission denied: "RI_ConstraintTrigger" is a system trigger - now what?

  • Become a superuser
    Good: Works as you expect it to.
    Bad: Locks the table; must re-enable the triggers.

  • Adjust session_replication_role
    Good: No table locks!
    Bad: Bypasses triggers and rules on ALL tables; must be a superuser; MUST set it back to the default setting.

  • Disable user triggers manually
    Good: Regular users can perform it; very clear what is being done; less damage if you forget to re-enable.
    Bad: Locks the table; may not be enough.

  • Drop the foreign key
    Good: Regular users can perform it; very clear what is being done.
    Bad: Locks the tables; must recreate the foreign key.

  • Don't disable any triggers
    Good: No locking; nothing to remember to re-enable.
    Bad: May not work in all situations.

For the rest of this article, we will tie up two loose ends. First, how can we see the triggers if \d will not show them? Second, what's up with the crappy trigger name?

As seen above, the output of \d in the psql program shows us the triggers on a table, but not the internal system triggers, such as those created by foreign keys. Here is how triggers normally appear:

postgres=# \c postgres postgres
You are now connected to database "postgres" as user "postgres".

postgres=# create language plperl;
CREATE LANGUAGE

postgres=# create function funk() returns trigger language plperl as $$ return undef; $$;
CREATE FUNCTION

postgres=# create trigger trunk after insert on bar for each statement execute procedure funk();
CREATE TRIGGER

postgres=# \d bar
      Table "public.bar"
 Column |  Type   | Modifiers 
--------+---------+-----------
 b      | integer | 
Foreign-key constraints:
    "baz" FOREIGN KEY (b) REFERENCES foo(a)
Triggers:
    trunk AFTER INSERT ON bar FOR EACH STATEMENT EXECUTE PROCEDURE funk()

postgres=# alter table bar disable trigger all;
ALTER TABLE

postgres=# \d bar
      Table "public.bar"
 Column |  Type   | Modifiers 
--------+---------+-----------
 b      | integer | 
Foreign-key constraints:
    "baz" FOREIGN KEY (b) REFERENCES foo(a)
Disabled triggers:
    trunk AFTER INSERT ON bar FOR EACH STATEMENT EXECUTE PROCEDURE funk()

Warning: Versions older than 8.3 will not tell you in the \d output that the trigger is disabled! Yet another reason to upgrade as soon as possible because 8.2 and earlier are end of life.

If you want to see all the triggers on a table, even the internal ones, you will need to look at the pg_trigger table directly. Here is the query that psql uses when generating a list of triggers on a table. Note the exclusion based on the tgisinternal column:

SELECT t.tgname, pg_catalog.pg_get_triggerdef(t.oid, true), t.tgenabled
FROM pg_catalog.pg_trigger t
WHERE t.tgrelid = '32774' AND NOT t.tgisinternal
ORDER BY 1;

So in our example table above, we should find the trigger we created, as well as the two triggers created by the foreign key. Since we disabled them all earlier, they show as 'D' in the tgenabled column; after re-enabling them, they show as 'O'. (O stands for origin, and has to do with session_replication_role.)

postgres=# select tgname,tgenabled,tgisinternal from pg_trigger 
postgres-#  where tgrelid = 'bar'::regclass;
            tgname            | tgenabled | tgisinternal 
------------------------------+-----------+--------------
 RI_ConstraintTrigger_c_32780 | D         | t
 RI_ConstraintTrigger_c_32781 | D         | t
 trunk                        | D         | f

postgres=# alter table bar enable trigger all;
ALTER TABLE

postgres=# select tgname,tgenabled,tgisinternal from pg_trigger
postgres-#  where tgrelid = 'bar'::regclass;
            tgname            | tgenabled | tgisinternal 
------------------------------+-----------+--------------
 RI_ConstraintTrigger_c_32780 | O         | t
 RI_ConstraintTrigger_c_32781 | O         | t
 trunk                        | O         | f

As you recall, the original error - with its rather non-intuitive system trigger name - looked like this:

ERROR:  permission denied: "RI_ConstraintTrigger_16509" is a system trigger

We can break it apart to see what it is doing. The "RI" is short for "Referential Integrity", and anyone who manages to figure that out can probably make a good guess as to what it does. The "Constraint" means it is a constraint on the table - okay, simple enough. The "Trigger" is a little redundant, as it is extraordinarily unlikely you will ever come across this trigger without some context (such as the error message above) that tells you it is a trigger. The final number is simply the oid of the trigger itself. Stick them all together and you get a fairly obscure trigger name that is hopefully not as mysterious now!

The truth about Google Wallet integration

Google Wallet integration is quite a bumpy ride for every developer. I would like to describe one integration pattern that actually works. It is written in PHP against the Google Wallet 2.5 API.

Google Merchant account settings

First, one must sign up for a Google Merchant account. Once this is done, it is very important to configure the service properly on the Settings > Integration tab.

Buy now button

Buy Now buttons are the simplest form of integration. The code for the button can be obtained on the Tools > Buy Now Buttons tab.

I modified the code provided by Google to transfer information to the Google Wallet server via hidden fields on the form.

<form method="POST" action="https://sandbox.google.com/checkout/api/checkout/v2/checkoutForm/Merchant/" accept-charset="utf-8">
<input type="hidden" name="item_name_1" value=""/>
<input type="hidden" name="item_description_1" value="Subscription Fees"/>
Enter Amount to Deposit:<input type="text" class="normal" size="5" name="item_price_1" value=""/>
<input type="hidden" name="item_currency_1" value="USD"/>
<input type="hidden" name="item_quantity_1" value="1"/>
<input type="hidden" name="shopping-cart.items.item-1.digital-content.display-disposition" value="OPTIMISTIC"/>
<input type="hidden" name="shopping-cart.items.item-1.digital-content.description" value="It may take up to 24 hours to process your deposit. Check your account balance and notify the Commissioner if it's not updated within 24 hours."/>
<input type="hidden" name="_charset_" />
<!-- Button code -->
<input type="image" name="Google Checkout" alt="Fast checkout through Google" src="http://sandbox.google.com/checkout/buttons/checkout.gif?merchant_id=&w=180&h=46&style=white&variant=text&loc=en_US" height="46" width="180" />>
</form>

Tweaking Google Merchant Libraries for PHP

The business logic for my website includes:

  • People enter the amount they wish to deposit via Google Wallet and click the "Buy Now" button.
  • Google sends back confirmation that the card is pre-authorized and ready to charge.
  • The script tells Google: go ahead and charge it.
  • Google tells the site that the charge went through.
  • A record of the transaction is added to the "orders" table in the database.

The good news is that we have a very good PHP library for Google Wallet integration in place. The source code can be found here.
The bad news is that the sample code provided with the library does not entirely work with the newest 2.5 Google Wallet API. The library does have functions like SendChargeAndShipOrder (charge-and-ship-order) from the 2.5 API implemented, but the demo example still uses the old function SendChargeOrder (charge-order).

I will tweak the sample code in responsehandlerdemo.php for my specific business requirements and 2.5 API version.

Customization is implemented with the help of useful Google Wallet callbacks sent at various stages of order processing. First, my script will wait for the "authorization-amount-notification" notification that is sent after the credit card data has been verified and authorized. When the notification is received, I will issue the "charge-and-ship-order" command. The "charge-order" command used in the library's sample code no longer works in the 2.5 API, because charging and shipping are now done at once.

switch ($root) {
  case "authorization-amount-notification": {
    $google_order_number = $data[$root]['google-order-number']['VALUE']; 
    $GChargeRequest = new GoogleRequest($merchant_id, $merchant_key, $server_type);
    $GChargeRequest->SendChargeAndShipOrder($google_order_number); 
    $Gresponse->sendAck($data[$root]['serial-number']);
    break;
  }
  default:
    $Gresponse->sendAck($data[$root]['serial-number']);
    break;
}

I should receive the "charge-amount-notification" notification after the order is successfully charged. If we go to the Google Wallet dashboard, we will notice that the order is marked as charged and shipped. When the notification is received, the record is created in the "orders" table.

case "charge-amount-notification": {
  $items = get_arr_result($data[$root]['order-summary']['shopping-cart']['items']['item']);
  $googleid = $data[$root]['google-order-number']['VALUE'];
  foreach( $items as $item ) {      
    $userid =$item['item-name']['VALUE'];
    if(isset($userid)){ 
      $amount = $data[$root]['total-charge-amount']['VALUE'];
      $date = $data[$root]['timestamp']['VALUE']; 
      $tmpsql = "INSERT into orders(google_id,amount,userid) VALUES('". $googleid."','".$amount."','".$userid."')";
      if(! $sqlresult=mysql_query($tmpsql)) {
        $Gresponse->log->LogError(mysql_error());
      }
    }
  }
  $Gresponse->sendAck($data[$root]['serial-number']);
  break; 
}

Debugging tools

Google provides a very useful Google Wallet Sandbox that developers can use while testing their features. No actual orders are placed and no credit cards are charged.

All the low level warnings and exceptions are recorded to the Tools > Integration Console tab.

Finally, it can be very useful to check the googlemessage.log file defined in the sample code.

define('RESPONSE_HANDLER_LOG_FILE', 'googlemessage.log');

Full source code

The full source code for the modified responsehandlerdemo.php can be found here.

Case Sensitive MySQL Searches

MySQL's support for case sensitive search is explained somewhat opaquely in the aptly titled Case Sensitivity in String Searches documentation. In short, it explains that by default, MySQL won't treat strings as case sensitive when executing a statement such as:

SELECT first_name FROM contacts WHERE first_name REGEXP '^[a-z]';

This simple search, which looks for contacts whose first name starts with a lower case letter, will return *all* contacts, because in MySQL's default character set (latin1) upper and lower case letters share the same "sort value".

UPDATE: After many helpful comments from readers, it would seem the term I should have used was collation, not sort value. The documentation for both MySQL and PostgreSQL has lengthy discussions on the topic.

Enough with the backstory: how do I perform case sensitive searches?

The docs say to convert the string representation to a binary one. This allows "comparisons [to] use the numeric values of the bytes in the operands". Let's see it in action:

SELECT first_name FROM contacts WHERE BINARY(first_name) REGEXP '^[a-z]';

There are other strategies available, such as changing the collation used for the comparison with the COLLATE clause. This would likely work better for cases where you have many columns to compare.

SELECT first_name FROM contacts WHERE first_name REGEXP '^[a-z]' COLLATE latin1_bin;

You can even go so far as to have MySQL switch character sets and collations permanently. But you have to do this for each database, table, and column you need to convert. Not terribly fun.
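If you happen to manage the schema from a Rails application, a hedged sketch of converting a single column might look like the migration below; the table, column, and types are hypothetical, and you would repeat this for every column you need to convert:

class MakeFirstNameCaseSensitive < ActiveRecord::Migration
  def up
    # Binary collation makes ordinary comparisons on this column case sensitive.
    execute "ALTER TABLE contacts MODIFY first_name VARCHAR(255) CHARACTER SET latin1 COLLATE latin1_bin"
  end

  def down
    execute "ALTER TABLE contacts MODIFY first_name VARCHAR(255) CHARACTER SET latin1 COLLATE latin1_swedish_ci"
  end
end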

Debugging Sinatra with racksh and pry

One of the most beloved features of the Ruby on Rails framework is certainly its “console” facility. Ruby on Rails programmers often don’t need any debugger simply because they can view their application state in their app’s console. But what do we have at our disposal when using Sinatra?

The sheer beauty of Sinatra

Many of us who had an opportunity to play with Sinatra stand in awe of its pure simplicity. It gives you raw power as a programmer to structure a whole project however you like. It isn’t as opinionated as Ruby on Rails - in fact, there is even a framework called Padrino built upon Sinatra leveraging its unopinionated nature.

Sinatra's way has also been employed in many other languages, like JavaScript (through Node.js), Clojure, and even Haskell.

The elephant in the room

The above paragraph seems cool, doesn't it? It provides catchy and exciting marketing copy, just enough to make you a little bit curious about this whole Sinatra thing. And while Sinatra stands the test of practicality (otherwise it wouldn't be hailed as widely as it is today), there are "gotchas" waiting just around the corner.

Almost every web application could be reduced to this description: managing persistent data state and rendering that state back to the user. I don't have to remind us all how tricky this state management can be at times...

Dude, where is my console?

The first of many Sinatra gotchas is that there is no such thing as a "Sinatra console". You're doomed to write all those pesky "puts" calls in almost every place in your code and then make countless requests while watching the output, right? No! Chin up, my friend. Racksh to the rescue!

Racksh? You mean console for ... Rack?

Yup! It's not only a solution for Sinatra, but for virtually all Rack-based applications. Taken from its GitHub description:

It's like script/console in Rails or merb -i in Merb, but for any app built on Rack. You can use it to load application environment for Rails, Merb, Sinatra, Camping, Ramaze or your own framework provided there is config.ru file in app's root directory.

Its purpose is to allow developer to introspect his application and/or make some initial setup. You can for example run DataMapper.auto_migrate! or make a request to /users/666 and check response details. It's mainly aimed at apps that don't have console-like component (i.e. app built with Sinatra) but all frameworks can benefit from interactive Rack stack and request introspection.

Pretty cool, isn't it? To install it, just put this in your Gemfile:

gem "racksh"

Then:

bundle install 

and voilà - you've got your Sinatra console at your disposal. To run it, just do:

racksh

This assumes you have config.ru in the current directory; otherwise, specify the path to it with the CONFIG_RU environment variable, like:

CONFIG_RU=/some/path/config.ru racksh

But what if I still need to examine state in my handlers?

We all know how cumbersome it is to get the old ruby-debugger to play nice with Ruby 1.9.3. But fear not, we've got more cool tools under our belt. One of them is called "pry".

Taken from its website: Pry is a powerful alternative to the standard irb shell for Ruby. It features syntax highlighting, a flexible plugin architecture, runtime invocation and source and documentation browsing.

Coupled with a gem called "pry-debugger", it gives you a far better debugging experience than you ever had in the old days with ruby-debugger.

To use it, you just call binding.pry wherever you want the execution of your code to suspend, giving you an opportunity to play with its current state. So the only difference in usage from the old ruby-debugger is putting "binding.pry" instead of "debugger".

For example:

def index
  @collection = SomeModel.where(name: params[:name])
  binding.pry # let’s see what we have here in @collection..
  (...)
end

Executing the second line of this handler will halt further execution and start a REPL session with the same state the handler has at that point.

Now, having included pry-debugger in your Gemfile, you can inspect all the state as well as execute the usual debugging commands: step, next, continue, finish, and breakpoints.
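The handler above is written Rails-style; for completeness, here is a hedged, self-contained Sinatra sketch of the same idea (the route and data are made up):

require 'sinatra'
require 'pry'

get '/users/:name' do
  # Pretend this came from a model lookup.
  collection = ["alice", "anna", "bob"].select { |n| n.start_with?(params[:name]) }
  binding.pry  # execution pauses here; inspect params, collection, request, etc.
  collection.join(", ")
end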


Piggybak Update: Line Item Rearchitecture

Over the last couple of weeks, I've been involved in a significant rearchitecture of Piggybak's line item data model. Piggybak is an open-source mountable Ruby on Rails ecommerce solution created and maintained by End Point. A few months ago, after observing a few complications with Piggybak's order model and its interaction with various nested elements (product line items, shipments, payments, adjustments) and calculations, and after reviewing and discussing these complications with a couple of my expert coworkers, we decided to go in the direction of a uniform line item data model, based on our success with this model for other ecommerce clients over the years (whoa, that was a long sentence!). Here, I'll discuss some of the motivations and give an overview of the technical aspects of this rearchitecture.

Motivation

The biggest drivers of this change were a) to enable more simplified order total calculations based on uniform line items representing products, shipments, payments, etc., and b) to enable easier extensibility, or hookability, into the order architecture without requiring invasive overrides. For example, the order total code may have looked something like this before the change:

order.subtotal = sum of items + sum of adjustments + sum of credits + sum of shipments
order.total_due = sum of items + sum of adjustments + sum of payments + sum of credits + sum of shipments

And after the rearchitecture, the simplified order total calculation looks like this:

self.subtotal = sum of line item prices that aren't payments
self.total_due = sum of all line items
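In code, a rough sketch of that simplification might look like the following; this is hedged, the method name is hypothetical, and it assumes payment line items are stored with negative prices rather than reflecting Piggybak's exact implementation:

class Piggybak::Order < ActiveRecord::Base
  def update_totals
    self.subtotal  = line_items.reject { |li| li.line_item_type == "payment" }.sum(&:price)
    self.total_due = line_items.sum(&:price)
  end
end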

A related motivation that helped drive this change was the plan to develop several credit-based features for Piggybak, such as gift certificates, coupons, and bundle discounts, to grow its feature set. Rather than requiring complex overrides to incorporate these custom credits into orders, a consistent line item interface supports integration of additional custom line item types.

Data Model Changes

Prior to the rearchitecture, the data model looked like this:


Piggybak data model prior to line item rearchitecture.

Some important notes on this are:

  • Line items, payments, shipments and adjustments belong to the order. An order can have many of these elements.
  • During order processing, all of these elements had to be processed independently without uniform consistency. An order balance due represented the sum of various charge related elements (items, shipments, tax) minus any payments or adjustments.
  • In the case of adjustments, this was a bit tricky because an adjustment could be in the form of a negative or positive amount. This introduced complications in the order calculation process.
  • Line items represented products only.

With the rearchitecture, the data model now looks like this:


Piggybak data model after line item rearchitecture.

Important notes on this are:

  • In the core Piggybak data model, line item types represent sellable (product), payment, shipment, adjustment, and tax entries.
  • Line items can still be related to other elements, such as payment and shipment, but the line item has uniform information such as price and description.
  • Line item types are controlled by a Piggybak configuration variable, which allows for Piggybak extensions and the main Rails application to incorporate additional custom line item types.
  • Because various calculation methods are applied to each line item type (e.g. payments are charged against a credit card, shipping is recalculated with a shipping calculator), the line item order model is amenable to custom preprocessing and processing per line item type. This takes advantage of Ruby's respond_to? method to determine if specific preprocessing or postprocessing methods exist.
  • The new architecture also takes advantage of metaprogramming by defining methods dynamically against the line item types. For example, order instance methods "shipping_charge" and "tax_charge" are dynamically created, which return the sum of line item prices where the line item type is shipping or tax, respectively. A rough sketch of both ideas follows this list.
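A rough sketch of both ideas, hedged: the hook and method names below follow the description above rather than the exact Piggybak source.

Piggybak::Order.class_eval do
  Piggybak.config.line_item_types.each do |type, settings|
    # e.g. order.tax_charge, order.shipment_charge
    define_method "#{type}_charge" do
      line_items.select { |li| li.line_item_type == type.to_s }.sum(&:price)
    end
  end

  # Dispatch to optional per-type hooks such as preprocess_payment or
  # preprocess_shipment, if they are defined.
  def process_line_items
    line_items.each do |li|
      hook = "preprocess_#{li.line_item_type}"
      send(hook, li) if respond_to?(hook, true)
    end
  end
end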

Coupon Support in Piggybak

Much of the line item rearchitecture work was done in tandem with development of a Piggybak coupon extension, so I'm excited to announce that with this change, we now have another Piggybak extension, piggybak_coupons, available for use with Piggybak. The piggybak_coupons extension includes support for defining the discount type (percent, dollar, or free shipping), discount amount (for percent and dollar), minimum cart total, expiration date, and allowed number of uses. A coupon may be applied on the checkout page via an AJAX lookup. The piggybak_coupons extension is similar to Piggybak in that it must be installed as a gem in the application. It includes its own migration, model, controller, view, and decorator files.

What's Next?

Introducing this new architecture gives us the ability to incorporate new and custom line item processing functionality. Popular line item types that correspond to popular ecommerce features include:

  • refunds
  • gift certificates
  • coupons
  • bundle discounts

Less common features that are still possible with this new architecture include:

  • custom discounts (e.g. buy one get one free)
  • payment via purchase order
  • payment via check
  • donations

The future for the Piggybak team includes further development of extensions to support some of the common line item type features.

Naturally, there may be a few follow-up incremental improvements since this was a significant change. All of this work is included in the Piggybak gem release version 0.6.2. Read more about Piggybak and check out the demo here, or contact End Point today to get us involved in your next ecommerce project!

Simple bash shell script for running batch MySQL jobs

The other day I needed to run a simple MySQL job to back up and delete some database records on a live server. Since it is a live server, it is important to make sure you aren't asking the database to take on jobs that could potentially lock it up. It's better to run a batch job. Running a batch is simple. You can call it right from the mysql console with:

source [path_to]/[the_batch_script].sql

But what if there are millions of records that need deleting? Bash shell script to the rescue.

Here is the idea of the SQL job that needed to be run a few times:

START TRANSACTION;

/* Find what you want to delete and put a LIMIT on your batch size */
CREATE TEMPORARY TABLE records_to_delete_temp SELECT id from `records` where ..... limit 1000;

/* Create a backup table to archive the records being deleted */
CREATE TABLE IF NOT EXISTS `records_backup` LIKE `records`;
INSERT INTO `records_backup` SELECT * from `records` where id in (select id from `records_to_delete_temp`);

/* Delete Dependents - If your records have foreign key dependencies, delete them first */
DELETE FROM `dependent_1` where record_id in (select id from `records_to_delete_temp`);
DELETE FROM `dependent_2` where record_id in (select id from `records_to_delete_temp`);

/* Delete the records */
DELETE FROM `records` where id in (select id from `records_to_delete_temp`);

/* Return count of remaining records - the where clause should be the same as the original select for the records to delete */
SELECT COUNT(*) from `records` where .... ;

COMMIT ;

Note:

SELECT COUNT(*)

This will return the remaining record count to the shell script.

And the shell script...

ret=$(mysql -u [user] -p[password] --skip-column-names [database_name]  < [the_batch_script].sql)

while [ "$ret" -gt 0 ]
do
  echo "rows left: $ret"
  sleep 3
  ret=$(mysql -u [user] -p[password] --skip-column-names [database_name]  < [the_batch_script].sql)
done

Notes:

--skip-column-names

This is the little nugget that gives you clean output from the mysql script. Without --skip-column-names you will have to get more creative with parsing the returned table.

-p[password]

Just a friendly reminder, no space after the -p option.

Should you really be running batches to remove millions of records from a live server with a shell script?

Just because you can doesn't mean you should. Before pulling the trigger, back up, consider what could go wrong, and have a plan in place to address possible failures.

Find something fun for your hands to do while bash takes care of running the jobs. At the least, cross your fingers!

How to pick a Ruby gem

RubyGems are one of the big benefits of developing in the Ruby environment as they can provide you with a powerful set of building blocks that were created by some great developers. Earlier in my Ruby career I used to think of RubyGems as a quick way to get some "free" code into my applications and I would spend a tremendous amount of time trying to see what kind of apps I could concoct by stacking gem on top of gem. In practice this turned out to be foolish because rather than gaining a stack of "free" code what I was instead doing was "paying" for each gem by having to learn how each of these gems worked and what kind of assumptions and gotchas they were bringing into my apps. I changed my ideas about gems and now I opt by default to avoid adding gems to my projects, but when I do decide that a piece of functionality might be better served through a gem, I make sure to put potential candidates through a rigorous vetting process.

When looking for a gem the question I keep in mind is, "Does adding this gem to my project benefit me more than just writing these features by hand?" I measure the gem up against some criteria and if the answer remains yes then I'll add the gem.

Criteria for gem evaluation

1. Does this gem support my application's Rails version?

The first thing I like to do when I find a gem is to check out the README on Github and see if Rails version information is included with the gem installation instructions. This gives me an idea about how up to date the gem is.

2. When was the last commit and how much recent activity does the gem have?

Again, I'm trying to decide whether the gem is still fresh or whether it's exceeded its expiration date. The Rails world changes pretty fast, and if a gem hasn't been touched in recent months, that's a good indication that the gem is out of date. Unfortunately RubyGems, especially gems for Rails, have unwritten expiration dates, and an old gem past its prime doesn't automatically get dropped off of GitHub or rubygems.org; instead it just sits around without any activity.

3. Does the code look like what I expect it to?

Generally I can come up with a quick idea about how I think the gem should work, and then I'll quickly scan through the gem's contents to see how my idea of the gem and its actual internals match up. If my understanding of how the gem should work and its actual implementation are too far apart, then that is an indicator that the gem might be more expensive to add in terms of maintenance and the learning I'll need to invest. If the gem's code is extremely complex and I barely understand how it works, then that is a red flag that maybe this gem is going to take me on a wild ride.

4. Does it pass its unit tests and do the tests show me how the gem works?

I like to use unit tests as documentation that can show me quickly how a gem is intended to be used. I'll run the unit tests and if they pass that's kind of a gold bond measurement for me that the gem is worth using. I'll still consider using a gem even if the tests fail because sometimes a gem's tests will fail or be difficult to execute and that still won't be a deal breaker for me so long as I can reasonably follow why the tests might be failing.

After a gem has passed its vetting process I'll still keep an awareness about how the gem feels in my app and how much time I'm spending integrating it with what I'm trying to do. If I get the feeling that I'm spending too much time fighting with the gem and that it's not quite fitting with what I'm doing then I will consider pulling the plug and either using a different gem or writing the functionality by hand.

If you liked this discussion of picking out Ruby gems then I encourage you to check out the always excellent Railscasts by RyanB who recently posted a video about this very same topic.

Don't Sleep on Rails 3 SQL Injection Vulnerabilities


SQL injection is a problem that every web developer needs to be aware of when accepting parameters that will be converted into SQL statements during the life of the request. Rails has historically done what it can to mitigate this risk by providing vehicles for sanitizing parameter inputs at the points where they are converted for use inside a SQL statement. However, with Rails 3 there are numerous ways to execute a SQL statement against the database, and some of these methods are safer than others.

Consider two cases where valid Rails code is vulnerable to SQL injection:

#user inputted parameters
params[:query] = "Robert'); DROP TABLE students; ##"

#CASE 1 - find_by_sql
User.find_by_sql("SELECT * FROM users WHERE (name = '#{params[:query]}')")  ##(BAD BAD BAD)

#generated SQL
SELECT `users`.* FROM `users` WHERE (name = 'Robert'); DROP TABLE students; ##')  ##(THIS STATEMENT WILL DROP TABLE students)

The example above shows how find_by_sql can allow parameters submitted by a user to be directly entered into a SQL statement and how an attacker might use the vulnerability to wreak havoc. These types of find_by_sql statements used to be more commonly used in earlier versions of Rails (1.0 - 2.0) and it was through these statements that the Rails community realized that SQL injection was a problem that needed addressing. Here's another example prominent in the early Rails days:

User.find :first, :conditions => "(name = '#{params[:query]}')" ##(BAD BAD BAD)

produces this SQL statement:

#generated SQL
SELECT `users`.* FROM `users` WHERE (name = 'Robert'); DROP TABLE students; ##')  ##(THIS STATEMENT WILL DROP TABLE students)

The above example shows a common Rails idiom for performing an ActiveRecord query; as with the previous find_by_sql example, the find query here is piping the param in directly and generating the exact same tainted SQL.

Fortunately, Rails core decided to make it easier to just do the right thing and provided ways to pass in parameters using built-in filters that handle special SQL characters, escaping ', ", NULL characters, and line breaks. Instead of passing in the parameter directly as a raw string, you can pass in an array so the tainted strings are sanitized by the built-in filters:

User.find_by_sql(["SELECT * FROM users WHERE (name = ?)", params])  ##(GOOD)
User.find :first, :conditions => ["(name = '?')", params] ##(GOOD)
#generated SQL
SELECT * FROM users WHERE (name = 'Robert\'); DROP TABLE STUDENTS; ##') (RETURNS NIL)

The distinction in the filtered SQL statement is the escaped single quote right after the t in Robert, which prevents the name parameter from terminating the string and keeps DROP TABLE students from being executed, since it remains part of the string parameter. Additionally, Rails also applies these built-in filters automatically when AR queries are called via find_by_something or a conditions hash:

User.find_by_name(params[:query]) ##(GOOD)
#generated SQL
SELECT * FROM `users` WHERE `users`.`name` = 'Robert\'); DROP TABLE students; ##' (RETURNS NIL)
User.find :first, :conditions => {:name => params[:query]} ##(GOOD)
SELECT `users`.* FROM `users` WHERE `users`.`name` = 'Robert\'); DROP TABLE students; ##' (RETURNS NIL)

Rails 3 introduced Arel, which provides another way to perform ActiveRecord queries, and with it came another way to make exactly the same SQL injection mistakes listed above. Having gotten accustomed to looking for SQL injection vulnerabilities in the AR query formats above, you might be lulled into thinking that the new and improved ActiveRecord query methods would just magically handle the tainted strings for you, and you'd be dead wrong:

User.where("name = '#{params}'") (BAD)
SELECT `users`.* FROM `users`  WHERE (name = 'Robert'); DROP TABLE STUDENTS; ##') ##(THIS STATEMENT WILL DROP TABLE STUDENTS)

The nice thing is that the same fix can also be applied:

User.where(["name = ?", params])
SELECT `users`.* FROM `users`  WHERE (name = ''Robert\'); DROP TABLE STUDENTS; ##'')  ##(RETURNS NIL)

Feature Isolation, an Overview

Yesterday, Brian Buchalter blogged about a recent presentation I did for End Point Rails developers.

While the blog article did a great job of capturing some of the nitty gritty detail from the presentation, I'd like to follow up with a quick overview statement about Feature Isolation. I've also made my slides available for anyone who is interested.

Feature Isolation is what I'm calling a development strategy for adding new features to existing applications. In Rails, I'm utilizing cucumber, a tool for transforming use-case language into ruby code, to specify the requirements and then execute them outside of the Rails environment and away from the complexity of the rest of the application.

Using stubbing and a minimal mock of ActiveRecord (FastModel) I can then begin to design my feature from a more object oriented approach than is typical in Rails development. I can bring in any models, new or existing, that I will need and stub out the interface to the database. Likewise, I can design my classes and their public interface. Getting all my tests to pass from a high level without actually developing the behavior itself allows me to make design decisions quickly and ensure I'm capturing all the requirements in my design.

From there, it's just a matter of removing the stubs and mocks from the tests and then building the classes out in the application, ensuring that my outer cucumber tests still pass as I go. Eventually, the cucumber tests will drive a browser (using a headless WebKit driver called capybara-webkit).

This approach has really disciplined me in how I approach a new feature and helped me to stay focused on building what is needed instead of trying to do too much. It's also centralized my business logic in objects within the application instead of Rails itself.

I'm hoping to get some feedback from the Rails community to improve the process and the tools some more, but having gone through the process several times, I believe it can really help -- especially when dealing with existing complex applications.

Feature Isolation with Mike Farmer

My brilliant co-worker Mike Farmer gave a presentation today about a development strategy he calls "Feature Isolation." It involves developing new features on the fringe of your application, isolated from the complexity of existing code. This allows greater focus on ensuring that your feature is well designed from an object-oriented perspective and that you don't build more than you need.

In order to truly isolate the feature, Mike put together some cucumber tools that allow you to run cucumber without Rails and to create what he calls a "FastModel". The models are fast for two reasons. First, you don't need to load ActiveRecord to get functionality like specifying field names, specifying relationships, or emulating saving records. Second, it lets you sketch out a design for your class while the cost of change is very, very low.

An Example: Product Variants

Here's an example of a tight little feature and step set for showing shoppers a comparison of product variants.

Feature: As a shopper, I want to compare the variants of a product

  Background:
    Given there is a product named "Product A"
    And it has some variants with various options
    And I am on the comparison page for "Product A"


  Scenario: The shopper sees a comparison chart of Variant A
    When "Variant A" has options "a,b,c"
    Then the comparison chart header should have options "a,b,c"
    And the comparison grid should have 3 checkboxes

When a feature request comes in, you write out your cucumber scenarios and do the usual sign-off from the product manager. Nothing new or exciting here. Next, implement failing step files; yes, of course they should fail first, this is TDD, right? Mike has a strong opinion about what your features and steps should look like: short and specific. He's found that if you're writing too much setup, your feature is not well isolated, or perhaps the feature needs to be broken into several smaller ones.

Now that we have the feature in place, we can create the steps.

Given /^there is a product named "([^"]*)"$/ do |product_name|
  @product = Product.create(:name => product_name)
end

Given /^it has some variants with various options$/ do
  @variant_a = Variant.create(:sku => 'Variant A', :product => @product)
  VariantOption.create(:name => "Option A", :variant => @variant_a)
  #... same thing for B, C, D
end

Given /^I am on the comparison page for "([^"]*)"$/ do |product|
  # UI Action
  # visit product_path(@product.permalink)
  @current_product = Product.find product
end

When /^"([^"]*)" has options "([^"]*)"$/ do |variant, options|
  @current_variant = Variant.find variant
  opts = options.split(",")
  #... find the options
  @options = [@option_a, @option_b, @option_c, @option_d]
end

#... more steps

Keep in mind that, at first, these steps are going to be powered by FastModel and not Rails. Let's see how we use cucumber_tools to do this.

Magic Sauce: Rapid Class Design

After completing a RED TDD cycle, Mike's strategy really takes center stage. Instead of creating real classes, he uses FastModel to define the models, including existing models, fields, and relationships needed for his feature. He goes even further and stubs out the interface needed to satisfy the cucumber test. Let's look at a specific example.

class Product < FastModel
  fields :name
  has_many :variants
end

class Variant < FastModel
  fields :name
  has_many :variant_options
  belongs_to :product
end

class VariantOption < FastModel
  fields :name
  belongs_to :variant
end

class VariantComparisonChart; end

FastModel

Without writing migrations, or even loading Rails, you can use FastModel to think about the basics of your ActiveRecord models. This lets you focus on the design of your classes by iterating and testing quickly. These FastModels can actually go in the step file, making it easier to stay focused on what matters: class design.

Why stub in a cucumber feature?

It might seem strange to start stubbing out your class interface inside a cucumber feature. Mike understands this, and just like with the FastModels above, he isn't expecting these stubs to stay. He's using them as a tool to drive very quick class interface modeling. It's a great way to stay inside one test file and keep the cost of iterating through the class design low. This is a central part of the magic of his strategy. By keeping the design of the class's interface directly linked to the design of the feature's steps, we ensure we only design and test our public interface. Let's see it in action:

Then /^the comparison chart header should have options "([^"]*)"$/ do |options|
  opts = options.split(",")
  header = stub(:header) { opts }
  chart = stub(:chart, :header => header)
  VariantComparisonChart.stub(:build_for) { chart }

  chart = VariantComparisonChart.build_for(@current_product)
  opts.each do |option|
    chart.header.should include option
  end
end

Outside: Green. Time to move inside!

With passing cucumber tests backed by FastModels and stubs, our class design is complete. It's time to actually start building out the real, but still isolated, classes outside Rails. Using your FastModels and stubs as guides, you can TDD up a well unit-tested class that does the job. Then swap out the stubs and the FastModels for your real classes and confirm the cucumber cycle is still green.

How do I use it?

FastModel and the stubbing library are included as part of cucumber_tools' no_rails.rb, which can be required when executing your cucumber feature:

cucumber -r no_rails.rb path/to/feature.feature

Mike recommends skipping the standard env.rb setup so that Rails isn't loaded until you actually need it later. He says these tools were thrown together very quickly to capture the concept of "Feature Isolation", so any pull requests would be greatly appreciated. For more details, please see Mike's follow-up article.

Piggybak: The Roadmap

Over the last couple of weeks, a few of us at End Point have had some discussion about the future direction of Piggybak. Piggybak is an open source mountable ecommerce framework written in Ruby on Rails supported and developed by End Point. It introduces core ecommerce functionality into a Rails application, but is intended to allow the mounted Rails application to maintain control over some architecture elements.
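For context, Piggybak ships as a Rails engine, so hooking it into an application is the usual engine mount in the host app's routes. The snippet below is only an illustrative sketch: the MyStore::Application constant and the /checkout mount point are assumptions, not taken from the Piggybak documentation.

# config/routes.rb of the host Rails application (illustrative sketch).
MyStore::Application.routes.draw do
  # Mount the ecommerce engine under a path of your choosing.
  mount Piggybak::Engine, :at => "/checkout"

  # The host application keeps control of its own resources
  # (products, content pages, etc.) alongside the engine.
  resources :products, :only => [:index, :show]
end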

Pros of Piggybak

To date, the main advantage of Piggybak has been its fairly lightweight approach. It leverages the power of RailsAdmin rather than creating its own admin. It allows the mounted Rails application to make decisions about what types of items are sellable and how those items are found (i.e. product finding methods, SSL configuration). Piggybak also has streamlined integration of ActiveMerchant, which immediately provides support for over 40 popular payment gateways. Piggybak has a cookie-based cart and an AJAX-driven one-page checkout.
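To give a sense of what the ActiveMerchant integration buys you, here is roughly what a standalone ActiveMerchant call looks like; the gateway class, credentials, and card details below are placeholders, and in practice Piggybak wires this plumbing into its checkout for you.

require 'active_merchant'

# Illustrative only: credentials and card data are placeholders.
gateway = ActiveMerchant::Billing::AuthorizeNetGateway.new(
  :login    => "YOUR_API_LOGIN",
  :password => "YOUR_TRANSACTION_KEY"
)

credit_card = ActiveMerchant::Billing::CreditCard.new(
  :number             => "4111111111111111",
  :month              => 12,
  :year               => 2015,
  :first_name         => "Test",
  :last_name          => "Customer",
  :verification_value => "123"
)

# ActiveMerchant amounts are expressed in cents.
response = gateway.purchase(1000, credit_card)
puts response.success? ? "Payment captured" : response.message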

Cons of Piggybak Approach

Because Piggybak takes a lightweight approach, its major disadvantage is that it cannot compete with existing ecommerce frameworks as an out-of-the-box solution with a full ecommerce feature set. More feature-rich platforms like Spree and Magento simply offer more out of the box. This matters because the abstraction, code cleanliness, and maintainability Piggybak provides are not necessarily as strong a selling point to a potential website owner as a long feature list.

The Roadmap

In looking towards the future of Piggybak, we've decided to build out some features while trying to strike a balance between a good feature set and keeping Piggybak lightweight. Some of our goals for the future include:

  • Realtime Shipping with USPS, UPS, and FedEx support [as extension]
  • Improvement of Piggybak installation process [core]
  • Advanced Product Optioning Support [as extension]
  • Line Item Rearchitecture to support future work on Gift Certificates, Discounts [core]
  • Gift Certificate, Discount Support [core]
  • Advanced Taxonomy [as extension]
  • Reviews & Ratings [as extension]

Follow the Piggybak github user here, check out the website and demo here and keep an eye out for future blog posts on the progress of Piggybak. Also, contact End Point now if you are ready to get started or need support on your current Ruby on Rails ecommerce platform.

The Real Cost of Data Roundtrip

Sometimes you need to perform some heavy database operations, and I don't understand why programmers are so often afraid of using the database for that. They usually have some fancy ORM which performs all the operations, and the only way they change data is to SELECT * from a table, create a bunch of unneeded objects, change one field, convert those changed objects back into queries, and send them to the database.

Have you ever thought about the cost of that data roundtrip? The cost of pulling all the data out of the database just to send the changed data back in? Why do that if there is a much faster way of achieving the same results?
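Since this is a Rails-leaning blog, here is the same contrast in ActiveRecord terms as a hypothetical sketch; the Page model, the body column, and the http-to-https substitution are made up for illustration. The roundtrip version loads every row into Ruby objects and writes them back one by one, while the in-database version expresses the same change as a single statement.

# Hypothetical model and column, for illustration only.
class Page < ActiveRecord::Base
end

# The roundtrip: every row travels to the application and back.
Page.find_each do |page|
  page.body = page.body.gsub("http://", "https://")
  page.save!
end

# The same change done where the data lives: one statement,
# no objects instantiated, nothing shipped over the wire.
Page.update_all("body = replace(body, 'http://', 'https://')")

For the entity-decoding example in this post, the in-database version is the UPDATE query calling decode_html_entities(t) shown later on.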

Imagine that you have quite a heavy operation. Let's pick something databases cannot do out of the box, a more complicated transformation; many programmers simply don't know that there is any other way than writing it in application code. Let's change all the HTML entities into real characters.

HTML entities are a way of writing many different characters in HTML. For instance, you can write the Euro currency sign "€" even if you don't have it on your keyboard, by writing &euro; or &#8364; instead. (I don't have to: I use UTF-8 encoding and have this character on my keyboard, so I can type it directly and it displays normally.)
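As a quick aside for Ruby readers, the standard library can decode numeric character references like these, which is a handy way to sanity-check what the conversion should produce; this is just a comparison point, not part of the benchmark below.

require 'cgi'

# CGI.unescapeHTML handles numeric character references and the basic
# named entities (&amp;, &lt;, &gt;, &quot;).
CGI.unescapeHTML("&#8364;")           # => "€"
CGI.unescapeHTML("10&#8364; &#169;")  # => "10€ ©"

The benchmark itself sticks with Perl's HTML::Entities on both sides so the comparison stays fair.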

I will convert the text stored in the database, changing all the HTML entities into real Unicode characters. I will do it using three different methods.

  • The first will be a simple query run inside PostgreSQL.
  • The second will be an external program which downloads the text column from the database, changes it externally, and loads it back into the database.
  • The third will be almost the same as the second, except that it will download and resend whole rows.

Generate Data

So, for this test I need to have some data. Let's write a simple data generator.

First, a simple function for returning a random number within the given range.

CREATE FUNCTION random(INTEGER, INTEGER) RETURNS INTEGER AS $$
  SELECT floor ( $1 + ($2 - $1 + 1 ) * random())::INTEGER;
$$ LANGUAGE SQL;

Now, the function for generating random text of random length, filled with HTML entities.

CREATE FUNCTION generate_random_string() RETURNS TEXT AS $$
DECLARE
  items TEXT[] =
    ARRAY[
      'AAAA','BBBB','CCCC','DDDD','EEEE','FFFF','GGGG',
      'HHHH','IIII','JJJJ','KKKK','LLLL','MMMM','NNNN',
      'OOOO','PPPP','QQQQ','RRRR','SSSS','TTTT','UUUU',
      'VVVV','WWWW','XXXX','YYYY','ZZZZ',
      '&amp;', '&#34;', '&#39;', '&#38;','&#60;','&#62;',
      '&#162;','&#163;','&#164;','&#165;','&#166;','&#167;',
      '&#168;','&#169;','&#170;','&#171;','&#172;','&#173;',
      '&#174;','&#175;','&#176;','&#177;','&#178;','&#179;',
      '&#180;','&#181;','&#182;','&#183;','&#184;','&#185;',
      '&#186;','&#187;','&#188;','&#189;','&#190;'
    ];
  length INTEGER := random(500, 1500);
  result TEXT := '';
  items_length INTEGER := array_length(items, 1);
BEGIN
  FOR x IN 1..length LOOP
    result := result || items[ random(1, items_length) ];
  END LOOP;

  RETURN result;
END;
$$ LANGUAGE PLPGSQL;

The table for storing the data is created with the following query:

CREATE TABLE data (
    id SERIAL PRIMARY KEY, 
    payload TEXT, 
    t TEXT
);

Then I filled this table using a query generating 50k rows with random data:

INSERT INTO data(payload, t) 
SELECT 
    generate_random_string(), 
    generate_random_string() 
FROM 
    generate_series(1, 50*1000);

Let's check the table size:

SELECT pg_size_pretty(pg_relation_size('data'));

 pg_size_pretty
----------------
 207 MB
(1 row)

As the table is filled with random data, I need three copies of it with exactly the same data, one for each method.

CREATE TABLE query (id SERIAL PRIMARY KEY, payload TEXT, t TEXT);
CREATE TABLE script (id SERIAL PRIMARY KEY, payload TEXT, t TEXT);
CREATE TABLE script_full (id SERIAL PRIMARY KEY, payload TEXT, t TEXT);

INSERT INTO query SELECT * FROM data;
INSERT INTO script SELECT * FROM data;
INSERT INTO script_full SELECT * FROM data;

The Tests

SQL

Many programmers think that such operations are simply not available inside a database. However, PostgreSQL has quite a nice feature: it can execute functions written in many different languages. For this test I will use the pl/perlu language, which allows me to use external libraries, and the HTML::Entities package for the conversion.

The function I wrote is quite simple:

CREATE FUNCTION decode_html_entities(text) RETURNS TEXT AS $$
    use HTML::Entities;
    return decode_entities($_[0]);
$$ LANGUAGE plperlu;

The update of the data can be done using the following query:

UPDATE query SET t = decode_html_entities(t);

Application

To keep these tests comparable, I will write a simple Perl script using exactly the same package for converting HTML entities.

#!/usr/bin/env perl

use DBI;
use HTML::Entities;
use Encode; 

my $dbh = DBI->connect(
    "DBI:Pg:dbname=test;host=localhost", 
    "szymon", 
    "", 
    {'RaiseError' => 1, 'pg_enable_utf8' => 1});

$dbh->do('BEGIN');

my $upd = $dbh->prepare("UPDATE script SET t = ? WHERE id = ?");

my $sth = $dbh->prepare("SELECT id, t FROM script");
$sth->execute();

while(my $row = $sth->fetchrow_hashref()) {
    my $t = decode_entities( $row->{'t'} );
    $t = encode("UTF-8", $t);
    $upd->execute( $t, $row->{'id'} );
}

$dbh->do('COMMIT');
$dbh->disconnect();

The Worst Application

There is another terrible idea implemented by programmers far too often. Why select only the column you want to change? Let's select whole rows and send them back to the database.

The script looks like this (the important changes are the SELECT, which now fetches every column, and the UPDATE, which writes them all back):

#!/usr/bin/env perl

use DBI;
use HTML::Entities;
use Encode; 

my $dbh = DBI->connect(
    "DBI:Pg:dbname=test;host=localhost", 
    "szymon", 
    "", 
    {'RaiseError' => 1, 'pg_enable_utf8' => 1});

$dbh->do('BEGIN');

my $upd = $dbh->prepare("UPDATE script_full SET payload = ?, t = ? WHERE id = ?");

my $sth = $dbh->prepare("SELECT id, payload, t FROM script_full");
$sth->execute();

while(my $row = $sth->fetchrow_hashref()) {
    my $t = decode_entities( $row->{'t'} );
    $t = encode("UTF-8", $t);
    $upd->execute( $row->{'payload'}, $t, $row->{'id'} );
}

$dbh->do('COMMIT');
$dbh->disconnect();

Results

The query using the pl/perlu function executed in 26 seconds.

The script changing the data externally executed in 2 minutes 10 seconds (5 times slower).

The worst script, getting and resending whole rows, finished in 4 minutes 35 seconds (10 times slower).

I used quite a small number of rows. There were just 50k rows (about 200MB). On production servers the numbers are much bigger.

Just imagine: the code you developed for changing data could run 10 times faster if you did the work in the database.

Setting user ownership of nginx and Passenger processes

Do this now on all your production Rails app servers:

ps aux | grep Rails

The first column in the results of that command shows which user runs your Rails and Passenger processes. If this is a privileged user (a sudoer, or worse yet a passwordless sudoer), then this article is for you.

Assumptions Check

There are several different strategies for modifying which user your Rails app runs as. By default, Passenger runs your application as the owner of config/environment.rb. For some, simply changing the ownership of this file is sufficient, but in some cases we may want to force Passenger to always use a particular user.

This article assumes you are running nginx compiled with Passenger support and that you have configured an unprivileged user named rails-app. This configuration has been tested with nginx version 0.7.67 and Passenger version 2.2.15. (Dated, I know, but now that the docs for these old versions are hard to find, this article is extra helpful.)

Modifying nginx.conf

The changes required in nginx are very straightforward.

# Added in the main, top-level section
user rails-app;

# Added in the appropriate http section among your other Passenger related options
passenger_user_switching off;
passenger_default_user rails-app;

The first directive tells nginx to run its worker processes as the rails-app user. It's not completely clear to me why this was required, but failing to include it resulted in the following error. Bonus points to anyone who can help me understand this one.

[error] 1085#0: *1 connect() to unix:/tmp/passenger.1064/master/helper_server.sock failed (111: Connection refused) while connecting to upstream, client: XXX, server: XXX, request: "GET XXX HTTP/1.0", upstream: "passenger://unix:/tmp/passenger.1064/master/helper_server.sock:", host: "XXX"

The second directive, passenger_user_switching off, tells Passenger to ignore the ownership of config/environment.rb and instead use the user specified in the passenger_default_user directive. Pretty straightforward!

Log File Permissions Gotcha

Presumably you're not storing your production log files in your app's log directory, but in /var/log/app_name, using logrotate to archive and compress your logs nightly. Make sure you update the logrotate configuration so new log files are created with the appropriate ownership. Additionally, make sure you change the ownership of the current log file so that Passenger can write your application's logs!

Ubuntu Dual Monitor Setup on Dell XPS

Over the weekend, I received a new desktop (a Dell XPS 8500 with an NVIDIA graphics card) and troubleshot dual monitor setup on Ubuntu. Because I spent quite a while googling for answers, I thought I'd write up a quick summary of what did and didn't work.

One monitor was connected via HDMI and the other through DVI (with a VGA to DVI adaptor provided with the computer). When I started up the computer in Windows, both monitors were recognized immediately. Besides configuring the positioning of the monitors, Windows was good to go. But when I installed Ubuntu, the DVI/VGA monitor was recognized with incorrect resolution and the monitor connected via HDMI was not recognized at all. I tried switching the unrecognized monitor to a VGA/DVI connection, and it worked great by itself, so I concluded that it wasn't an issue with a driver for the HDMI-connected monitor.

Many of the Google results I came across pointed to troubleshooting with xrandr, but every xrandr command produced a "Failed to get size of gamma for output default" error, which shut down any progress in that direction. Another set of results pointed to nvidia-detector, but there were no follow-up tips or pointers when it returned nothing. And many other results were scattered across unanswered forum posts.

Finally, I came across a couple of articles suggesting that Ubuntu didn't have the proprietary NVIDIA driver and recommending installing it via the nvidia-current package. After installing it and restarting my X session, both monitors were working with the correct resolution. I finished up by resetting the positioning of the monitors, adjusting the primary display, and saving to the xorg.conf (X configuration) file. This was probably the fastest I've ever figured out dual monitor setup (it's always an adventure), most likely because of improved Linux/Ubuntu support on various machines over time.


[Screenshot: nvidia-settings display settings]