Spree: Gift Certificates and Coupons

In a recent Spree project, I've been working with Bill to add gift certificate functionality. According to the Spree documentation, gift certificate functionality is trivial to implement using the existing coupon architecture. Here are some of the changes we went through as we tried that approach - we found that it wasn't so simple after all.


Here is a very simplified visualization of the coupon and adjustment data model in Spree. Coupons use polymorphic calculators to compute the applicable discount.

First, Bill and I brainstormed to come up with an initial set of changes required for implementing gift certificates as coupons after we reviewed the data model shown above:

  1. Add logic to create a coupon during checkout finalization, which was done with the following:
    # coupon object class method
    def self.generate_coupon_code
      # some method to generate an unused random coupon code beginning in 'giftcert-'
    end
    
    # inside order model during checkout finalization
    line_items.select { |li| li.variant.product.is_gift_cert? }.each do |line_item|
      line_item.quantity.times do
        coupon = Coupon.create(:code => Coupon.generate_coupon_code,
                               :description => "Gift Certificate",
                               :usage_limit => 1,
                               :combine => false,
                               :calculator => Calculator::FlatRate.new)
        coupon.calculator.update_attribute(:preferred_amount, line_item.variant.price)
      end
    end
    
  2. Add logic to decrease a coupon amount during checkout finalization if used:
    # order model during checkout finalization
    coupon_credits.select{ |cc| cc.adjustment_source.code.include?('giftcert-') }.each do |coupon_credit|
      coupon = coupon_credit.adjustment_source
      amount = coupon.calculator.preferred_amount - item_total
      coupon.calculator.update_attribute(:preferred_amount, amount < 0 ? 0 : amount)
    end
    
  3. Add a relationship between line items and coupons, because we'd want a way to associate each gift certificate coupon with the line item that purchased it. The intention here was to limit a gift certificate line item to a quantity of one, since the gift certificate line item might include personal information such as an email address in the future.
    LineItem.class_eval do
      has_one :line_item_coupon
      has_one :coupon, :through => :line_item_coupon
    end
    
    class LineItemCoupon < ActiveRecord::Base
      belongs_to :line_item
      belongs_to :coupon
    
      validates_presence_of :line_item_id
      validates_presence_of :coupon_id
    end
    
  4. Create the sample data for a gift certificate (coupon) - the implementation offers a master variant for a fixed cost of $25.00. In addition to the code below, Bill created sample data to assign a product property is_gift_cert to the product.
    # products.yml
    gift_certificate:
      name:          Gift Certificate
      description:   Gift Certificate
      available_on:  <%= Time.zone.now.to_s(:db) %>
      permalink:     gift-certificate
      count_on_hand: 100000
    
    # variants.yml
    gift_cert_variant:
      product:       gift_certificate
      sku:           giftcert
      price:         25.00
      is_master:     true
      count_on_hand: 10000
      cost_price:    25.00
      is_extension:  false
    
  5. Finally, Bill edited the order mailer view to include gift certificate information; a sketch of such a view follows this list.
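
The mailer view change itself isn't shown here, but using the line item association from step 3, it might look something like the following ERB sketch - the names and layout are assumptions, not the actual change:

<%# order confirmation mailer view (hypothetical sketch) %>
<% @order.line_items.select { |li| li.variant.product.is_gift_cert? }.each do |line_item| %>
  Gift Certificate (<%= line_item.coupon.code if line_item.coupon %>): <%= number_to_currency line_item.price %>
<% end %>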

After the above changes were implemented, additional changes were required for our particular Spree application.

  1. Adjust the shipping API so it doesn't include gift certificates in the shipping request, because gift certificates aren't shippable. Below is an excerpt of the XML builder code that generates the XML request made to the shipping API:
    # shipping calculator
    -order.line_items.each do |li|
    +order.line_items.select { |li| !li.variant.product.is_gift_cert? }.each do |li|
       x.item {
         x.quantity(li.quantity)
         x.weight(li.variant.weight != 0.0 ? li.variant.weight : Spree::MyShipping::Config[:default_weight])
         x.length(li.variant.depth ? li.variant.depth : Spree::MyShipping::Config[:default_depth])
         x.width(li.variant.width ? li.variant.width : Spree::MyShipping::Config[:default_width])
         x.height(li.variant.height ? li.variant.height : Spree::MyShipping::Config[:default_height])
         x.description(li.variant.product.name)
       }
    
  2. Create a new calculator for free shipping applicable to orders with gift certificate line items only, using the is_gift_cert product property:
    # registering the calculator inside Spree site_extension.rb (required for all calculators to be used in Spree)
    [   
      Calculator::GiftCertificateShipping,
    ].each{ |c_model|
      begin
        c_model.register if c_model.table_exists?
      rescue Exception => e
        $stderr.puts "Error registering calculator #{c_model}"
      end 
    }
    
    # shipping method and calculator creation in sample data
    s = ShippingMethod.new(:zone_id => 16, :name => 'Gift Certificate Shipping')
    s.save
    c = Calculator.new
    c.calculable = s
    c.type = 'Calculator::GiftCertificateShipping'
    c.save 
    
    # calculator for free gift cert shipping
    class Calculator::GiftCertificateShipping < Calculator
      ...
      def available?(order)
        order.line_items.inject(0) { |sum, li| sum += li.quantity if !li.variant.product.is_gift_cert?; sum } == 0
      end
    
      def compute(line_items)
        0
      end
    end
    

After Bill implemented these changes, I took a closer look at the following code:

coupon_credits.select{ |coupon_credit| coupon_credit.adjustment_source.code =~ /^giftcert-/}.each do |coupon_credit|
  coupon = coupon_credit.adjustment_source
  amount = coupon.calculator.preferred_amount - item_total
  coupon.calculator.update_attribute(:preferred_amount, amount < 0 ? 0 : amount)
end 

I wondered why the coupon amount was being decremented by the item_total and not the order total. What about shipping and sales tax? Looking at the Spree Coupon class, I verified that a coupon's amount only takes the item total into account, not shipping or tax, which presents a problem since gift certificates traditionally apply to tax and shipping costs.


In the Spree core, coupons are never applied to shipping or tax costs.

I investigated the following change to separate coupon and gift certificate credit calculation:

def site_calculate_coupon_credit
  return 0 if order.line_items.empty?
  amount = adjustment_source.calculator.compute(order.line_items).abs
  order_total = adjustment_source.code.include?('giftcert-') ? order.item_total + order.charges.total : order.item_total
  amount = order_total if amount > order_total
  -1 * amount
end

After this change, I found that when the customer arrives on the payment page with a gift certificate covering the entire order including tax and shipping, the payment logic isn't set up to handle orders with a total cost of 0. Additional customization of the payment implementation, validation, and checkout flow would be required to handle orders where gift certificates cover the entire cost. However, rather than implementing these additional customizations, our client was satisfied with the implementation where gift certificates don't cover tax and shipping, so I did not pursue this further.

In the future, I'd recommend creating a new model for gift certificate and gift certificate credit management rather than combining the business logic with coupons, because:

  1. The coupon implementation in Spree doesn't have a whole lot to it. It uses several custom Spree calculators, has a backend CRUD interface, and credits are applied to orders. Grabbing the coupon implementation and copying and modifying it for gift certificates shouldn't be daunting.
  2. It will likely be more elegant to separate coupon logic from gift certificate logic. Coupons and gift certificates share a few business rules, but not all. Gift certificates traditionally apply to tax and shipping, and multiple gift certificates can be used on one order (though this can be made configurable). Coupons may have more complex logic for applying to items and traditionally do not apply to tax and shipping (although in some cases a free shipping coupon may be needed that covers the cost of shipping only). Additionally, a big difference in business logic is that gift certificates should probably be treated as a payment, where checkout accepts gift certificates as a form of payment and the backend provides reporting on gift certificate driven payments. Rather than dirtying up the coupon logic with checks for gift certificate versus coupon behavior, it'll be more elegant to separate the logic into classes that address the individual business needs.

Besides "hindsight is 20/20", the takeaway for me here is that you have to understand business rules and requirements for coupon and gift certificate implementation in ecommerce, which can get tricky quickly. We were lucky because the client was satisfied with the resulting behavior of using the coupon architecture for gift certificates. Hopefully, the takeaway for someone not familiar with Spree is that gift certificate implementation might require things like functionality for creating gift certificates after checkout completion, decrementing the gift certificate after it's used, backend reporting to show gift certificates purchase and use and coding for the impact of gift certificate purchase on shipping.

Note that all of the changes described here apply to the latest stable version of Spree (0.11.0). After taking a look at the Spree edge code, I'll mention that there is a bit of an overhaul of coupons underway (they will be called promotions). However, it looks like many of the customizations described here would still be needed for gift certificate implementation, as the edge promotions still apply to item totals only and do not include any core modifications for accepting a credit as a payment.

Learn more about End Point's Ecommerce Development or Ruby on Rails Ecommerce Services.

Distributed Transactions and Two-Phase Commit

The typical example of a transaction involves Alice and Bob, and their bank. Alice pays Bob $100, and the bank needs to debit Alice and credit Bob. Easy enough, provided the server doesn't crash. But what happens if the bank debits Alice, and then before crediting Bob, the server goes down? Or what if they credit Bob first, and then try to debit Alice only to find she doesn't have enough funds? A transaction allows the debit and credit operations to happen as a package ("atomically" is the word commonly used), so either both operations happen or neither happens, even if the server crashes halfway through the transaction. That way the bank never credits Bob without debiting Alice, or vice versa.

That's simple enough, but the situation can become more complex. What if, for instance, for buzzword-compliance purposes, the bank has "sharded" its accounts database by splitting it in pieces and putting each piece on a different server (whether this would be smart or not is outside the scope of this post)? The typical transaction handles statements issued only for one database, so we can't wrap the debit and credit operations within a single BEGIN/COMMIT if Alice's account information lives on one server and Bob's lives on another.

Enter "distributed transactions". A distributed transaction allows applications to group multiple transaction-aware systems into a single transaction. These systems might be different databases, or they might include other systems such as message queues, in which case the transaction concept means a message would get delivered if and only if the rest of the transaction completed. So with a distributed transaction, the bank could debit Alice's account in one database and credit Bob's in another, atomically.

All this comes at some cost. First, distributed transactions require a "transaction manager", an application which handles the special semantics required to commit a distributed transaction. Second, the systems involved must support "two-phase commit" (which was added to PostgreSQL in version 8.1). Distributed transactions are committed using PREPARE TRANSACTION 'foo' (phase 1), and COMMIT PREPARED 'foo' or ROLLBACK PREPARED 'foo' (phase 2), rather than the usual COMMIT or ROLLBACK.

The beginning of a distributed transaction looks just like any other transaction: the application issues a BEGIN statement (optional in PostgreSQL), followed by normal SQL statements. When the transaction manager is instructed to commit, it runs the first commit phase by saying "PREPARE TRANSACTION 'foo'" (where "foo" is some arbitrary identifier for this transaction) on each system involved in the distributed transaction. Each system does whatever it needs to do to determine whether or not this transaction can be committed and to make sure it can be committed even if the server crashes, and reports success or failure. If all systems succeed, the transaction manager follows up with "COMMIT PREPARED 'foo'", and if a system reports failure, the transaction manager can roll back all the other systems using either ROLLBACK (for those transactions it hasn't yet prepared), or "ROLLBACK PREPARED 'foo'". Using two-phase commit is obviously slower than committing transactions on only one database, but sometimes the data integrity it provides justifies the extra cost.

In PostgreSQL, two-phase commit is supported provided max_prepared_transactions is nonzero. A PREPARE TRANSACTION statement persists the current transaction to disk, and dissociates it from the current session. That way it can survive even if the database goes down. The current session no longer has an active transaction. However, the prepared transaction acts like any other open transaction in that all locks held by the prepared transaction remain held, and VACUUM cannot reclaim storage from that transaction. So it's not a good idea to leave prepared transactions open for a long time.
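
In bare SQL, outside of any transaction manager, the two phases look like this (a minimal sketch; 'txn_ab' is an arbitrary identifier, and the second phase can be issued from a different session than the first):

BEGIN;
-- ... ordinary SQL statements ...
PREPARE TRANSACTION 'txn_ab';   -- phase 1: persist the transaction and detach it from this session

-- later, possibly from another connection:
COMMIT PREPARED 'txn_ab';       -- phase 2
-- or instead: ROLLBACK PREPARED 'txn_ab';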

Distributed transactions are most common, it seems, in Java applications. Full J2EE application servers typically come with a transaction manager component. For my examples I'll use an open source, standalone transaction manager, called Bitronix. I'm not particularly fond of using Java for simple scripts, though, so I've used JRuby for this demonstration code.

This script uses two databases, which I've called "athos" and "porthos". Each has the same schema, which provides a simple framework for the sharded bank example described above: a table for account names, another for ledger information, and a simple trigger to raise an exception when a transaction would bring a person's balance below $0. I'll first populate athos with Alice's account information. She gets $200 to start. Bob will go in the porthos database, with no initial balance.
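
The schema itself isn't reproduced here, but based on that description and the get_balance() calls below, it would look roughly like this (a sketch; every name except accounts, ledger, and get_balance is an assumption):

CREATE TABLE accounts (name text PRIMARY KEY);
CREATE TABLE ledger (
    name   text NOT NULL REFERENCES accounts (name),
    amount numeric NOT NULL
);

CREATE FUNCTION get_balance(text) RETURNS numeric AS $$
    SELECT COALESCE(sum(amount), 0) FROM ledger WHERE name = $1;
$$ LANGUAGE sql;

-- refuse any change that would leave an account with a negative balance
CREATE FUNCTION check_balance() RETURNS trigger AS $$
BEGIN
    IF get_balance(NEW.name) < 0 THEN
        RAISE EXCEPTION 'Rejecting operation; account owner %''s balance would drop below 0', NEW.name;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ledger_balance_check AFTER INSERT OR UPDATE ON ledger
    FOR EACH ROW EXECUTE PROCEDURE check_balance();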

5432 josh@athos# insert into accounts values ('Alice');
INSERT 0 1
5432 josh@athos*# insert into ledger values ('Alice', 200);
INSERT 0 1
5432 josh@athos*# commit;
COMMIT
5432 josh@athos# \c porthos
You are now connected to database "porthos".
5432 josh@porthos# insert into accounts values ('Bob');
INSERT 0 1
5432 josh@porthos*# commit;
COMMIT

Use of Bitronix is pretty straightforward. After setting up a few constants for easier typing, I create a Bitronix data source for each PostgreSQL database. Here I have to use the PostgreSQL JDBC driver's org.postgresql.xa.PGXADataSource class; "XA" is Java's protocol for two-phase commit, and requires JDBC driver support. Here's the code for setting up one data source; the other is just the same.
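
The constants themselves aren't shown in the post; a plausible version of that setup in JRuby (the class names are the standard Bitronix ones, but treat this as an assumption about the original script):

require 'java'
# the Bitronix jar and the PostgreSQL JDBC driver are assumed to be on the classpath

java_import 'bitronix.tm.resource.jdbc.PoolingDataSource'
java_import 'bitronix.tm.TransactionManagerServices'

PDS    = PoolingDataSource
TxnSvc = TransactionManagerServices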

ds1 = PDS.new
ds1.set_class_name 'org.postgresql.xa.PGXADataSource'
ds1.set_unique_name 'pgsql1'
ds1.set_max_pool_size 3
ds1.get_driver_properties.set_property 'databaseName', 'athos'
ds1.get_driver_properties.set_property 'user', 'josh'
ds1.init

Then I simply get a connection from each data source, instantiate a Bitronix TransactionManager object, and begin a transaction.

c1 = ds1.get_connection
c2 = ds2.get_connection
btm = TxnSvc.get_transaction_manager
btm.begin

Within my transaction, I just use normal JDBC commands to debit Alice and credit Bob, after which I commit the transaction through the TransactionManager object. If this transaction fails, it raises an exception, which I can capture using Ruby's begin/rescue exception handling, and roll back the transaction.

begin
  s2 = c2.prepare_statement "INSERT INTO ledger VALUES ('Bob', 100)"
  s2.execute_update
  s2.close

  s1 = c1.prepare_statement "INSERT INTO ledger VALUES ('Alice', -100)"
  s1.execute_update
  s1.close

  btm.commit
  puts "Successfully committed"
rescue
  puts "Something bad happened: " + $!
  btm.rollback
end

When I run this, Bitronix gives me a bunch of output, which I haven't bothered to suppress, but among it all is the "Successfully committed" string I told it to print on success. Since Alice is debited $100 each time we run this, and she started with $200, we can run it twice before hitting errors. On the third time, we get this:

Something bad happened: org.postgresql.util.PSQLException: ERROR: Rejecting operation; account owner Alice's balance would drop below 0

This is our trigger firing, to tell us that we can't debit Alice any more. If I look in the two databases, I can see that everything worked as planned:

5432 josh@athos*# select get_balance('Alice');
 get_balance 
-------------
           0
(1 row)

5432 josh@athos*# \c porthos 
You are now connected to database "porthos".
5432 josh@porthos# select get_balance('Bob');
 get_balance 
-------------
         200
(1 row)

Remember I've run my script three times, but Bob has only been credited $200, because that's all Alice had to start with.

Efficiency of find -exec vs. find | xargs

This is a quick tip for anyone writing a cron job to purge large numbers of old files.

Without xargs, this is a pretty common way to do such a purge, in this case of all files older than 31 days:

find /path/to/junk/files -type f -mtime +31 -exec rm -f {} \;

But that executes rm once for every single file to be removed, which adds a lot of overhead just to fork and exec rm so many times. Even on modern operating systems with efficient fork, this can easily increase the I/O, load, and runtime by 10 times or more compared to running a single rm command with many file arguments.

Instead do this:

find /path/to/junk/files -type f -mtime +31 -print0 | xargs -0 -r rm -f

That will run xargs once for each very long list of files to be removed, so the overhead of fork & exec is incurred very rarely, and the job can spend most of its effort actually unlinking files. (The xargs -r option says not to run the command if there is no input to xargs.)

How long can the argument list to xargs be? It depends on the system, but xargs --show-limits will tell us. Here's output from a RHEL 5 x86_64 system (using findutils 4.2.27):

% xargs --show-limits                                                                                                   
Your environment variables take up 2293 bytes                                                                                        
POSIX lower and upper limits on argument length: 2048, 129024                                                                        
Maximum length of command we could actually use: 126731                                                                              
Size of command buffer we are actually using: 126731

The numbers are similar on Debian Etch and Lenny.

And here's output from an Ubuntu 10.04 x86_64 system (using findutils 4.4.2):

% xargs --show-limits
Your environment variables take up 1370 bytes
POSIX upper limit on argument length (this system): 2093734
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2092364
Size of command buffer we are actually using: 131072

Roughly 2 megabytes of arguments is a lot. But even the POSIX minimum of 4 kB is a lot better than processing one file at a time.

It doesn't usually make much of a difference, but we can tune even more. Make sure the maximum number of files is processed at one time by first changing to the base directory so that the relative pathnames are shorter:

cd /path/to/junk/files && find . -type f -mtime +31 -print0 | xargs -0 -r rm -f

That way each file argument is shorter, e.g. ./junkfile compared to /path/to/junk/files/junkfile.

The above assumes you're using GNU findutils, which includes find -print0 and xargs -0 for processing ASCII NUL-delimited filenames for safety when filenames include embedded spaces, newlines, etc.
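
Incidentally, if GNU findutils isn't available, POSIX find can do its own batching: terminating -exec with + instead of \; makes find append as many pathnames as will fit onto each rm invocation, much like xargs:

find /path/to/junk/files -type f -mtime +31 -exec rm -f {} +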

PostgreSQL: per-version .psqlrc

File this under "you learn something new every day." I came across this little tidbit while browsing the source code for psql: you can have a per-version .psqlrc file which will be executed only by the psql associated with that version. Just name the file .psqlrc-$version, substituting psql's full version for the $version token. So for PostgreSQL 8.4.4, psql would look for a file named .psqlrc-8.4.4 in your $HOME directory.

It's worth noting that the version-specific .psqlrc file requires the full minor version, so you cannot currently define (say) an 8.4-only version which applies to all 8.4 psqls. I don't know if this feature gets enough mileage to make such a modification worth it, but it would be easy enough to make each minor version's .psqlrc-$version a symlink to a single shared file, for example:
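
# share one rc file across the 8.4.x releases (file names assumed)
ln -s ~/.psqlrc-8.4 ~/.psqlrc-8.4.4
ln -s ~/.psqlrc-8.4 ~/.psqlrc-8.4.5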

This seems of most interest to developers, who may simultaneously run many versions of psql which may have incompatible settings, but it could come in handy for regular users as well.

PostgreSQL: Dynamic SQL Function

Sometimes when you're doing something in SQL, you find yourself doing something repetitive, which naturally lends itself to the desire to abstract out the boring parts. This pattern is often prevalent when doing maintenance-related tasks such as creating or otherwise modifying DDL in a systematic kind of way. If you've ever thought, "Hey, I could write a query to handle this," then you're probably looking for dynamic SQL.

The standard approach to using dynamic SQL in PostgreSQL is plpgsql's EXECUTE statement, which takes a text argument as the SQL statement to execute. One technique fairly well-known on the #postgresql IRC channel is to create a function which essentially wraps the EXECUTE statement, commonly known as exec(). Here is the definition of exec():

CREATE FUNCTION exec(text) RETURNS text AS $$ BEGIN EXECUTE $1; RETURN $1; END $$ LANGUAGE plpgsql;

Using exec() then takes the form of a SELECT query with the appropriately generated query to be executed passed as the sole argument. We return the generated query text to make it easier to audit what was actually executed. Some examples:

SELECT exec('CREATE TABLE partition_' || generate_series(1,100) || ' (LIKE original_table)');
SELECT exec('ALTER TABLE ' || attrelid::regclass || ' DROP COLUMN foo') FROM pg_attribute WHERE attname = 'foo';

Some notes about the exec() function: since the generated SQL statement is being run inside a function, it is not run in a top-level transaction, so some commands will not work, including CREATE/DROP DATABASE, ALTER TABLESPACE, VACUUM, etc.

Starting in PostgreSQL 9.0, the plpgsql language will be pre-installed in all new databases, which will make this recipe even easier to use.

Localize $@ in DESTROY

I have been conditioned now for many years in Perl to trust the relationship of $@ to its preceding eval. The relationship goes something like this: if you have string or block eval, immediately after its execution, $@ will either be false or it will contain the die message of that eval (or the generic "Died at ..." message if none is provided). Implicit here is that evals contained within an eval have their effects on $@ concealed, unless the containing eval "passes on" the inner eval's die.

To quickly demonstrate:

use strict;
use warnings;

eval {
    print "Some stuff\n";
    eval {
        die 'Oops. Bad inner eval';
    };

    printf '$@ in outer eval: %s', $@;
};

printf '$@ after outer eval: %s', $@;
print $/;

produces the following output:

[mark@sokt ~]$ perl demo.pl 
Some stuff
$@ in outer eval: Oops. Bad inner eval at demo.pl line 7.
$@ after outer eval: 
[mark@sokt ~]$ 

Only if the containing eval itself dies do we find any data in $@:

use strict;
use warnings;

eval {
    print "Some stuff\n";
    eval {
        die 'Oops. Bad inner eval';
    };

    printf '$@ in outer eval: %s', $@;
    die 'Uh oh. Bad outer eval, too';
};

printf '$@ after outer eval: %s', $@;
print $/;

which produces:

[mark@sokt ~]$ perl demo.pl 
Some stuff
$@ in outer eval: Oops. Bad inner eval at demo.pl line 7.
$@ after outer eval: Uh oh. Bad outer eval, too at demo.pl line 11.

[mark@sokt ~]$ 

Why am I covering this, well known to any serious Perl programmer? Because I was caught off guard troubleshooting for a client last week when the result of an inner eval "leaked" through, affecting $@ of the containing eval. Because I was so conditioned to the stated relationship between eval and $@, it took me quite some time before I even opened up to the possibility.

It turned out the hitch had to do with garbage collection. The key was that the inner eval in question was initially called from a routine within an object's DESTROY method. As I discovered, at least in Perl 5.10, if a containing eval dies, causing an object to go out of scope, and that object's DESTROY itself executes an eval, $@ reflects the exit condition of the eval from within DESTROY, and not that of the containing eval. Even more strange, this is only true if the containing eval dies. If instead the containing eval completes, then that same dying eval within DESTROY does not affect the condition of $@ after the containing eval. It will still be false, as (IMO) it should be.

So, some code demonstrating each situation. We have 3 conditions:

  • Containing eval dies, eval within DESTROY dies
  • Containing eval dies, eval within DESTROY does not die
  • Containing eval does not die, and eval is called within DESTROY, die or not.

Sample code demonstrating 1st condition:

use strict;
use warnings;

package Obj;

sub DESTROY {
    eval { die 'in DESTROY' };
}

package main;

eval {
    my $obj = {};

    bless $obj, 'Obj';
    die 'in main eval';

    print "Super important stuff that must finish or we really need to know about it!\n";

    return 1;
};

if ($@) {
    printf '$@ comes from code %s', $@;
}
else {
    print "Happy days! Our eval code ran to completion. Woot!\n";
}

Output as follows:

[mark@sokt ~]$ perl test1.pl 
$@ comes from code in DESTROY at test1.pl line 7.
[mark@sokt ~]$ 

Demo of 2nd condition:

use strict;
use warnings;

package Obj;

sub DESTROY {
    eval { 1 };
}

package main;

eval {
    my $obj = {};

    bless $obj, 'Obj';
    die 'in main eval';

    print "Super important stuff that must finish or we really need to know about it!\n";

    return 1;
};

if ($@) {
    printf '$@ comes from code %s', $@;
}
else {
    print "Happy days! Our eval code ran to completion. Woot!\n";
}

Output as follows:

[mark@sokt ~]$ perl test2.pl 
Happy days! Our eval code ran to completion. Woot!
[mark@sokt ~]$ 

Notice how particularly insidious the above is. We not only don't know what the error was from the eval block that immediately precedes the evaluation of $@, but we actually think it succeeds!

Finally, the 3rd condition:

use strict;
use warnings;

package Obj;

sub DESTROY {
    eval { die 'in DESTROY' };
}

package main;

eval {
    my $obj = {};

    bless $obj, 'Obj';

    print "Super important stuff that must finish or we really need to know about it!\n";

    return 1;
};

if ($@) {
    printf '$@ comes from code %s', $@;
}
else {
    print "Happy days! Our eval code ran to completion. Woot!\n";
}

Output as follows:

[mark@sokt ~]$ perl test3.pl 
Super important stuff that must finish or we really need to know about it!
Happy days! Our eval code ran to completion. Woot!
[mark@sokt ~]$ 

So, fortunately, case 3 contains the leak when the outer eval completes successfully. We don't introduce the worst possible situation: a successful eval that is subsequently treated as a failure. However, cases 1, and especially 2, are bad enough.

Now that I know all this, the solution is thankfully simple: when writing a class that supplies a DESTROY method, always localize $@ inside it. It doesn't matter whether DESTROY itself executes any evals; if the code it calls runs one anywhere in the stack, the problem is introduced. A local $@ provides full protection.

A rerun of test1 but with localization provides a much more expected result:

use strict;
use warnings;

package Obj;

sub DESTROY {
    local $@;
    eval { die 'in DESTROY' };
}

package main;

eval {
    my $obj = {};

    bless $obj, 'Obj';
    die 'in main eval';

    print "Super important stuff that must finish or we really need to know about it!\n";

    return 1;
};

if ($@) {
    printf '$@ comes from code %s', $@;
}
else {
    print "Happy days! Our eval code ran to completion. Woot!\n";
}

Output as follows:

[mark@sokt ~]$ perl test1.pl 
$@ comes from code in main eval at test1.pl line 17.
[mark@sokt ~]$ 

and test2, which now doesn't lie to us about the success of the eval of interest:

use strict;
use warnings;

package Obj;

sub DESTROY {
    local $@;
    eval { 1 };
}

package main;

eval {
    my $obj = {};

    bless $obj, 'Obj';
    die 'in main eval';

    print "Super important stuff that must finish or we really need to know about it!\n";

    return 1;
};

if ($@) {
    printf '$@ comes from code %s', $@;
}
else {
    print "Happy days! Our eval code ran to completion. Woot!\n";
}

Output as follows:

[mark@sokt ~]$ perl test2.pl 
$@ comes from code in main eval at test2.pl line 17.
[mark@sokt ~]$ 

PostgreSQL: Migration Support Checklist

A database migration (be it from some other database to PostgreSQL, or even from an older version of PostgreSQL to a nice shiny new one) can be a complicated procedure with many details and many moving parts. I've found it helpful to construct a list of questions in order to make sure that you're considering all aspects of the migration and to gauge the scope of what will be involved. This list includes questions we ask our clients; feel free to contribute your own additional considerations or suggestions.

Technical questions:

  1. Database servers: How many database servers do you have? For each, what are the basic system specifications (OS, CPU architecture, 32- vs 64-bit, RAM, disk, etc)? What kind of storage are you using for the existing database, and what do you plan to use for the new database? Direct-attached storage (SAS, SATA, etc.), SAN (what vendor?), or other? Do you use any configuration management system such as Puppet, Chef, etc.?
  2. Application servers and other remote access: How many application servers do you have? For each, what are the basic system specifications (OS, CPU architecture, 32- vs 64-bit, RAM, disk, etc)? Do you use any configuration management system such as Puppet, Chef, etc.? What other network considerations are there? Is ODBC used, or SSL transport, any VPNs? Are multiple datacenters involved? How about egress/ingress firewalls?
  3. Middleware: Do you currently use any sort of connection pooling, load balancing, or other middleware between your application and database servers?
  4. Data needs: Can you describe your data access patterns? i.e., is the majority of your data historical and rarely accessed? Are there any existing reporting needs that will need to be duplicated on the PostgreSQL system? Do you already have reports of database usage, including traffic levels, frequent or intensive queries, etc?
  5. Size: What kind of transaction volume do you see? How large are your databases? How many tables do you have and what is the size of the larger ones? How many users or database connections will you need to support?
  6. Backups: What are your current backup policies/procedures? How will these need to change with the move to PostgreSQL?
  7. Replication/load balancing: What kind of system redundancy do you currently have/need? Do you have any kind of database load-balancing or master-slave replication?
  8. Monitoring: What is the current monitoring/in-house support infrastructure? What needs to be duplicated, and can any portion of this facility be reused?
  9. Interfaces: What language are your applications written in, and what drivers exist to connect to your current database? Will a compatible driver be available for PostgreSQL in your language of choice?
  10. Extensions: Are you currently using any in-database procedures or functionality (i.e., in PL/SQL or another embedded language of choice)? If so, how many? What will the difficulty be in porting these functions to PostgreSQL?

And a couple of business-related questions:

  1. Scheduling: What is the timeframe for transition? When can appropriate downtime be scheduled? How much database downtime can you afford?
  2. Staffing: Do you currently have in-house DBAs to manage the servers, etc on a day-to-day basis? Is there anyone with PostgreSQL experience or familiarity on staff?

Being able to answer all of these questions is critical to formulating a migration plan and carrying out a migration successfully.

Particularly with the impending (July 2010) end of life for previous PostgreSQL releases 7.4, 8.0 and (in November 2010) 8.1, a database migration may be on your radar. End Point is one of many professional PostgreSQL support companies who would be happy to assist you in your transition.

Spree: Working with Sample Product Data

It's taken me a bit of time to gain a better understanding of working with Spree sample data or fixtures, but now that I am comfortable with it I thought I'd share some details. The first thing you might wonder is why should you even care about sample data? Well, in our project, we had a few motivations for creating sample data:

  1. Multiple developers: consistent sample data provides a common starting point during development. End Point offers SpreeCamps, a hosting solution that combines the open source Spree technology with devcamps to allow multiple development and staging instances of a Spree application. In a recent project, we had two developers working on different aspects of the custom application in SpreeCamps; creating meaningful sample data allowed each developer to work from the same data starting point.
  2. Unit testing. Another important element of our project was adding unit tests covering our custom functionality. Consistent test sample data gave us the ability to test individual methods and functionality with confidence.
  3. Application testing. In addition to unit testing, adding sample data gives the ability to efficiently test the application repeatedly with fresh sample data.

Throughout development, our standard practice is to repeatedly run rake db:bootstrap SKIP_CORE=1 AUTO_ACCEPT=1. Running bootstrap with these arguments will not create Spree's core sample data set, but it will set up the core's default data, which includes some base zones, zone members, countries, states, and roles - data that is essential for the application to work.

Product Data Model


Product data relationship in Spree

The first data I create is the product data. As you can see from the image above, product data may have relationships with other tables including option_types, option_values, and taxons, as well as the join tables that create has-and-belongs-to-many relationships between products and these elements.

The simplest form of sample data might include one test product and its master variant, shown below. If you do not define a master variant for a product, the product page will crash, as a master variant is required to display the product price.

products.yml
test_product:
  id: 1 
  name: Test Product
  description: Lorem ipsum...
  available_on: <%= Time.zone.now.to_s(:db) %>
  count_on_hand: 10
  permalink: test-product
variants.yml
test_variant:
  product: test_product
  price: 10.00
  cost_price: 5.00
  count_on_hand: 10
  is_master: true
  sku: 1-master

Option Types and Option Values

To expand on this, you might be interested in adding option types and option values to create size variants of this product, shown below. The option type and option value data structure provides a flexible architecture for creating product variants, or multiple varieties of a single product such as different sizes, colors, or combinations of these option values.

option_types.yml
size:
  name: size
  presentation: Size
option_values.yml
small:
  name: Small
  presentation: Small
  option_type: size
large:
  name: Large
  presentation: Large
  option_type: size
product_option_types.yml
test_product_size:
  product: test_product
  option_type: size
variants.yml
# ...  master variant
small_variant:
  product: test_product
  option_values: small
  price: 10.00
  cost_price: 5.00
  count_on_hand: 10
  sku: 1-small
large_variant:
  product: test_product
  option_values: large
  price: 20.00
  cost_price: 10.00
  count_on_hand: 10
  sku: 1-large

Taxonomies and Taxons

Another opportunity for expansion on sample product data is the taxonomy structure, which is very flexible. A root taxonomy can be thought of as a tree trunk with branches; products can be assigned to any number of branches. If we assume you have multiple test products, you might set up the following test data:

taxonomies.yml
category:
  name: Category
brand:
  name: Brand
taxons.yml
category_root:
  id: 1
  name: Category
  taxonomy: category
  permalink: c/
jackets:
  id: 2
  name: Jackets
  taxonomy: category
  permalink: c/jackets/
  parent_id: 1
  products: test_product, test_product2, test_product3
pants:
  id: 3
  name: Pants
  taxonomy: category
  permalink: c/pants/
  parent_id: 1
  products: test_product4, test_product5, test_product6
brand_root:
  id: 4
  name: Brand
  taxonomy: brand
  permalink: b/
brand_one:
  id: 5
  name: Brand One
  taxonomy: brand
  permalink: b/brand-one/
  parent_id: 4
  products: test_product, test_product3, test_product5
brand_two:
  id: 6
  name: Brand Two
  taxonomy: brand
  permalink: b/brand-two/
  parent_id: 4
  products: test_product2, test_product4, test_product6

I also needed to include the taxons.rb (used in the Spree core sample data) to assign products to taxons correctly.

taxons.rb
Taxon.rebuild!
Taxon.all.each{|t| t.send(:set_permalink); t.save}


Example taxonomies created with Spree sample data

Product Images

My last step in creating Spree sample data is to add product images. I've typically added the images via the Spree backend first, and then copied the images to my site extension directory.

# upload Spree images

# create sample image directory
mkdir RAILS_ROOT/vendor/extensions/site/lib/tasks/sample/

# copy the uploaded images to the sample image directory
cp -r RAILS_ROOT/public/assets/products/ RAILS_ROOT/vendor/extensions/site/lib/tasks/sample/

After uploading and copying the images over, I include the image information in assets.yml. The ID for each asset must match the name of the directory containing the multiple image sizes. For example, the directory RAILS_ROOT/vendor/extensions/site/lib/tasks/sample/products/1/ contains directories original, large, product, small, and mini with correspondingly sized images.
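
For the asset with id 1 defined below, the layout on disk would look like this (file name taken from assets.yml; the listing is illustrative):

RAILS_ROOT/vendor/extensions/site/lib/tasks/sample/products/1/
  original/blue_sky.jpg
  large/blue_sky.jpg
  product/blue_sky.jpg
  small/blue_sky.jpg
  mini/blue_sky.jpg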

assets.yml
i1:
  id: 1
  viewable: test_product
  viewable_type: Product
  attachment_content_type: image/jpg
  attachment_file_name: blue_sky.jpg
  attachment_width: 1024
  attachment_height: 683
  type: Image
  position: 1

And finally, I use a modified version of the Spree's core products.rb file to copy over product images during bootstrap:

products.rb
require 'find'
# make sure the product images directory exists
FileUtils.mkdir_p "#{RAILS_ROOT}/public/assets/products/"

# make product images available to the app
target = "#{RAILS_ROOT}/public/assets/products/"
source = "#{RAILS_ROOT}/vendor/extensions/site/lib/tasks/sample/products/"

Find.find(source) do |f|
  # omit hidden directories (SVN, etc.)
  if File.basename(f) =~ /^[.]/
    Find.prune
    next
  end

  src_path = source + f.sub(source, '')
  target_path = target + f.sub(source, '')

  if File.directory?(f)
    FileUtils.mkdir_p target_path
  else
    FileUtils.cp src_path, target_path
  end
end

With my sample data defined in my Spree site extension, I run rake db:bootstrap SKIP_CORE=1 AUTO_ACCEPT=1 to create the above products, variants, and taxonomy structure. I commit my changes to the git repository, and other developers can work with the same set of products, variants, and taxonomies, including product images. During development, I also add unit tests to test model methods that interact with our sample data. An alternative to the approach described in this article is to dump an entire database, reimport it, and manage the sample images manually, but I find that the approach described here forces you to understand the Spree data model better.

Sample product image created with the sample data above.

In addition to setting up sample product data, I've worked through creating sample orders, shipping configuration, and tax configuration. I hope to discuss these adventures in the future.

Learn more about End Point's Ecommerce Development or Ruby on Rails Ecommerce Services.

Why is my load average so high?

One of the most common ways people notice there's a problem with their server is when Nagios, or some other monitoring tool, starts complaining about a high load average. Unfortunately this complaint carries with it very little information about what might be causing the problem. But there are ways around that. On Linux, where I spend most of my time, the load average represents the average number of processes in either the "run" or "uninterruptible sleep" states. This code snippet will display all such processes, including their process ID and parent process ID, current state, and the process command line:

#!/bin/sh

ps -eo pid,ppid,state,cmd |\
    awk '$3 ~ /[RD]/ { print $0 }'

Most of the time, this script has simply confirmed what I already anticipated, such as, "PostgreSQL is trying to service 20 times as many simultaneous queries as normal." On occasion, however, it's very useful, such as when it points out that a backup job is running far longer than normal, or when it finds lots of "[pdflush]" processes, indicating that the system was working overtime to write dirty pages to disk. I hope it can be similarly useful to others.

Views across many similar tables

An application I'm working on has a host of (a dozen or so) status tables, each containing various rows that reflect the state of associated rows in other tables. For instance:

Table "public.inventory"
...
status_code      | character varying(50)       | not null

Table "public.inventory_statuses"
code          | character varying(50)       | not null
display_label | character varying(70)       | not null

SELECT * FROM inventory_statuses;

   code   | display_label
----------+---------------
 ordered  | Ordered
 shipped  | Shipped
 returned | Returned
 repaired | Repaired
 etc.

Several of the codes are common to several tables. For instance, "void" is a status that occurs in seven tables. The application cares about this; there are code-level triggers that will respond to a change of status to "void" in one table, and pass that information along to another table higher up the chain.

Since I wasn't present at the birth of the system (nor do I have unlimited memory to keep 180+ codes in my head), I needed a way to answer the question, "In which table(s) does status 'foo' occur?" This was made rather easier by attention to detail early on: each of the status tables was named "*_statuses"; each primary key was named "code"; and each human-readable description field was named "display_label". I wrote a PL/pgSQL function to create a view spanning all the tables. (I could have just created the SQL by hand, but I wanted a way to reproduce this effort later, if tables are added, dropped, or modified.)

CREATE FUNCTION create_all_statuses()
RETURNS VOID
LANGUAGE 'plpgsql'
AS $$
DECLARE
   stmt TEXT;
   tbl RECORD;
BEGIN
   stmt := '';
   FOR tbl IN EXECUTE $SQL$
SELECT DISTINCT table_name
FROM information_schema.columns a
JOIN information_schema.columns b
USING (table_name)
JOIN information_schema.tables t
USING (table_name)
WHERE a.column_name = 'code'
AND   b.column_name = 'display_label'
AND   table_name ~ '_statuses$'
AND   t.table_type  = 'BASE TABLE'
$SQL$
   LOOP
       IF (LENGTH(stmt) > 0)
       THEN
           stmt := stmt || ' UNION ';
       END IF;
       stmt := stmt || 'SELECT code, display_label, ' ||
           quote_literal(tbl.table_name) ||
           ' AS table_name FROM ' ||
           quote_ident(tbl.table_name);
   END LOOP;

   EXECUTE 'CREATE VIEW all_statuses AS ' || stmt;
   RETURN;
END;
$$;
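
The generated view definition comes out looking like this (trimmed here to three of the tables):

CREATE VIEW all_statuses AS
SELECT code, display_label, 'inventory_statuses' AS table_name FROM inventory_statuses
UNION
SELECT code, display_label, 'parcel_statuses' AS table_name FROM parcel_statuses
UNION
SELECT code, display_label, 'pick_list_statuses' AS table_name FROM pick_list_statuses;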

Now it's easy to answer the question:

select * from all_statuses where code = 'void';

 code | display_label |      table_name
------+---------------+---------------------
 void | Void          | inventory_statuses
 void | Void          | parcel_statuses
 void | Void          | pick_list_statuses
 etc.

If your database uses boilerplate columns such as "last_modified" or "date_created" to record timestamps on rows, you could use similar logic to create a view that would tell you which tables were the most recently modified.
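
A sketch of what the generated view for that variation might look like, with the table names assumed:

CREATE VIEW table_modifications AS
SELECT 'inventory' AS table_name, max(last_modified) AS last_modified FROM inventory
UNION ALL
SELECT 'parcels', max(last_modified) FROM parcels;

-- which tables changed most recently?
SELECT * FROM table_modifications ORDER BY last_modified DESC;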

Spree vs Magento: Feature List Revisited

A little over a month ago, I wrote an article on Spree vs Magento Features. Recently, a client asked me to describe the features mentioned in that article. I thought this was another great opportunity to expand on my response to the client. So, here I am, revisiting ecommerce features in Spree and Magento. The original article can be referenced to compare availability of these features in Spree and Magento.

Features on a Single Product or Group of Products

  • Product reviews and/or ratings: functionality to allow customers to review and rate products. See a Backcountry.com product page for an example.
  • Product QnA: functionality to allow customers to ask and answer questions on products. See a Backcountry.com product page for an example.
  • Product SEO (URL, title, meta data control): functionality to allow site administrators to manage product URLs, product page titles, and product meta data.
  • Advanced/flexible taxonomy: functionality to build a custom taxonomy / navigation structure for product browsing. For example, build multiple categories and subcategories with complex hierarchy. The taxonomy at Spree's demo includes two taxonomies, brand and category, with subcategories in each.
  • SEO for taxonomy pages: functionality to allow site administrators to manage taxonomy URLs, taxonomy page titles, and taxonomy meta data.
  • Configurable product search: functionality to allow developers and site administrators to adjust parameters used in search, such as how many products to show per page, and whether to show products with no on hand stock.
  • Bundled products for discount: functionality to allow site administrators to create bundles or group products together and then apply a discount to the entire bundle. For example, product X, Y and Z purchased together will yield a $10.00 discount.
  • Recently viewed products: functionality to show customers products they recently visited. This can be displayed on other product pages or the navigation pages to aid in navigation back to those products if the user would like to revisit the products. See the image below for an example of this functionality in action at Paper Source.
  • Soft product support/downloads: functionality to sell soft products such as mp3 or pdf files.
  • Product comparison: functionality to allow customers to compare multiple products, such as a comparison of price or technical features. See Backcountry.com for an example.
  • Upsell: functionality to encourage the customer to purchase a similar product from a higher price point, or to purchase an add on, may or may not include the functionality to allow site administrators to manage the upsell products.
  • Cross sell: functionality to encourage the customer to purchase related items, may or may not include the functionality to allow site administrators to manage the cross sell products. See the image below for an example of this functionality at Backcountry.com after adding an item to the cart.
  • Related items: functionality to display related items on product pages, may or may not include the functionality to allow site administrators to manage the related products.
  • RSS feed of products: functionality to produce an RSS feed with product detail releases. steepandcheap.com offers an RSS feed, however, they are unique in that they offer a one-deal-at-a-time business model, so the RSS feed contains a stream of products for sale. This may or may not include the functionality to allow site administrators to manage the RSS feed contents.
  • Multiple images per product: functionality to allow site administrators to upload multiple images per product. See a Spree demo product for an example.
  • Product option selection (variants): functionality to allow site administrators to create and manage variants for products to offer multiple variants per product, such as variants by size and color. See a Spree demo product for an example.
  • Wish list: functionality to allow customers to create product wish lists. See Amazon.com for a description of their wish list functionality.
  • Send product email to friend: functionality to allow customers to send emails to their friend to visit a specific product.
  • Product tagging / search by tagging: functionality to allow site administrators to assign tags to products for navigation or searching. See CityPass's blog for an example of tag use in a content management system; in ecommerce context, the tags would navigate to a set of products instead of a set of blog articles.
  • Breadcrumbs: functionality that renders the product navigation hierarchy on navigation and product pages to allow customers to navigate to previous pages visited. See a Spree demo navigation page for an example.

CMS Features

  • Blogging functionality: functionality to allow site administrators to create, manage and display blog articles.
  • Static page management: functionality to allow site administrators to create, manage, and display static pages such as “About Us”, “Information”.
  • Media management: functionality to allow site administrators to create, manage, and display media such as images, video, audio. See the image below for an example of WordPress's dashboard for content management.
  • Contact us form: functionality to allow customers to submit a request for contact. See End Point's contact page for an example.
  • Polls: functionality to allow site administrators to create and manage basic polls, functionality to allow customers to submit answers to basic polls.

Checkout Support

  • One page checkout: functionality to allow customers to complete checkout on one page, rather than stepping through separate address and payment pages. See Paper Source or Backcountry.com's checkout processes for examples.
  • Guest checkout: functionality to check out without creating a user account. Check out at the Spree demo without logging in for an example.
  • SSL support: functionality to configure use of SSL during checkout. In Spree's case, the site administrator may turn SSL off during development and on during production.
  • Discounts: functionality to allow customers to apply discount coupons to orders for a percentage or dollar amount reduction.
  • Gift certificates: functionality to allow customers to purchase and use gift certificates as credit for purchases.
  • Saved shopping cart: the functionality to save the products in a customer's shopping cart so their shopping cart will be pre-populated on their next visit.
  • Saved addresses: functionality to allow customers to create and manage addresses to be selected during checkout for billing or shipping rather than requiring the customer to re-enter their address. See the image below for an example of using saved addresses during Backcountry.com's checkout.

Shipping Support

  • Real time rate lookup (UPS, USPS, FedEx): the functionality to request rates from UPS, USPS, or FedEx during checkout for more accurate rate pricing rather than using a flat shipping rate.
  • Order tracking: functionality to allow the site administrators to enter order tracking information and allow the user to review that tracking information, may or may not include sharing this information in a “Your order has been shipped” email.
  • Multiple shipments per order: functionality to allow site administrators to split orders into multiple packages if specific products can not be shipped at the same time or can not be shipped together. See the image below for an example of Spree's backend shipping interface.
  • Complex rate lookup: functionality to calculate ship rates based on weight or price.
  • Free shipping: functionality to offer free shipping.

Payment Support

  • Multiple payment gateways: integration of multiple payment gateways such as Authorize.NET, Beanstream, Paypal, SagePay, etc.
  • Authorize.Net: integration of the Authorize.Net payment gateway; may or may not include the use of profiles (Authorize.Net CIM).
  • Authorize and capture versus authorize only: functionality to allow site administrators to configure whether or not credit cards should be authorized only during checkout completion or authorized and captured. If the credit card is authorized only, site administrators may finalize an order by capturing on the backend interface.
  • Google Checkout: integration of Google Checkout.
  • Paypal Express: integration of Paypal Express.

Admin Features

  • Sales reporting: the functionality to display sales statistics, such as profits on sales or year-over-year sales.
  • Sales management tools: functionality to allow site administrators to create and manage product sales. For example, the site administrator might create a 50% off sale to cover 25% of the products to begin in a week and end in two weeks.
  • Inventory management: functionality to allow the site administrator to manage individual inventory units and their current state (on_hand, shipped, backordered) and order assignment.
  • Purchase order management: functionality to allow the site administrator to create and manage purchase orders. See the image below for an example of a potential backend interface for purchase orders in Spree.
  • Multi-tier pricing for quantity discounts: functionality to allow customers to buy large quantities of products at a discount, the functionality to allow the site administrator to manage the large quantity product discounts.
  • Landing page tool: functionality to create custom landing pages that may include targeted content or products, typically used for advertising or marketing.
  • Batch import and export of products: functionality to allow the site administrator to import and export products via admin interface or script rather than entering each product individually.
  • Multiple sales reports: See "Sales reporting" above.
  • Order fulfillment: functionality to allow site administrators to manage fulfillment (inventory selection, shipping) of orders.
  • Tax Rate Management: functionality to manage tax rates per zone, where zones are defined by states and/or countries. Note that in Spree, zones can only be defined by a combination of states and countries and tax rates can be tied to one or more zones.

User Account Features

  • User addresses: See "Saved addresses" above.
  • Feature rich user preferences: integration of various user account tools, such as address management, profile management, order review, etc.
  • Order tracking history: functionality to allow a customer to lookup their order history, may or may not include order tracking information.

Site Wide Features

  • Extensibility: functionality to extend the ecommerce core with modular components.
  • Appearance Theming: functionality to change the appearance of the site.
  • Ability to customize appearance at category or browsing level: functionality to create and manage custom and varied appearances for product browsing pages. For example, the categories "Jackets" and "Pants" may have different appearances, motivated by marketing or advertising.
  • Localization: the functionality to translate the ecommerce site to a different language. See the image below for a small example of localization in action in Spree.
  • Multi-store, single admin support: functionality to manage multiple stores from a single administrative location. An example of this might include http://store1.endpoint.com/, and http://store2.endpoint.com/, where both can be managed at http://admin.endpoint.com/.
  • Support for multiple currencies: functionality to translate product prices between currencies.
  • Web service API: functionality to retrieve data from the ecommerce application for third party use. See Spree's documentation on the Spree API.
  • System wide SEO: general site-wide SEO functionality including features such as sitemap, googlebase integration, URL management, page title management.
  • Google Analytics: functionality to allow site administrators to create and manage Google Analytics Ids, functionality to track traffic and conversion on the frontend.
  • Active community: an active developer community with frequent core and extension contributions.

Most of the features described above are well known to ecommerce developers, but this list might also serve as a good checklist to review with a potential client during the estimate process to make sure expectations of an ecommerce platform are managed, especially with a young platform such as Spree where some features are not yet included in the core.

Learn more about End Point's Ecommerce Development or Ruby on Rails Ecommerce Services.

Mock Testing with Perl

I'll start by saying that I probably should have started with Test::MockObject and saved myself all of this trouble. But sometimes things don't work out that way.

So, I'm building unit tests in Perl the hard way. By the hard way, I mean that I am constructing ever more elaborate, interdependent, complex, and brittle test data sets to test the functions that I am hacking on. The data model is moderately complex, so there really isn't any way around it (since I'm doing it the hard way, after all).

At one point, one function (which I am not testing) returns a result that I need for the function I am testing. The problem is that it reaches pretty far away into a section of the data model that I'd rather not set up test data for at the moment just to get that one value. This is where I'm sitting there wishing I had mock objects more than usual, since this would be a perfect place to mock the method. Since I couldn't be bothered to see if someone had written such a handy module, I looked for a hard way to do it. Turns out that there is one.

It's not actually hard, but it could be considered complex if you are not familiar with typeglobs and the workings of the symbol table in Perl. A good discussion can be found in the Perl Cookbook in ch10.14.

In the following example, the method Base::Shipping::Package::weight is called at some point inside create_shipment, and being able to call it is imperative to completing create_shipment. In my case, I have to have a successful result from create_shipment in order to test process_shipment.

{
  # replace the real weight method for the duration of this block only
  local *Base::Shipping::Package::weight = \&test_weight;

  my $shipment = $class->create_shipment($shipment_data);  # $shipment_data assumed built earlier
  my $result   = $class->process_shipment($shipment);
  test($result);  # stand-in for the actual assertions
}

sub test_weight { 4.0 }

Here, the local call redefines the weight method inside the scope of the block. This turns out to be fairly convenient given that I already had the structure in place to test things this way. There are possibly other cases where something like this might make more sense than using Test::MockObject in the first place, although I am somewhat skeptical.
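
For comparison, Test::MockModule (a relative of Test::MockObject designed for overriding methods in an existing class) wraps this same typeglob trick; a sketch using the same assumed classes and test data:

use Test::MockModule;

my $mock = Test::MockModule->new('Base::Shipping::Package');
$mock->mock( weight => sub { 4.0 } );   # restored when $mock goes out of scope

my $shipment = $class->create_shipment($shipment_data);
my $result   = $class->process_shipment($shipment);
test($result);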