End Point

News

Welcome to End Point's blog

Ongoing observations by End Point people.

RPM building: Fedora's _sharedstatedir

When Red Hat Enterprise Linux does not offer packages that we need, EPEL (Extra Packages for Enterprise Linux) often has what we want, kept compatible with RHEL. When EPEL also doesn't have a package, or we need a newer release than is offered, we rebuild packages from Fedora, which has consistently high-quality packages even in its "rawhide" development phase. We then distribute our packages in several compatibility-oriented Yum repositories at packages.endpoint.com.

Of course some things in the latest Fedora are not compatible with RHEL. In rebuilding the logcheck package (needed as a dependency for another package), I found that Fedora RPM spec files have begun using the _sharedstatedir macro in /usr/lib/rpm/macros, which RHEL has never used before.

On RHEL that macro has been set to /usr/com, a strange nonexistent path that apparently came from the GNU autoconf tools but wasn't used in RHEL. Now in Fedora the macro is set to /var/lib and is being used, as described in a Fedora wiki page on packaging.

The easiest and most compatible way to make the change without munging the system- or user-wide RPM macros is to add this definition to the top of the spec file where it's needed:

%define _sharedstatedir /var/lib

And then the RPM build is happy.
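If you would rather not touch the spec file at all, the same macro can be supplied on the rpmbuild command line instead. A rough sketch of what that might look like when rebuilding the source RPM (the file name is just a placeholder):

rpmbuild --define '_sharedstatedir /var/lib' --rebuild logcheck-*.src.rpm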

In related news, alongside the new logcheck package, there are also new End Point RHEL 5 x86_64 packages for the brand-new Git 1.7.7.1 and pbzip2 1.1.5, the multi-CPU core parallel compressor that has had several bugfix releases this year.

Using the new version of imapfilter with mutt


Image by Flickr user p886

My beloved mutt/imapfilter combo recently stopped working after an operating system switch. (tl;dr: that combo rocks; use ipairs instead of pairs) When my laptop wireless stopped working, and after spending some time fighting with it, I decided to simply install a new OS. As all of my important data is on a separate partition, this was not that big a deal. I ended up using Scientific Linux, as I'd heard good things about it, and it was one of the few distros that would actually install on my laptop (Fedora, FreeBSD, Ubuntu, and OpenBSD all failed for one reason or another). After the install, I simply copied my ~/.mutt directory and ~/.muttrc file into place, and similarly copied my ~/.imapfilter directory, which contained the all-important config.lua file. The imapfilter program itself was not available via the normal yum repositories, so I simply grabbed the latest and greatest and did a manual install:

$ git clone https://github.com/lefcha/imapfilter.git
$ cd imapfilter
$ sudo yum install gcc lua-devel openssl-devel pcre-devel
$ make
$ sudo make install

I've used a lot of email clients over the years (and may have been using email longer than most people reading this). I started out (because that's all there was) with non-graphical clients such as mail, pine, elm, and mutt. Over the years I also tried out many graphical clients, such as Evolution, Kmail, Eudora, Thunderbird, and Claws Mail. However, nothing ever worked quite right, so I eventually ended up back with mutt, and have been happy with it ever since. The one drawback (or strength) of mutt is its single-mindedness. It does email very well, but lets other tools handle the ancillary tasks. One of those tasks is filtering, and that's where imapfilter comes in. I like to view all email that comes in, so mutt generally runs with my INBOX open. I scan through the items, marking them urgent if I need to keep them around, and deleting them if they are obvious trash. As needed, I'll kick off an imapfilter run, which then puts all my read, non-urgent, non-deleted email into the appropriate IMAP folders for me (mutt is even smart enough to realize that the folder was externally changed by imapfilter).

So I tried running imapfilter per usual on my new system and noticed an odd thing: each item in my filter was getting a minimum of 66 'hits', even when there weren't even 66 total emails in my inbox! I output the number of matches for each filter I use, so instead of seeing what I usually did:

Mediawiki emails moved:    1
Backcountry emails moved:  10
Perl QA messages moved:    0
...
Wiki alerts deleted:       0
Bucardo emails moved:      5
Maatkit emails moved:      0
Mail filtering complete

I saw everything at N+66 instead:

Mediawiki emails moved:   67
Backcountry emails moved: 76
Perl QA messages moved:   66
...
Wiki alerts deleted:      66
Bucardo emails moved:     71
Maatkit emails moved:     66
Mail filtering complete

Obviously, something was wonky. Glancing at the release notes showed that version 2.2 changed the format of the search results:

Since version 2.2, a different format is used for the returned structures of the searching methods, due to the introduction of multiple mailbox searching and meta-searching, and thus any configuration files that rely on them should be updated

Okay, but where was the 66 coming from? I created a ~/.imapfilter/test.lua file to show me exactly what was happening inside the loop over the results table. (imapfilter is written in a nice language called Lua, which calls its main data structures "tables". Probably to the chagrin of those using Lua/database crossover tools like PL/Lua :) The test.lua file looked like this:

myaccount = IMAP {
  server   = 'mail.example.com',
  username = 'greg',
  password = 'secret',
  ssl      = 'tls1'
}
inbox = myaccount['INBOX']
result = inbox:contain_subject('bats')

count = 0
for k,v in pairs(result) do 
  count = count + 1 
  if count < 10 then
    print(count, "Call to pairs:",k,v)
  end
end
print("Total count for pairs: " .. count);

count = 0
for k,v in ipairs(result) do 
  count = count + 1 
  if count < 10 then
    print(count, "Call to ipairs:",k,v)
  end
end
print("Total count for ipairs: " .. count);

I downloaded and compiled version 2.0 of imapfilter and ran the above code, knowing that there were exactly two emails in my inbox that had a subject containing the string 'bats':

[~/code/imapfilter-2.0] ./imapfilter -c ~/.imapfilter/test.lua
  1      Call to pairs:    9         true
  2      Call to pairs:    32        true
Total count for pairs: 2
Total count for ipairs: 0

So it looked like the results table simply contained two entries, with keys of 9 and 32 (which correspond to where those emails happened to appear in my inbox). Calling ipairs yielded zero matches, which makes sense: there is no key of 1 (which is where Lua arrays start by convention, rather than 0 like almost everything else in the computer world :). The ipairs function walks the keys in order starting at 1 and stops as soon as it finds a nil (undefined) value; in this case, the value at key 1 is already nil. The output looked much different when I ran the same test using the new version (2.3) of imapfilter:

[~/code] imapfilter -c ~/.imapfilter/test.lua
  1      Call to pairs:    1              table: 0x82b0bd8
  2      Call to pairs:    2              table: 0x82b0c48
  3      Call to pairs:    _union         function: 0x81d48d0
  4      Call to pairs:    _mt            table: 0x82a32d0
  5      Call to pairs:    mark_answered  function: 0x81cefe0
  6      Call to pairs:    send_query     function: 0x81d8180
  7      Call to pairs:    is_flagged     function: 0x81c1878
  8      Call to pairs:    unmark_deleted function: 0x81bd890
  9      Call to pairs:    match_message  function: 0x81cd7f8
Total count for pairs: 68
  1      Call to ipairs:    1        table: 0x82b0bd8
  2      Call to ipairs:    2        table: 0x82b0c48
Total count for ipairs: 2

This tells us quite a few things, and solves the mystery of the 66, which represents metadata stored in the results table. So rather than treating results as a simple key/value hash with one entry per match, the results table is now a dual-purpose table: the hash part contains metadata, while the actual matches are stored in the array (indexed) part of the table. Note how the counting of the matches now starts at 1 and increments, rather than using the position in the inbox as it did before. Which means we must use ipairs to iterate through the table and get our matching entries, in this case the ones with keys 1 and 2.

(If the "table" structure in Lua looks odd to you, that's because it is. I don't think I would have designed things that way myself - while it's clever to have a single structure that behaves as both an array with numeric indices and a hash keyed by arbitrary values, it can lead to confusion and some ugly corner cases).
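As a quick standalone illustration (nothing imapfilter-specific here), this is how the two iterators treat a table that has both an array part and a hash part:

-- A table with an array part (keys 1 and 2) and a hash part ("color" and 10)
t = { "first", "second", color = "blue", [10] = "tenth" }

-- pairs() visits every key: 1, 2, "color", and 10 (order not guaranteed)
for k, v in pairs(t) do print("pairs:", k, v) end

-- ipairs() walks 1, 2, 3, ... and stops at the first missing index,
-- so it only sees keys 1 and 2 here
for i, v in ipairs(t) do print("ipairs:", i, v) end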

The next step was to get my filters working again - this was simply a matter of a global search and replace (M-x query-replace-regexp) from "pairs" to "ipairs". This is as good a point as any to explain what my file looks like (stored as ~/.imapfilter/config.lua). The first part simply sets some common options - for details on what they do, check out the manpage for imapfilter_config.

options.cache        = true
options.certificates = true
options.create       = false
options.info         = false
options.close        = true
options.expunge      = false

Next, a new account table is created with the IMAP function, and the INBOX mailbox is pulled from it. After that, we exclude all messages that are marked as deleted, that have not yet been read, or that have been flagged. In other words, we keep everything in my inbox that I've already seen but not flagged as urgent or deleted. The '*' in this case is a logical 'AND', and the output is the search result table we saw in the code above.

myaccount = IMAP {
  server   = 'mail.example.com',
  username = 'greg',
  password = 'secret',
  ssl      = 'tls1'
}
inbox = myaccount['INBOX']

baseresult = inbox:is_seen() * inbox:is_unflagged() * inbox:is_undeleted()

Now that we have a search result, we simply start looking for things of interest and handling them. For example, to move messages to an existing IMAP folder:

-- Put Mediawiki messages into their folder
result = baseresult
  * (
    inbox:contain_to('@lists.wikimedia.org')
    + inbox:contain_cc('@lists.wikimedia.org')
  )
count = 0 for k,v in ipairs(result) do count = count + 1 end
if count > 0 then
  inbox:move_messages(myaccount['INBOX/mediawiki'], result)
end
print('Mediawiki emails moved:        ' .. count)

Searches can be applied to an existing search result to create a new table. In the code above, a new table named 'result' is created that is based off of our 'baseresult' table, with the condition that only entries matching a specific "To" or "Cc" field are added. The '+' acts as a logical 'OR'.

Deletion is handled in a similar way:

-- Delete wiki alerts
result = baseresult
  * inbox:contain_from('WikiAdmin ')
  * inbox:contain_subject('has been')
count = 0 for k,v in ipairs(result) do count = count + 1 end
if count > 0 then
  inbox:delete_messages(result)
end
print('Wiki alerts deleted:           ' .. count)

The rest of my config.lua file is more filtering sections, similar to the above. Adding a new filter is as easy as editing the ~/.imapfilter/config.lua file and creating a new section like the ones above, as shown in the example below. While that's not as automated as it could be, filter adjustments happen so rarely that I've never been bothered by that step.
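For instance, a new section to file away messages from a hypothetical mailing list (the address and folder name below are made up) would look just like the ones above:

-- Put announce list messages into their folder
result = baseresult
  * inbox:contain_to('announce@lists.example.com')
count = 0 for k,v in ipairs(result) do count = count + 1 end
if count > 0 then
  inbox:move_messages(myaccount['INBOX/announce'], result)
end
print('Announce emails moved:         ' .. count)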

If you are not using imapfilter, you should check it out, even if you are not using mutt; imapfilter is completely independent of your email reading program, and can be run from anywhere, as it doesn't save or read anything locally. I find that imapfilter is very fast, so even when I used mail programs with built-in filters, I still employed imapfilter from time to time for bulk deletes and moves. Plus, it's a great way to dip your toe into Lua if you are not familiar with it (although without using some of its more interesting features, such as coroutines).

Rails Controllers and Transactions

Actions involving single objects in Rails are generally handled nicely and automatically. By handled, I mean specifically that ActiveRecord will encapsulate saving and updating an object in a transaction, and you can set up the various callbacks and validations to ensure that the object and its associations meet whatever requirements you have before you allow it to be committed to the database.

You are allowed some degree of latitude to say "Give it a shot, if it doesn't work, then it's no big deal." One of the upsides is that you can throw whatever random nonsense the UI passes you right on through, and, presumably, the model correctness validation code will do double duty as input validation as well.

Which is nice, as far as it goes, but it tends to be localized. Which is to say that your objects generally only care about validating themselves. Sometimes, it turns out to be necessary to update the state of multiple (possibly unrelated) objects simultaneously, and additionally to ensure that if any part of any of these updates fails, you roll the whole thing back.

The application in mind is in a controller, where we are getting input from the browser and we have access to a set of callbacks, but not the atomic transactional control we get around save/update actions on a single model.

I suspect the wording of the problem suggests the solution, but let's discuss the hard way to approach it for a bit first. You could enumerate all of the objects that you knew you were going to update, iterate through each of them, determine what state you were going to attempt to transition them to, determine if that state is valid independently (easier), determine if that state is valid in concert with all of the other attempted transitions (harder), and then, if everything looked cool, make all of the transitions. You might notice that I left out the secondary, tertiary, etc. cascading transitions caused by the first-order transitions we were inspecting.

That approach starts looking likely to fail in pretty short order, even for small numbers of items. I believe the impetus for this blog post was about three items, and a complete inability to track down all of the corner cases they generated.

With that, the obvious solution is to wrap the entire operation in a transaction. It turns out the necessary parts have always been there, waiting. Here's one way to use them together.

Set up your own transactions

First, we need to be able to create our own transaction. We can do so by simply enclosing our code like so:

ActiveRecord::Base.transaction do
    ... code ...
end

We can use either a class object (as in this example) or an instance object; see the documentation for ActiveRecord::Transactions for details. The advantage in this case of using the base class to set up the transaction is that the controller doesn't need to do any semi-mystical nonsense to try to guess the name of the object it might be related to.
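That said, a transaction started from a model instance behaves the same way. A minimal sketch (the instance variables here are hypothetical):

# Same effect, started from an instance instead of the base class;
# @order and @invoice are just stand-in ActiveRecord objects.
@order.transaction do
    @order.save!
    @invoice.save!
end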

How can we get our action into the middle of that transaction block?

The second part is how to get the transaction wrapped around our controller methods. The obvious-in-retrospect solution is the around_filter provided by ActionController. If you have a controller, you could implement such a filter like so:

 
class MyObjectController < ApplicationController
    around_filter :transactions_filter

    def transactions_filter
        ActiveRecord::Base.transaction do
            yield
        end
    end

    ... the rest of the controller ...
end

It probably only makes sense to add the around_filter to actions which are of the modification type. A definition like this might be more reasonable.

    around_filter :transactions_filter, :only => [:create, :update, :destroy]

It's also simple enough to add this to your ApplicationController definition if you want all of your controller classes to inherit this functionality universally.
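Putting the pieces together, here is a minimal sketch of what that might look like (the filter name is just a placeholder):

class ApplicationController < ActionController::Base
    # Wrap every data-modifying action, in every controller, in a transaction;
    # an unhandled exception raised inside the action rolls the whole thing back.
    around_filter :transactions_filter, :only => [:create, :update, :destroy]

    private

    def transactions_filter
        ActiveRecord::Base.transaction do
            yield
        end
    end
end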

PG West 2011 Re-cap

I just recently got back from PG West 2011, and have had some time to ruminate on the experience (note to self: do elephants chew a cud?). I definitely enjoyed San Jose as the location; it's always neat to visit new places and to meet new people, and I have to say that San Jose's weather was perfect for this time of year. I was also glad to be able to renew professional relationships and meet others in the PostgreSQL community.

Topic-wise, I noticed that quite a few talks had to do with replication and virtualization; this certainly seems to be a trend in the industry in general, and has definitely been a pet topic of mine for quite a while. It's interesting to see the various problems that necessitate some form of replication (e.g. availability, read/write scaling, redundancy), the tradeoffs and considerations for each specific problem, and the wide variety of tools that are available to attack each of these problems.

A few high points from each of the days:

Tuesday

I had dinner with fellow PostgreSQL contributors; some I knew ahead of time, others I got to know. This was followed by additional socializing.

Wednesday

I attended a talk on PostgreSQL HA, which covered the use of traditional cluster-level warm/hot standbys, as well as a solution using pgpool and Slony. This was followed by the keynote address at the conference, given by Charles Fan, Senior Vice President at VMware. It was a high-level overview of the work VMware has been doing to support virtualizing PostgreSQL and to optimize running multiple PostgreSQL instances on separate VMs efficiently.

I was involved in some "lunch track" discussions, and followed this all up with several more talks covering VMware's specific offerings in more detail.

Evening was dinner and mandatory socializing.

Thursday

I went to Robert Hodges' talk about Tungsten. I had only heard of it in general terms, so it was interesting to get more specific details. Robert's talk covered the basic architecture of Tungsten, as well as how its various adapters between different types of databases ensure that SQL executed on heterogeneous clusters accounts for differences in datatype representation, encoding, DDL, specific query syntax, and so on. For instance, when executing a CREATE TABLE statement, MySQL's AUTO_INCREMENT fields would be converted to PostgreSQL's equivalent SERIAL type. There was lots of good discussion after the presentation, and I spoke with Robert afterward about the different design and architecture choices made in Tungsten and how they compare to those in Bucardo.

At lunchtime I got to meet David Fetter's wife and baby (who looks just like him!), then gave an updated version of my Bucardo: More than just Multimaster talk. Attendance was good, around 30-35, and the audience asked plenty of questions.

After my talk, I attended one about database optimization. This is always an interesting topic for me, so I was glad to hear others' insights on the subject.

This was all followed up by mandatory socializing.

Friday

I found the talk about TransLattice to be very interesting, as it highlighted specific problem domains for distributed, redundant, multi-write database clusters for more fault-tolerant applications. It struck me as using some of the same ideas as Cassandra and other decentralized distributed datastores, but doing so in a way that is transparent to the use of PostgreSQL. What I found particularly interesting about this system was the use of data access/usage patterns, explicit policy, and locality to drive both the costing algorithm for accessing data and the distribution of knowledge about just where each copy of each piece of data exists. The talk, while an introduction to the system, did not skimp on the details, and the presenter was happy to answer my many specific questions.

The remaining talks were fairly light-hearted. I went to one called Redis: Data Bacon for the title alone. While I still don't understand why bacon, I walked away with an appreciation of the problem domain Redis addresses and how it could be used in specific cases. The final talk I attended was about Schemaverse, a project which implements a game entirely in SQL. Each player has their own database user created that they can then use from either the web interface or even via just a regular psql connection. I can't speak for the game itself other than the overview given in the talk, but creative use/hacking of the game was explicitly encouraged, and seems like an interesting approach for testing things which may not often be stressed enough in (at least my) regular use of PostgreSQL, such as intra-database security/permissions, huge numbers of users, etc. (It didn't surprise me that this game had been a hit at DEFCON.)

This was followed by the closing session, and final goodbyes, etc. Oh, and (need I say) mandatory socializing.

Final Thoughts

I always enjoy going to PostgreSQL events, and continue to be impressed with the community that surrounds PostgreSQL. Thanks to everyone who attended, and a special thanks to Josh Drake for the work he put into it. Hope to see ya next time!

Viewing schema changes over time with check_postgres


Image by Flickr user edenpictures

Version 2.18.0 of check_postgres, a monitoring tool for PostgreSQL, has just been released. This new version has quite a large number of changes: see the announcement for the full list. One of the major features is the overhaul of the same_schema action. This allows you to compare the structure of one database to another and get a report of all the differences check_postgres finds. Note that "schema" here means the database structure, not the object you get from a "CREATE SCHEMA" command. Further, remember the same_schema action does not compare the actual data, just its structure.

Unlike most check_postgres actions, which deal with the current state of a single database, same_schema can compare databases to each other, as well as audit a single database by finding changes over time. In addition to the complete overhaul of the system, same_schema now allows comparing as many databases as you want to each other. The arguments have been simplified, in that a comma-separated list is all that is needed for multiple entries. For example:

./check_postgres.pl --action=same_schema \
  --dbname=prod,qa,dev --dbuser=alice,bob,charlie

The above command will connect to three databases, as three different users, and compare their schemas (i.e. structures). Note that we don't need to specify a warning or critical value: we consider this an 'OK' Nagios check if the schemas match, otherwise it is 'CRITICAL'. Each database gets assigned a number for ease of reporting, and the output looks like this:

POSTGRES_SAME_SCHEMA CRITICAL: (databases:prod,qa,dev)
  Databases were different. Items not matched: 1 | time=0.54s 
DB 1: port=5432 dbname=prod user=alice
DB 1: PG version: 9.1.1
DB 1: Total objects: 312
DB 2: port=5432 dbname=qa user=bob
DB 2: PG version: 9.1.1
DB 2: Total objects: 312
DB 3: port=5432 dbname=dev user=charlie
DB 3: PG version: 9.1.1
DB 3: Total objects: 313
Language "plpgsql" does not exist on all databases:
  Exists on:  3
  Missing on: 1, 2

The second large change was a simplification of the filtering options. Everything is now controlled by the --filter argument, and basically you can tell it what things to ignore. For example:

./check_postgres.pl --action=same_schema \
  --dbname=A,B --filter=nolanguage,nosequence

The above command will compare the schemas on databases A and B, but will ignore any difference in which languages are installed, and ignore any differences in the sequences used by the databases. Most objects can be filtered out in a similar way. There are also a few other useful options for the --filter argument:

  • noposition: Ignore what order columns are in
  • noperms: Do not worry about any permissions on database objects
  • nofuncbody: Do not check function source
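These can be combined with the other filters in a single comma-separated list, for example:

./check_postgres.pl --action=same_schema \
  --dbname=A,B --filter=noposition,noperms,nofuncbody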

The final and most exciting large change is the ability to compare a database to itself over time. In other words, you can see exactly what changed during a certain time period. We have a client using that now to send a daily report on all schema changes made in the last 24 hours, for all the databases in their system. This is a very nice thing for a DBA to receive: not only is there a nice audit trail in your email, but you can also answer questions such as:

  • Was this a known change, or did someone make it without letting anyone else know?
  • Did somebody fat-finger and drop an index by mistake?
  • Were the changes applied to database X also applied to database Y and Z?

To enable time-based checks, simply provide a single database to check. The first time it is run, same_schema simply gathers all the schema information and stores it on disk. The next time it is run, it detects the file, reads it in as database "2", and compares it to the current database (number "1"). The --replace argument will rewrite the file with the current data when it is done. So the cronjob for the aforementioned client is as simple as:

10 0 * * * ~/bin/check_postgres.pl --action=same_schema \
  --host=bar --dbname=abc --quiet --replace

The --quiet argument ensures that no output is given if everything is 'OK'. If everything is not okay (i.e. if differences are found), the script produces output, which cron duly mails out. Thus, a few minutes after midnight each day, a report is sent if anything has changed in the last day. Here's a slightly redacted version of this morning's report, which shows that a schema named "stat_backup" was dropped at some point in the last 24 hours (which was a known operation):

POSTGRES_SAME_SCHEMA CRITICAL: DB "abc" (host:bar)
  Databases were different. Items not matched: 1 | time=516.56s
DB 1: port=5432 host=bar dbname=abc user=postgres
DB 1: PG version: 8.3.16
DB 1: Total objects: 11863
DB 2: File=check_postgres.audit.port.5432.host.bar.db.abc
DB 2: Creation date: Sun Oct  2 10:06:12 2011  CP version: 2.18.0
DB 2: port=5432 host=bar dbname=abc user=postgres
DB 2: PG version: 8.3.16
DB 2: Total objects: 11864
Schema "stat_backup" does not exist on all databases:
  Exists on:  2
  Missing on: 1

As you can see, the first part is a standard Nagios-looking output, followed by a header explaining how we defined database "1" and "2" (the former a direct database call, and the latter a frozen version of the same.)

Sometimes you want to store more than one version at a time: for example, if you want both a daily and a weekly view. To enable this, use the --suffix argument to create different instances of the saved file. For example:

10 0 * * * ~/bin/check_postgres.pl --action=same_schema \
  --host=bar --dbname=abc --quiet --replace --suffix=daily
10 0 * * Fri ~/bin/check_postgres.pl --action=same_schema \
  --host=bar --dbname=abc --quiet --replace --suffix=weekly

The above commands would end up recreating the file check_postgres.audit.port.5432.host.bar.db.abc.daily every morning, and the file check_postgres.audit.port.5432.host.bar.db.abc.weekly every Friday.

Thanks to all the people that made 2.18.0 happen (see the release notes for the list). There are still some rough edges to the same_schema action: for example, the output could be a little more user-friendly, and not all database objects are checked yet (e.g. no custom aggregates or operator classes). Development is ongoing; patches and other contributions are always welcome. In particular, we need more translators. We have French covered, but would like to include more languages. The code can be checked out at:

git clone git://bucardo.org/check_postgres.git

There is also a github mirror if you so prefer: https://github.com/bucardo/check_postgres.

You can also file a bug (or feature request), or join one of the mailing lists: general, announce, and commit.

CRUD, RESTful, and JSON for Address Management in Interchange

Recently, I worked on a large project for Paper Source that introduced functionality allowing users to split orders into multiple addresses, which is especially valuable during the holiday season for gift purchases. Paper Source runs on Interchange, one of the ecommerce frameworks End Point is intimately familiar with.

This project requires CRUD functionality for address editing during the checkout process; users must be able to add, edit, list, and remove addresses to be assigned to the various items in the cart. Another challenge here is that both logged in and logged out users are required to have this functionality, where addresses modified by logged in users would persist between sessions, and addresses of logged out users are available during their session and destroyed after they end the session (or in Paper Source's case, when they close the browser window).

With these requirements, I set off to develop an architecture that followed RESTful practices, described below:

Listing Addresses

A Perl module contains a method for listing user addresses, which is shown below in simplified form. The user_addresses scratch variable is created and contains an array of hashed addresses. For those unfamiliar with Interchange, scratch space is created as part of a user session and holds variables accessible throughout the user session or limited to a single page request. The Tag->tmpn call below sets a scratch variable (it takes key and value arguments) that is accessible during the current page request. This is not unlike setting a variable in a controller to be used in the view in an MVC architecture.

Also of note is the use of to_json, which transforms the address into JSON-ified form, which is used rather than manually looping through the addresses.

sub list {
  my $addresses;
  if($::Session->{username}) {    # user is logged in
    # $results = SELECT * FROM addresses WHERE username = ?
    foreach my $key (keys %$results) {
       push @$addresses, $results->{$key};
    }
  }
  else {
    $addresses = $::Session->{stored_addresses};
  }

  foreach my $address (@$addresses) {
    $address->{json} = to_json($address);
  }
  $::Tag->tmpn("user_addresses", $addresses);
  return;
}

In the HTML template, Interchange's loop tag is used to loop through the addresses. Note: There may be a better way to avoid trailing commas in Interchange's loop tag – please share it if you know the secret!

<script type="text/javascript">
var addresses = {
[loop object.mv_results=`$Scratch->{user_addresses}`]
    'address_[loop-param id]': [loop-param json],[/loop]
    'dummy' : ''
};
</script>

For example, the above code might yield the JSON object shown below. These addresses are also used in the dropdowns shown in the screenshot at the beginning of this article.

<script type="text/javascript">
var addresses = {
 'address_116971': {"country":"US","nickname":"Sister","fname":"Jackie", ... },
 'address_116969': {"country":"US","nickname":"Personal","fname":"Stephanie", ... },
 'dummy' : ''
};
</script>

Creating an Address

Next, I added functionality for creating an address. Similar to the method above, the add subroutine is called via a custom Interchange ActionMap. The add method handles both logged in and logged out use cases:

sub add {
  # do server-side error checking

  my $result;
  eval {
    my $address;
    if($::Session->{username}) {
      # store address in database with INSERT
      # $address is new address, with id of last_insert_id
    }
    else {
      # determine key to store address in session
      # store address in Session->{stored_addresses}
      # $address is new address, with key as id
    }
    $result = { address => to_json($address), success => 1 };
  };
  if($@) {
    $result = { error_msg => "Error: ...", success => 0 };
  }

  $::Tag->tmpn("result", to_json($result));
  $::CGI->{mv_nextpage} = "ajax/standard.html";  # sets HTML template used

  return;
}

This method was called via AJAX, which looks like this:

$.ajax({
  url: "/address_management/add",
  data: address_data,  //parameterized address data
  dataType: "JSON",
  success: function(data) {
    if(data.success) {
      //updated addresses JavaScript variable
      //updated on page HTML (adds to dropdown)
    } else {
      //notifies user of errors
    }
  },
  error: function(data) {
    //alert that could not process
  }
});

Edit Address

The edit address functionality is similar to the code for creating an address, the only differences being that the database or Session variable is updated instead, and that the URL called for editing is "/address_management/edit/:id". Mustache is used to render the form prepopulated with the current address values.
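As a rough sketch of that last piece (the template, element ID, and field names here are hypothetical), the edit form can be prepopulated client-side from the addresses JavaScript object built earlier:

// Render the edit form for one address with Mustache, pulling the
// current values out of the addresses object defined in the page.
// (Older mustache.js versions call this Mustache.to_html instead of Mustache.render.)
var edit_template =
  '<input name="nickname" value="{{nickname}}" />' +
  '<input name="fname" value="{{fname}}" />';

function show_edit_form(id) {
  var address = addresses['address_' + id];   // e.g. address_116971
  $('#edit_address_form').html(Mustache.render(edit_template, address));
}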

Remove Address

Finally, an address management page was created to include the ability to remove addresses. This method uses an AJAX method similar to the one shown above, and the Perl module contains the following:

sub remove {
  my ($self, $dbh) = @_;
  my $result;
  eval {
      # database DELETE FROM addresses
      $result->{success} = 1;
  };
  if ($@) {
      $result->{success} = 0;
  }

  $::Tag->tmpn("result", to_json($result));
  $::CGI->{mv_nextpage} = 'ajax/standard.html';
  return;
}

Conclusion

The advantage of the RESTful CRUD methods described here is that the client-side JavaScript and HTML code is reused for both logged in and logged out users. The server side responds to all requests in a similar manner for both logged in and logged out use cases. Reusing client-side code eases maintenance because there is less code to support and less code to update when changes are necessary. Additionally, these CRUD methods are reused in the user address management page (shown below), which takes advantage of reusable server-side and client-side modular elements.

In addition to the work described here, quite a bit of work was done on the backend for order processing to create a new structure for storing addresses per order. Out of the box Interchange functionality stores the billing and shipping addresses in the transactions (orders) table, and individual item information in the orderlines table (item sku, quantity, price, etc.). With this new functionality, shipping addresses were pulled out of the transactions table into their own table (shipped_addresses), and orderline items mapped to these shipping addresses. To preserve historical order data, shipped_addresses is copied from addresses for logged in users and remains untouched by the user.

Data Model Before
Data Model After

Paper Source has several special products, like gift cards and a large set of personalized products. Additional changes were required as well to accommodate the new data model. For example, a script that reported on gift card shipping addresses required updating to adhere to the new data structure.

A comparison of JasperSoft iReport and Pentaho Reporting

I've recently been involved in reporting projects using both JasperSoft's iReport and Pentaho's Reporting, so this seemed a good opportunity to compare the two. Both are Java-based, open source reporting systems which claim to build "pixel-perfect" documents ("pixel-perfect" means that when you put something somewhere on a report design, it doesn't move around. That this isn't taken for granted is a rant for another time). I have more experience with Pentaho than with JasperSoft, and once reviewed a book on Pentaho; I'll try to give the two a fair evaluation, but in the end I can't promise my greater experience with Pentaho won't affect my conclusions one way or the other. Both suites are available in open source and commercial flavors; I'll consider only the open source versions here.

First let me point out that JasperSoft and Pentaho both produce business intelligence software suites. The two suites exist in both community (open source) and enterprise (commercial) forms and are well worth comparing in their entirety, but I'm focusing principally on the report builder component of each, because that's where my recent experience has led me. These report builder packages allow users to build complex and visually interesting data-based documents from many different kinds of data sources. A "document" could be a simple form letter, an address book, or a complex dashboard of business metrics. In each case, users build documents by dragging and dropping various components into report "bands", and then modifying components' properties. "Bands" are horizontal sections of the page with different roles. The page header and footer bands, for instance, are (obviously) printed at the top and bottom of each page. "Detail" bands print once per row returned by the query that underlies the report. Both iReport and Pentaho Reporting will group and filter query results for you, if you want, and include header and footer bands for each group. Both products allow users to publish finished reports to a server where other users can view them, schedule them to run periodically, or modify them for new purposes, and both provide a Java-based reporting library to embed reporting in other applications.

iReport Query Dialog

Reports in both products are based on queries. These queries may be SQL using JDBC data sources, or they can come from other more obscure data sources, such as MDX queries, XQuery, scripts in various languages, web services, and more. Both also provide a query editor, at least for SQL queries, and in fact, both use the same query editor. I've only used it rarely, and only in Pentaho; both products also allow users to type in queries free-form, which I much prefer. In Pentaho, the data source for the query is embedded in the report itself, in the form of a JDBC URL, a JNDI name, or something else appropriate for that data source, so if you publish the report to a server, you're responsible for ensuring that JNDI name or JDBC URL or whatever makes sense on that server. Jasper, on the other hand, prompts the user with a list of available data sources when publishing a report to a server. Jasper's method seems more friendly, but Pentaho's choice may have its advantages here, because each report is self-contained, whereas Jasper's reports have metadata outside the single report file to describe the associated data source.

The Pentaho Report Builder

The component libraries available in each product are fairly similar. Users can select from basic drawing components such as lines and circles, static formatted text labels, and of course various numeric, textual, and date components to display query results. Both products also include complex charting components, to display visualizations of many different kinds. Though the charting functionality of both products, at least at the basic level I've used, is quite similar, I found iReport's charting dialogs quite helpful for making complex charts much easier to create than they would otherwise be. In Pentaho, after adding a chart component the user can open a special properties window, but the window offers few clues beyond the usual tooltips and occasionally meaningful property names to help the user know what to do. In contrast, iReport describes many of the properties involved in clear language.

iReport's chart selection dialog

This brings me to the topic of documentation, which, in fact, I found lacking for both products. Yes, these are open source projects, and yes, documentation isn't always as fun to write as code, so yes, open source projects sometimes end up with lousy documentation. The enterprise versions come with documentation in one form or another, and there are several books published on the different suites' components, including reporting. But the documentation available free of charge on the web left me unimpressed. Here Pentaho was particularly frustrating, largely because Pentaho's reporting has changed a great deal from version to version, especially with its 3.5 release a couple of years ago, so much of the help available in forums and wikis is completely out of date. Jasper documentation was more difficult to find, in general, but more accurate and up-to-date when I found it.

Both products save their work in fairly comprehensible formats, which is helpful if you ever need to modify them by hand, without the help of the report building tool. iReport saves each report as a single XML file; Pentaho creates a zipped archive of several XML files. That fact has come in quite handy several times, both with Pentaho and with iReport, typically because it's faster for me to edit XML by hand than to click on each of 20 components and tweak properties one by one, because (in Pentaho) I needed to modify a data source, or because the tool failed to figure out what columns to expect from a query and I wanted to enter them manually.

One major difference between the two products stems from what is available in their open source "community editions". Jasper looks a bit more polished to me, the documentation is somewhat more consistent, and its selection of sample reports and sample code is more comprehensive -- but the community edition does little more than support iReport. Pentaho's community edition, in addition to the reporting functions I've discussed, also offers ad-hoc web based reporting and a powerful MDX analysis package. JasperSoft offers those features only in its enterprise edition. That may not be a big deal in some places, but it is the deciding factor in others. Having access to Pentaho's code has made possible a number of things we certainly couldn't have done otherwise.

I've been trying to keep this "a comparison" of Pentaho Reporting and iReport, rather than a "showdown" or "shootout" or even "Jasper vs. Pentaho", but at some point, conclusions must be drawn. I'll draw here somewhat on evidence not mentioned above, because, well, the post was getting a bit long as it was, and I didn't want to describe everything. I admit to liking iReport's interface better, because to me it seems to make better use of screen real estate. Google was much better able to answer questions for iReport than for Pentaho. But, although both products sometimes seem to make simple things hard, Pentaho seemed to do this less than iReport, and the various magic incantations I needed to get things working were fewer in Pentaho. In the end, the much greater capability of Pentaho's open source offering over JasperSoft's clinches it for me. I'll take open source over closed any day.