
We are hiring: Ruby on Rails Developer

Job Description:
End Point is looking for a talented software developer who can consult with our clients and develop their Rails web applications. We need someone who focuses on the client and will deliver quality, tested code.

About End Point:
End Point is a 15-year-old web consulting company based in New York City, with 20 full-time employees working remotely from around the United States. Our team is made up of strong e-commerce, database, and system administration talent that leverages a variety of open source technologies.

We serve over 200 clients ranging from small mom-and-pop shops to large corporations. End Point continues to grow this year, and we're looking for intelligent and passionate people who want to join our team and make a difference. We prefer open source technology and do collaborative development with Git, GNU Screen, IRC, and voice.

What is in it for you?

  • Work from your home office or our Manhattan based headquarters
  • Flexible full-time work hours
  • Strong balance of work and home life
  • Bonus opportunities
  • Health insurance benefits
  • Ability to move without being tied to your job location

What you will be doing:

  • Consulting with clients to determine their web application needs
  • Building, testing, and releasing web applications for our clients
  • Working with open source tools

What you will need:

  • Experience building and testing Ruby on Rails applications
  • Experience with PostgreSQL, MySQL or another relational database
  • Customer-centric focus
  • A passion for building flexible and scalable web applications
  • Strong verbal and written communication skills
  • Motivation and the ability to work from home
  • Ability and desire to learn new technologies

Bonus points for:

  • Spree, Sinatra, DataMapper or NoSQL experience
  • Prior consulting experience
  • Building and working with e-commerce systems
  • System administration and deployment
  • Mobile and location-based technologies

Please email jobs@endpoint.com to apply.

Does this job description not quite fit what you do? Do you have other software development skills such as Perl, Python, or PHP, and think End Point is the place for you to work? We are always looking for talented people, so send us your resume and we'll let you know if there is another position that might be a better fit.

Competing with the big players in e-commerce

While attending the Internet Retailer Conference and Exhibition in San Diego last week, I had a few moments to speak with Nathan Barling, the CTO of Shoebacca.com. During our conversation he mentioned he would be speaking at the conference in the track titled "Small Retailers: Winning Strategies in a tougher market". I attended his talk and was impressed by the things Shoebacca is doing to appear larger than they are, which helps them compete with the big players in their industry such as Zappos. The tactics Nathan discussed can be applied to many industries in e-commerce and to businesses of all sizes, even those on limited budgets.

One of the first things Nathan discussed was making your policies clear and highlighting them on the site so that people are aware of the rules. Nathan recommends this especially in the case of Shoebacca, where many of their policies encourage people to shop on the site by reducing risk to the customer. Some of their policies include:

  • Free ground shipping
  • Free return shipping
  • 365 day return policy
  • 110% price match for 14 days

Nathan mentioned many third-party tools that they leverage at their company, along with open source tools they use to keep costs down while providing a solid platform for their site. Some of the open source tools in use include Magento (e-commerce), Solr (search), and Gearman (job system).

One area where Shoebacca is doing cool work is their "shoe art". These are clips and ads throughout the site that present the product in a unique and appealing way, giving the illusion that Shoebacca is a large company with an extensive art department and a studio for shooting the product. They also leverage the art provided by their vendors, such as ads and logos, so that customers associate the site with the quality of vendors like Nike and Puma. Another area where they get assistance from their vendors is giveaway merchandise: items the vendor provides for them to give away with purchases of other products on the site. This could be a sticker, water bottle, or jacket, all promo items that Shoebacca does not have to pay for but that have some value to the end customer and may push the customer to complete the order.

One area where Nathan saw an immediate increase in conversions was alternate payment methods. They chose to launch Amazon payments, and before they could even put a test order through, someone had already checked out. Alternate payment methods such as Amazon, Google, or PayPal give your customers peace of mind that their credit card details are safe with a trusted company while they are still getting to know your company. Nathan said that many people ended up entering their credit card directly into the site on subsequent visits, after having a successful transaction through an alternate payment method.

Shoebacca is also going to launch a subscription program in which people pay a nominal yearly fee and receive benefits such as a coupon, access to exclusive sales, free swag, and an upgrade to free two-day shipping. The subscription program is a neat idea to help get your customers to commit to using your site on an ongoing basis.

I enjoyed my time away from the exhibit hall for Nathan Barling's talk and took away quite a few things that I can suggest to my clients who are looking to improve their sales.

DBD::Pg UTF-8 for PostgreSQL server_encoding

We are preparing to make a major version bump in DBD::Pg, the Perl interface for PostgreSQL, from the 2.x series to 3.x. This is due to a reworking of how we handle UTF-8. The change is not going to be backwards compatible, but will probably not affect many people. If you are using the pg_enable_utf8 flag, however, you definitely need to read on for the details.

The short version is that DBD::Pg is going to return all strings from the Postgres server with the Perl utf8 flag on. The sole exception will be databases in which the server_encoding is SQL_ASCII, in which case the flag will never be turned on.

For backwards compatibility and fine-tuning control, there is a new attribute called pg_utf8_strings that can be set at connection time to override the decision above. For example, if you need your connection to return byte-soup, non-utf8-marked strings, despite coming from a UTF-8 Postgres database, you can say:

  my $dsn = 'dbi:Pg:dbname=foobar';
  my $dbh = DBI->connect($dsn, $dbuser, $dbpass,
    { AutoCommit => 0,
      RaiseError => 0,
      PrintError => 0,
      pg_utf8_strings => 0,
    }
  );

Similarly, you can set pg_utf8_strings to 1, which will force returned strings to be marked as utf8 even if the backend is SQL_ASCII. You should not be using SQL_ASCII of course, and certainly not forcing the strings returned from it to UTF-8. :)

All Perl variables (be they strings or otherwise) are actually Perl objects, with some internal attributes defined on them. One of those is the utf8 flag, which can be flipped on to indicate that the string should be treated as possibly containing multi-byte characters, or it can be left off, to indicate the string should always be treated on a byte-by-byte basis. This will affect things like the Perl length function, and the Perl \w regex flag. This is completely unrelated to the Perl pragma use utf8, which DBD::Pg has nothing at all to do with. Have I mentioned that UTF-8, and UTF-8 in Perl in particular, can be quite confusing?

There are a few exceptions as to what things DBD::Pg will mark as utf8. Integers and other numbers will not, boolean values will not, and no bytea data will ever have the flag set. When in doubt, assume that it is set.

The old attribute, pg_enable_utf8, will be deprecated and will have no effect. We thought about reusing it, but it seemed clearer and cleaner to simply create a new attribute (pg_utf8_strings), as the behavior has changed significantly.

A beta version of DBD::Pg (2.99.9_1) with these changes has been uploaded to CPAN for anyone to experiment with. Right now, none of this is set in stone, but we did want to get a working version out there to start the discussion and see how it interacts with applications that were making use of the pg_enable_utf8 flag. You can web search for "dbdpg" and look for the "Latest Dev. Release", or jump straight to the page for DBD::Pg 2.99.9_1. The trailing underscore is a CPAN convention that indicates this is a development version only, and thus will not replace the latest production version (2.18.1 as of this writing).

As a reminder, DBD::Pg has switched to using git, so you can follow along with the development with:

git clone git://bucardo.org/dbdpg.git

There is also a commits mailing list you can join to receive notifications of commits as they are pushed to the main repo. To sign up, send an email to dbd-pg-changes-subscribe@perl.org.

Internet Retailer exhibits of note

Last night concluded Internet Retailer Conference & Exhibition 2011 in San Diego. We had a lot of good conversations with attendees and other exhibitors at the End Point booth, and our Liquid Galaxy with Google Earth was a great draw for visitors.

The majority of exhibitors at the show were offering software as a service or productized ecommerce services. A couple of our favorite small SaaS companies, both for their knowledgeable and friendly technical staff and for the way they challenge some of the less-beloved incumbent giants in the space, were Olark, offering a SaaS live chat service, and SearchSpring, with their SaaS faceted search service. We look forward to trying out their services.

Some of the more dazzling software demonstrations at the show were:

  • Total Immersion, an "augmented reality solution." Their TryLive Eyewear demo had us looking into their webcam and trying out different eyeglass frames that were overlaid on our video image in real time.
  • Styku, a company offering 3-D virtual fitting room software. They had an amazing video demo of mannequins modeling different clothes, and it's all customizable per visitor who wants to use his/her measurements to be fitted online. It's easy to see that this kind of thing has a lot of potential for online clothing sales, and could greatly reduce returns and exchanges due to bad sizing.

E-commerce can seem to make location less relevant, and at End Point our workforce is distributed throughout the U.S., adding to the effect of "dislocation". But at Internet Retailer, people's location was a common topic. I'm located in Teton Valley, Idaho, just a few miles west of Wyoming, and it was nice for me to meet some of my geographic neighbors in Idaho, Utah, and Colorado.

We had good conversations with several companies in the Salt Lake City area: Logica transportation analytics, Molding Box fulfillment outsourcing, Doba drop ship, and AvantLink affiliate technology, headed up by Scott Kalbach who we worked with at Backcountry.com years ago. I also met attendees and exhibitors with offices and staff closer to home in Idaho Falls and Boise, Idaho.

The show ended last night at 7:00 pm and we broke down the booth and packed up our Liquid Galaxy for shipping back to New York City, a somewhat labor-intensive task.

It's been a busy show, staffing our booth from 9:00 am till 7:00 pm each day, so we didn't have much time to enjoy beautiful San Diego or get much sleep or exercise. On the way back to the hotel we made a quick stop at a playground to unwind.

End Point at IRCE 2011

We are in full force with a booth at Internet Retailer Conference 2011 in San Diego. The exhibit hall opened yesterday afternoon after the last few stragglers flew in from North Carolina (me) and Idaho (Jon) to join Ben, Rick, Carl, and Ron.

We've had a steady flow of booth visitors interested in hearing about our core ecommerce services and Liquid Galaxy. We've also heard from a few companies interested in partnering, which is a nice way to learn about the latest popular technologies in ecommerce, such as mobile and tablet opportunities, live chat integration, real-time user interactivity ecommerce features, and shipping integration and analytics.

Stop by if you're here and interested in hearing more about End Point's open source consulting and development services!

Here at IRCE 2011!
Ben navigates our Liquid Galaxy display.
Rick navigates through San Diego before a team dinner.
Ben & Carl pose in front of our Liquid Galaxy display.

DBD::Pg moves to git!

Just a note to everyone that the source code repository for DBD::Pg, the official DBI driver for PostgreSQL, has moved from its old home in SVN to a git repository. All development has now moved to this repo.

We have imported the SVN revision history, so it's just a matter of pointing your git clients to:

$ git clone git://bucardo.org/dbdpg.git

For those who prefer, there is a github mirror:

$ git clone git://github.com/bucardo/dbdpg.git

Git is available via many package managers or by following the download links at http://git-scm.com/download for your platform.

Enjoy!

MongoDB replication from Postgres using Bucardo

One of the features of the upcoming version of Bucardo (a replication system for the PostgreSQL RDBMS) is the ability to replicate data to things other than PostgreSQL databases. One of those new targets is MongoDB, a non-relational 'document-based' database. (To be clear, MongoDB can only be used as a target, not as a source.)

To see this in action, let's set up a quick example, modified from the earlier blog post on running Bucardo 5. We will create a Bucardo instance that replicates from two Postgres master databases to a Postgres database target and a MongoDB instance target. We will start by setting up the prerequisites:

sudo aptitude install postgresql-server \
perl-DBIx-Safe \
perl-DBD-Pg \
postgresql-contrib

Getting Postgres up and running is left as an exercise to the reader. If you have problems, the friendly folks at #postgresql on irc.freenode.net will be able to help you out.

Now for the MongoDB parts. First, we need the server itself. Your distro may have it already available, in which case it's as simple as:

aptitude install mongodb

For more installation information, follow the links from the MongoDB Quickstart page. For my test box, I ended up installing from source by following the directions at the Building for Linux page.

Once MongoDB is installed, we will need to start it up. First, create a place for MongoDB to store its data, and then launch the mongod process:

$  mkdir /tmp/mongodata
$  mongod --dbpath=/tmp/mongodata --fork --logpath=/tmp/mongo.log
all output going to: /tmp/mongo.log
forked process: 428

You can perform a quick test that it is working by invoking the command-line shell for MongoDB (named "mongo", of course). Use quit() to exit:

$  mongo
MongoDB shell version: 1.8.1
Fri Jun 10 12:45:00
connecting to: test
> quit()
$ 

The other piece we need is a Perl driver so that Bucardo (which is written in Perl) can talk to the MongoDB server. Luckily, there is an excellent one available on CPAN named 'MongoDB'. We started the MongoDB server before doing this step because the driver we will install needs a running MongoDB instance to pass all of its tests. The module has very good documentation available on its CPAN page. Installation may be as easy as:

$  sudo cpan MongoDB

If that did not work for you (case matters!), there are more detailed directions on the Perl Language Center page.

Our next step is to grab the latest Bucardo, install it, and create a new Bucardo instance. See the previous blog post for more details about each step.

$ git clone git://bucardo.org/bucardo.git
Initialized empty Git repository...

$ cd bucardo
$ perl Makefile.PL
Checking if your kit is complete...
Looks good
Writing Makefile for Bucardo
$ make
cp bucardo.schema blib/share/bucardo.schema
cp Bucardo.pm blib/lib/Bucardo.pm
cp bucardo blib/script/bucardo
/usr/bin/perl -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/bucardo
Manifying blib/man1/bucardo.1pm
Manifying blib/man3/Bucardo.3pm
$ sudo make install
Installing /usr/local/lib/perl5/site_perl/5.10.0/Bucardo.pm
Installing /usr/local/share/bucardo/bucardo.schema
Installing /usr/local/bin/bucardo
Installing /usr/local/share/man/man1/bucardo.1pm
Installing /usr/local/share/man/man3/Bucardo.3pm
Appending installation info to /usr/lib/perl5/5.10.0/i386-linux-thread-multi/perllocal.pod
$ sudo mkdir /var/run/bucardo
$ sudo chown $USER /var/run/bucardo
$ bucardo install
This will install the bucardo database into an existing Postgres cluster.
...
Installation is now complete.

Now we create some test databases and populate them with pgbench:

$ psql -c 'create database btest1'
CREATE DATABASE
$ pgbench -i btest1
NOTICE:  table "pgbench_branches" does not exist, skipping
...
creating tables...
10000 tuples done.
20000 tuples done.
...
100000 tuples done.
$ psql -c 'create database btest2 template btest1'
CREATE DATABASE
$ psql -c 'create database btest3 template btest1'
CREATE DATABASE
$ psql btest3 -c 'truncate table pgbench_accounts'
TRUNCATE TABLE

$ bucardo add db t1 dbname=btest1
Added database "t1"
$ bucardo add db t2 dbname=btest2
Added database "t2"
$ bucardo add db t3 dbname=btest3
Added database "t3"
$ bucardo list dbs
Database: t1  Status: active  Conn: psql -p 5432 -U bucardo -d btest1
Database: t2  Status: active  Conn: psql -p 5432 -U bucardo -d btest2
Database: t3  Status: active  Conn: psql -p 5432 -U bucardo -d btest3

$ bucardo add tables pgbench_accounts pgbench_branches pgbench_tellers herd=therd
Created herd "therd"
Added table "public.pgbench_accounts"
Added table "public.pgbench_branches"
Added table "public.pgbench_tellers"

$ bucardo list tables
Table: public.pgbench_accounts  DB: t1  PK: aid (int4)
Table: public.pgbench_branches  DB: t1  PK: bid (int4)
Table: public.pgbench_tellers   DB: t1  PK: tid (int4)

The next step is to add in our MongoDB instance. The syntax is the same as the "add db" above, but we also tell it the type of database, as it is not the default of "postgres". We will also assign an arbitrary database name, "btest1", the same as the others. Everything else (such as the port and host) is default, so all we need to say is:

$  bucardo add db m1 dbname=btest1 type=mongo
Added database "m1"
$  bucardo list dbs
Database: m1  Type: mongo     Status: active  
Database: t1  Type: postgres  Status: active  Conn: psql -p 5432 -U bucardo -d btest1
Database: t2  Type: postgres  Status: active  Conn: psql -p 5432 -U bucardo -d btest2
Database: t3  Type: postgres  Status: active  Conn: psql -p 5432 -U bucardo -d btest3

Next we group our databases together and assign them roles:

$  bucardo add dbgroup tgroup  t1:source  t2:source  t3:target  m1:target
Created database group "tgroup"
Added database "t1" to group "tgroup" as source
Added database "t2" to group "tgroup" as source
Added database "t3" to group "tgroup" as target
Added database "m1" to group "tgroup" as target

Note that "target" is the default action, so we could shorten that to:

$  bucardo add dbgroup tgroup t1:source  t2  t3  m1

However, I think it is best to be explicit, even if it does (incorrectly) hint that m1 could be anything *other* than a target. :)

We are almost ready to go. The final step is to create a sync (a basic replication event in Bucardo), then we can start up Bucardo, put some test data into the master databases, and 'kick' the sync:

$  bucardo add sync mongotest  herd=therd  dbs=tgroup  ping=false
Added sync "mongotest"

$  bucardo start
Checking for existing processes
Starting Bucardo

$  pgbench -t 10000 btest1
starting vacuum...end.
transaction type: TPC-B (sort of)
number of transactions actually processed: 10000/10000
...
tps = 503.300595 (excluding connections establishing)
$  pgbench -t 10000 btest2
number of transactions actually processed: 10000/10000
...
tps = 408.059368 (excluding connections establishing)
$  bucardo kick mongotest

We'll give it a few seconds to replicate those changes (it took 18 seconds on my test box), and then check the output of bucardo status:

$  bucardo status
PID of Bucardo MCP: 3317
 Name        State    Last good    Time    Last I/D/C    Last bad    Time  
===========+========+============+=======+=============+===========+=======
 mongotest | Good   | 21:57:47   | 11s   | 6/36234/898 | none      |

Looks good, but what about the data in MongoDB? Let's get some counts from the Postgres masters and slave, and then look at the data inside MongoDB with the mongo command-line client:

$  psql btest1 -c 'SELECT count(*) FROM pgbench_accounts'
100000
$  psql btest2 -c 'SELECT count(*) FROM pgbench_accounts'
100000
$  psql btest3 -c 'SELECT count(*) FROM pgbench_accounts'
18106
$  psql btest1 -qc 'SELECT min(abalance),max(abalance) FROM pgbench_accounts'
-12071 | 13010
$  psql btest2 -qc 'SELECT min(abalance),max(abalance) FROM pgbench_accounts'
-12071 | 13010
$  psql btest3 -qc 'SELECT min(abalance),max(abalance) FROM pgbench_accounts'
-12071 | 13010

$  mongo btest1
MongoDB shell version: 1.8.1
Fri Jun 10 12:46:00
connecting to: btest1
> show collections
bucardo_status
pgbench_accounts
pgbench_branches
pgbench_tellers
system.indexes
>  db.pgbench_accounts.count()
18106
>  db.pgbench_accounts.find().sort({abalance:1}).limit(1).next()
{
  "_id" : ObjectId("4df39bcb8795839660001de5"),
  "abalance" : -12071,
  "aid" : 84733,
  "bid" : 1,
  "filler" : "               "
}
> db.pgbench_accounts.find().sort({abalance:-1}).limit(1).next()
{
  "_id" : ObjectId("4df39bd08795839660002fb0"),
  "abalance" : 13010,
  "aid" : 45500,
  "bid" : 1,
  "filler" : "               "
}

Why the difference in counts? We only started replicating after we populated the Postgres tables on the master databases with 100,000 rows, so the eighteen thousand is the number of rows that were changed during the subsequent pgbench run. (Note that pgbench uses randomness, so your numbers will differ from the above.) In the future Bucardo will support the "onetimecopy" feature for MongoDB, but until then we can fully populate the pgbench_accounts collection simply by "touching" all the records on one of the masters:

$ psql btest1 -c 'UPDATE pgbench_accounts SET aid=aid'
UPDATE 100000
$ bucardo kick mongotest
Kicked sync mongotest
$ echo 'db.pgbench_accounts.count()' | mongo btest1
MongoDB shell version: 1.8.1
Fri Jun 10 12:47:00
connecting to: btest1
> 100000
> bye

A nice feature of MongoDB is its autovivification ability (aka dynamic schemas), which means that, unlike Postgres, you do not have to create your tables first: you can simply ask MongoDB to do an insert, and it will create the table (or, in mongospeak, the collection) automatically for you.
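For instance, here is a minimal Ruby sketch using the "mongo" gem's 2.x driver API (which is newer than this post): the first insert into a brand-new collection is all it takes, and the collection and field names below are made up purely for illustration.

require 'mongo'

client = Mongo::Client.new(['127.0.0.1:27017'], database: 'btest1')

# No "create collection" step needed: the first insert brings it into being.
client[:brand_new_collection].insert_one(note: 'created on first insert')
puts client.database.collection_names.include?('brand_new_collection')  # => true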

Because MongoDB has no concept of transactions, and because Bucardo does not do updates, but rather deletes plus inserts (for reasons I'll not get into today), there is one more trick Bucardo does when replicating to a MongoDB instance. A collection named 'bucardo_status' is created and updated at the start and the end of a sync (a replication event). Thus, your application can pause if it sees this collection has a 'started' value, and wait until it sees 'complete' or 'failed'. Not foolproof by any means, but better than nothing :) You should, of course, carefully consider the way your app and Bucardo will coordinate things.
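As a rough sketch of what that coordination might look like from a Ruby client (again using the modern "mongo" gem; the 'status' field name and its 'started'/'complete'/'failed' values are assumptions taken from the description above, so check an actual bucardo_status document before relying on them):

require 'mongo'

client = Mongo::Client.new(['127.0.0.1:27017'], database: 'btest1')

# Poll the most recently written bucardo_status document and wait while a
# sync is in progress.
loop do
  doc = client[:bucardo_status].find.sort('$natural' => -1).first
  break unless doc && doc['status'] == 'started'
  sleep 1
end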

Feedback from Postgres or MongoDB folk is much appreciated: there are probably some rough edges, but as you can see from the above, the basics are there and working. Feel free to email the bucardo-general mailing list or make a feature request / bug report on the Bucardo Bugzilla page.

Using Set Operators with Ruby Arrays

The Array class in Ruby has many extremely useful methods. I frequently find myself going to the RDoc just to review the different methods and keep myself up to speed on what options are available for manipulating my data with the native methods. Often I find that a method already exists that can simplify a big chunk of confusing, complex code I had written.

In a recent project, I needed a way to handle a complex user interface problem that was caused by a many-to-many (has-and-belongs-to-many) database model. The solution I came up with was an amazingly simple implementation of something that could have involved writing some very convoluted and complex algorithms, which would have muddied my code and required me to write extensive tests. As it turns out, I had just read up on Array set operators (Ruby methods), and the solution became easier and monumentally more elegant.

Introducing the Union, Difference, & Intersection
Since arrays essentially act as sets[1], they can be manipulated using the set operations union, difference, and intersection. If you go to the Array RDoc, however, you'll notice no methods with these names. So here is a brief look at how they work:

Union
A union is essentially used to combine the unique values of two sets, or in this case, arrays. To perform a union on two arrays, you use the pipe (|) as an operator. For example:

[1, 2, 1, 2, 3] | [1, 2, 3, 4] #=> [1, 2, 3, 4]

Difference
Sometimes you just want to know what is different between two arrays. You can do this by using the difference method as an operator like so:

[1, 2, 3] - [3, 4, 5] #=> [1, 2]

Now, that may not have been exactly what you were expecting. Difference works by taking the elements on the left and comparing them to the elements on the right; whatever is in the left array but not in the right is what's returned. So the opposite of the above example looks like this:

[3, 4, 5] - [1, 2, 3] #=> [4, 5]

This subtle difference will be the key in the example I'm going to show later on that will elegantly solve a UI problem I mentioned earlier.

Intersection
The intersection of two sets is the set of elements common to both, and like the other set operators, it removes duplicates. To perform an intersection, you use the ampersand (&) as an operator.

[1, 1, 3, 5] & [1, 2, 3] #=> [ 1, 3 ]

A Practical Use Case
Let's face it, building nice interfaces using HTML forms can be a challenge, especially when tying them to multiple models in Rails. Even Ryan Bates, creator of the amazing Railscasts website, took 2 episodes to show how to handle some complex nested tables. Although the example I'm showing here isn't nearly that complex, it does show how set operators can help out with some complex form handling.

Simple Bookshelf
For my example here, I'm going to construct a simple bookshelf application. The entire finished application can be found under my GitHub account. The idea is that we have a database table full of books. A user can create as many bookshelves as they want and place books on them. The database model for this will require a has-and-belongs-to-many association.
The ERD looks like this:

To set this up in Rails, I'll create a basic many-to-many association between the two models, along the lines of the sketch below (the full listing is in the GitHub repo).
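A minimal sketch of the models, assuming the conventional books_bookshelves join table (this is a reconstruction, not the repo's exact code):

class Book < ActiveRecord::Base
  has_and_belongs_to_many :bookshelves   # joined through books_bookshelves
end

class Bookshelf < ActiveRecord::Base
  has_and_belongs_to_many :books
end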

Approaching the UI
Now, in approaching how to tackle assigning books to bookshelves, I want to display the list of books with checkboxes next to them under the bookshelf. When I check a book, I want that book to be added to my shelf. Likewise, when I uncheck a book, I want it removed.

The Implementation
The actual implementation here is grossly oversimplified, but it illustrates the concept well. I used nifty generators to set up some basic scaffolding for the books and bookshelves controllers. All the interesting code will be in the bookshelves controller and views. Let's look at the view first:
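Roughly, the edit view looks like the sketch below. This is a reconstruction from the description that follows rather than the exact template from the repo; it assumes Rails 3 ERB syntax, and book.title is an assumed attribute name.

<%= form_for @bookshelf do |f| %>
  <p>
    <%= f.label :name %>
    <%= f.text_field :name %>
  </p>

  <h3>Books</h3>
  <% @books.each do |book| %>
    <p>
      <%# the checkbox name carries the book id, so params[:books] arrives as a hash keyed by id %>
      <%= check_box_tag "books[#{book.id}]", "1", @bookshelf.books.include?(book) %>
      <%= book.title %>
    </p>
  <% end %>

  <p><%= f.submit %></p>
<% end %>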
What we have here is a basic form for changing the name of my bookshelf. The interesting part is where the books are displayed. In the controller I set @books to Book.all so that I can show all of the books with a checkbox next to each. There are a couple of things to notice that will be important later on. First, I'm using the check_box_tag helper, which places the input tag outside the @bookshelf scope. Next, for the checkbox name I use "books[]" with the book's id inside the brackets, so that when the form is submitted I get a hash called books as one of my params to work with; the keys in the hash are the book ids, and the values are all "1". Finally, I mark the checkbox as checked if that book is already included in the @bookshelf's assigned books.

Next, we'll look at the update action in the bookshelves controller.
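Sketched from the description below (and simplified; the nifty scaffold version in the repo may differ slightly), the action looks something like this:

class BookshelvesController < ApplicationController
  def update
    @bookshelf = Bookshelf.find(params[:id])
    if @bookshelf.update_attributes(params[:bookshelf])
      sync_selected_books
      flash[:notice] = "Successfully updated bookshelf."
      redirect_to @bookshelf
    else
      render :action => 'edit'
    end
  end

  # sync_selected_books is shown a little further down
end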

Everything here is pretty standard except the call to the private method sync_selected_books. This is the real meat and potatoes of what I want to illustrate, so I'll break it down in detail. First, if no books were checked, there will be no params[:books] value at all; it will just be nil. In that case, we remove any associated books with a delete_all call. Otherwise, I build an array containing just the selected books and assign it to checked_books, and then build another array of the books currently on the shelf and assign it to current_books.

Using the set operators I described above, I can determine which books to remove and which books to add using difference, and then use some database-friendly methods to make the changes.
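Put together, sync_selected_books comes out looking something like the sketch below (reconstructed from the description above, working with book ids):

private

def sync_selected_books
  if params[:books].nil?
    # Nothing was checked, so clear the shelf.
    @bookshelf.books.delete_all
  else
    checked_books = params[:books].keys.map(&:to_i)  # ids checked on the form
    current_books = @bookshelf.book_ids              # ids already on the shelf

    # The difference in each direction tells us exactly what to add and remove.
    @bookshelf.books << Book.find(checked_books - current_books)
    @bookshelf.books.delete(*Book.find(current_books - checked_books))
  end
end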

What makes this nice is how simple it is to understand and to test. The code explains exactly what I want it to do. The beauty of this method is that when I put it together, it worked the first time. The other nice thing is how well it plays with the database: we only touch the rows that need to be touched and don't have to worry about the items that stay the same.

Wrapping up
Using set operators to manipulate your arrays opens up a lot of possibilities that I hadn't considered before. It's worth your time to practice some of these operators and then use them in your projects where you need to manipulate the elements in multiple arrays.

Once again, the entire rails application I used for this illustration is located out on my github account at https://github.com/mikefarmer/Simple-Bookshelf

Footnotes
1. Ruby does have a Set class, but for my purposes here, I'm going to stick to thinking of arrays as sets as that's generally what we use in our Ruby applications.

SeniorNet

SeniorNet is an organization dedicated to bringing education and technology access to older adults (50+). I began volunteering with them recently, coaching an intermediate Windows class. My role is to shadow half the class, watching for anyone who gets "stuck" or goes astray, as the instructor leads them through basic operations such as formatting text, creating spreadsheets, and so on.

I started this as a way to give back to the community, and to explore things I might want to do later in my career (namely, teach), but I'm finding that it is giving me some unexpected insights into design (visual and functional) for less-experienced computer users. I'm sure there are lots and lots of formal studies on how over-50ers learn to use computers vs. under-20ers, but there's nothing like seeing it first-hand. I've already started mulling over how this new insight might affect the way I'd structure an on-line application (e.g., an e-commerce checkout) that caters to an older audience. And while I don't have any answers there, I feel better (and humbler) for having begun the process, even at this late date.*

*I'm among the very oldest, if not the actual oldest of the End Point crew, having begun my computer experience in the days of punch cards. That's one of the reasons I'm acceptable to SeniorNet, because they prefer their instructors and coaches to be contemporaries of their students.

June 8, 2011: World IPv6 Day

This post has 6 a lot

I'm a little surprised they didn't do it today. 06-06, what better day for IPv6? Oh well, at least Hurricane Electric was awesome enough to send a Sage certification shirt just in time!

June 8th, 2011 is the day! In just a couple of days World IPv6 Day begins, when several of the largest and most popular sites on the Internet, and many others, turn on IPv6 addresses for a 24-hour interval. Many of them already have IPv6, but you have to seek it out on a separate address, and odds are that if you're seeking it out specifically, you're configured well enough not to have any problems. But with IPv6 configured on the primary addresses of some of the largest Internet sites, people who don't know they're testing anything become part of the test. That's important for tracking down exactly what makes up those 1-problem-in-2,000 configurations, and for assessing whether that's even an accurate number these days.

Not sure about your own connection? http://test-ipv6.com/ is an excellent place to run a number of tests and see how v6-enabled you are. Odds are you'll end up at one end of the spectrum or the other, but if there's a configuration glitch, those tests could help you track it down.

At End Point we decided to get a bit of a head start. For the last 24 hours or so www.endpoint.com has been running with an IPv6 AAAA record. And it was pleasantly surprising: the first IPv6 hits started showing up nearly instantaneously! Our visitors are, in general, likely to be more on the technical side of the scale, but so far the results have been promising, and by all accounts everything works as it should. Soon, we'll likely begin offering to enable it for some of our customer sites.

A part of me wants to express some disappointment that an event like this is even necessary. My favorite database project got IPv6 support way back in the PostgreSQL 7.4 release, now long past its EOL. But at the same time I know what a huge undertaking the IPv6 migration is on a number of levels. Encouragement from providers and end-user awareness go a long way toward helping things along. So I applaud the sites taking part in World IPv6 Day, helping pave the way for the next-generation protocol.

I kinda feel for the tunnel providers, now that I think about it. I've had a Hurricane Electric tunnel that's been carrying my home traffic for quite a while now. So far, that's primarily been things like IRC over to Freenode, ssh traffic to the servers that have IPv6 addresses, a few random hits to sites that have it, etc. But with high bandwidth sites like YouTube and frequently hit CDN providers on board for World IPv6 Day I'd bet that those tunnels will see traffic spike dramatically.

Granted, a number of the tunnel providers are running portions of the Internet backbone anyway, but there's only a handful of tunnel endpoints for that traffic to go through. Tunneling also forces the traffic to travel over sections of the provider's network before it can find its way out to a peer. Both of these are pretty substantial barriers to route optimization, at least compared to native traffic. Don't get me wrong, even as it is the performance of the tunnel has been great. I just hope providing an arguably necessary (and free!) service isn't too painful for these providers while end-user deployments are still occurring.

Bucardo multi-master for PostgreSQL

The original Bucardo

The next version of Bucardo, a replication system for Postgres, is almost complete. The scope of the changes required a major version bump, so this Bucardo will start at version 5.0.0. Much of the innards was rewritten, with the following goals:

Multi-master support

Where "multi" means "as many as you want"! There are no more pushdelta (master to slaves) or swap (master to master) syncs: there is simply one sync where you tell it which databases to use, and what role they play. See examples below.

Ease of use

The bucardo program (previously known as 'bucardo_ctl') has been greatly improved, making all the administrative tasks such as adding tables, creating syncs, etc. much easier.

Performance

Much of the underlying architecture was improved, and sometimes rewritten, to make things go much faster. Most striking is the difference between the old multi-master "swap syncs" and the new method, which has been described as "orders of magnitudes" faster by early testers. We use async database calls whenever possible, and no longer have the bottleneck of a single large bucardo_delta table.

Improved logging

Not only are more details provided, there is now the ability to control how verbose the logs are: just set the log_level parameter to terse, normal, verbose, or debug. Those who have busy systems, where the old logging was the equivalent of a 'debug' firehose, will really appreciate this.

Different targets

Who says your slave (target) databases need to be Postgres? In addition to the ability to write text SQL files (for say, shipping to a different system), you can have Bucardo push to other systems as well. Stay tuned for more details on this. (Update: there is a blog post about using MongoDB as a target)


This new version is not quite at beta yet, but you can try out a demo of multi-master on Postgres quite easily. Let's see if we can do it in eleven steps.

I. Download all prerequisites

To run Bucardo, you will need a Postgres database (obviously), the DBIx::Safe module, the DBI and DBD::Pg modules, and (for the purposes of this demo) the pgbench utility. Systems vary, but on aptitude-based systems, one can grab all of the above like this:

aptitude install postgresql-server \
perl-DBIx-Safe \
perl-DBD-Pg \
postgresql-contrib

II. Grab the latest Bucardo

git clone git://bucardo.org/bucardo.git

III. Install the program

cd bucardo
perl Makefile.PL
make
sudo make install

You can ignore any errors that come up about ExtUtils::MakeMaker not being recent.

IV. Setup an instance of Bucardo

This step assumes there is a running Postgres available to connect to.

sudo mkdir /var/run/bucardo
sudo chown $USER /var/run/bucardo
bucardo install

V. Use the pgbench program to create some test tables

psql -c 'CREATE DATABASE btest1'
pgbench -i btest1
psql -c 'CREATE DATABASE btest2 TEMPLATE btest1'
psql -c 'CREATE DATABASE btest3 TEMPLATE btest1'
psql -c 'CREATE DATABASE btest4 TEMPLATE btest1'
psql -c 'CREATE DATABASE btest5 TEMPLATE btest1'

VI. Tell Bucardo about the databases and tables you are going to use

bucardo add db t1 dbname=btest1
bucardo add db t2 dbname=btest2
bucardo add db t3 dbname=btest3
bucardo add db t4 dbname=btest4
bucardo add db t5 dbname=btest5
bucardo list dbs

bucardo add table pgbench_accounts pgbench_branches pgbench_tellers herd=therd
bucardo list tables

A herd is simply a logical grouping of tables. We did not add the other pgbench table, pgbench_history, because it has no primary key or unique index.

VII. Group the databases together and set their roles

bucardo add dbgroup tgroup t1:source t2:source t3:source t4:source t5:target

We've grouped all five databases together, and made four of them masters (aka source) and one of them a slave (aka target). You can have any combination of masters and slaves you want, as long as there is at least one master.

VIII. Create the Bucardo sync

bucardo add sync foobar herd=therd dbs=tgroup ping=false

Here we simply create a new sync, which is a controllable replication event, telling it which tables we want to replicate, and which databases we are going to use. We also set ping to false, which means that we will not create triggers to automatically fire off replication on any changes, but will do it manually. In a real world scenario, you generally do want those triggers, or want to set Bucardo to check periodically.

IX. Start up Bucardo

bucardo start

If all went well, you should see some information in the log.bucardo file in the current directory.

X. Make a bunch of changes on all the source databases.

pgbench -t 10000 btest1
pgbench -t 10000 btest2
pgbench -t 10000 btest3
pgbench -t 10000 btest4

Here, we've told pgbench to run ten thousand transactions against each of the first four databases. Triggers on these tables have captured the changes.

XI. Kick off the sync and watch the fun.

bucardo kick foobar

You can now tail the log.bucardo file to see the fun, or simply run:

bucardo status

...to see what it is doing, and the final counts when we are done. Don't forget to stop Bucardo when you are done testing:

bucardo stop

The output of bucardo status, after the sync has completed, should look like this:

bucardo status
Name     State    Last good    Time    Last I/D/C           Last bad    Time
========+========+============+=======+====================+===========+=======
foobar | Good   | 17:58:37   | 3m2s  | 131836/131836/4785 | none      |

Here we see that this sync has never failed ("Last bad"), the time of day of the last good run, how long ago that was (3 minutes and 2 seconds), as well as details of the last successful run. Last I/D/C stands for the number of inserts, deletes, and conflicts across all databases for this sync. This is just a high-level overview of all syncs, but we can also give status the name of a sync as an argument to see more details, like so:

bucardo status foobar
Last good                       : Jun 02, 2011 17:57:47 (time to run: 42s)
Rows deleted/inserted/conflicts : 131,836 / 131,836 / 4,785
Sync name                       : foobar
Current state                   : Good
Source herd/database            : therd / t1
Tables in sync                  : 3
Status                          : active
Check time                      : none
Overdue time                    : 00:00:00
Expired time                    : 00:00:00
Stayalive/Kidsalive             : yes / yes
Rebuild index                   : 0
Ping                            : no
Onetimecopy                     : 0
Post-copy analyze               : Yes
Last error:                     :

This gives us a little more information about the sync itself, as well as another important metric, how long the sync itself took to run, in this case, 42 seconds. That particular metric might make its way back to the overall "status" view above. Try things out and help us find bugs and improve Bucardo!

Paperclip in Spree: Extending Product Image Sizes

Spree uses the popular gem Paperclip for assigning images as attachments to products. The basic installation requires you to install the gem, create a migration to store the paperclip-specific fields in your model, add the has_attached_file information to the model with the attachment, add the ability to upload the file, and display the file in a view. In Spree, the Image model has an attached file with the following properties:

class Image < Asset
  ...
  has_attached_file :attachment,
                    :styles => { :mini => '48x48>',
                      :small => '100x100>',
                      :product => '240x240>',
                      :large => '600x600>'
                    },
                    :default_style => :product,
                    :url => "/assets/products/:id/:style/:basename.:extension",
                    :path => ":rails_root/public/assets/products/:id/:style/:basename.:extension"
  ...
end

As you can see, when an admin uploads an image, four image sizes are created: large, product, small, and mini.


Four images are created per product image uploaded in Spree (Note: not to scale).

Last week, I wanted to add several additional sizes to be created upon upload to improve performance. This involved several steps, described below.

Step 1: Extend attachment_definitions

First, I had to override the image attachment styles, with the code shown below. My application is running on Spree 0.11.2 (Rails 2.3.*), so this was added inside the extension activate method, but in Rails 3.0 versions of Spree, this would be added inside the engine's activate method.

Image.attachment_definitions[:attachment][:styles].merge!(
      :newsize1 => '200x200>',
      :newsize2 => '284x284>'
)

Step 2: Add Image Helper Methods

Spree has the following bit of code in its base_helper.rb, which in theory should create methods for calling each image (mini_image, small_image, product_image, large_image, newsize1_image, newsize2_image):

Image.attachment_definitions[:attachment][:styles].each do |style, v|
  define_method "#{style}_image" do |product, *options|
    options = options.first || {}
    if product.images.empty?
      image_tag "noimage/#{style}.jpg", options
    else
      image = product.images.first
      options.reverse_merge! :alt => image.alt.blank? ? product.name : image.alt
      image_tag image.attachment.url(style), options
    end
  end
end

But for some reason in this application, perhaps because of the order of extension evaluation, this was only applied to the original image sizes. I remedied this by adding the following code to my extension's base helper:

  [:newsize1, :newsize2].each do |style|
    define_method "#{style}_image" do |product, *options|
      options = options.first || {}
      if product.images.empty?
        image_tag "noimage/#{style}.jpg", options
      else
        image = product.images.first
        options.reverse_merge! :alt => image.alt.blank? ? product.name : image.alt
        image_tag image.attachment.url(style), options
      end
    end
  end

Step 3: Create Cropped Images for Existing Images

Finally, instead of requiring all images to be re-uploaded to create the new cropped images, I wrote a quick bash script to generate images with the new sizes. This script was placed inside the RAILS_ROOT/public/assets/products/ directory, where product images are stored. The script iterates through each existing directory and creates cropped images based on the original uploaded image with the ImageMagick command-line tool, which is what Paperclip uses for resizing.

#!/bin/bash

for i in `ls */original/*`
do
    image_name=${i#*original\/}
    dir_name=${i/\/original\/$image_name/}
    mkdir $dir_name/newsize1/ $dir_name/newsize2/
    convert $i -resize '200x200' $dir_name/newsize1/$image_name
    convert $i -resize '284x284' $dir_name/newsize2/$image_name
    echo "created images for $i"
done

Step 4: Update Views

Finally, I added newsize1_image and newsize2_image methods throughout the views, e.g.:

<%= link_to newsize1_image(product), product %>
<%= link_to newsize2_image(taxon.products.first), seo_url(taxon) %>

Conclusion

It would be ideal to remove Step 2 described here by investigating why the image methods are not defined by the Spree core BaseHelper module. It's possible that this is working as expected on more recent versions of Spree. Other than that violation of the DRY principle, it is a fairly simple process to extend the Paperclip image settings to include additional sizes.