
Postgres log_statement='all' should be your default


Modified version of image by Flickr user Ellesmere FNC

Setting the PostgreSQL log_statement parameter to 'all' is always your best choice; this article will explain why. PostgreSQL does not have many knobs to control logging. The main one is log_statement, which can be set to 'none' (do not ever set it to this!), 'ddl' or 'mod' (decent but flawed values), or 'all', which is what you should be using. In addition, you probably want to set log_connections = on, log_disconnections = on, and log_duration = on. Of course, if you do set all of those, don't forget to set log_min_duration_statement = -1 to turn it off completely, as it is no longer needed.
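
For reference, here is a minimal postgresql.conf sketch of the settings discussed above (the values simply restate this article's recommendations; adjust things like log_destination for your environment):

  log_statement = 'all'
  log_connections = on
  log_disconnections = on
  log_duration = on
  log_min_duration_statement = -1   # no longer needed once the settings above are in place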

The common objections to setting log_statement to 'all' can be summed up as Disk Space, Performance, and Noise. Each will be explained and countered below. The very good reasons for having it set to 'all' will be covered as well: Troubleshooting, Analytics, and Forensics/Auditing.

Objection: Disk Space

The most common objection to logging all of your SQL statements is disk space. When log_statement is set to all, every action against the database is logged, from a simple SELECT 1; to a gigantic data warehousing query that is 300 lines long and takes seven hours to complete. As one can imagine, logging all queries generates large logs, very quickly. How much depends on your particular system of course. Luckily, the amount of space is very easy to test: just flip log_statement='all' in your postgresql.conf, and reload your database (no restart required!). Let it run for 15 minutes or so and you will have a decent starting point for extrapolating daily disk space usage. For most of our clients, the median is probably around 30MB per day, but we have some as low as 1MB and some over 50GB! Disk space is cheap, but if it is really an issue to save everything, one solution is to dynamically ship the logs to a different box via syslog (see below). Another not-as-good option is to simply purge older logs, or at least ship the older logs to a separate server, or perhaps to tape. Finally, you could write a quick script to remove common and uninteresting lines (say, all selects below 10 milliseconds) from older logs.
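
As a rough sketch of that extrapolation, assuming a freshly rotated log file (the path is made up; use your own log_directory and log_filename), something like this gives a ballpark daily figure from a 15-minute sample:

  # wait 15 minutes after enabling log_statement='all', then extrapolate
  sleep 900
  SAMPLE_BYTES=$(stat -c%s /var/lib/pgsql/data/pg_log/postgresql-sample.log)   # GNU stat
  echo "Roughly $(( SAMPLE_BYTES * 96 / 1024 / 1024 )) MB of logs per day"     # 96 fifteen-minute slices per day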

Objection: Performance

Another objection is that writing all those logs is going to harm the performance of the server the database is on. A valid concern, although the actual impact can be easily measured by toggling the value temporarily. The primary performance issue is I/O, as all those logs have to get written to a disk. The first solution is to make sure the logs are going to a different hard disk, thus reducing contention with anything else on the server. Additionally, you can configure this disk differently, as it will be heavy write/append with little to no random read access. The best filesystems for handling this sort of thing seem to be ext2 and ext4. A better solution is to trade the I/O hit for a network hit, and use syslog (or better, rsyslog) to ship the logs to a different server. Doing this is usually as simple as setting log_destination = 'syslog' in your postgresql.conf and adjusting your [r]syslog.conf. This has many advantages: if shipping to a local server, you can often go over a non-public network interface, and thus not impact the database server at all. This other server can also be queried at will, without affecting the performance of the database server. This means heavy analytics, e.g. running pgsi or pgfouine, can run without fear of impacting production. It can also be easier to provision this other server with larger disks than to mess around with the production database server.
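
A minimal sketch of that setup, assuming rsyslog and a hypothetical log host named "loghost" (the facility and port are just examples; the receiving server needs a matching listener):

  # postgresql.conf on the database server
  log_destination = 'syslog'
  syslog_facility = 'LOCAL0'

  # /etc/rsyslog.conf on the database server: forward that facility over TCP
  local0.*    @@loghost:514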

Objection: Noise

A final objection is that the log files get so large and noisy, they are hard to read. Certainly, if you are used to reading sparse logs, this change will take some getting used to. One should not be reading logs manually anyway: there are tools to do that. If all your logs were showing before were statements exceeding log_min_duration_statement, you can get the same effect (in a prettier format!) by using the 'duration' mode of the tail_n_mail program, which also lets you pick your own minimum duration and then sorts the statements from longest to shortest.


Image by Flickr user auntjojo

Advantage: Troubleshooting

When things go wrong, being able to see exactly what is happening in your database can be crucial. Additionally, being able to look back and see what was going on can be invaluable. I cannot count the number of times that full logging has made debugging a production issue easier. Without this logging, the only option sometimes is to switch log_statement to all and then wait for the error to pop up again! Don't let that happen to you - log heavy preemptively. This is not just useful for tracking direct database problems; often the database trail can enable a DBA to work with application developers to see exactly what their application is doing and where things started to go wrong. On that note, it is a good idea to log as verbosely as you can for everything in your stack, from the database to the application to the httpd server: you never know which logs you may need at a moment's notice when major problems arise.

Advantage: Analytics

If the only logging you are doing is of those queries that happen to run longer than your log_min_duration_statement, you have a very skewed and incomplete view of your database activity. Certainly one can view the slowest queries and try to speed them up, but tools like pgsi are designed to parse full logs: the impact of thousands of "fast" queries can often be more stressful on your server than a few long-running queries, but without full logging you will never know. You also won't know if those long-running queries sometimes (or often!) run faster than log_min_duration_statement.

We do have some clients that cannot do log_statement = 'all', but we still want to use pgsi, so what we do is turn on full logging for a period of time via cron (usually 15 minutes, during a busy time of day), then turn it off and run pgsi on that slice of full logging. Not ideal, but better than trying to crunch numbers using incomplete logs.

Advantage: Forensics/Auditing

Full logging via log_statement='all' is extremely important if you need to know exactly what commands a particular user or process has run. This is not just relevant to SQL injection attacks, but to rogue users, lost laptops, and any situation in which someone has done something unknown to your database. Not every one of these situations will be as noticeable as the infamous DROP TABLE students;: often it involves updating a few critical rows, modifying some functions, or simply copying a table to disk. The *only* way to know exactly what was done is to have log_statement = 'all'. Luckily, this parameter cannot be turned off by clients: one must edit the postgresql.conf file and then reload the server.

The advantages of complete logging should outweigh the disadvantages, except in the most extreme cases. Certainly, it is better to start from a position of setting Postgres' log_statement to 'all' and defending any change to a lesser setting. Someday it may save your bacon. Disagreements welcome in the comment section!

Respecting API Call Limit with Radian6

A common practice in the online API world is to enforce a call limit. Twitter allows 150 API calls per hour. Shopify has a limit of 500 API calls per 5 minutes. You can learn how to work with Shopify's call limit from this great article.

One of our projects was built around interaction with the Radian6 platform. Radian6 is a new and growing data mining service with a default limit of 100 calls per hour. I will describe our take on implementing the call limit.

Introducing the call counter

First, we need to know how many calls have been executed in the current hour. Every external call increments the counter field on a special model until the counter reaches the limit. The counter is reset back to zero at the beginning of every hour.

script/rails g model RadianCallCount

In the migration:

class CreateRadianCallCounts < ActiveRecord::Migration
  def change
    create_table :radian_call_counts do |t|
      t.integer :count
      t.timestamps
    end
  end
end

In db/seeds.rb file

puts "Initializing Radian6 counter"
RadianCallCount.delete_all
RadianCallCount.create(:count => 0)

Let's roll the counter!

rake db:migrate
rake db:seed

Scheduling the counter reset

It is necessary to reset the counter back to zero at the beginning of each hour; otherwise, the subsequent calls will not be executed. The excellent 'whenever' gem will take care of this.

gem install whenever
cd your_webapp
wheneverize .

In the model:

class RadianCallCount < ActiveRecord::Base
  def self.reset
    RadianCallCount.first.update_attribute(:count, 0)
  end
end

In config/schedule.rb:

every :hour do 
   runner "RadianCallCount.reset"
end

Tracking call count

We will now use the rcapture gem to intercept calls to the external API and increment the counter with them.

gem install rcapture

In the module containing all Radian-specific methods:

require 'rcapture'
API_LIMIT = 100
def self.included(base)
  base.extend RCapture::Interceptable
  base.capture_pre :methods => [:authenticate,:tweet_stats] do |cs|
    RadianCallCount.transaction do 
      calls_per_hour = RadianCallCount.first.count 
      allowed = (calls_per_hour < Radian::API_LIMIT)
      cs.predicate = allowed
      cs.return = false 
      RadianCallCount.first.increment!(:count) if allowed
    end
  end
end

The code introduces a simple check before the 'authenticate' and 'tweet_stats' methods. If the call count exceeds the allowed limit, the intercepted method is not executed and returns false. Otherwise, the counter is incremented and the call goes through. We wrap the code in a transaction because the actual count in the database may increase while we are making the API_LIMIT comparison.

Making the limit-aware call

Everything is ready to make the non-blocking API call. I scheduled a Twitter statistics update to run every 3 hours:

every 3.hours do
   runner "Article.tweet_stats"
end

The non-blocking calls are suitable for most situations. Sometimes there is a need to just keep trying...

def call_with_timeout(&block)
  timeout = 0.minutes 
  results = false 
  while !results do
    results = block.call
    if !results
      break if timeout >= Radian::API_MAX_TIMEOUT
      Rails.logger.info("Sleeping for 5 minutes")
      sleep(Radian::API_CALL_TIMEOUT)
      timeout += Radian::API_CALL_TIMEOUT
    end
  end
  results 
end
...
call_with_timeout { tweet_stats }

That is all for today. Thanks for your attention. Hope the post was useful.

Three Things: ImageMagick, RequestBin, Responsiveness

It's been a while since I wrote a Three Things blog article, where I share tidbits that I've learned that don't quite make it into a full blog post, but are still worth sharing!

Image Stacking via Command Line

I recently needed to stack several images to CSS spritify them. I did some quick searching and found the following ImageMagick command line functionality:

convert add.png add_ro.png -append test.png
convert add.png add_ro.png +append test.png

The -append argument will stack the images vertically, and +append will stack the images horizontally.

RequestBin

Recently, I needed to quickly examine the contents sent during a payment transaction. Richard pointed me to RequestBin, a neat and quick tool for examining the contents of an external request. I was testing the Elavon payment gateway integrated into an application using Piggybak. In my code, I made the following change:

class ElavonGateway < ViaklixGateway
  self.test_url = 'https://demo.myvirtualmerchant.com/VirtualMerchantDemo/process.do'
  self.live_url = 'https://www.myvirtualmerchant.com/VirtualMerchant/process.do'
+ self.test_url = 'http://requestb.in/abc1def2'

I ran through a single checkout, then visited the RequestBin inspect URL to examine the variables passed to the payment gateway, to see the following data (the text overwrites itself in Chrome):


RequestBin example output.

A Responsiveness Tool

I’ve been working on responsiveness for a couple of sites recently. One of my clients pointed out responsivepx, which allows you to toggle the width of your browser to review various responsive layouts.

Google I/O 2012, Day 1

I'm here in San Francisco at Google I/O 2012 with coworkers Kiel, Matt, and Ben.

Today began with Sergey Brin giving a really great keynote speech introducing a ton of new Google products and features, including Android 4.1 Jelly Bean, Google's new Nexus 7" tablet, the Nexus Q, and the addition of magazines and TV to the Google Play Store. Not to mention Google Glass, still under heavy development.

Back in the "sandbox" demo area, we've been showing attendees this cool prototype Liquid Galaxy cockpit. A cool new feature introduced with this Google Earth build is helicopter imagery, a 3D building layer created by taking photos of the landscape from different angles. I have some pictures for you.

Google I/O attendees looking at the Galaxy:

Andrew Leahy explaining an aspect of the Liquid Galaxy to an interested attendee:

Setting up beforehand:

As it turned out, this isn't the Liquid Galaxy that was intended for Google I/O. The newest cockpit frame was supposed to arrive some days ago, but was only located this morning. Hopefully we'll be able to set it up tonight to have a better Liquid Galaxy experience for the next few days!

Speeding Up Integration Tests with PostgreSQL - Follow Up

Last week I wrote a blog article about speeding up integration tests using PostgreSQL. In it, I proposed changing a couple of PostgreSQL cluster settings. The main drawback of this method is that those settings have to be changed for the whole cluster. When you have important data in other databases in that cluster, that can be a problem.

In one of the comments, Greg proposed using unlogged tables. This feature appeared in PostgreSQL 9.1. The only difference is that you use CREATE UNLOGGED TABLE instead of CREATE TABLE when creating your tables.
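
For example, using the columns of pgbench's accounts table, the change is just one keyword (a sketch; the rest of the definition follows pgbench's defaults):

  CREATE UNLOGGED TABLE pgbench_accounts (
      aid      integer NOT NULL,
      bid      integer,
      abalance integer,
      filler   character(84)
  );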

For an unlogged table, the data is not written to the write-ahead log. All inserts, updates and deletes are much faster; however, the table will be truncated after a server crash or unclean shutdown. Such a table is also not replicated to standby servers, which makes sense, since replication works by shipping the write-ahead log. More importantly, indexes created on unlogged tables are unlogged as well.

Everything I describe here is for integration tests. When the database crashes, all the tests should be restarted, and they should prepare the database before running anyway, so I really don't care what happens to the data when something crashes.

The bad thing about unlogged tables is that you cannot change a normal table into an unlogged one. There is nothing like:

ALTER TABLE SET UNLOGGED

The easiest way I found to change the tables into unlogged ones was to create a database dump and add UNLOGGED to all the table creation commands. To make it a little faster, I used this command:

pg_dump pbench | sed 's/^CREATE TABLE/CREATE UNLOGGED TABLE/' > pbench.dump.sql

This time, instead of using pgbench to generate the data, I will just drop and recreate the database and load this dump before running the tests:

psql -c "drop database pbench"
psql -c "create database pbench"
psql pbench < pbench.dump.sql

Time for tests. The previous test results are in the previous blog article. I'm using the standard PostgreSQL settings (the secure ones) and the same scale value for pgbench.

The tests were made using exactly the same command as last time:

./pgbench -h localhost -c 1 -j 1 -T 300 pbench

Below are the results, combined with the results from the previous article.

                        number of clients and threads
                            1          2          3
  normal settings         78 tps     80 tps     99 tps
  dangerous settings     414 tps    905 tps   1215 tps
  unlogged table         420 tps    720 tps   1126 tps

As you can see, the efficiency with unlogged tables is almost as good as with the unsafe settings. The great thing is that it doesn't influence other databases in the cluster, so you can use safe/default settings for other databases, and only use unlogged tables for the integration tests, which should be much faster now.

This solution works only with PostgreSQL 9.1 and newer. If you have an older PostgreSQL, you have to use the previous method with the unsafe settings, or better: just upgrade the database.

KISS: Slurping up File Attachments

I've been heavily involved in an ecommerce project running on Rails 3, using Piggybak, RailsAdmin, Paperclip for file attachment management, nginx, and unicorn. One thing that we've struggled with is handling large file uploads, both in the RailsAdmin import process and from the standard RailsAdmin edit page. Nginx is configured to limit request size and duration, which is a problem for some of the files being uploaded: large purchasable, downloadable files.

To allow these uploads, I brainstormed how to decouple the file upload from the import and update process. Phunk recently worked on integrating Resque, a popular Rails queueing tool, and that worked nicely. However, I ultimately decided that I wanted to go down a simpler route. The implementation is described below.

Upload Status

First, I created an UploadStatus model, to track the status of any file uploads. With RailsAdmin, there's an automagic CRUD interface connected to this model. Here's what the migration looked like:

class CreateUploadStatuses < ActiveRecord::Migration
  def change
    create_table :upload_statuses do |t|
      t.string :filename, :null => false
      t.boolean :success, :null => false, :default => false
      t.string :message

      t.timestamps
    end
  end
end

RailsAdmin also leverages CanCan, so I updated my ability class to allow list, reads, and delete on the UploadStatus table only, since there is no need to edit these records:

      cannot [:create, :export, :edit], UploadStatus
      can [:delete, :read], UploadStatus

KISS Script

Here's the simplified rake task that I used for the process:

namespace :upload_files do
  task :run => :environment do
    files = Dir.glob("#{Rails.root}/to_upload/*.*")
    files.each do |full_filename|
      begin
        ext = File.extname(full_filename)
        name = File.basename(full_filename, ext)

        (klass_name, field, id) = name.split(':')
        klass = klass_name.classify.constantize
        item = klass.find_by_id(id)  # returns nil instead of raising when the record is missing

        if item.nil?
          UploadStatus.create(:filename => "#{name}#{ext}", :message => "Could not find item from #{id}.")
          next
        end

        item.send("#{field}=", File.open(full_filename))

        if item.save
          FileUtils.rm(full_filename)
          UploadStatus.create(:filename => "#{name}#{ext}", :success => true)
        end
      rescue Exception => e
        UploadStatus.create(:filename => "#{name}#{ext}", :message => "#{e.inspect}")
      end
    end
  end
end

And here's how the process breaks down:

  1. The script iterates through files in the #{Rails.root}/to_upload directory (lines 3-4).
  2. Based on the filename, in the format "class_name:field:id.extension", the item to be updated is retrieved (line 11).
  3. If the item does not exist, an upload_status record is created with a message that notes the item could not be found (lines 13-16).
  4. If the file exists and the update occurs, the original file is deleted, and a successful upload status is recorded (lines 18-23).
  5. If the process fails anywhere, the exception is logged in a new upload status record (lines 24-26).

This rake task is then called via a nightly cron job to slurp up the files. The simple script eliminates the requirement to upload large files via the admin interface, and decouples the upload from Paperclip/database management. It also has the added benefit of reporting the status to the administrators by leveraging RailsAdmin. Many features can be added to it, but it does the job that we need without much development overhead.
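
The cron entry itself is a one-liner; here is a sketch (the schedule and application path are assumptions):

  # run the slurper every night at 2:30 am
  30 2 * * * cd /var/www/myapp && RAILS_ENV=production bundle exec rake upload_files:run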

Simple Example of Dependency Injection with Rails

Today I came across a great opportunity to illustrate dependency injection in a simple context. I had a Rails partial that was duplicated across two subclasses. The partial was responsible for displaying options to create a new record from the data of the current record. It also offered two types of copy, shallow and deep. The shallow copy used a button to POST data, while the deep copy offered a form with some additional options. The only difference between the partials was the path to post data to. Let's see this in code.

#app/views/fun_event/_copy_options.html.erb
button_to(t("create_and_edit_shallow_copy"), fun_event_path(:from_event => @event.id, :return => true), :    id => "shallow_copy_btn")

form_tag(fun_event_path(:return => true)) do
  #form code
end


#app/views/boring_event/_copy_options.html.erb
button_to(t("create_and_edit_shallow_copy"), boring_event_path(:from_event => @event.id, :return => true), :    id => "shallow_copy_btn")

form_tag(boring_event_path(:return => true)) do
  #form code
end

The first, failed iteration

To remove the duplication, I passed in a path option into the partial, replacing specific references with the generic.

#app/views/fun_events/copy.html.erb
<%= render :partial => "events/copy_options", :event_path => fun_event_path %>


#app/views/boring_events/copy.html.erb
<%= render :partial => "events/copy_options, :event_path => boring_event_path %>

#app/views/events/_copy_options.html.erb
button_to(t("create_and_edit_shallow_copy"), event_path(:from_event => @event.id, :return => true), :    id => "shallow_copy_btn")

form_tag(event_path(:return => true)) do
  #form code
end

Can you guess where this led?

undefined method `event_path' for ActionView::Base:0xd6acf18

Dude! Inject the dependency!

Obviously the event_path variable I was passing was a string, not a method. I needed the method so I could pass in the appropriate arguments to construct the URL I needed. Had there not been two different calls to the routes, I would likely have just passed in the string needed in each context. But in this case, I was forced to think outside the box. Here's what I ended up with.

#app/views/fun_events/copy.html.erb
<%= render :partial => "events/copy_options", :event_path => method(:fun_event_path) %>


#app/views/boring_events/copy.html.erb
<%= render :partial => "events/copy_options, :event_path => method(:boring_event_path) %>

#app/views/events/_copy_options.html.erb
button_to(t("create_and_edit_shallow_copy"), event_path.call(:from_event => @event.id, :return => true), :    id => "shallow_copy_btn")

form_tag(event_path.call(:return => true)) do
  #form code
end

The changes are really quite subtle, but we use Object's method method to pass the reference to the method we want to call, and simply pass in the arguments when needed. Mind == Blown
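
For anyone who hasn't met Object#method before, here is a minimal plain-Ruby illustration of the idea outside of Rails (the path helper is a fake stand-in for the real route helper):

  # a fake path helper standing in for boring_event_path
  def boring_event_path(options = {})
    "/boring_events?" + options.map { |k, v| "#{k}=#{v}" }.join("&")
  end

  path_builder = method(:boring_event_path)  # a Method object, not a String
  path_builder.call(:return => true)         # => "/boring_events?return=true"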

.rbenv and Passenger: Working through an Upgrade

Yesterday, I worked on upgrading the Piggybak demo application, which runs on Piggybak, an open source Ruby on Rails ecommerce plugin developed and maintained by End Point. The demo was running on Ruby 1.8.7 and Rails 3.1.3, but I wanted to update it to Ruby 1.9.* and Rails 3.2.6 to take advantage of improved performance in Ruby and the recent Rails security updates. I also wanted to update the Piggybak version, since there have been several recent bug fixes and commits.

One of the constraints with the upgrade was that I wanted to upgrade via .rbenv, because End Point has been happily using .rbenv recently. Below are the steps Richard and I went through for the upgrade, as well as a minor Passenger issue.

Step 1: .rbenv Installation

First, I followed the instructions here to install rbenv and Ruby 1.9.3 locally under the user that Piggybak runs under (let's call it the steph user). I set the local Ruby version to my local install. I also installed bundler using the local Ruby version.
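
The steps looked roughly like this (a sketch; the clone URLs are the ones rbenv's documentation used at the time, and the patch level matches the paths later in this post):

  git clone git://github.com/sstephenson/rbenv.git ~/.rbenv
  git clone git://github.com/sstephenson/ruby-build.git ~/.rbenv/plugins/ruby-build
  # add ~/.rbenv/bin and ~/.rbenv/shims to PATH, then restart the shell
  rbenv install 1.9.3-p194
  rbenv local 1.9.3-p194
  gem install bundler
  rbenv rehash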

Step 2: bundle update

Next, I blew away the existing bundle config for my application, as well as the installed bundler gem files for the application. I followed the standard steps to install and update the new gems with the local updated Ruby and updated Rails. Then I restarted the app.
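
Roughly, that amounted to something like the following (the paths and bundle location are assumptions; adjust for your app):

  cd ~/piggybak_app
  rm -rf .bundle vendor/bundle        # blow away the old bundler config and installed gems
  bundle install --path vendor/bundle
  touch tmp/restart.txt               # ask Passenger to restart the app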

Step 3: Fail

At this point, my application would not restart, and the backtrace complained of a Passenger issue and referenced Ruby 1.8. Richard and I investigated the errors and concluded that the application's Passenger configuration was still referencing the system Ruby install and the outdated Passenger installation.

Here's where I hit the catch-22: I needed root access to update the passenger.conf as well as to install Passenger against Ruby 1.9.3. This defeated the purpose of using .rbenv and working with a local Ruby install only.

Step 4: Local Passenger Installation

To install Passenger against the local Ruby version, I decided to install it as the steph user. First, I installed the gem:

gem install passenger

Then, I went to the local installed version of Passenger to run the installation:

cd /home/steph/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/bin
./passenger-install-apache2-module

Next, I copied the passenger installation output to the passenger.conf file:

   LoadModule passenger_module /home/steph/.rbenv/versions/1.9.3-p194/lib/ruby/gems/\
     1.9.1/gems/passenger-3.0.13/ext/apache2/mod_passenger.so
   PassengerRoot /home/steph/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/passenger-3.0.13
   PassengerRuby /home/steph/.rbenv/versions/1.9.3-p194/bin/ruby

With a server restart, the Piggybak demo was up and running on updated Ruby and Rails!

Conclusion

Retrospectively, I could have avoided the Passenger issue by installing Ruby 1.9.3 on the server as root, because there isn't much else on the server. But I like using .rbenv and it's possible that a Passenger upgrade won't be required with every Ruby update, so the new Passenger configuration is acceptable [to me, for now].

Speeding Up Integration Tests with PostgreSQL

Many people say they don't want to write tests, and one of the usual reasons is that the tests are too slow. Tests can be slow because they are written in a bad way, but they can also be slow because of slow components. One such component is usually the database.

The great thing about PostgreSQL is that nearly all operations, including DDL, are transactional. That simply means you can start a transaction, then run a test which adds, deletes and updates all the data and database structure it wants. At the end of the integration test, a rollback is issued, which reverts all the changes. The next test will always have the same database structure and data before running, and you don't need to clean anything up manually.
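
In other words, each test is wrapped like this (a sketch with a hypothetical users table):

  BEGIN;
  -- the test creates, modifies, and even alters whatever it needs
  INSERT INTO users (name) VALUES ('test user');
  ALTER TABLE users ADD COLUMN nickname text;
  -- ... assertions run here ...
  ROLLBACK;  -- everything above disappears; the next test starts from the same state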

For running the integration tests we need a test database. One of the most important things when running tests is speed. Tests should be fast; programmers don't like to wait ages just to find out that something is wrong.

We could also have a read-only database for the tests. Then you don't need to worry about transactions; however, you always need to ensure the tests won't change anything. Even if you assume your tests won't make any changes, it is always better to use a new transaction for each test and roll back at the end.

The main idea behind fast integration tests using PostgreSQL is that those tests don't leave any changes in the database. If they don't, we don't need to worry about possible data loss when the database suddenly restarts: we can just restart the tests. Tests should prepare their data before running, assuming the database is in an unknown state.

This database should be as fast as possible, even if that means losing data when something unusual happens. Normally PostgreSQL does a really great job when someone suddenly pulls the server's plug or kills the database process: it just doesn't lose the data.

However, we really don't need that durability when running the tests. The database can be loaded before the tests run, and if the database is suddenly shut down, we should just restart the tests.

The simplest thing is to change a couple of settings that normally guarantee safe writes but slow the database down. We don't need safe writes here; they are only important when something crashes, and in that case we should just restart all the components used for the integration tests and load the database again before testing.

For testing I will use the pgbench program, which runs a test similar to TPC-B. It prepares data in four tables and then repeatedly performs a simple transaction:

BEGIN;

UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);

END;

where the params are chosen randomly during execution.

The database is created normally, using a simple SQL query:

  CREATE DATABASE pbench;

Before running the tests, pgbench has to prepare initial data. This is done using the -i param. My computer is not very slow, so the default database size is too small; I used a larger one via the -s 25 param. This way the database size is about 380MB, including indexes.

  ./pgbench -h localhost pbench -i -s 25

The first test will run with standard PostgreSQL configuration settings for 5 minutes (the -T 300 param).

  ./pgbench -h localhost -T 300 pbench

The initial results show about 80 transactions per second (tps).

  number of clients: 1
  number of threads: 1
  duration: 300 s
  number of transactions actually processed: 23587
  tps = 78.621158 (including connections establishing)
  tps = 78.624383 (excluding connections establishing)

You probably noticed the "number of clients" and "number of threads" values. This is the scenario where tests run sequentially, one by one. However, integration tests written in a good way can run in parallel, so let's run pgbench once again, but with three threads and three clients.

  ./pgbench -h localhost -c 3 -j 3 -T 300 pbench 

The results show that it is a little bit faster now:

  number of clients: 3
  number of threads: 3
  duration: 300 s
  number of transactions actually processed: 29782
  tps = 99.268609 (including connections establishing)
  tps = 99.273464 (excluding connections establishing)

Let's now change the PostgreSQL settings to something more dangerous, so the database can lose some data when shut down suddenly. In fact I don't care, as all the data is loaded just before running the tests.

I've added the following lines at the end of the postgresql.conf file:

  fsync = off                # turns forced synchronization on or off
  synchronous_commit = off   # synchronization level; on, off, or local
  full_page_writes = off     # recover from partial page writes

Those changes need a database restart, and after restarting PostgreSQL, I just run the pgbench tests once again.

All the results are in the following table:

                        number of clients and threads
                            1          2          3
  normal settings         78 tps     80 tps     99 tps
  dangerous settings     414 tps    905 tps   1215 tps
  change ratio            531 %     1131 %     1227 %

As you can see, you can do three simple things to speed up your integration tests using PostgreSQL:

  • Change default PostgreSQL settings to speed the database up.
  • Change your tests to run in parallel.
  • Run each test in one transaction.

This way I managed to speed up the tests from 78 tps to 1215 tps. It means that the integration tests which normally run in 60 minutes should now run in about 4 minutes.

I've also played with many other settings which could have some impact on PostgreSQL's speed. They do, however the impact is so small that I don't think it is worth mentioning here. Changing the three things above makes PostgreSQL fast enough.

Moving a Commit to Another Branch in Git

Perhaps you’ve made the same mistake I have. You’re right in the middle of developing a feature when a request comes up to fix a different, completely unrelated problem. So, you jump right in and fix the issue, and then you realize you forgot to start a new git feature branch. Suddenly you realize that you need to merge just the fix you made, but don’t want to merge the commits from the previous feature you’re working on.

Git rocks at manipulating branches and I knew this, but I wasn’t sure how to move just one commit to the master branch. After some digging and a little trial and error, I finally figured it out. This may not be the simplest approach, but it worked for me and I wanted to share it.

The branches I’ll be working with are master and feature. In the current scenario, the feature branch is 4 commits ahead of master, and the commit that I want to bring over is just the most recent one.

First things first, I need to ensure my master branch is up to date.

git checkout master
git pull origin master

Then I’ll checkout my feature branch and make sure it’s completely up to date with the master branch.

git checkout feature
git rebase origin/master

Next, I’ll create a temporary feature branch that I’ll use later on to bring over the commit that I want.

git checkout -b feature_tmp

I’ll do the same for master so that I can perform my merging and rebasing in isolation from the master branch.

git checkout master
git checkout -b master_tmp

Now I’m going to merge the two tmp branches so that I have a history that contains all of my commits. This gets me close to the history I want, but it still includes the 3 commits I don’t want.

git merge feature_tmp

Here’s where the magic happens. I’m going to rebase this branch using interactive mode. I want to rebase everything back to the last commit on the master branch. For simplicity in the commands here, we’ll just use SHA-MASTER in place of the actual SHA1 hash.

git rebase -i SHA-MASTER

This loads the commits into my editor and from here I just delete the 3 commits that I didn’t want on my master branch. This will give me the history I want with the 4th commit coming right after the last commit on the master branch. After deleting the commits, I just save and quit my editor.

Next, I merge my tmp branch into the master branch.

git checkout master
git merge master_tmp
git log

Now in the log, I can see the history is in the correct order, just how I wanted it. To finish things up, I’ll just push my changes and then rebase my feature branch which will reorder my commits to match the master branch and place my feature commits as the last three commits in the log.

git push origin master
git checkout feature
git rebase origin/master
git log

The last thing to do is delete my tmp branches.

git branch -D master_tmp
git branch -D feature_tmp

Guard Cucumber Command Prefix

Guard is an incredibly useful Ruby gem that allows you to monitor changes to files and then execute commands when those files change. One of the most common uses of Guard is to watch code changes and then automatically execute your test suite. Guard comes with a plugin framework that allows developers to write specific watchers. Some common plugins are guard-rspec and guard-cucumber. You can see a list of over 120 plugins on the rubygems website.

Yesterday I was working on some cucumber tests and wanted to use guard to automatically run my tests. I currently run all of my cucumber tests using capybara-webkit to allow me to run my tests in the terminal. To do so, I need to run xvfb-run bundle exec cucumber. The xvfb-run command allows the test to run in a headless X11 window. The problem is that the guard-cucumber plugin didn’t allow for a command prefix so my tests wouldn’t run correctly.

Thankfully, the guard-cucumber plugin is available on github and I was able to fork the project and add an option to allow a prefix. When it was completed, I added some documentation and a test and then made sure the tests all passed. I thought others may want this functionality as well so I sent a pull request to the project owner and was happy to see that within an hour it was accepted and merged into the project.
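
With that change in place, a Guardfile can look something like this (a sketch; I'm assuming the option is the :command_prefix described above):

  guard 'cucumber', :command_prefix => 'xvfb-run', :bundler => true do
    watch(%r{^features/.+\.feature$})
  end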

Github really rocks for this kind of development and I was glad to add to the project in a positive way. I get the functionality I need and the community at large does as well. Open source rocks!

Company Meeting Wrap-Up


End Pointers at our annual company meeting!

Many of us End Pointers are back to work after last week's annual company meeting of 2012 held in New York City. We attended a 3-day conference full of technical tips, client updates, and general company news. Everyone participated in live blogging during the conference, producing valuable documentation to allow us to revisit what was discussed. Here's a list of all the articles in case you missed any, or felt overwhelmed by the articles in the RSS feed:

Liquid Galaxy

Liquid Galaxy: Past, Present, and Future
Liquid Galaxy: A Tour
Liquid Galaxy: Trials and Tribulations
Liquid Galaxy: Google Earth Tours

Client Updates

Client Update: CityPass
Client Update: Berkman Center
Client Update: CSI
Client Update: Paper Source
Client Update: College District
Client Update: Locate Express
Client Update: World of Powersports

Client Case Studies

Case Study: SQL Injection
Case Study: Working with Spree
Case Study: Interchange Caching

General Tips & Company News

End Point's Ruby Slippers
Writing Tips
Popular Mobile Apps
Piggybak

Hosting and SysAdmin

IPv6 Basics
OpenSSH Tips and Tricks
Security and Paranoia
Chef and Puppet Overview

Development Miscellany

Clean Editor and Git Workflow Tips
Git Workflows
Popular jQuery Modules
Tips on PayPal Integration
RailsAdmin and Django


Bowling night! Here we are, divided up into our teams.

Since the majority of us work remotely, the meeting also served as a good time to socialize and visit with our coworkers. We participated in a friendly bowling competition, a "Brain" bowl, went out to dinner, and saw some popular sites in NYC.


View from the Empire State Building.

Locate Express presentation

Jon Jensen gave a presentation about our client Locate Express, which is a locator service website that allows the user to instantaneously locate a professional in a specific area. The site was launched in 2011 and has been picking up in popularity ever since.


You can "Find a Pro", as they put it, in three easy steps:

  • Step 1, you select the service you are looking for. You can select anything from lawn care to plumbing.
  • Step 2, you enter the address near which you would like to locate the closest "pros" for the specified service.
  • Step 3 shows you a list of pros for the selected service on a map. You can then select the pro of your choosing.

This is a very useful and easy to use service. I think Locate Express has great potential to catch on as it continues to gain traction, especially for service providers carrying "smart" mobile devices. In particular I think this will become more popular in utility emergencies on the weekends and after-hours, when it might be more difficult to find someone to fix your broken water pipe.

Popular Mobile Apps from Brian and Adam

In our last lightning talk of the conference, Brian Dillon and Adam Vollrath shared some of their favorite mobile apps. Brian queried the room to see the phone type divide. Here was our split: 33% iPhones, 33% Androids, 33% unsmart phones.


Brian likes pure simplicity for iPhone apps:

  • Dropkick: categorization, syncing; requires account creation
  • Simplenote: same thing, but for notes


Adam has an Android "because Google owns him":

  • Google Goggles: rumored Google technology is integrated into this. Google tries to identify logos, book covers, and QR codes on the fly, and automatically translate text in images.
  • K9 Mail: email client
  • Andchat: IRC client
  • WiFi Analyzer: provides site survey on network traffic
  • SpeedTest

IPv6 Basics by Josh Williams

Josh Williams reviewed World IPv6 Launch that occurred on 2012-06-06. Google, Yahoo, Bing and Facebook are some of the larger companies that launched on that date. While not exactly on the same scale, endpoint.com did a little over a year earlier.

As many of you may already know, the pool of (32-bit) IPv4 addresses that we're all familiar with has officially run out. IPv6, defined in 1998, has 128-bit addresses, written as eight hexadecimal segments.
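
For example, an address from the documentation range looks like this, with the usual :: shorthand collapsing the run of zero segments:

  2001:0db8:0000:0000:0000:ff00:0042:8329
  2001:db8::ff00:42:8329     (the same address, abbreviated)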

What are some of the advantages of having the huge pool of addresses? Well, there's really no longer any need for NAT: everything becomes directly routable. IPv6 also gives us stateless auto-configuration, which uses your MAC address to create your IPv6 address, so you're that much more likely to end up with the same network address. IPsec is optionally built in to IPv6. (Optional because not everything truly needs IPsec.)

Not sure if your computer or mobile device is using IPv6 yet? Hit http://testipv6.com and it'll tell you. Welcome to the future, finally.

World of Powersports Client Report

World of Powersports is a family of websites that runs on Interchange. Carl Bailey described how, a few years after we worked on their initial website, World of Powersports came to End Point to develop a new website called Dealer Orders, which has been very successful. This has given End Point the opportunity to work on several other related websites for the client.


Since then, we have worked on several other sites.

All of the websites pull from a single database that is fed by various APIs from parts vendors such as Honda, Suzuki, and Polaris. This updates the inventory counts and other related information for all of the sites. It also interacts with online sites such as eBay, Google Base, and Amazon for checking part availability and pricing.

Implementing the interactions between these different entities has provided End Point with much of the challenge of these sites but continues to provide the client and customers with great value.

Interchange Caching Implementation Under Fire

Richard and David presented a recent case study on an e-commerce hosting client.


Several Interchange catalogs drive their individually branded storefronts, on top of a standard single-server LAMP stack boosted by an SSD drive.

Last year the sites came under an intense Distributed Denial of Service attack which lasted nearly two weeks. End Point responded immediately and soon engaged third-party DDoS mitigation firms. This experience later prompted an Interchange caching implementation.

Cache population and expiration is difficult for any dynamic web application using sessions, and doubly so for e-commerce sites. Every shopping cart needs a session, but delaying session creation until the first POST submission enables efficient caching for most of the sitemap. Other Interchange caching improvements made it back into the upstream code.

Developing a Spree Application

Mike Farmer presented one of the projects he worked on. He gave us a very detailed overview of integrating a Spree application with Facebook for a large client.

He also described the changes that were made in the Spree project to customize it to the client's needs.

The most interesting thing about this Spree shop is that the company's clients make extensive use of the admin panel and can sell their own products to other clients.

Mike gave us a great overview of writing better Rails applications using better object-oriented code, along with TDD and extensive SASS usage. He also described the great tools he uses, including Screen and Vagrant.

Mike also talked about the things he learned while working on this project, especially about Spree and Ruby.

When not writing in Bash, Perl, Python, and Ruby we write in English: Writing Tips

It is as important for our team to improve our skills writing for humans as it is writing code for computers, says our CTO Jon Jensen. He's right. Thankfully he had a good list of tips for us to make our human writing as effective as our code.


Here are a few great gems:

  • Don't be afraid of bullet-points ;-)
  • Keep it short
  • Understand your audience

Jon's recommendations led some End Pointers to share writing tips of their own:

  1. Use the final thought of your first draft at the beginning of the final message
  2. Write out a lot, then cut down 90% of it
  3. When it comes to emails: Lay out clear actionable steps you want a person to take in the beginning or in the last sentence of an email, not buried in a thick paragraph.
  4. For important, longer documents, do the editing the day after
  5. Always re-read anything before you send it
  6. Reading what you've written out loud is a great way to catch mistakes
  7. Have the wisdom to know when not to reply or send something
  8. Don't be afraid of using the phone if your thoughts aren't concise enough for an email

You shall not pass! Preventing SQL injection

Greg Sabino Mullane presented a few extremely useful techniques for preventing SQL injection. His advice was mostly based on his recent real-world experience.


This chunk of simple code was causing a potentially very dangerous security hole in the system:

[query … where order_number='[scratch order_number]' and username='[session username]']

This code can generate this SQL query:

select * from orders where order_number = '12345' and username = 'alice';

Or this SQL query:

select * from orders where order_number=' ';  delete from orders where id IS NOT NULL;

This is a vulnerability, and you certainly do not want any random stranger to delete records from the "orders" table in your database.

The problem was solved in no time by escaping user input.

Here is Greg's list of recommendations to make SQL injection impossible:

  1. Escape all user input passed to the database.
  2. Log extensively. If this system hadn't logged SQL queries, they would never have noticed anything strange. They used tail_n_mail, which tracks PostgreSQL logs and sends out emails whenever a SQL exception occurs.
  3. Introduce fine-grained control for accessing and manipulating the database. Split responsibilities between a lot of database users and selectively grant permissions to them. Run your code as the appropriate database user with the most restrictive set of permissions possible (see the sketch after this list).
  4. Database triggers can come in very handy. In Greg's case it was impossible to delete an already shipped order because of the triggers assigned to the record.
  5. Have a lot of eyes on the code to eliminate the obvious mistakes.
  6. And finally, if SQL injection is happening, consider shutting down the database server until you find the cause. This is an emergency!
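
As a sketch of recommendation #3, a dedicated, minimally privileged role for the storefront code might look like this (the names are hypothetical):

  CREATE ROLE storefront LOGIN PASSWORD 'changeme';
  GRANT SELECT, INSERT, UPDATE ON orders TO storefront;
  -- no DELETE and no DDL: an injected "delete from orders" now fails outright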

EP Meeting: Clean Editor and Git Workflows

Having good editor configurations and Git habits is a great way to make work easier and less tedious. David Christensen showed us how to reduce cruft and leverage advanced features of Git to take control of the code.


Indentation is a big part of reading and understanding code, too important to be ignored. Tabs can be interpreted differently in different editors, so using spaces makes life easier for you and your coworkers. Most editors have automatic indentation and tab translation settings to standardize the workflow. Remember, code should be optimized for humans.

Commit often! If your commit can not be summarized in one sentence, it is probably not granular enough. Don't hesitate to make multiple commits per work session as you accomplish separate tasks. In your commit messages, describe the 'why,' not the 'how.' Don't mix trivial style or whitespace tweaks with actual code modifications, because it makes it harder to catch important changes in diffs. If you make multiple changes to a single file, you can use -p/--interactive mode to commit hunks of code separately.
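
For example, to keep a real change and a whitespace cleanup in the same file as two separate commits (the file name is hypothetical):

  git add -p app/models/order.rb   # stage only the hunks with the real change
  git commit -m "Validate order totals before capture"
  git add app/models/order.rb      # stage the remaining whitespace cleanup
  git commit -m "Whitespace cleanup"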

Overcoming Adversity on the Road: Stories Behind the Liquid Galaxy

I recall receiving a call from Ben Goldstein about an early LG install in Washington, DC (near my home). After a short discussion about the general project, the gist of the call: "Can Kiel and I use your garage to store and assemble the LG display? It'll just be for a few days."

As it turned out, they didn't need my garage (even though they were welcome to it), but it was a glimpse into some of the obstacles the End Point team faces when we take on remote work outside the familiar trappings of our own environment.


Gerard Drazba provided an introduction to the scope of LG installations in 2012. For the current year alone, through Q3, travel to deliver the sizable LG equipment will approach 60,000 miles to dozens of different destinations, from across the street (and up 5 floors) in NYC to destinations as far away as Singapore and South Korea.

The equipment is bulky, complicated to deliver, and heavy! Each shipment is approximately 1800 lbs. Fully 44% of these installations have been, or will have been, delivered to international locations. Global demand for LG installations is only growing, and the international destinations bring their own challenges beyond the domestic ones.

Kiel Christofferson spoke next, giving an account of one of the earliest installations, in Tokyo, Japan. The combination of travel to an unfamiliar location and the detail required to pull together the equipment and prepare an installation made things a bit hectic. There were a number of initial snafus, including showing up at the wrong hotel, that occupied the better part of the first day. However, with local help, and a lot of communication with our directors, the actual presentation went very smoothly and was well received.

Matt Vollrath followed with a recounting of an event in Washington, DC, showing how dependent our remote installation work is on the completeness of facilities over which we have little (or no) control. Upon arrival, there was no active network and insufficient equipment readily available to put things together. Due to rainy conditions, many other complications occurred, including a coffee pot that, when plugged in, cut off sufficient electrical supply to the Liquid Galaxy. The patchwork continued until after the event was supposed to begin, but ultimately, with sufficient effort, the show pulled together and was received as a success.

Josh Ausborne provided some background on the importance of hotels, and how difficulties there (and lack of rest) can affect how effectively we manage problems. More anecdotes followed, continuing Matt's theme that we find adverse conditions and have to work within them. In one specific instance, the display room had plenty of lighting, with about 30 lights, except that only 2 of them actually worked; thus, the installation had to be assembled essentially in the dark.

Adam Vollrath concluded the stories: he provided Matt with phone support on the Washington, DC installation while simultaneously working on a very difficult installation in Paris. The Paris installation was so demanding that during a 2-week setup, Adam logged 96 hours in a single week. Other frequent issues on installations revolve around power failures or insufficient power.

As a result of the experiences on many of these jobs, and so many not mentioned, we have refined our process to better anticipate the challenges we will face. We have also seen great improvements in the technology and its stability, which produces a much smoother setup and presentation, contributing to the growing success of the Liquid Galaxy offering.

Security review with Kiel: Time to be paranoid.


Interesting! From storing encrypted documents and occasionally signing email, to its usage in pgcrypto in Postgres, I've done a bit with PGP keys and public key cryptography. But Kiel's been running through a quick tutorial on security topics, and some of the PKI components are more important than I originally realized. For instance, part of the security that the public key infrastructure provides is the web of trust, which defines how identities can be automatically trusted based on which keys have signed and trusted other keys. In fact, we're about to have a key signing party...

In the meantime Kiel reminded us of some of the concerns surrounding other forms of PKI cryptography. The SSL infrastructure, for instance, relies on a set of provided certificate authorities which are assumed to be trusted, but may not be reliable. Plug-ins like Certificate Patrol can help, though. Of course also, as indicated by the recent collision attack, MD5 is no longer secure, and a number of CAs still use it.

More generally, the security ninja reminded us of where security breaches can occur and how they can affect us, even if they happen in a space far outside our control. But there are a number of things we can control, so he gave us a few tips to follow:

  • Use full disk encryption, or something like TrueCrypt that'll encrypt data on the fly.
  • Make use of public key cryptography where possible, even (or especially) on your local laptop.
  • Get your public key out there, where people can see and mutually identify it.
  • Also, lock your system when you walk away.
  • Make sure your backups are secure.
  • Avoid password re-use, of course.
  • Avoid SQL injection attacks. (Oh wait, that was Greg's talk.)

Now I'm feeling really paranoid.

RailsAdmin & Django

Steph Skardal and Greg Davidson presented on RailsAdmin and Django (in the context of two of our clients, Musica Russica and Providence Plan).


Clients need a browser interface to administrate their Rails apps' assets and configuration. RailsAdmin is an “engine” (an embedded miniature Rails app) for developing an admin interface, and a relatively young open-source project. It offers a CRUD-capable interface. It provides data export, filtering, pagination, and support for file attachments and a popular plug-in called “PaperTrail”. Musica Russica uses this as their site administration. This admin interface can be configured with many different customizable options: formatting, help text, sorting, etc.

RailsAdmin also provides authorization to limit certain actions (“delete”, “read”, “manage”) to objects (“User”, “Order”) by user.


In contrast, Django, written in Python, dates back to 2005 and is aimed at complex web applications. It provides an automatic admin interface which of course you can extend and customize, and mostly the same feature set (pagination, filtering, etc). Several high-profile web applications (Instagram, Pinterest, and Mozilla's add-ons site) are powered by Django. Our own work here is represented by the client site for Providence Plan.

Django's admin interface is (mostly) free of effort, once you have developed models for your application. In addition, there are many open-source apps that can be plugged into a project to manage specific things, such as permissions. Filters and searching can be configured, too: this helps limit the data that a user must consider during an administration operation.

OpenSSH Tips and Tricks with Matt Vollrath

Matt Vollrath's presentation focused on unique solutions Liquid Galaxy administration requires.

Specifically, Liquid Galaxy requires secure access to many public sites, which we don't have physical access to. OpenSSH helps handle these remote challenges securely and quickly.


Multiplexing for Speed

The LG master node can send commands to all the slave nodes at the same time. This is helpful for examining all display nodes' current states without manual work. The multiplexed connection uses the options -f (background), -M (control socket), -N (no command), and -T (no pseudo-terminal allocation), and any further connections to the host do not need to authenticate, for speed:

ssh -fMNT hostname &

Include the ampersand to be able to track the PID of the background ssh connection. This connection will be maintained until it fails or is killed.

Additionally, -O allows us to examine the state of the control connection as well as exit it. As of OpenSSH 5.9, you can also use -O with the "stop" command to stop accepting new multiplexing requests.

Combined with a simple bash script that inspects the host connection first and creates the multiplexed connection if needed, this allows for very fast work. Without a control connection it takes 700ms to run a command; with sockets, 85ms.
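
A sketch of that wrapper, assuming a ControlPath is already configured in ~/.ssh/config for the display nodes (the hostname is made up):

  host=lg1
  # reuse the control socket if it exists; otherwise open one in the background
  ssh -O check "$host" 2>/dev/null || { ssh -fMNT "$host" & }
  ssh "$host" uptime   # fast: rides the multiplexed connection, no re-authentication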

For Liquid Galaxy, everything is done over SSH, so this speed is critical for a responsive experience while maintaining security. The savings are multiplied for each node in the setup, which in LG's case may be up to 8 nodes.

Reverse SSH for Security

In order to allow access behind deployments' firewalls, the LG head node connects to a proxy server and forwards its own SSH port to that proxy, allowing secure remote connectivity from any network that allows outbound SSH.
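
A minimal sketch of the idea, with made-up host names and port numbers:

# On the LG head node: keep a background connection to the proxy open,
# forwarding the proxy's port 2222 back to the head node's own sshd (port 22).
ssh -fNT -R 2222:localhost:22 lg@proxy.example.com

# On the proxy: connect back into the head node through that tunnel.
ssh -p 2222 lg@localhost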

Handling Ecommerce Transactions with PayPal


Options

PayPal has several options for payment processing and Mark Johnson just shared his experiences working with saved credit cards using PayPal's Express Checkout.

Order Types

There are two order types in Express Checkout:

  • Standard: everything purchased in a single transaction
  • Custom: handles multiple shipments and multiple charges

What's in a name?

PayPal generates something they call an "order" that is distinct from the order for a given merchant. This is typically confusing to merchants because their concept of an order simply refers to the order a customer has just placed in their ecommerce application. The PayPal "order" is created prior to any other transaction. If the authorization fails, the "order" is not removed (which would be nice) but lingers around for 29 days by default. When merchants ask about this, the response PayPal offers is to void the "order". For standard orders there is one API call (the authorization); if that fails, you have to make a second API call to void the "order".

The "order" has little value except to specify a charge ceiling for a given ecommerce transactions. Although the ceiling is set by the "order", there is the notion of an "order allowance". By default this allowance or ceiling is set to 115% of the original charge but this can be negotiated and changed if required.

Merchants desire to have the "order" removed when it is no longer needed, but this isn't always possible. One of the limitations of PayPal from a developer's perspective is that you cannot initiate any activity unless you are the customer. For example, a customer service rep cannot enter an order on behalf of a customer. Also, telephone orders are not possible due to this limitation.

It is not possible to remove the "order" if you might have subsequent changes to the order. This is a common scenario in ecommerce apps, where customers would like to add to or modify their order after it has been shipped. If the PayPal "order" is closed, subsequent transactions like this are not possible.

API Calls for Express Checkout

When a customer initiates the checkout process, a "Set Request" API call is made to PayPal to establish a session. The response includes a session ID. With the Set Request, URLs must be submitted which specify the resources to redirect the customer to for either success or failure.

Get Request is another API method offered; the response includes the user ID, shipping address, and email of the customer. You can also force the customer to include a phone number and billing address if you like. PayPal by default likes to offer less information, while ecommerce clients often like to have more.

For standard orders, there is a "Do payment" call which combines the order setup and authorization into a single request. For custom orders, the setup and authorization are performed separately. The developer has the option to either complete the order or leave it open. If you choose to leave it open, you can perform multiple captures or charges up to the ceiling amount that has been specified.
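
To make the flow a bit more concrete, here is a rough sketch against PayPal's classic NVP interface; the endpoint, API version, credentials, and amounts below are placeholders, and Mark's actual integration may use different calls and parameters:

# "Set Request": establish a session; the response includes a TOKEN (the session ID).
curl -s https://api-3t.sandbox.paypal.com/nvp \
  -d METHOD=SetExpressCheckout -d VERSION=93.0 \
  -d USER=api_username -d PWD=api_password -d SIGNATURE=api_signature \
  -d PAYMENTREQUEST_0_AMT=25.00 -d PAYMENTREQUEST_0_PAYMENTACTION=Authorization \
  -d RETURNURL=https://example.com/checkout/return \
  -d CANCELURL=https://example.com/checkout/cancel

# "Get Request": GetExpressCheckoutDetails with that TOKEN returns the customer's
# ID, shipping address, and email once they come back from PayPal.
# "Do payment": DoExpressCheckoutPayment (TOKEN plus PAYERID) performs the
# authorization; for custom orders, order setup and authorization are separate calls.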

PCI Compliance

The payment card industry has a set of regulations including:

  • The full account number must be encrypted
  • The CVC cannot be written to permanent media
  • Displays of the PAN (primary account number) must be masked

Mark noted that secure decryption is a challenge, and he has worked on a secure decryption service for one of our ecommerce clients.

Saved Cards

Authorize.net has a Customer Information Manager (CIM) which supports "profiles". These profiles are referenced by tokens, and the data storage includes PANs. There are three types of profiles: customer, payment, and address. The default behaviour is to use them for "on-demand" transactions.

Pity the fool...

-->>:[]}}}}-O
Mr. T made several cameo appearances during the talk to keep us all engaged and interested!

Josh Tolley: About Google Earth Tours

Josh Tolley spoke on building tours for viewing within Google Earth and the Liquid Galaxy. It seems that everybody has data and wants a way to view it, such as businesses that want to visually represent where their customers are based, or to document where lightning strikes within a certain region. Google Earth is a fantastic tool for viewing this data.


Josh talked about what is required to display geographical data in Google Earth. The data needs to go through the process of geocoding, which converts it to latitude and longitude coordinates. As this is a tedious process, it is highly recommended to script it and loop through the conversions. Google Earth is based upon KML documents, which are XML documents that contain geographical data. He explained some of the different ways to create KML documents, including using Google Earth itself, writing them by hand, or using a tool such as Kamelopard or PyKML.

He demonstrated how a KML file can contain data such as placemarks, polygons, time lapses, overlays, and animations, and he showed his own farm with an overlay placed in the wheat field. Now the zombies know where to find wheat. Josh talked about the uses of Google Earth tours, including the display of ocean currents, tsunami shock waves, wind patterns, and even historical events with a time lapse.

Kamelopard is a Ruby-based KML writer which Josh has written. It uses Ruby code to identify a point, fly to it, stay there for a bit, then fly to the next place. It loops through all of the coordinates given and creates a tour. An example tour that Josh has written is one for Google's Ocean Conservation Group, which displays fishery data around the world.

KML can handle large datasets, and has multiple ways of being displayed. It can have regions and ground overlays, and combining the two can show increasingly detailed images and sets of placemarks as the level of zoom increases. DataAppeal can create maps with various models in them, scaling and coloring them based upon users' data.

Adam Vollrath shows Liquid Galaxy Tours at the End Point Company Meetings

No one has presented more Liquid Galaxy Tours than Adam Vollrath.


Silver Spring Networks has made extensive use of Liquid Galaxy Tours to demonstrate the capabilities of their Smart Grid technology. Adam most recently presented some of this work at EMC 2012 in Las Vegas. The tours visualize data on such things as peak power usage time and place, outages and repair times, even lightning strikes.

This data can drill down to individual meters per residence or business. A graphic representation of this data is overlaid on Google Earth. This creates stunning visualizations of Silver Spring Networks' capabilities.

There are applications for high-end real estate as well. After a visit to the property with a panoramic camera, the property can be toured virtually. End Point is actively developing this capability. Early demonstrations have taken us to Grand Central Station, an Occupy rally, and Highline Park in Manhattan.

Not featured in this talk is a large library of tours created for Google Ocean. These tours have been seen at conferences around the world.

College District presentation

Our company meeting at End Point opened today with a presentation by Terry Grant and Ron Phipps about one of our clients, College District, explaining how the site works from a mostly technical angle. College District is a "community where fans and designers create, grade, and promote collegiate products you can't find anywhere else." Each team has its own site: for example, the University of Florida Gators can be found at GatorDistrict.com.

Ron and Terry explained the history of how College District moved from a brick and mortar store to their current wide array of sites. They also explained all the technical underpinnings of how things work behind the scenes. The sites are powered by Interchange, Postgres 9.1, Git, and other cool technologies. They also explored some of the exciting upcoming ideas for the site, including ...no, that would be telling.

Why Piggybak exists

Some of our clients are debating between Spree, an e-commerce platform, and a homegrown Rails solution for their e-commerce applications.

E-commerce platforms are monolithic -- they try to solve a lot of different problems at once. Many of these platforms also make premature decisions before getting active users on them. One way of making the features of a platform match up better to a user's requirements is to get a minimum viable product out quickly and grow features incrementally.

Piggybak was created by first trying to identify the most stable and consistent features of a shopping cart. Here are the various pieces of an e-commerce application to consider:

  • Shipping
  • Tax
  • CMS Features
  • Product Search
  • Cart / Checkout
  • Product Features
  • Product Taxonomy
  • Discount Sales
  • Rights and Roles

What doesn't vary? Cart & Checkout.

Shipping, tax, product catalog design, sales promotions, and rights and roles all vary across different e-commerce sites. The only strict commonality is the cart and the checkout.

Piggybak is just the cart and checkout.

You mount Piggybak as a gem into any Rails app, assign any object as a purchasable product using the "acts_as_variant" tag, and you're good to go. To learn more, and to see it in action, 'checkout' Piggybak.org.

Interesting jQuery Modules

Marina Lohova talked about her favorite jQuery modules. It was the last talk of the day and she did not have much time.


Her focus was on the functional plugins. Interesting modules that she mentioned include:

  • Ajax PayPal Cart
  • Sisyphus (uses HTML5 LocalStorage)
  • jImpress (presentations)
  • jPlayer (HTML5)
  • ResponseJS
  • jQTouch (adds mobile events like swipe)

Integration Experiences

Szymon Guz gave a talk which covered his experiences working for a large client on integrating an acquired business's website and fulfillment processes into their existing backend and fulfillment systems.

He gave us a detailed overview of their existing shipping fulfillment infrastructure, along with specific technical issues he encountered when integrating with the fulfillment models of the additional website.

He also detailed some of the non-technical/project management challenges inherent when working with a large number of people on a project.

End Point's Ruby Slippers

The next presentation of the day was Brian Dillon's, entitled "End Point's Ruby Slippers".


Even as a relatively new End Point employee (3 months now), Brian has been assisting End Point's growth through sales and marketing. Brian started out by letting his fellow employees know that he finds himself very fortunate to have such a gold mine of unsung talent, expertise, and marketable skill at his disposal. Tortured by the thought of not fully representing that talent, Brian has been boiling a pot of ideas that might help him tell the world who End Point is and what we really do, in an accurate and concise way.

After reading aloud each employee's "one sentence" (written upon note cards) attempting to describe End Point, Brian took us all through a little bit of company history he had dug up from the archives. We received an overview of what End Point has done since 1995. A pattern emerged; End Point is not afraid to take interesting chances. End Point is not afraid to venture into the unusual projects that result from discussion between innovative minds. End Point is not afraid to increase the breadth of their knowledge while sharing that knowledge internally in order to increase depth of knowledge. End Point is not afraid to fail, learn from those failures, and apply the compiled knowledge to brilliantly successful projects.

Every employee has something to offer, and End Point uses the employee threads to create their powerful tapestry. Despite our distributed work environment, despite our personal differences, we combine effectively to solve all manner of problems. We take our potential for failure and we embrace it together to grow.

Brian wrapped up by identifying a few things we can each do to work together more coherently and efficiently - but that's our secret.

Chef and Puppet Overview

I started a job several years ago as a "configuration manager", but had to admit when I started that I didn't have any idea what "configuration management" really meant. The idea, as I soon learned, was to make sure all the servers, configurations, accounts, and other components of a system work together systematically. I'm not sure "configuration management" tools as such existed at the time, but we certainly never used them, though they'd begun to have a presence online before I left that job for another opportunity.

In systems we run at End Point, whether for ourselves or other clients, such configuration management tools have become critical, in particular for our Liquid Galaxy systems, which require a great deal of repetitive configuration.  So Kiel and Josh Williams have a fair bit of experience with these tools, and I was glad to hear their discussion of Chef and Puppet specifically.

These tools have a common heritage, and are both Ruby-based. Ruby is a particularly good language for writing domain-specific languages (DSLs) like the one Puppet uses, so it's interesting that Chef's developers apparently abandoned the DSL idea and have Chef's users write Ruby directly. Chef is newer, spawned by dissatisfied Puppet developers and users. We're generally shifting toward Chef after concluding we share many of those dissatisfactions, but both have proven very time-saving for us.

Kiel told of one client whose dozens of app servers we rebuilt in a day, essentially one at a time, simply by kicking off Puppet tasks on each one in turn. As mentioned above, the Liquid Galaxy uses Chef as well. Whereas it used to take a full day of manual work to build what's called the Galaxy's "head node", now, in combination with scripted, automatic operating system installations, we can set up a new head node in minutes. We're still working to get everything into Chef, but in particular all the monitoring scripts and tools used on a Liquid Galaxy are built from Chef recipes, so every new head node is ready to monitor from the beginning. Since we deploy these systems all over the world, and must manage them remotely, this is critically important.

Building "recipes" for these tools -- that is, sets of instructions that tell the tool what to build -- can be a detailed and difficult process. Kiel and Josh recommended being explicit about the configuration you actually want, rather than simply accepting defaults, principally because later on, it's difficult to know precisely what the original author intended. They also recommend starting with small, easily tested services, such as NTP. For many systems, breaking NTP for a while won't cause problems, so it can be a good service to begin playing with.

One attendee was curious to know how many servers a system needed to involve for Chef or Puppet to be worth considering. The rule of thumb is, apparently, about 10, but Josh Williams suggested having "more than one" was enough to start writing recipes. I guess I'm sold.

The Berkman Center projects

Brian Gadoury and his awesome socks gave a great and informative presentation on our client Berkman Center.


The Berkman Center's mission is to explore and understand cyberspace, and to study its development, dynamics, norms, and standards.

Brian's first project, called Zone One, is a document repository tool that houses all of the electronic documents for use by Harvard. You can search by document name, tags, and content. Zone One is written using open source technologies such as Ruby on Rails, Postgres, Solr, and DevCamps.

The application includes an array of features that allow you to interact with the search results and the documents themselves. Each entry can have detailed metadata associated with it, which can be referenced by future users.

One of the nice features of Zone One is its upload interface. It allows you to drag and drop files into a section of the form for upload. It also includes a very nice SFTP system that uses virtual users on the system you are uploading to, to manage the files being uploaded and connect them to the entry you are creating.

Phase two of the project will include features to export to external repositories, a bulk import feature, and an API.

Web service integration in PHP, jQuery, Perl and Interchange

Jeff Boes presented on one of his latest projects.


CityPass.com decided on a project to convert their checkout from being served by Interchange to having the interface served by PHP, while continuing to interact with Interchange for the checkout process through a web service.

The original site was entirely served by Interchange; the client then took on a project to convert the frontend to PHP while leveraging Interchange for frontend logic such as pricing and shipping, as well as for backend administration and order fulfillment.

Technologies used in the frontend rewrite:

  • PHP
  • jQuery for jStorage, back-button support and checkout business logic
  • AJAX web services for prices, discounts, click-tracking

The Interchange handler is an ActionMap, conduit.am, that handles the processing of the URL. From this ActionMap the URLs are decoded and passed to a Perl module, Data.pm, which processes the input and returns the results.

An order is just a JSON object, so testing the web service is easy: we have a known hash, we post it to the proper URL, and we compare the results to verify they are the same. New test cases are also easy; we can capture any order (as JSON) to a log file and add it as a test case.
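
A minimal sketch of that kind of test, with a made-up URL and file names standing in for the real ones:

# Post a captured order (JSON) to the web service and compare the response
# against a known-good result.
curl -s -H "Content-Type: application/json" \
  -d @known_order.json https://www.example.com/ws/order > actual.json
diff expected.json actual.json && echo PASS || echo FAIL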

Git Workflows

David Christensen is talking today about Git workflows.


There are different ways that you can work with Git. Git doesn't dictate a certain workflow so you are free to implement one that works best for you. Understanding git and how it works will help you develop an effective workflow.

The Git object model provides Git's flexibility and is as follows:

  • trees, blobs
  • commits
  • named commits: tags, branches

Branch flexibility comes from combining branches (merging, rebasing).

Good commits are key to flexibility and good tooling; each should encapsulate the smallest logical change and carry a good log message describing it. It's important to provide the why in your commit message, in addition to what was fixed, so that it's clear to future developers.

Branches contain much of the magic of Git: a branch is just a pointer to a commit.

Topic branches are convention-driven branches that are branched off of master. They usually deal with a single topic and can be rebased onto master to provide a clean history. They can also be thrown away later so they don't clutter up the repository.
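
A typical topic branch cycle might look something like this (the branch and remote names are made up for illustration):

# Start a topic branch off master for one logical change.
git checkout -b feature/gift-wrap master

# ...work, making small commits with good messages explaining the why...
git commit -am "Add gift wrap option to checkout: customers kept requesting it"

# Rebase onto the latest master for a clean, linear history, then merge and clean up.
git fetch origin
git rebase origin/master
git checkout master
git merge feature/gift-wrap
git branch -d feature/gift-wrap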

Integration branches are usually used for different levels of application integration, for example staging and production. They can be used to resolve conflicts and other small issues with the code.

Git also makes it easy to get quick version control: 'git init; git add' is all you need. Then you can use 'git grep', which is faster than 'grep -R' for searching. A helpful command for understanding Git workflows is 'git help workflows'.

Paper Source Client Profile

I know Richard best from his sysadmin work on End Point's Hosting Team. But I also worked with him for our client Paper Source, which he covered in his presentation at our company meeting today.

Paper Source began in Chicago back in 1983. Since then they've opened more brick and mortar stores every year, and are now up to 54. End Point maintains their e-commerce store, which supplements the physical stores well.

Richard and his team have integrated Interchange with many other software packages, including Adobe's Scene7, Authorize.net, Great Plains, Endeca, and CoreMetrics. This adds up to many scheduled cron jobs every night. We also handle hosting of the production and Camps development servers.