
Garbage collection in your head, or how to vacation

I recently returned from the longest vacation (8 workdays) I have ever taken in my almost 11 years at this job. I made an interesting discovery which I'm happy to share with you:

Life goes on without you.

I spent most of the time aboard a small cruise ship touring Alaska's Inside Passage and Glacier Bay. During almost all of that, I was out of cell phone range (and, unlike a lot of my colleagues, I don't carry a smart phone but just a dumb $5 flip phone). With no wifi on the ship, I was completely cut off from the Internet for the longest stretch in at least the last 15 years – maybe the longest since I first got Internet at my house back in the mid-90s.

Life (on the Internet) goes on without you.

Facebook posts get posted, liked, and commented on. Tweets happen and get retweeted. Emails are sent (for those who still use email, anyway). And life goes on.

I can't say I came back from vacation recharged in some stupendous way, but I think I'm better off than if I'd taken a shorter vacation in an Internet-connected location, checking up on the virtual world before breakfast and bedtime every day.

So take vacations, and take meaningful ones – disconnect from work. Don't worry about what's piling up, or what's happening in the online communities in which you participate. If you're going away for a week, really go away and leave work behind. If a crisis arises, make sure someone else is equipped to at least try to handle it, but don't go into your vacation planning for it to be interrupted.

If you can't do that, you should start preparing for it anyway. Train someone to be able to jump in and do your job (inefficiently, sure, but that's far better than "not at all"). Because quite frankly, if you can't cut the ties that bind you to that mission-critical production system on a voluntary, scheduled basis, then you are far too vulnerable to the random interruptions of life (car accident, death in the family, lengthy power failure for us telecommuters) that will come (oh, they will come).

Postgres migration speedup with table change analysis


(A Unicode rabbit face 🐰 will never be as cute as this real bunny. Photo by Wade Simmons)

One of our clients recently reached out to us for help in upgrading their Postgres database. The use of the pg_upgrade program was not an option, primarily because the client was also taking the opportunity to change from their SQL_ASCII encoding to UTF-8. (If any of your databases, gentle reader, are still SQL_ASCII, please do the same!) Naturally, we also took advantage of the lack of pg_upgrade to enable the use of data checksums, another action we highly recommend. Although there were plenty of wrinkles and stories to be told about this migration/upgrade, I want to focus on one particular problem we had: how to detect if a table has changed.
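If you are not sure where your own databases stand, a quick check of the encoding and checksum settings might look like the following (data_checksums is a read-only parameter available in recent versions of Postgres):

-- List each database and its encoding; anything showing SQL_ASCII is a candidate for conversion
SELECT datname, pg_encoding_to_char(encoding) AS encoding FROM pg_database;

-- Report whether the cluster was initialized with data checksums
SHOW data_checksums;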

We needed to know if any applications were modifying certain tables because the speed of the migration was very important. If we could assert that no changes were made, there were some shortcuts available that would greatly speed things up. Initial testing showed that the migration was taking over eight hours, a time unacceptable to the client (no worries, we eventually reduced the time to under an hour!).

Looking closer, we found that over half that time was spent converting a single small (50MB) table from SQL_ASCII to UTF-8. How this conversion was performed is a story for another day, but suffice to say the table had some really, really messy bytes inside of it; the conversion program had to struggle mightily. When you are converting a database to a new encoding, it is imperative to examine every byte and make sure it gets changed to a format that Postgres will accept as valid UTF-8, or the entire table import will fail with an error similar to this:

ERROR:  invalid byte sequence for encoding "UTF8": 0xf4 0xa5 0xa3 0xa5

Looking closer at the data in the table showed that it might - just might! - be a historical table. In other words, it no longer receives updates, just selects. We really wanted this to be true, for it meant we could dump the whole table, convert it, and simply load the converted table into the new database (which took only a few seconds!). First, however, we had to confirm that the table was not changing.

Detecting changes may be done in several ways. For all of them, you can never prove that the table will not change at some point in the future, but you can prove that it has not changed over a certain period of time. How you go about doing that depends on what kind of access you have. If you do not have superuser access, you could add a simple trigger to the table that updates another table when an update, insert, or delete is performed (a sketch is shown below). Then, checking in on the second table will indicate if any changes have been made.
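Here is a minimal sketch of that trigger approach; the names (mytable, mytable_changes, note_mytable_change) are placeholders purely for illustration:

-- Audit table that records every write to the watched table
CREATE TABLE mytable_changes (
    changed_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    operation   TEXT        NOT NULL
);

-- Trigger function: note the type of operation (INSERT/UPDATE/DELETE)
CREATE OR REPLACE FUNCTION note_mytable_change() RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO mytable_changes (operation) VALUES (TG_OP);
    RETURN NULL;  -- return value of an AFTER trigger is ignored
END;
$$ LANGUAGE plpgsql;

-- Fire once per modifying statement on the table we care about
CREATE TRIGGER mytable_change_watcher
    AFTER INSERT OR UPDATE OR DELETE ON mytable
    FOR EACH STATEMENT
    EXECUTE PROCEDURE note_mytable_change();

-- Later on, an empty audit table means no writes have happened:
SELECT count(*) FROM mytable_changes;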

A better solution is to simply look at the underlying file that makes up the table. To do this, you need to be a Postgres superuser or have access to the underlying operating system. Basically, we will trust the operating system's information on when the table was last changed to determine if the table itself has changed. Although not foolproof, it is an excellent solution. Let's illustrate it here. First, create a test table and add some rows:

$ psql
greg=# CREATE TABLE catbox AS SELECT 8675309::INT AS id FROM generate_series(1,1000);
SELECT 1000

Now we can use the pg_stat_file() function, which returns some basic information about a file on disk. With the help of the pg_relation_filepath() function, we can see when the table was last modified:

greg=# select * from pg_stat_file( pg_relation_filepath('catbox') ) \x\g
Expanded display is on.
-[ RECORD 1 ]+-----------------------
size         | 40960
access       | 2015-11-08 22:36:00-04
modification | 2015-11-08 22:36:00-04
change       | 2015-11-08 22:36:00-04
creation     | 
isdir        | f

Next we will revisit the table after some time (e.g. 24 hours) and see if the "modification" timestamp is the same. If it is, then the table itself has not been modified. Unfortunately, a false positive is possible due to VACUUM, which may change things on disk but does NOT change the data itself. (A regular VACUUM *may* modify the file, and a VACUUM FULL *always* modifies it.)

greg=# select * from pg_stat_file( pg_relation_filepath('catbox') ) \x\g

-[ RECORD 1 ]+-----------------------
size         | 40960
access       | 2015-11-08 22:36:00-04
modification | 2015-11-08 22:36:00-04
change       | 2015-11-08 22:36:00-04
creation     | 
isdir        | f


greg=# vacuum catbox;
VACUUM

greg=# select * from pg_stat_file( pg_relation_filepath('catbox') );

-[ RECORD 1 ]+-----------------------
size         | 40960
access       | 2015-11-08 22:36:00-04
modification | 2015-11-08 22:40:14-04
change       | 2015-11-08 22:40:14-04
creation     | 
isdir        | f

A second (and more foolproof) method is to simply generate a checksum of the entire table. This is a fairly straightforward approach; just pump the output of pg_dump to a checksum program:

$ pg_dump -t catbox --data-only | sha1sum
6f724565656f455072736e44646c207472536e61  -

The advantage here is that even a VACUUM FULL will not change the checksum. However, because pg_dump does no ORDER BY when dumping out the table, it is possible for the rows to be returned in a different order. To work around that, issue a VACUUM FULL yourself before taking the checksum. As before, come back later (e.g. 24 hours) and re-run the command. If the checksums match, then the table has not changed (and is probably no longer updated by the application). By using this method, we were able to verify that the large, SQL_ASCII byte-soup table was indeed not being updated, and thus we took it out of the direct migration.

Of course, that table still needed to be part of the new database, but we simply dumped the table, ran the conversion program on it, and (four hours later) had a complete dump of the table that loads extremely fast into the new database.

That solved only one of the problems, however; another table was also slowing down the migration. Although it did not have the SQL_ASCII conversion issue, it was a large table, and took a large percentage of the remaining migration time. A quick look at this table showed it had a "creation_time" column as well as a SERIAL primary key, and was obviously being updated quite often. Close examination showed that it was possible this was an append-only table, such that older rows were never updated. This called for a similar approach: could we prove that a large chunk of the table was not changing? If we could, we could pre-populate the new database and copy over only the most recent rows during the migration, saving a good bit of time.

The previous tricks would not work for this situation, because the underlying file would change constantly as seen by pg_stat_file(), and a pg_dump checksum would change on every insert. We needed to analyze a slice of the table - in this particular case, we wanted to see about checksumming all rows except those created in the last week. As a primary key lookup is very fast, we used the "creation_time" column to determine an approximate primary key to start with. Then it was simply a matter of feeding all those rows into the sha1sum program:

greg=# CREATE TABLE catbox2 (id SERIAL PRIMARY KEY, creation_time TIMESTAMPTZ);
CREATE TABLE
greg=# INSERT INTO catbox2(creation_time) select now() - '1 year'::interval + (x* '1 hour'::interval) from generate_series(1,24*365) x;
INSERT 0 8760

greg=# select * from catbox2 where creation_time > now()-'1 week'::interval order by 1 limit 1;
  id  |         creation_time         
------+-------------------------------
 8617 | 2016-06-11 10:51:00.101971-08

$ psql -Atc "select * from catbox2 where id < 8617 order by 1" | sha1sum
456272656d65486e6f203139353120506173733f  -

## Add some rows to emulate the append-only nature of this table:
greg=# insert into catbox2(creation_time) select now() from generate_series(1,1000);
INSERT 0 1000

## Checksums should still be identical:
$ psql -Atc "select * from catbox2 where id < 8617 order by 1" | sha1sum
456272656d65486e6f203139353120506173733f  -

Despite the large size of this table (around 10 GB), this command did not take that long to run. A week later, we ran the same commands, and got the same checksum! Thus, we were able to prove that the table was mostly append-only - or at least enough for our use case. We copied over the "old" rows, then copied over the rest of the rows during the critical production migration window.
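As a rough sketch of how that pre-copy can be done from psql, using the hypothetical cutoff id of 8617 from the example above (the exact commands we used during the migration differed):

-- Ahead of time, on the old database: dump the stable "old" rows
\copy (SELECT * FROM catbox2 WHERE id < 8617 ORDER BY id) TO 'catbox2_old.copy'

-- During the migration window: dump only the recent rows
\copy (SELECT * FROM catbox2 WHERE id >= 8617 ORDER BY id) TO 'catbox2_new.copy'

-- On the new database: load both files
\copy catbox2 FROM 'catbox2_old.copy'
\copy catbox2 FROM 'catbox2_new.copy'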

In the future, this client will be able to take advantage of pg_upgrade, but getting to UTF-8 and data checksums was absolutely worth the high one-time cost. There were several other tricks used to speed up the final migration, but being able to remove the UTF-8 conversion of the first table, and being able to pre-copy 99% of the second table, accounted for the lion's share of the final speed improvements.

Liquid Galaxy for Hyundai Card Travel Library

(Image courtesy of retaildesignblog)
End Point and AZero, a South Korean system integrator, have partnered to deploy a 10-screen Liquid Galaxy to the newly-opened Hyundai Card Travel Library in Seoul, South Korea. This project presented a number of unique challenges for our teams, but we have launched successfully, to the great satisfaction of AZero's client.

The Hyundai Card Travel Library is an incredible space: wall-to-wall maple bookshelves hold travel brochures, photo books, and other travel-related material. The Liquid Galaxy display itself sits in a small alcove off the main library space. Because the alcove is fully enclosed, viewers can control the lighting and get a fully immersive experience through the two rows of five 47" screens arrayed in a wall-mounted semi-circular arc. The viewer can control the screens via the podium-mounted touchscreen and SpaceNav mouse controller.

We solved several technical challenges for this deployment: the extremely tight space made cabling and display configuration tricky. Also, this isn't a "standard" 7-screen single-row deployment, but rather two rows of 5 screens each. Working with AZero, End Point reconfigured the Liquid Galaxy display configurations to account for this unique layout. NOTE: the Liquid Galaxy scales quite easily and can be arrayed in any number of configurations. Other recent deployments include a 40-screen control room, with 10 columns of 4 screens each!

The installation is intended as a travel planning platform: Hyundai provided a number of set tours to showcase on the Liquid Galaxy, such as "The Orient Express", "Rio!", or "European Capitals". Each tour shows an overview map as a graphic overlay, while the Liquid Galaxy hops from one destination on the route to the next within Google Earth. At each location, the potential traveler can drop into Google Street View and see fully panoramic images of street scenes in Paris or the top of Sugarloaf Mountain in Brazil. This should allow potential travelers to virtually experience the tour locations and make more informed decisions about which trip might suit their tastes. Beyond that, it's a pretty cool way to view the planet inside a travel library.

End Point CEO and NYSE Bell Ringing

Wall Street. Back where it all started for me some 35 years ago. Only instead of being an employee of Oppenheimer and Company, I had the experience and honor of representing End Point as one of 7 companies chosen to “ring the bell” at the New York Stock Exchange, chiming in Small Business Week for JPMorgan Chase. (They work with 4,000,000 small companies.)

The morning started early by going through security checks rivaling those at the airport, except I didn't have to take my shoes off. After getting my nifty credential, we went back to the street, where the President of the NYSE gave a welcoming speech, pointing out the buildings still standing from when Hamilton, Monroe, and Aaron Burr all started their own banks, as well as where George Washington was sworn in.

All this while also getting free coffee from the main small business honoree, Gregory's Coffee, and having pictures taken by the business paparazzi!

We then went inside the storied NYSE building and made our way to the trading floor. The surroundings were immensely impressive, as the Stock Exchange still inhabits a huge floor with gorgeous 40-foot ceilings high above, holding up all sorts of 21st-century technology and equipment. In the center of it all is CNBC's stage set, with the show Squawk Box airing live, and with the infamous Mad Money man himself, Jim Cramer, sitting in, talking about the impact the newly minted presumptive Republican candidate (to remain nameless) might have on the markets.

At about 9:15, the 7 small business owners and the head of marketing and small market banking for Chase made our way to the real stage, a perch overlooking the entire floor where the actual apparatus for the bell ringing sits. In front of four live cameras, we waited for 9:30 to hit, and then the bell was rung... rather, the button was pressed to start the day's trading. (Sorry, they use a button now, not an actual clapper, to ring the bell.) Aired live, we shook hands, smiled for a national audience, passed around compliments, and enjoyed the moment.

A few moments later we went back down to the floor, where we were able to float around and I could watch the operation of what is still the heart of the financial world come to life around me.

The Exchange has changed in one important way in all those years since my days there: instead of thousands of crazed floor traders frantically buying and selling millions of shares using ticker tape and hand signals, there were maybe a couple of hundred standing by their booths, watching millions of shares being exchanged electronically. Looking almost bored, they calmly talked with each other while sipping more of Gregory's coffee.

I’ll dispense with the rest of the details of the day and the ensuing reception. Rather, I shall now explain the importance of the event as it relates to End Point and to our clients and my own history.

The first day I was on the job on Wall Street, the chairman of the company said something to me that to this day still guides how I run End Point, and that I have repeated countless times to whoever will listen. He said, "Rick, to have a good restaurant, you have to have a great kitchen!" So, he had me learn all aspects of the company, from the back office, to accounting, to the fledgling world of computers, to the front office, and to how to work effectively with employees and customers alike.

End Point may be one of the smaller of the “small companies” chosen by Chase this day. But, we were selected because we are a company that personifies good management, great engineering, unrelenting commitment to our customers, and a company that has great potential. Why? Because we believe in having a great kitchen! Our engineering staff is exemplary, our clients are fully appreciative of our partnerships with them, and we are doing all we can to be a model business.

While I may not have rung the actual bell this year, our banker and Chase have every confidence – as do I – that one day we will.


Adding Bash Completion To a Python Script

Bash has quite a nice feature: you can type a command in a console and then press <TAB> twice, and Bash will show you the possible options you can use with that command.

I will show how to integrate this mechanism into a custom Python script with two types of arguments. What's more, I want this to be totally generic: I don't want to have to change it when the options or the config files change.

This script accepts two types of arguments. One type consists mainly of flags beginning with '--'; the other is a host name taken from a bunch of chef node definition files.

Let's name this script show.py - it will show some information about the host. This way I can use it with:

show.py szymon

The szymon part is the name of my special host, and it is taken from one of our chef node definition files.

This script also takes a huge number of arguments, like:

show.py --cpu --memory --format=json

So we have two kinds of arguments: one is a simple string, one begins with --.

To implement the bash completion on double <TAB>, I first wrote a simple Python script which prints a list of all the node names:

#!/usr/bin/env python

from sys import argv
import os
import json

if __name__ == "__main__":
    pattern = ""
    if len(argv) == 2:
        pattern = argv[1]

    chef_dir = os.environ.get('CHEF_DIR', None)
    if not chef_dir:
        exit(0)
    node_dirs = [os.path.join(chef_dir, "nodes"),
                 os.path.join(chef_dir, "dev_nodes")]
    node_names = []

    # Walk both node directories; each node file is JSON, and the host's
    # name lives under data['normal']['support_name'].
    for nodes_dir in node_dirs:
        for root, dirs, files in os.walk(nodes_dir):
            for f in files:
                try:
                    with open(os.path.join(root, f), 'r') as nf:
                        data = json.load(nf)
                        node_names.append(data['normal']['support_name'])
                except:
                    # Ignore files that are not valid JSON or lack the expected keys
                    pass

    for name in node_names:
        print(name)

Another thing was to get a list of all the program options. We used the one-liner below, which parses the help information printed by the script. This way, whenever the script's options change (and show up in show.py --help), tab completion will automatically pick up the new options.

$CHEF_DIR/repo_scripts/show.py --help | grep '  --' | awk '{print $1}'

The last step to make all this work was a simple Bash completion function, which uses the above Python script and the one-liner. I placed this script in the file $CHEF_DIR/repo_scripts/show.bash-completion.

_show_complete()
{
    local cur prev opts node_names
    COMPREPLY=()
    cur="${COMP_WORDS[COMP_CWORD]}"      # the word currently being completed
    prev="${COMP_WORDS[COMP_CWORD-1]}"   # the previous word on the line
    # All --flags, scraped from the script's own --help output
    opts=$($CHEF_DIR/repo_scripts/show.py --help | grep '  --' | awk '{print $1}')
    # All known node names, printed by the helper script above
    node_names=$(python $CHEF_DIR/repo_scripts/node_names.py)

    # If the current word starts with a dash, complete the flags...
    if [[ ${cur} == -* ]] ; then
        COMPREPLY=( $(compgen -W "${opts}" -- "${cur}") )
        return 0
    fi

    # ...otherwise complete the node names.
    COMPREPLY=( $(compgen -W "${node_names}" -- "${cur}") )
}

complete -F _show_complete show.py

The last thing was to source this file, so I added the line below to my ~/.bashrc:

source $CHEF_DIR/repo_scripts/show.bash-completion

And now pressing <TAB> twice in a console shows quite nice completion options:

$ show.py 
Display all 42 possibilities? (y or n)
... and here go all 42 node names ...
$ show.py h
... and here go all node names beginning with 'h' ...
$ show.py --
... and here go all the options beginning with '--' ...

The merchant login ID or password is invalid or the account is inactive, and how to fix it in Spree

Authorize.net has disabled the RC4 cipher suite on their test server, and the production server update will follow soon. So, in order to ensure that your site(s), or your client's, do not experience any interruption in payment processing, it is wise to place a test order in the Authorize.net test environment.

The projects I was testing all used the Spree gem (2.1.x). The Spree gem uses the ActiveMerchant gem (in Spree 2.1.x it's ActiveMerchant version 1.34.x). Spree allows you to sign into the admin and select which server your Authorize.net payment method will hit: production or test. There is another option for selecting a "Test Mode" transaction. The difference between a test server transaction and a test mode transaction is explained quite succinctly in the Authorize.net documentation. To summarize: test server transactions are never sent to financial institutions for processing but are stored in Authorize.net (so you can see their details), while transactions in test mode are not stored and return a transaction ID of zero.

I wanted to use my Authorize.net test account to ensure my clients were ready for the RC4 cipher suite disablement. I ran across a few strange things. First, for three sites, no matter what I did, I kept getting errors saying my Authorize.net account was either inactive or I was providing the wrong credentials. I signed in to Authorize.net and verified my account was active. I triple-checked the credentials; they were right. So, I re-read the Spree docs, thinking that perhaps I needed to use a special word or format to actually use the test server ("test" versus "Test" or something like that).

Below is a screenshot of the test payment method I had created and was trying to use.

Since I kept getting errors, I looked through the Spree code, and then the ActiveMerchant gem that Spree uses.

Below, you can see that ActiveMerchant decides which URL to use (test or live) based on the value of test? (see the url method), in active_merchant/lib/active_merchant/billing/gateways/authorize_net.rb:

require 'nokogiri'

module ActiveMerchant #:nodoc:
  module Billing #:nodoc:
    class AuthorizeNetGateway < Gateway
      include Empty

      self.test_url = 'https://apitest.authorize.net/xml/v1/request.api'
      self.live_url = 'https://api2.authorize.net/xml/v1/request.api'

.
.
.
      def url
        test? ? test_url : live_url
      end

How and where is this set? Spree passes the ActiveMerchant gem some data, which ActiveMerchant uses to create Response objects. Below is the code where ActiveMerchant handles this data:
active_merchant/lib/active_merchant/billing/response.rb


module ActiveMerchant #:nodoc:
  module Billing #:nodoc:
    class Error < ActiveMerchantError #:nodoc:
    end

    class Response
      attr_reader :params, :message, :test, :authorization, :avs_result, :cvv_result, :error_code, :emv_authorization
.
.
.
      def test?
        @test
      end
.
.
.
      def initialize(success, message, params = {}, options = {})
        @success, @message, @params = success, message, params.stringify_keys
        @test = options[:test] || false
        @authorization = options[:authorization]
        @fraud_review = options[:fraud_review]
        @error_code = options[:error_code]
        @emv_authorization = options[:emv_authorization]

        @avs_result = if options[:avs_result].kind_of?(AVSResult)
          options[:avs_result].to_hash
        else
          AVSResult.new(options[:avs_result]).to_hash
        end

        @cvv_result = if options[:cvv_result].kind_of?(CVVResult)
          options[:cvv_result].to_hash
        else
          CVVResult.new(options[:cvv_result]).to_hash
        end
      end
    end
active_merchant/lib/active_merchant/billing/gateway.rb
      # Are we running in test mode?
      def test?
        (@options.has_key?(:test) ? @options[:test] : Base.test?)
      end

Now that I was more familiar with ActiveMerchant, I wanted to verify that Spree was passing the data as intended.

I could see in spree/core/app/models/spree/gateway.rb that Spree was setting ActiveMerchant::Billing::Base.gateway_mode equal to the server param as a symbol. I verified it with some logging.

    def provider
      gateway_options = options
      gateway_options.delete :login if gateway_options.has_key?(:login) and gateway_options[:login].nil?
      if gateway_options[:server]
        ActiveMerchant::Billing::Base.gateway_mode = gateway_options[:server].to_sym
      end 
      @provider ||= provider_class.new(gateway_options)
    end 

At this point I was satisfied that Spree was sending a server param. I also knew Spree was setting ActiveMerchant's Base.gateway_mode as intended. I then reviewed active_merchant/lib/active_merchant/billing/gateway.rb once more:

      # Are we running in test mode?
      def test?
        (@options.has_key?(:test) ? @options[:test] : Base.test?)
      end
and active_merchant/lib/active_merchant/billing/base.rb:
      def self.test?
        self.gateway_mode == :test
      end

So, that's it! We know from the exceptions I raised that Spree sends both a test key and a test_mode key. They seem to carry the same value under different keys (I'm guessing that's a mistake), and both simply indicate whether the "Test Mode" checkbox was checked in the Spree admin. Base.test?, however, reflects the server selection, which comes from whatever is entered in the server input box in the Spree admin. So, we just need to update the ternary operator to check whether either @options[:test] (test mode) or Base.test? (test server) is true.

Since this is Spree, I created a decorator to override the test? method.

app/models/gateway_decorator.rb

ActiveMerchant::Billing::Gateway.class_eval do
  def test?
    @options.has_key?(:test) && @options[:test] || ActiveMerchant::Billing::Base.test?
  end 
end

Lastly, I placed some test orders and it all worked as intended.

Summary

Authorize.net is disabling the RC4 cipher suite. If your site(s) use it, your payment processing may be interrupted. Since Authorize.net has already updated the test environment, you can see whether your site(s) are compliant by posting test transactions to the test environment. If that works, then your site(s) should be compliant and ready when Authorize.net applies the changes to the production server.

Spree 2.1.x (and perhaps all other Spree versions) ALWAYS sends the test key, so the ActiveMerchant gem will always just use the boolean value of that key instead of ever checking what the server was set to. Further, this fix makes things a little more robust, in my opinion, by checking whether test mode OR the test server was specified, rather than only falling back to the server (gateway_mode) check when the test key is absent.

Alternatively, you could probably make Spree pass the test key only when its value is true. Either way, if you are trying to send test orders to the test environment from a Spree site (at least on the versions discussed here) and have not implemented one of these changes, you will be unable to do so until you add a fix similar to the one described here. If you need any further assistance, please reach out to us at ask@endpoint.com.