Login shells in scripts called from cron

The problem

I would guess that almost anyone who has set up a cron job has also had a cron job not work for initially mysterious reasons that often stem from cron running in a minimal environment very different from the same user's normal login shell. For example, cron typically runs with:

  • SHELL=/bin/sh
  • PATH=/sbin:/bin:/usr/sbin:/usr/bin

Whereas a common login shell has this in its environment, often with much more in the PATH:

  • SHELL=/bin/bash
  • PATH=/usr/local/bin:/bin:/usr/bin:$HOME/bin

/bin/sh may be bash, but bash behaves differently when invoked as sh. It may also be another shell such as dash. And the impact of PATH differences is obvious, not just on the commands in the crontab but also in scripts they invoke.
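
A quick way to see exactly what environment your cron jobs get (a temporary debugging entry, not something to leave in place) is to have cron dump its environment to a file:

* * * * * env > /tmp/cron-env.txt

Diffing that file against the output of env in your login shell usually makes the culprit obvious.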

I've been dealing with this for many years but it reached a new level of frequency with new systems like rvm, rbenv, Perlbrew, and pyenv, among others, which depend on the environment or shell aliases being modified.

The benefits of such multi-version local user-installed software are obvious, but the downside is that you have users installing various software that ends up being used in production, without sufficient wariness of production gotchas such as:

  • the vagaries of things not running the same in cron
  • starting services automatically at boot time
  • service monitoring
  • routing automated and bounced email
  • logfile rotation
  • and most of all, verifying all of the above things are not only set up, but actually work as expected in the days to follow when you're not watching closely.

This is not to say that system administrators, or their brutal new DevOps 2.0 overlords, get this all right all the time. But usually they have more experience with it and know better what to watch out for.

The scenario

Now to my point: When writing e.g. a start-app script that runs a Rails or Dancer site running in rbenv or perlbrew, developers often get it working nicely from their shell, then put it in a @reboot cron job, and leave. And that almost never works at reboot. At the next maintenance reboot, sysadmins discover via monitoring or manual testing that the daemon didn't start. If sysadmins are in a hurry or being a little lazy, they'll go make a "fix" to the crontab to change something or other, but also not test it, and it still won't work after the next reboot. We get it right fairly often, but we also get it wrong more often than we'd like.

The solution

This has a complicated and supposedly correct fix that makes the tinker/test loop take a long time.

It also has a simple, always-consistent, and supposedly incorrect fix that works very well in our environment.

  1. Make sure your start-app script is marked executable: chmod +x
  2. Give it the proper shebang line: #!/bin/bash, most often, unless you really know how to stick to the classic Bourne shell subset and use #!/bin/sh
  3. Invoke the script directly in your crontab, e.g. bin/start-app, not with a manually-specified shell such as /bin/bash bin/start-app, etc. The goal is to always run the script the same way, and have that way encapsulated in the script itself.
  4. Finally, since the problem is that cron is using a different environment, and you always have your login environment working with rbenv, perlbrew, etc., fix that by making your script's shebang line invoke a full login shell: #!/bin/bash -l

Yes, it's that simple: #!/bin/bash -l and a direct invocation are all it takes.
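
To make that concrete, here is a minimal sketch; the paths and the actual start command are made up, so substitute your own:

#!/bin/bash -l
# bin/start-app -- run from cron, but -l makes bash act as a login shell,
# so the same rbenv/perlbrew initialization you get interactively applies
cd /home/app/myapp || exit 1
bundle exec unicorn -c config/unicorn.rb -D

And the crontab entry just calls the script directly:

@reboot /home/app/myapp/bin/start-app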

It's true that this is abusing the idea of a "login shell", since it's not interactive. That has never mattered for us in the fairly plain-vanilla bash setups we use in production server accounts. It has the virtue of being exactly what a developer expects. It avoids repeating environment setups or version-specific invocations in the crontab, which easily become out of date because they are forgotten. It follows the wise dicta "different makes a difference" (#21 in Practices of an Agile Developer) and "don't repeat yourself" (from the book The Pragmatic Programmer).

None of this is as clean as having RHEL/CentOS RPM or Debian/Ubuntu .deb system packages for the language and its modules, but I don't know of anything like the essential Ruby Bundler that works with system packages, and system-wide installations of multiple coexisting versions of Ruby, Perl, Python, etc. are rare. So we have to work with this for now.

Out in the wilds of the web, several projects and individuals adopt this same approach. Their rationales can be summed up as: it works, and I don't see any actual downsides.

Impure!

The dissenting voices such as Daniel Szmulewicz on cron jobs using rvm are eloquent, but all seem to focus on the "ickiness" of starting a noninteractive shell session as a login session. The main actual problem reported is an rvm corner case where rvm goes into an infinite loop. But this appears to me to have been a bug in rvm, and one of several rvm infinite loop bugs that led us to move to rbenv and away from rvm's tightly-coupled shell trickery in the first place.

Further study is worthwhile to understand how all these pieces fit together. I recommend your own shell's manpage, rbenv's Unix shell initialization overview, and this StackOverflow answer on exposing cron's run environment.

Honor your elders (and others)

What an odd topic, you might be thinking. Some time ago, I began teaching my elders (who are rapidly becoming my contemporaries, curse you, Father Time) how to navigate the Internet. After almost two years of this, I've become more sensitized to seeing the 'Net from their POV.

Here's a small fragment of a page I saw today (actual size):

The page numbers and arrowheads are the only navigation elements on this huge, complex page that allow you to move around the list of products. The numbers are an 11px font, and the arrowheads are 6px by 8px. For a senior citizen, or indeed anyone with less than perfect vision and eye-hand coordination, this is pretty challenging to find on the page, let alone use.

Here's a quick challenge for you as web designer: try dimming your screen down a bit, to the point where it's noticeably uncomfortable to use. Does your page still seem easy to navigate? Now try switching from your dominant hand to your other hand, and try clicking your page navigation elements (not just buttons, but drop-downs, menus, etc.). Still usable? If not, you may be designing your page to keep away some users. Just a thought.

GnuPG: list all recipients of a message

At End Point we frequently use GnuPG for PGP-compatible secure data storage and delivery, both internally and with some of our clients.

For many years, GnuPG 1 has been the standard for Unix-like operating systems such as Linux and Mac OS X, as well as Windows. Relatively new is GnuPG 2, which is a modularized version but not (yet) a replacement. It's often built and installed as "gpg2" so it can coexist with trusty old "gpg" version 1. I mention this to raise awareness, since it seems to be little known.

When you have an encrypted file, how can you see who the recipients are who will be able to decrypt it? It's easy enough to test if you can decrypt it, by just trying and seeing if it lets you. But what if you want to confirm others can see it before you send it to them? The manpage shows this option:

--list-only

Changes the behaviour of some commands. This is like --dry-run but different in some cases. The semantic of this command may be extended in the future. Currently it only skips the actual decryption pass and therefore enables a fast listing of the encryption keys.

That sounds like the answer. And it almost is. However, for no reason I can discern, it doesn't show any recipients who have a secret key in the keyring of the running GnuPG instance! They just aren't included. We can't simply assume we are recipients, either, because there's no visible difference between not being a recipient and being one whose key was simply omitted from the listing.

I've looked for an answer to this before, and found people saying --list-only does include everyone, but for both gpg 1 and 2 that just isn't true for me.

Taking desperate measures, I moved my ~/.gnupg/secring.gpg away, and then it worked fine: gpg could no longer see my secret keys, so I was treated like any other recipient.

Now, to achieve that same effect without actually moving the secret keyring around, here's how:

gpg --list-only --no-default-keyring --secret-keyring /dev/null $infile

I'd love to hear of any easier way to achieve this, but in the meantime, that works.
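
If you do this often, it's easy enough to wrap in a small shell function (the name here is my own invention):

# list all recipients of an encrypted file, including yourself
gpg_recipients() {
    gpg --list-only --no-default-keyring --secret-keyring /dev/null "$@"
}

gpg_recipients some-report.gpg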

(Cute safe icon by VisualPharm.)

End Point Liquid Galaxy Projects at Google I/O 2013

This last week End Point participated in the Google I/O conference for the third year in a row. As the lead agency for Liquid Galaxy development and deployment, our engineers were active in the development efforts for the two Liquid Galaxy systems that were showcased at the conference this year.

We sent two of our rock stars to the show, Kiel and Matt. This year both Liquid Galaxies used Google Maps API functionality in the browser rather than the stand-alone Google Earth app:

  • Treadmill-driven Google Trekker: Working with Sparks Online, this exhibit connected a treadmill to the Google Trekker Trails panoramic imagery. Walking on the treadmill moves the view forward through the Bright Angel Trail in the Grand Canyon. The tricky part was the curves in the trail, especially the switchbacks: with no mouse to adjust the view, how do you keep the view on the path when the only input (the movement of the treadmill) is “straight forward”? Our engineer, Kiel, used functions around Maps API data to automatically calculate the “closest frame” next in line and force-feed it to the Trail View, so “forward” is always centered on the path, even if the next frame is actually five to ten degrees (or, in the case of the switchbacks, up to 150 degrees) left or right of center.
  • WebGL Skydiving Game: In support of the creative agency Instrument, Matt provided expert consulting leading up to the show on Liquid-Galaxy-enabling the WebGL game demo Instrument developed, which allows people to “skydive” down through a series of rings suspended in mid-air. Maybe it’s just better to show the game:

    End Point’s full support included equipment rental to Instrument for development, developing and configuring the new Liquid Galaxy used in the Skydiving game, setting up the Liquid Galaxies at the show, onsite support, and repacking of the systems at the conclusion of the conference. We make things as turnkey as possible.

Kiel said the following: “Google I/O is getting bigger, more interesting, and more packed every year. The GLASS track was a lot of fun”. Matt added “Google is as committed as ever to user experience, and they happily share all of their tricks with developers year after year.”

Matt and Kiel appreciated the hands-on support of Andreu Ibàñez, from End Point’s partner Ponent 2002, in the physical setup of the Liquid Galaxies we installed at the show, and his sharing in the staffing of the skydiving exhibit.

Feedback from show attendees was overwhelmingly positive, with the whole Google Maps area drawing quite a crowd to take part in each experience.

End Point welcomes opportunities to work with creative agencies and event planners to build unique and compelling visualization experiences that utilize the Liquid Galaxy platform. Please contact us at ask@endpoint.com if you’ve got an idea. Also, see our Liquid Galaxy website to see some of the many uses for the system.

Travis build log doesn't display

Remember when "clear your cookies and try it again" was the first suggestion when a webpage was behaving badly? I remember that time of darkness, that time of early Internet Explorer, well, existing. I remember it being the only browser allowed in some offices and even being mandatory for some major websites. Remember that? Pepperidge Farm remembers.

But we've evolved. These are brighter days. Around these parts, "Did you clear your cookies?" is typically only said in jest. So, imagine my surprise when I accidentally discovered that clearing my cookies was exactly what resolved my issue with our Travis-CI.org build logs failing to display. Seriously. Imagine it. Go ahead, I'll wait.

On March 21st 2013, the beautiful and talented Travis CI service deployed a bad build of their own app. It contained a bug that caused build logs to fail to display. You could still see the builds and statuses under the Build History tab, but never any logs. This was right about the same time I had pushed a big refactor that used a new config file format for our app. It passed all our tests locally, but it was driving me nuts that I couldn't find out why it was failing on Travis. It was also displaying that sad-trombone Build: Failing image in our GitHub repo's README.md.

The Travis crew actually fixed their bug just a few hours after it was discovered, but the issue persisted for me for a few days. I was confident enough in our local integration tests that I didn't roll back, but it was driving me nuts. I believe it was Socrates that said, "It's only after we've lost everything that we're free to do anything." So that's what I did - I cleared my cookies for no rational reason, and it worked. Travis logs, both old and new, started displaying correctly. Bam.

Lessons learned: It's important to appreciate the classics (and also to subscribe to Travis updates).

Isolation Test Helper for Rails Development

Lately I've been inspired to test drive my development as much as possible. One thing that is absolutely critical for test driven development is fast feedback from your tests. Because of this, I try to remove the "Rails" dependency as often as possible from my tests so I don't have to wait for Rails to load for my tests to run. Sure, I could use spork or zeus to pre-load Rails, but I find that those tools don't always reload the files I'm working on. Besides, I believe that much of your application should be plain old ruby objects that are well designed.

One of the things I continually bump against with isolated tests is that there are a few things that I always have to do to get my isolated tests to work. Since we are accustomed to requiring spec_helper.rb or test_helper.rb for tests dependent on Rails, I decided to build a helper for when I run isolated tests to just load some niceties that make running them a little easier.

So, here's the full code from my isolation helper (this one works with RSpec).

# spec/isolation_helper.rb
DO_ISOLATION = ! defined?(Rails)

def isolate_from_rails(&block)
  return unless DO_ISOLATION
  block.call
end

isolate_from_rails do
  require 'awesome_print'
  require 'active_support/all'
  require 'ostruct'
  ap "You are running isolated from Rails!"

  # swallow calls to Rails
  class ::Rails
    def self.root; File.expand_path("../", File.dirname(__FILE__)); end
    def self.method_missing(a,*b); self; end
  end

  # Some RSpec config
  RSpec.configure do |config|
    config.treat_symbols_as_metadata_keys_with_true_values = true
    config.filter_run :focus => true
    config.run_all_when_everything_filtered = true
  end
end

I'm going to walk through this a little so that you can understand why I'm doing some of these things. First off, I don't want to run isolated if Rails is actually loaded. This ensures that my tests pass both when run as part of the whole suite and by themselves. So the DO_ISOLATION constant just lets me know whether Rails is loaded or not.

The second part of the helper is where I set up a little method called isolate_from_rails. This is just a method that can be used to hide things from Rails during my isolated tests. For example:

require 'isolation_helper'

isolate_from_rails do 
  class Product; end
end

describe MyProductValidator do

  it "ensures that invalid records return false" do
    Product.stub(:find_by_name) { OpenStruct.new(:name => "invalid", :valid => false) }
    pv = MyProductValidator.validate Product.find_by_name("invalid")
    pv.valid.should be_false
  end

end

This is obviously a contrived example. But what I want here is to make sure when I run in isolation that I have the behavior of my models stubbed correctly. So I isolate the model from Rails and stub the behavior I want to test against. This test will pass whether Rails is loaded or not. (I know, this is just testing my stub, but I'm attempting to demonstrate an example, not real running code.)

The next thing I do in the isolation_helper is use the isolate_from_rails method to set up some things I commonly use in my isolated tests. First are the requires. awesome_print is a handy gem that makes it easy to spit stuff out to STDOUT in a pretty way. Think pp on steroids. Then I load active_support. This one is optional, but I find that active_support really doesn't take all that long to load, and it's worth it to be able to use all the niceties that Rails provides, such as the blank? and present? methods.

The ostruct library is very nice for stubbing or mocking object dependencies using the OpenStruct class. I've already given an example above of how nice it can be for quickly stubbing out a model, but it works great for just about any object.

Next I just put a friendly reminder out there using awesome_print that I'm running isolated.

One of the biggest annoyances with running isolated tests is logging statements. If I'm testing a class in isolation, I don't really want to see the output of all the Rails.logger.info statements in the code. I also want to make sure Rails dependencies aren't turning up in my code. The next lines in the helper simply swallow all method calls made to Rails. (BTW, this includes Rails.cache, which is why I always wrap my calls to the Rails cache in a facade, so it's easy to switch out during testing to ensure caching is working. I'll have to explain further in another blog post.) The only method left behind is Rails.root, and I leave that there just in case I need to get at the root of the application easily.
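
For example, with that method_missing stub in place, lines like these become harmless no-ops during an isolated run (each call just returns the Rails stub itself, and blocks are never invoked):

Rails.logger.info("recalculating prices")      # swallowed, returns the stub
Rails.cache.fetch("products") { Product.all }  # also swallowed; the block never runs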

The next block of code contains some configuration settings for RSpec. These can be removed if you are using TestUnit or MiniTest.

Overall, I've really liked the freedom this little helper gives me during my test development. It's pretty straightforward and doesn't do anything too magical. It can also easily be added to, just like spec_helper.rb or test_helper.rb, if needed. If you have anything you think should or shouldn't be added in there, please feel free to leave a comment. I'm always looking for ways to improve things.

Making Python Code a Little Bit Cleaner

When you develop a program in a group of programmers, it is really important to have some standards. Especially helpful are standards of naming things and formatting code. If all team members format the code in the same way and use consistent names, then it is much easier to read the code. This also means that the team works faster.

The same rules apply when you develop software in Python.

For Python there is a document which describes the most desirable style features for Python code: the Style Guide for Python Code (PEP 8). However, there are some problems with it, as even the standard Python library has modules which are not consistent. This shouldn’t be an excuse for your team to be inconsistent as well; ensuring that the code is clean and readable is worth spending a moment on.

There are two tools which I use when writing Python code: the Python style guide checker (pep8) and the Python code static checker (pylint).

pep8

The pep8 program is a simple tool that checks Python code against some of the style conventions in the PEP 8 document.

Installation

You can install it within your virtual environment with a simple:

pip install pep8

Usage

Let’s test the pep8 command on this ugly Python file, named test.py:

"this is a very long comment line this is a very long comment line this is a very long comment line"
def sth  (  a ):
    return  "x"+a
def sth1 ( a,b,c):
    a+b+c

The basic usage of the program is:

pep8 test.py

The above command prints:

test.py:1:80: E501 line too long (100 > 79 characters)
test.py:2:1: E302 expected 2 blank lines, found 0
test.py:2:11: E201 whitespace after '('
test.py:2:14: E202 whitespace before ')'
test.py:2:8: E211 whitespace before '('
test.py:3:16: E225 missing whitespace around operator
test.py:4:1: E302 expected 2 blank lines, found 0
test.py:4:11: E201 whitespace after '('
test.py:4:13: E231 missing whitespace after ','
test.py:4:15: E231 missing whitespace after ','
test.py:4:9: E211 whitespace before '('
test.py:5:6: E225 missing whitespace around operator
test.py:5:8: E225 missing whitespace around operator
test.py:6:1: W391 blank line at end of file
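
For comparison, a version of test.py that keeps the same (admittedly pointless) logic but passes these checks might look like this:

"this is a much shorter comment line"


def sth(a):
    return "x" + a


def sth1(a, b, c):
    a + b + c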

Configuration

Pep8 is highly configurable. The most important options let you choose which errors should be ignored; for this there is the --ignore argument. There is also one thing in the PEP 8 document which I don’t agree with: it states that lines shouldn’t be longer than 79 characters. The terminals and editors I use are usually much wider, and having 100-character lines doesn’t make a program unreadable. You can set the allowed length of your lines with --max-line-length.

So if I want to ignore the errors about empty lines at the end of file and set maximum line length to 100, then the whole customized command is:

pep8 --ignore=W391 --max-line-length=100  test.py 

The output is different now:

test.py:2:1: E302 expected 2 blank lines, found 0
test.py:2:11: E201 whitespace after '('
test.py:2:14: E202 whitespace before ')'
test.py:2:8: E211 whitespace before '('
test.py:3:16: E225 missing whitespace around operator
test.py:4:1: E302 expected 2 blank lines, found 0
test.py:4:11: E201 whitespace after '('
test.py:4:13: E231 missing whitespace after ','
test.py:4:15: E231 missing whitespace after ','
test.py:4:9: E211 whitespace before '('
test.py:5:6: E225 missing whitespace around operator
test.py:5:8: E225 missing whitespace around operator

Config file

The same effect can be achieved using a config file. Pep8 searches for this file at the project level; it must be named .pep8 or setup.cfg. If no such file is found, it looks for the file ~/.config/pep8. Only the first file found is taken into consideration. After finding a file, pep8 looks for a [pep8] section in it; if there is no such section, no custom settings are used.

To have the same settings as in the above example, you can create a file .pep8 in the project directory with the following content:

[pep8]
ignore = W391
max-line-length = 100

You can find the list of all possible errors on the pep8 documentation page.

Statistics

Another nice option which I use for checking the code is --statistics. It prints information about the type and number of problems found. I use it along with the -qq option, which causes pep8 to hide all other information. The sort -k 1 -n -r part sorts the pep8 output in reverse order (biggest numbers come first) by the first column, treating it as numbers:

pep8 --statistics -qq django | sort -k 1 -n -r

The first 10 lines of the above command run against Django 1.5.1 code look like:

4685    E501 line too long (80 > 79 characters)
1718    E302 expected 2 blank lines, found 1
1092    E128 continuation line under-indented for visual indent
559     E203 whitespace before ':'
414     E231 missing whitespace after ','
364     E261 at least two spaces before inline comment
310     E251 no spaces around keyword / parameter equals
303     E701 multiple statements on one line (colon)
296     W291 trailing whitespace
221     E225 missing whitespace around operator

pylint

Pylint is a program very similar to pep8; it just checks different things. Pylint’s goal is to look for common errors in programs and find potential code smells.

Installation

You can install pylint in a similar way as pep8:

pip install pylint

Usage

Usage is similar as well:

pylint --reports=n test.py 

Notice the --reports=n argument. Without it, the output is much longer and quite messy.

The output of the above command is:

No config file found, using default configuration
************* Module test
C:  1,0: Line too long (100/80)
C:  2,0:sth: Invalid name "a" for type argument (should match [a-z_][a-z0-9_]{2,30}$)
C:  2,0:sth: Missing docstring
C:  2,12:sth: Invalid name "a" for type variable (should match [a-z_][a-z0-9_]{2,30}$)
C:  4,0:sth1: Comma not followed by a space
def sth1 ( a,b,c):
            ^^
C:  4,0:sth1: Invalid name "a" for type argument (should match [a-z_][a-z0-9_]{2,30}$)
C:  4,0:sth1: Invalid name "b" for type argument (should match [a-z_][a-z0-9_]{2,30}$)
C:  4,0:sth1: Invalid name "c" for type argument (should match [a-z_][a-z0-9_]{2,30}$)
C:  4,0:sth1: Missing docstring
C:  4,11:sth1: Invalid name "a" for type variable (should match [a-z_][a-z0-9_]{2,30}$)
C:  4,13:sth1: Invalid name "b" for type variable (should match [a-z_][a-z0-9_]{2,30}$)
C:  4,15:sth1: Invalid name "c" for type variable (should match [a-z_][a-z0-9_]{2,30}$)
W:  5,4:sth1: Statement seems to have no effect

Configuration

For pylint you can also decide which problems should be ignored. To ignore an error, you have to know its ID first. You can get the ID in two ways: check the pylint errors list, or have pylint print message IDs by adding the --include-ids=y argument:

pylint --reports=n --include-ids=y test.py 
No config file found, using default configuration
************* Module test
C0301:  1,0: Line too long (100/80)
C0103:  2,0:sth: Invalid name "a" for type argument (should match [a-z_][a-z0-9_]{2,30}$)
C0111:  2,0:sth: Missing docstring
C0103:  2,12:sth: Invalid name "a" for type variable (should match [a-z_][a-z0-9_]{2,30}$)
C0324:  4,0:sth1: Comma not followed by a space
def sth1 ( a,b,c):
            ^^
C0103:  4,0:sth1: Invalid name "a" for type argument (should match [a-z_][a-z0-9_]{2,30}$)
C0103:  4,0:sth1: Invalid name "b" for type argument (should match [a-z_][a-z0-9_]{2,30}$)
C0103:  4,0:sth1: Invalid name "c" for type argument (should match [a-z_][a-z0-9_]{2,30}$)
C0111:  4,0:sth1: Missing docstring
C0103:  4,11:sth1: Invalid name "a" for type variable (should match [a-z_][a-z0-9_]{2,30}$)
C0103:  4,13:sth1: Invalid name "b" for type variable (should match [a-z_][a-z0-9_]{2,30}$)
C0103:  4,15:sth1: Invalid name "c" for type variable (should match [a-z_][a-z0-9_]{2,30}$)
W0104:  5,4:sth1: Statement seems to have no effect

Now that I know the ID of the problem I want to ignore (let’s assume it is C0103), I can disable it with:

pylint --reports=n --include-ids=y --disable=C0103 test.py 
No config file found, using default configuration
************* Module test
C0301:  1,0: Line too long (100/80)
C0111:  2,0:sth: Missing docstring
C0324:  4,0:sth1: Comma not followed by a space
def sth1 ( a,b,c):
            ^^
C0111:  4,0:sth1: Missing docstring
W0104:  5,4:sth1: Statement seems to have no effect

Config file

Pylint also supports setting the options in a config file. This config file can be a little bit complicated, so I think the best way is to let pylint generate the file, which can be done with the --generate-rcfile argument:

pylint --reports=n --include-ids=y --disable=C0103 --generate-rcfile > .pylint

This will create a config file with all the default settings plus the changes from the command line.
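
Inside the generated file, the disabled checks end up in the [MESSAGES CONTROL] section; the relevant fragment looks roughly like this (trimmed, and from the pylint version I was using, so your output may differ slightly):

[MESSAGES CONTROL]
# C0103 is the "invalid name" check disabled on the command line above
disable=C0103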

To use the new config file, you should use the --rcfile argument:

pylint --rcfile=.pylint test.py

Remarks

Pylint is great – sometimes even too great.

I usually ignore many of the errors, as too often the changes needed to satisfy pylint are not worth the time spent on them. One common problem found by pylint is that a variable name is too short. It has a rule that all names should be between 2 and 30 characters long. There is nothing wrong with a one-letter variable name, especially when it is something like Point(x, y) or a small local variable, something like for i in xrange(1, 1000).

On the other hand, when a variable has much broader scope, or a meaningful name would make the code easier to read, it is a good idea to change the code.

For me it is good to have pylint checking such errors, so I don’t want pylint to ignore them. Sometimes it is OK to have code which violates those rules, and then I simply ignore the warnings after ensuring the violation is intentional.

Adventures with using Ruby 2.0 and libreadline

I was asked to develop a prototype app for one of our clients lately. The basis for this app was an old Rails app:

  • Rails 3.2.8
  • RailsAdmin
  • MySQL
  • rbenv + ruby-build

I wanted to upgrade the stack to work with the latest toys all the cool kids are so thrilled about. I also didn’t have the Rails console at my disposal, since the Ruby version installed on the development machine hadn’t been compiled against libreadline.

Not having root or sudo access on the machine, I embarked on a slightly hacky journey to make myself a better working environment.

Ruby 2.0

After reading Mike Farmer’s blog post about Ruby 2.0 and tons of other material about it on the Internet, I wanted to get a feel for how much faster and greater the new Ruby is. It’s also always great to stay up to date with the latest technologies. It’s great for me as a developer, and more importantly, it’s great for our clients.

Importance of libreadline in development with Ruby

To be productive developing any Rails-based application, we have to have the Rails console available at any moment. It serves a multitude of purposes. It’s also a great scratch-pad when developing methods.

While you don’t need your Ruby to support libreadline for basic uses of irb, you do need it when using the console with Rails.

Installing Ruby 2.0.0 with rbenv (ruby-build)

If you’ve installed ruby-build some time ago, chances are that you need to update it in order to be able to install the latest build of Ruby 2.0.0.

To do it:

cd ~/.rbenv/plugins/ruby-build
git pull

You should now have the latest Ruby build available to install:

rbenv install 2.0.0-p195

If you want to install Ruby compiled with support for libreadline, you have to have it installed in your system before compiling the build with rbenv install.

If you have access to root or sudo on your system, the easiest way is to install it from your distribution’s packages, e.g.:

on Debian-related Linuxes:

apt-get install libreadline-dev

or on Fedora:

yum install readline-devel

Installing libreadline from sources

In my case I had to download the sources and compile them myself. Luckily the system had all the essential packages needed for building it.

wget "ftp://ftp.cwru.edu/pub/bash/readline-6.2.tar.gz"
tar xvf readline-6.2.tar.gz
cd readline-6.2
./configure --prefix=/home/kamil/libs
make
make install

I had to specify the --prefix option, pointing at the path where I wanted the libreadline library to be installed after compilation.

Then, I was able to actually build Ruby with readline support “on”:

CONFIGURE_OPTS="--with-readline-dir=/home/kamil/libs" rbenv install 2.0.0-p195
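
As a quick sanity check of my own (not part of the build itself), requiring the readline extension from the new Ruby should print the readline version rather than raising a LoadError:

RBENV_VERSION=2.0.0-p195 ruby -rreadline -e 'puts Readline::VERSION'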

Notice: I was making myself a development environment and compiling from sources was my last resort. It is not a good practice for production environments.

The last thing I needed to do was to get rb-readline working with the project I was working on.

It turns out that the latest rb-readline doesn’t play well with the latest Ruby. Also, when using Ruby 2.0.0 one has to explicitly specify it in the Gemfile, or else it won’t be loaded for the console.

Gemfile:

gem 'rb-readline', '~> 0.4.2'

This still isn’t perfect

While this setup works, it won’t let you use the arrow keys. The irb process crashes quickly, even on the first try to navigate through the text.

For some reason, after upgrading Ruby, the RailsAdmin stylesheets stopped working. I noticed that they were being served with the raw manifest comments, which should have been replaced by the other stylesheets, like:

/* ...
*= require_self
*= require_tree .
*/

I had to update the Rails version in the Gemfile to get my admin back:

Gemfile:

gem 'rails', '3.2.13'

Console:

bundle

The last thing I wanted to do was to see whether I could upgrade Rails even further and have a working Rails 4 setup. Unfortunately this was impossible, since RailsAdmin isn’t yet compatible with it, as stated here.

I conclude that the latest Ruby is quite usable right now. If you don't mind the readline quirks, you're pretty safe to upgrade. This assumes, though, that your app doesn't rely on any of the incompatible elements.

The main Ruby site describes them like so:

There are five notable incompatibilities we know of:

  • The default encoding for ruby scripts is now UTF-8 [#6679]. Some people report that it affects existing programs, such as some benchmark programs becoming very slow [ruby-dev:46547].
  • Iconv was removed, which had already been deprecated when M17N was introduced in ruby 1.9. Use String#encode, etc. instead.
  • There is ABI breakage [ruby-core:48984]. We think that normal users can/should just reinstall extension libraries. You should be aware: DO NOT COPY .so OR .bundle FILES FROM 1.9.
  • #lines, #chars, #codepoints, #bytes now returns an Array instead of an Enumerator [#6670]. This change allows you to avoid the common idiom “lines.to_a”. Use #each_line, etc. to get an Enumerator.
  • Object#inspect does always return a string like #<ClassName:0x…> instead of delegating to #to_s. [#2152]
  • There are some comparatively small incompatibilities. [ruby-core:49119]
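
As a quick illustration of the fourth point, here's the #lines change as seen in irb; using #each_line keeps the old Enumerator behavior on both versions:

"a\nb\n".lines           # 1.9: #<Enumerator: ...>, 2.0: ["a\n", "b\n"]
"a\nb\n".each_line.to_a  # ["a\n", "b\n"] on both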

Breaking Up Your Asset Pipeline

The Rails Asset Pipeline to me is kind of like Bundler. At first I was very nervous about it and thought that it would be very troublesome. But after a while of using it, I realized the wisdom behind it and my life got a lot easier. The asset pipeline is fabulous for putting all your assets into a single file, compressing them, and serving them in your site without cluttering everything up. Remember these days?

<%= javascript_include_tag 'jquery.validate.min.js','jquery.watermark.min.js','jquery.address-1.4.min','jquery.ba-resize.min','postmessage','jquery.cookie','jquery.tmpl.min','underscore','rails','knockout-1.3.0beta','knockout.mapping-latest' %>

A basic component of the Asset Pipeline is the manifest file. A manifest looks something like this:

// Rails asset pipeline manifest file
// app/assets/javascript/application.js

//= require jquery
//= require jquery_ujs
//= require_tree .

For a quick rundown on how the Asset Pipeline works, I highly recommend taking a few minutes to watch Railscast episode 279. But there are a few things that I'd like to point out here.

First, notice that the name of the file is application.js. This means that all the javascript specified in the manifest file will be compiled together into a single file named application.js. In production, if specified, it will also be compressed and uglified. So rather than specifying a really long and ugly javascript_include_tag, you just need one:

<%= javascript_include_tag "application.js" %>

Second, this means that if you only have one manifest file in your application, then all of your javascript will be loaded on every page, assuming that your javascript_include_tag is loading in the head of your layout. For small applications with very little javascript, this is fine, but for large projects where there are large client-side applications, this could be a problem for performance. For example, do you really need to load all of ember.js or backbone.js or knockout.js in the admin portion of your app when it isn't used at all? Granted, these libraries are pretty small, but the applications that you build that go along with them don't need to be loaded on every page.

So I thought there should be a way to break up the manifest file so that only the javascript we need is loaded. The need for this came when I was upgrading the main functionality of a website to use the Backbone.js framework. The app was large and complex, and the "single-page application" was only a part of it. I didn't want my application to load on the portions of the site where it wasn't needed. I looked high and low on the web for a solution to this but only found obscure references, so I thought I would take some time to put the solution out there cleanly, hoping to save some of you from having to figure it out on your own.

The solution lies in the first point raised earlier about the asset pipeline. The fact is, you can create a manifest file anywhere in your assets directory and use the same directives to load the javascript. For example, in the app/assets/javascripts/ directory the application.js file is used by default. But it can be named anything and placed anywhere. For my problem, I only wanted my javascript application loaded when I explicitly called it. My directory structure looked like this:

|~app/
| |~assets/
| | |+fonts/
| | |+images/
| | |~javascripts/
| | | |+admin/
| | | |+lib/
| | | |+my_single_page_app/
| | | |-application.js

My application.js file looks just like the one above but I removed the require_tree directive. This is important because now I don't want all of my javascript to load from the application.js. I just want some of the big stuff that I use all over like jQuery. (Ok, I also added the underscore library in there too because it's just so darn useful!) This file is loaded in the head of my layout just as I described above.

Then, I created another manifest file named load_my_app.js in the root of my_single_page_app/. It looks like this:

//= require modernizr
//= require backbone-min
//= require Backbone.ModelBinder
//= require my_single_page_app/my_app
//= require_tree ./my_single_page_app/templates
//= require_tree ./my_single_page_app/models
//= require_tree ./my_single_page_app/collections
//= require_tree ./my_single_page_app/views
//= require_tree ./my_single_page_app/utils

Then in my view that displays the single page app, I have these lines:

<% content_for :head do %>
  <%= javascript_include_tag "my_single_page_app/load_my_app.js" %>
<% end %>

Now my single page application is loaded into the page as my_single_page_app/load_my_app.js by the Asset Pipeline and it's only loaded when needed. And a big bonus is that I don't need to worry about any of the code that I wrote or libraries I want to use interfering with the rest of the site.
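
Two things to double-check in your own app, since they aren't shown above: the layout needs to yield the :head block for that content_for to land anywhere, and because load_my_app.js is never required from application.js, it needs to be added to the precompile list or production asset compilation will skip it. Something like:

<%# app/views/layouts/application.html.erb %>
<head>
  <%= javascript_include_tag "application.js" %>
  <%= yield :head %>
</head>

# config/application.rb
config.assets.precompile += ['my_single_page_app/load_my_app.js']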

Selenium Testing File Uploads in Django Admin

The Django framework version 1.4 added much better integration with Selenium for in-browser functional testing. This made Test-Driven Development an even more obvious decision for our new Liquid Galaxy Content Management System. This went very well until we needed to test file uploads in the Django admin interface.

A browser's file upload control has some unique security concerns that prevent JavaScript from setting its value. Trying to do so may raise INVALID_STATE_ERR: DOM Exception 11. Selenium's WebDriver may sometimes send keystrokes directly into the input element, but this did not work for me within Django's admin interface.

To work around this limitation, Ryan Kelly developed a Middleware to emulate successful file uploads for automated testing. This middleware inserts additional hidden fields into any forms sent to the client. Setting their value causes a file upload to happen locally on the server. (I used a slightly newer version of this Middleware from another project.)

However, Selenium intentionally will not interact with hidden elements. To work around this, we must send JavaScript to be executed directly in the browser using WebDriver's execute_script method. You can see an example of this here.

        self.browser.execute_script("document.getElementsByName('fakefile_storage')[0].value='placemark_end_point.kml'")
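
For context, here is a stripped-down sketch of the kind of test this line lives in; the class name, URL, and file name are only illustrative, and it assumes the fake-upload middleware mentioned above is enabled in the test settings:

from django.test import LiveServerTestCase
from selenium import webdriver


class PlacemarkAdminTest(LiveServerTestCase):
    def setUp(self):
        self.browser = webdriver.Firefox()

    def tearDown(self):
        self.browser.quit()

    def test_fake_kml_upload(self):
        # Log into the admin and navigate to the add form first (omitted here).
        self.browser.get(self.live_server_url + '/admin/')
        # Selenium won't interact with hidden inputs, so set the middleware's
        # injected field via JavaScript instead.
        self.browser.execute_script(
            "document.getElementsByName('fakefile_storage')[0]"
            ".value='placemark_end_point.kml'")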

This is a lot of hoops to jump through, but now we have functional tests for file uploads and their post-upload processing. Hopefully the Selenium or Django projects can develop a better-supported method for file upload testing.

Foreign Data Wrappers

Original images from Flickr user jenniferwilliams

One of our clients, for various historical reasons, runs both MySQL and PostgreSQL to support their website. Information for user login lives in one database, but their customer activity lives in the other. The eventual plan is to consolidate these databases, but thus far, other concerns have been more pressing. So when they needed a report combining user account information and customer activity, the involvement of two separate databases became a significant complicating factor.

In similar situations in the past, using earlier versions of PostgreSQL, we've written scripts to pull data from MySQL and dump it into PostgreSQL. This works well enough, but we've updated PostgreSQL fairly recently, and can use the SQL/MED features added in version 9.1. SQL/MED ("MED" stands for "Management of External Data") is a decade-old standard designed to allow databases to make external data sources, such as text files, web services, and even other databases look like normal database tables, and access them with the usual SQL commands. PostgreSQL has supported some of the SQL/MED standard since version 9.1, with a feature called Foreign Data Wrappers, and among other things, it means we can now access MySQL through PostgreSQL seamlessly.

The first step is to install the right software, called mysql_fdw. It comes to us via Dave Page, PostgreSQL core team member and contributor to many projects. It's worth noting Dave's warning that he considers this experimental code. For our purposes it works fine, but as will be seen in this post, we didn't push it too hard. We opted to download the source and build it, but installing using pgxn works as well:

$ env USE_PGXS=1 pgxnclient install mysql_fdw
INFO: best version: mysql_fdw 1.0.1
INFO: saving /tmp/tmpjrznTj/mysql_fdw-1.0.1.zip
INFO: unpacking: /tmp/tmpjrznTj/mysql_fdw-1.0.1.zip
INFO: building extension
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I/usr/include/mysql -I. -I. -I/home/josh/devel/pg91/include/postgresql/server -I/home/josh/devel/pg91/include/postgresql/internal -D_GNU_SOURCE -I/usr/include/libxml2   -c -o mysql_fdw.o mysql_fdw.c
mysql_fdw.c: In function ‘mysqlPlanForeignScan’:
mysql_fdw.c:466:8: warning: ‘rows’ may be used uninitialized in this function [-Wmaybe-uninitialized]
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -shared -o mysql_fdw.so mysql_fdw.o -L/home/josh/devel/pg91/lib -L/usr/lib  -Wl,--as-needed -Wl,-rpath,'/home/josh/devel/pg91/lib',--enable-new-dtags  -L/usr/lib/x86_64-linux-gnu -lmysqlclient -lpthread -lz -lm -lrt -ldl
INFO: installing extension
< ... snip ... >

Here I'll refer to the documentation provided in mysql_fdw's README. The first step in using a foreign data wrapper, once the software is installed, is to create the foreign server, and the user mapping. The foreign server tells PostgreSQL how to connect to MySQL, and the user mapping covers what credentials to use. This is an interesting detail; it means the foreign data wrapper system can authenticate with external data sources in different ways depending on the PostgreSQL user involved. You'll note the pattern in creating these objects: each simply takes a series of options that can mean whatever the FDW needs them to mean. This allows the flexibility to support all sorts of different data sources with one interface.

The final step in setting things up is to create a foreign table. In MySQL's case, this is sort of like a view, in that it creates a PostgreSQL table from the results of a MySQL query. For our purposes, we needed access to several thousand structurally identical MySQL tables (I mentioned the goal is to move off of this one day, right?), so I automated the creation of each table with a simple bash script, which I piped into psql:

for i in `cat mysql_tables`; do
    echo "CREATE FOREIGN TABLE mysql_schema.$i ( ... table definition ...)
        SERVER mysql_server OPTIONS (
            database 'mysqldb',
            query 'SELECT ... some fields ... FROM $i'
        );"
done

In a step not shown above, this script also consolidates the data from each table into one, native PostgreSQL table, to simplify later reporting. In our case, pulling the data once and reporting on the results is perfectly acceptable; in other words, data a few seconds old wasn't a concern. We also didn't need to write back to MySQL, which presumably could complicate things somewhat. We did, however, run into the same data validation problems PostgreSQL users habitually complain about when working with MySQL. Here's an example, in my own test database:

mysql> create table bad_dates (mydate date);
Query OK, 0 rows affected (0.07 sec)

mysql> insert into bad_dates values ('2013-02-30'), ('0000-00-00');
Query OK, 2 rows affected (0.02 sec)
Records: 2  Duplicates: 0  Warnings: 0

Note that MySQL silently transformed '2013-02-30' into '0000-00-00'. Sigh. Then, in psql we do this:

josh=# create extension mysql_fdw;
CREATE EXTENSION

josh=# create server mysql_svr foreign data wrapper mysql_fdw options (address '127.0.0.1', port '3306');
CREATE SERVER

josh=# create user mapping for public server mysql_svr options (username 'josh', password '');
CREATE USER MAPPING

josh=# create foreign table bad_dates (mydate date) server mysql_svr options (query 'select * from test.bad_dates');
CREATE FOREIGN TABLE

josh=# select * from bad_dates ;
ERROR:  date/time field value out of range: "0000-00-00"

We've told PostgreSQL we'll be feeding it valid dates, but MySQL's idea of a valid date differs from PostgreSQL's, and the latter complains when the dates don't meet its stricter requirements. Several different workarounds exist, including admitting that '0000-00-00' really is wrong and cleaning up MySQL, but in this case, we modified the query underlying the foreign table to fix the dates on the fly:

SELECT CASE disabled WHEN '0000-00-00' THEN NULL ELSE disabled END,
    -- various other fields
    FROM some_table

Fortunately this is the only bit of MySQL / PostgreSQL impedance mismatch that has tripped us up thus far; we'd have to deal with any others we found individually, just as we did this one.

Lanyrd: Finding conferences for the busy or travel-weary developer

Recently I had planned to attend a nearby technical conference on a weekend, but my plans fell through. As a result, my supervisor encouraged me to find a replacement, but having been out of the "free T-shirt and all the presentations you can stay awake through" circuit for many years, I didn't have any idea where to start.

I wanted to filter the conferences by topic: no sense attending a Web Development conference if all the presentations were far afield from what I do; I'm not a Ruby developer at present, and have no immediate plans to become one, so not much point in attending a deep exploration of that topic.

I also wanted to stay local: if there's something I can get to by car in a day, I'd prefer it.

I stumbled upon Lanyrd.com almost by accident: it's a sharp, well-engineered central point for data about upcoming conferences on a dizzying array of topics. Not just software: library science, economics, photography, water management, social media, and medicine were represented in just the one day on which I wrote this post.

I subscribed to about a dozen topics, limiting each to the USA, and Lanyrd immediately suggested 46 events in which I might be interested.

You can connect your Twitter or LinkedIn account with Lanyrd, which seems to offer a way to see when your friends and colleagues will be attending conferences. I wasn't able to confirm that, as the overlap of my LinkedIn account with Lanyrd results in only one person, but I imagine that feature would be more valuable for others.

Dynamically adding custom radio buttons in Android

I've been writing a timesheet tracking app for End Point. In working on various features of this app, I've had more than a few problems to work through, since this project is one of my first on Android and I've never used many of the Android features that I'm now using. One particularly fun bit was setting up a scrollable list of radio buttons with numbers from 0 - 23 (no, we don't often have people working 23 hours on a project in a day, but just in case!) when the user is creating an entry, as a prettier and more backward-compatible alternative to using a number picker to indicate how many hours were spent on this particular job.

In Android, view layouts are usually defined in XML like this:

layout/activity_hour_picker.xml

 <?xml version="1.0" encoding="utf-8"?>  
 <RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"  
   android:layout_width="match_parent"  
   android:layout_height="match_parent" >  
   <Button android:id="@+id/button"  
     android:layout_height="wrap_content"  
     android:layout_width="wrap_content"  
     android:onClick="doStuff"  
     android:text="Hello World!" />  
 </RelativeLayout>  

Simple. But, as you can imagine, not so fun when you're adding a list of 24 buttons - so I decided to add them dynamically in the code. First, though, a ScrollView and RadioGroup (ScrollView only allows one child) need to be defined in the XML; there's no point in doing that programmatically. Let's add those:

layout/activity_hour_picker.xml

 <?xml version="1.0" encoding="utf-8"?>  
 <RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"  
    android:layout_width="match_parent"  
    android:layout_height="match_parent" >  
    <HorizontalScrollView   
         android:id="@+id/hour_scroll_view"  
         android:layout_width="match_parent"  
         android:layout_height="wrap_content"  
         android:fillViewport="true"  
         android:scrollbars="none" >  
         <RadioGroup  
             android:id="@+id/hour_radio_group"  
             android:layout_width="wrap_content"  
             android:layout_height="match_parent"  
             android:orientation="horizontal">  
             // This is where our buttons will be  
         </RadioGroup>  
     </HorizontalScrollView>  
 </RelativeLayout>  

Okay. So, now, in our Activity, we need to override onCreate if we haven't already and add the following code:

src/com/example/HourPickerActivity.java

 @Override  
 public void onCreate(Bundle icicle) {  
   super.onCreate(icicle);  
   setContentView(R.layout.activity_hour_picker);  // This adds the views from the XML we wrote earlier
   ViewGroup hourButtonLayout = (ViewGroup) findViewById(R.id.hour_radio_group);  // This is the id of the RadioGroup we defined
   for (int i = 0; i < RANGE_HOURS; i++) {  
     RadioButton button = new RadioButton(this);  
     button.setId(i);  
     button.setText(Integer.toString(i));  
     button.setChecked(i == currentHours); // Only select button with same index as currently selected number of hours  
     button.setBackgroundResource(R.drawable.item_selector); // This is a custom button drawable, defined in XML   
     hourButtonLayout.addView(button);  
   }    
 }  
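
For completeness, R.drawable.item_selector above is a custom state-list drawable; the real one is project-specific, but a minimal sketch (with made-up drawable names) would look something like this:

drawable/item_selector.xml

 <?xml version="1.0" encoding="utf-8"?>  
 <selector xmlns:android="http://schemas.android.com/apk/res/android" >  
   <!-- made-up drawables: highlighted background when checked, plain otherwise -->  
   <item android:state_checked="true" android:drawable="@drawable/hour_selected" />  
   <item android:drawable="@drawable/hour_normal" />  
 </selector>  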

And this is what we get:

It scrolls horizontally like we want, but there's a problem: we don't want the default radio button selector showing up, since we've already got custom button graphics showing. And we can't forget that we're still missing the code to make everything within the RadioGroup work properly; in other words, the buttons won't do anything when a user clicks them. So, let's add a listener to each button as it's created:

src/com/example/HourPickerActivity.java

 button.setId(i);  
 button.setBackgroundResource(R.drawable.item_selector); // This is a custom button drawable, defined in XML
 button.setOnClickListener(new OnClickListener() {  
         @Override  
         public void onClick(View view) {  
             ((RadioGroup) view.getParent()).check(view.getId());  
             currentHours = view.getId();  
         }  
     });  
 button.setText(Integer.toString(i));

So now the currently selected value is ready in our static variable currentHours for whenever the user is finished. Now we need to get rid of the standard radio button graphics. The solution I found is to use selector XML, with just one item that points to a transparent drawable:

drawable/null_selector.xml

 <?xml version="1.0" encoding="utf-8"?>  
 <selector xmlns:android="http://schemas.android.com/apk/res/android" >  
   <item android:drawable="@android:color/transparent" />  
 </selector>  

Set each button to use it (and to center the text) like this (R.drawable.null_selector is our selector XML):

src/com/example/HourPickerActivity.java

 button.setText(Integer.toString(i));
 button.setGravity(Gravity.CENTER);  
 button.setButtonDrawable(R.drawable.null_selector);  
 button.setChecked(i == currentHours); // Only select button with same index as currently selected number of hours  

Now, let's see how this all has pulled together.

There - that looks much better! And it works great.

Using Modernizr with the Rails Asset Pipeline

Like many web developers, I use Google Chrome to develop my front-end user interface. I do this because the Chrome Developer Tools are very nice at letting me fix CSS styles and debug JavaScript. One downfall to using Google Chrome as my only view of my website while developing, however, is that I often find that I've used some feature of HTML or CSS that isn't supported by other browsers. While this problem seems to come up less often these days, I still find the occasional glitch. Most notably, this seems to happen with Microsoft's Internet Explorer (IE) more than any other browser.

During a recent project, I finished up the UI so that everything looked and felt great. Then I popped open IE to see what sort of things I would need to fix. One feature of HTML that I didn't know wasn't supported by IE was that of placeholder text for form inputs. For example:

<input type="text" name="user_name" placeholder="Enter your user name" />

This HTML generates a nice-looking input box with the words "Enter your user name" shown in the input field. When a user clicks on the field, the placeholder text disappears so that the user can begin to add their own text. This is a popular solution for many websites, so I was shocked to learn that IE up to version 10 doesn't support this feature. My mind turned to horrid thoughts of writing custom javascript to handle placeholder text in input fields. It didn't take long for me to think: surely someone has to have already fixed this.

Thankfully, someone had.

I found placeholders.js on GitHub. It was a clean solution and it solved my problem by just adding it to my application. Since I'm using Rails 3.2, I just added the Placeholders.js file to the vendor/assets/javascript directory and then added it to the bottom of my app/assets/javascript/application.js manifest file like so:

//= require Placeholders.js

Boom, suddenly all my placeholders were working flawlessly. But I was a little irritated with this solution. Why should all my non-IE users have to load this javascript file? And if I have more issues, there could be many of these little javascript solutions I need for things like gradients and rounded borders. And what about custom CSS that I had to write just for these features to work? Surely there must be a better way.

Enter Modernizr.

From Modernizr's web site, "Modernizr is a JavaScript library that detects HTML5 and CSS3 features in the user’s browser." In short, Modernizr detects whether certain features are available and then gives you a way to load custom assets to deal with those features, or the lack thereof. It introduced to me the concept of polyfills. A polyfill is simply some code that helps you deal with browser incompatibilities. The nice JavaScript Placeholders.js file mentioned earlier is a polyfill.

Modernizr does two things to help you detect these compatibility issues and deal with them. First, it adds some custom classes to your <html> tag that indicate which features are enabled and disabled in your browser. An <html> tag in IE might look something like this:

<html class="js no-borderradius no-cssgradients">

While in Chrome it would look like this:

<html class="js borderradius cssgradients">

This enables you to put together some CSS code to add extra styles for incompatible browsers. For example, some SASS to deal with a CSS gradient issue may look like this:

.no-cssgradients .gradient {
  background: blue url('assets/ie_gradients/blue_gradient.png') no-repeat;
}

This would then load a custom gradient image rather than using the CSS gradient for incompatible browsers.

The second thing Modernizr does for you is let you load assets conditionally, based on what the browser supports. It does this using the yepnope.js library: you load each polyfill only when the corresponding feature is missing. This is exactly what I needed for my Placeholders.js file, since I only wanted it to load in browsers that don't support the placeholder input attribute. I found a great blog article on how to implement the Placeholders.js library using Modernizr and started following the directions. And here there be dragons.

yepnope.js is essentially an asset loader. You load assets in Modernizr using the Modernizr.load function, which wraps the yepnope functionality, like this:

Modernizr.load({
  test: Modernizr.input.placeholder,
  nope: ['Placeholders.js'],
  complete: function() {
    // Placeholders is only defined if the polyfill was actually loaded.
    if (window.Placeholders) { Placeholders.init(); }
  }
});

Well, that doesn't quite sit well with the other asset loader in my app, the Rails asset pipeline. So, to cut to the chase, here's how you get the two to play nicely with each other.

First, place the Modernizr library (modernizr.js in this case) in the vendor/assets/javascripts directory.

Second, place the following line at the top of your asset pipeline manifest js file (such as app/assets/javascripts/application.js):

//= require modernizr

Third, DO NOT put your polyfill in the asset pipeline manifest. If you do, it will load for all browsers, defeating the whole point. I moved my Placeholders.js file from the vendor directory to the app/assets/javascripts/polyfills directory and made sure I wasn't pulling that directory into my manifest with a require_tree directive; one way to arrange that is sketched below.
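
For example, the manifest might end up looking something like this (a sketch only; the exact requires depend on your app). Note that require_directory, unlike require_tree, does not descend into subdirectories such as polyfills/:

// app/assets/javascripts/application.js
//= require modernizr
//= require_directory .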

Fourth, add the following line to your config/application.rb if you want the polyfill to be compiled like the rest of your assets:

config.assets.precompile += ['polyfills/Placeholders.js']

Fifth, you are now ready to use Modernizr to load your polyfill. I did so in my JavaScript app using CoffeeScript like so:

# Polyfills
Modernizr.load
  test: Modernizr.input.placeholder
  nope: ['/assets/polyfills/Placeholders.js']
  complete: -> Placeholders.enable() if Placeholders?

Now my Placeholders.js library only loads when the browser doesn't support input placeholders. Hooray!

Note: One thing that tripped me up a little is that you can either use the full development Modernizr library or build your own with just the feature detects you want. Those detects are what you can reference in the test line of the load function options, so make sure any custom modernizr.js build includes every feature you test against. In my case, I had to include the "Input Attributes" option so that Modernizr.input.placeholder would exist.
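
If you're not sure whether a given detect made it into your custom build, a slightly more defensive version of the load call costs little. This is a hypothetical variant of the earlier snippet: when the "Input Attributes" detect is missing, Modernizr.input is undefined, and this version simply falls back to loading the polyfill instead of throwing an error.

// Treat a missing detect the same as missing browser support.
var placeholderSupported = !!(Modernizr.input && Modernizr.input.placeholder);

Modernizr.load({
  test: placeholderSupported,
  nope: ['/assets/polyfills/Placeholders.js'],
  complete: function() {
    // Placeholders is only defined if the polyfill was actually loaded.
    if (window.Placeholders) { Placeholders.enable(); }
  }
});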

Dimensional Modeling

People occasionally bring up the question, "What exactly is a data warehouse?" Though answers vary, in short a data warehouse exists to analyze data and its behavior over large swaths of time or across many different data sources. There's more to it, though, than simply cramming historical data into the same old database. There are a number of defining characteristics, including the following:

  • Query patterns and behavior
  • Data retention policy
  • Database structure

Query Behavior

Data warehouses are sometimes called "OLAP" databases, which stands for "on-line analytical processing", in contrast to the more common "OLTP", or "on-line transaction processing" databases that manage data for an online storefront, a bug tracker, or a blog. A typical OLTP database supports applications that issue short, simple queries, and expect quick answers and support for many simultaneous transactions. The average OLAP query, by contrast, is generally read-only, but can be quite complex, and might take minutes or hours to complete. Such queries will often include heavy-duty statistical processing and data mining, involving terabytes of data.

Data Retention

In a typical e-commerce database, eventually it becomes helpful to archive away older data that the front-end applications won't need anymore. This helps performance, simplifies backups, and has the nice side-effect of leaving less data available for nefarious black hats that come snooping around. But it doesn't make sense simply to discard much of this data, because it contains valuable information: customer behavior, supplier response time, etc. Often this deleted data remains alive, in a different form perhaps, in a data warehouse, which can contain data spanning many years.

Database structure

Because query patterns in data warehouses differ so much from those of OLTP databases, it makes sense to structure the OLAP database to support its queries better. OLTP databases typically follow an "entity" model, hence the ubiquitous (if often not particularly useful) entity-relationship diagram. In such a design, tables represent objects stored in the database, such as an order, a user, or a product. OLAP databases, on the other hand, are commonly "dimensionally" modeled, which results in something called a "star schema". In a star schema, the database contains large "fact" tables, full of foreign keys pointing to a set of "dimension" tables; when diagrammed, this looks like a star with the fact table in the center, or for the more mechanically oriented, a wheel with the fact table at the hub and a dimension table at the end of each spoke. Rather than modeling a particular entity, the fact table generally describes business processes, such as customer conversion or shipping efficiency.

Dimensional modeling is an interesting topic full of its own rules of thumb, which often differ quite dramatically from typical entity modeling. For instance, whereas many database modelers complain about the use of surrogate keys in OLTP databases, their use is recommended in dimensional modeling. Dimensional databases generally don't need OUTER joins, and rarely contain NULL values. As a result, business intelligence applications designed for data warehousing can make certain assumptions about how they'll be asked to query the database. These assumptions obviously place some limits on the types of queries the database can process effectively. However, it is through these assumptions that the system gains most of its efficiency.

Biggest among the limitations of an OLAP database is what's called the "grain". The grain describes exactly what information the fact table contains, and at what level of detail; it should be made clear during the first stages of warehouse design and widely understood by all involved. Queries that require information that isn't part of the grain, or that need finer levels of detail, must find a different fact table to use. But for queries that depend only on the available data, the fact table can be very efficient: the database can partition it easily and scan it unencumbered by simultaneous writes from other transactions, filtering with simple conditions and INNER joins to the various dimension tables.

Data warehouses differ from the traditional database in several other ways, but this covers some of the basics. Dimensional modeling alone is a well-developed field of study with numerous intricacies, where experience and careful training are important for developing a useful final model. But the analytical power of such databases has been proven.

End Point Europe meeting in Warsaw

Recently End Point's European crew from Italy, Germany, and Poland met up for two full days in Warsaw. We rented an apartment and the four full-time End Pointers (Kamil, Szymon, Lele, and Jon) and two interns (Zed, Phin) worked together there on several projects.

It was great to take advantage of the chance to collaborate in person on several projects, many of which we just got off the ground there:

  • Kamil and Szymon started work on a new AngularJS auto-CRUD admin (like RailsAdmin and the Django admin), which may grow into the Interchange 6 admin. Jon added support to Perl's SQL::Translator (aka SQLFairy) to import/export database schemas in JSON format, which the AngularJS admin app can use.
  • Lele started work on upgrading our Request Tracker instance, which is more complicated than we would like, but needs to be done.
  • Szymon showed us all some details of Debian (and Ubuntu) package building that he's done for Google, and packaged some simple internal scripts that we've only had in RPM and Yum to date.
  • Zed worked more on our Android time tracking app, and Kamil threatened to start an iOS port of it since he's fallen in love with Objective-C.
  • Phin worked more on the Interchange 6 / Perl / Dancer Flower store demo checkout functionality.
  • We started an experiment to use Ansible to manage some server configurations that have been unmanaged to date. While several of us at End Point have experience with Puppet and Chef, only Lele has used Ansible, and it seems it may be a good fit for some cases we have in mind.
  • Server updates: It's very convenient for us to schedule server downtime in Europe's morning while users in the United States sleep, so we took advantage of that and got some updates done.

The apartment worked out great. It had a balcony with a nice view, and it gave us an inexpensive place to work and cook most of our food.

Lele volunteered to sleep on the couch since we were one bed short!

A few times we got out to see some of the sights in Warsaw, which is a great city that deserves a proper tourist visit to do it justice.

Too bad we couldn't stay longer!

Rackspace Load Balancers network issues and "desperate" solution

Most people in IT already know the common "Have you tried turning it off and on again?" joke, and nearly all of them also know that sometimes it just works.

In a sense, that's what I experienced with Rackspace Load Balancers after a day of networking troubleshooting, which involved (but was not limited to):

  • proofreading iptables rules
  • checking for overlapping network masks
  • troubleshooting network traffic with tcpdump
  • testing the software's functionality both from localhost and from other hosts

I had an enlightenment moment when I realized that, while I was waiting for the next desperate idea to pop out of some remote area of my brain, I could simply remove all of the nodes from the load balancer (via the web interface) and then add them back, without making any other change. Well, it turns out that this was the solution I had been looking for after a day of reckless debugging.

So, lesson learned: before hurting yourself, try the simplest, most obvious, and possibly silliest answers once more... and just a second before you start considering the impossible possible, try "turning it off and on again" one more time.