Welcome to End Point's blog

Ongoing observations by End Point people.

Highlighting Search Pattern Matches in Vim

Vim’s hlsearch option is a commonly-used way to enable visual feedback when searching for patterns in a Vim buffer. When highlighting of search matches is enabled (via :set hlsearch), Vim will add a colored background to all text matching the current search.

Search highlighting is useful but it’s not obvious how to turn off the highlighting when you’re no longer concerned with the results of your last search. This is particularly true if you didn’t enable the hlsearch setting yourself but inherited it from a prebuilt package like Janus or copied someone else’s .vimrc file.

One commonly-used way to clear the highlighting is to search for a garbage string by doing something like /asdfkj in normal mode. This method will clear the search highlights but has the undesired side effect of altering your search history.

A better way to turn off search highlighting is the :nohlsearch command (which can be abbreviated to :noh). This clears the highlights temporarily; they'll be turned back on the next time you perform a search. You can also still use the n/N keys to resume your previous search, which isn't possible if you use the garbage-string method above.
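If you find yourself typing :noh constantly, one common convention (just one possible mapping, not something Vim sets up for you) is to bind it to a key in your .vimrc:

" Clear search highlighting and redraw the screen with Ctrl-L
nnoremap <silent> <C-l> :nohlsearch<CR><C-l>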

For more information on highlighting search matches in Vim, check out the Highlight all search pattern matches entry on the Vim Tips Wiki.

Developer Specific Configuration in Ruby on Rails

Here's a quick tip on how to set Rails configurations that you might want for yourself but not for the rest of the team.

I find the default Rails log too noisy when I'm developing because it gives me more information than what I generally need. 90% of the time I only want to see what route is being hit for a request, what controller action responded to the route, and what parameters are being passed to the action. Finding this info with the default Rails log means wading through a jungle of SQL statements, and other things that I'm not interested in. Fortunately, Rails makes it easy to change log levels and the one I prefer is log level "info".

Setting this up, however, presents a new problem: I recognize I'm deviating from what's conventional in the Rails world, and I only want this configuration for myself, not for anyone else working on the project. The typical way to change the log level would be to add a line to config/environments/development.rb:
config.log_level = :info

If I do this and then commit the change, I've forced my own eccentricities on everyone else. I could instead simply not commit it, but then I create noise in my git workflow by having this unstaged change always sitting in my workspace, and if I don't like noisy logs, I dislike dirty git workspaces even more. The solution I've come up with is to create a special directory to hold all my custom configurations and then have git ignore that directory.

  1. Create a directory with a specific name and purpose; I use config/initializers/locals.
  2. Add an entry to .gitignore

    locals/

  3. Add any configurations you want. In my case I created config/initializers/locals/log_level.rb, which has the code that changes the log level at startup:

    Rails.logger.level = Logger::INFO

As a bonus you can add a "locals" directory anywhere in the application tree where it might be useful, and it will always be ignored. Perhaps you might stick one in app/models/locals where you can add decorators and objects that serve no other purpose than to aid in your local development.

Increasing MySQL 5.5 max_connections on RHEL 5

Busy database-backed websites often hit scalability limits in the database first. In tuning MySQL, one of the first things to look at is the max_connections parameter, which is often too low. (Of course another thing to look at is appropriate fragment caching in your app server, HTTP object caching in your web server, and a CDN in front of it all.)

When using MySQL 5.5 from Oracle's RPMs through cPanel (MySQL55-server-5.5.32-1.cp1136) on RHEL 5.10 x86_64, there is an interesting problem if you try to increase the max_connections setting beyond 214 in /etc/my.cnf. It will silently be ignored, and the limit remains 214:

mysql> show variables like 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 214   |
+-----------------+-------+
1 row in set (0.00 sec)

The problem is that the maximum number of open files allowed (1024 by default) is too small to support increasing max_connections beyond 214.

There are plenty of online guides that explain how to handle this, including increasing the kernel fs.file-max setting, which may be necessary. That is done by editing /etc/sysctl.conf, in this example doubling the default:

fs.file-max = 2459688

Then run sysctl -p to make the change take immediate effect. (It'll remain after reboot too.)
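To confirm the running value afterward, a quick check (standard sysctl usage):

sysctl fs.file-max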

There are also many guides that say you need to change /etc/security/limits.conf along these lines:

mysql           soft    nofile         4096
mysql           hard    nofile         4096

However, the /etc/security/limits.conf change does not actually work when mysqld is started via the init script in /etc/init.d/mysql or via service mysql restart.

With the standard Red Hat mysql-server (5.1) package, which provides /etc/init.d/mysqld (not /etc/init.d/mysql as the Oracle and Percona versions do), you could create a file /etc/sysconfig/mysqld containing ulimit -n 4096, and that setting will take effect for each restart of the MySQL daemon.
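In other words, for that packaging the whole file would simply be:

# /etc/sysconfig/mysqld
ulimit -n 4096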

But the ulimit -n setting hacked into the init script or put into /etc/sysconfig/mysqld isn't really needed after all, because you can simply set open_files_limit in /etc/my.cnf:

[mysqld]
open_files_limit = 8192
max_connections = 1000
# etc.

... and mysqld_safe will increase the ulimit on its own before invoking the actual mysqld daemon.

After service mysql restart you can verify the new open file limit in the running process, like this:

# cat /var/lib/mysql/*.pid
30697
# ps auxww | grep 30697
mysql    30697 97.8  9.8 6031872 1212224 pts/1 Sl   13:09   3:01 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/some.hostname.err --open-files-limit=8192 --pid-file=/var/lib/mysql/some.hostname.pid
# cat /proc/30697/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             96086                96086                processes
Max open files            8192                 8192                 files
Max locked memory         32768                32768                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       96086                96086                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0

And the running MySQL server will show that the desired max_connections setting stuck this time:

mysql> show variables like 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 1000  |
+-----------------+-------+
1 row in set (0.00 sec)

The relevant code in /usr/bin/mysqld_safe is here:

if test -w / -o "$USER" = "root"
then
  # ... [snip] ...
  if test -n "$open_files"
  then
    ulimit -n $open_files
  fi
fi

if test -n "$open_files"
then
  append_arg_to_args "--open-files-limit=$open_files"
fi

I have found that some newer versions of either MySQL55-server or cPanel, or some intersection of the two, have made manually specifying a higher open_files_limit in /etc/my.cnf unnecessary, although it does not do any harm.

But in conclusion, if you find yourself hitting the mysterious max_connections = 214 limit, just add the appropriately-sized open_files_limit to the [mysqld] section of /etc/my.cnf and restart the server with service mysql restart, and your problem should be solved!

Building ImageMagick on RHEL/CentOS 6 with Perl 5.18.1

This is a quick tip for anyone in the exact same situation I was recently, and everyone else can probably just skip it!

RHEL 6 and CentOS 6 and other derivatives come with ImageMagick-6.5.4.7-6.el6_2.x86_64, which is a bit dated but still reasonable. They also come with Perl 5.10.1, which has grown very old. We wanted to use the latest version of Perl (5.18.1) with plenv, but the latest version of the Perl libraries for ImageMagick (PerlMagick) does not work with the older ImageMagick 6.5.4.7.

The first task, then, was to locate the matching older version of PerlMagick from BackPAN, the archive of historical CPAN modules: http://backpan.perl.org/authors/id/J/JC/JCRISTY/PerlMagick-6.54.tar.gz, and try to build that.

However, that fails to build without applying a patch to make it compatible with newer versions of Perl. The patch is available from http://trac.imagemagick.org/changeset?format=diff&new=4950, or you can just create a file called typemap in the root of the unpacked directory, with one line:

Image::Magick T_PTROBJ

Then build, test, and install as usual. That's it.
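For reference, the whole sequence looks roughly like this, assuming the standard ExtUtils::MakeMaker workflow under your plenv-managed Perl (exact directory names and steps may vary):

wget http://backpan.perl.org/authors/id/J/JC/JCRISTY/PerlMagick-6.54.tar.gz
tar xzf PerlMagick-6.54.tar.gz
cd PerlMagick-6.54            # or whatever directory the tarball unpacks to
echo 'Image::Magick T_PTROBJ' > typemap
perl Makefile.PL
make
make test
make install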

Setting a server role in Salt (comparing Puppet and Salt)

There are many ways to solve a given problem, and nowhere is this truer than in configuration management. Salt (http://www.saltstack.com) is a fairly new tool in the configuration management arena, joining the ranks of Puppet, Chef, and others. It has quickly gained popularity and continues to grow, boasting a scalable architecture and speed. And so with multiple tools and multiple ways to use each tool, it can get a little tricky to know how best to solve your problem.

Recently I've been working with a client to convert their configuration management from Puppet to Salt. This involved reviewing their Puppet configs and designs and more or less mapping them to the equivalent for Salt. Most features convert pretty easily. However, we did run into something that didn't at first: assigning a role to a server.

We wanted to preserve the "feeling" of the configs where possible. In Puppet they had developed a convention of using custom variables in their configs to assign an "environment" and a "role" to each server. These variables were assigned in the node and role manifests. In Salt we struggled to find a similar way to do that; here is what we learned.

In Puppet, once a server's "role" and "environment" variables were set, then they could be used in other manifest files to select the proper source for a given config file like so:

    file    {
        "/etc/rsyslog.conf":
            source  =>
            [
                "puppet:///rsyslog/rsyslog.conf.$hostname", 
                "puppet:///rsyslog/rsyslog.conf.$system_environment-$system_role", 
                "puppet:///rsyslog/rsyslog.conf.$system_role", 
                "puppet:///rsyslog/rsyslog.conf.$system_environment", 
                "puppet:///rsyslog/rsyslog.conf"
            ],
            ensure  => present,
            owner   => "root",
            group   => "root",
            mode    => "644"
    }

Puppet will search the list of source files in order and use the first one that exists. For example, if $hostname = 'myniftyhostname' and $system_environment = 'qa' and $system_role = 'sessiondb', then it will use rsyslog.conf.myniftyhostname if it exists on the Puppet master, or if not then use rsyslog.conf.qa-sessiondb if it exists, or if not then rsyslog.conf.sessiondb if it exists, or if not then rsyslog.conf.qa if it exists, or if not then rsyslog.conf.

In Salt, environment is built into the top.sls file, where you match your servers to their respective state file(s), and it can be used within state files as {{ env }}. Salt also allows multiple sources for a managed file to be listed in order, and it will use the first one that exists, just as Puppet does. We were nearly there; however, setting the server role variable was not as straightforward in Salt.
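For reference, a top.sls assigning state files per environment might look roughly like this (the hostnames and environment names here are made up; the sessiondb "role" state file we ended up with is shown further below):

# /srv/salt/top.sls (hypothetical)
base:
  '*':
    - common
qa:
  'sessiondb01.example.com':
    - sessiondb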

We first looked at using Jinja variables (which is the default templating system for Salt), but soon found that setting a Jinja variable in one state file does not carry over to another state file. Jinja variables remain only in the scope of the file they were created in, at least in Salt.

The next thing we looked at was using Pillar, which is a way to set custom variables from the Salt master for given hosts (or minions). Pillar uses a structure very similar to Salt's top.sls, matching a host with its state files. But since the hostnames for this client vary considerably and don't lend themselves to pattern matching, it would be cumbersome to manage both the state top.sls file and the Pillar top.sls file and keep them in sync. It would basically require duplicating the list of hosts in two files, which could drift out of sync over time.

We asked the Salt community in #salt on Freenode how they might solve this problem, and the recommended answer was to set a custom grain. Grains are a set of properties for a given host, collected from the host itself, such as hostname, CPU architecture, CPU model, kernel version, and total RAM. There are multiple ways to set custom grains, but after some digging we found how to set them from within a state file. This meant that we could do something like this in a "role" state file:

# sessiondb role
# {{ salt['grains.setval']('server_role','sessiondb') }}

include:
  - common
  - postgres

And then within the common/init.sls and postgres/init.sls state files we could use that server_role custom grain in selecting the right source file, like this:

/etc/rsyslog.conf:
  file.managed:
    - source:
      - salt://rsyslog/files/rsyslog.conf.{{ grains['host'] }}
      - salt://rsyslog/files/rsyslog.conf.{{ env }}-{{ grains['server_role'] }}
      - salt://rsyslog/files/rsyslog.conf.{{ grains['server_role'] }}
      - salt://rsyslog/files/rsyslog.conf.{{ env }}
      - salt://rsyslog/files/rsyslog.conf
    - mode: 644
    - user: root
    - group: root

This got us to our desired config structure. But like I said earlier, there are probably many ways to handle this type of problem. This may not even be the best way to handle server roles and environments in Salt, if we were more willing to change the "feeling" of the configs. But given the requirements and feedback from our client, this worked fine.

Database federation performance showdown

[Image: photo by Flickr user garryknight]

The PostgreSQL Foreign Data Wrapper has gotten a fair bit of attention since its release in PostgreSQL version 9.3. Although it does much the same thing the dblink contrib module has long done, it is simpler to implement for most tasks and reuses the same foreign data wrapper infrastructure employed by several other contrib modules. It allows users to "federate" distinct PostgreSQL databases; that is, it allows them to work in combination as though they were one database. This topic of database federation has interested me for some time -- I wrote about it a couple years ago -- and when postgres_fdw came out I wanted to see how it compared to the solution I used back then.

First, some background. The key sticking point of database federation that I'm focused on is transaction management. Transactions group a series of steps, so either they all complete in proper sequence, or none of them does. While lots of databases, and other technologies like messaging servers, can handle transactions that involve only one service (one database or one messaging server instance, for example), federation aims to allow transactions to span multiple services. If, for instance, given a transaction involving multiple databases, one database fails to commit, all the other databases in the transaction roll back automatically. See my post linked above for a more detailed example and implementation details. In that post I talked about the Bitronix transaction manager, whose job is to coordinate the different databases and other services in a transaction, and make sure they all commit or roll back correctly, even in the face of system failures and other misbehavior. There are other standalone transaction managers available. I used Bitronix simply because a knowledgeable friend recommended it, and it proved sufficient for the testing I had in mind.

So much for introduction. I wanted to see how Bitronix compared to postgres_fdw, and to get started I took the simple sequence of queries used by default by pgbench, and created a test database with pgbench, and then made three identical copies of it (named, of course, athos, porthos, aramis, and dartagnan -- I wasn't energetic enough to include the apostrophe in the name of the fourth database). The plan was to federate athos and porthos with Bitronix, and aramis and dartagnan with postgres_fdw. More precisely, the pgbench test schema consists of a small set of tables representing a simple banking scenario. In its default benchmark, pgbench selects from, inserts into, and updates these tables with a few simple queries, shown below. Like pgbench, my test script replaces identifiers starting with a ":" character with values selected randomly for each iteration.

UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);

I decided to configure my test as though pgbench's "accounts" table was in one database, and the "tellers", "branches", and "history" tables were in another. For the Bitronix test I can simply connect to both databases and ignore the tables that aren't applicable, but for testing postgres_fdw I need to set up dartagnan's pgbench_accounts table as a foreign table in the aramis database, like this:

aramis=# drop table pgbench_accounts;
DROP TABLE
aramis=# create server dartagnan foreign data wrapper postgres_fdw options (dbname 'dartagnan');
CREATE SERVER
aramis=# create user mapping for josh server dartagnan options (user 'josh');
CREATE USER MAPPING
aramis=# create foreign table pgbench_accounts (aid integer not null, bid integer, abalance integer, filler character(84)) server dartagnan;
CREATE FOREIGN TABLE

The test script I wrote has two modes: Bitronix mode, and postgres_fdw mode. For each, it repeats the pgbench test queries a fixed number of times, grouping a certain number of these iterations into a single transaction. It then changes the number of iterations per transaction, and repeats the test. In the end, it gave me the following results, which I found very interesting:

The results show that for small transactions, postgres_fdw performs much better. But when the transactions get large, Bitronix catches up and takes the lead. The graph shows a curve that may be part of an interesting trend, but it didn't seem worthwhile to test larger numbers of iterations per single transaction, because the larger transactions in the test are already very large compared to typical real-life workloads. It's difficult to see exactly what's going on in the center of the graph; here's a log rescaling of the data to make it clear what the numbers are up to.

All in all, it doesn't surprise me that postgres_fdw would be faster than Bitronix for small and medium-sized transactions. Being more tightly coupled to PostgreSQL, it has a faster path to get done what it wants to do, and in particular, isn't restricted to using two-phase commit, which is generally considered slow. I was surprised, however, to see that Bitronix managed to catch up for very large transactions.
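For context, here is a rough, hand-written sketch of the two-phase commit dance that a transaction manager like Bitronix drives via PostgreSQL's prepared transactions. The transaction identifier is made up, and max_prepared_transactions must be greater than zero on both databases for this to work:

-- On the database holding tellers, branches, and history (say, athos):
BEGIN;
UPDATE pgbench_tellers SET tbalance = tbalance + 10 WHERE tid = 1;
PREPARE TRANSACTION 'demo-tx-1';

-- On the database holding accounts (say, porthos):
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + 10 WHERE aid = 1;
PREPARE TRANSACTION 'demo-tx-1';

-- Only after both PREPARE TRANSACTION steps succeed does the coordinator
-- finish the transaction on each database:
COMMIT PREPARED 'demo-tx-1';
-- (or ROLLBACK PREPARED 'demo-tx-1' on every participant if anything failed)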

End Point Partners with A-Zero to Expand Liquid Galaxy Services in South Korea

End Point Corporation continues its global leadership in Liquid Galaxy development and professional services, and has signed a partnership agreement with A-Zero of South Korea to further expand those services for the display platform.

This partnership promises to be beneficial for the South Korean market. Already, A-Zero has lined up a number of engagements where the Liquid Galaxy could be deployed, bringing the incredible display platform together with the wealth of data resources in one of the most online-savvy countries in the world.

“We look forward to great business opportunities with our new friends at End Point,” said Cho Hyungwan, Director of Business Development for A-Zero. “We can see many uses for the platform here in our market.” A-Zero is a systems integrator and software development company based in Seoul with skills in GIS data manipulation, complex system deployments, and close relations with Google Enterprise partners in the region.

To kick off this partnership, End Point brought a Liquid Galaxy Express to Seoul to show the platform at a nationwide GIS conference together with A-Zero. The trade show was a great success, generating several leads and near-constant crowds at the booth.

As the lead agency for development and installation, End Point has brought Liquid Galaxies to over 50 locations around the world, including corporate offices, trade shows, and museum exhibits. End Point has installed numerous Liquid Galaxies in the United States and around the world, including permanent displays in Berlin, Brussels, Hamburg, Jakarta, London, Paris, Mexico City, Monaco, Moscow, Singapore, and Tokyo, and has set up and supported Liquid Galaxies at events in Amsterdam, Berlin, London, Jeju Island, Milan, Munich, Singapore, Sochi, Stockholm, Madrid, and Paris.

Originally developed by Google, the Liquid Galaxy is a multi-screen display for 3D-rendered environments such as Google Earth, Google Street View, panoramic photos, videos, and GIS-data visualizations. End Point developers continue to extend that functionality with new data types, system integrations, visual interfaces for navigation, and content management for the display platform.

End Point is based in New York City, and has been providing technical development and solutions to complex problems for their clients since 1995. With over 35 developers and visualization specialists, End Point is the lead agency for providing turn-key installation, customization, and ongoing support for the Liquid Galaxy platform.

Use Ansible/Jinja2 templates to change file content based on target OS

In the End Point hosting team we really love automating repetitive tasks, especially when they involve remembering many little details that can be forgotten over time, like the differences in coreutils locations between some versions of Ubuntu (Debian), CentOS (Red Hat), and OpenBSD variants.

In our environment we bind the backup SSH user's authorized_keys entry to a custom command in order to keep it secure by, among other things, tying it to a specific rsync call.

So in our case the content of our CentOS authorized_keys would be something like:

command="/bin/nice -15 /usr/bin/rsync --server --daemon .",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAAB3[...]Q== endpoint-backup

Sadly, that's only true for CentOS systems, so if you want to automate the distribution of authorized_keys (as we'll show in another post) to different Linux distributions (like Ubuntu), you may need to tweak it to comply with the newer standard "/usr/bin" location, which will eventually be adopted by all new Linux versions over time, RHEL 7.x onward included.

To do the OS detection we decided to use an Ansible/Jinja2 template, deployed with the following Ansible task:

- name: Deploy /root/.ssh/authorized_keys
  template: src=all/root/.ssh/authorized_keys.j2
            dest=/root/.ssh/authorized_keys
            owner=root
            group=root
            mode=0600

And inside the actual template file we place a slightly modified version of the line above:

command="{% if ansible_os_family != "RedHat" %}/usr{% endif %}/bin/nice -15 /usr/bin/rsync --server --daemon .",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAAB3[...]Q== endpoint-backup"

So if the target OS is not part of the "RedHat" family, the template will add "/usr" in front of the "/bin/nice" absolute path.
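If you want to double-check what ansible_os_family evaluates to on a given host before relying on it, the setup module can filter facts (the hostname below is just an example):

ansible backup01.example.com -m setup -a 'filter=ansible_os_family'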

Easy peasy, ain't it?
Now go out there and exploit this feature for all your needs.

Getting navigation bar to look good in iOS7

Apple has recently released iOS 7 — a major upgrade to its operating system for mobile devices. Whether users like it or not, developers have to treat it seriously. There is nothing worse in the world of technology than being viewed as passé.

From the point of view of users, the new look and feel somewhat resembles the overall movement in user interface design. The flat UI style is the new hotness nowadays.

On the developers' side of the fence, though, this means lots of hard work. Apple has introduced so many changes that many iOS apps had to be updated or even redesigned to look acceptable on iOS 7.

Some applications have already dropped support for older versions of iOS

One of them is... Evernote! Its team decided that supporting older systems would be too costly, so they dropped it. The only way to have the Evernote app on an older version of iOS is to have installed it before the release of its latest version.

The troublesome navigation bar

One issue I encountered while working on an iOS app lately was that the app started to display oddly. The top bar was overlapping with the contents of the views.

The reason is that the top bar now overlaps with the UI underneath. It applies a blur to whatever is behind it, making apps look a bit more integrated with the OS.

Solution hidden in the XIB designer

If you were using only the designer — you're very lucky. In the latest Xcode, there is an option to set deltas for UI element positions.

The UI delta is nothing more than a value by which a particular measurement should be modified if the app is being run on an iOS version lower than 7.

So keeping in mind that the top bar with the navigation buttons takes 64 points of height — you have to provide -64 as the y delta value. That way the UI looks great in the designer, and it will also look right on a pre-iOS 7 device.

What about views consisting purely of code?

In my case, I had to resort to some workarounds in the code. Most views in the application I was working on were created dynamically. There were no *.xib files to edit with the editor, hence — no way to set those deltas.

The easiest solution I found was to just edit the view's frame values, setting the y origin of the frame to 64 points.

CGRect frame = tableViewController.view.frame;
frame.origin.y = 64;
tableViewController.view.frame = frame;

Supporting older versions in code

The last step was to simulate the behavior of the designer and allow the code to apply changes based on the iOS system version on which the app is currently being executed:

float topValue = 0;
// On iOS 7 and later, leave room for the 64-point status bar + navigation bar area
if([[[UIDevice currentDevice] systemVersion] floatValue] >= 7.0f)
{
    topValue = 64;
}
CGRect frame = tableViewController.view.frame;
frame.origin.x = 0;
frame.origin.y = topValue;
tableViewController.view.frame = frame;

More to read

http://www.fastcolabs.com/3016423/open-company/developing-for-ios-7-the-good-the-bad-the-flat-and-the-ugly
https://developer.apple.com/library/ios/documentation/userexperience/conceptual/transitionguide/Bars.html

3 common misconceptions about Ruby on Rails

1) Rails is easy to learn.

Rails has a steep learning curve and that's never going away. From the beginning it was conceived of as an advanced tool for experienced web developers, and having a low cost of entry for beginners is not part of the Rails way. Rails makes web development faster by relying on conventions that the developer needs to learn and use. There are a lot of these conventions, and for a beginner the cost of learning them means throughput will be extremely low initially. However, the investment pays off much later down the road when throughput skyrockets due to familiarity with the framework. Another overlooked thing that makes Rails hard to learn is that there is a lot more to it than just Ruby and Rails: there is also source code management, databases, front end servers, and testing, and while these subjects are common to web development in general, you'll still need to learn the "Rails way" of approaching these satellite concerns that make up a typical production application. Brook Riggio created a fantastic chart that shows all the subjects a Rails dev needs to be familiar with in the excellent article "Why Learning Rails is Hard"; one look at the sheer number of items should illustrate why it takes a long time to learn Rails.

2) Rails is equally suited for all types of web apps.

To the extent that your web project veers away from being a dynamic website backed by a relational database with a page-centric design (or, to put it another way, to the extent that your app is different from Basecamp by 37signals), you will be moving out of the sweet spot for developer productivity with Rails. What distinguishes Ruby on Rails from other web frameworks is that it was initially created by extracting it from a real world application rather than by developing the framework first. The real world application it was extracted from is Basecamp by 37signals, and even to this day the continued development of new Rails features is driven primarily by 37signals' need to "solve our own problems." While the web app landscape is evolving to include JavaScript layers that open up possibilities for new client-side architectures, it should be noted that Rails itself is optimized for building and displaying HTML pages, and any other type of architecture that works with Rails is working orthogonally to that goal.

3) Rails means you are working with Ruby and Ruby is slow.

Twitter is still sometimes cited as Ruby on Rails' biggest failure. Twitter was originally developed in Rails, and shortly after its creation it went through a phase where rapid growth overwhelmed the site's ability to keep up; the site was often down during what became known as the "Fail Whale" era. The solution to Twitter's massive scaling problems was to strip out Rails as the engine of the site and introduce a massive messaging infrastructure made up of a multitude of different technologies. Rails was still a part of the infrastructure, but only in a very superficial way, serving up the web pages for the Twitter web client. This led some to the idea that Twitter was proof that Rails can't scale and that anything "serious" should be done in a different technology. What the Twitter naysayers failed to understand is that the massive messaging behemoth that is Twitter was way outside of what Rails is designed to handle. More important, however, is that the ethos of the Ruby on Rails community is the "right tool for the right job", and that includes using non-Ruby and non-Rails technologies to support whatever solution you are after. If Java makes it go faster, then use Java. There's no reason why it has to be in Ruby, and while you might choose a Java search engine rather than a Ruby one, chances are that Ruby can still act as an excellent "glue language" connecting your Java search engine to other technologies, much like the Twitter messaging infrastructure was laid below Rails, which kept displaying pages in the web client. Rails devs should be comfortable using things outside of Ruby. A final comment on the Twitter controversy: it's not likely Twitter would ever have come into existence without Rails, because the low investment needed to try out an idea is precisely how Twitter came to be, and that in itself should be an example of one of Ruby on Rails' greatest successes.

New Kamelopard version

I recently pushed a new Kamelopard version (v0.0.14) and thought I should briefly mention it here. This release includes a few bug fixes, including one that fatally affected several v0.0.13 installations, but its major improvement is a greatly expanded test suite. For quite some time many Kamelopard functions have had only placeholder tests, marked as "pending" in the code, or no test at all. In particular, this includes many of the more complex (in other words, difficult to test) functions. Version 0.0.14 adds 35 new tests, including tests for the frequently used orbit() function as well as for the relatively new multidimensional function logic.

The typical Kamelopard test creates a Kamelopard object, tests that it responds to the right set of methods, renders it to KML, and finally inspects the result for correctness. This can quickly become complicated, as some KML objects can take many different forms. Here are a few selections from one of the new tests, as an example. This is for the ColorStyle object, which is an abstract class handling part of the options in other style objects.

This first section indicates that this test includes several other tests, defined elsewhere. Many objects in Kamelopard descend from Kamelopard::Object, for instance, so this is far from the only test that refers to its behaviors.

shared_examples_for 'Kamelopard::ColorStyle' do
    it_should_behave_like 'Kamelopard::Object'
    it_should_behave_like 'KML_includes_id'
    it_should_behave_like 'KML_producer'

The KML spec defines a limited set of "color modes" allowed in a valid ColorStyle object, so here we test the code that validates these modes.

it 'should accept only valid color modes' do
    @o.colorMode = :normal
    @o.colorMode = :random
    begin
        @o.colorMode = :something_wrong
    rescue RuntimeError => f
        q = f.to_s
    end
    q.should =~ /colorMode must be either/
end

KML asks for its color constants in a different order than I'm used to. HTML asks for three byte color constants, with one byte each for red, green, and blue values, in that order. OpenGL's glColor function variants expect their arguments red first, then green, and then blue, with an optional alpha value at the end. So I sometimes get confused when KML wants alpha values first, then blue, then green, and finally red. Fortunately Kamelopard's ColorStyle object lets you set color and alpha values independently, and can sort out the proper order for you. This test verifies that behavior.

it 'should get settings in the right order' do
    @o.alpha = 'de'
    @o.blue = 'ad'
    @o.green = 'be'
    @o.red = 'ef'
    @o.color.should == 'deadbeef'
end

Finally, this last segment renders the ColorStyle to KML and tests its validity. This particular test uses a helper function called get_obj_child_content(), defined elsewhere, because its particular XML parsing requirements are very common, but many of these tests which require more complex parsing make heavy use of XPath expressions to test the XML documents Kamelopard produces.

it 'should do its KML right' do
    color = 'abcdefab'
    colorMode = :random
    @o.color = color
    @o.colorMode = colorMode
    get_obj_child_content(@o, 'color').should == color
    get_obj_child_content(@o, 'colorMode').should == colorMode.to_s
end

This new Kamelopard version also includes the beginnings of what could be a very useful feature. The idea is that Kamelopard objects should be able to create themselves from their KML representation. So, for instance, you could provide some Kamelopard function with a KML file, and it could create a Kamelopard representation which you can then process further. We already support each_placemark(), which iterates through each placemark in a KML document and returns the data therein, but this would expand that ability. Right now we're far from having all Kamelopard objects support parsing themselves from KML, but when it's completed it will open up interesting possibilities. For instance, it was originally conceived as a way to take a pre-existing tour and make a multicam version automatically. This, too, is still some way off.

Python decorator basics

Python decorators have been around since 2004, when they were included in the release of Python 2.4. A decorator is nothing more than syntax for passing a function to another function, or wrapping functions. Best put, a decorator is a function that takes a function as an argument and returns either the same function or some new callable. For example,

@foo
@bar
@baz
@qux
def f():
    pass

is shorthand for:

def f():
    pass
f = foo(bar(baz(qux(f))))

Say we have some functions we are debugging by printing out debug comments:

def mul(x, y):
    print 'mul'    # debug output: the function's name
    return x*y

def div(x, y):
    print 'div'    # debug output: the function's name
    return x/y

The printing in the functions can be extracted out into a decorator like so:

def debug(f):            # debug decorator takes function f as parameter
    msg = f.__name__     # debug message to print later
    def wrapper(*args):  # wrapper function takes function f's parameters
        print msg        # print debug message
        return f(*args)  # call to original function
    return wrapper       # return the wrapper function, without calling it

Our functions get decorated with:

@debug
def mul(x, y):
    return x*y

@debug
def div(x, y):
    return x/y

Which again is just shorthand for:

def mul(x, y):
    return x*y
mul = debug(mul)

def div(x, y):
    return x/y
div = debug(div)

Looking at the definition of the debug function we see that debug(mul) returns wrapper, which becomes the new mul. When we now call mul(5, 2) we are really calling wrapper(5, 2). But how do subsequent calls to wrapper have access to the initial f parameter passed to debug and to the msg variable defined in debug? Closures. Taken from aaronasterling's response to this stackoverflow question, "A closure occurs when a function has access to a local variable from an enclosing scope that has finished its execution." You can read more about closures here, here, and here. So, at the moment that mul is decorated, debug(mul) is executed returning wrapper, which has access to the original mul function and to the msg variable, which is then set as the new mul.
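As a quick sanity check (Python 2, as in the rest of this post, and assuming the debug decorator defined above):

@debug
def mul(x, y):
    return x*y

print mul(5, 2)    # prints "mul" (the debug message), then 10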

By decorating, we remove code duplication and if the need to ever change the debug logic arises, we only need to do so in one place. Now, decorators with (non-optional) arguments get a bit trickier, but only because the syntax is a bit hard to grasp at first sight. Say that we want to pass the debug message as a parameter to the decorator like so:

@debug("Let's multiply!")
def mul(x, y)
    return x*y

Then the debug decorator would be:

def debug(msg):
    def actual_decorator(f):    # from here to
        def wrapper(*args):     # ...
            print msg           # ...
            return f(*args)     # ...
        return wrapper          # here, looks just like our debug decorator from above!
    return actual_decorator

A decorator with arguments should return a function that takes a function as an argument and returns either the same function or some new callable (what a mouthful, eh?). In other words, a decorator with arguments returns a decorator without arguments.

Looking at what the decorator syntax is shorthand for we can follow along as debug gets executed:

mul = debug("Let's multiply")(mul)

The debug function returns actual_decorator, to which we pass the mul function as the parameter, which then returns wrapper. So again, mul becomes wrapper which has access to msg and f because of closure.

What about decorators with optional arguments? That I'll leave for a future blog post :)

Managing Multiple Hosts and SSH Identities with OpenSSH

When I started working at End Point I was faced with the prospect of having multiple SSH identities for the first time. I had historically used an RSA SSH key with the default length of 2048 bits, but for my work at End Point I needed to generate a new key that was 4096 bits long.

Although I could have used ssh-copy-id to copy my new SSH public key to all of my old servers, I liked the idea of maintaining separate "personal" and "work" identities and decided to look for a way to automatically use the right key based on the server I was trying to connect to.

For the first few days I was specifying my new identity on the command line using:
ssh -i .ssh/endpoint_rsa patrick@server.example.com

That worked, but I often forgot to specify my new SSH identity when connecting to a server, only realizing my mistake when I was prompted for a password instead of being authenticated automatically.

Host Definitions


I had previously learned the value of creating an ssh_config file when I replaced a series of command-line aliases with equivalent entries in the SSH config file.

Instead of creating aliases in my shell:

alias server1='ssh -p 2222 -L 3389:192.168.1.99:3389 patrick@server1.example.com'

I learned that I could add an equivalent entry to my ~/.ssh/config file:

Host server1
  HostName server1.example.com
  Port 2222
  User patrick
  LocalForward 3389 192.168.1.99:3389

Then, to connect to that server, all I needed to do was run ssh server1 and all of the configuration details would be pulled in from the SSH config file. Replacing my series of shell aliases with Host definitions had the added benefit of automatically carrying over to other tools like git and mosh which read the same configuration.

Switching Identities Automatically


There's an easy solution to managing multiple SSH identities if you only use one identity per server: use ssh-add to store all of your keys in the SSH authentication agent. For example, I used ssh-add ~/.ssh/endpoint_rsa to add my new key, and ssh-add -l to verify that it was showing up in the list of known keys. After adding all of your keys to the agent, it will automatically try them in order for SSH connections until it finds one that authenticates successfully.
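For example (the personal key path here is just the OpenSSH default; adjust to taste):

ssh-add ~/.ssh/id_rsa          # personal key
ssh-add ~/.ssh/endpoint_rsa    # work key
ssh-add -l                     # list the keys the agent now holds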

Manually Defining Identities


If you need more control over which identity an SSH session is using, the IdentityFile option in ssh_config lets you specify which key will be used to authenticate. Here's an example:

Host server2
  HostName server2.example.com
  User patrick
  IdentityFile ~/.ssh/endpoint_rsa

This usage is particularly helpful when you have a server that accepts more than one of your identities and you need to control which one should be used.
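And if most of your work hosts happen to share a domain, a wildcard Host block can apply the work identity to all of them at once (the domain below is made up):

Host *.example-client.com
  User patrick
  IdentityFile ~/.ssh/endpoint_rsa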

An update to the email_verifier gem has been released

I have just released a newer version of the email_verifier gem. It now supports Rails localization out of the box.
If you are one of the gem's users, you can safely update your Bundles!

With this release, the project has also gained its first contributors. Some time ago, one of the gem's users, Francesco Gnarra, asked me about the possibility of having validation messages localized. While I was planning to do it myself, Francesco took action and contributed to the project himself.

The project also received a pull request from another Rails developer — Maciej Walusiak. His commit provided a way to handle Dnsruby::NXDomain exceptions.

It's great to see that our work is helpful for others. If you'd like to contribute yourself — I invite you to do so.

Email verifier at GitHub: https://github.com/kamilc/email_verifier