Connected to PgBouncer or Postgres?

Determining whether your current database connection is going through PgBouncer or directly to Postgres itself can be challenging, as PgBouncer is a very low-level, transparent interface. It is possible, however, and here are some detection methods you can use.

This was inspired by someone asking on the Perl DBD IRC channel if it was possible to easily tell if your current database handle (usually "$dbh") is connected to PgBouncer or not. Since I've seen this question asked in other venues, I decided to take a crack at it.

There are actually two questions to be answered: (1) are we connected to PgBouncer, and if so, (2) what pool_mode is being run? The quickest and easiest way I found to answer the first question is to try to connect to a non-existent database. Normally, this produces a FATAL message, as seen here:

$ psql testdb -p 5432
testdb=# \c ghostdb
FATAL:  database "ghostdb" does not exist
Previous connection kept
testdb=# 

However, a slightly different ERROR message is returned if the same thing is attempted while connected to PgBouncer:

$ psql testdb -p 6432
testdb=# \c ghostdb
ERROR:  No such database: ghostdb
Previous connection kept
testdb=# 

Thus, an ERROR will always indicate that you are connected to PgBouncer and not directly to Postgres, which will always issue a FATAL.
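
If you need to run this check from application code rather than psql, the same ERROR-versus-FATAL distinction shows up in the message your database driver raises when a connection to a bogus database is attempted. Here is a minimal sketch in Python with psycopg2 (the driver choice, the throwaway database name, and the message matching are all assumptions for illustration):

import psycopg2

def is_pgbouncer(host, port, user):
    # Try to connect to a database that should not exist. PgBouncer
    # answers with an ERROR ("No such database"), while Postgres itself
    # answers with a FATAL ('database ... does not exist').
    try:
        psycopg2.connect(host=host, port=port, user=user, dbname="ghostdb")
    except psycopg2.OperationalError as e:
        if "No such database" in str(e):
            return True   # PgBouncer-style ERROR
        if "does not exist" in str(e):
            return False  # Postgres-style FATAL
        raise
    raise RuntimeError('a database named "ghostdb" unexpectedly exists!')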

In the future, there will be an even simpler method. As of this writing, PgBouncer 1.6 has not been released, but it will have the ability to customize the application_name. This is a configurable session-level variable that is fairly new in Postgres. Andrew Dunstan wrote a patch which enables adding this to your pgbouncer.ini file:

application_name_add_host = 1

This will make PgBouncer modify the application_name to append some information to it such as the remote host, the remote port, and the local port. This is a feature many PgBouncer users will appreciate, as it offers an escape from the black hole of connection information that PgBouncer suffers from. Here is what it looks like on both a normal Postgres connection, and a PgBouncer connection. As you can see, this is an easier check than the "invalid database connection" check above:

## Postgres:
$ psql testdb -p 5432 -c 'show application_name'
 application_name 
------------------
 psql

## PgBouncer:
$ psql testdb -p 6432 -c 'show application_name'
        application_name        
--------------------------------
 psql - unix(7882@gtsm.com):6432

## DBD::Pg connections to PgBouncer get a very similar change:
$ perl testme.tmp.pl --port 6432
app - unix(6772@gtsm.com):6432
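
Once you are running a version of PgBouncer with this feature, the check becomes a single query from any client. A rough sketch in Python with psycopg2 (the connection details are placeholders, and the suffix pattern is an assumption based on the output above):

import re
import psycopg2

conn = psycopg2.connect(host="localhost", port=6432, dbname="testdb")
cur = conn.cursor()
cur.execute("SHOW application_name")
app_name = cur.fetchone()[0]
# With application_name_add_host, PgBouncer appends something like
# " - unix(7882@gtsm.com):6432" to the original application_name.
if re.search(r' - \S+\(\S+\):\d+$', app_name):
    print("Connected via PgBouncer")
else:
    print("Probably connected directly to Postgres")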

Now we have answered the question of "are we connected to PgBouncer or not?". The next question is which pool mode we are in. There are three pool modes you can set for PgBouncer, which control when your particular connection is returned to "the pool". For "session" mode, you keep the same Postgres backend the entire time you are connected. For "transaction", you keep the same Postgres backend until the end of a transaction. For "statement", you may get a new Postgres backend after each statement.

First, we can check if we are connected to PgBouncer in a statement-level pool mode by taking advantage of the fact that multi-statement transactions are prohibited. PgBouncer enforces this by intercepting any attempts to enter a transaction (e.g. by issuing a BEGIN command). A very PgBouncer-specific error about "Long transactions not allowed" is issued back to the client like so:

$ psql testdb -p 6432
testdb=# begin;
ERROR:  Long transactions not allowed

So, that takes care of detecting a pool_mode set to 'statement'. The other two modes, transaction and session, will *not* give the same error. Thus, seeing that error indicates you are using a statement-level PgBouncer connection.
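
Programmatically, the test is just an explicit BEGIN and a check of the resulting error. A sketch with psycopg2 (autocommit is enabled so the driver does not open transactions on our behalf; the message matching is an assumption):

import psycopg2

conn = psycopg2.connect(host="localhost", port=6432, dbname="testdb")
conn.autocommit = True  # let us issue BEGIN/COMMIT ourselves
cur = conn.cursor()
try:
    cur.execute("BEGIN")
    cur.execute("COMMIT")  # clean up, since the BEGIN was allowed
    print("pool_mode is not statement")
except psycopg2.DatabaseError as e:
    if "Long transactions not allowed" in str(e):
        print("pool_mode is statement")
    else:
        raise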

The next pool mode is "transaction", which means that the server connection is released back to the pool at the end of a transaction. To figure out if we are in this mode, we take advantage of the fact that PgBouncer can be set to clean up the connection at the end of each transaction by issuing a specific command. By default, the command set by server_reset_query is DISCARD ALL, which invalidates any prepared statements, temporary tables, and other transaction-spanning, session-level items. Thus, our test will see if these session-level artifacts get discarded or not:

## Direct Postgres:
$ psql testdb -p 5432
testdb=# prepare abc(int) as select $1::text;
PREPARE
testdb=# execute abc(1);
 text
------
 1

## PgBouncer:
$ psql testdb -p 6432
testdb=# prepare abc(int) as select $1::text;
PREPARE
testdb=# execute abc(1);
ERROR:  prepared statement "abc" does not exist

Keep in mind that there are no true "transactionless" commands in Postgres. Even though we did not use a BEGIN in the psql prompt above, each command is treated as its own mini-transaction. In the case of the PgBouncer connection, the prepare is immediately followed with a DISCARD ALL, which means that our prepared statement no longer exists. Hence, we have determined that we are using a transaction-level PgBouncer connection.
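
Here is what that check might look like from a driver, again as a rough psycopg2 sketch. Autocommit matters here too: it makes the PREPARE and the EXECUTE run as two separate mini-transactions instead of inside one implicit transaction, which would defeat the test:

import psycopg2

conn = psycopg2.connect(host="localhost", port=6432, dbname="testdb")
conn.autocommit = True
cur = conn.cursor()
cur.execute("PREPARE abc(int) AS SELECT $1::text")
try:
    cur.execute("EXECUTE abc(1)")
    print("prepared statement survived: not transaction mode,")
    print("or server_reset_query is empty (see below)")
except psycopg2.DatabaseError as e:
    if 'prepared statement "abc" does not exist' in str(e):
        print("pool_mode is transaction (with server_reset_query active)")
    else:
        raise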

Unfortunately, not getting an error does not necessarily mean your PgBouncer is NOT in transaction mode! It could be that server_reset_query is empty, meaning that temporary artifacts are not discarded at the end of the transaction. In such a case, we can take advantage of the fact that PgBouncer will allow other clients to share our current backend, and thus be able to see the temporary items. If we create a temporary table in one PgBouncer connection, then connect again as a new client, the temporary table will show up in the second connection only if PgBouncer handed us the same backend, which can happen in transaction mode while the first client is still attached, but not in session mode. Easier shown than explained, I suspect:

## Regular Postgres gets a fresh session:
$ psql test1 -p 5432
test1=# create temp table abc(a int);
CREATE TABLE
test1=# select * from abc;
(No rows)
test1=# ^Z ## (we suspend with CTRL-Z)
[2]+  Stopped                 psql test1 -p 5432

$ psql test1 -p 5432
test1=# select * from abc;
ERROR:  relation "abc" does not exist

## PgBouncer will re-use the same session:
$ psql test1 -p 6432
test1=# create temp table abc(a int);
CREATE TABLE
test1=# select * from abc;
(No rows)
test1=# ^Z
[2]+  Stopped                 psql test1 -p 6432

$ psql test1 -p 6432
test1=# select * from abc;
(No rows)
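
The same two-client test can be scripted. A hedged psycopg2 sketch, assuming transaction pooling with an empty server_reset_query as described above, and assuming a quiet pool where the second connection lands on the same backend, which is not guaranteed:

import psycopg2

# First client creates a temp table, then stays connected.
conn1 = psycopg2.connect(host="localhost", port=6432, dbname="test1")
conn1.autocommit = True
conn1.cursor().execute("CREATE TEMP TABLE abc(a int)")

# Second client checks whether it can see the first client's temp table.
conn2 = psycopg2.connect(host="localhost", port=6432, dbname="test1")
conn2.autocommit = True
cur2 = conn2.cursor()
try:
    cur2.execute("SELECT * FROM abc")
    print("temp table visible: backend is shared (transaction mode,")
    print("empty server_reset_query)")
except psycopg2.DatabaseError:
    print("temp table not visible: fresh backend (session mode or")
    print("direct Postgres)")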

The final PgBouncer pool mode is "session", which basically means the only advantage over a normal Postgres connection is avoiding the overhead of starting up and connecting to a new Postgres backend. In this mode, connections are returned to the pool only upon disconnection. The only way to tell if you are in this mode is by determining that you are *not* in the other two modes. :)

So, although PgBouncer is extremely transparent, there are some tricks to determine if you are connected to it, and at what pool_mode. If you can think of other (SQL-level!) ways to check, please let me know in the comments section.

Job opening: Web developer

We are looking for another talented software developer to consult with our clients and develop their web applications in AngularJS, Node.js, Ruby on Rails, and other technologies. If you like to focus on solving business problems and can take responsibility for getting a job done well without intensive oversight, please read on!

What is in it for you?

  • Flexible full-time work hours
  • Health insurance benefit
  • Paid holidays and vacation
  • 401(k) retirement savings plan (U.S. employees)
  • Annual bonus opportunity
  • Ability to move without being tied to your job location

What you will be doing:

  • Work from your home office, or from our offices in New York City and the Tennessee Tri-Cities area
  • Consult with clients to determine their web application needs
  • Build, test, release, and maintain web applications for our clients
  • Work with open source tools and contribute back as opportunity arises
  • Use your desktop platform of choice: Linux, Mac OS X, Windows
  • Learn and put to use new technologies
  • Direct much of your own work

What you will need:

  • Professional experience building reliable server-side apps in Ruby on Rails, Node.js and Express, Django, CakePHP, etc.
  • Good front-end web skills with responsive design using HTML, CSS, and JavaScript, including jQuery, AngularJS, Backbone.js, Ember.js, etc.
  • Experience with databases such as PostgreSQL, MySQL, MongoDB, CouchDB, Redis, Elasticsearch, etc.
  • A focus on needs of our clients and their users
  • Strong verbal and written communication skills

About us

End Point is a 19-year-old web consulting company based in New York City, with 39 full-time employees working mostly remotely from home offices. We are experts in web development, databases, and DevOps, collaborating using ssh, Screen/tmux, IRC, Google+ Hangouts, Skype, and good old phones.

We serve over 200 clients ranging from small family businesses to large corporations. We use free software frameworks in a variety of languages including JavaScript, Ruby, Python, Perl, PHP, and Java, tracked by Git, running on Linux.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of gender, race, religion, color, national origin, sexual orientation, age, marital status, veteran status, or disability status.

How to apply

Please email us an introduction to jobs@endpoint.com to apply. Include a resume, your GitHub or LinkedIn URLs, or whatever else would help us get to know you. We look forward to hearing from you! Full-time employment seekers only, please. No agencies or subcontractors.

Non-English Google Earth Layers on the Liquid Galaxy

The ability to activate layers within Google Earth is one of the things that makes Earth so powerful. In fact, there are many standard layers built into Earth, including weather, roads, place names, etc. There are also some additional layers that have some really interesting information, including one I noticed relatively recently called "Appalachian Mountaintop Removal", which is interesting to me now that I live in Tennessee.

As you can see, however, while some of these available layers are interesting on a desktop, they're not necessarily very visually appealing on a Liquid Galaxy. We have identified a standard set of layers to enable and disable within Earth so that things don't appear too cluttered while running. Some things we've disabled by default are the weather and the roads, as well as many levels of place names and boundaries. For example, we have boundaries of countries and water bodies enabled, but don't want lines drawn for states, provinces, counties, or other areas such as those.

To disable these layers, we modify the GECommonSettings.conf file on the machines that run Earth. This file has everything pretty well spelled out in a readable manner, and it's fairly easy to determine which layers you're enabling or disabling.

Here's an example of some of the entries contained within the GECommonSettings.conf file:
1st%20Level%20Admin%20Names%20%28States_Provinces%29=false
2nd%20Level%20Admin%20Regions%20%28Counties%29=false
Islands=false
http%3A__mw1.google.com_mw-earth-vectordb_geographic_features_en.kml\Geographic%20Features=false
Water%20Bodies=true
Places%20=false
Panoramio=false
360%20Cities=false
Photorealistic%20=true
Trees=true
Populated%20Places=true
Roads=false
Gray%20=false
http%3A__mw1.google.com_mw-weather_base_files_kml_weather_en.kmz\Clouds=true
http%3A__mw1.google.com_mw-weather_base_files_kml_weather_en.kmz\Radar=true
http%3A__mw1.google.com_mw-weather_base_files_kml_weather_en.kmz\Conditions%20and%20Forecasts=true
http%3A__mw1.google.com_mw-weather_base_files_kml_weather_en.kmz\Information=true
http%3A__mw1.google.com_mw-earth-vectordb_gallery_layers_gallery_root_en.kmz\360Cities=false

See, that's pretty self-explanatory, isn't it?

Well, it is until you start introducing other languages to the mix. For example, we needed to run Earth in Korean language mode for one of our clients. Once we fired up Earth and specified that the language would be Korean, Earth came up with all of the default layers turned on. All of those layers that we'd disabled in English Earth stayed on and cluttered the displays with so many icons for each and every layer.

It took some trial and error, but I was eventually able to figure out what to do to resolve this. I loaded Google Earth in Korean mode on my Ubuntu VM, made my changes to the selected layers via the Layers selection area within Earth, then quit. When I looked at the GECommonSettings.conf file after quitting Earth, I found a bunch of new line items added to the file. It seems that each of the options had new lines, though I couldn't exactly decipher which lines controlled which options.

Here's an example of some of the new entries that are now contained within my GECommonSettings.conf file:
http%3A__kh.google.com%3A80_\%UAD6D%UACBD=false
http%3A__kh.google.com%3A80_\%UAD6D%UAC00%UBA85=false
http%3A__kh.google.com%3A80_\%UD574%UC548%UC120=false
http%3A__kh.google.com%3A80_\1%UCC28%20%UD589%UC815%20%UACBD%UACC4%UC120%28%UC8FC_%UB3C4%29=false
http%3A__kh.google.com%3A80_\2%UCC28%20%UD589%UC815%UB2E8%UC704%20%UC9C0%UC5ED%28%UAD70%29=false
http%3A__kh.google.com%3A80_\%UC778%UAD6C%UBC00%UC9D1%UC9C0%UC5ED=true
http%3A__kh.google.com%3A80_\%UC12C=false

Now, I'll be honest and say that I don't have a clue exactly what %UAD6D%UACBD and 1%UCC28%20%UD589%UC815%20%UACBD%UACC4%UC120%28%UC8FC_%UB3C4%29 mean, but I really don't have to know (they are simply Unicode-escaped Korean names for the same kinds of layers). All I know is that they got disabled when I disabled the undesired layers within Earth. I then copied these lines to the configuration on my Liquid Galaxy, and the next time I fired it up in Korean, the layers were no longer cluttering up the displays.

I was able to use this exact same method to determine which layers to enable or disable for one of our Spanish-language clients, as well.

Raw Packet Manipulation with Scapy

Installation

Scapy is a Python-based packet manipulation tool with a number of useful features for those looking to perform raw TCP/IP requests and analysis. To get Scapy installed in your environment, the best options are to build from the distributed zip of the current version, or to use one of the pre-built packages available for Red Hat- and Debian-derived Linux distributions.

Using Scapy

When getting started with Scapy, it's useful to understand how all the aspects of the connection get encapsulated into the Python syntax. Here is an example of creating a simple IP request:

Welcome to Scapy (2.2.0)
>>> a=IP(ttl=10)
>>> a
<IP  ttl=10 |>
>>> a.dst="10.1.0.1"
>>> a
<IP  ttl=10 dst=10.1.0.1 |>
>>> a.src
'10.1.0.2'
>>> a.ttl
10

In this case I created a single request pointed from one host on my network to the default gateway on the same network. Scapy gives you the capability to create any TCP/IP request in raw form. There are a huge number of possible options that can be applied, as well as a huge number of packet types defined. The documentation of these options and packet types is available on the main site for Scapy.

Creating custom scripts with Scapy

Using Scapy within Python, rather than as a standalone application, allows for creating more complex packets, sending them, and then parsing the response that is given. Here is a simple tester script example in which I will initiate an HTTP 1.1 request:

#! /usr/bin/env python
import logging
# Silence Scapy's noisy import-time warnings before loading it
logging.getLogger("scapy.runtime").setLevel(logging.ERROR)

from scapy.all import *

def make_test(x, y):
    # Build a raw HTTP GET for host header y, aimed at address x.
    # Note the blank line ("\r\n\r\n") that terminates the headers.
    request = "GET / HTTP/1.1\r\nHost: " + y + "\r\n\r\n"
    # Stack the layers: IP / TCP (dport defaults to 80) / payload
    p = IP(dst=x)/TCP()/request
    # sr1() sends the packet and returns the first answer received
    out = sr1(p)
    if out:
        out.show()

if __name__ == "__main__":
    interact(mydict=globals(), mybanner="Scapy HTTP Tester")

Within this script is the make_test function, which takes as parameters the destination address and the Host header string, respectively. The script will attempt to send the HTTP GET request to that address with the proper Host header set. If the request is successful, it will print out the details of the response packet. It would also be possible to perform more complex analysis of this response packet using the built-in psdump and pdfdump functions, which create a human-readable analysis of the packet in PostScript and PDF respectively.

Welcome to Scapy (2.2.0)
Scapy HTTP Tester
>>> make_test("www.google.com","google.com")
Begin emission:
...Finished to send 1 packets.
.*
Received 5 packets, got 1 answers, remaining 0 packets
###[ IP ]###
  version= 4L
  ihl= 5L
  tos= 0x20
  len= 56
  id= 64670
  flags=
  frag= 0L
  ttl= 42
  proto= tcp
  chksum= 0x231b
  src= 74.125.28.103
  dst= 10.1.0.2
  \options\
###[ TCP ]###
     sport= http
     dport= ftp_data
     seq= 1130043850
     ack= 1
     dataofs= 9L
     reserved= 0L
     flags= SA
     window= 42900
     chksum= 0x8c7e
     urgptr= 0
     options= [('MSS', 1430), (254, '\xf9\x89\xce\x04bm\x13\xd3)\xc8')]
>>>
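
As a small illustration of those dump functions, here is how you might render the same kind of response to a file from the interactive prompt (a sketch; psdump and pdfdump require the PyX library to be installed, and the filename is arbitrary):

>>> out = sr1(IP(dst="www.google.com")/TCP())
>>> out.pdfdump("response.pdf")  # layer-by-layer packet diagram as a PDF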

Conclusions

Scapy is a powerful tool, if a bit daunting in syntax initially. Creating raw TCP/IP packets systematically will probably challenge most people's understanding of the TCP/IP stack (it certainly did mine!), but exposing this level of configuration has serious advantages. Full control of the requests and responses, as well as the ability to add custom Python logic, allows Scapy to become a packet foundry which you can use for things like unit testing of web applications, verification of the state of an unknown network, etc. I will definitely be using Scapy in the future when performing raw HTTP testing of web applications.

RailsConf 2015 for the non-Attendee

This blog post is really for myself. Because I had the unique experience of bringing a baby to a conference, I made an extra effort to talk to other attendees about what sessions shouldn't be missed. Here are the top takeaways from the conference that I recommend (in no particular order):

Right now, the videos are all unedited from the Confreaks live stream of the keynote/main room, and I'll update as the remaining videos become available.

A Message From the Sponsors

My husband: My conferences never have giveaways that cool.
Me: That's because you work in academia.

You can read the full list of sponsors here, but I wanted to comment further:

Hired was the diamond sponsor this year, and they ran a ping pong tournament. The winner received $2,000, and runners-up received $1,000, $500, and $250. The final match was heavily attended and competitive! Practice up for next year?

Engine Yard also put on a really fun scavenger hunt using Scavify. Since I couldn't attend the multiple parties going on at night, this was a really fun way to participate and socialize.

Protect Interchange Passwords with Bcrypt

Interchange default configurations have not done a good job of keeping up with the best available password security for its user accounts. Typically, there are two account profiles associated with a standard Interchange installation: one for admin users (the access table), where the password is stored using Perl's crypt() function (bad); and one for customers (userdb), where the password isn't encrypted at all (even worse). Other hashing algorithms have long been available (MD5, salted MD5, SHA1) but are not used by default and have for some time not provided useful protection. Part of this is convenience (tools for retrieving passwords and the ability to distribute links into user assets) and part is inertia. And no small part was the absence of a strong cryptographic option for password storage until the addition of Bcrypt to the account management module.

The challenge we face in protecting passwords is that hardware continues to advance at a rapid rate, and with more computational power and storage capacity, brute-force attacks become increasingly effective and widely available. Jeff Jarmoc's Enough with the Salts provides some excellent discussion and background on the subject. To counter the changing landscape, the main line of defense moves toward ensuring that the work required to create and test a given stored password is too expensive, too time-consuming, for brute-force attacks to be profitable.

One of the best options for handling encrypted password storage with a configurable "hardware cost" is Bcrypt. We chose to integrate Bcrypt into Interchange over other options primarily because of its long history of operation with no known exploits, and its cost parameter that allows an administrator to advance the work required to process a password slowly over time as hardware continues to increase in efficiency. The cost feature introduces an exponential increase in calculation iterations (i.e., required processing power and time) as powers of 2, from 1 (2 iterations, essentially no cost) to 31 (2^31, or 2,147,483,648 iterations). Ideally an administrator would want to identify a cost that causes no perceptible penalty to end users, but would be such a burden to any brute-force attack as to have no worthwhile return on investment to crack.
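
To get a concrete feel for the cost parameter outside of Interchange, here is a minimal sketch using Python's bcrypt library (purely for illustration; Interchange's own implementation lives in its account management module). Each increment of the cost doubles the work, which you can observe directly:

import time
import bcrypt

password = b"correct horse battery staple"
for cost in (10, 12, 14):
    start = time.time()
    hashed = bcrypt.hashpw(password, bcrypt.gensalt(rounds=cost))
    print("cost %d took %.2f seconds" % (cost, time.time() - start))
    # Verification re-runs the hash at the cost embedded in the stored value:
    assert bcrypt.checkpw(password, hashed)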

Converting an existing user base from any of the existing encryption schemes to Bcrypt is trivial in Interchange. The existing UserDB profile is changed to the "bcrypt" option and the "promote" boolean set to true. Promote allows your users to continue to validate against their existing stored password, but on their next access their storage is upgraded to the Bcrypt password. In the meantime, a backend process could be developed using the construct_bcrypt() routine in Vend::UserDB to update all outstanding accounts before they are updated organically.
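
The profile change might look something like the following in catalog.cfg (a hedged sketch of the directive form; check the Interchange UserDB documentation for your version's exact option names):

UserDB  default  bcrypt   1
UserDB  default  promote  1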

If the switch on the front end involves going from no encryption to any encrypted storage, including Bcrypt, and your front end uses the default tools for retrieving lost passwords, you'll also need to construct some new code for resetting passwords instead. There is no such facility for the admin, and since the admin accounts are typically far more valuable than the front-end accounts, making the change for the admin should be the first priority; it also involves the least effort.

Switching accounts to Bcrypt password storage is a simple, effective means of increasing protection of your users' and business's information. Every bit as importantly, it also helps protect your business's reputation, which can be severely damaged by a data breach. Lastly, in particular for your admin accounts, Bcrypt password storage is useful in meeting PCI DSS requirements for strong password hashing.

How to Bring a Baby to a Tech Conference

Last week, I brought my 4-month-old to RailsConf. In a game-day decision, rather than drag a two-year-old and husband along on the ~5-hour drive and send the dogs to boarding, we decided it would ultimately be easier on everyone (except maybe me) if I attended the conference with the baby, especially since a good amount of the conference would be live-streamed.

Daily morning photos at the conference.

While I was there, I was asked often how it was bringing a baby to a conference, so I decided to write a blog post. As with all parenting advice, the circumstances are a strong factor in how the experience turned out. RailsConf is a casual three-day multi-track tech conference with many breaks and social events – it's as much about socialization as it is about technical know-how. This is not my first baby and not my first time at RailsConf, so I had some idea of what I might be getting into. Minus a few minor squeaks, baby Skardal was sleeping or sitting happily throughout the conference.

Here's what I [qualitatively] perceived to be the reaction of others attending the conference to baby Skardal:

In list form:

  • Didn't Notice: Probably about 50% didn't notice I had a baby, especially when she was sleeping soundly in the carrier.
  • Stares: Around 50% may have stared. See below for more.
  • Joke: A really small percentage of people made some variation of the joke "Starting her early, eh?"
  • Conversation: An equally small percentage of people started a conversation with me, which often led to more technical talk.

Here are some reasons I imagined behind the staring:

  • That baby is very cute (I'm not biased at all!)
  • Shock
  • Wonder if day care is provided (No, it wasn't. But with a 4-month-old who hasn't been in day care yet, I probably wouldn't have used it.)
  • Too hungover to politely not stare

Pros & Cons

And here is how I felt after the conference, regarding pros and cons on bringing the baby:

Pros
  • A baby is a good conversation starter, which is beneficial in a generally introverted crowd.
  • I realized there are helpful & nice people in the world who offered to help plenty of times.
  • The baby was happiest staying with me.
Cons
  • Because children were the focus of many conversations, I missed out on talking shop a bit.
  • It's tiring, but in the same way that all parenting is.
  • I couldn't participate in all of the social/evening activities of the conference.
  • Staring generally makes me feel uncomfortable.

Tips

And finally, some tips:

  • Plan ahead:
    • Review the sessions in advance and pick out ones you want to attend because you may not have time to do that on the fly.
    • Walk (or travel) the route from your hotel to the conference so you know how long it will take and if there will be challenges.
  • Be agile and adapt. Most parents of a 4-month-old are probably already doing this.
  • Manage your expectations:
    • Expect the conference with a baby to be tiring & challenging at times.
    • Expect stares.
    • Expect you won't make it to every session you want, so make a point of talking to others to find out their favorite sessions.
  • If not provided, ask conference organizers for access to a nursing or stashing room.
  • Bring baby gear options: carrier, stroller, bouncy seat, etc.
  • Research food delivery options ahead of time.
  • Order foods that are easy to eat with one hand. Again, another skill parents of a 4-month-old may have developed.
  • Sit or stand in the back of talks.

While in these circumstances I think we made the right decision, I look forward to attending more conferences sans-children.