Bonked By Basic_auth Because Bcrypt

tl;dr - don't use a high bcrypt cost with HTTP basic auth!

Recently we had a client approach us with reports of a "slow" wiki experience. This was for a MediaWiki instance we had recently installed for them; there were no fancy extensions, and the hardware, the OS, and the Apache web server were all solid, perfectly normal choices. I was tasked with diving in and solving the issue.

The first step in any troubleshooting is to verify and duplicate the problem. While the wiki did feel a bit sluggish, it was not as bad as the reports we were getting of pages taking over 15 seconds to load. A side-by-side comparison with a similar wiki seemed a good place to start. I called up the main wiki page on both the client wiki and End Point's internal wiki. Both were running the latest version of MediaWiki, had the same type of servers (located a similar distance from me), were using the same version of Apache, and had roughly the same server load. While both wikis' pages had roughly the same amount of content, the client one loaded noticeably slower. It took less than a second for the End Point wiki, and around ten seconds for the client one!

The first culprit was MediaWiki itself. Perhaps something was misconfigured there, or some extension was slowing everything down? MediaWiki has good debugging tools. Inside both wikis' LocalSettings.php files, I turned on debugging temporarily with:

$wgDebugLogFile         = '/tmp/mediawiki.debug';
$wgDebugDBTransactions  = true;
$wgDebugDumpSql         = true;
$wgDebugTimestamps      = true;

I reloaded the page, then commented out the $wgDebugLogFile line to stop the log from growing too large (the debug output can be quite verbose!). Here are some snippets from the generated log file:

0.9151   4.2M  Start request GET /wiki/Main_Page
...
[caches] main: SqlBagOStuff, message: SqlBagOStuff, parser: SqlBagOStuff
[caches] LocalisationCache: using store LCStoreDB
0.9266   9.2M  Implicit transaction open enabled.
0.9279   9.2M  Query wikidb (1) (slave): SET /* DatabasePostgres::open  */ client_encoding='UTF8'
0.9282   9.2M  Resource id #127: Transaction state changed from IDLE -> ACTIVE
0.9268   9.2M  Query wikidb (2) (slave): SET /* DatabasePostgres::open  */ datestyle = 'ISO, YMD'
...
0.9587   9.2M  Query wikidb (11) (slave): SELECT /* LCStoreDB::get  */  lc_value  FROM "l10n_cache"   WHERE lc_lang = 'en' AND lc_key = 'deps'  LIMIT 1
0.9573   9.5M  Query wikidb (12) (slave): SELECT /* LCStoreDB::get  */  lc_value  FROM "l10n_cache"   WHERE lc_lang = 'en' AND lc_key = 'list'  LIMIT 1
0.9567  10.8M  Query wikidb (13) (slave): SELECT /* LCStoreDB::get  */  lc_value  FROM "l10n_cache"   WHERE lc_lang = 'en' AND lc_key = 'preload'  LIMIT 1
0.9572  10.8M  Query wikidb (14) (slave): SELECT /* LCStoreDB::get  */  lc_value  FROM "l10n_cache"   WHERE lc_lang = 'en' AND lc_key = 'preload'  LIMIT 1
...
0.9875  21.2M  Query wikidb (195) (slave): SELECT /* LCStoreDB::get Greg */  lc_value  FROM "l10n_cache"   WHERE lc_lang = 'en' AND lc_key = 'messages:accesskey-pt-mycontris'  LIMIT 1
0.9873  21.2M  Query wikidb (196) (slave): SELECT /* LCStoreDB::get Greg */  lc_value  FROM "l10n_cache"   WHERE lc_lang = 'en' AND lc_key = 'messages:tooltip-pt-logout'  LIMIT 1
0.9868  21.2M  Query wikidb (197) (slave): SELECT /* LCStoreDB::get Greg */  lc_value  FROM "l10n_cache"   WHERE lc_lang = 'en' AND lc_key = 'messages:accesskey-pt-logout'  LIMIT 1
0.9883  21.2M  Query wikidb (198) (slave): SELECT /* LCStoreDB::get Greg */  lc_value  FROM "l10n_cache"   WHERE lc_lang = 'en' AND lc_key = 'messages:vector-more-actions'  LIMIT 1

Just to load a simple page, there were 194 SELECT statements! And 137 of those were looking up rows in the l10n_cache table, one row at a time. Clearly, there is lots of room for improvement there. Someday, I may even jump in and tackle that. But for now, despite being very inefficient, it is also very fast. Because of the $wgDebugTimestamps setting, it was easy to compute how much time each wiki spent actually creating the page and sending it back to Apache. In this case, the difference was minimal, which meant MediaWiki was not the culprit.

I then turned my attention to Apache. Perhaps it was compiled differently? Perhaps there was some obscure SSL bug slowing things down for everyone? These were unlikely, but it was worth checking the Apache logs (which were in /var/log/httpd). There are two main logs Apache uses: access and error. The latter revealed nothing at all when I loaded the main wiki page. The access logs looked fairly normal:

85.236.207.120 - greg [19/Jan/2016:12:23:21 -0500] "GET /wiki/Main_Page HTTP/1.1" 200 23558 "-" "Mozilla/5.0 Firefox/43.0"
85.236.207.120 - greg [19/Jan/2016:12:23:22 -0500] "GET /mediawiki/extensions/balloons/js/balloon.config.js HTTP/1.1" 200 4128 "https://wiki.endpoint.com/wiki/Main_Page" "Mozilla/5.0 Firefox/43.0"
...
85.236.207.120 - greg [19/Jan/2016:12:23:22 -0500] "GET /mediawiki/load.php?debug=false&lang=en&modules=mediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.sectionAnchor%7Cmediawiki.skinning.interface%7Cskins.vector.styles&only=styles&skin=vector HTTP/1.1" 200 58697 "https://wiki.endpoint.com/wiki/Main_Page" "Mozilla/5.0 Firefox/43.0"
85.236.207.120 - greg [19/Jan/2016:12:23:22 -0500] "GET /mediawiki/resources/assets/poweredby_mediawiki_88x31.png HTTP/1.1" 200 3525 "https://wiki.endpoint.com/wiki/Main_Page" "Mozilla/5.0 Firefox/43.0"

Still nothing out of the ordinary. What to do next? When all else fails, go to the system calls. It's about as close to bare metal as you can easily get on a Linux system. In this case, I decided to run strace on the Apache daemon to see exactly where the time was being spent. As expected, there were a large handful of httpd processes already spawned and waiting for a connection. While there was no way to know which one would field my requests, some shell-fu allowed me to strace them all at once:

## The -u prevents us from picking the parent httpd process, because it is owned by root!
$ strace -o greg.httpd.trace -tt -ff `pgrep -u apache httpd | xargs -n 1 echo -p | xargs`
Process 5148 attached
Process 4848 attached
Process 5656 attached
Process 4948 attached
Process 5149 attached
Process 5148 attached
Process 4858 attached
Process 5657 attached
Process 4852 attached
Process 4853 attached
^CProcess 5148 detached
Process 4848 detached
Process 5656 detached
Process 4948 detached
Process 5149 detached
Process 5148 detached
Process 4858 detached
Process 5657 detached
Process 4852 detached
Process 4853 detached

Looking at the output of one of these revealed some important clues:

$ head greg.httpd.trace.4948
13:00:28.799807 read(14, "\27\3\3\2\221\0\0\0\0\0\0\0\1\35-\332\3123(\200\302\"\251'g\256\363b5"..., 8000) = 666
13:00:28.799995 stat("/wiki/htdocs/mediawiki/load.php", {st_mode=S_IFREG|0644, st_size=1755, ...}) = 0
13:00:28.800126 open("/wiki/htpasswd.users", O_RDONLY|O_CLOEXEC) = 15
13:00:28.800176 fstat(15, {st_mode=S_IFREG|0640, st_size=947, ...}) = 0
13:00:28.800204 read(15, "alice:{SHA}jA0EAgMCMEpo4Wa3n/9gy"..., 4096) = 2802
13:00:28.800230 close(15)               = 0
13:00:29.496369 setitimer(ITIMER_PROF, {it_interval={0, 0}, it_value={60, 0}}, NULL) = 0
13:00:29.496863 rt_sigaction(SIGPROF, {0x7fc962da7ab0, [PROF], SA_RESTORER|SA_RESTART, 0x7fc970605670}, {0x7fc962da7ab0, [PROF], SA_RESTORER|SA_RESTART, 0x7fc970605670

Aha! If you look closely at those timestamps, you will notice that the gap between the call to close() and the subsequent setitimer() is quite large: 0.69 seconds. That's a long time for Apache to be waiting around for something. The second clue is the file it had just opened: "htpasswd.users". Seeing the top of that file, with the {SHA} prefix, made me realize the problem - htpasswd files now support bcrypt as a password hashing scheme, and bcrypt is designed to be secure - and slow. Sure enough, the htpasswd file had bcrypt entries with a high cost for the people who were having the most issues with the speed. This is what the file looked like (names and values changed):

alice:{SHA}jA0EAgMCMEpo4Wa3n/9gybBBsDPa
greg:$2y$13$+lE6+EwgtzP0m8K8VQDnYMRDRMf6rNMRZsCzko07QQpskKI9xbb/y9
mallory:$2y$15$ww8Q4HMI1Md51kul2Hiz4ctetPqJ95cmspH8T81JHfqRvmg===rVgn
carol:7RnEKJWc38uEO
bob:$apr1$uKX9Z63CqPOGX4lD1R4yVZsloJyZGf+
jon:$2y$08$SUe3Z8sgEpyDWbWhUUUU5wtVTwlpEdc7QyXOg3e5WBwM4Hu35/OSo1
eve:$apr1$I/hv09PcpU0VfXhyG7ZGaMz7Vhxi1Tm

I recognized the bcrypt format right away ($2y$13$). The people who were complaining the most about the speed of the wiki (e.g. mallory in the example above) had the highest costs, while those with low costs (e.g. jon), and those using something other than bcrypt (everyone else above), were not complaining at all! The 'cost' is the number after the second dollar sign: as you can see, some of them had a cost of 15, which is much more expensive than the cost of 13 my own user ("greg") was using. This was a smoking gun, but one more step was needed for proof. I adjusted the cost of my password to something low using the htpasswd program:

$ htpasswd -B -C 6 /wiki/htpasswd.users greg
New password: 
Re-type new password: 
Updating password for user greg

Voila! The page loaded in a flash. I then changed the cost to 15, and suddenly the wiki was even slower than before, taking upwards of 15 seconds to load the main page. Mystery solved. All those high-cost bcrypt requests are also not good for the server: not only do they use a lot of CPU, they also keep the Apache daemon tied up waiting for bcrypt to finish, rather than letting it finish quickly and return to the main pool.

You may be asking a few questions at this point. Why would htpasswd offer a footgun like this? Why such a radical difference in effect for slightly different costs? Is bcrypt a good practice for an htpasswd file? Let's attempt to answer those. Before we do, we have to learn a little bit about bcrypt and passwords in general. Some of this is purposefully oversimplified, so be gentle in the comments. :)

Passwords themselves should never be stored on the server (that is, the machine doing the authentication). Instead, the server stores a hash of the password. This is created by what is known as a "one-way" function, which creates a unique fingerprint of your password. If this fingerprint (aka hash) is discovered, there is no direct way to see the password that created it. When you log in to a site, it creates a hash of the password you give it, then compares that hash to the one it has stored. Thus, it can verify that you have given it the correct password without actually having to store the password.

For a long time, very simple algorithms were used to create these hashes. However, as computers became more powerful, and as the field of cryptography advanced, it became easier to "crack" these hashes and determine the password that was used to create them. This was an important problem, and one of the solutions people came up with was the bcrypt algorithm, which makes the computation of the hash very expensive, in terms of computer time. Furthermore, that expense is adjustable, determined by the "cost" given at creation time. You may have noticed the -C option I used in the htpasswd example above. That cost does not set the number of rounds directly: the algorithm runs 2^cost rounds, so the work grows exponentially. In other words, a cost of 13 means that bcrypt runs 2 to the 13th power rounds, or 8,192 rounds. A cost of 14 is 2 to the 14th power, or 16,384 rounds - twice as slow as a cost of 13! A cost of 15 is 32,768 rounds, and so on. Thus, one can see why even a cost of 15 would be much slower than a cost of 13.
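
If you want to see this doubling for yourself, PHP's built-in password_hash() function (available since PHP 5.5) also implements bcrypt, and timing a few costs takes only a few lines. This is just an illustration of the math above, not something from the wiki or Apache setup:

<?php
// Quick illustration (not from the article): each +1 to the bcrypt cost
// doubles the number of rounds, and therefore the hashing time.
$password = 'correct horse battery staple';
foreach (array(10, 11, 12, 13, 14) as $cost) {
    $start = microtime(true);
    password_hash($password, PASSWORD_BCRYPT, array('cost' => $cost));
    printf("cost %2d: %.3f seconds\n", $cost, microtime(true) - $start);
}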

A web page usually consists of more than just the requested HTML. There are commonly images, CSS, and JavaScript files that must also be loaded from the web server to fully render the page. Each of these requests must go through basic auth, and thus gets slowed down by bcrypt. This is why, even though a single basic authentication at a bcrypt cost of 15 takes only a couple of seconds, the entire web page can take much longer.

What encryption options are available for the htpasswd program? The bcrypt option was introduced without much fanfare in version 2.4.4 of Apache, which was released on February 25, 2013, so it has been around a while. The output of --help shows us that bcrypt is the only secure option, but allows for other legacy ones to be used. Also note that the cost for bcrypt ranges from 4 to 31:

 -m  Force MD5 encryption of the password (default).
 -B  Force bcrypt encryption of the password (very secure).
 -C  Set the computing time used for the bcrypt algorithm
     (higher is more secure but slower, default: 5, valid: 4 to 31).
 -d  Force CRYPT encryption of the password (8 chars max, insecure).
 -s  Force SHA encryption of the password (insecure).
 -p  Do not encrypt the password (plaintext, insecure).

So should you use bcrypt for your htpasswd file? Absolutely yes. Even a lower-cost bcrypt is vastly more secure than MD5, CRYPT, or SHA. A cost of 10 is roughly the same speed as those, but a much, much better choice. You can use the time it takes to create or update a password with the command-line htpasswd program as a rough proxy for the impact on your total page load time. Here are some numbers I generated on my local box; each represents the average of a few runs:

Bcrypt cost | htpasswd creation time | Web page load time
10          | 0.079 seconds          | 5.68 seconds
12          | 0.268 seconds          | 6.77 seconds
14          | 0.979 seconds          | 10.78 seconds
16          | 3.684 seconds          | 25.72 seconds
18          | 14.683 seconds         | 88.85 seconds
20          | 58.680 seconds         | 358.80 seconds
22          | 236.369 seconds        | 1357.82 seconds
31          | 186,173 seconds (51 hours and 42 minutes!!) | Ah...no
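
The creation-time column is easy to reproduce on your own hardware with the standard time command; htpasswd's -n flag prints the result to stdout instead of touching a file, and -b takes the password on the command line (fine for a throwaway test password):

## -n: print to stdout rather than update a file; -b: take the password from the command line
$ time htpasswd -nbB -C 14 testuser testpassword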

There are times when you really do want a higher bcrypt cost. The basic auth usage in this scenario is really the exception, not the norm. In most cases, a password is used to log in to something once, and you either create a persistent connection (e.g. SSH) or are issued a cookie with a temporary token (e.g. almost every website in the world). In those cases, a delay of a few seconds is quite acceptable, as it is a rare event.

So why do we even care about passwords so much, especially for something like basic auth and a htpasswd file? After all, if someone can view the contents of the htpasswd file, they can also more than likely view whatever material on the web server it was designed to protect. These days, however, it's important to view strong hashes such as bcrypt as not just protecting data, but protecting the password as well. Why? Password reuse. It's very common for people to use the same (or very similar) password on all the sites they visit. The danger is thus not that an attacker can view the file contents protected by the htpasswd file, but that an attacker can use that password on the user's email accounts, or on other sites the user may have visited and used the same password.

What bcrypt cost should you use? The general answer is to use the highest possible cost you can get away with. Take something with such a high cost that it causes discomfort to the users, then dial it back a tiny bit. Measure it out and see what your server can handle. For general bcrypt use, start with 13, but don't be afraid to keep going up until it takes a wall clock second or two to run. For basic auth, use something very fast: perhaps 9 or less. Anything that takes over a second to create via htpasswd will slow a site down noticeably!

Zurb Foundation 5: Applying Styles Conditionally Based on Viewport Size

The Zurb Foundation 5 front-end framework provides many convenient features such as the ability to control the visibility of HTML elements for different browser window sizes using CSS classes. Foundation CSS classes like 'show-for-small-only' and 'hide-for-large-up' (full list at http://foundation.zurb.com/sites/docs/v/5.5.3/components/visibility.html) make it easy to add mobile-specific content to your page or prevent certain page elements from being displayed on mobile devices.

Having an easy way to show/hide elements based on viewport size is nice, but what if you want to style an element differently based on the size of the browser that's viewing the page? Foundation has you covered there, too, though the method is less obvious. It's possible to use Foundation's media query SCSS variables when writing your own custom styles in order to apply different styling rules for different viewport sizes.

For example, if you have an element that you want to offset with a margin in a large window but be flush with the left edge of a small window, you can use the Foundation media query variables to apply a styling override that's specific to small windows:

#element { 
    margin-left: 100px;
 
    @media #{$small-only} {
        margin-left: 0;
    }
}

This will apply a specific set of styling to the element for small viewports and a different set for medium and larger viewports (with the definitions for 'small', 'medium', etc. corresponding to the same values used by Foundation's visibility classes like 'show-for-small-only' which were mentioned at the start of this post).
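
Foundation 5 defines similar variables for the other ranges as well (for example, $medium-only and $large-up, per its media query documentation), so you can layer several viewport-specific overrides in one rule. A sketch extending the example above:

#element {
    margin-left: 100px;

    @media #{$small-only} {
        margin-left: 0;
    }

    @media #{$medium-only} {
        margin-left: 50px;
    }

    @media #{$large-up} {
        margin-left: 150px;
    }
}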

It wasn't immediately obvious to me how to apply conditional styling using Foundation's own definitions of the small, medium, etc. viewport sizes, but luckily the variable definitions provided by the SCSS framework make it easy to do so.

Postgres 9.5: three little things

The recent release of Postgres 9.5 has many people excited about the big new features such as UPSERT (docs) and row-level security (docs). Today I would like to celebrate three of the smaller features that I love about this release.

Before jumping into my list, I'd like to thank everyone who contributes to Postgres. I did some quick analysis and found that 85 people, from Adrien to Zeus, have helped with version 9.5 of Postgres, at least according to the git logs. Of course, that number is actually higher, as it doesn't take into account people helping out on the #postgresql channel, running buildfarm animals, doing packaging work, keeping the infrastructure running, etc. Thanks to you all!

Feature: REINDEX VERBOSE

The first feature is one I have wanted for a long time: a verbose form of the REINDEX command. Thanks to Sawada Masahiko for adding this. Similar to VACUUM, REINDEX gets kicked off and then gives no progress or information until it finishes. While VACUUM has long had the VERBOSE option to get around this, REINDEX gave you no clue as to which index it was working on, or how much work each index took to rebuild. Here is a normal reindex, along with another 9.5 feature, the ability to reindex an entire schema:

greg=# reindex schema public;
## What seems like five long minutes later...
REINDEX

The new syntax uses parentheses to support VERBOSE and any other future options. If you are familiar with EXPLAIN's newer options, you may see a similarity. More on the syntax in a bit. Here is the much improved version in action:

greg=# reindex (verbose) schema public;
INFO:  index "foobar_pkey" was reindexed
DETAIL:  CPU 11.00s/0.05u sec elapsed 19.38 sec.
INFO:  index "foobar_location" was reindexed
DETAIL:  CPU 5.21s/0.05u sec elapsed 18.27 sec.
INFO:  index "location_position" was reindexed
DETAIL:  CPU 9.10s/0.05u sec elapsed 19.70 sec.
INFO:  table "public.foobar" was reindexed
INFO:  index "foobaz_pkey" was reindexed
DETAIL:  CPU 7.04s/0.05u sec elapsed 19.61 sec.
INFO:  index "shoe_size" was reindexed
DETAIL:  CPU 12.26s/0.05u sec elapsed 19.33 sec.
INFO:  table "public.foobaz" was reindexed
REINDEX

Why not REINDEX VERBOSE TABLE foobar? Or even REINDEX TABLE foobar WITH VERBOSE? There was a good discussion of this on pgsql-hackers when the feature was being developed, but the short answer is that parentheses are the correct way to do things moving forward. Robert Haas summed it up well:

The unparenthesized VACUUM syntax was added back before we realized that that kind of syntax is a terrible idea. It requires every option to be a keyword, and those keywords have to be in a fixed order. I believe the intention is to keep the old VACUUM syntax around for backward-compatibility, but not to extend it. Same for EXPLAIN and COPY.

The psql help option should show the new syntax:

greg=# \h REINDEX
Command:     REINDEX
Description: rebuild indexes
Syntax:
REINDEX [ ( { VERBOSE } [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name
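
The parenthesized option works at the table (and index) level as well, producing the same per-index INFO and DETAIL lines shown earlier, limited to that table:

greg=# reindex (verbose) table foobar;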

Feature: pg_ctl defaults to "fast" mode

The second feature in Postgres 9.5 I am happy about is the change in pg_ctl's default shutdown mode from "smart" to "fast". The help output of pg_ctl explains the different modes fairly well:

pg_ctl is a utility to initialize, start, stop, or control a PostgreSQL server.

Usage:
  pg_ctl stop    [-W] [-t SECS] [-D DATADIR] [-s] [-m SHUTDOWN-MODE]
...
Shutdown modes are:
  smart       quit after all clients have disconnected
  fast        quit directly, with proper shutdown
  immediate   quit without complete shutdown; will lead to recovery on restart

In the past, the default was "smart", which often meant your friendly neighborhood DBA would type "pg_ctl restart -D data", then watch the progress dots slowly marching across the screen, until they remembered that the default mode of "smart" is kind of dumb: as long as there is a single connected client, the restart will not happen. Thus, the DBA had to cancel the command and rerun it as "pg_ctl restart -D data -m fast". Then they would vow to remember to add the -m switch next time, and promptly forget it the next time they did a shutdown or restart. :) Now pg_ctl has a much better default. Thanks, Bruce Momjian!
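
To illustrate (the data directory paths below are placeholders; adjust -D for your setup):

## On 9.5, a plain restart now defaults to "fast" mode
$ pg_ctl restart -D data

## The old behavior is still available if you really want to wait for every client to disconnect
$ pg_ctl restart -D data -m smart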

Feature: new 'cluster_name' option

When you run a lot of different Postgres clusters on your server, as I tend to do, it can be hard to tell them apart, as the version and port are not reported in the ps output. I sometimes have nearly a dozen different clusters running, due to testing different versions and different applications. Similar in spirit to the application_name option, the new cluster_name option solves the problem neatly by allowing a custom string to be put into the process title. Thanks to Thomas Munro for inventing this. So instead of this:

greg      7780     1  0 Mar01 pts/0    00:00:03 /home/greg/pg/9.5/bin/postgres -D data
greg      7787  7780  0 Mar01 ?        00:00:00 postgres: logger process   
greg      7789  7780  0 Mar01 ?        00:00:00 postgres: checkpointer process   
greg      7790  7780  0 Mar01 ?        00:00:09 postgres: writer process   
greg      7791  7780  0 Mar01 ?        00:00:06 postgres: wal writer process   
greg      7792  7780  0 Mar01 ?        00:00:05 postgres: autovacuum launcher process   
greg      7793  7780  0 Mar01 ?        00:00:11 postgres: stats collector process  
greg      6773     1  0 Mar01 pts/0    00:00:02 /home/greg/pg/9.5/bin/postgres -D data2
greg      6780  6773  0 Mar01 ?        00:00:00 postgres: logger process   
greg      6782  6773  0 Mar01 ?        00:00:00 postgres: checkpointer process   
greg      6783  6773  0 Mar01 ?        00:00:04 postgres: writer process   
greg      6784  6773  0 Mar01 ?        00:00:02 postgres: wal writer process   
greg      6785  6773  0 Mar01 ?        00:00:02 postgres: autovacuum launcher process   
greg      6786  6773  0 Mar01 ?        00:00:07 postgres: stats collector process

One can adjust the cluster_name inside each postgresql.conf (for example, to 'alpha' and 'bravo'), and get this:

greg      8267     1  0 Mar01 pts/0    00:00:03 /home/greg/pg/9.5/bin/postgres -D data
greg      8274  8267  0 Mar01 ?        00:00:00 postgres: alpha: logger process   
greg      8277  8267  0 Mar01 ?        00:00:00 postgres: alpha: checkpointer process   
greg      8278  8267  0 Mar01 ?        00:00:09 postgres: alpha: writer process   
greg      8279  8267  0 Mar01 ?        00:00:06 postgres: alpha: wal writer process   
greg      8280  8267  0 Mar01 ?        00:00:05 postgres: alpha: autovacuum launcher process   
greg      8281  8267  0 Mar01 ?        00:00:11 postgres: alpha: stats collector process  
greg      8583     1  0 Mar01 pts/0    00:00:02 /home/greg/pg/9.5/bin/postgres -D data2
greg      8590  8583  0 Mar01 ?        00:00:00 postgres: bravo: logger process   
greg      8592  8583  0 Mar01 ?        00:00:00 postgres: bravo: checkpointer process   
greg      8591  8583  0 Mar01 ?        00:00:04 postgres: bravo: writer process   
greg      8593  8583  0 Mar01 ?        00:00:02 postgres: bravo: wal writer process   
greg      8594  8583  0 Mar01 ?        00:00:02 postgres: bravo: autovacuum launcher process   
greg      8595  8583  0 Mar01 ?        00:00:07 postgres: bravo: stats collector process
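
Setting it is a one-liner in each cluster's postgresql.conf. Note that cluster_name can only be set at server start, so a restart is needed before it shows up in ps:

# data/postgresql.conf
cluster_name = 'alpha'

# data2/postgresql.conf
cluster_name = 'bravo'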

There are a lot of other things added in Postgres 9.5. I recommend you visit this page for a complete list, and poke around on your own. A final shout out to all the people that are continually improving the tab-completion of psql. You rock.

Improve SEO URLs for Interchange search pages

This is an article aimed at beginner-to-intermediate Interchange developers.

A typical approach to a hierarchical Interchange site is:

Categories -> Category -> Product

I.e., you list all your categories as links, each of which opens up a search results page filtering the products by category, with links to the individual product pages via the flypage.

Recently I upgraded a site so that the category URLs would be a bit more SEO-friendly. The original category filtering search produced these lovely specimens:

/search.html?fi=products&st=db&co=1&sf=category&se=Shoes&op=rm
   &sf=inactive&se=yes&op=ne&tf=category&ml=100

but what I really wanted was:

/cat/Shoes.html

Such links are easier to communicate to users, more friendly to search engines, less prone to breakage (e.g., by getting word-wrapped in email clients), and avoid exposing details of your application (here, we've had to admit publicly that we have a table called "products" and that some items are "inactive"; a curious user might decide to see what happens if they change "sf=inactive&se=yes" to some other expression).

Here's how I attacked this.

Creating a category listing page

First, I copied my "results.html" page to "catpage.html". That way, my original search results page can continue to serve up ad hoc search results.

The search results were displayed via:

[search-region]
...
[/search-region]

I converted this to a database query:

[query sql="SELECT * FROM products WHERE NOT inactive AND category = [sql-quote][cgi category][/sql-quote]"
 type=list prefix=item]
...
[/query]

I chose to use a prefix other than the default since it would avoid having to change so many tags in the page, and now both the original search page and new catpage would look much the same internally (and thus, if desired, I could refactor them in the future).

Note that I've defined part of the API for this page: the category to be searched is set in a CGI variable called "category".

In my specific case, there was additional tinkering with this tag, because I had nested [query] tags already in the page within the search-region.

Creating a "cat" actionmap

In order to translate a URL containing SEO-friendly "/cat/Shoes.html" into my search, I need an actionmap. Here's mine; it's very simple.

Actionmap cat <<"CODE"
sub {
  my $url = shift;
  my @url_parts = split '/' => $url;
  shift @url_parts if $url_parts[0] eq 'cat';

  $CGI->{mv_nextpage} = 'catpage.html';
  $CGI->{category} = shift @url_parts;
  return 1;
}
CODE

Actionmaps are called when Interchange detects that a URL begins with the actionmap's name; here "cat". They are passed a parameter containing the URL fragment (after removing all the site stuff). Here, that would be (e.g.) "/cat/Shoes". We massage the URL to get our category code, and set up the page to be called along with the CGI parameter(s) it expects.

Cleaning up the links

At the start of this article I noted that I may have a page listing all my categories. In my original setup, this generated links using a construction like this:


<a href="/search.html?fi=products&st=db&co=1&sf=category&se=Shoes&op=rm&sf=inactive&se=yes&op=ne&tf=category&ml=100">Shoes</a>

Now my links are the much simpler:

<a href="/cat/Shoes.html">Shoes</a>

In my specific case, these links were generated within a [query] loop, but the approach is the same.

Note: the Strap demo supports SEO-friendly URLs out of the box, and it is included with the latest Interchange 5.10 release.

Full Screen Gallery with Supersized and video slides

I was recently looking to build a full screen image and video gallery for our client Mission Blue, something similar to the Google Maps interface you can see in the screenshot below:

After scouring the Internet to find a suitable jQuery plugin, I finally decided on Supersized, a full screen background slideshow plugin for jQuery.

After downloading the library, include it on the page:

<link href="/wp-content/plugins/wp-supersized/theme/supersized.shutter.css?ver=4.2.2" id="supersized_theme_css-css" media="all" rel="stylesheet" type="text/css"></link>
<script src="/wp-includes/js/jquery/ui/effect.min.js?ver=1.11.4" type="text/javascript"></script>
<script src="/wp-content/plugins/wp-supersized/js/jquery.easing.min.js?ver=1.3" type="text/javascript"></script>
<script src="/wp-content/plugins/wp-supersized/js/jquery.easing.compatibility.js?ver=1.0" type="text/javascript"></script>
<script src="/wp-content/plugins/wp-supersized/js/jquery.animate-enhanced.min.js?ver=0.75" type="text/javascript"></script>
<script type='text/javascript' src='/wp-content/plugins/wp-supersized/js/supersized.3.2.7.min.js?ver=3.2.7'></script>

Basic functionality

Let's create a variable that will hold all the images in the slideshow:

var images = [];
images.push({
  type: 'IMAGE',
  image: 'img1.jpg',
  title: 'Image 1',
  thumb: 'img1_thumb.jpg',
  url: 'http://www.endpoint.com'
});
images.push({
  type: 'YOUTUBE',
  image: 'screenshot1.jpg',
  title: 'YouTube slide',
  videoid: 'abc12345678',
  thumb: 'screenshot1_thumb.jpg',
  url: 'https://www.youtube.com/watch?v=abc12345678'
});

Let's initialize Supersized:

jQuery.supersized({
  slideshow: 1,
  autoplay: 0,
  min_width: 0,
  min_height: 0,
  vertical_center: 1,
  horizontal_center: 1,
  fit_always: 0,
  fit_portrait: 1,
  fit_landscape: 0,
  slide_links: 'blank',
  thumb_links: 1,
  thumbnail_navigation: 1,
  slides: images,
  mouse_scrub: 0
});

Customizing the toolbar

<div id="thumb-tray" class="load-item">
  <div id="thumb-back"></div>
  <div id="thumb-forward"></div>
</div>
<div id="slidecaption"></div>

Customizing the screen image size

I didn't want to have the full screen image as it was a little overwhelming for the user. I wanted the black bars just like in the Google interface. Supersized allows for easy customization. This CSS did the trick:

#supersized, #supersized li {
  width: 70% !important;
  left: 0 !important;
  right: 0 !important;
  top: 1px !important;
  margin:auto;
}

Introducing video (YouTube) slides

First, I added the YouTube IFrame API:

<script type="text/javascript" src="https://www.youtube.com/iframe_api"></script>

Then I added a couple of CSS styles:

#supersized .player {
  margin: auto;
  display: block;
}

Finally, I went into the Supersized library source and modified it. To allow the video slides to appear, I added a new condition for the slide type 'YOUTUBE':

base._renderSlide = function(loadPrev, options) {
  var linkTarget = base.options.new_window ? ' target="_blank"' : '';
  var imageLink = (base.options.slides[loadPrev].url) ? "href='" + base.options.slides[loadPrev].url + "'" : "";
  var slidePrev = base.el + ' li:eq(' + loadPrev + ')';
  var imgPrev = $('<img src="' + base.options.slides[loadPrev].image + '"/>');

  if (base.options.slides[loadPrev].type == 'YOUTUBE') {
    imgPrev.load(function () {
      var video = $('<div class="player" id="player'+ base.options.slides[loadPrev].videoid + '"></div>');
      video.appendTo(slidePrev);
      var player = new YT.Player('player' + base.options.slides[loadPrev].videoid, {
        height: 390,
        width: 640,
        videoId: base.options.slides[loadPrev].videoid
      });
    });// End Load
  }
  else {
    imgPrev.appendTo(slidePrev).wrap('<a ' + imageLink + linkTarget + '></a>').parent().parent().addClass('image-loading ' + options['class']);

    imgPrev.load(function () {
      $(this).data('origWidth', $(this).width()).data('origHeight', $(this).height());
      base.resizeNow();// Resize background image
    });// End Load
  }
};

Final Result

This is how the gallery looks with the customizations:

This is what a video slide looks like:

Hope you found this writeup useful!

Medium-inspired Parallax Blur Effect For WordPress

Are you running a WordPress blog, but secretly dying to have that Medium parallax blur effect? I recently implemented this, and would like to share it with you. By the way, while I was working on this article, the effect was removed from Medium, which only makes having it on your own website more precious.

Let's assume that we have our custom theme class MyTheme. In functions.php:

class MyThemeBaseFunctions {
  public function __construct() {
    add_image_size('blurred', 1600, 1280, true);
    add_filter('wp_generate_attachment_metadata', array($this,'wp_blur_attachment_filter'));
  }
}

We added a custom image size blurred, and a callback wp_blur_attachment_filter to wp_generate_attachment_metadata. Here's where the magic happens.

Before that, let's talk a little about ImageMagick, a powerful image processing library that we will use to create the blur effect. After some experimenting, I figured out that the image needed to be darkened, and then a regular blur applied with sigma=20. You can read more about these settings at ImageMagick Blur Usage. I used Gaussian blur at first, but found the processing was extremely slow, and there wasn't much difference in the end result compared to other blur methods.

Now we are ready to write a blurring function:

public function wp_blur_attachment_filter($image_data) {
    if ( !isset($image_data['sizes']['blurred']) )
        return $image_data;

    $upload_dir  = wp_upload_dir();
    $src         = $upload_dir['path'] . '/' . $image_data['sizes']['large']['file'];
    $destination = $upload_dir['path'] . '/' . $image_data['sizes']['blurred']['file'];

    $imagick = new \Imagick($src);
    $imagick->blurImage(0, 20, Imagick::CHANNEL_ALL);
    $imagick->modulateImage(75, 105, 100);
    $imagick->writeImage($destination);

    return $image_data;
}

I darken the image:

$imagick->modulateImage(75, 105, 100);

And I blur the image:

$imagick->blurImage(0, 20, Imagick::CHANNEL_ALL);

Now we are able to use the custom image size in the template. Place these helper functions in functions.php:

public static function non_blurred($src) {
  $url = get_site_url() . substr($src, strrpos($src, '/wp-content'));
  $post_ID = attachment_url_to_postid($url);
  list($url, $width, $height) = wp_get_attachment_image_src($post_ID, 'large');
  return $url;
}

public static function blurred($src) {
  $url = get_site_url() . substr($src, strrpos($src, '/wp-content'));
  $post_ID = attachment_url_to_postid($url);
  list($url, $width, $height) = wp_get_attachment_image_src($post_ID, 'blurred');
  return $url;
}

And now use it in the template like this:

<div class="blurImg">
  <div style="background-image: url('<?php echo MyTheme::non_blurred(get_theme_mod( 'header' )); ?>')"></div>
  <div style="background-image: url('<?php echo MyTheme::blurred(get_theme_mod('header')); ?>'); opacity: 0;" class="blur"></div>
</div>
<header></header>

Add CSS:

.blurImg {
  height: 440px;
  left: 0;
  position: relative;
  top: 0;
  width: 100%;
  z-index: -1;
}

.blurImg > div {
  background-position: center center;
  background-repeat: no-repeat;
  background-size: cover;
  height: 440px;
  position: fixed;
  width: 100%;
}

header {
  padding: 0 20px;
  position: absolute;
  top: 0;
  width: 100%;
  z-index: 1;
}

Add JavaScript magic sauce to gradually replace the non-blurred image with the blurred version as the user scrolls:

(function() {
    jQuery(window).scroll(function() {
      var H = 240;
      var oVal = jQuery(window).scrollTop() / H;
      jQuery(".blurImg .blur").css("opacity", oVal);
    });
  }).call(this);

Generating the blurred version can be very strenuous on the server. I would oftentimes receive the error PHP Fatal error: Maximum execution time of 30 seconds exceeded. There are ways to work around that. One way is to use a quicker method of blurring the image by shrinking it, blurring, and resizing it back. Another way is to use a background job, something like this:

add_action( 'add_attachment', array($this, 'wp_blur_attachment_filter') );
add_action( 'wp_blur_attachment_filter_hook', 'wp_blur_attachment_filter_callback');
function wp_blur_attachment_filter_callback($path) {
  $path_parts = pathinfo($path);
  $imagick = new \Imagick($path);
  $imagick->blurImage(0, 20, Imagick::CHANNEL_ALL);
  $imagick->modulateImage(75, 105, 100);
  $destination = dirname($path) . "/" . $path_parts['filename'] . "_darken_blur." . $path_parts['extension'];
  $imagick->writeImage($destination);
}

public function wp_blur_attachment_filter($post_ID) {
  $path = get_attached_file( $post_ID );
  wp_schedule_single_event(time(), 'wp_blur_attachment_filter_hook', array($path));
}
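
The other workaround mentioned above, shrinking before blurring, could look roughly like this. This is only a sketch of the idea, not code from the original post (the 4x shrink factor, the reduced sigma, and the helper name are my own choices); since blur time scales with pixel count, it is dramatically faster, and the upscaling hides the lost detail in what is a heavy blur anyway:

// Hypothetical helper illustrating the "shrink, blur, resize back" approach.
function wp_blur_attachment_quickly($src, $destination) {
  $imagick = new \Imagick($src);
  $width   = $imagick->getImageWidth();
  $height  = $imagick->getImageHeight();

  // Work on a quarter-sized copy: far fewer pixels to convolve.
  $imagick->resizeImage((int) ($width / 4), (int) ($height / 4), \Imagick::FILTER_LANCZOS, 1);

  // A smaller sigma on the shrunken image roughly approximates sigma=20 at full size.
  $imagick->blurImage(0, 5, \Imagick::CHANNEL_ALL);
  $imagick->modulateImage(75, 105, 100);

  // Scale back up to the original dimensions and save.
  $imagick->resizeImage($width, $height, \Imagick::FILTER_LANCZOS, 1);
  $imagick->writeImage($destination);
}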

Or better yet, use cloud image processing — I wrote about that here.

I hope you found this writeup useful!

PostgreSQL Point-in-time Recovery: An Unexpected Journey

With all the major changes and improvements to PostgreSQL's native replication system through the last few major releases, it's easy to forget that there can be benefits to having some of the tried and true functionalities from older PostgreSQL versions in place.

In particular, with the ease of setting up Hot Standby/Streaming Replication, it's easy to get replication going with almost no effort. Replication is great for redundancy, scaling, and backups; however, it does not solve all potential data-loss problems. For best results, use it in conjunction with Point-in-time Recovery (PITR) and the WAL archiving features of PostgreSQL.

Background

We recently had a client experience a classic blunder with their database: performing a manual UPDATE without wrapping it in a transaction block. The table in question was the main table in the application, and the client had run an unqualified UPDATE, unintentionally setting a specific field to a constant value for every row instead of targeting the specific row they thought they were going for.

Fortunately, the client had backups. Unfortunately, the backups themselves would not be enough: being a snapshot of the data from earlier in the day, restoring one would have lost all changes made throughout the day.

This resulted in a call to us to help out with the issue. We fortunately had information about precisely when the errant UPDATE took place, so we were able to use this information to help target a PITR-based restore.

The Approach

Since we did not want to lose other changes made in this database cluster either before or after this mistake, we came up with the following strategy which would let us keep the current state of the database but just recover the field in question:

  1. Create a parallel cluster for recovery.
  2. Load the WAL until just before the time of the event.
  3. Dump the table in question from the recovery cluster.
  4. Load the table in the main cluster with a different name.
  5. Use UPDATE ... FROM to restore the field values in the main cluster's table from the recovered copy, matching rows on the table's primary key (a sketch follows below).
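
For step 5, the merge on the main cluster looked conceptually like the statement below. The table and column names here are placeholders for illustration only; the dumped copy is assumed to have been loaded as "widgets_recovered":

-- Restore the clobbered field from the recovered copy, matching on the primary key.
-- "widgets", "status", and "widgets_recovered" are hypothetical names.
UPDATE widgets AS w
SET    status = r.status
FROM   widgets_recovered AS r
WHERE  w.widget_id = r.widget_id;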

In practice, this worked out pretty well, though of course there were some issues that had to be worked around.

PostgreSQL's PITR relies on its WAL archiving mechanism combined with taking regular base backups of the data directory. As part of the archive setup, you choose the strategies (such as the frequency of the base backups) and ensure that you can recover individual WAL segment files when desired.

In order for the above strategy to work, you need hardware to run it on. The client proposed their standby server, which was definitely equipped to handle this and did not have much load. The client had initially suggested that we could break replication, but we recommended against that: there was sufficient disk space to run a parallel cluster, and keeping the replica intact avoided the future work and risk of rebuilding it afterwards.

We copied over the daily base backup into its own directory/mount point here, adjusted the recovery.conf file to point to the local WAL directory, and copied the necessary WAL files from the archive location to the pg_xlog directory of the new cluster. We also had to adjust a few parameters in the new cluster, most notably the "port" parameter to run the cluster on a different port. We also used the timestamp of the incident as a target for the recovery.conf's recovery_target_time setting. After starting up the new cluster and letting things process, we were able to dump the table in question and finish the recovery on the master.
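
For reference, the recovery settings involved look something like this; every value below is a placeholder, since the real paths and timestamp belong to the client's system:

# recovery.conf in the parallel cluster's data directory
restore_command = 'cp /path/to/local/wal/archive/%f "%p"'
recovery_target_time = '2016-01-19 14:00:00 EST'
recovery_target_inclusive = false   # stop just before the target, not just after it

# postgresql.conf for the parallel cluster
port = 5433   # keep it off the production port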

Some issues did come up that required expert-level knowledge of the system, as well as some good luck in the timing of the event. We had to locate several of the WAL files in the initial archive on the primary server, due to some issues with the (inherited by us) configuration. Also, given the timing of the event and the time it took to create the parallel cluster, we were fortunate to get the new instance running before the next nightly base backup; the client had things configured to keep only a single base backup around, so waiting any longer would have left us unable to resolve the issue.

Lessons Learned

With any issue, there is a takeaway, so what are those here?

  • Always use explicit transactions when manually modifying data, or modify your production environment's .psqlrc to add \set AUTOCOMMIT off (a short sketch follows this list).
  • Not all data-loss situations can be fixed with replication alone; Point-in-time Recovery is absolutely still relevant these days.
  • It helps to have a PostgreSQL expert on hand, day or night. End Point offers 24x7 PostgreSQL support, which you can engage by getting ahold of us here.
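
A minimal illustration of that first point (the table and column names are hypothetical):

-- With an explicit transaction (or AUTOCOMMIT off), a forgotten WHERE clause
-- is an "oops, ROLLBACK" rather than a point-in-time recovery exercise.
BEGIN;
UPDATE widgets SET status = 'shipped' WHERE widget_id = 12345;
-- Check the command tag: it should say "UPDATE 1", not "UPDATE 250000".
COMMIT;   -- or ROLLBACK; if more rows changed than expected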