
Postgres 9.5: three little things

The recent release of Postgres 9.5 has many people excited about the big new features such as UPSERT (docs) and row-level security (docs). Today I would like to celebrate three of the smaller features that I love about this release.

Before jumping into my list, I'd like to thank everyone who contributes to Postgres. I did some quick analysis and found that 85 people, from Adrien to Zeus, have helped with version 9.5 of Postgres, at least according to the git logs. Of course, that number is actually higher, as it doesn't take into account people helping out on the #postgresql channel, running buildfarm animals, doing packaging work, keeping the infrastructure running, etc. Thanks to you all!

Feature: REINDEX VERBOSE

The first feature is one I have been wanting for a long time: a verbose form of the REINDEX command. Thanks to Sawada Masahiko for adding this. Like VACUUM, REINDEX gets kicked off and then gives no progress or other information until it finishes. While VACUUM has long had a VERBOSE option to get around this, REINDEX gave you no clue as to which index it was working on, or how much work each index took to rebuild. Here is a normal reindex, along with another 9.5 feature, the ability to reindex an entire schema:

greg=# reindex schema public;
## What seems like five long minutes later...
REINDEX

The new syntax uses parentheses to support VERBOSE and any other future options. If you are familiar with EXPLAIN's newer options, you may see a similarity. More on the syntax in a bit. Here is the much improved version in action:

greg=# reindex (verbose) schema public;
INFO:  index "foobar_pkey" was reindexed
DETAIL:  CPU 11.00s/0.05u sec elapsed 19.38 sec.
INFO:  index "foobar_location" was reindexed
DETAIL:  CPU 5.21s/0.05u sec elapsed 18.27 sec.
INFO:  index "location_position" was reindexed
DETAIL:  CPU 9.10s/0.05u sec elapsed 19.70 sec.
INFO:  table "public.foobar" was reindexed
INFO:  index "foobaz_pkey" was reindexed
DETAIL:  CPU 7.04s/0.05u sec elapsed 19.61 sec.
INFO:  index "shoe_size" was reindexed
DETAIL:  CPU 12.26s/0.05u sec elapsed 19.33 sec.
INFO:  table "public.foobaz" was reindexed
REINDEX

Why not REINDEX VERBOSE TABLE foobar? Or even REINDEX TABLE foobar WITH VERBOSE? There was a good discussion of this on pgsql-hackers when the feature was being developed, but the short answer is that parentheses are the correct way to do things moving forward. Robert Haas summed it up well:

The unparenthesized VACUUM syntax was added back before we realized that that kind of syntax is a terrible idea. It requires every option to be a keyword, and those keywords have to be in a fixed order. I believe the intention is to keep the old VACUUM syntax around for backward-compatibility, but not to extend it. Same for EXPLAIN and COPY.

The psql help option should show the new syntax:

greg=# \h REINDEX
Command:     REINDEX
Description: rebuild indexes
Syntax:
REINDEX [ ( { VERBOSE } [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name

Feature: pg_ctl defaults to "fast" mode

The second feature in Postgres 9.5 I am happy about is the change in pg_ctl's default shutdown mode from "smart" to "fast". The help for pg_ctl explains the different modes fairly well:

pg_ctl is a utility to initialize, start, stop, or control a PostgreSQL server.

Usage:
  pg_ctl stop    [-W] [-t SECS] [-D DATADIR] [-s] [-m SHUTDOWN-MODE]
...
Shutdown modes are:
  smart       quit after all clients have disconnected
  fast        quit directly, with proper shutdown
  immediate   quit without complete shutdown; will lead to recovery on restart

In the past, the default was 'smart', which often meant your friendly neighborhood DBA would type "pg_ctl restart -D data", then watch the progress dots slowly march across the screen, until they remembered that the default mode of "smart" is kind of dumb - as long as there is even one connected client, the restart will not happen. Thus, the DBA had to cancel the command and rerun it as "pg_ctl restart -D data -m fast". Then they would vow to remember to add the -m switch next time. And promptly forget it the next time they did a shutdown or restart. :) Now pg_ctl has a much better default. Thanks, Bruce Momjian!
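
If you actually want the old wait-for-clients behavior in 9.5, you can still ask for it explicitly; otherwise a plain restart now does what you almost always meant:

## Old default behavior, still available on request:
$ pg_ctl restart -D data -m smart
## New default in 9.5 - no more waiting for every client to disconnect:
$ pg_ctl restart -D data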

Feature: new 'cluster_name' option

When you run a lot of different Postgres clusters on your server, as I tend to do, it can be hard to tell them apart, as the version and port are not reported in the ps output. I sometimes have nearly a dozen different clusters running, due to testing different versions and different applications. Similar in spirit to the application_name option, the new cluster_name option solves the problem neatly by allowing a custom string to be put into the process title. Thanks to Thomas Munro for inventing this. So instead of this:

greg      7780     1  0 Mar01 pts/0    00:00:03 /home/greg/pg/9.5/bin/postgres -D data
greg      7787  7780  0 Mar01 ?        00:00:00 postgres: logger process   
greg      7789  7780  0 Mar01 ?        00:00:00 postgres: checkpointer process   
greg      7790  7780  0 Mar01 ?        00:00:09 postgres: writer process   
greg      7791  7780  0 Mar01 ?        00:00:06 postgres: wal writer process   
greg      7792  7780  0 Mar01 ?        00:00:05 postgres: autovacuum launcher process   
greg      7793  7780  0 Mar01 ?        00:00:11 postgres: stats collector process  
greg      6773     1  0 Mar01 pts/0    00:00:02 /home/greg/pg/9.5/bin/postgres -D data2
greg      6780  6773  0 Mar01 ?        00:00:00 postgres: logger process   
greg      6782  6773  0 Mar01 ?        00:00:00 postgres: checkpointer process   
greg      6783  6773  0 Mar01 ?        00:00:04 postgres: writer process   
greg      6784  6773  0 Mar01 ?        00:00:02 postgres: wal writer process   
greg      6785  6773  0 Mar01 ?        00:00:02 postgres: autovacuum launcher process   
greg      6786  6773  0 Mar01 ?        00:00:07 postgres: stats collector process

One can adjust the cluster_name inside each postgresql.conf (for example, to 'alpha' and 'bravo'), and get this:

greg      8267     1  0 Mar01 pts/0    00:00:03 /home/greg/pg/9.5/bin/postgres -D data
greg      8274  8267  0 Mar01 ?        00:00:00 postgres: alpha: logger process   
greg      8277  8267  0 Mar01 ?        00:00:00 postgres: alpha: checkpointer process   
greg      8278  8267  0 Mar01 ?        00:00:09 postgres: alpha: writer process   
greg      8279  8267  0 Mar01 ?        00:00:06 postgres: alpha: wal writer process   
greg      8280  8267  0 Mar01 ?        00:00:05 postgres: alpha: autovacuum launcher process   
greg      8281  8267  0 Mar01 ?        00:00:11 postgres: alpha: stats collector process  
greg      8583     1  0 Mar01 pts/0    00:00:02 /home/greg/pg/9.5/bin/postgres -D data2
greg      8590  8583  0 Mar01 ?        00:00:00 postgres: bravo: logger process   
greg      8592  8583  0 Mar01 ?        00:00:00 postgres: bravo: checkpointer process   
greg      8591  8583  0 Mar01 ?        00:00:04 postgres: bravo: writer process   
greg      8593  8583  0 Mar01 ?        00:00:02 postgres: bravo: wal writer process   
greg      8594  8583  0 Mar01 ?        00:00:02 postgres: bravo: autovacuum launcher process   
greg      8595  8583  0 Mar01 ?        00:00:07 postgres: bravo: stats collector process
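
For reference, this takes just one line in each cluster's postgresql.conf (the setting can only be changed at server start):

## In data/postgresql.conf:
cluster_name = 'alpha'

## In data2/postgresql.conf:
cluster_name = 'bravo'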

There are a lot of other things added in Postgres 9.5. I recommend you visit this page for a complete list, and poke around on your own. A final shout out to all the people who are continually improving the tab-completion of psql. You rock.

Improve SEO URLs for Interchange search pages

This is an article aimed at beginner-to-intermediate Interchange developers.

A typical approach to a hierarchical Interchange site is:

Categories -> Category -> Product

That is, you list all your categories as links, each of which opens a search results page filtering the products by category, with links to the individual product pages via the flypage.

Recently I upgraded a site so the category URLs were a bit more SEO-friendly. The original category filtering search produced these lovely specimens:

/search.html?fi=products&st=db&co=1&sf=category&se=Shoes&op=rm
   &sf=inactive&se=yes&op=ne&tf=category&ml=100

but what I really wanted was:

/cat/Shoes.html

Such links are easier to communicate to users, more friendly to search engines, less prone to breakage (e.g., by getting word-wrapped in email clients), and avoid exposing details of your application (here, we've had to admit publicly that we have a table called "products" and that some items are "inactive"; a curious user might decide to see what happens if they change "sf=inactive&se=yes" to some other expression).

Here's how I attacked this.

Creating a category listing page

First, I copied my "results.html" page to "catpage.html". That way, my original search results page can continue to serve up ad hoc search results.

The search results were displayed via:

[search-region]
...
[/search-region]

I converted this to a database query:

[query sql="SELECT * FROM products WHERE NOT inactive AND category = [sql-quote][cgi category][/sql-quote]"
 type=list prefix=item]
...
[/query]

I chose to use a prefix other than the default since it would avoid having to change so many tags in the page, and now both the original search page and new catpage would look much the same internally (and thus, if desired, I could refactor them in the future).

Note that I've defined part of the API for this page: the category to be searched is set in a CGI variable called "category".

In my specific case, there was additional tinkering with this tag, because I had nested [query] tags already in the page within the search-region.

Creating a "cat" actionmap

In order to translate a URL containing SEO-friendly "/cat/Shoes.html" into my search, I need an actionmap. Here's mine; it's very simple.

Actionmap cat <<"CODE"
sub {
  my $url = shift;

  # Split the URL fragment into pieces and drop the leading "cat" component
  my @url_parts = split '/' => $url;
  shift @url_parts if $url_parts[0] eq 'cat';

  # Hand off to the category listing page, passing the category as the
  # CGI parameter that catpage.html expects
  $CGI->{mv_nextpage} = 'catpage.html';
  $CGI->{category} = shift @url_parts;
  return 1;
}
CODE

Actionmaps are called when Interchange detects that a URL begins with the actionmap's name, here "cat". They are passed a parameter containing the URL fragment (after the site prefix has been removed); here, that would be (e.g.) "/cat/Shoes". We massage the URL to get our category code, and set up the page to be called along with the CGI parameter(s) it expects.

Cleaning up the links

At the start of this article I noted that I may have a page listing all my categories. In my original setup, each category link was built with a construction that carried the full set of search parameters shown earlier, producing URLs like the long specimen above. Now my links are the much simpler:

<a href="/cat/Shoes.html">Shoes</a>

In my specific case, these links were generated within a [query] loop, but the approach is the same.
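
As an illustration only - the table and column names follow the earlier example, and the [cat-param ...] sub-tags assume the standard [query] prefix accessors - such a loop might look like:

[query sql="SELECT DISTINCT category FROM products WHERE NOT inactive ORDER BY category"
 type=list prefix=cat]
  <a href="/cat/[cat-param category].html">[cat-param category]</a><br>
[/query]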

Note: the Strap demo supports SEO-friendly URLs out of the box, and it is included with the latest Interchange 5.10 release.

Full Screen Gallery with Supersized and video slides

I was recently looking to build a full screen image and video gallery for our client Mission Blue - something similar to the Google Maps interface you can see in the screenshot below.

After scouring the Internet to find a suitable jQuery plugin, I finally decided on Supersized, a full screen background slideshow plugin for jQuery.

After downloading the library, include it on the page:

<link href="/wp-content/plugins/wp-supersized/theme/supersized.shutter.css?ver=4.2.2" id="supersized_theme_css-css" media="all" rel="stylesheet" type="text/css" />
<script src="/wp-includes/js/jquery/ui/effect.min.js?ver=1.11.4" type="text/javascript"></script>
<script src="/wp-content/plugins/wp-supersized/js/jquery.easing.min.js?ver=1.3" type="text/javascript"></script>
<script src="/wp-content/plugins/wp-supersized/js/jquery.easing.compatibility.js?ver=1.0" type="text/javascript"></script>
<script src="/wp-content/plugins/wp-supersized/js/jquery.animate-enhanced.min.js?ver=0.75" type="text/javascript"></script>
<script type='text/javascript' src='/wp-content/plugins/wp-supersized/js/supersized.3.2.7.min.js?ver=3.2.7'></script>

Basic functionality

Let's create a variable that will hold all the images in the slideshow:

var images = [];
images.push({
  type: 'IMAGE',
  image: 'img1.jpg',
  title: 'Image 1',
  thumb: 'img1_thumb.jpg',
  url: 'http://www.endpoint.com'
});
images.push({
  type: 'YOUTUBE',
  image: 'screenshot1.jpg',
  title: 'YouTube slide',
  videoid: 'abc12345678',
  thumb: 'screenshot1_thumb.jpg',
  url: 'https://www.youtube.com/watch?v=abc12345678'
});

Let's initialize Supersized:

jQuery.supersized({
  slideshow: 1,
  autoplay: 0,
  min_width: 0,
  min_height: 0,
  vertical_center: 1,
  horizontal_center: 1,
  fit_always: 0,
  fit_portrait: 1,
  fit_landscape: 0,
  slide_links: 'blank',
  thumb_links: 1,
  thumbnail_navigation: 1,
  slides: images,
  mouse_scrub: 0
});

Customizing the toolbar

<div id="thumb-tray" class="load-item">
  <div id="thumb-back"></div>
  <div id="thumb-forward"></div>
</div>
<div id="slidecaption"></div>

Customizing the screen image size

I didn't want to have the full screen image as it was a little overwhelming for the user. I wanted the black bars just like in the Google interface. Supersized allows for easy customization. This CSS did the trick:

#supersized, #supersized li {
  width: 70% !important;
  left: 0 !important;
  right: 0 !important;
  top: 1px !important;
  margin:auto;
}

Introducing video (YouTube) slides

First, I added the YouTube API:

<script type="text/javascript" src="https://www.youtube.com/iframe_api"></script>

Then I added a couple of CSS styles:

#supersized .player {
  margin: auto;
  display: block;
}

Finally, I went into the Supersized library source and modified it. To allow the video slides to appear, I added a new condition for the slide type 'YOUTUBE':

base._renderSlide = function(loadPrev, options) {
  var linkTarget = base.options.new_window ? ' target="_blank"' : '';
  var imageLink = (base.options.slides[loadPrev].url) ? "href='" + base.options.slides[loadPrev].url + "'" : "";
  var slidePrev = base.el + ' li:eq(' + loadPrev + ')';
  var imgPrev = $('<img src="' + base.options.slides[loadPrev].image + '"/>');

  if (base.options.slides[loadPrev].type == 'YOUTUBE') {
    imgPrev.load(function () {
      var video = $('<div class="player" id="player'+ base.options.slides[loadPrev].videoid + '"></div>');
      video.appendTo(slidePrev);
      var player = new YT.Player('player' + base.options.slides[loadPrev].videoid, {
        height: 390,
        width: 640,
        videoId: base.options.slides[loadPrev].videoid
      });
    });// End Load
  }
  else {
    imgPrev.appendTo(slidePrev).wrap('<a ' + imageLink + linkTarget + '></a>').parent().parent().addClass('image-loading ' + options['class']);

    imgPrev.load(function () {
      $(this).data('origWidth', $(this).width()).data('origHeight', $(this).height());
      base.resizeNow();// Resize background image
    });// End Load
  }
};
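
One caveat worth noting (my own addition, not something the code above handles): the IFrame API script loads asynchronously, so YT.Player may not exist yet when the first slide renders. A minimal guard, using the API's standard onYouTubeIframeAPIReady callback, could look like this:

// Hypothetical guard: queue player creation until the YouTube IFrame API is ready.
var ytReady = false;
var pendingPlayers = [];

window.onYouTubeIframeAPIReady = function () {
  ytReady = true;
  for (var i = 0; i < pendingPlayers.length; i++) {
    pendingPlayers[i]();
  }
  pendingPlayers = [];
};

function createPlayerWhenReady(fn) {
  if (ytReady) {
    fn();
  } else {
    pendingPlayers.push(fn);
  }
}

The new YT.Player(...) call above could then be wrapped in createPlayerWhenReady(function () { ... }).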

Final Result

This is how the gallery looks with the customizations:

This is what a video slide looks like:

Hope you found this writeup useful!

Medium-inspired Parallax Blur Effect For WordPress

Are you running a WordPress blog, but secretly dying to have that Medium parallax blur effect? I recently implemented this, and would like to share it with you. By the way, while I was working on this article, the effect was removed from Medium, which only makes having it on your own website more precious.

Let's assume that we have our custom theme class MyTheme. In functions.php:

class MyThemeBaseFunctions {
  public function __construct() {
    add_image_size('blurred', 1600, 1280, true);
    add_filter('wp_generate_attachment_metadata', array($this,'wp_blur_attachment_filter'));
  }
}

We added a custom image size called blurred, and hooked a callback, wp_blur_attachment_filter, into wp_generate_attachment_metadata. That callback is where the magic happens.

Before writing it, let's talk a little about ImageMagick, the powerful image processing library we will use to create the blur effect. After some experimenting, I found that the image needed to be darkened, and then a regular blur applied with sigma=20. You can read more about these settings at ImageMagick Blur Usage. I used Gaussian blur at first, but found the processing extremely slow, with little difference in the end result compared to other blur methods.
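
To preview the effect outside of WordPress, the same treatment can be tried on the command line (assuming the ImageMagick command-line tools are installed; the file names are just placeholders):

# Darken a bit, then blur with sigma=20 (a radius of 0 lets ImageMagick pick one)
convert header.jpg -modulate 75,105,100 -blur 0x20 header_darken_blur.jpg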

Now we are ready to write a blurring function:

public function wp_blur_attachment_filter($image_data) {
  if ( !isset($image_data['sizes']['blurred']) )
    return $image_data;

  $upload_dir  = wp_upload_dir();
  $src         = $upload_dir['path'] . '/' . $image_data['sizes']['large']['file'];
  $destination = $upload_dir['path'] . '/' . $image_data['sizes']['blurred']['file'];

  $imagick = new \Imagick($src);
  $imagick->blurImage(0, 20, Imagick::CHANNEL_ALL);
  $imagick->modulateImage(75, 105, 100);
  $imagick->writeImage($destination);

  return $image_data;
}

I darken the image:

$imagick->modulateImage(75, 105, 100);

And I blur the image:

$imagick->blurImage(0, 20, Imagick::CHANNEL_ALL);

Now we are able to use the custom image size in the template. Place the helper function in functions.php:

public static function non_blurred($src) {
  $url = get_site_url() . substr($src, strrpos($src, '/wp-content'));
  $post_ID = attachment_url_to_postid($url);
  list($url, $width, $height) = wp_get_attachment_image_src($post_ID, 'large');
  return $url;
}

public static function blurred($src) {
  $url = get_site_url() . substr($src, strrpos($src, '/wp-content'));
  $post_ID = attachment_url_to_postid($url);
  list($url, $width, $height) = wp_get_attachment_image_src($post_ID, 'blurred');
  return $url;
}

And now use it in the template like this:

<div class="blurImg">
  <div style="background-image: url('<?php echo MyTheme::non_blurred(get_theme_mod( 'header' )); ?>')"></div>
  <div style="background-image: url('<?php echo MyTheme::blurred(get_theme_mod('header')); ?>'); opacity: 0;" class="blur"></div>
</div>
<header></header>

Add CSS:

.blurImg {
  height: 440px;
  left: 0;
  position: relative;
  top: 0;
  width: 100%;
  z-index: -1;
}

.blurImg > div {
  background-position: center center;
  background-repeat: no-repeat;
  background-size: cover;
  height: 440px;
  position: fixed;
  width: 100%;
}

header {
  padding: 0 20px;
  position: absolute;
  top: 0;
  width: 100%;
  z-index: 1;
}

Add JavaScript magic sauce to gradually replace the non-blurred image with the blurred version as the user scrolls:

(function() {
    jQuery(window).scroll(function() {
      var H = 240;
      var oVal = jQuery(window).scrollTop() / H;
      jQuery(".blurImg .blur").css("opacity", oVal);
    });
  }).call(this);

Generating the blurred version can be very strenuous on the server. I would oftentimes receive the error PHP Fatal error: Maximum execution time of 30 seconds exceeded. There are ways to work around that. One is to use a quicker method of blurring the image: shrink it, blur it, and resize it back. Another is to use a background job, something like this:

add_action( 'add_attachment', array($this, 'wp_blur_attachment_filter') );
add_action( 'wp_blur_attachment_filter_hook', 'wp_blur_attachment_filter_callback');
function wp_blur_attachment_filter_callback($path) {
  $path_parts = pathinfo($path);
  $imagick = new \Imagick($path);
  $imagick->blurImage(0, 20, Imagick::CHANNEL_ALL);
  $imagick->modulateImage(75, 105, 100);
  $destination = dirname($path) . "/" . $path_parts['filename'] . "_darken_blur." . $path_parts['extension'];
  $imagick->writeImage($destination);
}

public function wp_blur_attachment_filter($post_ID) {
  $path = get_attached_file( $post_ID );
  wp_schedule_single_event(time(), 'wp_blur_attachment_filter_hook', array($path));
}
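
As for the first workaround, here is a rough sketch (my own, not part of the theme above) of the shrink-blur-resize shortcut; blurring a quarter-sized copy with a quarter of the sigma approximates the full-size result at a fraction of the cost:

public function wp_quick_blur($src, $destination) {
  // Hypothetical helper: downscale, blur and darken, then scale back up.
  $imagick = new \Imagick($src);
  $width   = $imagick->getImageWidth();
  $height  = $imagick->getImageHeight();
  $imagick->resizeImage((int) ($width / 4), (int) ($height / 4), \Imagick::FILTER_LANCZOS, 1);
  $imagick->blurImage(0, 5, Imagick::CHANNEL_ALL);
  $imagick->modulateImage(75, 105, 100);
  $imagick->resizeImage($width, $height, \Imagick::FILTER_LANCZOS, 1);
  $imagick->writeImage($destination);
}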

Or better yet, use cloud image processing — I wrote about that here.

I hope you found this writeup useful!

PostgreSQL Point-in-time Recovery: An Unexpected Journey

With all the major changes and improvements to PostgreSQL's native replication system through the last few major releases, it's easy to forget that there can be benefits to having some of the tried and true functionalities from older PostgreSQL versions in place.

In particular, with the ease of setting up Hot Standby/Streaming Replication, it's easy to get replication going with almost no effort. Replication is great for redundancy, scaling, and backups; however, it does not solve all potential data-loss problems. For best results, it should be used in conjunction with Point-in-time Recovery (PITR) and the archiving features of PostgreSQL.

Background

We recently had a client experience a classic blunder with their database: performing a manual UPDATE without wrapping it in a transaction block. The table in question was the main table in the application, and the client had run an unqualified UPDATE, unintentionally setting a specific field to a constant value on every row instead of targeting the specific row they thought they were going for.

Fortunately, the client had backups. Unfortunately, the backups alone would not be enough: they were a snapshot of the data from earlier in the day, and restoring them would have lost all changes made since.

This resulted in a call to us to help out with the issue. We fortunately had information about precisely when the errant UPDATE took place, so we were able to use this information to help target a PITR-based restore.

The Approach

Since we did not want to lose other changes made in this database cluster either before or after this mistake, we came up with the following strategy which would let us keep the current state of the database but just recover the field in question:

  1. Create a parallel cluster for recovery.
  2. Load the WAL until just before the time of the event.
  3. Dump the table in question from the recovery cluster.
  4. Load the table in the main cluster with a different name.
  5. Use UPDATE ... FROM to update the field values for the table with their old values, matching rows on the table's Primary Key (see the sketch just after this list).
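
For step 5, assuming the dumped table was loaded into the main cluster as foobar_recovered (all names here are hypothetical; the real ones matched the client's schema), the final fix looked something like:

-- Restore just the clobbered column, matching rows on the primary key
UPDATE foobar
   SET important_field = r.important_field
  FROM foobar_recovered r
 WHERE foobar.id = r.id;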

In practice, this worked out pretty well, though of course there were some issues that had to be worked around.

PostgreSQL's PITR relies on its WAL archiving mechanism combined with taking regular base backups of the data directory. As part of the archive setup, you choose the strategies (such as the frequency of the base backups) and ensure that you can recover individual WAL segment files when desired.

In order for the above strategy to work, you need hardware to run it on. The client proposed their standby server, which was definitely equipped to handle this and did not have much load. They had initially suggested that we could break replication, but we recommended against that: there was sufficient disk space to run a second cluster, and keeping the replica intact avoided the future work and risk of having to rebuild it afterward.

We copied over the daily base backup into its own directory/mount point here, adjusted the recovery.conf file to point to the local WAL directory, and copied the necessary WAL files from the archive location to the pg_xlog directory of the new cluster. We also had to adjust a few parameters in the new cluster, most notably the "port" parameter to run the cluster on a different port. We also used the timestamp of the incident as a target for the recovery.conf's recovery_target_time setting. After starting up the new cluster and letting things process, we were able to dump the table in question and finish the recovery on the master.
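
For reference, the relevant settings on the recovery cluster looked roughly like this (the paths, port, and timestamp are illustrative, not the client's actual values):

## recovery.conf (PostgreSQL 9.x style)
restore_command = 'cp /path/to/wal_archive/%f "%p"'
recovery_target_time = '2016-01-15 14:25:00'

## postgresql.conf override so it does not collide with the production cluster
port = 5433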

Some issues did come up that required expert-level knowledge of the system, as well as some good luck in the timing of the event. We had to locate several of the WAL files in the initial archive on the primary server, due to some issues with the (inherited by us) configuration. Also, given the timing of the event and the amount of time it took to create the parallel cluster, we were fortunate to get the new instance up before the next nightly base backup ran; the client had things configured to keep only a single base backup around, so waiting any longer would have left us unable to resolve the issue.

Lessons Learned

With any issue, there is a takeaway, so what are those here?

  • Always use explicit transactions when manually modifying data, or modify your production environment's .psqlrc to add \set AUTOCOMMIT off.
  • Not all data-loss situations can be fixed with replication alone—Point in Time Recovery is absolutely still relevant these days.
  • It helps to have a PostgreSQL expert on-hand day-or-night. End Point offers 24x7 PostgreSQL support, which you can engage by getting a hold of us here.

Breaking Bash

Recently I managed to break the bash shell in an interesting and puzzling way. The initial symptoms were very frustrating: a workflow process we use here (creating a development camp) failed for me, but for no one else. That was at least a clue that it was me, not the workflow process.

Eventually, I narrowed down the culprit to the "grep" command (and that was more through luck than steadfast Sherlock-like detective work).

$ grep foo bar

grep: foo: No such file or directory

Eh? grep is mis-parsing the arguments! How does that happen?

So I began to study my bash environment. Eventually I came up with this fascinating little typo:

export GREP_OPTIONS='—color=auto'

That's supposed to be:

export GREP_OPTIONS='--color=auto'

but it got recorded in my .bashrc as an en-dash, not a double dash. (My guess is that I cut-and-pasted this from a web page where someone over-helpfully "typeset" the command.)
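
If you suspect a similar invisible typo, one quick way to hunt for it (assuming GNU grep, which supports -P) is to search your dotfiles for any non-ASCII bytes:

$ grep -nP '[^\x00-\x7F]' ~/.bashrc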

Ironically, this typo is innocuous under Bash 3.x, but when you slot it into a Bash 4.x installation, all heck busts loose.

Using Google Analytics to understand and grow your business

Google Analytics, a web analytics service offered by Google, is a very handy tool for understanding your audience. It allows you to understand where traffic comes from and what resonates with your audience, which has led to Google Analytics being the most widely used web analytics service on the internet. If you understand your website’s traffic, you then have the ability to focus your website and content to optimize engagement and growth.

With Google Analytics, you have the ability to see traffic from all channels. This leads to clear insights, and helps you understand what’s working and what’s not:
  • Organic - traffic from search engines which is not paid for
  • Paid Search - visitors that clicked on one of your paid advertisements (also known as Pay-Per-Click or PPC)
  • Direct - visitors that typed your website address directly into the browser (includes bookmarks)
  • Social - traffic from sites that are considered to be social networking sites
  • Referral - visitors that arrived from 3rd party referrals
  • Email - visitors that are directed from an email
  • Display - visitors directed from video and display advertising

It will be helpful to walk through an example. Say you launch an email marketing campaign and want to understand how your audience responded to your content. First, you can check how many people clicked through from the email to your website. From there, you can see how they navigated the site. Who came to the webpage? Did they take the actions you were hoping they would take? How long did they spend viewing the page? Where did they click? Did those clicks lead to a conversion or sale?

Prior to coming to End Point, I was working as an Analytic Strategist at a digital media agency. One of my biggest clients was luxury jewelry company Tiffany & Co. My responsibilities included analyzing seasonal and promotional trends to develop forecasts, broken out by channel. I would also evaluate the marketing effectiveness of initiatives by keeping a close eye on the user experience and navigational behavior, and provide recommendations throughout the customer journey.

Many jewelry promotions are seasonal, so we were constantly making changes to optimize the page and ensure traffic was navigating the website as we hoped. I would make sure to keep in mind the 4 P’s - Product, Price, Place, and Promotion - ensuring that the layout of the site, price points, and specials were always in line with what the audience was most looking for during the season.

This experience of understanding client goals, and helping them achieve those goals through the success of their websites, has proven to be a valuable skill since joining End Point’s team. Beyond having the ability to help clients with their websites, this knowledge has also been helpful internally. I keep a close eye on where our website traffic is coming from, including seeing which blog posts resonate with different audiences. I have made recommendations for adjustments to poor-performing pages, and taken time to analyze pages with high bounce rates, to see whether the bounce rates are due to the page not being well constructed or due to traffic not being properly directed. I also take note of any industry changes, and make website adjustments to accommodate them. For example, End Point has a plethora of skills in many programming languages, and if we see one language gaining popularity we will make a point of highlighting it more prominently.

I look forward to continuing to use Google Analytics to strengthen our company websites and to help our clients strengthen theirs. Please don't hesitate to reach out at ask@endpoint.com if you would like to learn more.

MediaWiki major upgrade process

Keeping your MediaWiki site up to date with the latest version is, like many sysadmin tasks, a never-ending chore. In a previous article I covered how to upgrade minor revisions of MediaWiki with patches. In this one, I'll cover my solution to doing a "major" upgrade to MediaWiki. While the official upgrade instructions are good, they don't cover everything.

MediaWiki, like Postgres, uses a three-section version number in which the first two numbers combined give the major version, and the number on the end gives the revision of that branch. Thus, version 1.26.2 is the third revision (0, then 1, then 2) of the 1.26 version of MediaWiki. Moving from one major version to another (for example 1.25 to 1.26) is a larger undertaking than updating the revision, as it involves significant software changes, whereas a minor update (in which only the revision changes) simply provides bug fixes.

The first step of a major MediaWiki upgrade is to try it on a cloned, test version of your wiki. See this article on how to make such a clone. Then run through the steps below to find any problems that may crop up. When done, run through them again, but this time on the actual live site. For this article, we will use MediaWiki installed in ~intranet/htdocs/mediawiki, going from version 1.25.3 to 1.26.2.

Preparation

Before making any changes, make sure everything is up to date in git. You do have your MediaWiki site controlled by git, right? If not, go do so right now. Then check that you are on the main branch and have no outstanding changes. It should look like this:

$ cd ~/htdocs/mediawiki
$ git status
# On branch master
nothing to commit, working directory clean

Download

Time to grab the new major version. Always get the latest revision in the current branch. For this example, we want the highest in the 1.26 branch, which as of this writing is 1.26.2. You can always find a prominent link on mediawiki.org. Make sure you grab both the tarball (tar.gz) and the signature (.tar.gz.sig) file, then use gnupg to verify it:

$ wget https://releases.wikimedia.org/mediawiki/1.26/mediawiki-1.26.2.tar.gz
$ wget https://releases.wikimedia.org/mediawiki/1.26/mediawiki-1.26.2.tar.gz.sig
$ gpg mediawiki-1.26.2.tar.gz.sig 
gpg: assuming signed data in `mediawiki-1.26.2.tar.gz'
gpg: Signature made Sun 20 Dec 2015 08:13:14 PM EST using RSA key ID 23107F8A
gpg: please do a --check-trustdb
gpg: Good signature from "Chad Horohoe <chad@wikimedia.org>"
gpg:                 aka "keybase.io/demon <demon@keybase.io>"
gpg:                 aka "Chad Horohoe (Personal e-mail) <innocentkiller@gmail.com>"
gpg:                 aka "Chad Horohoe (Alias for existing email) <chadh@wikimedia.org>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 41B2 ABE8 17AD D3E5 2BDA  946F 72BC 1C5D 2310 7F8A

Copy the tarball to your server, and untar it in the same base directory as your mediawiki installation:

$ cd ~/htdocs
$ tar xvfz ~/mediawiki-1.26.2.tar.gz

Copy files

Copy the LocalSettings.php file over, as well as any custom images (e.g. the logo, which I like to keep nice and visible at the top level):

$ cp mediawiki/LocalSettings.php mediawiki-1.26.2/
$ cp mediawiki/Wiki_logo.png mediawiki-1.26.2/

Set up the images directory. The tarball comes with a dummy directory containing a few unimportant files. We want to replace that with our existing one. I keep the images directory a level up from the actual mediawiki directory, and symlink it in. This allows for easy testing and upgrades:

$ cd ~/htdocs/mediawiki-1.26.2
$ rm -fr images/ ## Careful, make sure you are in the right directory! :)
$ ln -s ../images/ .

Copy extensions

Now it is time to copy over the extensions. MediaWiki bundles a number of extensions in the tarball, as they are considered "core" extensions. We do not want to overwrite these with our old versions. We do want to copy any extensions that exist in our old mediawiki directory, yet not in our newly created one. To help keep things straight and reduce typing, let's make some symlinks for the existing (old) MediaWiki and for the current (new) MediaWiki, naming them "aa" and "bb" respectively. Then we use "diff" to help us copy the right extensions over:

$ cd ~/htdocs
$ ln -s mediawiki aa
$ ln -s mediawiki-1.26.2 bb
## Visually check things over with:
$ diff aa/extensions bb/extensions | grep 'Only in aa' | awk '{print $4}' | more
## Do the copying:
$ diff aa/extensions bb/extensions | grep 'Only in aa' | awk '{print $4}' | xargs -iZ cp -r aa/extensions/Z bb/extensions/Z

Extensions may not be the only way you have modified your installation. There could be skins, custom scripts, etc. Copy these over now, being sure to only copy what is truly still needed. Here's one way to check on the differences:

$ cd ~/htdocs
$ diff -r aa bb | grep 'Only in aa' | more

Check into git

Now that everything is copied over, we can check the 1.26.2 changes into git. To do so, we will move the git directory from the old directory to the new one. Remember to let anyone who might be developing in that directory know what you are doing first!

$ mv aa/.git bb/
## Don't forget this important file:
$ mv aa/.gitignore bb/
$ cd mediawiki-1.26.2
$ git add .
$ git commit -a -m "Upgrade to version 1.26.2"
$ git status
# On branch master
nothing to commit, working directory clean

Extension modifications

This is a good time to make any extension changes that are needed for the new version. These should have been revealed in the first round, using the cloned test wiki. In our case, we needed an updated and locally hacked version of the Auth_remoteuser extension:

$ cd ~/htdocs/mediawiki-1.26.2/extensions
$ rm -fr Auth_remoteuser/
$ tar xvfz ~/Auth_remoteuser.tgz
$ git add Auth_remoteuser
$ git commit -a -m "New version of Auth_remoteuser extension, with custom fix for wpPassword problem"

Core modifications

One of the trickiest parts of major upgrades is the fact that all the files are simply replaced. Normally not a problem, but what if you are in the habit of modifying the core files because sometimes extensions cannot do what you want? My solution is to tag the changes prominently - using a PHP comment that contains the string "END POINT". This makes it easy to generate a list of files that may need the local changes applied again. After using "git log" to find the commit ID of the 1.26.2 changes (the commit message was "Upgrade to version 1.26.2"), we can grep for the unique string and figure out which files to examine:

$ git log 1a83a996b9d00444302683fb6de6e86c4f4006e7 -1 -p | grep -E 'diff|END POINT' | grep -B1 END
diff --git a/includes/mail/EmailNotification.php b/includes/mail/EmailNotification.php
-        // END POINT CHANGE: ignore the watchlist timestamp when sending notifications
-        // END POINT CHANGE: send diffs in the emails
diff --git a/includes/search/SearchEngine.php b/includes/search/SearchEngine.php
-       // END POINT CHANGE: Remove common domain suffixes

At that point, manually edit both the new and old version of the files and make the needed changes. After that, remember to commit all your changes into git.

Final changes

Time to make the final change, and move the live site over. The goal is to minimize the downtime, so we will move the directories around and run the update.php script on one line. This is an excellent time to notify anyone who may be using the wiki that there may be a few bumps.

## Inform people the upgrade is coming, then:
$ mv mediawiki old_mediawiki; mv mediawiki-1.26.2 mediawiki; cd mediawiki; php maintenance/update.php --quick
$ rm ~/htdocs/aa ~/htdocs/bb

Testing

Hopefully everything works! Time to do some testing. First, visit your wiki's Special:Version page and make sure it says 1.26.2 (or whatever version you just installed). Next, test that most things are still working by:

  • Logging in, and...
  • Editing a page, then...
  • Uploading an image, plus...
  • Testing all your extensions.

For that last bullet, having an extension testing page is very handy. This is simply an unused page on the wiki that tries to utilize as many active extensions as possible, so that reloading the page should quickly allow a tally of working and non-working extensions. I like to give each extension a header with its name, a text description of what should be seen, and then the actual extension in action.

That's the end of the major upgrade for MediaWiki! Hopefully in the future the upgrade process will be better designed (I have ideas on that - but that's the topic of another article). One final check you can do is to open a screen and tail -f the httpd error log for your site. After the upgrade, this is a helpful way to spot any issues as they come up.
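
Something along these lines works well (the log path will vary with your distribution and virtual host setup):

$ screen
$ tail -f /var/log/httpd/error_log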

Install WordPress on Heroku in OS X Yosemite

I wanted to install WordPress locally for my blog (about programming!), but using MAMP, XAMPP, or even Vagrant for this seemed overkill. I wanted a light setup. PHP and Apache are already integrated into Mac OS X, so why not use them? I also wanted to deploy the app to Heroku, which was another consideration, since Heroku provides PostgreSQL, not MySQL, out of the box. I'd like to share my research on how I did it.

WordPress with Heroku support

I found this handy WordPress template with built-in Heroku support. It has everything one needs to run WordPress on Heroku: PostgreSQL for WordPress (because MySQL on Heroku is a paid service), Amazon S3 and Cloudfront for your uploads since Heroku has an ephemeral file system, WP Sendgrid to send emails and WordPress HTTPS. Check out a copy with this command:

git clone git://github.com/mhoofman/wordpress-heroku.git

Let's run the project locally first, because files cannot be written persistently to Heroku's file system, and updating and installing plugins or themes should be done locally anyway and then pushed to Heroku. I'm using PhpStorm for my PHP development.

Configuring Apache

mkdir -p ~/Sites
echo "<html><body><h1>My site works</h1></body></html>" > ~/Sites/index.html.en

Enable PHP support:

sudo vi /etc/apache2/httpd.conf

Uncomment the following lines to look like this:

LoadModule php5_module libexec/apache2/libphp5.so
LoadModule userdir_module libexec/apache2/mod_userdir.so
Include /private/etc/apache2/extra/httpd-userdir.conf

Save and exit. Open the following file:

sudo vi /etc/apache2/extra/httpd-userdir.conf

Uncomment the following line to look like this:

Include /private/etc/apache2/users/*.conf

Save and exit. Open or create:

sudo vi /etc/apache2/users/YOURUSERNAME.conf

Type the following in there:

<Directory "/Users/YOURUSERNAME/Sites/">
    AddLanguage en .en
    LanguagePriority en fr de
    ForceLanguagePriority Fallback
    Options Indexes MultiViews
    AllowOverride None
    Order allow,deny
    Allow from localhost
    Require all granted
</Directory>

Restart Apache with:

sudo apachectl restart

Go to http://localhost/~YOURUSER/wordpress-heroku/ and enjoy the results of your work! OK, not so fast! There are more steps to make it happen ;)

Enabling PostgreSQL for PHP

Your PHP installation appears to be missing the PostgreSQL extension which is required by WordPress with PG4WP.

Here is a handy script to fix this problem: Install PHP PGSQL extensions on Mac OS X Yosemite (change PHP_VER to your PHP version).

Creating the database

Hit http://localhost/~YOURUSER/blog-heroku/wp-admin/install.php and you will be greeted with:

Error establishing a database connection

The template we are using is tailored for deployment to Heroku, which means wp-config.php takes its values from the DATABASE_URL environment variable that Heroku sets, pointing at the database on Heroku's servers. Locally there is no such variable yet, so we need to create our own database:

$ pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start
$ createdb wordpress
$ psql wordpress
wordpress=# CREATE USER wordpress WITH PASSWORD 'wordpress';
wordpress=# GRANT ALL PRIVILEGES ON DATABASE wordpress TO wordpress;

In wp-config.php, edit as follows. Make sure it matches the database and user that you just created.

$db = parse_url($_ENV["DATABASE_URL"] ? $_ENV["DATABASE_URL"] : "postgres://wordpress:wordpress@localhost:5432/wordpress");

Now 5 hours later, you are completely ready for the famous 5-min install ;D. Go to http://localhost/~YOURUSER/blog-heroku/wp-admin/install.php

Uploading the custom theme/plugin

What to do next? Of course, upload a custom theme or plugin.

You may run into this error:

Unable to create directory wp-content/uploads/2015/08. Is its parent directory writable by the server?

You may also encounter an error asking you for FTP credentials:

To perform the requested action, WordPress needs to access your web server.
Please enter your FTP credentials to proceed.

If you do not remember your credentials, you should contact your web host.

The problem is that the Apache HTTP Server on Mac OS X runs under the user account _www, which belongs to the group _www. To allow WordPress to perform operations through Apache, one way is to change the owner of the wordpress directory and its contents to _www. Keep the group as staff, a group to which your user account belongs, and give write permissions to the group.

$ cd ~/Sites/THESITE
$ sudo chown -R _www wordpress
$ sudo chmod -R g+w wordpress

This way, no file nor directory is world-writable.

Remember to commit your plugins and themes: due to the nature of Heroku's ephemeral file system, any files that are not committed (or stored in the database) will be overwritten at each server restart, effectively wiping out changes made directly on the server.

I installed this pretty theme for myself called Literatum -- just bragging.

Deployment to Heroku

One of the most exciting last steps. This will make your blog visible to the world! Commit the changes:

rm -rf .git
git init
git add .
git commit -m "Initial commit"

Create Heroku app:

$ cd wordpress-heroku
$ heroku create
$ heroku addons:create heroku-postgresql
$ heroku pg:promote HEROKU_POSTGRESQL_INSTANCE
$ heroku addons:create sendgrid:starter

Your first deployment!

git push heroku master

Go to http://YOURAPP.herokuapp.com/wp-admin/install.php and run the famous 5-minute setup again, activate all the plugins and the chosen custom theme aaand... You are done!

Hope you will find this write-up useful and it will help you create your blog on the web!

Sort product attribute options by the position property in Magento

Introduction

Recently I was working with Magento 1.9.1, trying to get a list of dropdown attribute options sorted by the position property. However, there is a known bug in Magento 1.9.1 where the position property is not respected.

I looked for a patch to fix this issue, but there was no official patch, and none of the available community fixes were good enough. So, once again, I needed to fix it myself.

Tip! If you already know how to apply a patch file, it is here. If not, please continue.

Part 1

Unfortunately, we need to override some Magento core code. The good thing is that Magento has a clean way of doing this: rather than editing the core files directly, we create a local copy in the local code pool.

Copy the app/code/core/Mage/Catalog/Model/Resource/Product/Type/Configurable/Attribute/Collection.php file to app/code/local/Mage/Catalog/Model/Resource/Product/Type/Configurable/Attribute/Collection.php. You need to create the whole directory structure. On a Unix system it is as simple as (running from the Magento root):

mkdir -p app/code/local/Mage/Catalog/Model/Resource/Product/Type/Configurable/Attribute/
cp app/code/core/Mage/Catalog/Model/Resource/Product/Type/Configurable/Attribute/Collection.php app/code/local/Mage/Catalog/Model/Resource/Product/Type/Configurable/Attribute/Collection.php

Part 2

Fill the local copy with the content from this source: https://gist.github.com/peter-hank/c917394ea9f1171ddeb8

Notes

After these changes, attribute options will be sorted by the position set, as expected.

This fix should work for any Magento 1.9.* version.

File names the same except for capitalization

Most Unix filesystems, including all the common Linux ones, are fully case-sensitive, meaning you can have two files in the same directory that differ only by case:

  • a-very-nice-image.png
  • a-VERY-nice-image.png

However, this is not true on Windows and Mac OS X. They will preserve your chosen capitalization, but each file name must be unique regardless of the case.

I don't know of situations where it would be wise to have such conflicting mixed-case files even on Linux where it works fine. But for various reasons this can happen in the messy real world. If you then send those files to someone on Windows or Mac OS X in a zip file, or via Git version control, they're going to be confused.

When unzipping, usually the last file to be extracted will overwrite the earlier one with the nearly-same name. So a file that is perhaps important will just be mysteriously gone.

When pulling in files with Git, the same thing happens, but you also immediately have an unclean working copy that Git will tell you about:

$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   a-VERY-nice-image.png

To fix that, it's easiest to go back to the (presumably) Linux system where the conflicting files were created, and remove one of them. Then on the Windows or Mac OS X side, remove both files (with normal rm) and then git pull or at least git checkout . to write the surviving file to disk.

If you have a large set of files with several conflicting names scattered throughout and want an easy way to see them all at once so you can clean them up, there's a simple Unix pipeline you can run at the shell:

find . | sort -f | uniq -Di

That is a variation of one of the suggestions from a Stack Exchange discussion, but changed to show both file names with the uniq -D option that is only in the GNU version of uniq. Other versions have uniq -d which is almost as good but shows just one of the pair of other-than-case-duplicated names.
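
With the two example files from earlier sitting in the current directory, the GNU version's output lists both nearly-identical names next to each other:

./a-VERY-nice-image.png
./a-very-nice-image.png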

Happy new year, and I hope you never have need for any of this advice. :)

Liquid Galaxy at New York Tech Meetup

On December 15th, End Point presented the Liquid Galaxy at New York Tech Meetup, the largest meetup group in the world. End Point Co-Founder/President Ben Goldstein and I gave a five minute overview and then answered questions from the audience for another five minutes. At the conclusion of the presentation portion of the event, attendees went to the after-party, where we had a full Liquid Galaxy system set up for attendees to experience for themselves.

I opened up the presentation by speaking about where the Liquid Galaxy is being utilized (corporate offices, museums, and commercial enterprises around the world), and about the setup of the system (large paneled HDTVs, a touchscreen, 3D joystick, and 3 rack-mount servers). I talked about how the Liquid Galaxy was originally a tool designed to view Google Earth in an immersive setting, but from there End Point has expanded the Liquid Galaxy’s capabilities to make it a tool for educational and commercial use, marketing, sales, and research.

I went on to explain how End Point’s Content Management System gives users the ability to tell their story and show information and data points on the system. You can easily include images, panoramic videos, KML, and other sorts of overlays. By using your company’s GIS (Geographic Information System) data, you have an entirely new way of visualizing data and 3D models, and of presenting demographic and other critical information to your target audience, while still providing the excitement of the immersive experience. This includes things like population density, property value data, bar graphs, and more.

I closed by explaining how the view on each screen is calculated to match the physical angle of its display. When those displays are put in an arch around the viewers, the sense of immersion and depth comes alive, to the point of giving a real sense of flight and place.

From here, Ben G. touched on some of the more technical elements of the Liquid Galaxy. He explained that the main applications running on the Liquid Galaxy are Google Earth, Street View, a local panoramic still image and video viewer, and End Point’s Content Management System.

Ben G. discussed remote support, and how when End Point deploys a new system, it monitors the system in a variety of ways, one of which is with screen captures. Screenshots are generated so End Point can see remotely what its customers are seeing on the ground.

Ben G. finished the talk by explaining the architecture of a typical Liquid Galaxy system. The physical architecture has a head node and a number of display nodes. The head node is plugged into the network and serves as a NATing router for the display nodes, which run on an internal LAN. The display nodes are blank machines that netboot from an ISO on the head node. This makes it easy to remotely upgrade a system: End Point support personnel generate a new ISO on the head node and restart the display nodes remotely.

Q&A Session

After our presentation, Ben G. and I were asked questions about the Liquid Galaxy. Some highlights:
  • "How does the Liquid Galaxy fit in the context of AR and VR?" Ben G. explained that one thing that distinguishes the Liquid Galaxy is the communal experience. Often with VR, the user is having an individual experience. We foresee combining with the LG with a VR headset, so that you will be able to display what you’re seeing on the headset right onto the LG.
  • "How is End Point engaging content creators for the platform?" There is a big, vibrant community that we want to integrate with. For a simple but useful example, cell phones can generate panoramas, which can be dropped into the CMS.
  • "Can you use the Liquid Galaxy to map an emergency before and after the event?" Ben G: Absolutely. We think the LG is an excellent display platform to show events as they are occurring. One of the things that’s great is the Liquid Galaxy takes in your peripheral vision, so if you have a spherical or panoramic collection tool then you can really see it on the system.
  • "How close to live data can you get?" The Liquid Galaxy is primarily being run with Google Earth at the moment. You can incorporate panoramic images, depending on what your data source is. If you’d like to incorporate a panoramic webcam, your info can display instantaneously on the LG.

During the after-party, attendees had the opportunity to free fly on the Liquid Galaxy, to view presentations, and to navigate around on Street View. Six members of End Point’s team were present, so attendees also had the opportunity to ask questions and to learn more about the system.