How to Apply a Rails Security Patch

With the announcement of CVE-2013-0333, it's time again to secure your Rails installation. (Didn't we just do this?) If you are unable to upgrade to the latest, secure release of Rails, this post will help you apply a Rails security patch, using CVE-2013-0333 as an example.

Fork Rails, Patch

The CVE-2013-0333 patches so kindly released by Michael Koziarski are intended for folks who have forked the Rails repository. If you are unable to keep up with the latest releases, a forked repo can help you manage divergences and make it easy to apply security patches. Unfortunately, you cannot use wget to download the attached patches directly from Google Groups, so you'll have to do this in the browser and put the patch into the root of your forked Rails repo. To apply the patch:

cd $RAILS_FORK_PATH
git checkout $RAILS_VERSION
# Download attachment from announcement in browser, sorry no wget!
git am < $CVE.patch

You should see the newly committed patch(es) at the HEAD of your branch. Push out to GitHub and then bundle update rails on your servers.

Patching without Forks

If you are in the unfortunate case where there have been modifications or patches applied informally outside version control or you are otherwise compelled to modify the Rails source on your server directly, you are still able to use the provided patches.

Before beginning, take a look at the diffstat at the top of the patch:

 .../lib/active_support/json/backends/okjson.rb     |  644 ++++++++++++++++++++
 .../lib/active_support/json/backends/yaml.rb       |   71 +---
 activesupport/lib/active_support/json/decoding.rb  |    2 +-
 activesupport/test/json/decoding_test.rb           |    4 +-

As you can see, the base path of the diff is "activesupport". (The triple dots are simply there to truncate the paths so the diffstats line up nicely.) However, when the activesupport gem is installed on your system, the version number is appended to the path. This means we need to use the -p2 argument for patch to "strip the smallest prefix containing num leading slashes from each file name found in the patch file." For example, -p2 turns a/activesupport/lib/active_support/json/decoding.rb into lib/active_support/json/decoding.rb, which matches the layout of the installed gem. We'll see how to do this in just a second, but first, let's find the source files we need to patch.

Locating Rails Gems

To find the installed location of your Rails gems, make sure you are using the desired RVM installation@gemset (check with rvm current), and then run "gem env" and look for the "GEM PATHS" section. If you're using the user-based installation of RVM it might look something like this:

/home/$USER/.rvm/gems/ree-1.8.7-2012.02

Now that we know where the installed gems are, we need to get our patch and apply.

cd /home/$USER/.rvm/gems/ree-1.8.7-2012.02/gems/activesupport-2.3.15
# Download attachment from announcement in browser, sorry no wget!
patch -p2 < $CVE.patch

Oftentimes these patches include changes to tests that are not shipped with the installed ActiveSupport gem. You may get an error like this while patching CVE-2013-0333:

patch -p2 < cve-2013-0333.patch                                                 
patching file lib/active_support/json/backends/okjson.rb
patching file lib/active_support/json/backends/yaml.rb
patching file lib/active_support/json/decoding.rb
can't find file to patch at input line 768
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/activesupport/test/json/decoding_test.rb b/activesupport/test/json/decoding_test.rb
|index e45851e..a7f7b46 100644
|--- a/activesupport/test/json/decoding_test.rb
|+++ b/activesupport/test/json/decoding_test.rb
--------------------------
File to patch: 
Skip this patch? [y] y
Skipping patch.
1 out of 1 hunk ignored

This error just means patch cannot find the file test/json/decoding_test.rb. It's OK to skip this patch, because the file it modifies isn't shipped with the gem.

Verify the Patch is Installed

When doing any kind of security patching, it is essential that you have confidence your changes were applied successfully. The verification strategy will vary based on the type of changes made. For CVE-2013-0333, it's a fairly simple check.

# Before applying patch
script/console
Loading development environment (Rails 2.3.15)
>> ActiveSupport::JSON::DECODERS
=> ["Yajl", "Yaml"]

# After applying patch
script/console
Loading development environment (Rails 2.3.15)
>> ActiveSupport::JSON::DECODERS
=> ["Yajl", "OkJson"]

Evading Anti-Virus Detection with Metasploit

This week I attended a free, technical webinar hosted by David Maloney, a Senior Software Engineer on Rapid7's Metasploit team, where he is responsible for development of core features for the commercial Metasploit editions. The webinar was about evading anti-virus detection and covered topics including:

  • Signatures, heuristics, and sandboxes
  • Single and staged payloads
  • Executable templates
  • Common misconceptions about encoding payloads
  • Dynamically creating executable templates

After Kaspersky Lab broke news of the "Red October" espionage malware package last week, I thought this would be an interesting topic to learn more about. In the post, Kaspersky is quoted saying, "the attackers managed to stay in the game for over 5 years and evade detection of most antivirus products while continuing to exfiltrate what must be hundreds of terabytes by now."

Separating Exploits and Payloads

Vocabulary in the world of penetration testing may not be familiar to everyone, so let's go over a few terms you may see.

  • Vulnerability: A bug or design flaw in software that can be exploited to allow unintended behavior
  • Exploit: Software which takes advantage of a vulnerability allowing arbitrary execution of an attacker's code
  • Payload: Code delivered to a victim by an exploit
  • Signature: A set of rules or patterns matched against code
  • Sandbox: A protected segment of the OS where code can be run safely

Metasploit, by design, separates the payload from the exploit. Payloads come in two types. A single-stage payload includes all the code intended for use in the attack. A staged payload has a small initial exploit which then connects back to a server, using shell commands to download subsequent payloads. This is an important distinction because many anti-virus products have signatures for common first-stage exploits, but not for the much wider universe of secondary payloads. By building first-stage exploits that can evade detection, additional payloads can be installed and remain resident without detection.

A Unique Exploit for Every Target

To produce unique initial exploits that will not match anti-virus signatures, Metasploit Pro includes tools to bundle exploits inside otherwise randomly generated executables. These tools create C code that assigns variables in random order and with random values. Functions are created at random to manipulate and perform calculations on these variables. The functions are then called randomly, building a random call tree. This makes it very difficult to develop a signature because the execution flow and memory layout are different every time.

Of course, eventually, we want the random calculations to stop and the exploit to execute so a payload can be downloaded and executed. Amazingly, one of the key ways to hide the payload from the anti-virus is simply to wait to decode the encoded (obfuscated) exploit until after the anti-virus has completed its scan of the executable. Anti-virus vendors are keenly aware that their products hurt end user performance and so the amount of time which they can sandbox and scan an executable is limited. If the initial payload's random functions take a sufficient time, then the anti-virus releases the file from the sandbox. This delay is configurable and is very effective, allowing the exploit to be decoded and executed without detection.

The Next Generation of Exploits

It's been 8 months since these randomization generators were released with Metasploit Pro, and anti-virus companies are starting to catch up. Still, only 8 of the 44 scanners used at VirusTotal detected one of these exploits bundled with randomized code. The next generation of generators is designed to avoid using shell code entirely, further reducing anti-virus products' ability to detect malicious behavior. Instead of shell code, system calls are starting to be used directly, pulling payloads directly into memory. Since anti-virus depends heavily on scanning writes to the file system, this also reduces the exploit's surface area. PowerShell version 2.0 seems to be the vehicle of choice for this next generation of exploits, and so far it has gone completely unnoticed by anti-virus vendors (according to David, anyway).

JavaScript-driven Interactive Highlighting

One project I've been involved in for almost two years here at End Point is the H2O project. The Ruby on Rails web application behind H2O serves as a platform used by professors and their students for creating, editing, organizing, consuming, and sharing course materials.

One of the most interesting UI elements of this project is the requirement to allow highlighting and annotating text interactively. For example, when one reads a physical textbook for a college course, they may highlight and mark it up in various ways with different colors and add annotated text. They may also highlight a section that is particularly important for an upcoming exam, or they may highlight another section with a different color and notes that may be needed for a paper.


An example of highlighted text, by sergis on Flickr

The H2O project has required support for digitizing this interactive highlighting and annotating. Since individual words of text are not natively selectable as DOM elements, each word is wrapped in its own DOM element that is selectable, hoverable, and can have DOM properties assigned to it. For example, we have the following text:

The cow jumped over the moon.

Which is manipulated to create individual DOM elements by word:

<span>The </span>
<span>cow </span>
<span>jumped </span>
<span>over </span>
<span>the </span>
<span>moon.</span>

And an id is assigned to each element:

<span id="e1">The </span>
<span id="e2">cow </span>
<span id="e3">jumped </span>
<span id="e4">over </span>
<span id="e5">the </span>
<span id="e6">moon.</span>

This markup is the foundation of digitized highlighting behavior: it allows the starting and ending boundaries of a highlighted section to be selected. It also allows additional highlighting boundary elements to be created before and after highlighted text, as well as providing the ability to interactively toggle highlighting. Without this, we can't easily identify starting and ending points other than by methods such as substringing the text or using positioning details to identify the current word.
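The wrapping itself can be done with a small amount of jQuery. Below is a minimal sketch of that step, assuming the content lives in paragraphs under a hypothetical #content container and ignoring details like nested markup and HTML escaping; it is not the actual H2O implementation.

// Wrap each word of every paragraph under #content in its own span
// with a sequential id (e1, e2, ...). The selector and id prefix are
// assumptions for illustration only.
var counter = 0;
$('#content p').each(function() {
  var words = $.trim($(this).text()).split(/\s+/);
  var wrapped = $.map(words, function(word) {
    counter += 1;
    return '<span id="e' + counter + '">' + word + ' </span>';
  });
  $(this).html(wrapped.join(''));
});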

How do we Highlight?

With our example, I'll describe the history of highlighting functionality that I've been involved in on the project. For this post, let's say our desired emulated highlighted behavior is the following, where "cow jumped over" is highlighted in pink, and "over the" is highlighted in blue:

The cow jumped over the moon.

And our HTML markup may look like this to indicate the highlighted layers:

<span id="e1">The </span>
<span id="e2" class="pink">cow </span>
<span id="e3" class="pink">jumped </span>
<span id="e4" class="pink blue">over </span>
<span id="e5" class="blue">the </span>
<span id="e6">moon.</span>

Highlighting Iteration 1

One of the challenges with emulating highlighting is that a DOM element can have only one background color, so we can't easily layer pink and blue highlights over a specific word to give a combined highlighted effect. In our example, the pink-only and blue-only words will show up fine, but the overlapping word can only show one of the two colors. During my first iteration on this functionality, I implemented behavior to track the history of highlights per textual node. The following steps are an example of a use case that demonstrates the highlight overlap:

  • Starting from unhighlighted text, first a user highlights pink:
    The cow jumped over the moon.
  • Next, a user highlights blue:
    The cow jumped over the moon.
  • Next, a user unhighlights blue:
    The cow jumped over the moon.
  • Finally, a user unhighlights pink:
    The cow jumped over the moon.

Or another simple use case:

  • Starting from unhighlighted text, first a user highlights pink:
    The cow jumped over the moon.
  • Next, a user highlights blue:
    The cow jumped over the moon.
  • Next, a user unhighlights pink:
    The cow jumped over the moon.
  • Finally, a user unhighlights blue:
    The cow jumped over the moon.

Programmatically, this method required that a history of highlights be stored on each text node in the form of an array. When something was highlighted or unhighlighted, the array was manipulated to add or remove highlights, and the most recent remaining highlight was applied as the node's background. While this method was fairly simple to implement, it did not let users visualize overlapping highlighted sections. This iteration was in place for over a year, but ultimately it did not adequately represent overlapped highlights.
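Here is a rough sketch of that first approach, with illustrative function and data names rather than the actual H2O code: each word span keeps an array of the highlight colors applied to it, and its background is always the most recently applied color still in the array.

// Toggle a highlight color on a set of word spans, tracking a per-node
// history with jQuery's data() so unhighlighting falls back to the
// previously applied color.
function toggleHighlight(selector, color, turnOn) {
  $(selector).each(function() {
    var node = $(this);
    var history = node.data('highlight_history') || [];
    if (turnOn) {
      history.push(color);
    } else {
      var idx = history.indexOf(color);
      if (idx !== -1) { history.splice(idx, 1); }
    }
    // Show the last highlight still in the history, or none at all.
    node.css('background-color', history.length ? history[history.length - 1] : '');
    node.data('highlight_history', history);
  });
}

// Example: highlight "cow jumped over" pink, then remove it again.
toggleHighlight('#e2, #e3, #e4', 'pink', true);
toggleHighlight('#e2, #e3, #e4', 'pink', false);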

Highlighting Iteration 2

In the next iteration, I attempted to implement a method that added opaque, absolute and fixed positioned elements underneath the text, similar to how one might see layers in Photoshop:


In the second iteration, additional colored & opaque nodes were created under the text to provide a layered highlighting effect.


Unhighlighted markup looked like this:

<span id="e1">
    <span class="highlights"></span>
    The
</span>
<span id="e2" class="pink">
    <span class="highlights"></span>
    cow
</span>
<span id="e3" class="pink">
    <span class="highlights"></span>
    jumped
</span>
<span id="e4" class="pink blue">
    <span class="highlights"></span>
    over
</span>
<span id="e5" class="blue">
    <span class="highlights"></span>
    the
</span>
<span id="e6">
    <span class="highlights"></span>
    moon.
</span>


And highlighted markup looked like this:

<span id="e1">
    <span class="highlights"></span>
    The
</span>
<span id="e2" class="pink">
    <span class="highlights">
        <span class="highlight_pink"></span>
    </span>
    cow
</span>
<span id="e3" class="pink">
    <span class="highlights">
        <span class="highlight_pink"></span>
    </span>
    jumped
</span>
<span id="e4" class="pink blue">
    <span class="highlights">
        <span class="highlight_pink"></span>
        <span class="highlight_blue"></span>
    </span>
    over
</span>
<span id="e5" class="blue">
    <span class="highlights">
        <span class="highlight_blue"></span>
    </span>
    the
</span>
<span id="e6">
    <span class="highlights"></span>
    moon.
</span>

In the above markup, the following should be noted:

  • Each span.highlights node is absolutely positioned with a width and height matching the text node.
  • Each span.highlights span node (e.g. a node with the highlight_pink class) has a width and height of 100%, a background color defined in CSS, and an opacity that is scaled based on the number of highlights.
  • Whenever highlights are toggled in the text, the child nodes of span.highlights are added or removed and the opacity is recalculated (a rough sketch of this toggling follows this list).
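The sketch below illustrates that layering approach using the markup conventions shown above; the helper name is made up for illustration, and the positioning and sizing of span.highlights is assumed to be handled in CSS as described in the notes.

// Add or remove a colored layer inside each matched word's
// span.highlights container, then rescale every layer's opacity so
// stacked layers tint the text instead of hiding it.
function toggleLayeredHighlight(selector, colorClass, turnOn) {
  $(selector).each(function() {
    var layers = $(this).find('span.highlights');
    if (turnOn) {
      layers.append('<span class="highlight_' + colorClass + '"></span>');
    } else {
      layers.find('span.highlight_' + colorClass).remove();
    }
    var count = layers.children().length;
    layers.children().css('opacity', count ? 0.6 / count : 0);
  });
}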

While this functionality provides a nice highlighted layering effect, absolute positioning is probably my least favorite thing to work with in cross browser development, and it did not always behave as expected. Specifically, IE and Chrome behaved somewhat as expected, but Firefox did not. This absolute positioning also caused problems with other absolutely positioned elements on the page.

Additionally, this markup caused significant performance issues. In one case, content with 44,000 words (each its own text node) was already slow, and the additional highlighting layers made things so much worse that Chrome would not load the content at all and Firefox took more than 30 seconds to load it. So I went searching for a better solution.

Highlighting Iteration 3

Finally, in the most recent iteration, after identifying the performance issues of iteration 2, more research led me to the jQuery xColor plugin. This plugin allows you to do mathematical operations on colors, such as combining them. While the plugin itself only combines two colors at a time, I created a method to combine multiple opaque layers:

$.each($('span.' + highlighted_class), function(i, el) {
    var current = $(el);
    var highlight_colors = current.data('highlight_colors');
    if(highlight_colors) {
        highlight_colors.push(hex);
    } else {
        highlight_colors = new Array(hex);
    }
    var current_hex = '#FFFFFF';
    var opacity = 0.4 / highlight_colors.length;
    $.each(highlight_colors, function(i, color) {
        var new_color = $.xcolor.opacity(current_hex, color, opacity);
        current_hex = new_color.getHex();
    });
    current.css('background', current_hex);
    current.data('highlight_colors', highlight_colors);
});

Step by step, the above code does the following:

  • For each span node with a specific highlight:
    • Retrieve the array of highlights applied to that node, or create a new array with the new highlight color.
    • For each highlight, layer an opaque version of that highlight on top of the summation of colors.
    • Set the background to the final combination of layered highlights.
    • Store the new array of highlights applied to that node.

The markup is back to the original markup with no additional children elements per text node. Each highlighting interaction triggers a recalculation of the background color per text node based on the data stored to that node.

Conclusion

Significant limitations during this work have included the performance issues described above, as well as the inability to set multiple background images on a DOM element. Absolute positioning, while valuable at times, proved quite challenging because of other absolutely and fixed positioned elements on the page. In addition to emulating the highlighting behavior itself, there are further interactive requirements included with the interactive highlighting.


Example of additional features needed in interactive highlighting tool.

Additional UI functional requirements include:

  • Functionality to show paragraph numbers, and to hide paragraph numbers when they have no visible children (hint: advanced jQuery selectors are used here; a rough sketch follows this list).
  • Ability to toggle display of unhighlighted text. The […] in the above image trigger the unhighlighted text in that section to display, while the left and right arrows trigger the unhighlighted text in that section to be hidden.
  • Ability to toggle display of highlighted text, similar to the toggle of unhighlighted text.
  • Ability to toggle between a "read" and "edit" mode for owners of the text, which allows for these users to interactively add additional highlights and dynamically modify the markup. In the edit mode, additional markup is added to identify these highlighted layers.
  • Ability to toggle display of the annotation. In the above image, clicking on the green asterisk toggles this display. No asterisk is shown if there is no annotation.
  • Ability for highlights to encompass HTML nodes that are not individual span elements. For example, a highlighted section may span multiple paragraphs and headers full of <span> nodes, which is why simply wrapping the highlighted text in a single element will not work.
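As one example of the selector logic hinted at in the first item above, the snippet below hides any paragraph number whose paragraph no longer has any visible word spans. The class names (.paragraph, .paragraph-number) are hypothetical stand-ins, not the actual H2O markup.

// Hide paragraph numbers for paragraphs whose word spans are all hidden.
$('.paragraph-number').each(function() {
  var paragraph = $(this).closest('.paragraph');
  var visibleWords = paragraph.find('span:visible').not(this).length;
  $(this).toggle(visibleWords > 0);
});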

One might suggest we go to a better tool to manage content markup, but ultimately these types of markup tools do not provide the interactivity we seek, and they require that the end user have knowledge of HTML markup, which is not always the case.

Create a key pair using SSH on Windows

I recently joined End Point as a full-time employee after interning with the company since August 2012. I am part of the marketing and sales team, working out of the New York City office.

One of the frequent queries we receive from our non-technical clients is how to create an SSH key pair. This post is an introduction to using SSH on Windows for anyone who needs some clarification on this network protocol.

SSH stands for Secure Shell, which is used to provide secure access to remote systems. PuTTY is an SSH client that is available for Windows. Using the concept of “key-based” SSH logins, you can avoid the usual username/password login procedure, meaning only those with a valid private/public key pair can log in. This allows for a more secure system.

To begin, install PuTTYgen, PuTTY, and Pageant on your Windows system.

Let’s focus on PuTTYgen – used to create a private/public key pair.

  1. After downloading PuTTYgen, run puttygen.exe.
  2. In the “Parameters” section, under “Type of key to generate”, make sure “SSH-2 RSA” is selected. (Note: SSH-2 RSA is what End Point recommends. The others work as well, and your business may have some reason to use them instead.)
  3. Click “Generate” in the “Actions” section above.
  4. Once the key starts to generate, follow the on-screen instructions by moving the mouse pointer over the blank area to generate some randomness. The faster you move the mouse, the faster the key will generate.
  5. Congratulations! Your private/public key pair has been generated. Next to “Key comment”, enter text (usually your email address). Next, enter and confirm your “Key passphrase”. Note: you will need this passphrase to log in to SSH with your new key, so note it down in a secure place. The passphrase should be unique and not used for anything else, especially not any online services, to limit exposure if some other account is “hacked”. (Also note: the default key size of 1024 bits is a minimum, but 2048 or 4096 bits makes more sense given the power of modern computers to crack keys.)
  6. Next, click “Save private key” and save it on your computer (in a location only you can access). Do the same for “Save public key”.
  7. After you’re done saving, copy the public key from the section titled “Public key for pasting into OpenSSH authorized_keys file” and send it to your server administrator.

The above description demonstrates only one of the uses of SSH: a reasonably secure method of remotely connecting to another computer. Once a secure SSH connection has been established, it can also be used to transfer files over an encrypted channel using SCP, or “Secure Copy”. In addition, SFTP, or “Secure File Transfer Program”, is an encrypted method of transferring files between two computers. SFTP is the secure alternative to “classic FTP” and the best common secure option for transferring files to or from a server.

CSS sprites: The easy way?

I've always been interested in the use of CSS sprites to speed up page load times. I haven't had a real chance to use them yet, but my initial reaction was that sprites would be quite painful to maintain. In my mind, you would have to load the sprite into Gimp or Photoshop, add the new image, and then create the CSS with the right coordinates to display it. Being a guy with very little image editing skill, I felt that managing multiple images frequently would be quite time consuming. Recently, I was dealing with page load times for a client, and the use of sprites on the product listing pages came up as an option to speed them up. I knew the client wouldn't have time to create sprites for this, so I went searching for a command line tool that would let me create them. I was quite happy when I stumbled upon Glue.

Glue is a free program that takes a directory of images and creates a PNG sprite plus a CSS file with the associated CSS classes. It has a ton of useful options. A few I found handy were: prefixing the image path with a URL instead of a relative path, downgrading the PNG format to PNG8 to make the files much smaller, and specifying a prefix for the generated class names to make them easier to use on a dynamic page.

The most basic use of the command is as follows:

glue blog output

In this example, blog is the directory containing the images you want Glue to combine into a sprite, and output is the directory where it will generate the blog.png and blog.css files.

The output css file looks like this:

/* glue: 0.2.9.1 hash: f9c9d6aa5b */
.sprite-blog-product2,
.sprite-blog-product1{background-image:url('blog.png');background-repeat:no-repeat}
.sprite-blog-product2{background-position:0px 0px;width:159px;height:200px;}
.sprite-blog-product1{background-position:-159px 0px;width:200px;height:168px;}

The naming convention by default is sprite-$input_directory-$filename. You can override this with a few options. The version number and hash are used by Glue to ensure it doesn't rebuild the sprite if none of the source images have changed. With these settings, I believe this could be a great program to run as a nightly routine to rebuild the sprites. This is the explanation from the documentation:

By default glue store some metadata inside the generated sprites in order to not
rebuild it again if the source images and settings are the same. Glue set two different
keys, glue with the version number the sprite was build and hash, generated using the 
source images data, name and all the relevant sprite settings like padding, margin 
etc...

I'm still tinkering with all the different options and thinking about how to fit this program into our workflow for this client. I'll make sure to write a follow-up with more information as I learn more.

If you have a different command line tool which helps manage sprites, don't hesitate to leave a comment and let me know!



Camp tools

Devcamps are such a big part of my everyday work that I can't imagine life without them. Over the years, I developed some short-cuts in navigating camps that I also can't live without: I share them below.

function camp_top() {
  if [ -n "$1" ]
  then
      cd ~/camp${1}
  elif [[ $(pwd) =~ 'camp' ]]
  then
      until [[ $(basename $(pwd)) =~ ^camp[[:digit:]]+ ]]
      do
          if [[ $(pwd) =~ 'camp' ]]
          then
              cd ..
          else
              break
          fi
      done
  fi
}
alias ct='camp_top; pwd'

function cat_root() {
  camp_top $*
  cd catalogs/* >/dev/null
}
alias cr='cat_root; pwd'

function pages_root() {
  cat_root $*
  cd pages >/dev/null
}
alias pr='pages_root; pwd'

function what_camp() {
  c=$( camp_top $* 2> /dev/null; basename $( pwd ))
  echo $c
}

("cat_root" and "pages_root" are very Interchange-specific; you may find other short-cuts more useful in your particular camp.)

There's nothing terribly ground-breaking here, but if bash is not your native shell-tongue, then you might find these useful.

What I do is to stash these somewhere like "$HOME/.bash_camps", then alter my .bashrc:
# Source campy definitions
if [ -f ~/.bash_camps ]; then
 . ~/.bash_camps
fi

That's all it takes. Have you a camp-y shell script, function, or alias? Please share in the comments!

Use Metasploit to Verify Rails is Secured from CVE-2013-0156

On January 8th, 2013 Aaron Patterson announced a major security vulnerability on the Rails security mailing list, affecting all releases of the Ruby on Rails framework. This vulnerability allows an unskilled attacker to execute commands remotely on any unpatched Rails web server. Unsurprisingly, it's getting a lot of attention; Ars Technica estimates more than 200,000 sites may be vulnerable. With all the hype, it's important to separate the facts from the fiction and use the attacker's own tools to verify your site is secure.

Within 36 hours of the announcement of CVE-2013-0156, the developers at Rapid7 released a Metasploit exploit module. Metasploit lowers the barriers to entry for attackers, making the whole process a point-and-click affair with a slick web GUI. Fortunately, the Rails security team has provided many easy-to-implement mitigation options. But how do you *know* you've really closed the vulnerability, particularly to the most automated and unskilled attacks? There's no better way than to try to exploit it yourself.

It's best to scan your unpatched site first so you can be certain the scan is working as expected and you don't end up with a false positive that you've eliminated the vulnerability. Here is the quick and dirty introduction to running Metasploit, and executing a scan:

UPDATE: I've changed the Metasploit instructions here a bit to include setting the VHOST option. Teammate Steph Skardal was using these instructions and together we found that without a VHOST set, the RHOSTS are resolved to an IP address. It's worth checking your Rails logs to verify a request is being received and processed. If you don't see anything there, check your nginx or Apache (or whatever) access logs for any possible 301 redirects.

git clone git://github.com/rapid7/metasploit-framework.git
cd metasploit-framework
./msfconsole
use auxiliary/scanner/http/rails_xml_yaml_scanner
set RHOSTS mycompany.com
set VHOST app.mycompany.com
set RPORT 80
set URIPATH /rails_app
set VERBOSE true
show options
run
[+] mycompany.com:80 is likely vulnerable due to a 500 reply for invalid YAML
[*] Scanned 1 of 1 hosts (100% complete)
[*] Auxiliary module execution completed

If you don't get back a "likely vulnerable" message, it's probably because you're still running Ruby 1.8.7. As of this writing, the Metasploit exploit module states:

The technique used by this module requires the target to be running a fairly recent version of Ruby 1.9 (since 2011 or so). Applications using Ruby 1.8 may still be exploitable using the init_with() method, but this has not been demonstrated.

It's only a matter of time before Ruby 1.8 support is added, but this does give some folks a bit more time. Now let's review the mitigation strategies provided by the announcement to show you just how easy it can be to secure yourself.

Disabling XML Entirely

The nature of the vulnerability is in parsing XML in request parameters. If you don't parse XML, you should disable XML parsing entirely by placing one of the following snippets inside an application initializer.

# Rails 3.2, 3.1 and 3.0 
ActionDispatch::ParamsParser::DEFAULT_PARSERS.delete(Mime::XML) 
# Rails 2.3 
ActionController::Base.param_parsers.delete(Mime::XML) 

Removing YAML and Symbol support from the XML parser

I couldn't say it better than Aaron myself, so I'll give it to you straight from the announcement:

If your application must continue to parse XML you must disable the YAML and Symbol type conversion from the Rails XML parser. You should place one of the following code snippets in an application initializer to ensure your application isn't vulnerable. You should also consider greatly reducing the value of REXML::Document.entity_expansion_limit to limit the risk of entity explosion attacks.
The entity_expansion_limit recommendation is not strictly part of CVE-2013-0156, but should be implemented as well to limit your exposure to entity explosion attacks.

To disable the YAML and Symbol type conversions for the Rails XML parser add these lines to an initializer:

#Rails 3.2, 3.1, 3.0 
ActiveSupport::XmlMini::PARSING.delete("symbol") 
ActiveSupport::XmlMini::PARSING.delete("yaml") 

#Rails 2.3 
ActiveSupport::CoreExtensions::Hash::Conversions::XML_PARSING.delete('symbol') 
ActiveSupport::CoreExtensions::Hash::Conversions::XML_PARSING.delete('yaml') 

Removing YAML Parameter Parsing

While it's *much* less common to parse YAML in request params than XML, Rails does support this, though not by default (except in version 1.1.0!). There is no fix for YAML params injection, so it must be disabled. The methods for doing this differ among Rails versions.

# Rails 2.x: find and remove all instances
ActionController::Base.param_parsers[Mime::YAML] = :yaml

# Rails 3.x: add to initializer
ActionDispatch::ParamsParser::DEFAULT_PARSERS.delete(Mime::YAML)

Go Check your Site!

As you can see, with just a few lines of code, any site can manage their exposure to this risk. I strongly urge you to read the security announcement and avoid the hype. Then go patch your site!

Conversion Tracking via JavaScript

Most analytics conversion tracking these days is done with JavaScript or invisible pixel requests on a page that indicates a user has reached a conversion event, such as the receipt page. For example, Google Analytics conversion code might look like this on the receipt page:

_gaq.push(['_setAccount', 'UA-XXXXX-X']);
_gaq.push(['_trackPageview']);
_gaq.push(['_addTrans',
   '1234',           // transaction ID - required
   'Womens Apparel', // affiliation or store name
   '28.28',          // total - required
   '1.29',           // tax
   '15.00',          // shipping
   'San Jose',       // city
   'California',     // state or province
   'USA'             // country
]);
_gaq.push(['_addItem',
   '1234',           // transaction ID - necessary to associate item with transaction
   'DD44',           // SKU/code - required
   'T-Shirt',        // product name
   'Olive Medium',   // category or variation
   '11.99',          // unit price - required
   '1'               // quantity - required
]);
_gaq.push(['_trackTrans']);

But what happens when a single page with various JavaScript-driven UI updates drives a user to a conversion event via AJAX? All of the conversion tracking then has to be triggered via JavaScript after the conversion event occurs. After some experimentation and verification with Google's Developer Tools (Network panel) or Firebug's Net tool, I've implemented the following tracking services and code in JavaScript upon a conversion event for our Ruby on Rails client Mobixa:

MSN Conversion

tagid = /* tag id */
domainid = /* domain id */
actionid = /* action id */
jQuery('<iframe src="//flex.atdmt.com/mstag/tag/' + tagid
  + '/analytics.html?dedup=1&domainId=' + domainid + '&type=1&actionid='
  + actionid + '" frameborder="0" scrolling="no" width="1" height="1"'
  + ' style="visibility:hidden;display:none"></iframe>')
  .appendTo('#hidden_tracking');

Google Ad Services

google_conversion_id = /*conversion id*/
google_conversion_label = /* conversion label */
var image = new Image(1,1);
image.src = 'http://www.googleadservices.com/pagead/conversion/' +
  google_conversion_id + '/?label=' + google_conversion_label + 
  '&value=1&guid=ON&script=0';

Floodlight Conversion

jQuery('<img height="1" width="1" src="http://sa.jumptap.com/a/conversion?'
  + 'event=Purchase" />').appendTo('#hidden_tracking');

AdParlor Conversion

adid = /* Ad Parlor id */
jQuery('<img height="1" width="1" alt="AP_pixel"
  src="http://fbads.adparlor.com/conversion.php?adid="'
  + adid + '  />').appendTo('#hidden_tracking');

AdKnowledge

adknowledgeid = /* adknowledge id */
jQuery('<iframe src="https://www.lynxtrack.com/track.frame.php?g='
  + adknowledgeid + '&o=' + /* order number */ + '&s=' + /* total */
  + '" height="1" width="1" frameborder="0">'
  + '<script language="JavaScript" src="https://www.lynxtrack.com/trackjs/g-'
  + adknowledgeid + '/o-' + /* order number */ + '/s-' + /* total */
  + '/track.js"> </script><noscript>'
  + '<img src="https://www.lynxtrack.com/track.php?g=' + adknowledgeid
  + '&o=' + /* order number */ + '&s=' + /* total */ + '" width="1"'
  + ' height="1" border="0"></noscript></iframe>')
  .appendTo('#hidden_tracking');

Google Analytics Ecommerce Conversion

_gaq.push(['_addTrans',
    /* order number */,
    /* affiliation */,
    /* total */,
    /* tax */,
    /* shipping */,
    /* city */,
    /* state */,
    /* country */
]);
$.each(purchased_items, function(i, item) {
    _gaq.push(['_addItem',
        /* order number */,
        /* item sku */,
        /* item name */,
        /* item category */,
        /* item price */,
        /* item quantity */
    ]);
});
_gaq.push(['_trackTrans']);

Conclusion

Here are a few important takeaways in working with conversion tracking via JavaScript:

  • Most of the above conversion tracking calls have a specific ID that is provided by the marketing service.
  • Images and iframes are appended to a div with an id of hidden_tracking on the page to trigger the conversion request. Tracking did not appear to work if the images or iframes were appended to the body element. Also in the case of Google Ad Services conversion, the image itself did not have to be appended to the page; a request alone was enough.
  • It's important and extremely helpful to use Google Developer Tools or Firebug to verify these requests go through during development.
  • Google Analytics tracking does not look much different from non-AJAX conversion tracking, but much of the other tracking code differs from what you might see in on-page tracking events.

Company Update January 2013

With the busy holiday season just behind us, we haven’t had as much time to write blog posts about what we’ve been doing in the past few months. So here’s an update on some of our latest projects:

  • Brian Buchalter has been implementing new features for a major release of Collaborative Software Initiative’s open source product, TriSano, which provides case and outbreak management, surveillance, and analytics for global public health.
  • Dave has worked on deepening our contacts with content providers and agencies using or interested in using the Liquid Galaxy platform. Recently back from Japan, Dave sold a Liquid Galaxy system to a research group in Kyoto.
  • David has been working on an HA (highly available) PostgreSQL database system with automatic failover, dynamic node creation/population, and configuration synchronization.
  • Greg Sabino Mullane has been speeding up slow queries, debugging pg_bouncer problems, expanding the abilities of Bucardo, and many other PostgreSQL-related activities.
  • Jeff has been working on some major updates to the HydroPool site, including adding a set of “parts” products from an external supplier, with an interface that displays the schematic diagram of selected pool products and offers a way to order parts based on the exploded diagram. This has involved extensive amounts of AJAX/JSON, both in the customer UI and in the background, allowing a hierarchical drill-down into the categories, manufacturers and models.
  • Jon has been helping tackle scalability problems at two busy sites using PHP and MySQL, and has done some system intrusion forensics and cleanup work as well as implemented some PCI compliance remedies. He has also been working with international payments, upgrading some outdated servers, and hiring new people to help out with development and system administration.
  • Josh Tolley has been working on improving ocean-related content for Liquid Galaxy, and making new and better ways to display that content. He has also been working on TriSano for the Collaborative Software Initiative.
  • Kamil has worked on making Locate Express multi-tenant across both the Rails-based admin area and the Sinatra-based API front end. He has also been doing browser testing for Mobixa and its email verification.
  • Mark has been involved in developing and supporting payment APIs for our customers, primarily with PayPal, and investigating other processors to support multiple currencies. He has also been helping with moving one of our client’s file-backed sessions to database-backed sessions on Interchange.
  • Phunk has been working as the tech lead on a large search API and metadata project using Rails 3, ElasticSearch, and CouchDB, all running on the DevCamps system.
  • Richard has been setting up two new servers for Partner Fusion. One of these servers will be a web server using nginx to serve as a caching proxy to Apache for delivering PHP code. The other will be a development and database server with two Percona MySQL 5.5 daemons on different ports as replication slaves for two different origin servers.
  • Ron helped with the setup and migration to a new server for Lens Discounters, launched some new shop features for College District, started working on a new project for FrozenCPU, and worked on several internal projects.
  • Steph has continued working on building out features for Piggybak, including product variants, a demo tour, and documentation on Heroku support. She’s been involved in building out features and doing maintenance for Mobixa. For Paper Source, she has been working on full page caching via nginx, mentioned in this article yesterday.
  • Szymon has been working on several Python projects, including web and command-line applications which manage data in PostgreSQL and CouchDB databases. He has also been working on a special script creating customized PostgreSQL database dumps making a deep copy of part of a table with all dependent rows from other tables.
  • Tim worked on a large Rails and Solr project, wrote an extension for Piggybak that integrates the Stripe payment gateway, and did some Ruby on Rails work for Mobixa.
  • Zed has been working on a new timesheet app for Android which will let us enter our work time on the go without an Internet connection.

We also recently welcomed new full-time employees Bianca Rodrigues, Will Plaut, and Miguel Alatorre, who are quickly getting up to speed.

Happy new year, everyone!

Paper Source: The Road to nginx Full Page Caching in Interchange

Background & Motivation

During the recent holiday season, it became apparent that some effort was needed to improve performance for Paper Source to minimize downtime and server sluggishness. Paper Source runs on Interchange and sells paper and stationery products, craft products, personalized invitations, and some great gifts! They also have over 40 physical stores which, in addition to selling products, offer on-site workshops.

Over the holiday season, the website experienced a couple of instances where server load spiked, causing extreme sluggishness for customers. Various parts of the site leverage Interchange's timed-build tag, which creates static caches of parts of a page (equivalent to Rails' and Django's fragment caching). However, in all cases Interchange is still hit for the page request, and the pages often repeat logic and database hits, which opens an opportunity for optimization.

The Plan

The long-term plan for Paper Source is to move towards full page nginx caching, which will yield speedily served pages that do not require Interchange to be touched. However, there are several code and configuration hurdles that we have to get over first, described below.

Step 1: Identify Commonly Visited Pages

First, it's important to recognize which pages are visited the most frequently and to tackle optimization on those pages first, essentially profiling the site to determine where we will gain the most from performance optimization. In the case of Paper Source, popular pages include:

  • Thumbnail page, or the template where multiple products are shown in list format
  • Product detail page, or the template that serves the basic product page
  • Swatching page, or the template that serves a special product page with special product options
  • Personalization detail page, or a template that serves the special product page for personalizable products (e.g. wedding invitations, birth announcements, etc.)

Step 2: Remove Dynamic User Elements on pages of interest

The next step in the process is to remove dynamic elements from these pages and instead render them with cookies or AJAX. Below are a couple of examples of these dynamic elements on two primary page templates.


The thumbnail page contains two dynamic elements: the mini-cart template, which shows how many items are in the user's cart, and the log in information, which shows "my account" and "log out" links if the user is logged in, and shows a "log in" link if the user is not logged in.


In addition to the mini-cart and logged in elements, the product page contains additional dynamic elements which signify if a user has added an item to their cart, and presentation of the user's previously viewed items.

In the examples above, the following changes were applied to replace these dynamic elements:

  • Mini-cart Component: This section utilizes cookies with the jQuery.cookie plugin. A cookie stored in the browser identifies the number of cart items and the cart subtotal. After the DOM loads, the mini-cart is rendered and displayed if the user has a non-empty cart. These cookies are updated whenever the cart contents are modified (a simplified sketch follows this list).
  • Login Component: This section also reads a browser-stored cookie. If the cookie indicates the user is logged in, the navigation elements are updated to reflect that.
  • Added to cart Component: On the product detail page, this code was invasively modified to allow items to be added to the cart via AJAX. The AJAX call updates the cart cookies described above, and its feedback is presented to the user to indicate that the item has been successfully added.
  • Previously Viewed Component: Finally, the product page also contains previously viewed items. This is generated via a cookie that stores recent SKUs visited by the user, and each SKU has an associated cookie with information such as the image source, link, description, and price. Because a maximum of three previously viewed items is shown, cookies for older previously viewed items are deleted to minimize cookie build-up.
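Here is a simplified sketch of how the mini-cart and login components can be rendered client-side on a fully cached page using the jQuery.cookie plugin. The cookie names and element ids (cart_items, cart_subtotal, logged_in, #mini-cart, and so on) are illustrative, not necessarily those used on the Paper Source site.

// Runs on DOM ready on every cached page.
$(function() {
  // Mini-cart: show item count and subtotal if the cart is non-empty.
  var itemCount = parseInt($.cookie('cart_items') || '0', 10);
  if (itemCount > 0) {
    $('#mini-cart .count').text(itemCount);
    $('#mini-cart .subtotal').text($.cookie('cart_subtotal'));
    $('#mini-cart').show();
  }

  // Login state: swap navigation links based on a login cookie.
  if ($.cookie('logged_in') === '1') {
    $('#nav-login').hide();
    $('#nav-account, #nav-logout').show();
  }
});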

Step 3: Implement fully timed-build caching pages

During this incremental process toward the end goal of full nginx caching, the next step is to implement fully timed-build pages, i.e. use Interchange's caching mechanism to fully cache these pages and reduce repetitive database hits and backend logic. In this step, the entire page is wrapped in a timed-build tag, which results in writing and serving a static cached file for that page. While this step is not a necessity, it does allow us to deploy and test our changes in preparation for nginx caching. In addition to giving us an opportunity to work out kinks, this step also gives us an added bump in performance because several of these page templates previously had no caching at all.

Step 4: Reproduce redirect logic outside of Interchange

Next up, we plan to move the logic that handles page redirects out of Interchange and into nginx. At the moment, Interchange is responsible for handling 301 redirects for old product and navigation pages. These will need to become nginx redirects to minimize hits on Interchange.

Step 5: Implement nginx architecture on camps

Another non-trivial step in this process will be to implement nginx architecture on DevCamps (or camps). DevCamps is an open source tool developed by End Point for developing on multiple instances of copies of the production server. Camps are heavily used for Paper Source because several End Point and internal Paper Source employees simultaneously work on different projects on their development instances or camps. Nginx caching will need to be set up to also work with the camp system in place.

Step 6: Turn nginx caching on!

Finally, we can turn on nginx caching for specific pages of interest. Nginx will then serve these fully cached pages and will avoid Interchange entirely. Cookies and AJAX will still be used to render the dynamic elements on the fully cached pages. While we'd ideally like to cache every page on the site except for the cart, checkout and my account pages, it makes more sense to find the bottlenecks and tackle them incrementally.

Where are we now?

At the moment, I've made progress on steps 1-3 for several subsets of pages, including the thumbnail and product detail pages. I plan to continue these steps for additional bottleneck pages. I have worked out a couple of minor kinks along the way, but things have been progressing well. Richard plans to make progress on the nginx-related tasks in preparation for reaching the end goal.