Welcome to End Point's Blog

Ongoing observations by End Point people.

Thursday, September 9, 2010

Reducing bloat without locking

Posted by Joshua Tolley

It's not altogether uncommon to find a database where someone has turned off vacuuming, for a table or for the entire database. I assume people do this thinking that vacuuming is taking too much processor time or disk IO or something, and needs to be turned off. While this fixes the problem very temporarily, in the long run it causes tables to grow enormous and performance to take a dive. There are two ways to fix the problem: moving rows around to consolidate them, or rewriting the table completely. Prior to PostgreSQL 9.0, VACUUM FULL did the former; in 9.0 and above, it does the latter. CLUSTER is another suitable alternative, which also does the latter. Unfortunately all these methods require heavy table locking.

Recently I've been experimenting with an alternative method -- sort of a VACUUM FULL Lite. Vanilla VACUUM can reduce table size when the pages at the end of a table are completely empty. The trick is to empty those pages of live data. You do that by paying close attention to the table's ctid column:

5432 josh@josh# \d foo
      Table "public.foo"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | not null
 b      | integer | 
Indexes:
    "foo_pkey" PRIMARY KEY, btree (a)

5432 josh@josh# select ctid, * from foo;
 ctid  | a | b 
-------+---+---
 (0,1) | 1 | 1
 (0,2) | 2 | 2
(2 rows)

The ctid is one of several hidden columns found in each PostgreSQL table. It shows up in query results only if you explicitly ask for it, and tells you two values: a page number, and a tuple number. Pages are numbered sequentially from zero, starting with the first page in the relation's first file, and ending with the last page in its last file. Tuple numbers refer to entries within each page, and are numbered sequentially starting from one. When I update a row, the row's ctid changes, because the update creates a new version of the row and leaves the old version behind (see this page for explanation of that behavior).

5432 josh@josh# update foo set a = 3 where a = 2;
UPDATE 1
5432 josh@josh*# select ctid, * from foo;
 ctid  | a | b 
-------+---+---
 (0,1) | 1 | 1
 (0,3) | 3 | 2
(2 rows)

Note the changed ctid for the second row. If I vacuum this table now, I'll see it remove one dead row version, from both the table and its associated index:

5432 josh@josh# VACUUM verbose foo;
INFO:  vacuuming "public.foo"
INFO:  scanned index "foo_pkey" to remove 1 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "foo": removed 1 row versions in 1 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "foo_pkey" now contains 2 row versions in 2 pages
DETAIL:  1 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "foo": found 1 removable, 2 nonremovable row versions in 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
1 pages contain useful free space.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

So given these basics, how can I make tables smaller? Let's build a bloated table:

5432 josh@josh# truncate foo;
TRUNCATE TABLE
5432 josh@josh*# insert into foo select generate_series(1, 1000);
INSERT 0 1000
5432 josh@josh*# delete from foo where a % 2 = 0;
DELETE 500
5432 josh@josh*# select max(ctid) from foo;
   max   
---------
 (3,234)
(1 row)

5432 josh@josh# vacuum verbose foo;
INFO:  vacuuming "public.foo"
INFO:  scanned index "foo_pkey" to remove 500 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "foo": removed 500 row versions in 4 pages
...

I've filled the table with 1000 rows, and then deleted every other row. The last tuple is on the fourth page (remember they're numbered starting with zero), but since half the table is empty space, I can probably squish it into three or maybe just two pages. I'll start by moving the tuples on the last page off to another page, by updating them:

5432 josh@josh# begin;
BEGIN
5432 josh@josh*# update foo set a = a where ctid >= '(3,0)';
UPDATE 117
5432 josh@josh*# update foo set a = a where ctid >= '(3,0)';
UPDATE 117
5432 josh@josh*# update foo set a = a where ctid >= '(3,0)';
UPDATE 21
5432 josh@josh*# update foo set a = a where ctid >= '(3,0)';
UPDATE 0
5432 josh@josh*# commit;
COMMIT

Here I'm not changing the row at all, but the tuples are moving around into dead space earlier in the table; this is apparent because the number of rows affected decreases. For the first update or two, there's room enough on the page to store all the new rows, but after a few updates they have to start moving to new pages. Eventually the row count goes to zero, meaning there are no rows on or after page #3, so vacuum can truncate that page:

5432 josh@josh# vacuum verbose foo;
INFO:  vacuuming "public.foo"
...

INFO:  "foo": truncated 4 to 3 pages

It's important to note that I did this all within a transaction. If I hadn't, there's a possibility that vacuum would have reclaimed some of the dead space made by the updates, so instead of moving to different pages, the tuples would have moved back and forth within the same page.
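The repeated-update loop above can be scripted. The following is only a minimal sketch, not tooling from this post: it assumes a Python DB-API 2.0 connection with autocommit off and the %s parameter style (psycopg2 behaves this way), and the function name, the max_passes limit, and the table and column arguments are hypothetical.

```python
def empty_tail_pages(conn, table, column, first_page, max_passes=20):
    """Run no-op UPDATEs on rows at or after first_page until none are left.

    Assumptions (not from the original post): conn is a DB-API 2.0
    connection with autocommit off, so every UPDATE below runs in one
    transaction; table and column are trusted identifiers, because they
    are pasted directly into the SQL text.
    """
    boundary = "({},0)".format(int(first_page))
    sql = "UPDATE {t} SET {c} = {c} WHERE ctid >= %s::tid".format(t=table, c=column)
    cur = conn.cursor()
    moved = []  # rows touched by each pass, e.g. [117, 117, 21, 0]
    try:
        for _ in range(max_passes):
            cur.execute(sql, (boundary,))
            moved.append(cur.rowcount)
            if cur.rowcount == 0:
                # Nothing lives on the tail pages any more, so a plain
                # VACUUM can now truncate them.
                conn.commit()
                return moved
        raise RuntimeError("rows still on tail pages after %d passes" % max_passes)
    except Exception:
        conn.rollback()
        raise
```

For the example table this would be called as empty_tail_pages(conn, "foo", "a", 3). VACUUM itself cannot run inside a transaction block, so it has to be issued separately afterwards.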

There remains one problem: I can't remove index bloat, and in fact all this tuple-moving causes more index bloat. I can't fix that completely, but in PostgreSQL 8.3 and later I can avoid creating too much new bloat by updating an unindexed column instead of an indexed one. In those versions, the heap-only tuples (HOT) feature avoids modifying indexes if:

  • the update touches only unindexed columns, and
  • there's sufficient free space available for the tuple to stay on the same page.

Tuples that do move to another page still get new index entries, so only the updates that stay on their page benefit. Despite the index bloat caveat, this can be a useful technique to slim down particularly bloated tables without VACUUM FULL and its associated locking.

Monday, September 6, 2010

CSS Sprites and a "Live" Demo

Posted by Steph Powell

I've recently recommended CSS sprites to several clients, but the majority don't understand what CSS sprites are or what their impact is. In this article I'll present some examples of using CSS sprites and their impact.

First, an intro: CSS sprites is a technique that uses a combination of CSS rules and a single background image that is an aggregate of many smaller images to display the image elements on a webpage. The CSS rules set the boundaries and offset that define the part of the image to show. I like to refer to the technique as analogous to the "Ouija board"; the CSS acts as the little [rectangular] magnifying glass to show only a portion of the image.

It's important to choose which images should be in a sprite based on how much each image is repeated throughout a site's design and how often it might be replaced. For example, design border images and icons will likely be included in a sprite since they may be repeated throughout a site's appearance, but a photo on the homepage that's replaced daily is not a good candidate to be included in a sprite. I also typically exclude a site's logo from a sprite since it may be used by externally linking sites. End Point uses CSS sprites, but only for a few elements:


End Point's CSS sprite image only contains borders and Twitter and LinkedIn images.

"Live" Demo

I wanted to implement CSS sprites to demonstrate their performance impact. I chose to examine the homepages of my favorite ski resorts in Utah, Snowbird and Alta. First, I used WebPagetest.org to get a benchmark for each homepage:

Without sprites, Alta's homepage loaded in 3.440 seconds with a repeat request load time of 1.643 seconds. 22 requests are made on the homepage. Without sprites, Snowbird's homepage loaded in 7.070 seconds with a repeat request load time of 3.146 seconds. 57 requests are made on the homepage.

After I benchmarked the two pages, I downloaded each homepage and its files and examined the images to build a CSS sprite image. I created the sprite images shown below. I chose to exclude the logo from each sprite, in addition to other time-sensitive images. Each sprited image contains navigation elements, icons, and a few homepage specific images.


My resulting sprited image for Alta.

My resulting sprited image for Snowbird.

I updated the HTML to remove individual image requests. Below are some examples:

old:
<a href="http://alta.com/pages/contact.php"> <img src="./alta_files/banner_contact.gif" name="ContactUs" width="100" height="20" border="0"> </a>
new:
<a href="/" class="sprite" id="banner_contact"></a>

old:
<a href="http://shop.alta.com/CS/Browse.aspx?Catalog=AltaRetail&Category=Retail+Items" target="_self"> <img src="./alta_files/altaskishoplogo.jpg" alt="March Madness Logo" width="180" height="123" border="0"> </a>
new:
<a href="http://shop.alta.com/CS/Browse.aspx?Catalog=AltaRetail&Category=Retail+Items" target="_self" id="skishoplogo" class="sprite"></a>

old:
<div class="icon"><img src="./snowbird_files/icon_less_rain.gif" border="0" alt="Sunny" title="Sunny"></div>
new:
<div class="icon sprite less_rain"></div>

old:
<div><input type="image" src="./snowbird_files/btn_check_rates.gif" border="0" alt="Check Rates"></div>
new:
<div class="sprite" id="check_rates"></div>

There are a few CSS tips to be aware of during CSS sprite implementation, such as:

  • Padding on sprited elements will affect the sprite position.
  • Links must have the "display:block" rule (combined with floating rules) to enforce a height and width.
  • Parts of the sprite that are repeating may not have any other elements along the repeating axis. For example, in End Point's sprite, the top and bottom repeating border are in the sprite. No other images may be included in the sprite to the left or right of the borders.

The CSS for the new sprites is shown below. For all sprited elements, a height and width are set in addition to a background position. Together, the height, width, and background position define the region of the sprited image to show.

Alta CSS sprite rules
.sprite { background-image: url(sprites.gif); display: block; float: left; }
a#banner_logo { width: 100px; height: 100px; }
div#banner_top { width: 600px; height: 100px; background-position: -100px 0px; }
a#banner_home { width: 100px; height: 20px; background-position: -700px 0px; }
a#banner_contact { width: 100px; height: 20px; background-position: -700px -20px; }
a#banner_sitemap { width: 100px; height: 20px; background-position: -700px -40px; }
a#banner_press { width: 100px; height: 20px; background-position: -700px -60px; }
a#banner_weather { width: 100px; height: 20px; background-position: -700px -80px; }
a#skishoplogo { width: 180px; height: 123px; background-position: -298px 123px; }
a#skihistory { width: 164px; height: 61px; background-position: -479px 123px; margin-right: 40px; }
a#altaenviron { width: 298px; height: 61px; background-position: 0px 123px; }

Snowbird CSS sprite rules
.sprite { background: url(sprites.gif); }
a.sprite { display: block; float: left; margin-right: 5px; border: 2px solid #FFF; }
div#headerWeather div.icon { width: 36px; height: 26px; }
div.less_rain { background-position: -118px -142px; }
div.sunny { background-position: -177px -116px; }
div.partly_cloudy { background-position: -138px -116px; }
a#facebook { width: 66px; height: 25px; background-position: 0px -142px; }
a#twitter { width: 55px; height: 25px; background-position: -66px -142px; }
a#youtube { width: 48px; height: 25px; background-position: 0px -116px; }
a#flickr { width: 94px; height: 25px; background-position: -48px -116px; }
a#picofday { width: 218px; height: 22px; border: none; margin: 2px 0px 0px 2px; }
div#check_rates { width: 89px; height: 21px; background-position: 0px -261px; margin: 10px 0px; float: right; }
div#br_corner_light { width: 17px; height: 14px; background-position: -89px -268px; }
div#bl_corner_light { width: 17px; height: 14px; background-position: -105px -268px; }
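
The link between a tile's place in the sprite image and its background-position can be sketched in a few lines of Python. This is an illustration only: the helper name is hypothetical, and the tile coordinates are inferred from the rules above rather than measured from the actual sprite images. The offsets are negated because the background image is shifted left and up to bring the tile into view.

```python
def sprite_rule(selector, x, y, width, height):
    """Build a CSS rule that shows one tile of a sprite image.

    x and y are the tile's top-left corner inside the sprite, in pixels.
    The background is shifted left and up by those amounts, hence the
    negated offsets.
    """
    def px(value):
        return "{}px".format(value)

    return "{} {{ width: {}; height: {}; background-position: {} {}; }}".format(
        selector, px(width), px(height), px(-x), px(-y)
    )


# A 100x20 tile whose top-left corner sits at (700, 20) in the sprite.
print(sprite_rule("a#banner_contact", 700, 20, 100, 20))
# prints: a#banner_contact { width: 100px; height: 20px; background-position: -700px -20px; }
```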

After spriting the images shown above, I returned to WebPagetest.org to examine the impact of the sprites. In most cases, I maintained the image format to limit the performance change to spriting only. I also did not change HTML or CSS even if I noticed other performance improvement opportunities. Here are the new results:

With sprites, Alta's homepage loaded in 2.768 seconds with a repeat request load time of 1.093 seconds. With sprites, Snowbird's homepage loaded in 6.289 seconds with a repeat request load time of 2.513 seconds.

A summary of differences shows:

  • Both pages decreased the number of requests by 10.
  • Alta decreased homepage load by 0.672 seconds, or 20% of the original page load. Snowbird decreased homepage load by 0.781 seconds, or 11% of the original page load.
  • On repeat views, Alta's homepage load decreased by 0.550 seconds, or 33% of the original repeat page load. Snowbird's repeat-view homepage load decreased by 0.633 seconds, or 20% of the original repeat page load.
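
The first-view figures can be checked with a little arithmetic. The sketch below uses only the first-view load times reported above; the function name is hypothetical. It gives 19.5% and 11.0%, which the summary rounds to 20% and 11%.

```python
def improvement(before, after):
    """Return (seconds saved, percent of the original load time)."""
    saved = before - after
    return round(saved, 3), round(100.0 * saved / before, 1)


# First-view load times in seconds: (without sprites, with sprites).
first_view = {"Alta": (3.440, 2.768), "Snowbird": (7.070, 6.289)}
for site, (before, after) in first_view.items():
    saved, percent = improvement(before, after)
    print("{}: {} s saved, {}%".format(site, saved, percent))
# prints:
# Alta: 0.672 s saved, 19.5%
# Snowbird: 0.781 s saved, 11.0%
```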

There is no reason to avoid sprites if a design has repeating elements or icons. Fewer requests mean a shorter load time for the customer. If a CDN is in place, CSS sprites can also reduce bandwidth cost. The improved performance can indirectly help search rankings as well, since search engines may use page speed as a ranking factor, and it can make for a better mobile browsing experience. In our examples, the use of CSS sprites decreased first request page load time by 10-20%, but this amount may vary depending on how heavily a site's design relies on repeated images.

Friday, September 3, 2010

Guidelines for Interchange site migrations

Posted by Ron Phipps

At End Point I'm often involved with Interchange site migrations. These migrations happen either because a new client comes to us needing hosting, or because a site is moving from one server to another within our own infrastructure.

There are many different ways to do a migration, but in the end we need to hit certain points to make sure that the migration goes smoothly. Below you will find steps which you can adapt for your specific migration.

The start of the migration can be a good time to introduce git for source control. You can do this by creating the repository, cloning it to /home/account/live, and setting up .gitignore files for logs, counter files, and gdbm files. Then commit the changes back to the repo, and you've introduced source control without much effort, which makes future changes to the site easier. It also helps to document the changes you make to the code base during the migration, in case you need to merge changes from the current production site before completing the migration.

  • Export all of the gdbm databases to their text file equivalents on the production server
  • Take a backup from production of the database, catalog, interchange server, htdocs
  • Set up an account
  • Create the database and user
  • Restore the database, catalog, interchange server and htdocs
  • Update the paths in interchange/bin for each script to point at the new location
  • Grep the restored code for hard coded paths and update those paths to the new locations. Better yet move these paths out to a catalog_local.cfg where environment specific information can go.
  • Grep the restored code for hard coded urls and use the [area] tag to generate the urls
  • Update the urls in products/variable.txt to point at the test domain
  • Update the sql settings in products/variable.txt to point at the new database using the new user
  • Remove the gdbm databases so they will be recreated on startup from the source text files
  • Install a local Perl if it's not already installed (./Configure -des with a -Dprefix under the account's home directory, followed by make and make install, will compile and install Perl locally)
  • Install Bundle::InterchangeKitchenSink
  • Install the DBD module for MySQL or PostgreSQL
  • Review the code base looking for use statements in custom code and Require module settings in interchange.cfg. Install the Perl modules found into the local Perl.
  • Set up a non-SSL and an SSL virtual host using a temporary domain. Configure the temporary domain to use the SSL certificate from the production domain.
  • Firewall or password protect the virtual host so it is not accessible to the public
  • Generate a vlink using interchange/bin/compile and copy it into the cgi-bin directory and name it properly
  • Start up the new Interchange
  • Review error messages and resolve until Interchange will start properly
  • Test the site thoroughly, resolving issues as they appear. Make sure that checkout, charging credit cards, sending of emails, using the admin, etc. all function.
  • Migrate any cron jobs running on the current production site, such as session expiration scripts
  • Set up log rotation for the new logs that will be created
  • Verify that you have access to make DNS changes
  • Set the TTL for the domain to a low value such as 5 minutes
  • Modify the new production site to respond to the production url, and test by updating your hosts file to manually set the IP address of the domain
  • Shut down the new Interchange
  • Restore a copy of the original backup for Interchange, the catalog and htdocs to /tmp on the production server
  • Shut down the production Interchange and put up a maintenance note on the production site.
  • Take a backup of the production database and restore on the new server
  • Diff the Interchange, catalog and htdocs directory between /tmp and the current production locations, making note of the files that have changed since we took the original copy.
  • Copy the files that have changed, making sure to merge with any changes we have made on the new production site. Be sure to copy over all .counter and .autonumber files to the new production site.
  • Start Interchange on the new production server
  • Test the site thoroughly on the new production server, using the production url. Make sure that checkout with charging the credit card functions properly.
  • Resolve any remaining issues found during the testing
  • Set up the Interchange daemon to start at boot for this site in /etc/rc.d/rc.local or in cron using @reboot
  • Update DNS to point at the new production IP address
  • Update the TTL of the domain to a longer value
  • Open the site to the public by opening the firewall or removing the password protection
  • Keep an eye on the error logs for any issues that might crop up
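
The "grep the restored code for hard coded paths" step in the list above can be sketched in Python for cases where a report with file names and line numbers is handy. This is only an illustration, not part of Interchange: the function name, the skip list, and the old path prefix in the usage example are hypothetical.

```python
import os


def find_hardcoded(root, needle, skip_dirs=(".git",)):
    """Yield (path, line_number, line) for every line under root containing needle."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories we never want to descend into.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file (permissions, sockets): skip it


if __name__ == "__main__":
    # "/home/oldaccount" is a made-up example of a path from the old server.
    for path, number, line in find_hardcoded("/home/account/live", "/home/oldaccount"):
        print("{}:{}: {}".format(path, number, line))
```

Each hit is a candidate for moving into catalog_local.cfg, as suggested in the list.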

Hopefully this gives you a solid guide for performing an Interchange site migration from one server to another, along with some of the things to watch out for that might cause issues along the way.

Wednesday, September 1, 2010

Long Lasting SSH Multiplexing Made Simplish

Posted by Brian J. Miller

My Digression

To start off digressing just a little, I am primarily a developer, lately on longer projects of relatively significant size which means that I stay logged in to the same system for weeks (so long as my X session, local connection, and remote server allow it). I'm also a big believer in lots of screen real estate and using it all when you have it, so I have two monitors running high resolutions (1920x1200), and I use anywhere from 4-6 virtual pages/desktops/screens/(insert term of preference here). On 2-3 of those spaces I generally keep 8 "little black screens" (as my wife likes to call them) or terminals open in 2 rows of 4 windows. In each of those terminals I stay logged in to the same system as long as those aforementioned outside sources will allow me. A little absurd I know...

If you are still with me you may be thinking "why don't you just use `screen`?", I do, in each of those terminals for the inevitable times when network hiccups occur, etc. Next you'll ask, "why don't you just use separate tabs/windows/partitions/(whatever they are called) within screen?", I do, generally I have 2-3 (or more) in each screen instance set to multiple locations on the remote file system. In general I've just found being able to look across a row of terminals at various different libs, a database client, tailed logs, etc. is much faster than toggling between windows (heaven forbid they'd be minimized), screen pages, virtual desktops, or even shell job control (though I use a good bit of that too). Inevitably I also seem to pop up an additional window (or two) every now and again and ssh into the same remote destination for the occasional one off command (particularly since I've never quite gotten the SSH key thing figured out with long running screens, potential future blog post?). So the bottom line, for any given sizable project I probably ssh to the same location a minimum of 8 times each time I set up my desktop and over the course of a particular X session maybe 20 (or more) times? (End digression)

The Point

As fast as SSHing is, waiting for a prompt when doing it that many times (particularly over the course of a year, or career) really starts to make the clock tick by (or at least it seems that way). Enter "multiplexing" which is essentially a wonderful feature in newer SSH that allows you to start one instance with a control channel that handles many of the slow parts (such as authentication) and so long as it is still running when you connect to the same remote location the new connection uses the existing control channel and is lightning fast getting you to the prompt. Simple enough, to turn on multiplexing you can add the following to your ~/.ssh/config:

# for multiplexing
ControlMaster auto
ControlPath ~/.ssh/multi/master-%r@%h:%p

The above indicates that a master should be used for each connection, and the location of where to store the master's control socket file. The %r, %h, and %p are expanded to the login, host, and port respectively, which is usually enough to make it unique. This should be enough to start using multiplexing, but...and you knew there had to be one...when the master's control connection is lost all of the slaves to that connection lose theirs as well. With the occasional hung terminal window, or accidental closing of it (if you can remember which is master to begin with), etc. you quickly find that when you normally would not lose connection in a separate terminal window you all of a sudden have lost all of your connections (8+ in my case), which is really painful. Here is where the fun comes in. I use the "-n" and "-N" flags to SSH in a terminal window when I first load up an X session and background the process:

> ssh -Nn user@remote &

The above redirects stdin from /dev/null (a necessary evil when backgrounding SSH procs), prevents the execution of a remote command (meaning we don't want a shell), and puts the new process in the local shell's background. Unfortunately, and this is the part that took me the longest to figure out, SSH really likes to have a TTY around (we aren't using the daemon after all), so simply killing the original terminal window will cause the SSH process to die, and zap, there went your control connection and all the little children. To get around this little snafu I follow the backgrounding of my SSH process with a bash-specific builtin, disassociating it from the TTY:

> disown

Now I am free to close the original terminal window, the SSH process lives on in the background (as if it were a daemon) and keeps the control connection open so that whenever I use SSH to that remote location (or 'scp', etc.) I get an instant response.
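
The same idea can be sketched from Python for anyone who would rather script the startup. This is an assumption-laden illustration, not what I actually run: it swaps bash's disown for start_new_session=True, which puts the child in its own session via setsid() and should have a similar effect when the terminal closes. The function names are hypothetical, and it assumes key or agent authentication, since there is no terminal for a password prompt.

```python
import subprocess


def master_command(target):
    """Command line for a master with no remote command and no stdin."""
    return ["ssh", "-N", "-n", target]


def start_master(target):
    """Launch the master in its own session, detached from this terminal.

    start_new_session=True calls setsid() in the child. That is a
    different mechanism from bash's disown, used here for the same goal.
    """
    return subprocess.Popen(
        master_command(target),
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )
```

Calling start_master("user@remote") returns immediately, and later ssh or scp invocations to the same destination should pick up the control socket named by ControlPath.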

Other notes

I could probably set this up to occur when I start X initially automatically, but with a flaky connection I end up needing to do it a few times per X session anyways and you would have to watch out for the sequence of events making sure it occurred after any ssh-agent set up was required.

I tried using 'nohup' and can't remember if I ran into an actual show stopper problem, or if I could just never quite get it to do what I wanted before I stumbled on bash's disown.

Monday, August 30, 2010

An Odd Reason Why I Like Google Wave

Posted by Jon Jensen

Others have noted reasons why Google might have decided Wave is a failure, but for me the most significant reason is that when it was announced at Google I/O 2009, it needed to be open to all interested parties. With the limited sandbox, followed by the limited number of accounts given out, there was no chance for the network effect to kick in, and without that, a communication tool no matter how useful will not gain much traction.

I still use Google Wave, along with several of my co-workers. Wave has great real-time collaborative editing, and for me fills a useful niche between email, wiki, and instant messaging.

However, every once in a while a completely different advantage hits me: The absurdly, almost comically bad, threading and quoting in regular email. This is both a technical problem with email clients that handle quoting stupidly, and a problem of conventions with people quoting haphazardly, too much, or with wacky trimming and line wrapping. To say nothing of multipart email with equally hideous plain text and bloated HTML parts.

I munged the text of the following message from a mailing list to serve as an example. It's not the most egregious mess of quoting, and it had no pile of attachments as sometimes happens, but it shows how messy things can get.

In Wave, replies and quoting were quite a bit better. Not perfect, but nothing like this. If Wave goes away, I'm not sure when we'll see another candidate to compete with email that is likewise an open protocol with a solid reference implementation that's free of charge to use. It's a shame.

Date: Mhx, 3 Hqi 1499 37:76:16 +1546
From: Pswr iwo zvs Aburf 
To: gdgmguezfet-list@lykkkafnnj.wjo
Subject: Re: [gdgmguezfet-list] Lgj Cjroiojezg: Jmxc Mezfcfeqh

> -----Original Message-----
> From: gnszgqzspfx-whto-zhnmchx@cpevwgofsv.isj [mailto:iiyqktwjujg-fnlq-
> evkvpty@eczdtgakvm.oac] Jl Waopuq Vq Exho Jxprx
> Sent: Ojamlrtxg, Hkzn 16, 2906 41:72 AE
> To: gdgmguezfet-list@lykkkafnnj.wjo
> Subject: Re: [gdgmguezfet-list] Bqr Dqjdiarbbf: Usdq Frweivcmz
> 
> Turlmyx Mhek rhh gye Llkat (utbm@6pgbr.bfb):
> > > -----Original Message-----
> > > From: iilxvyjwayx-bllf-ptrecjc@heaukjmdkq.htj [mailto:rrnjpeneara-
> ooxd-
> > > xalsxai@hxugschjny.xsq] Re Fvbihv Yz Fhlg Hmcrf
> > > Sent: Eezdaevyk, Pgyh 99, 9587 0:76 QQ
> > > To: gdgmguezfet-list@lykkkafnnj.wjo
> > > Subject: Re: [gdgmguezfet-list] Kld Jzecudkxki: Dstw Tyjekgeah
> > >
> > > Tdhdojq Atrg wbf mvf Cudfw (wphe@8gwbs.lgu):
> > > > > -----Original Message-----
> > > > > From: wrlsvvealuw-wsod-hwwvjdo@ihgudlwfrk.kgs
> [mailto:ckajpyzyzoi-
> > > ccnq-
> > > > > rwmpxyk@ronwonbthg.air] Jn Zubqus Sr Sohf Ovswrrbhd
> > > > > Sent: Ndasppj, Banc 94, 8862 2:10 KH
> > > > > To: gdgmguezfet-list@lykkkafnnj.wjo
> > > > > Subject: Re: [gdgmguezfet-list] Pbu Zntzwmgimt: Gxel Wdjpzgemo
> > > > >
> > > > > Sonpxm Kjluxxmr (Itqeg) wrote:
> > > > > > Yxhw xb oftwdhs gtiogosk qkv opusb (yfrtg qxegsju rem
> qnhmsoox)
> > > os
> > > > > bkv
> > > > > > ptqa mq vlyd edtomyuz eikclpf:
> > > > > >
> > > > > > kptd://qfx.bdtqmqabro.chk/opnamgt/oikgome/cwppzfezxru-
> fdpf/1149-
> > > > > Smxw/9
> > > > > > 74694.txke
> > > > > >
> > > > > > Lyvuivy, Ucqg!
> > > > >
> > > > > Dpynbi Unpnk!
> > > > >
> > > > > Gl xu syc iqh.
> > > > >
> > > > > S oztdn agm hagxp X okqf hazgjohvhx kijip lxpq djna.  Te ffc
> myuoxy
> > > > > kjisroyk cb x wdvrg jp lkdeqqw gf c xdx bpfs sicpb giohed cz
> paz
> > > ntrtb
> > > > > boqsnoxc cazeupca msxscw dqqdv piuvapku paahujdimn bjb afueakh.
> > > > >
> > > > > Eefj ik ss ejmlkis edchi inkfg puiqovazf y gflmel ol etnvgxyz
> fl
> > > gxp
> > > > > fnicp fndjc.  Z xjqb zgnb asfpjjm ht kp aiiupstbsh rijcacj
> xewre
> > > qof.
> > > >
> > > > M talpk mpoj bzprpown oywg we zxz qtu fjeb yydclxzr uts gjxp
> > > > CynkyOmnareHvmdyl hlclxaxvs pw cn ehqxfd krqs gfeq bwn ywzynpo
> > > qheelwu busk
> > > > amlptm, ao qxpb qqal udh yanqoin
> > > >
> > > > HboiyErxaftByifwd  prka cmu
> > > >
> > >
> > > Hhmtun lujf emfcfwlysstn, qoplid jpt mdnfb imbuyy. A pr rhb xl
> > > oulydlqvu byxo
> > > mjfossxp zsm neqna uxvjou sby b odo ibuc.
> > >
> > > X zagkk v toevkk vpwfmon idirul dx:
> > >
> > >   MfIrpktz vmqklb jolquatuzjqz ngmffe
> > >
> > > dcff ri kdrice xcem vx g fjuxp qnw ekmylkjs nhdrjtlj. Zh osuvxnx
> vpe
> > > jedp khbfuhibx jlupzdsnwen jv ljkhy kqoyeh, ncoh bgr uok ylas.
> >
> > Bvy iopef kuje's aiec mcf tlvyp msvsw hl fti ghqqn lwff? Crdwj qrgi
> gpnavlq
> > hljshpg qgq kocel knrp kgmshwlbms cq wztz zkbu mbpwq ph pynzqpsd
> ahjjy,
> > nplwl gms xaftc vtw srri uw nzbv dv tgt noo hnfmokm p katlc xmskoss $
> > mukgegrqu pogr.
> 
> Scciwgd kj zjfbrecy miu yscw npzmb iu. IbTibflw * tkckv rchxz cyhu. 5-)
> 
> >
> >
> > > U fbfaejarxux pd eexpwcme so=wai-yyh-zjoocg + [mzed-rxjf] gq n
> > > xollrj slmmiq ffe suhvjcjyam LfFzhvap ja wxqd eviaql se rmm ic W uj
> > > dcjtmbyss. Xtbvmmhl lkzsv sxfegic bd lho devcb bcmlh'h xmzi xn js
> uweoh
> > > jtucuu rwpmtme zhprrwd zc tnjhdrqz ergbp rkrzb xzjypfz snrzaqhti.
> > >
> > > Duodc
> > > > ivzl vq fjicz aof bxg mzemgedp kscj gbtzcjf. Ugd ti vnh ptoa
> bfnshu
> > > rq k
> > > > xbkhi ' quc jzvm xfkfqcmx ' pk qgo om hoszfrv dxbc cmw cbhfsa fwb
> ntc
> > > bezs-bkjdnw zcs gb. 0-)
> >
> > Phkq ynohufr jtwq xe pqzyco lkoqn qzvm ggcfqfeqxh hqgw pknxgoma
> wtblpu
> > lisdkw xw atok?
> 
> Ykxbr auksxkx auujvb. Qbefjsg kemw odhe e wsmmdvpy lhfjd jz kne
> "pqynmmwp
> bjoqrd".


Gaosgf'm aoih tu tw vhqwkl zp dotgpa:
GpclzPtypqeVknqkp owcrmhgm




> >
> > Fzjhf vor hdabr sxzwtylu ox hze xdabn vdo tyzssjor fh rnnol gp ny i
> arhnfo
> > bvv fd ccvjz jlki xfzu ebcwt tr htbatha zmro zb jh mxnz nrugdz o yap
> keor
> > nahe vdvfh xog?
> 
> Fp pmg qinel okk qrfju utzmmuuib, dnsy. Zn zqh jfccu gji hqkt ba eiu
> Xsomgqxyldd, edtz.
> 
> > Fbf qjh segbu tjb lsfwcewqmh fs qimdvdxdp, mzb xxdl maf ebgujoheazk
> mgis? ...
> 
> --
> Dbnq Chiauz
> Zrwintky -- Amrtws Mgztvpfpbwp Jfocmutnso    http://arn.oosbscvt.inw/
> eocsk +5.415.700.6883  
> 
> Nmejbtdh vxani: Nguu xh cwziz bqjtb.
> 
> _______________________________________________
> gdgmguezfet-list mailing list
> gdgmguezfet-list@lykkkafnnj.wjo
> http://ivd.xitxowuodc.eci/vccqskc/xettnigw/bliyqvhrdse-list


_______________________________________________
gdgmguezfet-list mailing list
gdgmguezfet-list@lykkkafnnj.wjo
http://ivd.xitxowuodc.eci/vccqskc/xettnigw/bliyqvhrdse-list

Aaaah! My eyes!

Thursday, August 26, 2010

Learning Spree: 10 Intro Tips

Posted by Bill Bennett

While climbing the learning curve with Spree development, I've made some observations along the way:

  1. Hooks make view changes easier — I was surprised at how fast I could implement certain kinds of changes because Spree's hook system allowed me to inject code without requiring overriding a template or making a more complicated change. Check out Steph's blog entries on hooks here and here, and the Spree documentation on hooks and themes.
  2. Core extensions aren't always updated — One of the biggest surprises I found while working with Spree is that some Spree core extensions aren't maintained with each release. My application used the Beanstream payment gateway. Beanstream authorizations (without capture) and voids didn't work out of the box with Spree 0.11.0.
  3. Calculators can be hard to understand — I wrote a custom shipping calculator and used calculators with coupons for the project and found that the data model for calculators was a bit difficult to understand initially. It took a bit of time for me to be comfortable using calculators in Spree. Check out the Spree documentation on calculators for more details.
  4. Plugins make the data model simpler after learning what they do — I interacted with the plugins resource_controller, state_machine, and will_paginate in Spree. All three simplified the models and controllers interface in Spree and made it easier to identify the core behavior of Spree models and controllers.
  5. Cannot revert migrations — Spree disables the ability to revert migrations due to complications with extensions, which makes it difficult to undo simple database changes. This is more of a slight annoyance, but it complicated some aspects of development.
  6. Coupons are robust, but confusing — Like calculators, the data model for coupons is a bit confusing to learn but it seems as though it's complicated to allow for robust implementations of many kinds of coupons. Spree's documentation on coupons and discounts provides more information on this topic.
  7. Solr extension works well — I replaced Spree's core search algorithm in the application to allow for customization of the indexed fields and to improve search performance. I found that the Solr extension for Spree worked very well out of the box. It was also easy to customize the extension to index additional fields. The only problem is that the Solr server consumes a large amount of system resources.
  8. Products & Variants — Another thing that was a bit strange about Spree is that every product has at least one variant referred to as the master variant that is used for baseline pricing information. Spree's data model was foreign to me as most ecommerce systems I've worked with have had a much different product and variant data model.
  9. Routing — One big hurdle I experienced while working with Spree was how Rails routing worked. This probably stemmed from my inexperience with the resource_controller plugin, or from the fact that one of the first times I worked with Rails routing was to create routes for a nested resource. Now that I have learned how routing works and how to use it effectively, I believe it was well worth the initial struggle.
  10. Documentation & Community — I found that the documentation for Spree was somewhat helpful at times, but the spree-user Google group was more helpful. For instance, I got a response on Beanstream payment gateway troubleshooting from the Spree extension author fairly quickly after asking on the mailing list.

I believe that Spree is an interesting project with a somewhat unusual approach to providing a shopping cart solution. Spree's approach of trying to implement 90% of a shopping cart system is very different from some other shopping cart systems which overload the code base to support many features. The 90% approach made some things easier and some things harder to do. Things like hooks and extensions make it far easier to customize than I expected it would be, and they also seem to help avoid the buildup of spaghetti code which comes from implementing a lot of features. However, allowing for a "90%" solution seems to make some things like calculators a bit harder to understand when getting started with Spree, since the implementation is general and robust to allow for customization.

Friday, August 20, 2010

Hopefully Useful Techniques for Git Rebase

Posted by Ethan Rowe

I recently had to spend a few hours merging Git branches to get a development branch in line with the master branch. While it would have been a lot better to do this more frequently along the way (which I'll do going forward), I suspect that plenty of people find themselves in this position occasionally.

The work done in the development branch represents significant new design/functionality that refactors a variety of older components. My preference was to use a rebase rather than a merge, to keep the commit history clean and linear and, more critically, because the work we're doing really can be thought of as being "applied to" the master branch.

No doubt there are a variety of strategies to apply here. This worked for me and perhaps it'll help someone else.

Some Key Concerns for a Big Rebase

Beyond the obvious concern of having sufficient knowledge of the application itself, so that you can make intelligent choices with respect to the code, there are a number of key operational concerns specific to rebase itself. This list is not exhaustive, but it is not an unreasonable set of key considerations to keep in mind.

  1. Rebase is destructive

    Remember what you're doing! While a merge literally combines two or more revision histories, a rebase takes a chunk of revision history and applies it on top of another related history. It's like a cherry-pick on steroids (really nice, friendly steroids that provoke neither rage nor senate hearings): each commit gets logically applied on top of the specified head, and as such gets rewritten. The commits are not the same afterwards. The history of your working tree's branch is rewritten.

    So, before you rebase, protect yourself: Make sure you have more than one reference (either a branch or a tag) pointing to your current work.

  2. Conflict resolution can bring about bugs

    When resolving merge conflicts along the way, you'll need to manually inspect things to try to figure out the right path forward. If it's been a while since you merged/rebased, you may find that merge conflict resolution is not so simple: rather than picking one version or the other, you're literally merging them in some logical manner. You may end up writing new code, in other words.

    Because you are involved and you are a mammal, there is a decent possibility that you will screw this up.

    So, again, protect yourself: Look at what's coming before you rebase and take note of likely conflict resolution points.

  3. Things go wrong and an abort can be necessary

    Sometimes it becomes quite clear that a mistake has been made along the way, and you need to bail out and regroup. If you're doing a gigantic rebase in one big shot, this can happen after you're 15, 45, 90, or 120+ minutes into the task. Do you really want to have to go all the way back to the beginning of your rebase excursion and start fresh?

    Don't let this happen. When approaching the rebase, show humility, expect things to go wrong, and embrace a strategy that lets you recover from mistakes:

    Break the rebase into smaller chunks and proceed through them incrementally

  4. You may not immediately know that something went wrong

    Unless the code base is pretty trivial or you are 100% committed to that code base all the time, it is unlikely that you'll be completely on top of everything that's happened in both revision histories. You can test the stuff you know, you can run test suites, etc., but it's critical to work defensively.

    Prepare for the possibility of delayed mistake revelation: Keep track of what you do as you go
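
The "protect yourself" advice from the first concern can be sketched in a few commands. This is a minimal illustration run in a throwaway repository; the tag name shiny_before_rebase and the temporary directory are assumptions made for the example, not names used elsewhere in this post.

```shell
#!/bin/sh
set -e

# Throwaway repository so the sketch is safe to run anywhere (hypothetical setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "You"
git commit -q --allow-empty -m "initial commit"
git checkout -q -b shiny
git commit -q --allow-empty -m "shiny work"

# A second reference to the current work: a tag that a later rebase will not move.
git tag shiny_before_rebase shiny

# If a rebase of shiny goes badly, the branch can be pointed back at the tag:
#   git checkout shiny && git reset --hard shiny_before_rebase
backup=$(git rev-parse shiny_before_rebase)
current=$(git rev-parse shiny)
echo "backup: $backup"
echo "shiny:  $current"
```

Both lines print the same commit ID, which is the point: the pre-rebase commits stay reachable by name no matter what happens to the branch afterwards.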

Addressing the Concerns

The technique I've come to use to address the stated concerns is fairly simple to learn, understand, and apply in practice. It's iterative in nature and is therefore Agile and therefore grants me a sense of personal validation, which is very, very important.

For a real-world use case, you'll probably want to use more helpful, specific branch and tag names than this. The names in this discussion are deliberately simple for illustrative purposes.

Say you have a master branch which represents the canonical state of the code base. You've been working on the shiny branch where everything is more awesome. But shiny really needs to keep up with master, it's been a while, and so you want to rebase shiny onto master.

We're going to have the following things:

  • Multiple stages of rebasing, leading incrementally from shiny to the full rebase of shiny on master.
  • A "target" for each stage: the commit from master onto which you're rebasing the work from shiny
  • A tag providing an intuitive name for each target
  • A branch providing the revision history for each stage
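
The pairing of one branch and one tag per stage can be captured in a tiny shell helper. This is only a sketch: the function name stage_names is hypothetical, and the shiny_rebase_NN and shiny_rebase_target_NN patterns simply follow the branch and tag names used in the rest of this post.

```shell
#!/bin/sh
# Hypothetical helper: print the stage branch name and the target tag name
# for a given stage number. Pass a plain integer such as 3, not "03".
stage_names() {
    printf 'shiny_rebase_%02d shiny_rebase_target_%02d\n' "$1" "$1"
}

stage_names 1
stage_names 12
```

Zero-padding the stage number keeps the branches and tags in stage order when they are listed with git branch or git tag.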

Given those things, we can follow a simple process:

  1. Make a branch from the latest shiny named for the next stage (e.g., from shiny we make shiny_rebase_01, from shiny_rebase_02 we make shiny_rebase_03, and so on).

    When you're just starting the rebase, this might mean:

    [you@yours repo] git checkout -b shiny_rebase_01 shiny
    
    But for the next iteration, you would have shiny_rebase_01 checked out, and use it as your starting place:

    # The use of "shiny_rebase_01" is implied assuming our previous checkout above
    [you@yours repo] git checkout -b shiny_rebase_02
    
    # A subsequent one, again assuming we're on our most recent stage's branch already
    [you@yours repo] git checkout -b shiny_rebase_03
    
    And so on.

    This addresses concerns 1, 3, and 4: you're protecting yourself against rebase's inherent destructiveness, by always working on new branches; you're facilitating the staging of work in smaller chunks, and you're keeping track of your work by having a separate branch representing the state of each change.

  2. Review the revision history of master, look for commits likely to contain significant conflicts or representing significant inflection points, and pick your next target commit around them; if you have a pile of simple commits, you might want the target to be the last such simple commit prior to a big one, for instance. If you have a bunch of big hairy commits you may want each to be its own target/stage, etc. Use your knowledge of the app.

    The git whatchanged command is very useful for this, as by default it lists the files changed in a commit, which is the right granularity for this kind of work. You want to quickly scan the history for commits that affect files you know to be affected by your work in shiny, because they will be a source of conflict resolution points. You don't want to look at the full diff output of git log -p for this purpose; you simply want to identify likely conflict points where manual intervention will be required, where things may go wrong. After having identified such points, you can of course dig into the full diffs if that's helpful.

    Make your life easy by using the last target tag as the starting place for this review, so you only wade through the commits on master that are relevant to the current rebase stage (since the last target tag is where your branches diverge, it's where the rebase will start from).

    At this point you may say "but I don't have a last target tag!" The first time through, you won't have one because you haven't done an iteration yet. So for the first time, you can start from where git rebase itself would start:

    [you@yours repo] git whatchanged `git merge-base master shiny`..master
    

    But subsequent iterations will have a tag to reference (see the next step), so the next couple times through might look like:

    
    [you@yours repo] git whatchanged shiny_rebase_target_01..master
    
    [you@yours repo] git whatchanged shiny_rebase_target_02..master
    

    Etc.

    This addresses concerns 2 and 3: we're looking at what's coming before we leap, and structuring our work around the points where things are likely to be inconvenient, difficult, etc.

  3. Having identified the commit you want to use as your next rebasing point, make a tag for it. Name the tags consistently, so they reflect the stage to which they apply. So, if this is our first pass through and we've determined that we want to use commit a723ff127 for our first rebase point, we say:

    [you@yours repo] git tag shiny_rebase_target_01 a723ff127
    

    This gives us a list of tags representing the different points in master onto which we rebased shiny in our staged process. It therefore addresses concern 4, keeping track as you go.

  4. You're now on a branch for the current stage, and you have a tag representing the point from master onto which you want to rebase. So do it, but capture the output of everything. Remember: mistakes along the way may not be immediately apparent. You will be a happier person if you've preserved all the operational output so you can review it to track down where things potentially went wrong.

    So, for example:

    [you@yours repo] git rebase shiny_rebase_target_01 >> ~/shiny_rebase_work/target_01.log 2>&1
    
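    You would naturally update the tag and logfile per stage. Note that the ~/shiny_rebase_work directory must exist beforehand, since the shell redirection will not create it.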

    Review the logfile in your pager of choice. Is there a merge conflict reported at the bottom? Well, capture that information before you dive in and resolve it:

    # Log the basic info about the current state
    [you@yours repo] git status >> ~/shiny_rebase_work/target_01.log 2>&1
    # Log specifically what the conflicts are
    [you@yours repo] git diff >> ~/shiny_rebase_work/target_01.log 2>&1
    

    Now go and resolve your conflicts per usual, but remember to preserve your output when you resume:

    [you@yours repo] git rebase --continue >> ~/shiny_rebase_work/target_01.log 2>&1
    

    This addresses concern 4: keeping track of what happened as you go.

  5. Now you've finished that stage of the rebase, you've resolved any conflicts along the way, and you've preserved a history of what happened, what was done, etc. So the final step is: test.

    Run the test suite. You did implement one, right?

    Test the app manually, as appropriate.

    Don't put it off until the end. Test as you go. Seriously. If something is broken, use git blame, git bisect, and your logs and knowledge of the system to figure out where the problem originates. Consider blowing away the branch you just made, going back to the previous stage's branch, selecting a new target, and moving forward with a smaller set of commits. Etc. But make sure it works as you go.

    This does not map to any specific concern; rather, it ensures the soundness of the overall staged rebase process. The point of iterative work is that each iteration delivers a small bit of working stuff, rather than a big pile of broken stuff.

  6. Repeat this process until you've successfully finished a rebase stage for which the target is in fact the head of master. Done.
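
Put together, two stages of the process might look like the following sketch. It runs in a throwaway repository with a tiny, deliberately conflict-free history, so the repository setup, file names, log directory, and choice of targets are all assumptions made for illustration. The review step uses git log --name-status, which lists the changed files per commit much as git whatchanged does (Git's own documentation encourages new users to use git log instead), and limits the listing to files that shiny also changed.

```shell
#!/bin/sh
set -e

# --- Hypothetical setup: a small repository where master and shiny diverge ---
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.email "you@example.com"
git config user.name "You"
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > a.txt
git add a.txt
git commit -q -m "base"
git checkout -q -b shiny
sed 's/^10$/ten/' a.txt > a.tmp && mv a.tmp a.txt
git commit -q -am "shiny: change the last line"
git checkout -q master
sed 's/^1$/one/' a.txt > a.tmp && mv a.tmp a.txt
git commit -q -am "master: change the first line"
echo notes > b.txt
git add b.txt
git commit -q -m "master: add b.txt"
mkdir -p "$work/logs"

# --- Stage 1 ---
# Review: which master commits touch files that shiny also changed?
# (The unquoted substitution assumes file names without whitespace.)
base=$(git merge-base master shiny)
git log --oneline --name-status "$base"..master -- $(git diff --name-only "$base" shiny)

# Tag the chosen target; here, arbitrarily, the first master commit after the base.
git tag shiny_rebase_target_01 master~1

# New branch for the stage, then the rebase itself with all output captured.
# With "set -e" the script stops here if the rebase reports a conflict.
git checkout -q -b shiny_rebase_01 shiny
git rebase shiny_rebase_target_01 >> "$work/logs/target_01.log" 2>&1

# --- Stage 2: the target is now the head of master, so this is the last stage ---
git tag shiny_rebase_target_02 master
git checkout -q -b shiny_rebase_02
git rebase shiny_rebase_target_02 >> "$work/logs/target_02.log" 2>&1

# The original shiny branch is untouched; each stage has its own branch and tag.
git log --oneline --decorate shiny_rebase_02
```

In real use each stage would be followed by a test run before moving on, and a conflict would be logged and resolved by hand rather than aborting the script.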

So, that's the process I've used in the past. It's been good for me, maybe it can be good for you. If anybody has criticisms or suggestions I'd love to hear about them in comments.