I recently read an article that discusses Magento SEO problems and solutions. This got me to think about common search engine optimization issues that I've seen in e-commerce. Below are some highlighted e-commerce search engine optimization issues. The Spree Demo, Interchange Demo, and Magento Demo are used as references.
Duplicate Home Pages (www, non-www, index.html)
Duplicate home pages can come in the form of a homepage with www and without www, a homepage in the form of http://www.domain.com/ and a homepage with some variation of "index" appended to the url, or a combination of the two. In the Interchange demo, http://demo.icdevgroup.org/i/demo1 and http:/demo.icdevgroup.org/i/demo1/index.html are duplicate, http://demo.spreecommerce.com and http://demo.spreecommerce.com/products/ in the spree demo, and finally http://demo.magentocommerce.com/ and http://demo.magentocommerce.com/index.php in the Magento demo.
External links positively influence search engine performance more if they are pointing to one index page rather than being divided between two or three home pages. Since the homepage most likely receives the most external links, this issue can be more problematic than other generated duplicate content. I've also seen this happen in several content management systems.
This article provides directions on mod_rewrite use to apply a 301 redirect from the www.domain.com/index.php homepage to www.domain.com. This solution or other redirect solutions can be applied to Spree, Interchange, and other ecommerce platforms.
Irrelevant Product URLs
A search engine optimization best practice is to provide relevant and indicative text in the product urls. In the Interchange demo, the default catalog uses the product sku in the product url (http://demo.icdevgroup.org/i/demo1/os28073.html). In Magento and Spree, product permalinks with relevant text are used in the product url. In wordpress, the author has the ability to set permalinks for articles. I am unsure if Magento gives you the ability to customize product urls. Spree does not currently give you the ability to manage custom product permalinks. However, for all of these ecommerce platforms, these fixes may all be in the works since it is important for ecommerce platforms to implement search engine optimization best practices.
Duplicate product content
I've observed several situations where products divided into multiple taxonomies results in duplicate content creation via different user navigation paths. For example, in the Spree demo, the "Ruby Baseball Jersey" can be reached through the Ruby brand page, the Clothing page, or the homepage. The three generated duplicate content urls are http://demo.spreecommerce.com/products/ruby-on-rails-ringer-t-shirt, http://demo.spreecommerce.com/t/brands/ruby/p/ruby-baseball-jersey, and http://demo.spreecommerce.com/t/categories/clothing/shirts/p/ruby-baseball-jersey.
Another example of this can be found in the Interchange demo. The left navigation taxonomy tree provides links to any product url with "?open=X,Y,Z" appended to the url. The "open" query string indicates how the DHTML tree should be displayed. For example, the "Digger Hand Trencher" has a base url of http://demo.icdevgroup.org/i/demo1/os28076.html. Depending on which tree nodes are exploded, the product can be reached at http://demo.icdevgroup.org/i/demo1/os28076.html?open=0,11,13,19, http://demo.icdevgroup.org/i/demo1/os28076.html?open=0,11,13, etc. This standard demo functionality yields a lot of duplicate content.
In Magento, products are the in the form of www.domain.com/product-name, although the article I mentioned above mentions that www.domain.com/category/product.html product urls were generated. Perhaps this was a recent fix, or perhaps the demo is configured to avoid generating this type of duplicate content.
Duplicate product page content is often used to indicate which breadcrumb should display or to track user click-through behavior (for example, did a user click on a "featured product"? a "best seller"? a specific "product advertisement"?). In Interchange, session ids are appended to urls which is another source of duplicate content. Instead of using the url to track user navigation or behavior, several other solutions such as using cookies, using a '#' (hash), or using session data can be used to avoid duplicate content generation.
Performance should not be overlooked in ecommerce for search engine optimization. In March of 2008, Google wrote about how landing page load time will be incorporated into the Quality Score for Google Adwords - which is also believed to apply to regular search results. And github recently released some data on how performance improvements influenced http://www.github.com/ Googlebot visits.
Lacking basic CMS management
Basic CMS management such as the ability to manage and update page titles and page meta data is something that has been overlooked by ecommerce platforms in the past, but appears to have been given more attention recently. An ecommerce solution should also have functionality to create and manage static pages.
The Interchange demo does not have meta description and keyword functionality, however, page titles are equal to product names which is an acceptable default. It's also very simple to add a static content page (as a developer) and would require just a bit more effort to have this content managed by a database in Interchange. The Spree core is missing some basic CMS management such as page title and meta data management, but this functionality is currently in development. One Spree contributer developed a Spree extension that provides management of simple static pages using a WYSIWYG editor. At the moment, Magento appears to have the most traditional content management system functionality out of the box.
Another area to improve CMS within Ecommerce is to determine a solution to integrate a blog. A quick search of "magento add blog" revealed how to set up a wordpress blog in Magento with an extension. One of End Point's clients, CCI Beauty, also has wordpress integrated into their Interchange setup. Finally, there has been discussion about the development of "Spradiant", or mixing spree and radiant.
Another missed opportunity in ecommerce platforms is finding a solution to elegantly blend content and product listings to target specific keywords. A "landing page" can have a page title, meta data, and content targeted towards a specific terms. http://www.backcountry.com/store/gear/arcteryx-vests.html and http://www.backcountry.com/store/gear/cargo-pant.html are examples of targeted terms with corresponding products. Going one step farther, search pages themselves can have managed content to attract keywords, such as a page title, and meta data for specific high traffic keywords with the related products. For example, http://www.domain.com/s/ruby_shirt could be a search page for "Ruby Shirt" which contains meaningful content and relevant products.
Mishandled Product Pagination
Finding a search engine optimization solution for pagination can be a difficult problem in ecommerce. When there are less than 100 products for a site, this shouldn't be an issue because a simple taxonomy can appropriately group the products with low crawl depth. A website with 10,000 products must balance between keeping a low taxonomy depth to minimize crawl depth and ensure that all products are listed and indexable.
For example, products may be divided and fit into three levels of navigation: category, subcategory, and group. If there are 10,000 products, divided into 10 categories, 10 subcategories per category, and 10 product groups per category, 10 products can be shown on each group per page with no pagination. However, product taxonomy is not always so ideal. In some groups there may be 2 products and in others there may be 30. Pagination, or pages with an offset of product listings are generated to accommodate these product listings (for example, http://www.backcountry.com/store/group/61/Sun-Hats-Rain-Hats-Safari-Hats.html, http://www.backcountry.com/store/group/61/Sun-Hats-Rain-Hats-Safari-Hats-p1.html).
A few problems can arise from the pagination solution. First, by web 2.0 standards, the content should be generated via ajax. An SEO friendly ajax solution must be implemented - where the onclick event refreshes the content, but the links are still crawlable via search engine bots. Second, page 1 with no product offset will have 1 level less of crawl depth, therefore it will receive the most link juice from it's parent page (subcategory). As a result, there must be thoughtful analysis of which products to present on that page: should high traffic pages get the traffic? should popular items be listed on the first page? should low traffic products be listed to try to bump the traffic on those pages? should products with the most "user interaction" (reviews, qna, ratings) be shown on that page? Another problem that comes up is that the page meta data and title will most likely be very similar since the content is a list of similar products. These two pages can essentially be competing for traffic and may be counted as duplicate content if the page titles and meta data are equal.
Interchange uses the more list to handle pagination, but this functionality is not search engine friendly as it generates urls such as http://demo.icdevgroup.org/i/demo1/scan/MM=3ffffa066192cba677e1428d7461ddc9:10:19:10.html?mv_more_ip=1&mv_nextpage=results&mv_arg=, http://demo.icdevgroup.org/i/demo1/scan/MM=3ffffa066192cba677e1428d7461ddc9:20:27:10.html?mv_more_ip=1&mv_nextpage=results&mv_arg=, etc. The Spree demo had some pagination implementation, but upon recent frontend changes, it is no longer included in the demo. The Magento demo was carefully arranged so that product group pages have no more than 9 products to avoid showing any pagination functionality. However, when modifying the number of products displayed per group or using the "Sort By" mechanism, ?limit=Y and &order=X&dir=asc is appended to the url - which can produce a large volume of duplicate content (try filters on this page).
It is difficult to determine which of the above problems is the most problematic. From personal experience, I have been involved in tackling all duplicate content issues, and then moving on to "optimization" opportunities such as enhancing the content management system. At the very least, developers and users of any ecommerce platform should be aware of common search engine optimization issues.