
Postgres checksum performance impact

Way back in 2013, Postgres introduced a feature known as data checksums. When this is enabled, a small integer checksum is written to each "page" of data that Postgres stores on your hard drive. Upon reading that page, the checksum value is recomputed and compared to the stored one. This detects data corruption, which (without checksums) could be silently lurking in your database for a long time. We highly recommend that our Postgres clients turn checksums on; hopefully this feature will be enabled by default in a future version of Postgres.

However, because TANSTAAFL (there ain't no such thing as a free lunch), enabling checksums does have a performance penalty. Basically, a little bit more CPU is needed to compute the checksums. Because the computation is fast, and very minimal compared to I/O considerations, the performance hit for typical databases is very small indeed, often less than 2%. Measuring the exact performance hit of checksums can be a surprisingly tricky problem.

There are many factors that influence how much slower things are when checksums are enabled, including:

  • How likely things are to be read from shared_buffers, which depends on how large shared_buffers is set, and how much of your active database fits inside of it
  • How fast your server is in general, and how well it (and your compiler) are able to optimize the checksum calculation
  • How many data pages you have (which can be influenced by your data types)
  • How often you are writing new pages (via COPY, INSERT, or UPDATE)
  • How often you are reading values (via SELECT)

Enough of the theory, let's see checksums in action. The goal is that even a single changed bit anywhere in your data will produce an error, thanks to the checksum. For this example, we will use a fresh 9.4 database, and set it up with checksums:

~$ cd ~/pg/9.4
~/pg/9.4$ bin/initdb --data-checksums lotus
The files belonging to this database system will be owned by user "greg".
...
Data page checksums are enabled.
...
~/pg/9.4$ echo port=5594 >> lotus/postgresql.conf
~/pg/9.4$ bin/pg_ctl start -D lotus -l lotus.log
server starting
~/pg/9.4$ bin/createdb -p 5594 testdb

For testing, we will use a table with a single char(2000) column. This ensures that we have a relatively high number of pages compared to the number of rows (smaller data types mean more rows packed into each page, while larger types mean fewer pages, as the rows get TOASTed out of the main table). The data type will be important for our performance tests later on, but for now, we just need a single row:

~/pg/9.4$ psql testdb -p 5594 -c "create table foobar as select 'abcd'::char(2000) as baz"
SELECT 1
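
As a quick sanity check (not part of the original session), we can confirm that this one-row table really does occupy a single 8 kB data page; pg_relation_size reports the size of the table's main fork in bytes, so the division below should come out to 1 here:

~/pg/9.4$ psql testdb -p 5594 -tc "select pg_relation_size('foobar') / 8192 as pages"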

Finally, we will modify the data page on disk using sed, then ask Postgres to display the data, which should cause the checksum to fail and send up an alarm. (Unlike my coworker Josh's checksum post, I will change the actual checksum and not the data, but the principle is the same).

~/pg/9.4$ export P=5594
## Find the actual on-disk file holding our table, and store it in $D
~/pg/9.4$ export D=`psql testdb -p$P -Atc "select setting || '/' || pg_relation_filepath('foobar') from pg_settings where name ~ 'data_directory'"`
~/pg/9.4$ echo $D
/home/greg/pg/9.4/lotus/base/16384/16385
## The checksum is stored near the front of the page header: in this case it is 41 47
~/pg/9.4$ hexdump -C $D | head -1
00000000  00 00 00 00 00 00 00 00  41 47 00 00 1c 00 10 18  |........AG......|

## Use sed to change the checksum in place, then double check the result
~/pg/9.4$ LC_ALL=C sed -r -i "s/(.{8})../\1NI/" $D
~/pg/9.4$ hexdump -C $D | head -1
00000000  00 00 00 00 00 00 00 00  4E 49 00 00 1c 00 10 18  |........NI......|
~/pg/9.4$ psql testdb -p$P -tc 'select rtrim(baz) from foobar'
 abcd

Hmmm...why did this not work? Because of the big wrinkle in testing performance: shared buffers. This is a special shared memory segment used by Postgres to cache data pages. So when we asked Postgres for the value in the table, it pulled it from shared buffers, which does not get a checksum validation. Our change is completely overwritten as the page leaves shared buffers and heads back to disk, generating a new checksum:

~/pg/9.4$ psql testdb -p$P -tc 'checkpoint'
CHECKPOINT
~/pg/9.4$ hexdump -C $D | head -1
00000000  00 00 00 00 80 17 c2 01  7f 19 00 00 1c 00 10 18  |................|

How can we trigger a checksum warning? We need to get that row out of shared buffers. The quickest way to do so in this test scenario is to restart the database, then make sure we do not even look at (e.g. SELECT) the table before we make our on-disk modification. Once that is done, the checksum will fail and we will, as expected, receive a checksum error:

~/pg/9.4$ bin/pg_ctl restart -D lotus -l lotus.log
waiting for server to shut down.... done
server stopped
server starting
~/pg/9.4$ LC_ALL=C sed -r -i "s/(.{8})../\1NI/" $D
~/pg/9.4$ psql testdb -p$P -tc 'select rtrim(baz) from foobar'
WARNING:  page verification failed, calculated checksum 6527 but expected 18766
ERROR:  invalid page in block 0 of relation base/16384/16385
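
If you would rather verify than guess whether a table's pages are sitting in shared buffers, the pg_buffercache contrib extension can show you. This was not part of the original test, but it makes a handy cross-check; the simple join below assumes the table lives in the database you are connected to:

~/pg/9.4$ psql testdb -p$P -c "create extension pg_buffercache"
CREATE EXTENSION
~/pg/9.4$ psql testdb -p$P -tc "select count(*) from pg_buffercache b \
  join pg_class c on b.relfilenode = pg_relation_filenode(c.oid) where c.relname = 'foobar'"

A count of zero right after a restart means the next read will come from disk and be checksum-verified.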

The more that shared buffers are used (and using them efficiently is a good general goal), the less checksumming is done, and the less the impact of checksums on database performance will be. Because we want to see the "worst-case" scenario when doing performance testing, let's create a second Postgres cluster, with a teeny-tiny shared buffers. This will increase the chances that any reads come not from shared buffers, but from the disk (or more likely the OS cache, but we shall gloss over that for now).

To perform some quick performance testing on writes, let's do a large insert, which will write many pages to disk. I originally used pgbench for these tests, but found it was doing too much SQL under the hood and creating results that varied too much from run to run. So after creating a second cluster with checksums disabled, and after setting both clusters to "shared_buffers = 128kB", I created a test script that inserted many rows into the char(2000) table above, which generated a new data page (and thus computed a checksum on the checksummed cluster) once every four rows. I also did some heavy selects of the same table on both clusters.
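
The exact test script is not reproduced here, but its general shape was something like the following (a sketch only: the row count is illustrative, and each command was also run against the second cluster on its own port):

~/pg/9.4$ psql testdb -p 5594 -c "truncate foobar"
~/pg/9.4$ time psql testdb -p 5594 -qc \
    "insert into foobar select 'abcd'::char(2000) from generate_series(1,100000)"
~/pg/9.4$ time psql testdb -p 5594 -qc "select count(rtrim(baz)) from foobar"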

Rather than boring you with large charts of numbers, I will give you the summary. For inserts, the average difference was 6%. For selects, that jumps to 19%. But before you panic, remember that these tests are with a purposefully crippled Postgres database, doing worst-case scenario runs. When shared_buffers was raised to a sane setting, the statistical difference between checksums and no checksums disappeared.

In addition to this being an unrealistic worst-case scenario, I promise that you would be hard pressed to find a server to run Postgres on with a slower CPU than the laptop I ran these tests on. :) The actual calculation is pretty simple and uses a fast Fowler/Noll/Vo hash - see the src/include/storage/checksum_impl.h file. The calculation used is:

hash = (hash ^ value) * FNV_PRIME ^ ((hash ^ value) >> 17)

Can you handle the performance hit? Here's a little more incentive for you: if you are doing this as part of a major upgrade (a common time to do so, as part of a pg_dump oldversion | psql newversion process), then you are already getting performance boosts from the new version. Those can nicely balance out (or at least mitigate) the performance hit from enabling checksums! Consider how much speedup you get doing basic inserts just by leaving the 8.x series behind.
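
That dump-and-restore upgrade path looks roughly like this (a sketch only: the ports, data directory, and database name are illustrative; the key detail is that the new cluster is initialized with --data-checksums):

$ initdb --data-checksums /path/to/new/datadir
$ createdb -p 5433 mydb
$ pg_dump -p 5432 mydb | psql -p 5433 mydb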

It is very hard to hazard any number for the impact of checksums, as it depends on so many factors, but for a rough ballpark, I would say a typical database might see a one or two percent difference. Higher if you are doing insane amounts of inserts and updates, and higher if your database doesn't fit at all into shared buffers. All in all, a worthy trade-off. If you want some precise performance impact figures, you will need to do A/B testing with your database and application.

To sum this page up (ha!), enable those checksums! It's worth the one-time cost of not being able to use pg_upgrade, and the ongoing cost of a little more CPU. Don't wait for your corruption to get so bad the system catalogs start getting confused - find out the moment a bit gets flipped.

2015 Perl Dancer Conference videos

The 2015 Perl Dancer Conference has recently released the presentation videos. This year the conference was hosted in beautiful Vienna, Austria. Josh Lavin and I were both honored to attend the conference as well as give talks. Earlier, Josh wrote summaries of the conference:

Conference Recap

Conference Presentations

SpaceCamps “The Final Frontier”

I gave a talk exploring new technologies for End Point's own DevCamps development tool. During the presentation I detailed my research into containers and what a cloud-based development environment might look like.

SpaceCamps Presentation Video

AngularJS & Dancer for Modern Web Development

Josh detailed his experience migrating legacy applications utilizing Dancer, AngularJS, and modern Perl techniques. Josh highlighted the challenges he faced during the process, as well as lessons he learned along the way.

AngularJS & Dancer for Modern Web Development Presentation Video

Lightning Talks

Josh and I both gave short “lightning talks.” Josh’s was on Writing Unit Tests for a Legacy App (Interchange 5), and mine was on Plack & Interchange 5.

To review the rest of the presentations, please check out the Perl Dancer Conference YouTube channel.

Summary

The Perl Dancer community continues to flourish, and the conference this year hosted a record 5 core Dancer developers. Dancer is about to release the finalized version of its long-awaited plugin infrastructure for Dancer2, and a lot of work on this was completed during the conference. As an organizer of the conference, it brings me great joy to see this success. With this news, along with the release of Perl 6, I am certain 2016 will be a wonderful year not only for Dancer but for the entire Perl community.

Git: pre-receive hook error on CentOS 7

We recently had to move a git repository from an old CentOS 5 to a new CentOS 7 server.

On the old CentOS 5 server we had a recent, custom-compiled version of git, while on the new server we are using the system default old 1.8 version shipped by the official CentOS repositories. And, as usual when you tell yourself "What could possibly go wrong?", something did: every push began to return the dreaded "fatal: The remote end hung up unexpectedly" error.

After some time spent debugging, we managed to isolate the problem to the pre-receive hook that was active on that repository. The script was very simple:

 #!/bin/bash
 read_only_users="alice bob"
 for user in $read_only_users
 do
     if [ $USER == $user ]; then
         echo "User $USER has read-only access, push blocked."
         exit 1
     fi
 done

... which apparently had no visible mistakes. On top of the lack of errors, this very same script used to work perfectly for years on the old server. Unfortunately, and quite disappointingly, even changing it to a simple:

 #!/bin/bash
 echo "These are not the droids you are looking for. Move along."

...did not help and the error still persisted.

Searching for clues around forums and wikis, we found this blog post talking about parameters passed through stdin.

In the Git docs, we read that the pre-receive hook takes no arguments, but for each ref to be updated it receives on standard input a line of the format: <old-value> SP <new-value> SP <ref-name> LF.

At that point, we tried with a sample script that actually reads and does something with stdin:

 #!/bin/bash
 while read oldrev newrev refname
 do
   echo "OLDREV: $oldrev - NEWREV: $newrev - REFNAME: $refname"
 done

...and voilà: pushes started working again. Lesson learned: never ignore stdin.
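
For completeness, here is one way the original read-only check could be rewritten so that it also consumes stdin; this is a sketch rather than the exact script we ended up deploying:

 #!/bin/bash
 # Block pushes from read-only users, while still reading the ref updates from stdin.
 read_only_users="alice bob"
 while read oldrev newrev refname
 do
     for user in $read_only_users
     do
         if [ "$USER" = "$user" ]; then
             echo "User $USER has read-only access, push blocked."
             exit 1
         fi
     done
 done
 exit 0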

Event Listener Housekeeping in Angular Apps

I was recently debugging an issue where a large number of errors suddenly cropped up in an Angular application. The client reported that the majority of the errors were occurring on the product wall, which was an area of the application I was responsible for. After some sleuthing and debugging I determined the culprit was a scroll event listener in an unrelated Angular controller. When customers viewed that section of the application, the scroll listener was added to manage the visibility of some navigation elements. However, when the customer moved on to other sections of the site, the listener continued to fire in a context it was not expecting.

Scroll event listeners fire very often so this explained the sheer volume of errors. The product wall is a tall section of the site with lots of content so this explained why the bulk of the errors were happening there. The solution was to simply listen to the $destroy event in the controller and unbind the troublesome scroll listener:

$scope.$on('$destroy', function() {
  // Unbind the scroll handler from the window (assumes the window was wrapped with angular.element when the listener was bound)
  angular.element($window).unbind('scroll');
});

Single page apps do not have the benefit of getting a clean state with each page load. Because of this, it's important to keep track of any listeners that are added, especially those outside of Angular (e.g. on window and document), and make sure to clean them up when the related controllers and directives are destroyed.

ROS architecture of Liquid Galaxy

ROS has become the pivotal piece of software that our new Liquid Galaxy platform is built on. We have also recently open sourced all of our ROS nodes on GitHub. While the system itself is not a robot per se, it does have many characteristics of modern robots, which is what makes the ROS platform so useful. Our system is made up of multiple computers and peripheral devices, all working together to bring view-synced content to multiple displays at the same time. To do this we made use of ROS's messaging platform and distributed the work across many small ROS nodes.

Overview

Our systems are usually made up of 3 or more machines:

  • Head node: Small computer that runs roscore, more of a director in the system.
  • display-a: Usually controls the center three screens and a touchscreen + spacenav joystick.
  • display-b: Controls four screens, two on either side of the middle three.
  • display-$N: Controls additional screens as needed, usually about four apiece.

Display-a and display-b are mostly identical in build. They mainly have a powerful graphics card and a PXE-booted Ubuntu image. ROS has become our means of communicating between these machines to synchronize content across the system. The two most common functions are running Google Earth with KML / browser overlays to show extra content, and running panoramic image viewers like Google's Street View. ROS is how we tell each instance of Google Earth what it should be looking at, and what should appear on all the screens.

ROS Architecture

Here is a general description of all our ROS nodes. Hopefully we will be writing more blog posts about each node individually; as we do, links will be filled in below. The source to all nodes can be found here on GitHub.

  • lg_activity: A node that measures activity across the system to determine when the system has become inactive. It will send an alert on a specific ROS topic when it detects inactivity, as well as another alert when the system is active again.
  • lg_attract_loop: This node will go over a list of tours that we provide to it. This node is usually listening for inactivity before starting, providing a unique screensaver when inactive.
  • lg_builder: Makes use of the ROS build system to create Debian packages.
  • lg_common: Full of useful tools and common message types to reduce coupling between nodes.
  • lg_earth: Manages Google Earth, syncs instances between all screens, and includes a KML server to automate loading KML into Earth.
  • lg_media: This shows images, videos, and text (or really any webpage) on screen at whatever geometry / location we need, via rules in the awesome window manager.
  • lg_nav_to_device: This grabs the output of the /spacenav/twist topic, and translates it back into an event device. This was needed because Google Earth grabs the spacenav event device, not allowing the spacenav ROS node access.
  • lg_replay: This grabs any event device, and publishes its activity over a ROS topic.
  • lg_sv: This includes a Street View and generic panoramic image viewer, plus a server that manages the current POV / image for either viewer.

Why ROS

None of the above nodes specifically needs to exist as a ROS node. The reason we chose ROS is that, as a ROS node, each running program (and sometimes any one of these nodes can exist multiple times at once on one machine) has an easy way to communicate with any other program. We really liked the pub/sub style of inter-process communication in ROS. It has helped us reduce coupling between nodes, and each node can be replaced as needed without detrimental effects on the system.
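
As a concrete illustration of that pub/sub decoupling, any topic's traffic can be watched from the command line with the standard ROS tools; here the /spacenav/twist topic mentioned in the node list above:

$ rostopic list
$ rostopic echo /spacenav/twist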

We also make heavy use of the ROS packaging/build system, Catkin. We use it to build Debian packages which are installed on the PXE booted images.

Lastly, ROS has become a real joy to work with. It is a really dependable system, with many powerful features. The ROS architecture allows us to easily add new features as we develop them, without conflicting with everything else going on. We were able to re-implement our Street View viewer recently, and had no issues plugging the new one into the system. Documenting the nodes from a client-facing side is also very easy: as long as we describe each rosparam and rostopic, we have finished most of the work needed to document a node. Each program becomes a small, easy-to-understand, high-functioning piece of the system, similar to the Unix philosophy. We couldn't be happier with our new changes, or our decision to open source the ROS nodes.

ROS Platform Upgrades for Liquid Galaxy

For the last few months, End Point has been rolling out a new application framework along with updated display applications and monitoring infrastructure for the Liquid Galaxy Display Platform. These upgrades center on the ROS framework and allow a great number of functionality extensions and enhancements for the end user, as well as improvements to the stability and security of the core systems. It is intended that the 50+ systems that we currently maintain and support on behalf of our enterprise clients will be upgraded to this new platform.

ROS Overview
ROS is short for “Robot Operating System”. Just as it sounds, it is a framework used for controlling robots, and handles various environmental ‘inputs’ and ‘outputs’ well. End Point chose this framework in conjunction with related ongoing development projects on behalf of our enterprise clients. This system allows complex interactions from a touchscreen, camera, SpaceNav, or other device to be interpreted conditionally and then invoke other outputs such as displaying Google Earth, Street View, or other content on a given screen, speaker, or other output device. For more details, see: http://www.ros.org

Liquid Galaxy Enhancements
This new platform brings a number of improvements to the back-end systems and viewing customer experience.
  • Improved Street View Panospheres
    The new Street View viewer draws Street View tiles inside a WebGL sphere. This is a dramatic performance and visual enhancement over the older method, and can now support spherical projection, hardware acceleration, and seamless panning. For a user, this means tilting the view vertically as well as horizontally, zooming in and out, and improved frame rates.
  • Improved Panoramic Video
    As with the panoramic Street View application, this new platform improves the panoramic video playback as well. YouTube and Google have announced major initiatives to start actively supporting 360° panoramic video, including the financial backing of some high profile projects as example use cases. The Liquid Galaxy, with its panoramic screen layout already in place, is ideally suited for this new media format.
  • Improved Touchscreen
    The touchscreen incorporates a templated layout for easier modification and customization. The scene view selector is now consolidated to a single interface and no longer requires sub-pages or redundant touches. The Street View interface, complete with the ‘pegman’ icon, is a photo-realistic map with pinch and zoom just like a tablet interface.
  • Browser Windows
    The Liquid Galaxy can control multiple browser windows that can appear anywhere on the screens, often across multiple screens. These browser windows can show anything that can appear in a desktop web browser: web pages, videos, social media updates, data visualizations, etc.
  • Content Management System
    Beginning in 2014, End Point began to upgrade the content management system for the Liquid Galaxy. With the new ROS platform, we have updated this application to Roscoe (the ROS content experience). Roscoe gives registered users the ability to create complex presentations with specific scenes. Each scene can have a specific global location to guide the Google Earth or Street View, and then invoke overlays that appear across the screens. These overlays can include photos, data graphs, videos, or web pages. Each scene can also include a specific KML data set (e.g., population density data, property value data, etc.) that can appear as 3D bar graphs directly in the ‘Earth’ view.
  • Content Isolation
    Isolating the entire presentation layer in ROS makes it easy to develop exhibits without a full-fledged Liquid Galaxy system. The entire ROS stack can be installed and run on an Ubuntu 14.04 computer or within a Docker container. This ROS stack can be used by a developer or designer to build out presentations that will ultimately run on a Liquid Galaxy system.
  • App Modularization
    Each component of a Liquid Galaxy exhibit is a configurable ROS node, allowing us to reuse large swaths of code and distribute the exhibit across any number of machines. This architecture brings two strong advantages: 1) each ROS node does one specific thing, which increases portability and modularity, and 2) each node can be tested automatically, which improves reliability.
  • Enhanced Platform Stability
    By unifying all deployments on a common base, End Point is able to deploy bug fixes, monitoring scripts, and ongoing enhancements much more quickly and safely. This has enhanced the overall stability for all supported and monitored Liquid Galaxy platforms.

Product Roadmap
The items described above are the great things that we can do already with this new platform. Even greater things are coming soon:
  • LiDAR Point Clouds
    End Point has already built early prototypes for the Liquid Galaxy platform that can view LiDAR point clouds. LiDAR is rapidly gaining traction in the architecture, surveying, and construction industries. With the large viewing area of the Liquid Galaxy, these LiDAR point clouds become much more impactful and useful to the command and control center.
  • Google Earth and Google Maps Upgrades
    End Point continues to work with the latest developments in the Google Earth and Google Maps platforms and is actively working to integrate new features and functionality. These new capabilities will be rolled out to the fleet of Liquid Galaxies as available.
  • 3D WebGL Visualizations
    The Liquid Galaxy will be enhanced to view completely virtual 3D environments using WebGL and other common formats. These environments include complex data visualizations, interior space renderings for office planning, and even games.
Next Steps
If you’re considering the Liquid Galaxy platform, contact us to discuss these latest enhancements and how they can improve the communications and presentation tools for your organization.

Testing Django Applications

This post summarizes some observations and guidelines originating from introducing the pytest unit testing framework into our CMS (Content Management System) component of the Liquid Galaxy. Our Django-based CMS allows users to define scenes, presentations and assets (StreetView, Earth tours, panos, etc) to be displayed on the Liquid Galaxy.

The purpose of this blog post is to capture my Django and testing study points, summarize useful resource links, and itemize some guidelines on implementing tests for newcomers to the project. It also provides a comparison between Python's standard unittest library and the aforementioned pytest. Its focus is on Django database interaction.

Versions of software packages used

This post describes some of our experiences at End Point in designing and working on comprehensive QA/CI facilities for a new system which is closely related to the Liquid Galaxy.

The experiments were done on Ubuntu Linux 14.04.

Testing Django Applications

We probably don't need to talk much about the importance of testing. Writing tests along with the application code has become standard over the years. Of course, developers may fall into the trap of their own assumptions when creating test conditions, which can still result in faulty software, but the likelihood of buggy software is certainly higher for code that has no QA measures at all. If the code works and is untested, it works by accident, as the saying goes. As a rule of thumb, unit tests should be very brief test items that seldom interact with external services such as the database. Integration tests, on the other hand, often communicate with external components.

This post will heavily reference a minimal example Django application written for the purpose of experimenting with Django testing. Its README file contains some setup and requirement notes. Also, I am not going to list (m)any code snippets here, but rather reference the functional application and its test suite; hence the points below are more or less assorted small topics and observations. In order to benefit from this post, it will be helpful to follow the README and interact with (that is, run the tests of) the demo django-testing application.

Basic Django unittest versus pytest basic examples

This pair of test modules shows the differences between Django TestCase (unittest) and pytest-django (pytest) frameworks.
  • test_unittest_style.py

    The base Django TestCase class derives along this tree:

        django.test.TestCase
            django.test.TransactionTestCase
                django.test.SimpleTestCase
                    unittest.TestCase
    
    Django adds (among other things) database handling on top of the Python standard unittest library; the documentation is here.

  • test_pytest_style.py

    This is a pytest-style implementation of the same tests; the pytest-django plug-in adds, among other features, Django database handling support.

The advantage of unittest is that it comes with the Python installation - it's a standard library. That means one does not have to install anything to write tests, unlike pytest, which is a third-party library and needs to be installed separately. While the absence of an additional installation step is certainly a plus, it's debatable whether being part of the Python distribution is a benefit. I seem to recall Guido van Rossum saying at EuroPython 2010 that the best thing for pytest was not being part of the Python standard set of libraries, since its lively development and evolution would be slowed down by the inclusion.

There are very good talks and articles summarizing the advantages of pytest. For me personally, the reporting of error context is supreme. The lack of boilerplate (no inheritance), the use of plain Python asserts instead of many assert* methods, and the flexibility (function- or class-based tests) are other big plus points.

As the comment in the test_unittest_style.py file says, this particular unittest-based test module can be run both by Django's manage.py (which boils down to unittest test discovery at a lower layer) and by py.test (pytest).
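
In practice that means either of these invocations works for the unittest-style module (run from the demo application's directory, set up as laid out in its README):

$ python manage.py test
$ py.test test_unittest_style.py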

It should also be noted that pytest's flexibility can bite back if something gets overlooked.

Django database interaction unittest versus pytest (advanced examples)

  • test_unittest_advanced.py

    Since this post concentrates on pytest and since it's the choice for our LG CMS project (naturally :-), this unittest example just shows how the test (fresh) database is determined and how Django migrations are run at each test suite execution. Just as described in the Django documentation: "If your tests rely on database access such as creating or querying models, be sure to create your test classes as subclasses of django.test.TestCase rather than unittest.TestCase." That is true for database interaction but not completely true when using pytest. And "Using unittest.TestCase avoids the cost of running each test in a transaction and flushing the database, but if your tests interact with the database their behavior will vary based on the order that the test runner executes them. This can lead to unit tests that pass when run in isolation but fail when run in a suite." django.test.TestCase, however, ensures that each test runs inside a transaction to provide isolation. The transaction is rolled back once the test case is over.

  • test_pytest_advanced.py

    This file represents the actual core of the test experiments for this blog / demo app and shows various pytest features and approaches typical for this framework as well as Django (pytest-django that is) specifics.

Django pytest notes (advanced example)

Much like the unittest documentation, the pytest-django documentation recommends avoiding database interaction in unit tests and concentrating only on the logic, which should be designed in such a fashion that it can be tested without the database.

  • The test database name is prefixed with "test_" (just as in the unittest example); the base value is taken from the database section of settings.py. In fact, it is possible to run the test suite after dropping the main database, since the suite interacts only with "test_" + DATABASE_NAME
  • Migrations are executed before any database interaction is carried out (similarly to the unittest example)
  • Database interaction is marked by the Python decorator @pytest.mark.django_db at the method or class level (or on a stand-alone function). It is in fact the first occurrence of this marker that triggers the database setup (its creation and migration handling). Again, analogously to unittest (django.test.TestCase), each test case is wrapped in a database transaction that puts the database back into the state it was in before the test case. The "test_" + DATABASE_NAME database itself is dropped once the test suite run is over, unless the --reuse-db option is used. The production DATABASE_NAME remains untouched during the test suite run (more about this below)
  • pytest_djangodb_only.py - setup_method: run this module separately and the data created in setup_method ends up NOT in the "test_" + DATABASE_NAME database but in the standard one (as configured in settings.py, which would likely be the production database)! Also, this data will not be rolled back. When run separately, this test module will pass (but the production database will still be tainted). It may or may not fail on the second and subsequent runs, depending on whether it creates any unique data. When run within the test suite, the database call from setup_method will fail despite the presence of the class-level django_db marker. This has been very important to realize. Recommendation: do not include database interaction in the pytest special methods (such as setup_method or teardown_method); only include database interaction in the test case methods
  • The error message "Failed: Database access not allowed, use the "django_db" mark to enable" was seen on a database error in a method which actually had the marker, so this output is not to be 100% trusted
  • Data model factories are discussed separately below
  • Lastly, the test module shows a Django test Client instance and a call to an HTTP resource

pytest setup_method

While the fundamental differences between unittest and pytest have been discussed, there is something to be said about the Django-specific differences between the two. The unittest setUp method and the pytest setup_method method behave differently with respect to the database. setUp is included in the transaction, and database interactions are rolled back once the test case is over. setup_method is not included in the transaction. Moreover, interacting with the database from setup_method results in faulty behaviour that differs depending on whether the test module is run on its own or as part of the whole test suite.

The bottom line is: do not include database interaction in setup_method. This setUp versus setup_method behaviour was already shown in the basic examples, and more description and demonstration of it is in the file pytest_djangodb_only.py. This actually revealed that using the django_db database fixture is not supported in special pytest methods and that the aforementioned error message is misleading (more references here and here).

When running the whole test suite, this file will not be collected (its name lacks the "test_" prefix). It needs to be renamed to be included in the test suite run.

JSON data fixtures versus factories (pytest advanced example)

The traditional way of interacting with test data was to perform the following steps:
  • have data loaded in the database
  • python manage.py dumpdata
  • the produced JSON file is dragged along the application test code
  • call_command("loaddata", fixture_json_file_name) happens at each test suite run

The load is expensive, and the JSON dump file is hard to maintain manually once the original copy and the current needs diverge (the file contains integer primary key values, etc.). Although even the recent Django testing documentation mentions the use of JSON data fixtures, the approach is generally discouraged; the recommended way to achieve the same goal is to load the data in migrations or to use model data factories.

This talk, for example, compares both approaches in favour of the factory_boy library. A quote from the article: "Factory Boy is a Python port of a popular Ruby project called Factory Girl. It provides a declarative syntax for how new instances should be created. ... Using fixtures for complex data structures in your tests is fraught with peril. They are hard to maintain and they make your tests slow. Creating model instances as they are needed is a cleaner way to write your tests which will make them faster and more maintainable."

The file test_pytest_advanced.py demonstrates interaction with the factories defined in the module factories.py, showing the basic, very easy-to-use features.

Despite its ease of use, factory_boy is a powerful library capable of modeling Django's ORM many-to-many relationships, among other features.

Conclusion

You should now have a good idea of the testing differences between unittest and pytest in the Django environment. The emphasis has been put on pytest (pytest-django) and some recommended approaches. The demo application django-testing provides functional test cases demonstrating the behaviour and features discussed. The articles and talks referenced in this post were extremely helpful and instrumental in gaining expertise in the area and in introducing a rigorous testing approach into the production application.

Any discrepancy between the behaviour described above and your own setup may originate from different software versions. In any case, if anything is not clear enough, please let me know in the comments.

    Image Processing In The Cloud With Blitline and Wordpress

    Working with ImageMagick can be difficult. First, you have to get it installed on your OS (do you have Dev libs in place?), then you have to enable it in the language of your choice, then get it working in your application. After all that, do it all over again on the staging server where debugging may be complicated, and you may not have Admin rights. Meet Image Processing in the Cloud. Meet Blitline.


    I'm doing a lot of things with Wordpress now, so we'll set it up with Wordpress and PHP.


    Step 1

    Get a free developer account with Blitline, and note your application id.

    Step 2

    Get the Blitline PHP wrapper library Blitline_php. It's clean and awesome, but unfortunately at the time of writing it was missing a few things, like being able to run your own ImageMagick script and to set a postback URL for when the job is finished. Yes, those are both useful features of Blitline cloud image processing! I'm still waiting on my pull request to be incorporated into the official version, so for now you can use my fork, Ftert's Blitline_php, which has these two useful features.


    Step 3

    Now it's time to integrate it into our application. Since it's Wordpress, I'm doing it in the 'wp_generate_attachment_metadata' callback in functions.php.


    require_once dirname(__FILE__) . '/blitline_php/lib/blitline_php.php';
    ...
    add_filter( 'wp_generate_attachment_metadata', array($this, 'wp_blur_attachment_filter'), 10, 2 );
    ...
    public function wp_blur_attachment_filter($image_data, $attachment_id) {
        $url = wp_get_attachment_url($attachment_id);
        list($src, $width, $length) = wp_get_attachment_image_src($attachment_id);

        // Build the destination filename for the darkened/blurred version
        $data = pathinfo($src);
        $dest = $data['filename'] . '_darken_75_105_100_blur_0_20.' . $data['extension'];

        // Queue the ImageMagick job with Blitline and tell it where to post the result;
        // the postback route must match the one registered with register_rest_route below
        $Blit = new Blitline_php();
        $Blit->load($url, $dest);
        $Blit->do_script("convert input.png -blur 0x20 -modulate 75,105,100 output.png");
        $Blit->set_postback_url( get_site_url() . '/wp-json/portfolio/v1/blitline_callback' );

        $results = $Blit->process();

        if ($results->success()) {
            foreach ($results->get_images() as $name => $url) {
                error_log("Processed: {$name} at {$url}\n");
            }
        } else {
            error_log($results->get_errors());
        }
    }
    

    We are sending a JSON POST request to Blitline to make the blurred and saturated version of the uploaded image. You can track the progress of your jobs here. The request will return a URL to the image on the Blitline server, but the image may not be there right away, because the processing is asynchronous. I tried to set up S3 bucket integration (yes, Blitline can upload to S3 for you!), but the setup procedure is quite tedious. You have to manually enter your AWS Canonical ID (and obtain it first from Amazon) on the Blitline page. Then you have to create a special policy in your bucket for Blitline. This is a lot of hassle, and giving permissions to someone else might not be the way to go for you. For me personally it didn't work, because my policy was being automatically overwritten all the time. I don't even know why. So here's where the postback URL comes in play.


    Step 4

    I'm using the WP-API V2 plugin, which will soon become part of Wordpress, to create REST endpoints. In wp-content/mu-plugins/my-app-custom-endpoints/lib/endpoints.php:


    add_action('rest_api_init', function () {
        register_rest_route('portfolio/v1', '/blitline_callback', array(
            'methods' => 'POST',
            'callback' => 'process_blitline_callback',
        ));
    });
    

    In wp-content/mu-plugins/loader.php

    require_once dirname(__FILE__) . '/blitline_php/lib/blitline_php.php';
    
    require_once dirname(__FILE__) . '/my-app-custom-endpoints/api.php';
    

    In wp-content/mu-plugins/my-app-custom-endpoints/api.php


    if( ! defined( 'ABSPATH' ) ) exit;
    
    require_once dirname(__FILE__) . '/lib/endpoints.php';
    


    Here's the fun part. Add to wp-content/mu-plugins/my-app-custom-endpoints/lib/endpoints.php


    use Aws\S3\S3Client;

    function process_blitline_callback($request) {
        if ( ! class_exists( 'WP_Http' ) )
            include_once( ABSPATH . WPINC . '/class-http.php' );

        // S3 client used to copy the finished image into our own bucket
        $s3Client = S3Client::factory(array(
            'credentials' => array(
                'key'    => 'YOUR S3 KEY',
                'secret' => 'YOUR S3 SECRET'
            )
        ));
        $photo = new WP_Http();

        // Blitline posts a form-encoded 'results' field containing JSON
        $body = $request->get_body_params();
        $var  = (array) json_decode(stripslashes($body['results']), true);

        if (isset($var['images'][0]['error'])) {
            error_log('Error ' . $var['images'][0]['error']);
            return;
        }

        // Download the processed image from the temporary Blitline URL
        $photo      = $photo->request( $var['images'][0]['s3_url'] );
        $photo_name = $var['images'][0]['image_identifier'];

        // Save it into the Wordpress uploads directory
        $attachment = wp_upload_bits( $photo_name, null,
            $photo['body'],
            date("Y-m", strtotime( $photo['headers']['last-modified'] ) ) );

        $upload_dir = wp_upload_dir();

        // And push a copy to our own S3 bucket, making it publicly readable
        $s3Client->putObject(array(
            'Bucket'     => "yourbucket",
            'Key'        => 'wp-content/uploads' . $upload_dir['subdir'] . '/' . $photo_name,
            'SourceFile' => $attachment['file'],
            'ACL'        => 'public-read'
        ));
    }
    

    In the callback we download the processed image from the temporary Blitline URL. One little bonus in here is the upload to an Amazon S3 bucket; I use the Amazon PHP SDK to achieve that. Note the permissions: this was the one last thing that almost made me give up on the Blitline postback URL. When the image finally appeared in my bucket, it wasn't accessible from the outside, because I hadn't set the permissions.


    Step 5. Debugging (if it doesn't work)

    I used the Firefox add-on HttpRequester to post a mock response from Blitline to my application. If you don't want to deploy each time you change the code, another useful tool is LocalTunnel, which lets you expose your localhost to the internet and point the postback at your local app.
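
    The same kind of mock callback can also be sent with curl; the payload below is only a rough approximation of Blitline's real response format, so adjust the URL and fields to match your site and an actual job result:

    $ curl -X POST http://localhost/wp-json/portfolio/v1/blitline_callback \
        --data-urlencode 'results={"images":[{"image_identifier":"test.png","s3_url":"http://example.com/test.png"}]}'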


    And that's how you do image processing in the cloud!

    Odd pg_basebackup Connectivity Failures Over SSL

    A client recently came to me with an ongoing mystery: a remote Postgres replica needed to be replaced, but pg_basebackup repeatedly failed to complete. It would stop partway through every time, reporting something along the lines of:

    pg_basebackup: could not read COPY data: SSL error: decryption failed or bad record mac

    The first hunch we had was to turn off SSL renegotiation, as that isn't supported in some OpenSSL versions. By default it renegotiates keys after 512MB of traffic, and setting ssl_renegotiation_limit to 0 in postgresql.conf disables it. That helped pg_basebackup get much further along, but they were still seeing the process bail out before completion.
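
    For reference, the change amounts to a single line in postgresql.conf followed by a configuration reload (a sketch; this parameter was later removed from Postgres entirely, so it only applies to versions that still have it):

    # postgresql.conf
    ssl_renegotiation_limit = 0

    $ psql -c "select pg_reload_conf()"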

    The client's Chef has a strange habit of removing my ssh key from the database master, so while that was being fixed I connected in and took a look at the replica. Two pg_basebackup runs later, a pattern started to emerge:
    $ du -s 9.2/data.test*
    67097452        9.2/data.test
    67097428        9.2/data.test2
    Besides being nearly identical in size, those numbers are also suspiciously close to 64 GB. I like round numbers: when a problem happens close to one, that's often a pretty good tell of some boundary or limit. On a hunch that it wasn't a coincidence, I checked around for any similar references and found a recent openssl package bug report:

    https://rhn.redhat.com/errata/RHBA-2015-0772.html

    RHEL 6, check. SSL connection, check. Failure at 64 GiB, check. And lastly, a connection with psql confirmed AES-GCM:
    SSL connection (cipher: DHE-RSA-AES256-GCM-SHA384, bits: 256)

    Once the Postgres service could be restarted to load in the updated OpenSSL library, the base backup process completed without issue.

    Remember, keep those packages updated!

    Broken wikis due to PHP and MediaWiki "namespace" conflicts

    I was recently tasked with resurrecting an ancient wiki: in this case, a wiki last updated in 2005, running MediaWiki version 1.5.2, which needed to be transformed into something more modern (version 1.25.3). The old settings and extensions were not important, but we did want to preserve any content that had been created.

    The items available to me were a tarball of the mediawiki directory (including the LocalSettings.php file), and a MySQL dump of the wiki database. To import the items to the new wiki (which already had been created and was gathering content), an XML dump needed to be generated. MediaWiki has two simple command-line scripts to export and import your wiki, named dumpBackup.php and importDump.php. So it was simply a matter of getting the wiki up and running enough to run dumpBackup.php.

    My first thought was to simply bring the wiki up as it was - all the files were in place, after all, and specifically designed to read the old version of the schema. (Because the database schema changes over time, newer MediaWikis cannot run against older database dumps.) So I unpacked the MediaWiki directory, and prepared to resurrect the database.

    Rather than MySQL, the distro I was using defaulted to using the freer and arguably better MariaDB, which installed painlessly.

    ## Create a quick dummy database:
    $ echo 'create database footest' | sudo mysql
    
    ## Install the 1.5.2 MediaWiki database into it:
    $ cat mysql-acme-wiki.sql | sudo mysql footest
    
    ## Sanity test as the output of the above commands is very minimal:
    $ echo 'select count(*) from revision' | sudo mysql footest
    count(*)
    727977
    

    Success! The MariaDB instance was easily able to parse and load the old MySQL file. The next step was to unpack the old 1.5.2 mediawiki directory into Apache's docroot, adjust the LocalSettings.php file to point to the newly created database, and try and access the wiki. Once all that was done, however, both the browser and the command-line scripts spat out the same error:

    Parse error: syntax error, unexpected 'Namespace' (T_NAMESPACE), 
      expecting identifier (T_STRING) in 
      /var/www/html/wiki/includes/Namespace.php on line 52
    

    What is this about? Turns out that some years ago, someone added a class to MediaWiki with the terrible name of "Namespace". Years later, PHP finally caved to user demands and added some non-optimal support for namespaces, which means that (surprise), "namespace" is now a reserved word. In short, older versions of MediaWiki cannot run with modern (5.3.0 or greater) versions of PHP. Amusingly, a web search for this error on DuckDuckGo revealed not only many people asking about this error and/or offering solutions, but many results were actual wikis that are currently not working! Thus, their wiki was working fine one moment, and then PHP was (probably automatically) upgraded, and now the wiki is dead. But DuckDuckGo is happy to show you the wiki and its now-single page of output, the error above. :)

    There are three groups to blame for this sad situation, as well as three obvious solutions to the problem. The first group to share the blame, and the most culpable, is the MediaWiki developers who chose the word "Namespace" as a class name. As PHP has long had poor (or non-existent) support for packages, namespaces, and scoping, it is vital that all your PHP variables, class names, etc. are as unique as possible. To that end, the name of the class was changed at some point to "MWNamespace" - but the damage had been done. The second group to share the blame is the PHP developers, both for not having namespace support for so long, and for making it a reserved word knowing full well that one of the poster children for "mature" PHP apps, MediaWiki, was using "Namespace". Still, we cannot blame them too much for picking what is a pretty obvious word choice. The third group to blame is the owners of all those wikis out there that are suffering from that syntax error. They ought to be repairing their wikis. The fixes are pretty simple, which leads us to the three solutions to the problem.


    MediaWiki's cool install image

    The quickest (and arguably worst) solution is to downgrade PHP to something older than 5.3. At that point, the wiki will probably work again. Unless it's a museum (static) wiki, and you do not intend to upgrade anything on the server ever again, this solution will not work long term. The second solution is to upgrade your MediaWiki! The upgrade process is actually very robust and works well even for very old versions of MediaWiki (as we shall see below). The third solution is to make some quick edits to the code to replace all uses of "Namespace" with "MWNamespace". Not a good solution, but ideal when you just need to get the wiki up and running. Thus, it's the solution I tried for the original problem.
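
    For the record, most of that quick-and-dirty rename can be scripted. The sketch below is deliberately rough: run it against a backup copy, and inspect the grep output first, since the set of affected files varies by MediaWiki version:

    $ grep -rl 'Namespace::' includes/
    $ sed -i 's/class Namespace/class MWNamespace/' includes/Namespace.php
    $ grep -rl 'Namespace::' includes/ | xargs sed -i 's/\bNamespace::/MWNamespace::/g'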

    However, once I solved the Namespace problem by renaming to MWNamespace, some other problems popped up. I will not run through them here - although they were small and quickly solved, it began to feel like a neverending whack-a-mole game, and I decided to cut the Gordian knot with a completely different approach.

    As mentioned, MediaWiki has an upgrade process, which means that you can install the software and it will, in theory, transform your database schema and data to the new version. However, version 1.5 of MediaWiki was released in October 2005, almost exactly 10 years ago from the current release (1.25.3 as of this writing). Ten years is a really, really long time on the Internet. Could MediaWiki really convert something that old? (spoilers: yes!). Only one way to find out. First, I prepared the old database for the upgrade. Note that all of this was done on a private local machine where security was not an issue.

    ## As before, install mariadb and import into the 'footest' database
    $ echo 'create database footest' | sudo mysql test
    $ cat mysql-acme-wiki.sql | sudo mysql footest
    $ echo "set password for 'root'@'localhost' = password('foobar')" | sudo mysql test
    

    Next, I grabbed the latest version of MediaWiki, verified it, put it in place, and started up the webserver:

    $ wget http://releases.wikimedia.org/mediawiki/1.25/mediawiki-1.25.3.tar.gz
    $ wget http://releases.wikimedia.org/mediawiki/1.25/mediawiki-1.25.3.tar.gz.sig
    
    $ gpg --verify mediawiki-1.25.3.tar.gz.sig 
    gpg: assuming signed data in `mediawiki-1.25.3.tar.gz'
    gpg: Signature made Fri 16 Oct 2015 01:09:35 PM EDT using RSA key ID 23107F8A
    gpg: Good signature from "Chad Horohoe "
    gpg:                 aka "keybase.io/demon "
    gpg:                 aka "Chad Horohoe (Personal e-mail) "
    gpg:                 aka "Chad Horohoe (Alias for existing email) "
    ## Chad's cool. Ignore the below.
    gpg: WARNING: This key is not certified with a trusted signature!
    gpg:          There is no indication that the signature belongs to the owner.
    Primary key fingerprint: 41B2 ABE8 17AD D3E5 2BDA  946F 72BC 1C5D 2310 7F8A
    
    $ tar xvfz mediawiki-1.25.3.tar.gz
    $ mv mediawiki-1.25.3 /var/www/html/
    $ cd /var/www/html/mediawiki-1.25.3
    ## Because "composer" is a really terrible idea:
    $ git clone https://gerrit.wikimedia.org/r/p/mediawiki/vendor.git 
    $ sudo service httpd start
    

    Now, we can call up the web page to install MediaWiki.

    • Visit http://localhost/mediawiki-1.25.3, see the familiar yellow flower
    • Click "set up the wiki"
    • Click next until you find "Database name", and set to "footest"
    • Set the "Database password:" to "foobar"
    • Aha! Look what shows up: "Upgrade existing installation" and "There are MediaWiki tables in this database. To upgrade them to MediaWiki 1.25.3, click Continue"

    It worked! Next messages are: "Upgrade complete. You can now start using your wiki. If you want to regenerate your LocalSettings.php file, click the button below. This is not recommended unless you are having problems with your wiki." That message is a little misleading. You almost certainly *do* want to generate a new LocalSettings.php file when doing an upgrade like this. So say yes, leave the database choices as they are, and name your wiki something easily greppable like "ABCD". Create an admin account, save the generated LocalSettings.php file, and move it to your mediawiki directory.

    At this point, we can do what we came here for: generate a XML dump of the wiki content in the database, so we can import it somewhere else. We only wanted the actual content, and did not want to worry about the history of the pages, so the command was:

    $ php maintenance/dumpBackup.php --current > acme.wiki.2005.xml
    

    It ran without a hitch. However, close examination showed that it had an amazing amount of unwanted stuff from the "MediaWiki:" namespace. While there are probably some clever solutions that could be devised to cut them out of the XML file (either on export, import, or in between), sometimes quick beats clever, and I simply opened the file in an editor and removed all the "page" sections with a title beginning with "MediaWiki:". Finally, the file was shipped to the production wiki running 1.25.3, and the old content was added in a snap:

    $ php maintenance/importDump.php acme.wiki.2005.xml
    

    The script will recommend rebuilding the "Recent changes" page by running rebuildrecentchanges.php (can we get consistentCaps please MW devs?). However, this data is at least 10 years old, and Recent changes only goes back 90 days by default in version 1.25.3 (and even shorter in previous versions). So, one final step:

    ## 20 years should be sufficient
    $ echo '$wgRCMaxAge = 20 * 365 * 24 * 3600;' >> LocalSettings.php
    $ php maintenance/rebuildrecentchanges.php
    

    Voila! All of the data from this ancient wiki is now in place on a modern wiki!

    Liquid Galaxy at UNESCO in Paris

    The National Congress of Industrial Heritage of Japan (NCoIH) recently deployed a Liquid Galaxy at UNESCO Headquarters in Paris, France. The display showed several locations throughout southern Japan that were key to her rapid industrialization in the late 19th and early 20th century. Over the span of 30 years, Japan went from an agrarian society dominated by Samurai still wearing swords in public to an industrial powerhouse, forging steel and building ships that would eventually form a world-class navy and an industrial base that still dominates many leading global industries.

    End Point assisted by supplying the servers, frame, and display hardware for this temporary installation. The NCoIH supplied panoramic photos, historical records, and location information. Together using our Roscoe Content Management Application, we built out presentations that guided the viewer through several storylines for each location: viewers could see the early periods of Trial & Error and then later industrial mastery, or could view the locations by technology: coal mining, shipbuilding, and steel making. The touchscreen interface was custom-designed to allow a self-exploration among these storylines, and also showed thumbnail images of each scene in the presentations that, when touched, brought the viewer directly to that location and showed a short explanatory text, historical photos, as well as transitioning directly into Google Street View to show the preserved site.

    From a technical point of view, End Point debuted several new features with this deployment:

    • New scene control and editing functionalities in the Roscoe Content Management System
    • A new touchscreen interface that shows presentations and scenes within a presentation in a compact, clean layout
    • A new Street View interface that allows the "pinch and zoom" map navigation that we all expect from our smart phones and tablets
    • Debut of the new ROS-based operating system, including new ROS-nodes that can control Google Earth, Street View, panoramic content viewers, browser windows, and other interfaces
    • Deployment of some very nice NEC professional-grade displays

    Overall, the exhibit was a great success. Several diplomats from European, African, Asian, and American countries came to the display, explored the sites, and expressed their wonderment at the platform's ability to bring a given location and history into such vivid detail. Japan recently won recognition for these sites from the overall UNESCO governing body, and this exhibit was a chance to show those locations back to the UNESCO delegates.

    From here, the Liquid Galaxy will be shipped to Japan where it will be installed permanently at a regional museum, hopefully to be joined by a whole chain of Liquid Galaxy platforms throughout Japan showing her rich history and heritage to museum visitors.

    Taking control of your IMAP mail with IMAPFilter

    Organizing and dealing with incoming email can be tedious, but with IMAPFilter's simple configuration syntax you can automate any action that you might want to perform on an email and focus your attention on the messages that are most important to you.

    Most desktop and mobile email clients include support for rules or filters to deal with incoming mail messages but I was interested in finding a client-agnostic solution that could run in the background, processing incoming messages before they ever reached my phone, tablet or laptop. Configuring a set of rules in a desktop email client isn't as useful when you might also be checking your mail from a web interface or mobile client; either you need to leave your desktop client running 24/7 or end up with an unfiltered mailbox on your other devices.

    I've configured IMAPFilter to run on my home Linux server and it's doing a great job of processing my incoming mail, automatically sorting things like newsletters and automated Git commit messages into separate mailboxes and reserving my inbox for higher priority incoming mail.

    IMAPFilter is available in most package managers and easily configured with a single ~/.imapfilter/config.lua file. A helpful example config.lua is available in IMAPFilter's GitHub repository and is what I used as the basis for my personal configuration.
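
    To keep it processing mail in the background, one simple approach (a sketch, not necessarily the exact setup described here) is a cron entry that runs imapfilter periodically against that default config file:

    ## added via "crontab -e"; the /usr/bin path to imapfilter is an assumption
    */5 * * * * /usr/bin/imapfilter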

    A few of my favorite IMAPFilter rules (where 'endpoint' is configured as my work IMAP account):

    -- Mark daily timesheet reports as read, move them into a Timesheets archive mailbox
    timesheets = endpoint['INBOX']:contain_from('timesheet@example.com')
    timesheets:mark_seen()
    timesheets:move_messages(endpoint['Archive/Timesheets'])
    

    -- Sort newsletters into newsletter-specific mailboxes
    jsweekly = endpoint['INBOX']:contain_from('jsw@peterc.org')
    jsweekly:move_messages(endpoint['Newsletters/JavaScript Weekly'])
    
    hn = endpoint['INBOX']:contain_from('kale@hackernewsletter.com')
    hn:move_messages(endpoint['Newsletters/Hacker Newsletter'])
    

    Note that IMAPFilter will create missing mailboxes when running 'move_messages', so you don't need to set those up ahead of time. These are basic examples but the sample config.lua is a good source of other filter ideas, including combining messages matching multiple criteria into a single result set.

    In addition to these basic rules, IMAPFilter also supports more advanced configurations including the ability to perform actions on messages based on the results of passing their content through an external command. This opens up possibilities like performing your own local spam filtering by sending each message through SpamAssassin and moving messages into spam mailboxes based on the exit codes returned by spamc. As of this writing I'm still in the process of training SpamAssassin to reliably recognize spam vs. ham but hope to integrate its spam detection into my own IMAPFilter configuration soon.
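
    As a rough sketch of that exit-code idea outside of IMAPFilter (assuming spamd is running and a message is saved as message.eml):

    ## spamc -c only checks the message: it prints the score and exits
    ## non-zero when SpamAssassin classifies it as spam
    $ spamc -c < message.eml && echo "looks like ham" || echo "looks like spam"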

    Biennale Arte 2015 Liquid Galaxy Installation

    If there is anyone who doesn’t know about the incredible collections of art that the Google Cultural Institute has put together, I would urge them to visit google.com/culturalinstitute and be overwhelmed by their indoor and outdoor Street View tours of some of the world’s greatest museums. Along these same lines, the Cultural Institute recently finished doing a Street View capture of the interior of 70 pavilions representing 80 countries of the Biennale Arte 2015, in Venice, Italy. We, at End Point, were lucky enough to be asked to come along for the ride: Google decided that not only would this Street View version of the Biennale be added to the Cultural Institute’s collection, but that they would install a Liquid Galaxy at the Biennale headquarters, at Ca’ Giustinian on the Grand Canal, where visitors can actually use the Liquid Galaxy to navigate through the installations. Since the pavilions close in November 2015, and the Galaxy is slated to remain open until the end of January 2016, this will permit art lovers who missed the Biennale to experience it in a way that is astoundingly firsthand.

    End Point faced two main challenges during the Liquid Galaxy installation for the Cultural Institute. The first was to develop a custom touchscreen that would allow users to easily navigate and choose among the many pavilions. Additionally, wanting to mirror the way the Google Cultural Institute presents content, both online and on the wall at their Paris office, we decided to add a swipeable thumbnail runway to the touchscreen map, which would appear once a given pavilion was chosen.

    As we took on this project, it became evident to our R&D team that ordinary Street View wasn't really the ideal platform for indoor pavilion navigation because of the sheer size and scope of the pavilions. For this reason, our team decided that a ROS-based spherical Street View would provide a much smoother navigating experience. The new Street View viewer draws Street View tiles inside a WebGL sphere. This is a dramatic performance and visual enhancement over the old Maps API based viewer, and can now support spherical projection, hardware acceleration, and seamless panning. For a user in the multi-screen Liquid Galaxy setting, this means, for the first time, being able to roll the view vertically as well as horizontally, and zoom in and out, with dramatically improved frame rates. The result was such a success that we will be rolling out this new Street View to our entire fleet.

    The event itself consisted of two parts: at noon, Luisella Mazza, Google’s Head of Country Operations at the Cultural Institute, gave a presentation to the international press; as a result, we have already seen coverage emerge in ANSA, Arte.it, L'Arena, and more. This was followed by a 6PM closed door presentation to the Aspen Institute.

    Using the Liquid Galaxy and other supports from the exhibition, Luisella spoke at length about the role of culture in what Google refers to as the “digital transformation”.

    The Aspen Institute is very engaged with these questions of “whitherto”, and Luisella’s presentation was followed by a long, and lively, round table discussion on the subject.

    We were challenged to do something cool here and we came through in a big way: our touchscreen design and functionality are the stuff of real creative agency work, and meeting the technical challenge of making Street View perform in a new and enhanced way not only made for one very happy client, but is the kind of technical breakthrough that we all dream of. And how great that we got to do it all in Venice and be at the center of the action!

    Top 15 Best Unix Command Line Tools

    Here are some of the Unix command line tools which we feel make our hands faster and our lives easier. Let's go through them in this post, and make sure to leave a comment with your favourite!

    1. Find the command that you are unaware of

    In many situations we need to perform a command line operation, but we might not know the right utility to run. The apropos command searches the short descriptions in the Unix manual pages for a given keyword and returns a list of commands that we may use to accomplish the task.

    If you still cannot find the right utility, then Google is your friend :)

    $ apropos "list dir"
    $ man -k "find files"


    2. Fix typos in our commands

    It's normal to make typographical errors when we type quickly. Consider a situation where we run a command with a long list of arguments, it fails with "command not found", and we notice a typo in the command name.
    We really do not want to retype the long list of arguments; instead, we can use the following caret syntax to correct the typo and re-execute the command:
    $ ^typo_cmd^correct_cmd
    $ dc /tmp
    $ ^dc^cd
    The above corrects the dc typo and navigates to the /tmp directory.

    3. Bang and its Magic

    The bang (!) is quite useful when we want to play with the bash history. It lets you easily re-execute commands from the history when you need them:
    • !! --> Execute the last command in the bash history
    • !* --> Expands to all the arguments passed to the previous command
    • !^ --> Expands to the first argument of the last executed command
    • !$ --> Expands to the last argument of the last executed command
    • !N --> Execute the command at position N in the bash history
    • !?keyword? --> Execute the most recent command in the bash history that matches the specified keyword
    • !-N --> Execute the command N positions back from the end of the bash history
    $ ~/bin/lg-backup
    $ sudo !!
    
    In the above example, we didn't realize that the lg-backup command had to be run with sudo. Instead of typing the whole command again, we can just run "sudo !!", which re-runs the last command in the bash history under sudo and saves us a lot of time.
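
    Another frequent time-saver is !$, which reuses the last argument of the previous command (the directory name here is just an example):

    $ mkdir -p ~/projects/lg-reports
    $ cd !$        ## expands to: cd ~/projects/lg-reports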

    4. Working with Incron

    Incron configuration is almost like a crontab setup, but the main difference is that incron monitors a directory for the filesystem events you specify and triggers the given action in response.
    Syntax: $directory $file_change_mask $command_or_action

    /var/www/html/contents/ IN_CLOSE_WRITE,IN_CREATE,IN_DELETE /usr/bin/rsync --exclude '*.tmp' -a /home/ram/contents/ user@another_host:/home/ram/contents/
    /tmp IN_ALL_EVENTS logger "/tmp action for $#"
    The first example triggers an rsync whenever there is a change in the /var/www/html/contents directory, which is really helpful when immediate backups are needed. Find more about incron here.
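
    On Debian-based systems, incron can typically be installed and its table edited much like a crontab (package names may vary by distribution):

    $ sudo apt-get install incron
    $ incrontab -e     ## add entries using the syntax shown above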

    5. Double dash

    There are situations where we end up creating or deleting directories whose names start with a symbol such as a dash. These directories cannot be removed with a plain "rm -rf" or "rmdir", so we need to use a "double dash" (--) to mark the end of the options and delete such directories:
    $ rm -rf -- $symbol_dir
    Similarly, there are situations where you may want to create a directory whose name starts with a symbol. You can create such directories by placing the double dash (--) before the directory name:
    $ mkdir -- $symbol_dir
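
    For example, with a hypothetical directory named -testdir:

    $ mkdir -- -testdir
    $ rm -rf -- -testdir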

    6. Comma and Braces Operators

    We can do a lot with the comma and brace operators to make our lives easier when performing common file operations. Let's see a few usages:
    • Rename and backup operations with comma & braces operator
    • Pattern matching with comma & braces operator (see the expansion example at the end of this section)
    • Rename and backup (prefixing name) operations on long file names
    To back up httpd.conf to httpd.conf.bak:
    $ cp httpd.conf{,.bak}
    To restore the file from httpd.conf.bak back to httpd.conf:
    $ mv httpd.conf{.bak,}
    To copy the file to a new name prefixed with "old-":
    $ cp exampleFile old-!#^
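
    To illustrate the pattern matching usage mentioned in the list above, braces can also expand one pattern into multiple names in a single command (these file and directory names are just examples):

    ## create a directory tree in one shot
    $ mkdir -p project/{src,doc,test}
    ## expands to: file1.txt file2.txt file3.txt
    $ echo file{1..3}.txt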

    7. Read only vim

    As we all know, vim is a powerful command line editor. If you want to stick with vim, you can also use it to view files in read-only mode:
    $ vim -R filename
    
    We can also use the "view" command, which is simply vim in read-only mode:
    $ view filename 

    8. Push and Pop Directories

    Sometimes when we are working across various directories, looking at logs and executing scripts, we find that a lot of our time is spent navigating the directory structure. If your directory navigation resembles a stack, the pushd and popd commands will save you lots of time (see the short example after this list):
    • Push the directory using pushd
    • List the stack directories using the command "dirs"
    • Pop the directories using popd
    • This is mainly used in navigating between directories
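
    A minimal sketch of a pushd/popd session (the directory names are only examples):

    $ pushd /var/log     ## remember the current directory and cd to /var/log
    $ pushd /etc         ## push /var/log onto the stack and cd to /etc
    $ dirs -v            ## list the numbered directory stack
    $ popd               ## back to /var/log
    $ popd               ## back to where we started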

    9. Copy text from the Linux terminal (stdin) to the system clipboard

    Install xclip and create the aliases below:
    $ alias pbcopy='xclip -selection clipboard'
    $ alias pbpaste='xclip -selection clipboard -o'
    We need to have the X window system running for this to work. On Mac OS X, the pbcopy and pbpaste commands are readily available to you.
    To Copy:
    $ ls | pbcopy
    To Paste:
    $ pbpaste > lstxt.txt 

    10. TimeMachine-like Incremental Backups in Linux using rsync --link-dest

    Using rsync's --link-dest option means that it will not recopy all of the files every single time a backup is performed. Instead, only the files that have been newly created or modified since the last backup are copied; unchanged files are hard linked from the previous backup into the destination directory.
    $ rsync -a --link-dest=prevbackup src dst
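
    For example, a simple dated backup scheme might look something like this (the paths and dates here are only illustrative):

    ## files unchanged since the 2015-11-30 run become hard links, so the new
    ## backup is quick and small but still looks like a complete copy
    $ rsync -a --link-dest=/backups/2015-11-30 /home/user/ /backups/2015-12-01/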

    11. To display the ASCII art of the Process tree

    Showing your processes in a tree structure is very useful for confirming the relationship between every process running on your system. Here is an option which is available by default on most Linux systems.
    $ ps aux --forest
    --forest is an option to the ps command which displays the process tree as ASCII art.

    There are other commands, such as 'pstree' and 'htop', that achieve much the same thing.

    12. Tree view of git commits

    If you want to see the git commits in a repository as a tree view to understand the commit history better, the options below will be super helpful. They come with the standard git installation, and you do not need any additional packages.
    $ git log --graph --oneline

    13. Tee

    The tee command is used to view and store the output of another command at the same time.
    That is, it writes to STDOUT and to a file simultaneously. It helps when you want to see a command's output on the screen while also writing it to a file, or piping it through pbcopy to copy it:
    $ crontab -l | tee crontab.backup.txt
    The tee command is named after plumbing terminology for a T-shaped pipe splitter. This Unix command splits the output of a command, sending it to a file and to the terminal output. Thanks Jon for sharing this.

    14. ncurses disk usage analyzer

    Analysing disk usage with ncdu's ncurses interface is fast and simple. To install it:
    $ sudo apt-get install ncdu
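
    Once installed, point it at the directory you want to analyze, for example:

    $ ncdu ~        ## interactively browse the sizes under your home directory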

    15. hollywood

    We have all seen the hacking scenes in Hollywood movies. Yes, there is a package which will recreate that look in your terminal for you.
    $ sudo apt-add-repository ppa:hollywood/ppa 
    $ sudo apt-get update
    $ sudo apt-get install hollywood
    $ hollywood