
Postgres checksum performance impact

Way back in 2013, Postgres introduced a feature known as data checksums. When this is enabled, a small integer checksum is written to each "page" of data that Postgres stores on your hard drive. Upon reading that block, the checksum value is recomputed and compared to the stored one. This detects data corruption, which (without checksums) could be silently lurking in your database for a long time. We highly recommend that our Postgres clients turn checksums on; hopefully this feature will be enabled by default in future versions of Postgres.

However, because TANSTAAFL (there ain't no such thing as a free lunch), enabling checksums does have a performance penalty. Basically, a little bit more CPU is needed to compute the checksums. Because the computation is fast, and very minimal compared to I/O considerations, the performance hit for typical databases is very small indeed, often less than 2%. Measuring the exact performance hit of checksums can be a surprisingly tricky problem.

There are many factors that influence how much slower things are when checksums are enabled, including:

  • How likely things are to be read from shared_buffers, which depends on how large shared_buffers is set, and how much of your active database fits inside of it
  • How fast your server is in general, and how well it (and your compiler) are able to optimize the checksum calculation
  • How many data pages you have (which can be influenced by your data types)
  • How often you are writing new pages (via COPY, INSERT, or UPDATE)
  • How often you are reading values (via SELECT)

Enough of the theory, let's see checksums in action. The goal is that even a single changed bit anywhere in your data will produce an error, thanks to the checksum. For this example, we will use a fresh 9.4 database, and set it up with checksums:

~$ cd ~/pg/9.4
~/pg/9.4$ bin/initdb --data-checksums lotus
The files belonging to this database system will be owned by user "greg".
...
Data page checksums are enabled.
...
~/pg/9.4$ echo port=5594 >> lotus/postgresql.conf
~/pg/9.4$ bin/pg_ctl start -D lotus -l lotus.log
server starting
~/pg/9.4$ bin/createdb -p 5594 testdb

For testing, we will use a table with a single char(2000) column. This ensures that we have a relatively high number of pages compared to the number of rows (smaller data types mean more rows shoved into each page, while larger ones also mean fewer pages, as the rows get TOASTed out of line). The data type will be important for our performance tests later on, but for now, we just need a single row:

~/pg/9.4$ psql testdb -p 5594 -c "create table foobar as select 'abcd'::char(2000) as baz"
SELECT 1

Finally, we will modify the data page on disk using sed, then ask Postgres to display the data, which should cause the checksum to fail and send up an alarm. (Unlike my coworker Josh's checksum post, I will change the actual checksum and not the data, but the principle is the same).

~/pg/9.4$ export P=5594
## Find the actual on-disk file holding our table, and store it in $D
~/pg/9.4$ export D=`psql testdb -p$P -Atc "select setting || '/' || pg_relation_filepath('foobar') from pg_settings where name ~ 'data_directory'"`
~/pg/9.4$ echo $D
/home/greg/pg/9.4/lotus/base/16384/16385
## The checksum is stored at the front of the header: in this case it is: 41 47
~/pg/9.4$ hexdump -C $D | head -1
00000000  00 00 00 00 00 00 00 00  41 47 00 00 1c 00 10 18  |........AG......|

## Use sed to change the checksum in place, then double check the result
~/pg/9.4$ LC_ALL=C sed -r -i "s/(.{8})../\1NI/" $D
~/pg/9.4$ hexdump -C $D | head -1
00000000  00 00 00 00 00 00 00 00  4E 49 00 00 1c 00 10 18  |........NI......|
~/pg/9.4$ psql testdb -p$P -tc 'select rtrim(baz) from foobar'
 abcd

Hmmm...why did this not work? Because of the big wrinkle in testing performance: shared buffers. This is a special shared memory segment used by Postgres to cache data pages. So when we asked Postgres for the value in the table, it pulled it from shared buffers, which does not get checksum validation. Our changes are completely overwritten as the page leaves shared buffers and heads back to disk, generating a new checksum:

~/pg/9.4$ psql testdb -p$P -tc 'checkpoint'
CHECKPOINT
~/pg/9.4$ hexdump -C $D | head -1
00000000  00 00 00 00 80 17 c2 01  7f 19 00 00 1c 00 10 18  |................|

How can we trigger a checksum warning? We need to get that row out of shared buffers. The quickest way to do so in this test scenario is to restart the database, then make sure we do not even look at (e.g. SELECT) the table before we make our on-disk modification. Once that is done, the checksum will fail and we will, as expected, receive a checksum error:

~/pg/9.4$ bin/pg_ctl restart -D lotus -l lotus.log
waiting for server to shut down.... done
server stopped
server starting
~/pg/9.4$ LC_ALL=C sed -r -i "s/(.{8})../\1NI/" $D
~/pg/9.4$ psql testdb -p$P -tc 'select rtrim(baz) from foobar'
WARNING:  page verification failed, calculated checksum 6527 but expected 18766
ERROR:  invalid page in block 0 of relation base/16384/16385

The more that shared buffers are used (and using them efficiently is a good general goal), the less checksumming is done, and the less the impact of checksums on database performance will be. Because we want to see the "worst-case" scenario when doing performance testing, let's create a second Postgres cluster with a teeny-tiny shared_buffers setting. This will increase the chances that any reads come not from shared buffers, but from the disk (or more likely the OS cache, but we shall gloss over that for now).
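
Creating such a low-memory, checksum-free cluster might look something like this (the cluster name and port are illustrative, not the exact ones used for the tests):

~/pg/9.4$ bin/initdb nochecksums
~/pg/9.4$ echo port=5595 >> nochecksums/postgresql.conf
~/pg/9.4$ echo "shared_buffers = 128kB" >> nochecksums/postgresql.conf
~/pg/9.4$ bin/pg_ctl start -D nochecksums -l nochecksums.log
server starting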

To perform some quick performance testing on writes, let's do a large insert, which will write many pages to disk. I originally used pgbench for these tests, but found it was doing too much SQL under the hood and creating results that varied too much from run to run. So after creating a second cluster with checksums disabled, and after adjusting both with "shared_buffers = 128kB", I created a test script that inserted many rows into the char(2000) table above, which generated a new data page (and thus computed a checksum on the checksum-enabled cluster) once every four rows. I also did some heavy selects of the same table on both clusters.
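
The exact test script is not shown here, but the core of the write test was a large bulk insert along these lines (a rough reconstruction, not the actual script used):

~/pg/9.4$ psql testdb -p$P -c "insert into foobar select 'abcd'::char(2000) from generate_series(1,100000)"
INSERT 0 100000

Since each char(2000) row occupies roughly two kilobytes, about four rows fit into each 8 kB data page, so every fourth row forces a new page (and, on the checksum-enabled cluster, a checksum computation) to be written.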

Rather than boring you with large charts of numbers, I will give you the summary. For inserts, the average difference was 6%. For selects, that jumps to 19%. But before you panic, remember that these tests were run against a purposefully crippled Postgres database, doing worst-case scenario runs. When shared_buffers was raised to a sane setting, the statistical difference between checksums and not-checksums disappeared.

In addition to this being an unrealistic worst-case scenario, I promise that you would be hard pressed to find a server to run Postgres on with a slower CPU than the laptop I ran these tests on. :) The actual calculation is pretty simple and uses a fast Fowler/Noll/Vo hash - see the src/include/storage/checksum_impl.h file. The calculation used is:

hash = (hash ^ value) * FNV_PRIME ^ ((hash ^ value) >> 17)
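
Following C operator precedence, the multiplication happens first and the shifted value is XORed in afterwards. A rough Python transliteration of that per-value step (the real code in checksum_impl.h is C, defines FNV_PRIME as 16777619, and applies this mix to every 4-byte word of the page):

FNV_PRIME = 16777619

def checksum_comp(checksum, value):
    # Emulate 32-bit unsigned arithmetic with explicit masking.
    tmp = (checksum ^ value) & 0xFFFFFFFF
    return ((tmp * FNV_PRIME) ^ (tmp >> 17)) & 0xFFFFFFFF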

Can you handle the performance hit? Here's a little more incentive for you: if you are doing this as part of a major upgrade (a common time to do so, as part of a pg_dump oldversion | psql newversion process), then you are already getting performance boosts from the new version, which can nicely balance out (or at least mitigate) the performance hit from enabling checksums! Consider how much speedup you get doing basic inserts just by leaving the 8.x series behind.

It is very hard to hazard any number for the impact of checksums, as it depends on so many factors, but for a rough ballpark, I would say a typical database might see a one or two percent difference. Higher if you are doing insane amounts of inserts and updates, and higher if your database doesn't fit at all into shared buffers. All in all, a worthy trade-off. If you want some precise performance impact figures, you will need to do A/B testing with your database and application.

To sum this page up (ha!), enable those checksums! It's worth the one-time cost of not being able to use pg_upgrade, and the ongoing cost of a little more CPU. Don't wait for your corruption to get so bad the system catalogs start getting confused - find out the moment a bit gets flipped.

2015 Perl Dancer Conference videos

The 2015 Perl Dancer Conference has recently released the presentation videos. This year the conference was hosted in beautiful Vienna, Austria. Josh Lavin and I were both honored to attend the conference as well as give talks. Earlier, Josh wrote summaries of the conference:

Conference Recap

Conference Presentations

SpaceCamps “The Final Frontier”

I gave a talk exploring new technologies for End Point's own DevCamps development tool. During the presentation I detailed my research into containers and what a cloud-based development environment might look like.

SpaceCamps Presentation Video

AngularJS & Dancer for Modern Web Development

Josh detailed his experience migrating legacy applications utilizing Dancer, AngularJS, and modern Perl techniques. Josh highlighted the challenges he faced during the process, as well as lessons he learned along the way.

AngularJS & Dancer for Modern Web Development Presentation Video

Lightning Talks

Josh and I both gave short “lightning talks.” Josh’s was on Writing Unit Tests for a Legacy App (Interchange 5), and mine was on Plack & Interchange 5.

To review the rest of the presentations, please check out the Perl Dancer Conference YouTube channel.

Summary

The Perl Dancer community continues to flourish, and the conference this year hosted a record 5 core Dancer developers. Dancer is about to release the finalized version of its long-awaited plugin infrastructure for Dancer2, and a lot of work on this was completed during the conference. As an organizer of the conference, it brings me great joy to see this success. With this news, along with the release of Perl 6, I am certain 2016 will be a wonderful year not only for Dancer but for the entire Perl community.

Git: pre-receive hook error on CentOS 7

We recently had to move a git repository from an old CentOS 5 to a new CentOS 7 server.

On the old CentOS 5 server we had a recent, custom-compiled version of git, while on the new server we are using the older 1.8 version shipped by the official CentOS repositories. And, as usual when you tell yourself "What could possibly go wrong?", something did: every push began to return the dreaded "fatal: The remote end hung up unexpectedly" error.

After some time spent debugging, we managed to trace the problem to the pre-receive hook active on that repository. The script was very simple:

 #!/bin/bash
 read_only_users="alice bob"
 for user in $read_only_users
 do
     if [ $USER == $user ]; then
         echo "User $USER has read-only access, push blocked."
         exit 1
     fi
 done

... which apparently had no visible mistakes. On top of the lack of errors, this very same script used to work perfectly for years on the old server. Unfortunately, and quite disappointingly, even changing it to a simple:

 #!/bin/bash
 echo "These are not the droids you are looking for. Move along."

...did not help and the error still persisted.

Searching for clues around forums and wikis, we found this blog post talking about parameters passed through stdin.

In the Git docs, we read that the pre-receive hook takes no arguments, but for each ref to be updated it receives a line on standard input in the format: <old-value> SP <new-value> SP <ref-name> LF.

At that point, we tried with a sample script that actually reads and does something with stdin:

 #!/bin/bash
 while read oldrev newrev refname
 do
   echo "OLDREV: $oldrev - NEWREV: $newrev - REFNAME: $refname"
 done

...and voilà: pushes started working again. Lesson learned: never ignore stdin.
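
For the original access-control hook, the fix simply means consuming the ref lines on stdin, even if they are never used. A minimal sketch of a corrected version (the read-only user names are illustrative, as in the original):

 #!/bin/bash
 read_only_users="alice bob"

 # Drain the ref update lines git writes on stdin, even though we don't use them.
 while read oldrev newrev refname
 do
     :
 done

 for user in $read_only_users
 do
     if [ "$USER" == "$user" ]; then
         echo "User $USER has read-only access, push blocked."
         exit 1
     fi
 done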

Event Listener Housekeeping in Angular Apps

I was recently debugging an issue where a large number of errors suddenly cropped up in an Angular application. The client reported that the majority of the errors were occurring on the product wall, which was an area of the application I was responsible for. After some sleuthing and debugging I determined the culprit was a scroll event listener in an unrelated Angular controller. When customers viewed that section of the application, the scroll listener was added to manage the visibility of some navigation elements. However, when the customer moved on to other sections of the site, the listener continued to fire in a context it was not expecting.

Scroll event listeners fire very often so this explained the sheer volume of errors. The product wall is a tall section of the site with lots of content so this explained why the bulk of the errors were happening there. The solution was to simply listen to the $destroy event in the controller and unbind the troublesome scroll listener:

$scope.$on('$destroy', function() {
  $window.unbind('scroll');
});

Single page apps do not have the benefit of getting a clean state with each page load. Because of this, it's important to keep track of any listeners that are added, especially those outside of Angular (e.g., on window and document), and to make sure to clean them up when the related controllers and directives are destroyed.
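
Putting the two halves together, the general pattern looks something like the following sketch (the module, controller, and handler names are illustrative, not the actual application code):

angular.module('app').controller('WallCtrl', ['$scope', '$window', function($scope, $window) {
  // Handler that manages navigation visibility while the user scrolls.
  function onScroll() {
    // ...update navigation elements...
  }

  angular.element($window).on('scroll', onScroll);

  $scope.$on('$destroy', function() {
    // Remove only our handler so listeners owned by other code keep working.
    angular.element($window).off('scroll', onScroll);
  });
}]);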

ROS architecture of Liquid Galaxy

ROS has become the pivotal piece of software on which we have built our new Liquid Galaxy platform. We have also recently open sourced all of our ROS nodes on GitHub. While the system itself is not a robot per se, it does have many characteristics of modern robots, which is what makes the ROS platform so useful. Our system is made up of multiple computers and peripheral devices, all working together to bring view-synced content to multiple displays at the same time. To do this we made use of ROS's messaging platform, and distributed the work across many small ROS nodes.

Overview

Our systems usually consist of 3 or more machines:

  • Head node: Small computer that runs roscore, more of a director in the system.
  • display-a: Usually controls the center three screens and a touchscreen + spacenav joystick.
  • display-b: Controls four screens, two on either side of the middle three.
  • display-$N: Controls more and more screens as needed, usually about four a piece.

Display-a and display-b are mostly identical in build. Each mainly has a powerful graphics card and a PXE-booted Ubuntu image. ROS has become our means of communicating between these machines to synchronize content across the system. The two most common functions are running Google Earth with KML / browser overlays to show extra content, and panoramic image viewers like Google's Street View. ROS is how we tell each instance of Google Earth what it should be looking at, and what should appear on all the screens.

ROS Architecture

Here is a general description of all our ROS nodes. Hopefully we will be writing more blog posts about each node individually; as we do, links will be filled in below. The source to all nodes can be found here on GitHub.

  • lg_activity: A node that measures activity across the system to determine when the system has become inactive. It will send an alert on a specific ROS topic when it detects inactivity, as well as another alert when the system is active again.
  • lg_attract_loop: This node will go over a list of tours that we provide to it. This node is usually listening for inactivity before starting, providing a unique screensaver when inactive.
  • lg_builder: Makes use of the ROS build system to create Debian packages.
  • lg_common: Full of useful tools and common message types to reduce coupling between nodes.
  • lg_earth: Manages Google Earth, syncs instances between all screens, includes a KML server to automate loading KML on earth.
  • lg_media: This shows images, videos, and text (or really any webpage) on screen at any geometry / location, via awesome window manager rules.
  • lg_nav_to_device: This grabs the output of the /spacenav/twist topic, and translates it back into an event device. This was needed because Google Earth grabs the spacenav event device, not allowing the spacenav ROS node access.
  • lg_replay: This grabs any event device, and publishes its activity over a ROS topic.
  • lg_sv: This includes a Street View and generic panoramic image viewer, plus a server that manages the current POV / image for either viewer.

Why ROS

None of the above nodes specifically needs to exist as a ROS node. The reason we chose ROS is that, as a ROS node, each running program (and any one of these nodes can sometimes run multiple times at once on one machine) has an easy way to communicate with any other program. We really liked the pub/sub style of inter-process communication in ROS. This has helped us reduce coupling between nodes. Each node can be replaced as needed without detrimental effects on the system.
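
To give a feel for what that pub/sub plumbing looks like, here is a minimal rospy sketch (the node and topic names are illustrative, not taken from our actual nodes):

#!/usr/bin/env python
# Minimal ROS pub/sub example: one node publishing and listening on a topic.
import rospy
from std_msgs.msg import String

def on_state(msg):
    rospy.loginfo("activity state changed: %s", msg.data)

if __name__ == '__main__':
    rospy.init_node('activity_listener')
    rospy.Subscriber('/example/activity_state', String, on_state)
    pub = rospy.Publisher('/example/activity_state', String, queue_size=1)
    rospy.sleep(1.0)  # give subscriber connections a moment to establish
    pub.publish(String(data='active'))
    rospy.spin()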

We also make heavy use of the ROS packaging/build system, Catkin. We use it to build Debian packages which are installed on the PXE booted images.

Lastly, ROS has become a real joy to work with. It is a really dependable system, with many powerful features. The ROS architecture allows us to easily add new features as we develop them, without conflicting with everything else going on. We were able to re-implement our Street View viewer recently, and had no issues plugging the new one into the system. Documenting the nodes from a client-facing side is also very easy: as long as we describe each rosparam and rostopic, we have finished most of the work needed to document a node. Each program becomes a small, easy-to-understand, high-functioning piece of the system, similar to the Unix philosophy. We couldn't be happier with our new changes, or our decision to open source the ROS nodes.

ROS Platform Upgrades for Liquid Galaxy

For the last few months, End Point has been rolling out a new application framework along with updated display applications and monitoring infrastructure for the Liquid Galaxy Display Platform. These upgrades center on the ROS framework and allow a great number of functionality extensions and enhancements for the end user, as well as improvements to the stability and security of the core systems. It is intended that the 50+ systems that we currently maintain and support on behalf of our enterprise clients will be upgraded to this new platform.

ROS Overview
ROS is short for “Robot Operating System”. Just as it sounds, it is a framework used for controlling robots, and handles various environmental ‘inputs’ and ‘outputs’ well. End Point chose this framework in conjunction with related ongoing development projects on behalf of our enterprise clients. This system allows complex interactions from a touchscreen, camera, SpaceNav, or other device to be interpreted conditionally and then invoke other outputs such as displaying Google Earth, Street View, or other content on a given screen, speaker, or other output device. For more details, see: http://www.ros.org

Liquid Galaxy Enhancements
This new platform brings a number of improvements to the back-end systems and to the customer viewing experience.
  • Improved Street View Panospheres
    The new Street View viewer draws Street View tiles inside a WebGL sphere. This is a dramatic performance and visual enhancement over the older method, and can now support spherical projection, hardware acceleration, and seamless panning. For a user, this means tilting the view vertically as well as horizontally, zooming in and out, and improved frame rates.
  • Improved Panoramic Video
    As with the panoramic Street View application, this new platform improves the panoramic video playback as well. YouTube and Google have announced major initiatives to start actively supporting 360° panoramic video, including the financial backing of some high profile projects as example use cases. The Liquid Galaxy, with its panoramic screen layout already in place, is ideally suited for this new media format.
  • Improved Touchscreen
    The touchscreen incorporates a templated layout for easier modification and customization. The scene view selector is now consolidated to a single interface and no longer requires sub-pages or redundant touches. The Street View interface, complete with the ‘pegman’ icon, is a photo-realistic map with pinch and zoom just like a tablet interface.
  • Browser Windows
    The Liquid Galaxy can control multiple browser windows that can appear anywhere on the screens, often across multiple screens. These browser windows can show anything that can appear in a desktop web browser: web pages, videos, social media updates, data visualizations, etc.
  • Content Management System
    Beginning in 2014, End Point began to upgrade the content management system for the Liquid Galaxy. With the new ROS platform, we have updated this application to Roscoe (the ROS content experience). Roscoe gives registered users the ability to create complex presentations with specific scenes. Each scene can have a specific global location to guide Google Earth or Street View, and then invoke overlays that appear across the screens. These overlays can include photos, data graphs, videos, or web pages. Each scene can also include a specific KML data set (e.g., population density data, property value data, etc.) that can appear as 3D bar graphs directly in the ‘Earth’ view.
  • Content Isolation
    Isolating the entire presentation layer in ROS makes it easy to develop exhibits without a full-fledged Liquid Galaxy system. The entire ROS stack can be installed and run on an Ubuntu 14.04 computer or within a Docker container. This ROS stack can be used by a developer or designer to build out presentations that will ultimately run on a Liquid Galaxy system.
  • App Modularization
    Each component of a Liquid Galaxy exhibit is a configurable ROS node, allowing us to reuse large swaths of code and distribute the exhibit across any number of machines. This architecture brings two strong advantages: 1) each ROS node does one specific thing, which increases portability and modularity, and 2) each node can be tested automatically, which improves reliability.
  • Enhanced Platform Stability
    By unifying all deployments on a common base, End Point is able to deploy bug fixes, monitoring scripts, and ongoing enhancements much more quickly and safely. This has enhanced the overall stability for all supported and monitored Liquid Galaxy platforms.

Product Roadmap
The items described above are the great things that we can do already with this new platform. Even greater things are coming soon:
  • LiDAR Point Clouds
    End Point has already built early prototypes for the Liquid Galaxy platform that can view LiDAR point clouds. LiDAR is rapidly gaining traction in the architecture, surveying, and construction industries. With the large viewing area of the Liquid Galaxy, these LiDAR point clouds become much more impactful and useful to the command and control center.
  • Google Earth and Google Maps Upgrades
    End Point continues to work with the latest developments in the Google Earth and Google Maps platforms and is actively working to integrate new features and functionality. These new capabilities will be rolled out to the fleet of Liquid Galaxies as available.
  • 3D WebGL Visualizations
    The Liquid Galaxy will be enhanced to view completely virtual 3D environments using WebGL and other common formats. These environments include complex data visualizations, interior space renderings for office planning, and even games.
Next Steps
If you’re considering the Liquid Galaxy platform, contact us to discuss these latest enhancements and how they can improve the communications and presentation tools for your organization.

Testing Django Applications

This post summarizes some observations and guidelines originating from introducing the pytest unit testing framework into our CMS (Content Management System) component of the Liquid Galaxy. Our Django-based CMS allows users to define scenes, presentations and assets (StreetView, Earth tours, panos, etc) to be displayed on the Liquid Galaxy.

The purpose of this blog post is to capture my Django and testing study points, summarize useful resource links as well as to itemize some guidelines for implementing tests for newcomers to the project. It also provides a comparison between Python's standard unittest library and the aforementioned pytest. Its focus is on Django database interaction.

Versions of software packages used

This post describes some of our experiences at End Point in designing and working on comprehensive QA/CI facilities for a new system which is closely related to the Liquid Galaxy.

The experiments were done on Ubuntu Linux 14.04.

Testing Django Applications

We probably don't need to talk much about the importance of testing. Writing tests along with the application code has become standard over the years. Certainly, developers may fall into the trap of their own assumptions when creating test conditions, and still end up with faulty software, but the likelihood of bugs is much higher in code that has no QA measures at all. If the code works and is untested, it works by accident, as they say. As a rule of thumb, unit tests should be very brief, test small items, and seldom interact with external services such as the database. Integration tests, on the other hand, often communicate with external components.

This post will heavily reference a minimal example Django application written for the purpose of experimenting with Django testing. Its README file contains some setup and requirement notes. Also, I am not going to list (m)any code snippets here, but rather reference the functional application and its test suite. Hence the points below are more or less assorted small topics and observations. In order to benefit from this post, it will be helpful to follow the README and interact (run tests, that is) with the demo django-testing application.

Basic Django unittest versus pytest basic examples

This pair of test modules shows the differences between Django TestCase (unittest) and pytest-django (pytest) frameworks.
  • test_unittest_style.py

    The base Django TestCase class derives along this tree:

        django.test.TestCase
            django.test.TransactionTestCase
                django.test.SimpleTestCase
                    unittest.TestCase
    
    On top of the Python standard unittest library, Django adds (among other things) database handling; the documentation is here.

  • test_pytest_style.py

    This is a pytest-style implementation of the same tests; the pytest-django plug-in adds, among other features, Django database handling support.

The advantage of unittest is that it comes with the Python installation - it’s a standard library. That means one does not have to install anything to write tests, unlike pytest, which is a third-party library and needs to be installed separately. While the absence of an additional installation is certainly a plus, it’s dubious whether being part of the Python distribution is a benefit. I seem to recall Guido van Rossum saying during EuroPython 2010 that the best thing for pytest is not being part of the Python standard set of libraries, for its lively development and evolution would be slowed down by the inclusion.

There are very good talks and articles summarizing the advantages of pytest. For me personally, the reporting of error context is supreme. No boilerplate (no inheritance), plain Python asserts instead of many assert* methods, and flexibility (function, class) are other big plus points.
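
To make the contrast concrete, here is a minimal sketch of the same trivial test written in both styles (the Item model is illustrative and not part of the demo application):

# unittest / Django style: inherit from django.test.TestCase, use assert* methods.
from django.test import TestCase
from myapp.models import Item

class ItemTest(TestCase):
    def test_create(self):
        Item.objects.create(name="foo")
        self.assertEqual(Item.objects.count(), 1)

# pytest-django style: no inheritance, plain asserts, a marker for database access.
import pytest

@pytest.mark.django_db
def test_create():
    Item.objects.create(name="foo")
    assert Item.objects.count() == 1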

As the comment in the test_unittest_style.py file says, this particular unittest-based test module can be run either by Django's manage.py (which boils down to unittest test discovery at a lower layer) or by py.test (pytest).

It should also be noted that pytest's flexibility can bite back if something gets overlooked.

Django database interaction unittest versus pytest (advanced examples)

  • test_unittest_advanced.py

    Since this post concentrates on pytest and since it's the choice for our LG CMS project (naturally :-), this unittest example just shows how the test (fresh) database is determined and how Django migrations are run at each test suite execution. Just as described in the Django documentation: "If your tests rely on database access such as creating or querying models, be sure to create your test classes as subclasses of django.test.TestCase rather than unittest.TestCase." That is true for database interaction but not completely true when using pytest. And "Using unittest.TestCase avoids the cost of running each test in a transaction and flushing the database, but if your tests interact with the database their behavior will vary based on the order that the test runner executes them. This can lead to unit tests that pass when run in isolation but fail when run in a suite." django.test.TestCase, however, ensures that each test runs inside a transaction to provide isolation. The transaction is rolled back once the test case is over.

  • test_pytest_advanced.py

    This file represents the actual core of the test experiments for this blog / demo app and shows various pytest features and approaches typical for this framework as well as Django (pytest-django that is) specifics.

Django pytest notes (advanced example)

Much like the unittest documentation, the pytest-django documentation recommends avoiding database interaction in unit tests and concentrating only on the logic, which should be designed in such a fashion that it can be tested without the database.

  • test database name prefixed "test_" (just as in the unittest example); the base value is taken from the database section of settings.py. As a matter of fact, it’s possible to run the test suite after dropping the main database, since the test suite interacts only with "test_" + DATABASE_NAME
  • migrations are executed before any database interaction is carried out (similarly to the unittest example)
  • database interaction is marked by the Python decorator @pytest.mark.django_db at the method or class level (or on a stand-alone function). It's in fact the first occurrence of this marker which triggers the database set up (its creation and migrations handling). Again analogously to unittest (django.test.TestCase), the test case is wrapped in a database transaction which puts the database back into the state prior to the test case. The database "test_" + DATABASE_NAME itself is dropped once the test suite run is over. The database is not dropped if the --reuse-db option is used. The production DATABASE_NAME remains untouched during the test suite run (more about this below)
  • pytest_djangodb_only.py - setup_method - if you run this module separately, the data created in setup_method ends up NOT in the "test_" + DATABASE_NAME database but in the standard one (as configured in settings.py, which would likely be the production database)! Also, this data won’t be rolled back. When run separately, this test module will pass (but the production database would still be tainted). It may or may not fail on the second and subsequent runs, depending on whether it creates any unique data. When run within the test suite, the database call from the setup_method will fail despite the presence of the class-level django_db marker. This has been very important to realize. Recommendation: do not include database interaction in the pytest special methods (such as setup_method or teardown_method, etc.); only include database interaction in the test case methods
  • the error message "Failed: Database access not allowed, use the "django_db" mark to enable" was seen on a database error in a method which actually had the marker, so this output is not to be 100% trusted
  • data model factories are discussed separately below
  • lastly, the test module shows a Django Client instance and calling an HTTP resource (a small sketch follows below)
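
A test hitting an HTTP resource with the Django test client under pytest might look like this (the URL path is illustrative):

import pytest
from django.test import Client

@pytest.mark.django_db
def test_home_page():
    client = Client()
    response = client.get("/")
    assert response.status_code == 200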

pytest setup_method

While the fundamental differences between unittest and pytest were discussed above, there is something to be said about the Django-specific differences between the two. The database-related behaviour of the unittest setUp method differs from that of the pytest setup_method method. setUp is included in the transaction, and database interactions are rolled back once the test case is over. setup_method is not included in the transaction. Moreover, interacting with the database from setup_method results in faulty behaviour that differs depending on whether the test module is run on its own or as part of the whole test suite.

The bottom line is: do not include database interaction in setup_method. This setUp / setup_method behaviour was already shown in the basic examples, and more description and demonstration of it is in the file pytest_djangodb_only.py. This actually revealed that the django_db database fixture is not supported in the special pytest methods, and that the aforementioned error message is misleading (more references here and here).
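
One common way around this limitation is to prepare test data in an ordinary pytest fixture that requests database access, rather than in setup_method; a minimal sketch (the Item model is again illustrative):

import pytest
from myapp.models import Item

@pytest.fixture
def sample_item(db):
    # The pytest-django `db` fixture grants database access inside the test transaction.
    return Item.objects.create(name="sample")

def test_sample_item(sample_item):
    assert Item.objects.count() == 1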

When running the whole test suite, this file won't be collected (its name lacks the "test_" prefix). It needs to be renamed to be included in the test suite run.

JSON data fixtures versus factories (pytest advanced example)

The traditional way of interacting with test data was to perform the following steps:
  • have data loaded in the database
  • python manage.py dumpdata
  • the produced JSON file is dragged along the application test code
  • call_command("loaddata", fixture_json_file_name) happens at each test suite run

The load is expensive, and the JSON dump file is hard to maintain manually once the originally dumped copy and the current needs diverge (the file contains integer primary key values, etc.). Although even the recent Django testing documentation mentions the use of JSON data fixtures, the approach is generally discouraged; the recommended alternatives are loading the data in migrations or using model data factories.

This talk, for example, compares both approaches and comes down in favour of the factory_boy library. A quote from the article: "Factory Boy is a Python port of a popular Ruby project called Factory Girl. It provides a declarative syntax for how new instances should be created. ... Using fixtures for complex data structures in your tests is fraught with peril. They are hard to maintain and they make your tests slow. Creating model instances as they are needed is a cleaner way to write your tests which will make them faster and more maintainable."

The file test_pytest_advanced.py demonstrates interaction with factories defined in the module factories.py, showing the basic, very easy-to-use features.

Despite its ease of use, factory_boy is a powerful library, capable of modeling Django's ORM many-to-many relationships, among other features.
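
For reference, a minimal factory for a Django model might look like this (the Item model and its field are illustrative, not the demo application's actual factories.py):

import factory
from myapp.models import Item

class ItemFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Item

    # Generate a unique name for every created instance.
    name = factory.Sequence(lambda n: "item-%d" % n)

# In a test, ItemFactory() creates and saves an Item; ItemFactory.build() returns an unsaved instance.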


Conclusion

You should now have a good idea about the differences between testing with unittest and pytest in the Django environment. The emphasis has been put on pytest (pytest-django) and some recommended approaches. The demo application django-testing provides functional test cases demonstrating the behaviour and features discussed. The articles and talks listed in this post were extremely helpful and instrumental in gaining expertise in the area and in introducing a rigorous testing approach into the production application.

Any discrepancy between the behaviour described above and that on your own setup may originate from different software versions. In any case, if anything is not clear enough, please let me know in the comments.

Image Processing In The Cloud With Blitline and Wordpress

Working with ImageMagick can be difficult. First, you have to get it installed on your OS (do you have the dev libraries in place?), then you have to enable it in the language of your choice, then get it working in your application. After all that, you do it all over again on the staging server, where debugging may be complicated and you may not have admin rights. Meet image processing in the cloud. Meet Blitline.


I'm doing a lot of things with Wordpress now, so we'll set it up with Wordpress and PHP.


Step 1

Get a free developer account with Blitline, and note your application id.

Step 2

Get the Blitline PHP wrapper library Blitline_php. It's clean and awesome, but unfortunately at the time of writing it was missing a few things, like being able to run your own ImageMagick script and set a postback URL for when the job is finished. Yes, those are all useful features of Blitline cloud image processing! I'm still waiting on my pull request to be incorporated into the official version, so for now you can use my fork, Ftert's Blitline_php, which has these two useful features.


Step 3

Now it's time to integrate it into our application. Since it's Wordpress, I'm doing it in the 'wp_generate_attachment_metadata' callback in functions.php:


    require_once dirname(__FILE__) . '/blitline_php/lib/blitline_php.php';
    ...
    add_filter( 'wp_generate_attachment_metadata', array($this, 'wp_blur_attachment_filter'), 10, 2 );
    ...
    public function wp_blur_attachment_filter($image_data, $attachment_id) {

        $url = wp_get_attachment_url($attachment_id);

        list($src, $width, $length) = wp_get_attachment_image_src($attachment_id);

        $data = pathinfo($src);

        $dest = $data['filename'] . '_darken_75_105_100_blur_0_20.' . $data['extension'];

        // Queue the Blitline job: blur and darken the uploaded image,
        // then have Blitline POST the result details back to our REST endpoint.
        $Blit = new Blitline_php();

        $Blit->load($url, $dest);

        $Blit->do_script("convert input.png -blur 0x20 -modulate 75,105,100 output.png");

        $Blit->set_postback_url( get_site_url() . '/wp-json/myapp/v1/blitline_callback');

        $results = $Blit->process();

        if ($results->success()) {
            foreach ($results->get_images() as $name => $url) {
                error_log("Processed: {$name} at {$url}\n");
            }
        } else {
            error_log($results->get_errors());
        }

        // Filter callbacks must return the (possibly modified) metadata.
        return $image_data;
    }

We are sending a JSON POST request to Blitline to make the blurred and saturated version of the uploaded image. You can track the progress of your jobs here. The request will return a URL to the image on the Blitline server, but the image may not be there right away, because the processing is asynchronous. I tried to set up S3 bucket integration (yes, Blitline can upload to S3 for you!), but the setup procedure is quite tedious. You have to manually enter your AWS Canonical ID (after first obtaining it from Amazon) on the Blitline page. Then you have to create a special policy in your bucket for Blitline. This is a lot of hassle, and giving permissions to someone else might not be the way to go for you. For me personally it didn't work, because my policy was being automatically overwritten all the time; I don't even know why. So here's where the postback URL comes into play.


Step 4

I'm using the WP-API V2 plugin, which will soon become part of Wordpress, to create REST endpoints. In wp-content/mu-plugins/my-app-custom-endpoints/lib/endpoints.php:


    add_action('rest_api_init', function () {

        register_rest_route('portfolio/v1', '/blitline_callback', array(
            'methods'  => 'POST',
            'callback' => 'process_blitline_callback',
        ));
    });


In wp-content/mu-plugins/loader.php:

    require_once dirname(__FILE__) . '/blitline_php/lib/blitline_php.php';
    
    require_once dirname(__FILE__) . '/my-app-custom-endpoints/api.php';
    

In wp-content/mu-plugins/my-app-custom-endpoints/api.php:


    if( ! defined( 'ABSPATH' ) ) exit;
    
    require_once dirname(__FILE__) . '/lib/endpoints.php';
    


Here's the fun part. Add to wp-content/mu-plugins/my-app-custom-endpoints/lib/endpoints.php:


    use Aws\S3\S3Client;

    function process_blitline_callback($request) {

        if ( ! class_exists( 'WP_Http' ) )
            include_once( ABSPATH . WPINC . '/class-http.php' );

        $s3Client = S3Client::factory(array(
            'credentials' => array(
                'key'    => 'YOUR S3 KEY',
                'secret' => 'YOUR S3 SECRET'
            )
        ));

        $photo = new WP_Http();

        // Blitline POSTs its job results as a JSON string in the 'results' field.
        $body = $request->get_body_params();

        $var = (array) json_decode(stripslashes($body['results']), true);

        if (isset($var['images'][0]['error'])) {
            error_log('Error ' . $var['images'][0]['error']);
            return;
        }

        // Fetch the processed image from the temporary Blitline URL.
        $photo = $photo->request( $var['images'][0]['s3_url'] );

        $photo_name = $var['images'][0]['image_identifier'];

        $attachment = wp_upload_bits( $photo_name, null,
            $photo['body'],
            date("Y-m", strtotime( $photo['headers']['last-modified'] ) ) );

        $upload_dir = wp_upload_dir();

        // Upload the result to our own S3 bucket and make it publicly readable.
        $s3Client->putObject(array(
            'Bucket'     => "yourbucket",
            'Key'        => 'wp-content/uploads' . $upload_dir['subdir'] . '/' . $photo_name,
            'SourceFile' => $attachment['file'],
            'ACL'        => 'public-read'
        ));
    }


In the callback we download the processed image from the temporary Blitline URL. One little bonus in here is the upload to an Amazon S3 bucket; I use the Amazon PHP SDK to achieve that. Note the ACL permissions: this was the last hurdle, and the point where I almost gave up trying to make the Blitline postback URL work. When the image finally appeared in my bucket, it wasn't accessible from the outside, because I hadn't set the permissions.


Step 5. If it doesn't work: debugging.

I used the Firefox add-on HttpRequester to post a mock response from Blitline to my application. If you don't want to deploy each time you change the code, another useful tool is LocalTunnel, which lets you expose your localhost to the internet and set the postback to your local app.


And that's how you do image processing in the cloud!