
Liquid Galaxy Uses at UNC Chapel Hill

An article posted on The Tech Broadcast last week featured the UNC Chapel Hill Center for Faculty Excellence's Faculty Showcase. The showcase included a fantastic presentation on the many ways students and faculty use their Liquid Galaxy, and discussed other opportunities for using the system in the future.

Exciting examples of classroom successes making use of the Liquid Galaxy included:

  1. A course offered at UNC, Geography 121 People and Places, requires its students to sift through data sets and spend time in the GIS lab/research hub making maps using the data they've collected. The goal of this assignment is to demonstrate understanding of diversity within particular geographic entities. The students use the Liquid Galaxy to present their findings. Examples of studies done for this project include studies of fertility, infant mortality, income inequality, poverty, population density, and primary education.

  2. A group of students working in the lab found that the household income of a particular municipality was many times greater than that of all surrounding municipalities. By looking around on the Liquid Galaxy, they discovered an enormous plantation in a very rural area. They were then able to understand how that plantation skewed the data for the entire municipality.

  3. While studying a web map, students found that average life expectancy dropped by a decade within a very short distance. They decided to look at the Liquid Galaxy to see whether they could make any conclusions by viewing the area. By using the Liquid Galaxy, the students were able to think about what the data looks like, not just statistically but on Earth.

  4. A geography teacher gave a lecture on the geography of Vietnam. The teacher used the Liquid Galaxy to give the class a tour of Vietnam and show how the different areas factored into the course. The teacher asked the class where within Vietnam they'd like to go, and was able to take the students to the different geographical areas on the Liquid Galaxy and tell them in detail about those areas while they had the visual support of the system.

  5. A geography class called The Geography of Latin America focuses on extractive industries. The class discusses things like agriculture in South America, and the percentage of land in Brazil that is used for soy production. The faculty reports that seeing this information in an immersive environment goes a long way in teaching the students.

  6. Urban planning students use the Liquid Galaxy when looking into urban revitalization. They use the system to visit downtown areas and see firsthand what those areas look like, to better understand the challenges the communities are facing.

  7. Students and faculty have come to the Liquid Galaxy to look at places they are about to travel to abroad, or are thinking about traveling to, in order to prepare for their trips. An example given was a Master of Fine Arts student, a sculptor, who was very interested in areas with great quantities of rock and ice. She traveled around on the Liquid Galaxy and explored Iceland. Researching the region on the Liquid Galaxy helped to pique her interest and ultimately led to her going to Iceland to travel and study.

During the faculty showcase, faculty members listed off some of the great benefits of having the Liquid Galaxy as a tool that was available to them.

  1. The Liquid Galaxy brought everyone together and fostered a class community. Teachers would often arrive at classes that utilize the Liquid Galaxy and find that half the students were early to class. Students would be finding places (their homes, where they studied abroad, and more) and friendships between students would develop as a result of the Liquid Galaxy.

  2. Liquid Galaxy helps students with geographic literacy. They are able to think about concepts covered in class, and fly to and observe the locations discussed.

  3. Students often bring parents and family to see the Liquid Galaxy, which is widely accessible to students on campus. Students are always excited to share what they're doing with the system, with family and with faculty.

  4. Faculty members have commented that students who don't ask questions in class have been very involved in the Liquid Galaxy lessons, which could be in part because some students are more visual learners. These visual learners find great benefit in seeing the information displayed in front of them in an interactive setting.

  5. From a faculty standpoint, a lot of time was spent planning and working out the class structure, which has developed considerably. Dedicating class time to the Liquid Galaxy was beneficial: it meant teaching less material, but in more depth and in different ways. The faculty felt there was more benefit to that approach, and it was a great learning experience for all parties involved.

Faculty members expressed interest and excitement when learning more about the Liquid Galaxy and the ways it is used. There was a lot of interest in using the Liquid Galaxy for interdisciplinary studies between different departments to study how different communities and cultures work. There was also interest in further utilization of the system's visualization capabilities. A professor from the School of Dentistry spoke of how he could picture using the Liquid Galaxy to teach someone about an exam of the oral cavity. Putting up 3D models of the oral cavity using our new Sketchfab capabilities would be a perfect way to achieve this!

We at End Point were very excited to learn more about the many ways that Liquid Galaxy is being successfully used at UNC as a tool for research, for fun, and to bring together students and faculty alike. We look forward to seeing how UNC, among the many other research libraries that use Liquid Galaxy, will implement the system in courses and on campus in the future.

Client Case Study: Carjojo


Carjojo is a car buying application that takes data about car pricing, dealer incentives, and rebate programs and aggregates it into a location-specific vehicle pricing search tool. Carjojo's site makes use of some of the best tools on the market today for accessing and displaying data, and the work presented a great opportunity for End Point to use our technical skills to build a state-of-the-art application everyone is very proud of. End Point worked on the Carjojo development project from October of 2014 through early 2016, and the final Carjojo application launched in the summer of 2016. This case study shows that End Point can be a technology partner for a startup, enabling the client to maintain their own business once our part of the project is over.

Why End Point?

Reputation in full stack development

End Point has deep experience with full stack development, so for a startup, getting advice from our team can prove really helpful when deciding which technologies to implement and what timelines are realistic. Even though the bulk of the Carjojo work focused on specific development pieces, having developers available to advise on the entire stack allows a small startup to leverage a much broader set of skills.

Startup Budget and Timelines

End Point has worked with a number of startups throughout our time in the business. Startups require particularly focused attention on budget and timelines to ensure that the minimum viable product can be ready on time and that the project stays on budget. Our consultants focus on communication with the client and advise them on how to steer the development to meet their needs, even if those shift as the project unfolds.

Client Side Development Team

One of the best things about many of our clients is the technological knowledge and the team they bring to the table. In Carjojo's case, End Point developers worked within the Carjojo team to build the parts they were unfamiliar with. End Point developers are easy to work with and already work in a remote development environment, so working in a remote team is a natural fit.

Client Side Project Management

End Point works on projects where project management is done either in-house or by the client. On a project like Carjojo, where the client has technical project management resources, our engineers work within that team. This gives a startup like Carjojo insight into the project on a daily basis.

Project Overview

The main goal of the Carjojo project was to aggregate several data sources on car pricing, use data analytics to produce useful shopper information, and display that information for their clients.
Carjojo’s staff had experience in the car industry and leveraged that to build a sizeable database of information. Analytics work on the database provided another layer of information, creating a time- and location-specific market value for a vehicle.

Carjojo kept the bulk of the database collection and admin work in house, and provided an in-house designer who worked closely with us on their vision for the project. End Point partnered to do the API architecture work as well as the front end development.

A major component of this project was using a custom API to pull information from the database and display it quickly with high-end, helpful infographics. Carjojo opted to use APIs so that the coding work would integrate seamlessly with future plans for a mobile application, which would otherwise normally require a substantial amount of recoding.

Creating a custom API also allows Carjojo to work with future partners and leverage their data and analytics in new ways as their business grows.

Team

Patrick Lewis: End Point project manager and front end developer. Patrick led development of the AngularJS front end application which serves as the main customer car shopping experience on the Carjojo site. He also created data stories using combinations of integrated Google Maps, D3/DimpleJS charts, and data tables to aid buyers with car searches and comparisons.



Matt Galvin: Front end developer. Matt led the efforts for data visualization with D3 and DimpleJS. He created Angular services that were used to communicate with the backend, used D3 and DimpleJS to illustrate information graphically about cars, car dealers, incentives, etc., sometimes neatly packaging them into directives for easy re-use where appropriate. He also created a wealth of customizations and extensions of DimpleJS which allowed for rapid development without sacrificing visualization quality.



Josh Williams: Python API development. Josh led the efforts in connecting the database to Django and Python to process and aggregate the data as needed. He also used Tastypie to format the API responses and created authentication structures for the API.

Project Specifics

API Tools

Carjojo’s project makes use of some of the best tools on the market today for accessing and displaying data. Django and Tastypie were chosen to allow for rapid API development and to keep the response time down on the website. In most cases the Django ORM was sufficient for generating queries from the data, though in some cases custom queries were written to better aggregate and filter the data directly within Postgres.

To use the location information in the database, some GIS location smarts were tied into Tastypie. Location searches went through GeoDjango and generated PostGIS queries in the database.

Front End Tools

D3 is a standard in data visualization and is great for doing both simple and complicated graphics. Many of Carjojo's graphs were bar graphs and pie charts that didn't really require writing out D3 by hand. We also wanted to make many of them reusable and dynamic (often based on search terms or inputs) with the use of Angular directives and services. This could have been done with pure D3, but Dimple makes creating simple D3 graphs easy and fast.

DimpleJS was used a lot in this project. Since Carjojo is data-driven, they wanted to display their information in an aesthetically pleasing manner, and DimpleJS allowed us to quickly spin up visualizations against some of the project's tightest deadlines.

The approach worked well for most cases. However, sometimes Carjojo wanted something slightly different than what DimpleJS does out of the box. One example of DimpleJS customization work can be found here on our blog.

Another thing to note about the data visualizations was that sometimes when the data was plotted and graphed, it brought to light some discrepancies in the back-end calculations and analytics, requiring some back-and-forth between the Carjojo DBA and End Point.

Results

Carjojo had a successful launch of their service in the summer of 2016. Their system has robust user capabilities, a modern clean design, and a solid platform to grow from. The best news for Carjojo is that now the project has been turned back over to them for development. End Point believes in empowering our clients to move forward with their business and goals without us. Carjojo knows that we’ll be here for support if they need it.

Office Space Available at End Point HQ!

Our office-mates are leaving, and we are looking to fill their desk space. There are 8 open desks available, including one desk in a private office.

Amenities include free wifi, furniture, conference room access, kitchen access, regular office cleaning, and close proximity (one block) to Madison Square Park.

Our company, End Point, is a tech company that builds ecommerce sites, and also develops the Liquid Galaxy. There are typically 4 or 5 of us in the office on a given day. We are quiet, friendly, and respectful.

Please contact us at ask@endpoint.com for more information.

Seedbank: Structured Seed Files for Rails Projects

Rails seed files are a useful way of populating a database with the initial data needed for a Rails project. The Rails db/seeds.rb file contains plain Ruby code and can be run with the Rails-default rails db:seed task. Though convenient, this "one big seed file" approach can quickly become unwieldy once you start pre-populating data for multiple models or needing more advanced mechanisms for retrieving data from a CSV file or other data store.
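
For example, a hypothetical db/seeds.rb fragment that pulls rows from a CSV file might look like this (the Product model and the data/products.csv path are invented for illustration):

# Hypothetical seeds.rb fragment loading records from a CSV file
require 'csv'

CSV.foreach(Rails.root.join('data', 'products.csv'), headers: true) do |row|
  # find_or_create_by! keeps the seed idempotent across repeated runs
  Product.find_or_create_by!(sku: row['sku']) do |product|
    product.name  = row['name']
    product.price = row['price']
  end
end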

The Seedbank gem aims to solve this scalability problem by providing a drop-in replacement for Rails seed files that allows developers to distribute seed data across multiple files and provides support for environment-specific files.

Organizing seed files in a specific structure within a project's db/seeds/ directory enables Seedbank to either run all of the seed files for the current environment using the same rails db:seed task as vanilla Rails or to run a specific subset of tasks by specifying a seed file or environment name when running the task. It's also possible to fall back to the original "single seeds.rb file" approach by running rails db:seed:original.

Given a file structure like:

db/seeds/
  courses.seeds.rb
  development/
    users.seeds.rb
  students.seeds.rb

Seedbank will generate tasks including:

rails db:seed                   # load data from db/seeds.rb, db/seeds/*.seeds.rb, and db/seeds/[ENVIRONMENT]/*.seeds.rb
rails db:seed:courses           # load data from db/seeds/courses.seeds.rb
rails db:seed:common            # load data from db/seeds.rb, db/seeds/*.seeds.rb
rails db:seed:development       # load data from db/seeds.rb, db/seeds/*.seeds.rb, and db/seeds/development/*.seeds.rb
rails db:seed:development:users # load data from db/seeds/development/users.seeds.rb
rails db:seed:original          # load data from db/seeds.rb

I've found the ability to define development-specific seed files helpful in recent projects for populating 'test user' accounts for sites running in development mode. We've been able to maintain a consistent set of test user accounts across multiple development sites without having to worry about accidentally creating those same test accounts once the site is running in a publicly accessible production environment.

Splitting seed data from one file into multiple files does introduce a potential issue when the data created in one seed file is dependent on data from a different seed file. Seedbank addresses this problem by allowing for dependencies to be defined within the seed files, enabling the developer to control the order in which the seed files will be run.

Seedbank runs seed files in alphabetical order by default, but wrapping the code in an 'after' block allows the developer to enforce the order in which seed files are run. Given a case where Students are dependent on Course records having already been created, the file can be set up like this:

# db/seeds/students.seeds.rb
after :courses do
  course = Course.find_by_name('Calculus')
  course.students.create(first_name: 'Patrick', last_name: 'Lewis')
end

The added dependency block will ensure that the db/seeds/courses.seeds.rb file is executed before the db/seeds/students.seeds.rb file, even when the students file is run via a specific rails db:seed:students task.
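
For reference, the courses seed file that the students file depends on could be as simple as this hypothetical example (using the same course name as above):

# db/seeds/courses.seeds.rb (hypothetical example)
Course.find_or_create_by!(name: 'Calculus')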

Seedbank provides additional support for adding shared methods that can be reused within multiple seed files and I encourage anyone interested in the gem to check out the Seedbank README for more details. Though the current 0.4 version of Seedbank doesn't officially have support for Rails 5, I've been using it without issue on Rails 5 projects for over six months now and consider it a great addition to any Rails project that needs to pre-populate a database with a non-trivial amount of data.

Job opening: Fulfillment Manager

Update: This position has been filled! Thanks to everyone who expressed interest.

This role is based in our Bluff City, Tennessee office, and is responsible for everything about fulfillment of our Liquid Galaxy and other custom-made hardware products, from birth to installation. See liquidgalaxy.endpoint.com to learn more about Liquid Galaxy.

What is in it for you?

  • Interesting and exciting startup-like atmosphere at an established company
  • Opportunity for advancement
  • Benefits including health insurance and self-funded 401(k) retirement savings plan
  • Annual bonus opportunity

What you will be doing:

  • Manage receiving, warehouse, and inventory efficiently
  • Oversee computer system building
  • Product testing and quality assurance
  • Packing
  • Shipment pick-up
  • Communicate with and create documents for customs for international shipping
  • Be the expert on international shipping rules and regulations
  • Delivery tracking and resolution of issues
  • Verify receipt of intact, functional equipment
  • Resolve RMA and shipping claims
  • Help test and implement any new warehouse software and processes
  • Design and implement new processes
  • Effectively use our project software (Trello) to receive and disseminate project information
  • Manage fulfillment employees and office facility
  • Work through emergency situations in a timely and controlled manner
  • Keep timesheet entries up to date throughout the day

What you will need:

  • Eagerness to “own” the fulfillment process from end to end
  • Exemplary communication skills with the entire company
  • High attention to detail
  • Consistent habits of reliable work
  • Ability to make the most of your time and resources without external micromanagement
  • Desire, initiative, and follow-through to improve on our processes and execution
  • Work with remote and local team members
  • Strive to deliver superior internal customer service
  • Ability to work through personnel issues
  • Go above and beyond the call of duty when the situation arises

About End Point:

End Point is a 21-year-old Internet consulting company with 50 full-time employees working together from our headquarters in New York City, our office in eastern Tennessee, and home offices around the world. We serve over 200 clients ranging from small family businesses to large corporations, using a variety of open source technologies. Our team is made up of strong product design, software development, database, hardware, and system administration talent.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of gender, race, religion, color, national origin, sexual orientation, age, marital status, veteran status, or disability status.

Please email us an introduction to jobs@endpoint.com to apply. Include your resume and anything else that would help us get to know you. We look forward to hearing from you! Full-time employment seekers only, please. No agencies or subcontractors.

Bash: loop over a list of (possibly non-existing) files using wildcards with nullglob (or failglob) option

Let's say you're working in Bash, and you want to loop over a list of files, using wildcards.

The basic code is:

#!/bin/bash
for f in /path/to/files/*; do
  echo "Found file: $f"
done

Easy as that. However, there could be a problem with this code: if the wildcard does not expand to any actual files (i.e. there's no file under the /path/to/files/ directory), $f will be set to the unexpanded pattern string itself, and the for loop will still be executed once with $f containing "/path/to/files/*".

How to prevent this from happening? Nullglob is what you're looking for.

Nullglob, quoting the Bash man page, "allows filename patterns which match no files to expand to a null string, rather than themselves".

Using shopt -s you can enable optional Bash behaviors, like nullglob. Here's the final code:

#!/bin/bash
shopt -s nullglob
for f in /path/to/files/*; do
  echo "Found file: $f"
done

Another interesting option you may want to check for, supported by Bash since version 3, is failglob.

With failglob enabled, quoting again, "patterns which fail to match filenames during filename expansion result in an expansion error". Depending on what you need, that could even be a better behavior.

Wondering why nullglob is not the default behavior? Check this very good answer to the question.

Malaysia Open Source Community Meetup Quarter 4 2015 (MOSCMY Q4 2015)

After a year, I finally decided to publish this post!

On November 26th, 2015 I had a chance to give a talk at a local open source conference here in Malaysia. The organizer specifically requested that I deliver a talk on "remote work". The meetup was organized by the Multimedia Development Corporation (MDeC) with the sponsorship of Microsoft Malaysia. Microsoft has recently become more "open source friendly" as it pushes its cloud-based product, Azure. The full schedule of the event can be found here.

The conference was divided into two sessions: the morning session was held at the Petronas Club in Tower One of the Kuala Lumpur City Centre (KLCC), and the other session was held at Microsoft Malaysia's office in Tower Two of KLCC. The morning session was a single track (including my talk), while the afternoon was split into two parallel tracks.

Morning Session

The morning session started with a talk by Dinesh Nair, Director of Developer Experience and Evangelism at Microsoft Malaysia. The second talk of the morning was delivered by Mr Izzat M. Salihuddin from the Multimedia Development Corporation (MDeC), Malaysia. He spoke on behalf of MDeC, explaining its efforts as a government wing to realize local cloud infrastructure. One of the challenges Mr Izzat mentioned was the readiness of physical infrastructure, as well as broadband connectivity for the public. I delivered the final slot of the morning, in which I explained how End Pointers do their jobs, the open source software we use, and how we accomplish our work remotely. The morning session adjourned for a lunch break.



Just in case you are wondering, this is me delivering my talk that day

Afternoon Session

The afternoon session was a parallel-track session, where I chose to attend a talk on Ubuntu's Juju service. The talk was delivered by Mr Khairul Aizat Kamarudzaman from Informology. Mr Aizat's slides for his talk can be viewed here. Later, Mr Sanjay shared his Asterisk skills, with the server hosted on the Azure platform. Mr Sanjay showed us how to make a phone call from a computer to a mobile phone. Asterisk is different from Skype because it uses an open protocol (SIP) and open clients. Mr Sanjay showed a demo of his implementation, which looks set up to compete with a typical PABX phone system.

For the next slot I decided to attend the talk on the TCPTW kernel patch, delivered by Mr Faisal from Nexoprima. As far as I understood, Mr Faisal reintroduced his own patch for the Linux kernel to handle the TCP TIME_WAIT issue that arises under extremely busy HTTP traffic. Since a connection in the TIME_WAIT state holds a local port for 1 minute, and in many distros the default ephemeral range offers only around 30,000 ports, the effort spent searching for free ports uses intensive CPU, and it can waste CPU cycles purging tons of TIME_WAIT connections.


Mr Faisal giving his talk on the TCPTW kernel patch

Mr Faisal's TCPTW patch for CentOS 7 can be viewed here. His presentation slides can be viewed here.

Before heading back home, I decided to attend a talk on "Dockerizing IOT Service" by Mr Syukor. In this talk Mr Syukor gave a bit of theoretical background on Docker and how it can be used on a Raspberry Pi board. You can view his slides here. My personal thought is that the Raspberry Pi is versatile enough to run any modern operating system, so Docker should not be much of an issue.

Postgres statistics and the pain of analyze

Anytime you run a query in Postgres, it needs to compile your SQL into a lower-level plan explaining how exactly to retrieve the data. The more it knows about the tables involved, the smarter the planner can be. To get that information, it gathers statistics about the tables and stores them, predictably enough, in the system table known as pg_statistic. The SQL command ANALYZE is responsible for populating that table. It can be done per-cluster, per-database, per-table, or even per-column. One major pain about analyze is that every table *must* be analyzed after a major upgrade. Whether you upgrade via pg_dump, pg_upgrade, Bucardo, or some other means, the pg_statistic table is not copied over and the database starts as a clean slate. Running ANALYZE is thus the first important post-upgrade step.

Unfortunately, analyze can be painfully slow. Slow enough that the default analyze methods sometimes take longer than the entire rest of the upgrade! Although this article will focus on the pg_upgrade program in its examples, the lessons may be applied to any upgrade method. The short version of the lessons is: run vacuumdb in parallel, control the stages yourself, and make sure you handle any custom per-column statistics.

Before digging into the solution in more detail, let's see why all of this is needed. Doesn't pg_upgrade allow for super-fast Postgres major version upgrades, including the system catalogs? It does, with the notable exception of the pg_statistic table. The nominal reason for not copying the data is that the table format may change from version to version. The real reason is that nobody has bothered to write the conversion logic yet, for pg_upgrade could certainly copy the pg_statistic information: the table has not changed for many years.

At some point, a DBA will wonder if it is possible to simply copy the pg_statistic table from one database to another manually. Alas, it contains columns of the type "anyarray", which means it cannot be dumped and restored:

$ pg_dump -t pg_statistic --data-only | psql -q
ERROR:  cannot accept a value of type anyarray
CONTEXT:  COPY pg_statistic, line 1, column stavalues1: "{"{i,v}","{v}","{i,o,o}","{i,o,o,o}","{i,i,i,v,o,o,o}","{i,i,o,o}","{i,o}","{o,o,o}","{o,o,o,o}","{o..."

I keep many different versions of Postgres running on my laptop, and use a simple port naming scheme to keep them straight. It's simple enough to use pg_dump and sed to confirm that the structure of the pg_statistic table has not changed from version 9.2 until 9.6:

$ for z in 840 900 910 920 930 940 950; do echo -n $z: ; diff -sq <(pg_dump \
>  --schema-only -p 5$z -t pg_statistic | sed -n '/CREATE TABLE/,/^$/p') <(pg_dump \
>  --schema-only -p 5960 -t pg_statistic | sed -n '/CREATE TABLE/,/^$/p'); done
840:Files /dev/fd/63 and /dev/fd/62 differ
900:Files /dev/fd/63 and /dev/fd/62 differ
910:Files /dev/fd/63 and /dev/fd/62 differ
920:Files /dev/fd/63 and /dev/fd/62 are identical
930:Files /dev/fd/63 and /dev/fd/62 are identical
940:Files /dev/fd/63 and /dev/fd/62 are identical
950:Files /dev/fd/63 and /dev/fd/62 are identical

Of course, the same table structure does not promise that the backends of different versions use them in the same way (spoiler: they do), but that should be something pg_upgrade can handle by itself. Even if the table structure did change, pg_upgrade could be taught to migrate the information from one format to another (its raison d'être). If the new statistics format takes a long time to generate, perhaps pg_upgrade could leisurely generate a one-time table on the old database holding the new format, then copy that over as part of the upgrade.

Since pg_upgrade currently does none of those things and omits upgrading the pg_statistic table, the following message appears after pg_upgrade has been run:

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:
    ./analyze_new_cluster.sh

Looking at the script in question yields:

#!/bin/sh

echo 'This script will generate minimal optimizer statistics rapidly'
echo 'so your system is usable, and then gather statistics twice more'
echo 'with increasing accuracy.  When it is done, your system will'
echo 'have the default level of optimizer statistics.'
echo

echo 'If you have used ALTER TABLE to modify the statistics target for'
echo 'any tables, you might want to remove them and restore them after'
echo 'running this script because they will delay fast statistics generation.'
echo

echo 'If you would like default statistics as quickly as possible, cancel'
echo 'this script and run:'
echo '    vacuumdb --all --analyze-only'
echo

vacuumdb --all --analyze-in-stages
echo

echo 'Done'

There are many problems in simply running this script. Not only is it going to iterate through each database one-by-one, but it will also process tables one-by-one within each database! As the script states, it is also extremely inefficient if you have any per-column statistics targets. Another issue with the --analyze-in-stages option is that the stages are hard-coded (at "1", "10", and "default"). Additionally, there is no way to easily know when a stage has finished other than watching the command output. Happily, all of these problems can be fairly easily overcome; let's create a sample database to demonstrate.

$ initdb --data-checksums testdb
$ echo port=5555 >> testdb/postgresql.conf 
$ pg_ctl start -D testdb
$ createdb -p 5555 alpha
$ pgbench alpha -p 5555 -i -s 2
$ for i in `seq 1 100`; do echo create table pgb$i AS SELECT \* FROM pgbench_accounts\;; done | psql -p 5555 alpha

Now we can run some tests to see the effect of the --jobs option. Graphing out the times shows some big wins and nice scaling. Here are the results of running vacuumdb alpha --analyze-only with various values of --jobs:

Simple graph showing time decreasing as number of jobs increases

The slope of your graph will be determined by how many expensive-to-analyze tables you have. As a rule of thumb, however, you may as well set --jobs to a high number. Anything over your max_connections setting is pointless, but don't be afraid to jack it up to at least a hundred. Experiment on your test box, of course, to find the sweet spot for your system. Note that the --jobs argument will not work on old versions of Postgres. For those cases, I usually whip up a Perl script using Parallel::ForkManager to get the job done. Thanks to Dilip Kumar for adding the --jobs option to vacuumdb!

The next problem to conquer is the use of custom statistics. Postgres' ANALYZE uses the default_statistics_target setting to determine how many rows to sample (the default value in modern versions of Postgres is 100). However, as the name suggests, this is only the default - you may also set a specific target at the column level. Unfortunately, there is no way to disable this quickly, which means that vacuumdb will always use the custom value. This is not what you want, especially if you are using the --analyze-in-stages option, as it will happily (and needlessly!) recalculate columns with specific targets three times. As custom stats are usually set much higher than the default target, this can be a very expensive option:

$ ## Create a largish table:
$ psql -qc 'create unlogged table aztest as select * from pgbench_accounts'
$ for i in {1..5}; do psql -qc 'insert into aztest select * from aztest'; done
$ psql -tc "select pg_size_pretty(pg_relation_size('aztest'))"
820 MB
$ psql -qc '\timing' -c 'analyze aztest'
Time: 590.820 ms  ## Actually best of 10: never get figures from a single run!
$ psql -c 'alter table aztest alter column aid set statistics 1000'
ALTER TABLE
$ psql -qc '\timing' -c 'analyze aztest'
Time: 2324.017 ms ## as before, this is the fastest of 10 runs

As you can see, even a single column can change the analyze duration drastically. What can we do about this? The --analyze-in-stages option is still useful, so we want to set those columns back to the default value. While one could reset the stats and then set them again on each column via a bunch of ALTER TABLE calls, I find it easier to simply update the system catalogs directly. Specifically, the pg_attribute table contains an attstattarget column which has a positive value when a custom target is set. In our example above, the value of attstattarget for the aid column would be 1000. Here is a quick recipe to save the custom statistics values, set them to the default (-1), and then restore them all once the database-wide analyzing is complete:

## Save the values away, then reset to default:
CREATE TABLE custom_targets AS SELECT attrelid, attname, attnum, attstattarget
  FROM pg_attribute WHERE attstattarget > 0;
UPDATE pg_attribute SET attstattarget = -1 WHERE attstattarget > 0;

## Safely run your database-wide analyze now
## All columns will use default_statistics_target

## Restore the values:
UPDATE pg_attribute a SET attstattarget = c.attstattarget
  FROM custom_targets c WHERE a.attrelid = c.attrelid
  AND a.attnum = c.attnum AND a.attname = c.attname;

## Bonus query: run all custom target columns in parallel:
SELECT 'vacuumdb --analyze-only -e -j 100 ' || 
  string_agg(format('-t "%I(%I)" ', attrelid::regclass, attname), NULL)
FROM pg_attribute WHERE attstattarget > 0;

As to the problems of not being able to pick the stage targets for --analyze-in-stages, and not being able to know when a stage has finished, the solution is to simply do it yourself. For example, to run all databases in parallel with a target of "2", you would need to change the default_statistics_target at the database level (via ALTER DATABASE), or at the cluster level (via ALTER SYSTEM). Then invoke vacuumdb, and reset the value:

$ psql -qc 'alter system set default_statistics_target = 2' -qc 'select pg_reload_conf()'
$ vacuumdb --all --analyze-only --jobs 100
$ psql -qc 'alter system reset default_statistics_target' -qc 'select pg_reload_conf()'

In summary, don't trust the given vacuumdb suggestions for a post-upgrade analyze. Instead, remove any per-column statistics, run it in parallel, and do whatever stages make sense for you.

Implementation of Ruby on Rails 5 Action Cable with Chat Application

Ruby on Rails is a wonderful framework for web development, but it was lacking one important feature as the world moved toward realtime data: everyone wants to see realtime data in their applications. Most real-time web applications are now built using WebSockets.

WebSocket provides full-duplex communication between server and client over a TCP connection, once the handshake has completed via the HTTP protocol. WebSocket transfers streams of messages on top of TCP without being solicited by the client, which greatly improves data transfer performance compared to the HTTP request/response cycle.

WebSockets were previously adopted in Rails applications with the help of third-party libraries. Rails 5, however, ships with a module called ActionCable, which sits seamlessly within the existing framework and integrates WebSockets into the application. ActionCable provides both a server-side and a client-side framework to implement WebSockets in the application.

ActionCable Overview:

Server Side:

Connection: The Connection handles only the authentication and authorization part of the logic. A connection object is instantiated when a request from the user comes in through a browser tab, window, or device. Multiple connections can be created when the user accesses the server from different devices or browser tabs.
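
As a rough sketch of what that authentication logic might look like (this is not taken from the example application; it assumes the login flow stores a signed user_id cookie), app/channels/application_cable/connection.rb could identify the user like this:

# app/channels/application_cable/connection.rb (hypothetical sketch)
module ApplicationCable
  class Connection < ActionCable::Connection::Base
    # Expose the authenticated user to every channel on this connection
    identified_by :current_user

    def connect
      self.current_user = find_verified_user
    end

    private

    def find_verified_user
      # Assumes login set a signed user_id cookie; adjust to the app's auth scheme
      if (user = User.find_by(id: cookies.signed[:user_id]))
        user
      else
        reject_unauthorized_connection
      end
    end
  end
end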

Channel: The Channel is the parent of all custom channels and shares common logic across them. A custom channel streams messages to clients that have subscribed to it.

Client Side:

The client-side JavaScript framework has all the functionality needed to interact with the server side. The consumer establishes a WebSocket connection with the server, over which all communication happens. A subscriber subscribes to custom channels to receive messages from the server without making requests.

Prerequisites:
* Ruby 2.2.2+ is the minimum version supported by Rails 5. Install Ruby and the Rails 5 gem in your environment.
* ActionCable needs a multithreaded server such as Puma in development.

Let's create the Rails 5 chat application! The application structure will include the following default ActionCable-related files.

$ rails new action-cable-chat-example
 - Server Side
        app/channels
        app/channels/application_cable
        app/channels/application_cable/connection.rb
        app/channels/application_cable/channel.rb

 - Client Side
        app/assets/javascripts
        app/assets/javascripts/channels
        app/assets/javascripts/channels/.keep
        app/assets/javascripts/application.js
        app/assets/javascripts/cable.js

The following models and controllers need to be created for a basic chat application.

* User, Room and Message models
* users, rooms, messages, sessions and welcome controllers

The commands to create these items are listed below. I'm skipping the application code to keep the focus on ActionCable, but it is available on GitHub to reference or clone.

$ bundle install
$ rails g model user name:string
$ rails g model room title:string name:string user:references
$ rails g model message content:text user:references room:references
$ rake db:migrate

$ rails g controller users new create show
$ rails g controller rooms new create update edit destroy index show
$ rails g controller messages create

$ rails g controller sessions new create destroy
$ rails g controller welcome about

Make the necessary changes to the controllers, models, and views to create a chat application with chat rooms (refer to the GitHub repository). Start the application with the Puma server to verify the basic functionality.

$ rails s -b 0.0.0.0 -p 8000

The application should support the following actions: the user signs up or logs in with a username to get access to new or existing chat rooms. The user can write messages in a chat room, but at the moment the messages won't appear to other users without refreshing the page. Let's see how ActionCable handles that.

Action Cable Implementation:

Configurations:

A few configuration changes are needed to enable ActionCable in the application.

config/routes.rb - The ActionCable server should be mounted on a specific path to serve WebSocket cable requests.

mount ActionCable.server => '/cable'

app/views/layouts/application.html.erb - The action_cable_meta_tag helper passes the WebSocket URL (configured via the config.action_cable.url environment setting) to the consumer.

<%= action_cable_meta_tag %>

app/assets/javascripts/cable.js - The consumer is created here to establish the WebSocket connection to the URL specified in the action-cable-url meta tag.

(function() {
  this.App || (this.App = {});

  App.cable = ActionCable.createConsumer();

}).call(this);

Once ActionCable is enabled, the WebSocket connection will be established whenever a client accesses the application, but messages are transmitted only through channels. Here is a sample handshake that creates the WebSocket connection.

General:
Request URL:ws://139.59.24.93:8000/cable
Request Method:GET
Status Code:101 Switching Protocols

Request Headers:
Connection:Upgrade
Host:139.59.24.93:8000
Origin:http://139.59.24.93:8000
Sec-WebSocket-Extensions:permessage-deflate; client_max_window_bits
Sec-WebSocket-Key:c8Xg5vFOibCl8rDpzvdgOA==
Sec-WebSocket-Protocol:actioncable-v1-json, actioncable-unsupported
Sec-WebSocket-Version:13
Upgrade:websocket

Response Headers:
Connection:Upgrade
Sec-WebSocket-Accept:v46QP1XBc0g5JYHW7AdG6aIxYW0=
Sec-WebSocket-Protocol:actioncable-v1-json
Upgrade:websocket

/cable is the default URI. If a custom URI is used, it needs to be specified in the environment file. Origins other than localhost also need to be allowed in the configuration.

environments/development.rb
# config.action_cable.url = 'wss://example.com/cable'
# config.action_cable.allowed_request_origins = [ 'http://example.com', /http:\/\/example.*/ ]

Workflow:

I created a diagram to illustrate how the pieces fit together and explain the workflow.

Channels:

A server-side messages channel needs to be created to stream messages from the server to all subscribed clients, and the client-side framework needs to subscribe to that channel to receive the messages. Run the channel generator to create the messages channel skeleton for both the server and client side.

$ rails generate channel Messages 

 app/channels/messages_channel.rb
 app/assets/javascripts/channels/messages.js

messages_controller.rb - Whenever a user writes a message in a room, it is broadcast to the 'messages' channel after the save action.

class MessagesController < ApplicationController
  def create
    message = Message.new(message_params)
    message.user = current_user
    if message.save
      ActionCable.server.broadcast 'messages',
        message: message.content,
        user: message.user.name
      head :ok
    end
  end

  private

    def message_params
      params.require(:message).permit(:content, :room_id)
    end
end

messages_channel.rb - The messages channel streams those broadcast messages to subscribed clients through the established WebSocket connection.

class MessagesChannel < ApplicationCable::Channel  
  def subscribed
    stream_from 'messages'
  end
end  

messages.js - The MessagesChannel is subscribed to when accessing the rooms to chat. The client side receives messages through its subscription and populates the chat room dynamically.

App.messages = App.cable.subscriptions.create('MessagesChannel', {
  received: function(data) {
    $("#messages").removeClass('hidden');
    return $('#messages').append(this.renderMessage(data));
  },
  renderMessage: function(data) {
    return "<p>" + data.user + ": " + data.message + "</p>";
  }
});

With these ActionCable channel changes, the chat application receives messages in realtime.

Conclusion:

Rails ActionCable adds real value to the framework by supplying the much-needed realtime feature. In addition, it can be easily implemented in an existing Rails application, since it interacts well with the existing system and follows a similar structural approach. The channel workflow strategy can also be applied to any kind of live data feed. In production, the stack uses Redis by default (config/cable.yml) to send and receive messages through channels.

Perl Dancer Conference 2016 Day 1


The Perl Dancer Conference is a great event, now in its third year. The event took place in the same location as last year in Vienna, Austria at the Hotel Schani Wien. For those of you who have never visited Vienna, it is a perfect place to bring the family. From visiting the beautiful parks to taking a scenic ride on the Danube River, the beautiful and historic city is known for its rich art and musical culture, and has much to offer.

I was very excited to not only attend but also give a talk this year. My talk, titled "Dancing in the Clouds", coincided with the release of 2 new Perl modules, Etcd3 and Dancer2::Plugin::Etcd. This article is the first of a 3-part series, with the final article focusing on my talk and usage examples for the new modules.

Sawyer X - A bus tour through Dancer core

The Captain of Dancer core, SawyerX, took us on a bus tour through the core functionality of Dancer2. Using practical examples of code blocks from core, he explained how different areas of the code base worked. I personally enjoyed his explanation of how hooks are implemented and created. Learning from the 1st iteration of Dancer, the second version shows maturity and stability.

Stefan Hornburg (Racke) - No Act on ACT

If you have ever taken the time to review a Perl conference's website or even purchase tickets to attend you have no doubt been in contact with ACT. "Act (A Conference Toolkit) is a multilingual, template-driven, multi-conference website that can manage the users, talks, schedule, and payment for your conference." While this package has been around for many years, it is somewhat dreaded because of its lack of features.

Stefan outlined his work with Interchange6::Schema and the perl.dance website, painting a picture of a replacement for ACT. Utilizing Dancer2, DBIx::Class, Moo, and other modern Perl tools, the infrastructure outlined is very compelling. The package has a user admin, e-commerce, and even a module to print out the passes. Although he notes that this is not a plug and play replacement for ACT yet, with a bit of work and support, it could be the future of Perl conference management.

Andrew Beverly - Implementing i18n in a Dancer application using Plugin::LogReport

Andrew extended last year's talk about internationalization with the Dancer2::Plugin::LogReport module. Using great examples, he not only outlined the usage of the module, but also got into details of how the process works on a technical level. Explaining the different ways that internationalization can be done, he began to outline how he achieved his task of deploying i18n in a Dancer app.

Theo van Hoesel - Quickstep

Theo was a great addition to the conference this year. He was able to make the event on very short notice after Dancer core developer Jason Crome was not able to attend due to injury. Theo outlined the Act::Voyager project briefly and the general requirements of adding user friendly features to the ACT toolkit. He also spent a good amount of time explaining the concept of web caching and how many of the existing modules failed in the task of complying with RFC7234. He then explained how all of this brought him to create HTTP::Caching, which has "The RFC 7234 compliant brains to do caching right". Along with this contribution, part of the HTTP::Bundle, he explained his Dancer-focused Dancer2::Plugin::HTTP::Bundle.

Job van Achterberg (jkva) - Dancing with Disabilities

Job's talk was a very interesting look into how taking a considerate approach to development and making small changes to your code can improve a disabled web user's experience. Using the accessibility tools in macOS, Job showed how simple things such as naming a list are reflected in a disabled user's ability to get information. What I found very interesting in this presentation was how awkward the tools were to use even for an experienced pro like Job. It really made me think a lot about the challenges the disabled face in something many of us take for granted.

Jason Lewis - The Lazy Programmer's Guide to building HTML tables in Dancer2

Jason has become a regular on the #dancer freenode IRC channel. This year he decided to travel all the way from Australia to give his presentation on his experiences replacing Crystal Reports with Dancer2, Template::Toolkit, and DataTables. Although a great deal of the presentation was focused on the features of the jQuery plugin DataTables, he gave nice examples of code he used to populate reports and the hurdles he faced replacing Crystal Reports functionality. The reports looked beautiful and were very easy to convert to other data types such as PDF and CSV.

Stefan Seifert (nine) - Perl 5 and Perl 6 - a great team

Stefan is a great presence at the conference, and his fun and witty personality carried over to his presentation. After opening with a really funny skit as a reporter reading the news, he continued to outline the current state of Perl 6 and how important it is for all of us as a community to embrace the fact that we are all in this together. He reminded us of the perils of Python 3's launch and the lack of support even today. He then began to focus on the capabilities of using Perl 5 and Perl 6 together with the Inline::Perl5 and Inline::Perl6 modules. To be honest, before his talk I had given Perl 6 very little time. Stefan's talk opened my eyes to the possibilities of utilizing the two versions together and the advantages that ecosystem has.

Please stop back for links to day 2 of the conference and a breakdown of my talk outlining etcd integration with Perl and Dancer2.

Throw It Away: Suppressing Writes on PostgreSQL Replicas

We needed a way to suppress specific write commands on a Postgres streaming replica. The replica was set up for a DR configuration, with the applications able to be brought up into full service at a moment's notice. But since it's a hot standby, we'd like to still allow the applications to be used in a working but read-only state.

One of the applications on this database is MediaWiki, which worked great in this configuration. But a couple of the other apps have the classic behavior of updating their user object's "last login" field in one form or another when someone authenticates, which would cause the login process to fail entirely.

Of course we want updates to fail, up until that point when (knock on wood) the master server is declared down for the count and the official fail-over happens. Except for the one command that executes on login.

We don't really care about the "last login" type field -- the data is available through logs and other means. The affected apps could probably all be monkey patched to work around that part of the process. But we had a couple different apps doing this, and that adds a maintenance burden for each. And if we could figure out how to make it work at the database level then it'd work for all of them, plus anything else that might pop up.

The first thing we looked at was writing a trigger to intercept the commands, but triggers don't execute on a hot standby replica so that was out pretty quickly. The next hypothesis was that we could write a foreign data wrapper that'd just absorb the writes, or even just use postgres_fdw to route the commands to a local writable database that's more or less a throw-away data sink. But to our surprise, even writes to foreign tables get rejected on a hot standby. I'm slightly tempted to dig in and see what it'd take to enable that.

The third time was the charm: rules. Rules hook in pretty far down into the query parser, and they can be notoriously tricky to work with. But since they're embedded pretty deep, down to the point where views rely on them, they're obeyed even on a replica.

So the technique was this: On the master (... obviously) we set up a separate schema, inside which a view was created with the same name as the target table and which had certain commands suppressed:

CREATE SCHEMA replica;

CREATE VIEW replica.users AS SELECT * FROM public.users;

CREATE RULE users_disable_update AS ON UPDATE TO replica.users DO INSTEAD NOTHING;

Plus any permission adjustments the app user might need. On the master server this schema and view are pretty much ignored, as the application user just uses the default search path. But on the replica, we adjust the default search path in postgresql.conf so that it applies to just that server:

search_path = '"$user",replica,public'

app@app:5432=> UPDATE "auth_user" SET "last_login" = now() WHERE "auth_user"."id" = 13;
UPDATE 0

It doesn't quite work everywhere, sadly! Notice the "UPDATE 0"? We found Django actually checks that, and panics with an error to the user when it tries to change something and the row count it gets back is different than what it expects.

Another caveat is that if the target table's schema changes, the view won't automatically follow. Presumably your user table doesn't receive changes all that often, but if you're applying this technique to something else, that might not be the case. Something to be aware of!

Vim Golf: Learning New Skills for Code Editors

Vim is a text-based editor that has been around for 25 years. It comes pre-installed on most Linux distributions, so it is a great tool for developing on servers. One of the advantages of Vim is that oft-used keystrokes can be performed without moving your hands from the keyboard (no mouse is needed in Vim).

Many of the engineers here at End Point use Vim for our daily development work, and recently, a few of us got together online to try to learn some new tricks and tips from each other. Being efficient with Vim not only improves productivity, it's a lot of fun.

Similar to playing a round of golf, we tested each other with various editing tasks, to see who could perform the task in the fewest number of keystrokes. This is known as "Vim Golf." There is even an entire website devoted to this.

In this post, we share some of the interesting tricks that were shown, and also some links to further learning about Vim.

Tips & Tricks

  • Indenting text: there are multiple ways to do this, but a few are:
    • Visually-select the lines of text to indent (Ctrl v or Shift v), then > to indent, or < to outdent. Press . to perform this action again and again.
    • Locate the line numbers for the lines you wish to change (:set number to turn on line numbering), then :17,36>> to indent lines 17-36 two indentation levels.
    • Define the width of a tab: :set tabstop=4 would, for example, set a tab to be 4 columns wide.
    • Use spaces defined in tabstop instead of an actual tab character (^I) when the Tab key is pressed :set expandtab or :set et
    • Replace tab settings for current line :retab
    • Replace tab settings for current document :retab!
  • Visually-selecting text: Ctrl v will perform a visual column selection, while Shift v will do a row selection.
  • :set will show all the currently-set options.
  • For paging up or down, use Ctrl b and Ctrl f. You can also use PgUp and PgDn keys if you want to move your hands a bit :-)
  • Moving the cursor around the page:
    • Type M to move the cursor to the middle of the screen
    • Type H to move the cursor to the top of the screen
    • Type L to move the cursor to the bottom of the screen
    • Type gg to move the cursor to the top of the document
    • Type G to move the cursor to the bottom of the document
  • Moving the page around the cursor:
    • Type zz to make the current position float to the middle of the screen
    • Type zt to make the current position float to the top of the screen
    • Type zb to make the current position float to the bottom of the screen
  • Search and replace:
    • Find and replace all instances of a string: :%s/find_this/replace_with_this/g
    • Case-insensitive find and replace all instances of a string: :%s/find_this/replace_with_this/gi
    • Find, then ask for confirmation before replacing: :%s/find_this/replace_with_this/c
    • Search history: Vim maintains a search history, which is easy to access using / or ?, then navigating through the list using the up and down arrows.
  • Deleting from the current position to the bottom of the file: dG
  • Jumping to the first position in the current line: 0
  • Find the next occurrence of a character in the current line: f then the character. To search backwards, use F
  • Undo a command: u (run this multiple times for additional undo steps)
  • Redo your undo: Ctrl r
  • Travel back in time to see the document as it was 30 mins ago :earlier 30m then revert with :later 30m
  • Reselect the last visual selection gv
  • Run a system command from within Vim :! [command]
  • Review your previous vim command history q:

For Further Learning

With contributions from Sam Batschelet

Connect Multiple JPA repositories using Static and Dynamic Methods

The JPA repository is a useful Spring Framework library that provides object-relational mapping for Java web applications, letting them connect to a wide variety of databases. Most applications only need to establish a connection with a single database to store and retrieve data, though sometimes there is more than one database to read from and write to. There are also cases where the application needs to choose which database to use dynamically, based on each request's parameters. Let's see how to configure and establish connections for these three cases.

1. Single Static Connection

In order to use JPA, the following configuration is required to get a database connection handle and to define the interface that maps a database table by extending the JpaRepository interface.

UserRepository.java - this part of the code configures how to map the user table
package com.domain.data;

import com.domain.User;
import org.springframework.data.jpa.repository.JpaRepository;

public interface UserRepository extends JpaRepository<User, Integer> {
}
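JpaRepository already supplies the common CRUD operations (findOne, findAll, save, delete, and so on), so the empty interface above is enough for the controller shown later. If you need more specific lookups, Spring Data can also derive queries from method names. As a minimal sketch, assuming the User entity has a name property (not shown in this post), you could add:

package com.domain.data;

import java.util.List;

import com.domain.User;
import org.springframework.data.jpa.repository.JpaRepository;

public interface UserRepository extends JpaRepository<User, Integer> {
  // Hypothetical derived query: Spring Data generates
  // "select u from User u where u.name = ?1" from the method name.
  // Assumes the User entity has a "name" property.
  List<User> findByName(String name);
}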
persistent-context.xml - the dataSourceReadWrite bean defines the database connection, while the entityManagerFactoryReadWrite bean provides access to the entities under the base package com.domain
...
<jpa:repositories base-package="com.domain" entity-manager-factory-ref="entityManagerFactoryReadWrite" transaction-manager-ref="transactionManager">
</jpa:repositories>

<bean abstract="true" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close" id="abstractDataSource" p:driverclassname="${jdbc.driverClassName}" p:maxactive="20" p:maxidle="20" p:minidle="20" p:testonborrow="true" p:validationquery="SELECT 1" />

<bean id="dataSourceReadWrite" p:password="${jdbc.password}" p:url="${jdbc.url}" p:username="${jdbc.username}" parent="abstractDataSource" />

<bean abstract="true" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean" id="abstractEntityManagerFactory" p:jpadialect-ref="hibernateJpaDialect" p:jpavendoradapter-ref="jpaAdapter">
  <property name="jpaProperties">
    <props>
      <prop key="hibernate.use_sql_comments">true</prop>
      <prop key="hibernate.temp.use_jdbc_metadata_defaults">false</prop>
    </props>
  </property>
</bean>

<bean id="entityManagerFactoryReadWrite" p:datasource-ref="dataSourceReadWrite" p:persistenceunitname="readWritePU" parent="abstractEntityManagerFactory">
  <property name="packagesToScan">
    <list>
      <value>com.domain</value>
    </list>
  </property>
</bean>

<bean class="org.springframework.orm.jpa.JpaTransactionManager" id="transactionManager" p:datasource-ref="dataSourceReadWrite" p:entitymanagerfactory-ref="entityManagerFactoryReadWrite" />
...
UserController.java - the userRepository field shows how the static database configuration is used to fetch a User record
@Api(name = "User", description = "User API Service")
@Controller
public class UserController {

  @Autowired
  private UserRepository userRepository;


  @ApiMethod(
    description = "Return the user object using the userId in the URL",
    produces = {MediaType.APPLICATION_JSON_VALUE},
    roles = {"read", "write"}
  )
  @RequestMapping(value = "/users/{userId}", method = RequestMethod.GET, produces = "application/json")
  @ResponseBody
  public UserModel getUser(@PathVariable @ApiParam(name = "userId", description = "User ID") Integer userId) throws ServiceException {
    User user = userRepository.findOne(userId);
    if (user != null) {
      return new UserModel(user);
    } else {
      throw new ResourceNotFoundServiceException("UserId " + userId + " was not found");
    }
  }
}

2. Multiple Static Connections

In some cases, we may need to connect to more than one database in our application. Usually there will be a primary database and a secondary one which syncs data from the primary, most likely as a read-only replica used for load balancing. In this case the application needs to be configured to establish connections with two different datasources.

To achieve this, it's possible to define ReadWrite and ReadOnly datasources in the Spring configuration and then declare a specific repository class for each datasource.

UserRepository.java - ReadWrite repository definition under the package com.domain.data
package com.domain.data;

import com.domain.User;
import org.springframework.data.jpa.repository.JpaRepository;

public interface UserRepository extends JpaRepository<User, Integer> {
}
UserReadonlyRepository.java - ReadOnly repository definition under the package com.domain.data.readonly
package com.domain.data.readonly;

import com.domain.User;
import org.springframework.data.jpa.repository.JpaRepository;

public interface UserReadonlyRepository extends JpaRepository<User, Integer> {
}
persistent-context.xml - this file defines two different datasources (dataSourceReadWrite and dataSourceReadOnly), while the jpa:repositories elements specify which repository packages are bound to each
...
<jpa:repositories base-package="com.domain" entity-manager-factory-ref="entityManagerFactoryReadWrite" transaction-manager-ref="transactionManager">
  <repository:exclude-filter expression="com.domain.data.readonly" type="regex"></repository:exclude-filter>
</jpa:repositories>

<jpa:repositories base-package="com.domain.data.readonly" entity-manager-factory-ref="entityManagerFactoryReadOnly" transaction-manager-ref="transactionManagerReadOnly">

<bean abstract="true" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close" id="abstractDataSource" p:driverclassname="${jdbc.driverClassName}" p:maxactive="20" p:maxidle="20" p:minidle="20" p:testonborrow="true" p:validationquery="SELECT 1" />
<bean id="dataSourceReadWrite" p:password="${jdbc.password}" p:url="${jdbc.url}" p:username="${jdbc.username}" parent="abstractDataSource" />
<bean id="dataSourceReadOnly" p:password="${jdbc.readonly.password}" p:url="${jdbc.readonly.url}" p:username="${jdbc.readonly.username}" parent="abstractDataSource" />
<bean abstract="true" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean" id="abstractEntityManagerFactory" p:jpadialect-ref="hibernateJpaDialect" p:jpavendoradapter-ref="jpaAdapter">
  <property name="jpaProperties">
    <props>
      <prop key="hibernate.use_sql_comments">true</prop>
      <prop key="hibernate.temp.use_jdbc_metadata_defaults">false</prop>
    </props>
  </property>
</bean>

<bean id="entityManagerFactoryReadWrite" p:datasource-ref="dataSourceReadWrite" p:persistenceunitname="readWritePU" parent="abstractEntityManagerFactory">
  <property name="packagesToScan">
    <list>
      <value>com.domain</value>
    </list>
  </property>
</bean>

<bean id="entityManagerFactoryReadOnly" p:datasource-ref="dataSourceReadOnly" p:persistenceunitname="readOnlyPU" parent="abstractEntityManagerFactory">
  <property name="packagesToScan">
    <list>
      <value>com.domain</value>
    </list>
  </property>
</bean>

<bean class="org.springframework.orm.jpa.JpaTransactionManager" id="transactionManager" p:datasource-ref="dataSourceReadWrite" p:entitymanagerfactory-ref="entityManagerFactoryReadWrite" />

<bean class="org.springframework.orm.jpa.JpaTransactionManager" id="transactionManagerReadOnly" p:datasource-ref="dataSourceReadOnly" p:entitymanagerfactory-ref="entityManagerFactoryReadOnly" />
...
UserController.java - in this definition it's interesting to note the readOnly flag, which determines whether the ReadWrite or the ReadOnly database is used for each request
@Api(name = "User", description = "User API Service")
@Controller
public class UserController {

  @Autowired
  private UserRepository userRepository;
  @Autowired
  private UserReadonlyRepository userReadOnlyRepository;

  @ApiMethod(
  description = "Return the user object using the userId in the URL",
  produces = {MediaType.APPLICATION_JSON_VALUE},
  roles = {"read", "write"}
  )
  @RequestMapping(value = "/users/{userId}", method = RequestMethod.GET, produces = "application/json")
  @ResponseBody
  public UserModel getUser(@PathVariable @ApiParam(name = "userId", description = "User ID") Integer userId, @ApiParam(name = "readOnly", description = "Param to set data source to read from") Boolean readOnly) throws ServiceException {
    // readOnly may be null when the parameter is omitted, so compare defensively
    User user = (Boolean.TRUE.equals(readOnly) ? userReadOnlyRepository.findOne(userId) : userRepository.findOne(userId));

    if (user != null) {
      return new UserModel(user);
    }
    else {
      throw new ResourceNotFoundServiceException("UserId " + userId + " was not found");
    }
  }
}
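Because the second jpa:repositories element binds everything under com.domain.data.readonly to entityManagerFactoryReadOnly and transactionManagerReadOnly, repository calls made through UserReadonlyRepository already go to the replica. If you want a service-layer method to make that explicit, one option is to name the read-only transaction manager on @Transactional. This is just a sketch, not part of the original application; the class and package names are hypothetical, but the bean names come from the persistent-context.xml above:

package com.domain.service;

import com.domain.User;
import com.domain.data.readonly.UserReadonlyRepository;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// Hypothetical read-only service; bean names match persistent-context.xml above.
@Service
public class ReadOnlyUserService {

  @Autowired
  private UserReadonlyRepository userReadonlyRepository;

  // readOnly = true marks the transaction as read-only, and naming
  // "transactionManagerReadOnly" ensures the query runs against dataSourceReadOnly.
  @Transactional(value = "transactionManagerReadOnly", readOnly = true)
  public User findUser(Integer userId) {
    return userReadonlyRepository.findOne(userId);
  }
}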

3. Multiple Dynamic Connections

Recently there was an application that needed to choose its database during API request processing. Unfortunately, defining multiple datasources and choosing the database with hard-coded checks in the code is really cumbersome. Instead, Spring's AbstractRoutingDataSource can be combined with the JPA repositories to override the datasource lookup dynamically when a request is sent to the application.

UserRepository.java - defines mapping to the user table
package com.domain.data;

import com.domain.User;
import org.springframework.data.jpa.repository.JpaRepository;

public interface UserRepository extends JpaRepository<User, Integer> {
}
persistence-context.xml - the dataSourceRootDB and dataSourceLiveDB beans define two different databases. MultitenantRoutingDataSource holds the datasources available to choose from dynamically in the code
...
<jpa:repositories base-package="com.domain" entity-manager-factory-ref="genericEntityManagerFactory" transaction-manager-ref="transactionManager">
</jpa:repositories>

<bean abstract="true" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close" id="abstractDataSource" p:driverclassname="${jdbc.driverClassName}" p:maxactive="20" p:maxidle="20" p:minidle="20" p:testonborrow="true" p:validationquery="SELECT 1" />

<bean id="dataSourceRootDB" p:password="${jdbc.password}" p:url="${jdbc.url}" p:username="${jdbc.username}" parent="abstractDataSource" />

<bean id="dataSourceLiveDB" p:password="${jdbc.livedb.password}" p:url="${jdbc.livedb.url}" p:username="${jdbc.livedb.username}" parent="abstractDataSource" />

<bean class="com.domain.route.MultitenantRoutingDataSource" id="dataSource">
  <property name="targetDataSources">
    <map key-type="java.lang.String">
      <entry key="rootdb" value-ref="dataSourceRootDB"></entry>
      <entry key="livedb" value-ref="dataSourceLiveDB"></entry>
    </map>
  </property>
  <property name="defaultTargetDataSource" ref="dataSourceRootDB">
  </property>
</bean>

<bean class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean" id="genericEntityManagerFactory" p:datasource-ref="dataSource" p:jpadialect-ref="hibernateJpaDialect" p:jpavendoradapter-ref="jpaAdapter" p:persistenceunitname="readWriteDynamicPU">
  <property name="jpaProperties">
    <props>
      <prop key="hibernate.use_sql_comments">true</prop>
      <prop key="hibernate.temp.use_jdbc_metadata_defaults">false</prop>
    </props>
  </property>
  <property name="packagesToScan">
    <list>
      <value>com.domain</value>
    </list>
  </property>
</bean>

<bean class="org.springframework.orm.jpa.JpaTransactionManager" id="transactionManager" p:datasource-ref="dataSource" p:entitymanagerfactory-ref="genericEntityManagerFactory" />
...
UserController.java - this class chooses the datasource dynamically based on the request and calls the service to complete the action
...
@Api(name = "User", description = "User API Service") 
@Controller public class UserController {
    @Autowired     private UserService userService;

    @ApiMethod(
            description = "Return the user object using the userId in the URL",
            produces = {MediaType.APPLICATION_JSON_VALUE},
            roles = {"oms-read", "oms-write"}
    )
    @RequestMapping(value = "/users/{userId}", method = RequestMethod.GET, produces = "application/json")
    @ResponseBody
    public UserModel getUser(@PathVariable @ApiParam(name = "userId", description = "User ID") Integer userId, @RequestHeader(value="X-Database", defaultValue= DatabaseEndpointUtils.ROOT_DB, required=false) String databaseEndpoint) throws ServiceException {
        MultiTenantContext.setCurrentTenant(databaseEndpoint);
        return userService.getUser(userId, true);
    }
}
...
MultiTenantContext.java - this code stores the datasource key chosen for the current request, based on the value received from the controller
package com.domain.common;

import com.domain.util.DatabaseEndpointUtils;
import com.domain.supplychain.app.ws.exceptions.InvalidDatabaseEndpointException;
import com.domain.exceptions.ServiceException;
// Assuming SLF4J for the Logger/LoggerFactory used below
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MultiTenantContext {
    private static ThreadLocal<Object> currentTenant = new ThreadLocal<>();

    public static Logger logger = LoggerFactory.getLogger(MultiTenantContext.class.getName());
    public static void setCurrentTenant(Object tenant) throws ServiceException {
        logger.info("MultiTenantContext setCurrentTenant: [{}]", tenant);
        if(DatabaseEndpointUtils.isValidEndpoint(tenant.toString())) {
            currentTenant.set(tenant);
        } else {
            throw new InvalidDatabaseEndpointException("Invalid database endpoint");
        }
    }

    public static Object getCurrentTenant() {
        logger.info("MultiTenantContext getCurrentTenant: [{}]", currentTenant.get());
        return currentTenant.get();
    }

}
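The DatabaseEndpointUtils helper referenced in the controller and in MultiTenantContext is not shown in the original code. A minimal sketch of what it could look like, assuming the valid endpoints are exactly the keys configured in the MultitenantRoutingDataSource map (rootdb and livedb):

package com.domain.util;

import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the post references it but does not include its source.
public class DatabaseEndpointUtils {

    // Keys assumed to match the targetDataSources map in persistence-context.xml
    public static final String ROOT_DB = "rootdb";
    public static final String LIVE_DB = "livedb";

    private static final List<String> VALID_ENDPOINTS = Arrays.asList(ROOT_DB, LIVE_DB);

    // Only tenant keys present in the routing datasource are considered valid
    public static boolean isValidEndpoint(String endpoint) {
        return VALID_ENDPOINTS.contains(endpoint);
    }
}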
MultitenantRoutingDataSource.java - this class determines which datasource the connection is established against. Specifically, it returns the lookup key that was set earlier, based on the request parameters
package com.domain.route;

import com.domain.common.MultiTenantContext;
// Assuming SLF4J for the Logger/LoggerFactory used below
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

public class MultitenantRoutingDataSource extends AbstractRoutingDataSource {

    private Logger logger = LoggerFactory.getLogger(MultitenantRoutingDataSource.class.getName());
    @Override
    protected Object determineCurrentLookupKey() {
        logger.info("MultitenantRoutingDataSource determineCurrentLookupKey: [{}]", MultiTenantContext.getCurrentTenant());
        return MultiTenantContext.getCurrentTenant();
    }

}
DefaultUserService.java - Fetch the user data from the dynamically chosen database.
@Service
public class DefaultUserService implements UserService {

    @Autowired
    private UserRepository userRepository;

    @Override
    @Transactional
    public UserModel getUser(Integer userId, boolean readOnly) throws ServiceException {
        User user = (userRepository.findOne(userId));
        if (user != null) {
            return new UserModel(user);
        }
        else {
            throw new ResourceNotFoundServiceException("UserId " + userId + " was not found");
        }
    }
}

Conclusion

The application establishes its database connections through whichever of these methods fits the requirements. Single or multiple static connections are what most applications use, but when the database must be chosen dynamically at request time, the AbstractRoutingDataSource class in the Spring Framework offers a wonderful way to implement that functionality, as explained above.

DNS and BIND Training with MyNIC

This is yet another story from last year!

I had the chance to attend DNS/BIND training organized by Malaysia's domain registry (MyNIC). The training was held in Bangi, Selangor, Malaysia on November 23-24, 2015, and the two days were packed with technical training on the Domain Name System (DNS) using BIND software. Our trainer was Mr Amir Haris, who runs his own DNS specialist company named Localhost Sendirian Berhad (Sendirian Berhad is equivalent to "Private Limited").

Day One

On Day One, our trainer Mr Amir Haris taught us the theoretical details of DNS. For a start, Mr Amir explained DNS operation, covering the basics of what DNS is and the function of the root servers. Later he went further into the root servers' functions and locations, followed by an explanation of the query process.

Mr Amir also explained the differences in DNS implementations across operating system platforms. Since we were using BIND as the name server software for the training, we were also given the historical background of BIND.

The concept of master and slave DNS servers was also taught. The master notifies the slave server(s) of any change via a NOTIFY message, which informs the slave(s) that the zone file has changed. The NS records in the zone file are used to determine who the slaves are. The benefit of NOTIFY is that it cuts down the delay before changes propagate.

Day Two

The second day was devoted mostly to practical DNS implementation. Each of us was given access to a virtual machine to experience our own BIND setup.

The contents of our lab training are as follows:

  • Installing BIND 9
  • Setting up RNDC
  • Setting up logging
  • Recursive and Cache DNS
  • Authoritative NS - Master and Slave
  • Delegation
  • IPv6
  • DNS Security Extensions (DNSSEC)
  • Stealth (split) DNS
  • Hardening DNS systems

Recursive and Cache Nameserver

A minimum of three zone files is needed:

  • localhost (forward zone)
  • localhost (reverse zone)
  • root

Forward zone file

;filename 127.0.0.fwd
$TTL 345600
@               IN      SOA     localhost. hostmaster.localhost. (
                                2015112401      ; Serial
                                3600            ; Refresh
                                900             ; Retry
                                3600000         ; Expire
                                3600            ; Min TTL
                                )
                IN      NS      localhost.
localhost.      IN      A       127.0.0.1

Reverse zone file

; filename 127.0.0.rev
$TTL 345600
@               IN      SOA     localhost. hostmaster.localhost.        (
                                2015112401      ; Serial
                                3600            ; Refresh
                                900             ; Retry
                                3600000         ; Expire
                                3600            ; Min TTL
                                )
                IN      NS      localhost.
1               IN      PTR     localhost.

We also had the chance to "get our hands dirty" setting up a domain name with a DNSSEC key.

At the end of the training we were given a brief explanation of DNS hardening. In short, the recommendations are as follows:

  • Isolate DNS from other services
  • Run named as non-root
  • Hide BIND version
  • Use logging
  • Control zone transfer and queries
  • Prevent DNS server from responding to DNS traffic from certain networks
  • Patch BIND whenever a patch is available or when the current BIND version has vulnerabilities

In conclusion, we got a solid introduction to DNS. Honestly, two days is not enough to cover everything in detail, but the training was well done and gave attendees a good start for further self-study and experimentation.

At End Point we are experienced in hosting our own and our clients' DNS services using BIND (including in more exotic split-horizon setups) and nsd, as well as in using common SaaS DNS providers such as UltraDNS, SoftLayer, Amazon Web Services Route 53, Hurricane Electric, etc.

DNS has largely become an unseen commodity service to Internet users in general, but that makes it all the more important to handle DNS changes skillfully and safely, to accommodate the occasional need for unusual configurations, and to cope with DDoS attacks such as the recent major attack on Dyn.