
Liquid Galaxy Uses at UNC Chapel Hill

An article posted on The Tech Broadcast last week featured the UNC Chapel Hill Center for Faculty Excellence's Faculty Showcase. The showcase included a fantastic presentation on the many ways students and faculty use their Liquid Galaxy, and discussed other opportunities for using the system in the future.

Exciting examples of classroom successes using the Liquid Galaxy include:

  1. A course offered at UNC, Geography 121 People and Places, requires its students to sift through data sets and spend time in the GIS lab/research hub making maps using the data they've collected. The goal of this assignment is to demonstrate understanding of diversity within particular geographic entities. The students use the Liquid Galaxy to present their findings. Examples of studies done for this project include studies of fertility, infant mortality, income inequality, poverty, population density, and primary education.

  2. A group of students working in the lab found that the household income of a particular municipality was many times greater than that of all surrounding municipalities. By looking around on the Liquid Galaxy, they discovered an enormous plantation in a very rural area. They were then able to understand how that plantation skewed the data for the entire municipality.

  3. While studying a web map, students found that average life expectancy dropped by a decade within a very short distance. They decided to look at the Liquid Galaxy to see whether they could draw any conclusions by viewing the area. Using the Liquid Galaxy, the students were able to think about what the data looks like, not just statistically but on Earth.

  4. A geography teacher gave a lecture on the geography of Vietnam. The teacher used the Liquid Galaxy to give the class a tour of Vietnam and show how the different areas factored into the course. The teacher asked the class where within Vietnam they’d like to go, and was able to take the students to the different geographical areas on the Liquid Galaxy and tell them in detail about those areas while they had the visual support of the system.

  5. A geography class called The Geography of Latin America focuses on extractive industries. The class discusses things like agriculture in South America, and the percentage of land in Brazil that is used for soy production. The faculty reports that seeing this information in an immersive environment goes a long way in teaching the students.

  6. Urban planning students use the Liquid Galaxy when looking into urban revitalization. These students use the system to visit downtown areas and see firsthand what they look like, to better understand the challenges those communities are facing.

  7. Students and faculty have come to the Liquid Galaxy to look at places they are about to travel to, or are thinking about traveling to, in order to prepare for their trips abroad. One example given was a Master of Fine Arts student, a sculptor, who was very interested in areas with great quantities of rock and ice. She traveled around on the Liquid Galaxy and looked around Iceland. Exploring the country on the Liquid Galaxy helped pique her interest and ultimately led to her traveling to Iceland to study.

During the faculty showcase, faculty members listed some of the great benefits of having the Liquid Galaxy available to them as a tool.

  1. The Liquid Galaxy brought everyone together and fostered a class community. Teachers would often arrive at classes that utilize the Liquid Galaxy and find that half the students were early to class. Students would be finding places (their homes, where they studied abroad, and more), and friendships between students would develop as a result of the Liquid Galaxy.

  2. Liquid Galaxy helps students with geographic literacy. They are able to think about concepts covered in class, and fly to and observe the locations discussed.

  3. Students often bring parents and family to see the Liquid Galaxy, which is widely accessible to students on campus. Students are always excited to share what they're doing with the system, with family and with faculty.

  4. Faculty members have commented that students who don’t ask questions in class have been very involved in the Liquid Galaxy lessons, which could be in part because some students are more visual learners. These visual learners find great benefit in seeing the information displayed in front of them in an interactive setting.

  5. From a faculty standpoint, a lot of time was spent planning and working out the class structure, which has developed considerably. Dedicating class time to the Liquid Galaxy was beneficial, and resulted in teaching less material but in more depth and in different ways. The faculty felt there was more benefit to that approach, and it was a great learning experience for all parties involved.

Faculty members expressed interest and excitement when learning more about the Liquid Galaxy and the ways it is used. There was a lot of interest in using the Liquid Galaxy for interdisciplinary studies between different departments to study how different communities and cultures work. There was also interest in further utilizing the system’s visualization capabilities. A professor from the School of Dentistry spoke of how he could picture using the Liquid Galaxy to teach someone about an exam of the oral cavity. Putting up 3D models of the oral cavity using our new Sketchfab capabilities would be a perfect way to achieve this!

We at End Point were very excited to learn more about the many ways that Liquid Galaxy is being successfully used at UNC as a tool for research, for fun, and to bring together students and faculty alike. We look forward to seeing how UNC, among the many other research libraries that use Liquid Galaxy, will implement the system in courses and on campus in the future.

Client Case Study: Carjojo


Carjojo is a car buying application that takes data about car pricing, dealer incentives, and rebate programs and aggregates it into a location-specific vehicle pricing search tool. Carjojo’s site makes use of some of the best tools on the market today for accessing and displaying data. The Carjojo work presented a great opportunity for End Point to use our technical skills to build a state-of-the-art application everyone is very proud of. End Point worked on the Carjojo development project from October of 2014 through early 2016, and the final Carjojo application launched in the summer of 2016. This case study shows how End Point can be a technology partner for a startup, enabling the client to maintain their own business once our part of the project is over.

Why End Point?

Reputation in full stack development

End Point has deep experience with full stack development, so for a startup, advice from our team can prove very helpful when deciding which technologies to implement and what timelines are realistic. Even though the bulk of the Carjojo work focused on specific development pieces, having developers available to advise on the entire stack allows a small startup to leverage a much broader set of skills.

Startup Budget and Timelines

End Point has worked with a number of startups over our time in business. Startups require particularly focused attention on budget and timelines to ensure that a minimum viable product can be ready on time and that the project stays on budget. Our consultants focus on communication with the client and advise them on how to steer development to meet their needs, even if those needs shift as the project unfolds.

Client Side Development Team

One of the best things about many of our clients is the technological knowledge and the team they bring to the table. In the case of Carjojo, End Point developers fit inside the Carjojo team to build the parts the in-house team was less familiar with. End Point developers are easy to work with and already work in a remote development environment, so working as part of a remote team is a natural fit.

Client Side Project Management

End Point works on projects where the project management is done either in-house or by the client. In the case of a project like Carjojo, where the client has technical project management resources, our engineers work within that team. This gives a startup like Carjojo insight into the project on a daily basis.

Project Overview

The main goal of the Carjojo project was to aggregate several data sources on car pricing, use data analytics to produce useful shopper information, and display it for their customers.
Carjojo’s staff had experience in the car industry and leveraged that to build a sizeable database of information. Analytics work on the database provided another layer of information, creating a time- and location-specific market value for a vehicle.

Carjojo kept the bulk of the database collection and admin work in house, and provided an in-house designer who worked closely with us on the vision for the project. End Point partnered to do the API architecture work as well as the front end development.

A major component of this project was using a custom API to pull information from the database and display it quickly with high-end, helpful infographics. Carjojo opted to use APIs so that the coding work would integrate seamlessly with future plans for a mobile application, which would normally require a substantial amount of recoding.

Creating a custom API also allows Carjojo to work with future partners and leverage their data and analytics in new ways as their business grows.

Team

Patrick Lewis: End Point project manager and front end developer. Patrick led development of the AngularJS front end application which serves as the main customer car shopping experience on the Carjojo site. He also created data stories using combinations of integrated Google Maps, D3/DimpleJS charts, and data tables to aid buyers with car searches and comparisons.



Matt Galvin: Front end developer. Matt led the data-visualization efforts with D3 and DimpleJS. He created Angular services to communicate with the backend, used D3 and DimpleJS to graphically illustrate information about cars, car dealers, incentives, etc., and neatly packaged visualizations into directives for easy re-use where appropriate. He also created a wealth of customizations and extensions of DimpleJS which allowed for rapid development without sacrificing visualization quality.



Josh Williams: Python API development. Josh led the efforts in connecting the database to Django and Python to process and aggregate the data as needed. He also used Tastypie to format the API responses and created authentication structures for the API.

 




Project Specifics

API Tools

Django and Tastypie were chosen to allow for rapid API development and to keep response times down on the website. In most cases the Django ORM was sufficient for generating queries from the data, though in some cases custom queries were written to better aggregate and filter the data directly within Postgres.

To use the location information in the database, some GIS location smarts were tied into Tastypie. Location searches went through GeoDjango and generated PostGIS queries in the database.
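
As a rough sketch of how these pieces fit together, here is what a Tastypie resource backed by the Django ORM and a GeoDjango distance lookup (which Postgres executes as a PostGIS query) might look like. The Vehicle and Dealer models, their fields, and the "inventory" app are hypothetical stand-ins, not Carjojo's actual schema.

# A rough sketch only: Vehicle, Dealer, and their fields are hypothetical.
from django.contrib.gis.geos import Point
from django.contrib.gis.measure import D
from tastypie.resources import ModelResource

from inventory.models import Dealer, Vehicle


class VehicleResource(ModelResource):
    """Expose vehicles read-only through Tastypie, backed by the Django ORM."""
    class Meta:
        queryset = Vehicle.objects.all()
        resource_name = 'vehicle'
        allowed_methods = ['get']


def dealers_near(lat, lng, miles=25):
    """GeoDjango distance lookup that Postgres runs as a PostGIS query."""
    point = Point(lng, lat, srid=4326)  # Point takes (x=longitude, y=latitude)
    return Dealer.objects.filter(location__distance_lte=(point, D(mi=miles)))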

Front End Tools

D3 is the standard for data visualization and is great for both simple and complicated graphics. Many of Carjojo’s graphs were bar graphs or pie charts and didn’t really require writing D3 by hand. We also wanted to make many of them reusable and dynamic (often based on search terms or inputs) with the use of Angular directives and services. This could have been done with pure D3, but Dimple makes creating simple D3 graphs easy and fast.
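
To give a sense of why Dimple cuts down on hand-written D3, here is a small bar chart sketch; the #price-chart container, the sample data, and its field names are illustrative only, not Carjojo's.

// Assumes d3 and dimple are already loaded; the element id, data, and fields are made up.
var data = [
  { Model: "Sedan A", AveragePrice: 21500 },
  { Model: "Sedan B", AveragePrice: 24300 },
  { Model: "SUV C",   AveragePrice: 31900 }
];

var svg = dimple.newSvg("#price-chart", 600, 400);   // create the SVG container
var chart = new dimple.chart(svg, data);
chart.addCategoryAxis("x", "Model");                 // categorical x axis
chart.addMeasureAxis("y", "AveragePrice");           // numeric y axis
chart.addSeries(null, dimple.plot.bar);              // render as a bar chart
chart.draw();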

DimpleJS was used a lot in this project. Since Carjojo is data-driven, they wanted to display their information in an aesthetically pleasing manner, and DimpleJS allowed us to quickly spin up visualizations against some of the project’s tightest deadlines.

The approach worked well for most cases. However, sometimes Carjojo wanted something slightly different than what DimpleJS does out of the box. One example of DimpleJS customization work can be found here on our blog.

Another thing to note about the data visualizations was that sometimes when the data was plotted and graphed, it brought to light some discrepancies in the back-end calculations and analytics, requiring some back-and-forth between the Carjojo DBA and End Point.

Results

Carjojo had a successful launch of their service in the summer of 2016. Their system has robust user capabilities, a modern clean design, and a solid platform to grow from. The best news for Carjojo is that now the project has been turned back over to them for development. End Point believes in empowering our clients to move forward with their business and goals without us. Carjojo knows that we’ll be here for support if they need it.






Office Space Available at End Point HQ!

Our office-mates are leaving, and we are looking to fill their desk space. There are 8 open desks available, including one desk in a private office.

Amenities include free wifi, furniture, conference room access, kitchen access, regular office cleaning, and close proximity (one block) to Madison Square Park.

Our company, End Point, is a tech company that builds ecommerce sites, and also develops the Liquid Galaxy. There are typically 4 or 5 of us in the office on a given day. We are quiet, friendly, and respectful.

Please contact us at ask@endpoint.com for more information.

Seedbank: Structured Seed Files for Rails Projects

Rails seed files are a useful way of populating a database with the initial data needed for a Rails project. The Rails db/seeds.rb file contains plain Ruby code and can be run with the Rails-default rails db:seed task. Though convenient, this "one big seed file" approach can quickly become unwieldy once you start pre-populating data for multiple models or needing more advanced mechanisms for retrieving data from a CSV file or other data store.
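
A conventional single seed file is just Ruby evaluated in one pass, so everything tends to pile up together; for example (the models and records here are purely illustrative):

# db/seeds.rb -- everything lives in one file (illustrative models and data)
Course.find_or_create_by!(name: 'Calculus')
Course.find_or_create_by!(name: 'Geography 121')
User.find_or_create_by!(email: 'admin@example.com')
# ...plus CSV imports, environment checks, and so on, all in the same file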

The Seedbank gem aims to solve this scalability problem by providing a drop-in replacement for Rails seed files that allows developers to distribute seed data across multiple files and provides support for environment-specific files.

Organizing seed files in a specific structure within a project's db/seeds/ directory enables Seedbank to either run all of the seed files for the current environment using the same rails db:seed task as vanilla Rails or to run a specific subset of tasks by specifying a seed file or environment name when running the task. It's also possible to fall back to the original "single seeds.rb file" approach by running rails db:seed:original.

Given a file structure like:

db/seeds/
  courses.seeds.rb
  development/
    users.seeds.rb
  students.seeds.rb

Seedbank will generate tasks including:

rails db:seed                   # load data from db/seeds.rb, db/seeds/*.seeds.rb, and db/seeds/[ENVIRONMENT]/*.seeds.rb
rails db:seed:courses           # load data from db/seeds/courses.seeds.rb
rails db:seed:common            # load data from db/seeds.rb, db/seeds/*.seeds.rb
rails db:seed:development       # load data from db/seeds.rb, db/seeds/*.seeds.rb, and db/seeds/development/*.seeds.rb
rails db:seed:development:users # load data from db/seeds/development/users.seeds.rb
rails db:seed:original          # load data from db/seeds.rb

I've found the ability to define development-specific seed files helpful in recent projects for populating 'test user' accounts for sites running in development mode. We've been able to maintain a consistent set of test user accounts across multiple development sites without having to worry about accidentally creating those same test accounts once the site is running in a publicly accessible production environment.
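
As a rough example (the model and attributes are illustrative, not the actual project's data), such a development-only seed file might look like:

# db/seeds/development/users.seeds.rb -- only loaded in the development environment
%w[test-student test-teacher test-admin].each do |name|
  User.find_or_create_by!(name: name)
end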

Splitting seed data from one file into multiple files does introduce a potential issue when the data created in one seed file is dependent on data from a different seed file. Seedbank addresses this problem by allowing for dependencies to be defined within the seed files, enabling the developer to control the order in which the seed files will be run.

Seedbank runs seed files in alphabetical order by default but simply wrapping the code in a block allows the developer to manually enforce the order in which tasks should be run. Given a case where Students are dependent on Course records having already been created, the file can be set up like this:

# db/seeds/students.seeds.rb
after :courses do
  course = Course.find_by_name('Calculus')
  course.students.create(first_name: 'Patrick', last_name: 'Lewis')
end

The added dependency block will ensure that the db/seeds/courses.seeds.rb file is executed before the db/seeds/students.seeds.rb file, even when the students file is run via a specific rails db:seed:students task.

Seedbank also provides support for shared methods that can be reused within multiple seed files; I encourage anyone interested in the gem to check out the Seedbank README for more details. Though the current 0.4 version of Seedbank doesn't officially support Rails 5, I've been using it without issue on Rails 5 projects for over six months now, and I consider it a great addition to any Rails project that needs to pre-populate a database with a non-trivial amount of data.

Job opening: Fulfillment Manager

Update: This position has been filled! Thanks to everyone who expressed interest.

This role is based in our Bluff City, Tennessee office, and is responsible for everything about fulfillment of our Liquid Galaxy and other custom-made hardware products, from birth to installation. See liquidgalaxy.endpoint.com to learn more about Liquid Galaxy.

What is in it for you?

  • Interesting and exciting startup-like atmosphere at an established company
  • Opportunity for advancement
  • Benefits including health insurance and self-funded 401(k) retirement savings plan
  • Annual bonus opportunity

What you will be doing:

  • Manage receiving, warehouse, and inventory efficiently
  • Oversee computer system building
  • Product testing and quality assurance
  • Packing
  • Shipment pick-up
  • Communicate with and create documents for customs for international shipping
  • Be the expert on international shipping rules and regulations
  • Delivery tracking and resolution of issues
  • Verify receipt of intact, functional equipment
  • Resolve RMA and shipping claims
  • Help test and implement any new warehouse software and processes
  • Design and implement new processes
  • Effectively use our project software (Trello) to receive and disseminate project information
  • Manage fulfillment employees and office facility
  • Work through emergency situations in a timely and controlled manner
  • Keep timesheet entries up to date throughout the day

What you will need:

  • Eagerness to “own” the fulfillment process from end to end
  • Exemplary communication skills with the entire company
  • High attention to detail
  • Consistent habits of reliable work
  • Ability to make the most of your time and resources without external micromanagement
  • Desire, initiative, and follow-through to improve on our processes and execution
  • Ability to work with remote and local team members
  • Drive to deliver superior internal customer service
  • Ability to work through personnel issues
  • Willingness to go above and beyond the call of duty when the situation arises

About End Point:

End Point is a 21-year-old Internet consulting company with 50 full-time employees working together from our headquarters in New York City, our office in eastern Tennessee, and home offices around the world. We serve over 200 clients ranging from small family businesses to large corporations, using a variety of open source technologies. Our team is made up of strong product design, software development, database, hardware, and system administration talent.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of gender, race, religion, color, national origin, sexual orientation, age, marital status, veteran status, or disability status.

Please email us an introduction to jobs@endpoint.com to apply. Include your resume and anything else that would help us get to know you. We look forward to hearing from you! Full-time employment seekers only, please. No agencies or subcontractors.

Bash: loop over a list of (possibly non-existing) files using wildcards with nullglob (or failglob) option

Let's say you're working in Bash, and you want to loop over a list of files, using wildcards.

The basic code is:

#!/bin/bash
for f in /path/to/files/*; do
  echo "Found file: $f"
done

Easy as that. However, there is a potential problem with this code: if the wildcard does not match any existing files (i.e. there are no files under the /path/to/files/ directory), the pattern is not expanded, and the loop will still execute once with $f containing the literal string "/path/to/files/*".
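
For example, assuming the script above is saved as loop.sh (a made-up name) and nothing matches the wildcard, you would see the pattern itself echoed back:

$ ./loop.sh
Found file: /path/to/files/*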

How to prevent this from happening? Nullglob is what you're looking for.

Nullglob, quoting the Bash man page, "allows filename patterns which match no files to expand to a null string, rather than themselves".

Using shopt -s you can enable optional Bash behaviors, like nullglob. Here's the final code:

#!/bin/bash
shopt -s nullglob
for f in /path/to/files/*; do
  echo "Found file: $f"
done

Another interesting option you may want to check for, supported by Bash since version 3, is failglob.

With failglob enabled, quoting again, "patterns which fail to match filenames during filename expansion result in an expansion error". Depending on what you need, that could even be a better behavior.
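
Here is a minimal variation of the earlier script using failglob instead; if the pattern matches nothing, Bash reports an expansion error (something like "line 3: no match: /path/to/files/*", with wording varying slightly between versions) and the loop body never runs:

#!/bin/bash
shopt -s failglob
for f in /path/to/files/*; do
  echo "Found file: $f"
done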

Wondering why nullglob is not the default behavior? Check this very good answer to the question.

Malaysia Open Source Community Meetup Quarter 4 2015 (MOSCMY Q4 2015)

After a year, I finally decided to publish this post!

On November 26th 2015 I had a chance to give a talk at a local open source conference here in Malaysia. The organizers specifically requested that I deliver a talk on "remote work". The meetup was organized by the Multimedia Development Corporation (MDeC) with the sponsorship of Microsoft Malaysia. Microsoft has recently become more "open source friendly" as it pushes its cloud-based product, Azure. The full schedule of the event can be found here.

The conference was divided into two sessions: the morning session was held at the Petronas Club in Tower One of the Kuala Lumpur City Centre (KLCC), and the other was held at Microsoft Malaysia's office in Tower Two. The morning session was a single track (including my talk), while the afternoon was split into two parallel tracks.

Morning Session

The morning session started with a talk by Dinesh Nair, Director of Developer Experience and Evangelism at Microsoft Malaysia. The second talk of the morning was delivered by Mr Izzat M. Salihuddin from the Multimedia Development Corporation (MDeC), Malaysia. He spoke on behalf of MDeC, explaining its efforts as a government agency to realize local cloud infrastructure. One of the challenges Mr Izzat mentioned was the readiness of physical infrastructure, as well as broadband connectivity for the public. I delivered the final slot of the morning, in which I explained how End Pointers do their jobs, the open source software we use, and how we accomplish our work remotely. The morning session was adjourned with a lunch break.



Just in case you are wondering, this is me delivering the talk that day.

Afternoon Session

The afternoon session was a parallel-track session, where I chose to attend a talk on Ubuntu's Juju service. The talk was delivered by Mr Khairul Aizat Kamarudzaman from Informology; Mr Aizat's slides can be read here. Later, Mr Sanjay shared his Asterisk skills, with the server hosted on the Azure platform. Mr Sanjay showed us how to make a phone call from a computer to a mobile phone. Asterisk is different from Skype because it uses an open protocol (SIP) with open clients. Mr Sanjay gave a demo of his implementation, which looks set up to compete with a typical PABX phone system.

For the next slot I decided to attend the talk on the TCPTW kernel patch, which was delivered by Mr Faisal from Nexoprima. As far as I understood, Mr Faisal introduced his own patch for the Linux kernel to handle the TCP TIME_WAIT problem that arises under extremely heavy HTTP traffic. Since a connection in the TIME_WAIT state holds a local port for a minute, and in many distros the default ephemeral port range provides only around 30,000 ports, the effort of searching for free ports is CPU intensive, and purging tons of TIME_WAIT connections can waste a lot of CPU cycles.


Mr Faisal giving his talk on the TCPTW kernel patch

Mr Faisal's TCPTW patch for CentOS 7 can be viewed here. His presentation slides can be viewed here.

Before heading back home, I decided to attend a talk on "Dockerizing IOT Service" by Mr Syukor. In this talk Mr Syukor gave a bit of theoretical background on Docker and how it can be used on a Raspberry Pi board. You can view his slides here. My personal thought is that the Raspberry Pi is versatile enough to run any modern operating system, so Docker should not be much of an issue.

Postgres statistics and the pain of analyze

Anytime you run a query in Postgres, it needs to compile your SQL into a lower-level plan explaining how exactly to retrieve the data. The more it knows about the tables involved, the smarter the planner can be. To get that information, it gathers statistics about the tables and stores them, predictably enough, in the system table known as pg_statistic. The SQL command ANALYZE is responsible for populating that table. It can be done per-cluster, per-database, per-table, or even per-column. One major pain about analyze is that every table *must* be analyzed after a major upgrade. Whether you upgrade via pg_dump, pg_upgrade, Bucardo, or some other means, the pg_statistic table is not copied over and the database starts as a clean slate. Running ANALYZE is thus the first important post-upgrade step.
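
For instance, assuming a database named alpha and the pgbench_accounts table that appears later in this post, the different granularities look like this:

$ vacuumdb --all --analyze-only                          ## every database in the cluster
$ psql -d alpha -c 'ANALYZE'                             ## every table in one database
$ psql -d alpha -c 'ANALYZE pgbench_accounts'            ## a single table
$ psql -d alpha -c 'ANALYZE pgbench_accounts (aid, bid)' ## only specific columns of a table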

Unfortunately, analyze can be painfully slow. Slow enough that the default analyze methods sometimes take longer than the entire rest of the upgrade! Although this article will focus on the pg_upgrade program in its examples, the lessons may be applied to any upgrade method. The short version of the lessons is: run vacuumdb in parallel, control the stages yourself, and make sure you handle any custom per-column statistics.

Before digging into the solution in more detail, let's see why all of this is needed. Doesn't pg_upgrade allow for super-fast Postgres major version upgrades, including the system catalogs? It does, with the notable exception of the pg_statistic table. The nominal reason for not copying the data is that the table format may change from version to version. The real reason is that nobody has bothered to write the conversion logic yet, for pg_upgrade could certainly copy the pg_statistic information: the table has not changed for many years.

At some point, a DBA will wonder if it is possible to simply copy the pg_statistic table from one database to another manually. Alas, it contains columns of the type "anyarray", which means it cannot be dumped and restored:

$ pg_dump -t pg_statistic --data-only | psql -q
ERROR:  cannot accept a value of type anyarray
CONTEXT:  COPY pg_statistic, line 1, column stavalues1: "{"{i,v}","{v}","{i,o,o}","{i,o,o,o}","{i,i,i,v,o,o,o}","{i,i,o,o}","{i,o}","{o,o,o}","{o,o,o,o}","{o..."

I keep many different versions of Postgres running on my laptop, and use a simple port naming scheme to keep them straight. It's simple enough to use pg_dump and sed to confirm that the structure of the pg_statistic table has not changed from version 9.2 until 9.6:

$ for z in 840 900 910 920 930 940 950; do echo -n $z: ; diff -sq <(pg_dump \
>  --schema-only -p 5$z -t pg_statistic | sed -n '/CREATE TABLE/,/^$/p') <(pg_dump \
>  --schema-only -p 5960 -t pg_statistic | sed -n '/CREATE TABLE/,/^$/p'); done
840:Files /dev/fd/63 and /dev/fd/62 differ
900:Files /dev/fd/63 and /dev/fd/62 differ
910:Files /dev/fd/63 and /dev/fd/62 differ
920:Files /dev/fd/63 and /dev/fd/62 are identical
930:Files /dev/fd/63 and /dev/fd/62 are identical
940:Files /dev/fd/63 and /dev/fd/62 are identical
950:Files /dev/fd/63 and /dev/fd/62 are identical

Of course, the same table structure does not promise that the backends of different versions use it in the same way (spoiler: they do), but that should be something pg_upgrade can handle by itself. Even if the table structure did change, pg_upgrade could be taught to migrate the information from one format to another (its raison d'être). If the new statistics format took a long time to generate, perhaps pg_upgrade could leisurely generate a one-time table on the old database holding the new format, then copy that over as part of the upgrade.

Since pg_upgrade currently does none of those things and omits upgrading the pg_statistic table, the following message appears after pg_upgrade has been run:

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:
    ./analyze_new_cluster.sh

Looking at the script in question yields:

#!/bin/sh

echo 'This script will generate minimal optimizer statistics rapidly'
echo 'so your system is usable, and then gather statistics twice more'
echo 'with increasing accuracy.  When it is done, your system will'
echo 'have the default level of optimizer statistics.'
echo

echo 'If you have used ALTER TABLE to modify the statistics target for'
echo 'any tables, you might want to remove them and restore them after'
echo 'running this script because they will delay fast statistics generation.'
echo

echo 'If you would like default statistics as quickly as possible, cancel'
echo 'this script and run:'
echo '    vacuumdb --all --analyze-only'
echo

vacuumdb --all --analyze-in-stages
echo

echo 'Done'

There are many problems in simply running this script. Not only is it going to iterate through each database one-by-one, but it will also process tables one-by-one within each database! As the script states, it is also extremely inefficient if you have any per-column statistics targets. Another issue with the --analyze-in-stages option is that the stages are hard-coded (at "1", "10", and "default"). Additionally, there is no way to easily know when a stage has finished other than watching the command output. Happily, all of these problems can be fairly easily overcome; let's create a sample database to demonstrate.

$ initdb --data-checksums testdb
$ echo port=5555 >> testdb/postgresql.conf
$ pg_ctl start -D testdb
$ createdb -p 5555 alpha
$ pgbench alpha -p 5555 -i -s 2
$ for i in `seq 1 100`; do echo create table pgb$i AS SELECT \* FROM pgbench_accounts\;; done | psql -p 5555 alpha

Now we can run some tests to see the effect of the --jobs option. Graphing out the times shows some big wins and nice scaling. Here are the results of running vacuumdb alpha --analyze-only with various values of --jobs:

Simple graph showing time decreasing as number of jobs increases

The slope of your graph will be determined by how many expensive-to-analyze tables you have. As a rule of thumb, however, you may as well set --jobs to a high number. Anything over your max_connections setting is pointless, but don't be afraid to jack it up to at least a hundred. Experiment on your test box, of course, to find the sweet spot for your system. Note that the --jobs argument will not work on old versions of Postgres. For those cases, I usually whip up a Perl script using Parallel::ForkManager to get the job done. Thanks to Dilip Kumar for adding the --jobs option to vacuumdb!

The next problem to conquer is the use of custom statistics. Postgres' ANALYZE uses the default_statistics_target setting to determine how many rows to sample (the default value in modern versions of Postgres is 100). However, as the name suggests, this is only the default - you may also set a specific target at the column level. Unfortunately, there is no way to disable this quickly, which means that vacuumdb will always use the custom value. This is not what you want, especially if you are using the --analyze-in-stages option, as it will happily (and needlessly!) recalculate columns with specific targets three times. As custom stats are usually set much higher than the default target, this can be a very expensive option:

$ ## Create a largish table:
$ psql -qc 'create unlogged table aztest as select * from pgbench_accounts'
$ for i in {1..5}; do psql -qc 'insert into aztest select * from aztest'; done
$ psql -tc "select pg_size_pretty(pg_relation_size('aztest'))"
820 MB
$ psql -qc '\timing' -c 'analyze aztest'
Time: 590.820 ms  ## Actually best of 10: never get figures from a single run!
$ psql -c 'alter table aztest alter column aid set statistics 1000'
ALTER TABLE
$ psql -qc '\timing' -c 'analyze aztest'
Time: 2324.017 ms ## as before, this is the fastest of 10 runs

As you can see, even a single column can change the analyze duration drastically. What can we do about this? The --analyze-in-stages option is still a useful feature, so we want to set those columns back to a default value. While one could reset the stats and then set them again on each column via a bunch of ALTER TABLE calls, I find it easier to simply update the system catalogs directly. Specifically, the pg_attribute table contains an attstattarget column which has a positive value when a custom target is set. In our example above, the value of attstattarget for the aid column would be 1000. Here is a quick recipe to save the custom statistics values, set them to the default (-1), and then restore them all once the database-wide analyzing is complete:

## Save the values away, then reset to default:
CREATE TABLE custom_targets AS SELECT attrelid, attname, attnum, attstattarget
  FROM pg_attribute WHERE attstattarget > 0;
UPDATE pg_attribute SET attstattarget = -1 WHERE attstattarget > 0;

## Safely run your database-wide analyze now
## All columns will use default_statistics_target

## Restore the values:
UPDATE pg_attribute a SET attstattarget = c.attstattarget
  FROM custom_targets c WHERE a.attrelid = c.attrelid
  AND a.attnum = c.attnum AND a.attname = c.attname;

## Bonus query: run all custom target columns in parallel:
SELECT 'vacuumdb --analyze-only -e -j 100 ' || 
  string_agg(format('-t "%I(%I)" ', attrelid::regclass, attname), NULL)
FROM pg_attribute WHERE attstattarget > 0;

As to the problems of not being able to pick the stage targets for --analyze-in-stages, and not being able to know when a stage has finished, the solution is to simply do it yourself. For example, to run all databases in parallel with a target of "2", you would need to change the default_statistics_target at the database level (via ALTER DATABASE), or at the cluster level (via ALTER SYSTEM). Then invoke vacuumdb, and reset the value:

$ psql -qc 'alter system set default_statistics_target = 2' -qc 'select pg_reload_conf()'
$ vacuumdb --all --analyze-only --jobs 100
$ psql -qc 'alter system reset default_statistics_target' -qc 'select pg_reload_conf()'

In summary, don't trust the given vacuumdb suggestions for a post-upgrade analyze. Instead, remove any per-column statistics, run it in parallel, and do whatever stages make sense for you.

Implementation of Ruby on Rails 5 Action Cable with Chat Application

Ruby on Rails is a wonderful framework for web development. However, it has been lacking one important feature as the world moves towards realtime data: everyone wants to see realtime data in their applications. Most realtime web applications are now built using WebSockets.

WebSocket provides full-duplex communication between server and client over a TCP connection, once the handshake has been completed through the HTTP protocol. WebSocket transfers streams of messages on top of TCP without being solicited by the client, which greatly boosts data transfer performance compared to HTTP request/response.

WebSockets were adopted in Rails applications with the help of third-party libraries. Rails 5, however, comes with a module called ActionCable, which sits seamlessly within the existing framework and integrates WebSockets into the application. ActionCable provides both a server side and a client side framework to implement WebSockets in an application.

ActionCable Overview:

Server Side:

Connection: The Connection handles only the authentication and authorization logic. A connection object is instantiated when a request from the user comes in through a browser tab, window, or device. Multiple connections can be created when the user accesses the server from different devices or browser tabs.
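
For example, a typical connection class identifies the user during the WebSocket handshake. The sketch below assumes the user id is stored in a signed cookie, which is just one common approach and not necessarily how the example chat application tracks its sessions.

# app/channels/application_cable/connection.rb
module ApplicationCable
  class Connection < ActionCable::Connection::Base
    identified_by :current_user

    def connect
      self.current_user = find_verified_user
    end

    private

    # Assumes the user id lives in a signed cookie; adjust to however
    # your application tracks the logged-in user.
    def find_verified_user
      User.find_by(id: cookies.signed[:user_id]) || reject_unauthorized_connection
    end
  end
end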

Channel: The Channel is the parent of all custom channels and shares common logic among them. A custom channel streams messages to clients once those clients have subscribed to it.

Client Side:

The client side JavaScript framework has all the functionality needed to interact with the server side. The consumer establishes a WebSocket connection with the server to handle all communication. A subscriber subscribes to custom channels to receive messages from the server without making requests.

Prerequisites:
* Ruby 2.2.2+ is the minimum version supported by Rails 5. Install the gem package and Rails 5 in your environment.
* ActionCable needs Puma as the development server to support its multithreaded features.

Let's create the Rails 5 chat application! The application structure will include the following default ActionCable-related files.

$ rails new action-cable-chat-example
 - Server Side
        app/channels
        app/channels/application_cable
        app/channels/application_cable/connection.rb
        app/channels/application_cable/channel.rb

 - Client Side
        app/assets/javascripts
        app/assets/javascripts/channels
        app/assets/javascripts/channels/.keep
        app/assets/javascripts/application.js
        app/assets/javascripts/cable.js

The models and controllers below need to be created for a basic chat application.

* User, Room and Message models
* users, rooms, messages, sessions and welcome controllers

The commands to create these items are listed below; I am skipping the application code to focus on ActionCable, but the code is available on GitHub to refer to or clone.

$ bundle install
$ rails g model user name:string
$ rails g model room title:string name:string user:references
$ rails g model message content:text user:references room:references
$ rake db:migrate

$ rails g controller users new create show
$ rails g controller rooms new create update edit destroy index show
$ rails g controller messages create

$ rails g controller sessions new create destroy
$ rails g controller welcome about

Make the necessary changes to controllers, models, and views to create a chat application with chat rooms (refer to the GitHub repository). Start the application with the Puma server to verify the basic functionality.

$ rails s -b 0.0.0.0 -p 8000

The application should support the following actions: a user signs up or logs in with a username to gain access to new or existing chat rooms. The user can write messages in a chat room, but the messages won't yet appear to other users without refreshing the page. Let's see how ActionCable handles that.

Action Cable Implementation:

Configurations:

There are a few configurations needed to enable ActionCable in the application.

config/routes.rb - The ActionCable server should be mounted on a specific path to serve WebSocket cable requests.

mount ActionCable.server => '/cable'

app/views/layouts/application.html.erb - The action_cable_meta_tag helper passes the WebSocket URL (which is configured in the environment variable config.action_cable.url) to the consumer.

<%= action_cable_meta_tag %>

app/assets/javascripts/cable.js - The consumer should be created to establish the WebSocket connection to the URL specified in the action-cable-url meta tag.

//= require action_cable
//= require_self
//= require_tree ./channels

(function() {
  this.App || (this.App = {});

  App.cable = ActionCable.createConsumer();

}).call(this);

Once ActionCable is enabled, the WebSocket connection will be established when the application is accessed from any client, but messages are transmitted only through channels. Here is a sample handshake creating the WebSocket connection.

General:
Request URL:ws://139.59.24.93:8000/cable
Request Method:GET
Status Code:101 Switching Protocols

Request Headers:
Connection:Upgrade
Host:139.59.24.93:8000
Origin:http://139.59.24.93:8000
Sec-WebSocket-Extensions:permessage-deflate; client_max_window_bits
Sec-WebSocket-Key:c8Xg5vFOibCl8rDpzvdgOA==
Sec-WebSocket-Protocol:actioncable-v1-json, actioncable-unsupported
Sec-WebSocket-Version:13
Upgrade:websocket

Response Headers:
Connection:Upgrade
Sec-WebSocket-Accept:v46QP1XBc0g5JYHW7AdG6aIxYW0=
Sec-WebSocket-Protocol:actioncable-v1-json
Upgrade:websocket

/cable is the default URI. If there is a custom URI, it needs to be mentioned in the environment file. Origins other than localhost also need to be allowed in the configuration.

environments/development.rb
# config.action_cable.url = 'wss://example.com/cable'
# config.action_cable.allowed_request_origins = [ 'http://example.com', /http:\/\/example.*/ ]
Workflow:

I created a diagram to illustrate how the pieces fit together and explain the workflow.

Channels:

A server side messages channel needs to be created to stream messages from the server to all subscribed clients, and the client side framework subscribes to the channel to receive those messages. Execute the channel generator to create the messages channel skeleton for both the server and client side.

$ rails generate channel Messages 

 app/channels/messages_channel.rb
 app/assets/javascripts/channels/messages.js

messages_controller.rb - Whenever a user writes a message in a room, it is broadcast to the 'messages' channel after the save action.

class MessagesController < ApplicationController
  def create
    message = Message.new(message_params)
    message.user = current_user
    if message.save
      ActionCable.server.broadcast 'messages',
        message: message.content,
        user: message.user.name
      head :ok
    end
  end

  private

    def message_params
      params.require(:message).permit(:content, :room_id)
    end
end

messages_channel.rb - The messages channel streams those broadcast messages to subscribed clients through the established WebSocket connection.

class MessagesChannel < ApplicationCable::Channel  
  def subscribed
    stream_from 'messages'
  end
end  

messages.js - The MessagesChannel is subscribed to when accessing the rooms to chat. The client side receives messages according to its subscriptions and populates them in the chat room dynamically.

App.messages = App.cable.subscriptions.create('MessagesChannel', {
  received: function(data) {
    $("#messages").removeClass('hidden');
    return $('#messages').append(this.renderMessage(data));
  },
  renderMessage: function(data) {
    return "<p>" + data.user + ": " + data.message + "</p>";
  }
});

These ActionCable channel changes allow the chat application to receive messages in realtime.

Conclusion:

Rails ActionCable adds real value to the framework by supplying the much-needed realtime feature. In addition, it can be implemented easily in an existing Rails application, since it interacts naturally with the existing system and follows a similar structure. The channel workflow strategy can be applied to any kind of live data feed. The production stack uses Redis by default (config/cable.yml) to send and receive messages through the channels.