
Postgres statistics and the pain of analyze

Anytime you run a query in Postgres, it needs to compile your SQL into a lower-level plan explaining how exactly to retrieve the data. The more it knows about the tables involved, the smarter the planner can be. To get that information, it gathers statistics about the tables and stores them, predictably enough, in the system table known as pg_statistic. The SQL command ANALYZE is responsible for populating that table. It can be done per-cluster, per-database, per-table, or even per-column. One major pain about analyze is that every table *must* be analyzed after a major upgrade. Whether you upgrade via pg_dump, pg_upgrade, Bucardo, or some other means, the pg_statistic table is not copied over and the database starts as a clean slate. Running ANALYZE is thus the first important post-upgrade step.
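
For reference, here is roughly what those different granularities look like in practice (the table and column names below are only illustrative):

## Analyze every database in the cluster:
$ vacuumdb --all --analyze-only

## Analyze a single database:
$ vacuumdb --analyze-only alpha

## Analyze a single table, then a single column, from within psql:
ANALYZE pgbench_accounts;
ANALYZE pgbench_accounts (abalance);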

Unfortunately, analyze can be painfully slow. Slow enough that the default analyze methods sometimes take longer than the entire rest of the upgrade! Although this article will focus on the pg_upgrade program in its examples, the lessons may be applied to any upgrade method. The short version of the lessons is: run vacuumdb in parallel, control the stages yourself, and make sure you handle any custom per-column statistics.

Before digging into the solution in more detail, let's see why all of this is needed. Doesn't pg_upgrade allow for super-fast Postgres major version upgrades, including the system catalogs? It does, with the notable exception of the pg_statistic table. The nominal reason for not copying the data is that the table format may change from version to version. The real reason is that nobody has bothered to write the conversion logic yet, for pg_upgrade could certainly copy the pg_statistic information: the table has not changed for many years.

At some point, a DBA will wonder if it is possible to simply copy the pg_statistic table from one database to another manually. Alas, it contains columns of the type "anyarray", which means it cannot be dumped and restored:

$ pg_dump -t pg_statistic --data-only | psql -q
ERROR:  cannot accept a value of type anyarray
CONTEXT:  COPY pg_statistic, line 1, column stavalues1: "{"{i,v}","{v}","{i,o,o}","{i,o,o,o}","{i,i,i,v,o,o,o}","{i,i,o,o}","{i,o}","{o,o,o}","{o,o,o,o}","{o..."

I keep many different versions of Postgres running on my laptop, and use a simple port naming scheme to keep them straight. It's simple enough to use pg_dump and sed to confirm that the structure of the pg_statistic table has not changed from version 9.2 until 9.6:

$ for z in 840 900 910 920 930 940 950; do echo -n $z: ; diff -sq <(pg_dump \
>  --schema-only -p 5$z -t pg_statistic | sed -n '/CREATE TABLE/,/^$/p') <(pg_dump \
>  --schema-only -p 5960 -t pg_statistic | sed -n '/CREATE TABLE/,/^$/p'); done
840:Files /dev/fd/63 and /dev/fd/62 differ
900:Files /dev/fd/63 and /dev/fd/62 differ
910:Files /dev/fd/63 and /dev/fd/62 differ
920:Files /dev/fd/63 and /dev/fd/62 are identical
930:Files /dev/fd/63 and /dev/fd/62 are identical
940:Files /dev/fd/63 and /dev/fd/62 are identical
950:Files /dev/fd/63 and /dev/fd/62 are identical

Of course, the same table structure does not promise that the backends of different versions use it in the same way (spoiler: they do), but that should be something pg_upgrade can handle by itself. Even if the table structure did change, pg_upgrade could be taught to migrate the information from one format to another (its raison d'ĂȘtre). If the new statistics format took a long time to generate, perhaps pg_upgrade could leisurely generate a one-time table on the old database holding the new format, then copy that over as part of the upgrade.

Since pg_upgrade currently does none of those things and omits upgrading the pg_statistic table, the following message appears after pg_upgrade has been run:

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:
    ./analyze_new_cluster.sh

Looking at the script in question yields:

#!/bin/sh

echo 'This script will generate minimal optimizer statistics rapidly'
echo 'so your system is usable, and then gather statistics twice more'
echo 'with increasing accuracy.  When it is done, your system will'
echo 'have the default level of optimizer statistics.'
echo

echo 'If you have used ALTER TABLE to modify the statistics target for'
echo 'any tables, you might want to remove them and restore them after'
echo 'running this script because they will delay fast statistics generation.'
echo

echo 'If you would like default statistics as quickly as possible, cancel'
echo 'this script and run:'
echo '    vacuumdb --all --analyze-only'
echo

vacuumdb --all --analyze-in-stages
echo

echo 'Done'

There are many problems in simply running this script. Not only is it going to iterate through each database one-by-one, but it will also process tables one-by-one within each database! As the script states, it is also extremely inefficient if you have any per-column statistics targets. Another issue with the --analyze-in-stages option is that the stages are hard-coded (at "1", "10", and "default"). Additionally, there is no way to easily know when a stage has finished other than watching the command output. Happily, all of these problems can be fairly easily overcome; let's create a sample database to demonstrate.

$ initdb --data-checksums testdb
$ echo port=5555 >> testdb/postgresql.conf 
$ pg_ctl start -D testdb
$ createdb -p 5555 alpha
$ pgbench alpha -p 5555 -i -s 2
$ for i in `seq 1 100`; do echo create table pgb$i AS SELECT \* FROM pgbench_accounts\;; done | psql -p 5555 alpha

Now we can run some tests to see the effect of the --jobs option. Graphing out the times shows some big wins and nice scaling. Here are the results of running vacuumdb alpha --analyze-only with various values of --jobs:

[Graph: analyze time decreases as the number of jobs increases]

The slope of your graph will be determined by how many expensive-to-analyze tables you have. As a rule of thumb, however, you may as well set --jobs to a high number. Anything over your max_connections setting is pointless, but don't be afraid to jack it up to at least a hundred. Experiment on your test box, of course, to find the sweet spot for your system. Note that the --jobs argument will not work on older versions of Postgres (it was added to vacuumdb in version 9.5). For those cases, I usually whip up a Perl script using Parallel::ForkManager to get the job done. Thanks to Dilip Kumar for adding the --jobs option to vacuumdb!
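
For what it's worth, here is a minimal sketch of that Parallel::ForkManager approach (not the actual script; it assumes a default local connection and simply runs ANALYZE on each user table in its own forked psql call):

#!/usr/bin/env perl
use strict;
use warnings;
use Parallel::ForkManager;

## Grab a list of user tables, skipping the system catalogs:
my @tables = split /\n/, qx{psql -A -X -t -c "SELECT quote_ident(schemaname) || '.' || quote_ident(tablename)
  FROM pg_tables WHERE schemaname NOT IN ('pg_catalog','information_schema')"};

my $pm = Parallel::ForkManager->new(10); ## number of concurrent workers

for my $table (@tables) {
    $pm->start and next;   ## the parent moves on to the next table
    system 'psql', '-q', '-X', '-c', "ANALYZE $table";
    $pm->finish;
}
$pm->wait_all_children;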

The next problem to conquer is the use of custom statistics. Postgres' ANALYZE uses the default_statistics_target setting to determine how many rows to sample (the default value in modern versions of Postgres is 100). However, as the name suggests, this is only the default - you may also set a specific target at the column level. Unfortunately, there is no way to disable this quickly, which means that vacuumdb will always use the custom value. This is not what you want, especially if you are using the --analyze-in-stages option, as it will happily (and needlessly!) recalculate columns with specific targets three times. As custom stats are usually set much higher than the default target, this can be a very expensive option:

$ ## Create a largish table:
$ psql -qc 'create unlogged table aztest as select * from pgbench_accounts'
$ for i in {1..5}; do psql -qc 'insert into aztest select * from aztest'; done
$ psql -tc "select pg_size_pretty(pg_relation_size('aztest'))"
820 MB
$ psql -qc '\timing' -c 'analyze aztest'
Time: 590.820 ms  ## Actually best of 10: never get figures from a single run!
$ psql -c 'alter table aztest alter column aid set statistics 1000'
ALTER TABLE
$ psql -qc '\timing' -c 'analyze aztest'
Time: 2324.017 ms ## as before, this is the fastest of 10 runs

As you can see, even a single column can change the analyze duration drastically. What can we do about this? The --analyze-in-stages option is still a useful feature, so we want to set those columns back to a default value. While one could reset the stats and then set them again on each column via a bunch of ALTER TABLE calls, I find it easier to simply update the system catalogs directly. Specifically, the pg_attribute table contains an attstattarget column which has a positive value when a custom target is set. In our example above, the value of attstattarget for the aid column would be 1000. Here is a quick recipe to save the custom statistics values, set them to the default (-1), and then restore them all once the database-wide analyzing is complete:

## Save the values away, then reset to default:
CREATE TABLE custom_targets AS SELECT attrelid, attname, attnum, attstattarget
  FROM pg_attribute WHERE attstattarget > 0;
UPDATE pg_attribute SET attstattarget = -1 WHERE attstattarget > 0;

## Safely run your database-wide analyze now
## All columns will use default_statistics_target

## Restore the values:
UPDATE pg_attribute a SET attstattarget = c.attstattarget
  FROM custom_targets c WHERE a.attrelid = c.attrelid
  AND a.attnum = c.attnum AND a.attname = c.attname;

## Bonus query: run all custom target columns in parallel:
SELECT 'vacuumdb --analyze-only -e -j 100 ' || 
  string_agg(format('-t "%I(%I)" ', attrelid::regclass, attname), NULL)
FROM pg_attribute WHERE attstattarget > 0;

As to the problems of not being able to pick the stage targets for --analyze-in-stages, and not being able to know when a stage has finished, the solution is to simply do it yourself. For example, to run all databases in parallel with a target of "2", you would need to change the default_statistics_target at the database level (via ALTER DATABASE), or at the cluster level (via ALTER SYSTEM). Then invoke vacuumdb, and reset the value:

$ psql -qc 'alter system set default_statistics_target = 2' -qc 'select pg_reload_conf()'
$ vacuumdb --all --analyze-only --jobs 100
$ psql -qc 'alter system reset default_statistics_target' -qc 'select pg_reload_conf()'

In summary, don't trust the given vacuumdb suggestions for a post-upgrade analyze. Instead, remove any per-column statistics, run it in parallel, and do whatever stages make sense for you.

Implementation of Ruby on Rails 5 Action Cable with Chat Application

Ruby on Rails is a wonderful framework for web development, but it had been lacking one important feature as the world moved towards realtime data: everyone wants to see realtime data in their applications. These days, real-time web applications are mostly accomplished using WebSockets.

WebSocket provides full-duplex communication between server and client over a TCP connection, once the handshake has completed through the HTTP protocol. WebSocket transfers streams of messages on top of TCP without being solicited by the client, which boosts data transfer performance considerably compared to HTTP request/response.

WebSockets used to be added to RoR applications with the help of third-party libraries, but Rails 5 comes with a module called ActionCable, which sits seamlessly within the existing framework and integrates WebSockets into the application. ActionCable provides both a server-side and a client-side framework for implementing WebSockets in an application.

ActionCable Overview:

Server Side:

Connection: The Connection handles only the authentication and authorization part of the logic. A connection object is instantiated when a request from the user comes in through a browser tab, window, or device. Multiple connections can be created when the user accesses the server from different devices or browser tabs.
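
A minimal connection class might look like the following sketch (illustrative only; it assumes the user id was stored in a signed cookie at login, which may differ from how this chat app tracks sessions):

# app/channels/application_cable/connection.rb
module ApplicationCable
  class Connection < ActionCable::Connection::Base
    identified_by :current_user

    def connect
      self.current_user = find_verified_user
    end

    private

    def find_verified_user
      # Assumes the user id is available in a signed cookie
      if (user = User.find_by(id: cookies.signed[:user_id]))
        user
      else
        reject_unauthorized_connection
      end
    end
  end
end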

Channel: The Channel is the parent of all custom channels and shares common logic across them. A custom channel streams messages to the clients that have subscribed to it.

Client Side:

The client-side JavaScript framework has all the functionality needed to interact with the server side. The consumer establishes a WebSocket connection with the server, over which all communication happens. A subscriber subscribes to custom channels in order to receive messages from the server without making requests.

Prerequisites:
* Ruby 2.2.2+ is the minimum version supported by Rails 5. Install the gem package and Rails 5 on your environment.
* ActionCable needs Puma as the development server to support its multithreaded features.

Let's create the Rails 5 chat application! The application structure will include the following default ActionCable-related files.

$ rails new action-cable-chat-example
 - Server Side
        app/channels
        app/channels/application_cable
        app/channels/application_cable/connection.rb
        app/channels/application_cable/channel.rb

 - Client Side
        app/assets/javascripts
        app/assets/javascripts/channels
        app/assets/javascripts/channels/.keep
        app/assets/javascripts/application.js
        app/assets/javascripts/cable.js

The following models and controllers need to be created for a basic chat application.

* User, Room and Message models
* users, rooms, messages, sessions and welcome controllers

The commands to create these items are listed below; the rest of the code is skipped here to keep the focus on ActionCable, but it is available on GitHub to refer to or clone.

$ bundle install
$ rails g model user name:string
$ rails g model room title:string name:string user:references
$ rails g model message content:text user:references room:references
$ rake db:migrate

$ rails g controller users new create show
$ rails g controller rooms new create update edit destroy index show
$ rails g controller messages create

$ rails g controller sessions new create destroy
$ rails g controller welcome about

Make the necessary changes to controllers, models, and views to create a chat application with chat rooms (refer to the GitHub repository). Start the application with the Puma server to verify the basic functionality.

$ rails s -b 0.0.0.0 -p 8000

The application should support the following actions: a user signs up or logs in with a username to get access to new or existing chat rooms, and can then write messages in a chat room. At the moment, though, those messages won't appear to other users without refreshing the page. Let's see how ActionCable handles this.

Action Cable Implementation:

Configurations:

A few configuration changes are needed to enable ActionCable in the application.

config/routes.rb - The server should be mounted on a specific path to serve WebSocket cable requests.

mount ActionCable.server => '/cable'

app/views/layouts/application.html.erb - The action_cable_meta_tag passes the WebSocket URL (configured via config.action_cable.url in the environment file) to the consumer.

<%= action_cable_meta_tag %>

app/assets/javascripts/cable.js - The consumer is created here to establish the WebSocket connection to the URL specified in the action-cable-url meta tag.

(function() {
  this.App || (this.App = {});

  App.cable = ActionCable.createConsumer();

}).call(this);

Once ActionCable is enabled, the WebSocket connection will be established when the application is accessed from any client, but messages are transmitted only through channels. Here is a sample handshake that creates the WebSocket connection.

General:
Request URL:ws://139.59.24.93:8000/cable
Request Method:GET
Status Code:101 Switching Protocols

Request Headers:
Connection:Upgrade
Host:139.59.24.93:8000
Origin:http://139.59.24.93:8000
Sec-WebSocket-Extensions:permessage-deflate; client_max_window_bits
Sec-WebSocket-Key:c8Xg5vFOibCl8rDpzvdgOA==
Sec-WebSocket-Protocol:actioncable-v1-json, actioncable-unsupported
Sec-WebSocket-Version:13
Upgrade:websocket

Response Headers:
Connection:Upgrade
Sec-WebSocket-Accept:v46QP1XBc0g5JYHW7AdG6aIxYW0=
Sec-WebSocket-Protocol:actioncable-v1-json
Upgrade:websocket

/cable is the default URI. If there is a custom URI, it needs to be specified in the environment file. Origins other than localhost also need to be allowed in the configuration.

environments/development.rb
# config.action_cable.url = 'wss://example.com/cable'
# config.action_cable.allowed_request_origins = [ 'http://example.com', /http:\/\/example.*/ ]
Workflow:

I created a diagram to illustrate how the pieces fit together and explain the workflow.

Channels:

A server-side messages channel needs to be created to stream messages from the server to all subscribed clients, and the client-side framework needs to subscribe to that channel to receive them. Run the channel generator to create the messages channel skeleton for both the server and the client side.

$ rails generate channel Messages 

 app/channels/messages_channel.rb
 app/assets/javascripts/channels/messages.js

messages_controller.rb - Whenever the user writes a message in the room, it will be broadcast to the 'messages' channel after the save action.

class MessagesController < ApplicationController
  def create
    message = Message.new(message_params)
    message.user = current_user
    if message.save
      ActionCable.server.broadcast 'messages',
        message: message.content,
        user: message.user.username
      head :ok
    end
  end

  private

    def message_params
      params.require(:message).permit(:content, :chatroom_id)
    end
end

messages_channel.rb - The messages channel streams those broadcast messages to subscribed clients through the established WebSocket connection.

class MessagesChannel < ApplicationCable::Channel  
  def subscribed
    stream_from 'messages'
  end
end  

messages.js - The MessagesChannel is subscribed to when accessing the rooms to chat. The client side receives messages according to its subscriptions and populates them in the chat room dynamically.

App.messages = App.cable.subscriptions.create('MessagesChannel', {
  received: function(data) {
    $("#messages").removeClass('hidden')
    return $('#messages').append(this.renderMessage(data));
  },
  renderMessage: function(data) {
    return "<p>" + data.user + ": " + data.message + "</p>";
  }
});

With these ActionCable channel changes in place, the chat application receives messages in realtime.

Conclusion:

Rails ActionCable adds real value to the framework by supplying the much-needed realtime feature. It can be implemented easily in an existing Rails application, since it interacts well with the existing system and follows a similar structure. The channel workflow strategy can also be applied to any kind of live data feed. The production stack uses Redis by default (config/cable.yml) to send and receive messages through channels.
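
A typical config/cable.yml along those lines looks roughly like this (the Redis URL is an assumption and will vary per deployment):

# config/cable.yml
development:
  adapter: async

test:
  adapter: async

production:
  adapter: redis
  url: redis://localhost:6379/1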

Perl Dancer Conference 2016 Day 1

Perl Dancer Conference Day 1

The Perl Dancer Conference is a great event, now in its third year. The event took place in the same location as last year in Vienna, Austria at the Hotel Schani Wien. For those of you who have never visited Vienna, it is a perfect place to bring the family. From visiting the beautiful parks to taking a scenic ride on the Danube River, the beautiful and historic city is known for its rich art and musical culture, and has much to offer.

I was very excited to not only attend but also give a talk this year. My talk, titled "Dancing in the Clouds", also coincided with the release of two new Perl modules, Etcd3 and Dancer2::Plugin::Etcd. This article is the first of a three-part series, with the final article focusing on my talk and usage examples for the new modules.

Sawyer X - A bus tour through Dancer core

The Captain of Dancer core, SawyerX, took us on a bus tour through the core functionality of Dancer2. Using practical examples of code blocks from core, he explained how different areas of the code base worked. I personally enjoyed his explanation of how hooks are implemented and created. Learning from the 1st iteration of Dancer, the second version shows maturity and stability.

Stefan Hornburg (Racke) - No Act on ACT

If you have ever taken the time to review a Perl conference's website or even purchase tickets to attend you have no doubt been in contact with ACT. "Act (A Conference Toolkit) is a multilingual, template-driven, multi-conference website that can manage the users, talks, schedule, and payment for your conference." While this package has been around for many years, it is somewhat dreaded because of its lack of features.

Stefan outlined his work with Interchange6::Schema and the perl.dance website, painting a picture of a replacement for ACT. Utilizing Dancer2, DBIx::Class, Moo, and other modern Perl tools, the infrastructure he outlined is very compelling. The package has user administration, e-commerce, and even a module to print out the passes. Although he notes that this is not a plug-and-play replacement for ACT yet, with a bit of work and support it could be the future of Perl conference management.

Andrew Beverly - Implementing i18n in a Dancer application using Plugin::LogReport

Andrew extended his talk from last year about internationalization with the Dancer2::Plugin::LogReport module. Using great examples, he not only outlined the usage of the module, but also got into the details of how the process works on a technical level. Explaining the different ways that internationalization can be done, he outlined how he achieved his task of deploying i18n in a Dancer app.

Theo van Hoesel - Quickstep

Theo was a great addition to the conference this year. He was able to make the event on very short notice after Dancer core Jason Crome was not able to attend due to injury. Theo outlined the Act::Voyager project briefly and the general requirements of adding user friendly features to the ACT toolkit. He also spent a good amount of time explaining the concept of web caching and how many of the existing modules failed in the task of complying with RFC7234. He then explained how all of this brought him to create HTTP::Caching and how it has "The RFC 7234 compliant brains to do caching right". Along with this contribution part of the HTTP::Bundle, his Dancer focused Dancer2::Plugin::HTTP::Bundle was explained.

Job van Achterberg (jkva) - Dancing with Disabilities

Job's talk was a very interesting look into how taking a considerate approach to development, and making small changes to your code, can improve a disabled web user's experience. Using the tools in macOS, Job showed how simple things such as naming a list are reflected in a disabled user's ability to get information. What I found very interesting in this presentation was how awkward the tools were to use even for an experienced pro like Job. It really made me think a lot about the challenges the disabled face in something many of us take for granted.

Jason Lewis - The Lazy Programmer's Guide to building HTML tables in Dancer2

Jason has become a regular on the #dancer freenode IRC channel. This year he decided to travel all the way from Australia to give his presentation on his experiences replacing Crystal Reports with Dancer2, Template::Toolkit, and DataTables. Although a great deal of the presentation was focused on the features of the jQuery plugin DataTables, he gave nice examples of the code he used to populate reports and the hurdles he faced replacing Crystal Reports functionality. The reports looked beautiful and were very easy to convert to other data types such as PDF and CSV.

Stefan Seifert (nine) - Perl 5 and Perl 6 - a great team

Stefan is a great presence at the conference, and his fun and witty personality carried over to his presentation. After opening with a really funny skit as a reporter reading the news, he continued by outlining the current state of Perl 6 and how important it is for all of us as a community to embrace the fact that we are all in this together. He reminded us of the perils of Python 3's launch and the lack of support even today. He then began to focus on the capabilities of using Perl 5 and Perl 6 together with the Inline::Perl5 and Inline::Perl6 modules. To be honest, before his talk I had given Perl 6 very little time. Stefan's talk opened my eyes to the possibilities of utilizing the two versions together and the advantages that ecosystem has.

Please stop back for links to day 2 of the conference and a breakdown of my talk outlining etcd integration with Perl and Dancer2.

Throw It Away: Suppressing Writes on PostgreSQL Replicas

We needed a way to suppress specific write commands on a Postgres streaming replica. The replica was set up for a DR configuration, with the applications able to be brought up into full service at a moment's notice. But since it's a hot standby, we'd like to still allow the applications to be used in a working but read-only state.

One of the applications on this database is MediaWiki, which worked great in this configuration. But a couple of the other apps have the classic behavior of updating their user object's "last login" field in one form or another when someone authenticates, which would cause the login process to fail entirely.

Of course we want updates to fail, up until that point when (knock on wood) the master server is declared down for the count and the official fail-over happens. Except for the one command that executes on login.

We don't really care about the "last login" type field -- the data is available through logs and other means. The affected apps could probably all be monkey patched to work around that part of the process. But we had a couple different apps doing this, and that adds a maintenance burden for each. And if we could figure out how to make it work at the database level then it'd work for all of them, plus anything else that might pop up.

The first thing we looked at was writing a trigger to intercept the commands, but triggers don't execute on a hot standby replica so that was out pretty quickly. The next hypothesis was that we could write a foreign data wrapper that'd just absorb the writes, or even just use postgres_fdw to route the commands to a local writable database that's more or less a throw-away data sink. But to our surprise, even writes to foreign tables get rejected on a hot standby. I'm slightly tempted to dig in and see what it'd take to enable that.

The third time was the charm: rules. Rules hook in pretty far down into the query parser, and they can be notoriously tricky to work with. But since they're embedded pretty deep, down to the point where views rely on them, they're obeyed even on a replica.

So the technique was this: On the master (... obviously) we set up a separate schema, inside which a view was created with the same name as the target table and which had certain commands suppressed:

CREATE SCHEMA replica;

CREATE VIEW replica.users AS SELECT * FROM public.users;

CREATE RULE users_disable_update AS ON UPDATE TO replica.users DO INSTEAD NOTHING;
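
Only the UPDATE needed suppressing in this case, but if other write commands also had to be absorbed, the same pattern would extend naturally (these additional rules are just illustrative, not part of the original setup):

CREATE RULE users_disable_insert AS ON INSERT TO replica.users DO INSTEAD NOTHING;

CREATE RULE users_disable_delete AS ON DELETE TO replica.users DO INSTEAD NOTHING;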

Plus any permission adjustments the app user might need. On the master server this schema and view are pretty much ignored, as the application user just uses the default search path. But on the replica, we adjust the default search path in postgresql.conf so that it applies to just that server:

search_path = '"$user",replica,public'

app@app:5432=> UPDATE "auth_user" SET "last_login" = now() WHERE "auth_user"."id" = 13;
UPDATE 0

It doesn't quite work everywhere, sadly! Notice the "UPDATE 0"? We found Django actually checks that, and panics with an error to the user when it tries to change something and the row count it gets back is different than what it expects.

Another caveat is that if the target table's schema changes, the view won't automatically follow. Presumably your user table doesn't receive changes all that often, but if you're applying this technique to something else, that might not be the case. Something to be aware of!

Vim Golf: Learning New Skills for Code Editors

Vim is a text-based editor that has been around for 25 years. It comes pre-installed on most Linux distributions, so it is a great tool for developing on servers. One of the advantages of Vim is that oft-used keystrokes can be performed without moving your hands from the keyboard (no mouse is needed in Vim).

Many of the engineers here at End Point use Vim for our daily development work, and recently, a few of us got together online to try to learn some new tricks and tips from each other. Being efficient with Vim not only improves productivity, it's a lot of fun.

Similar to playing a round of golf, we tested each other with various editing tasks, to see who could perform the task in the fewest number of keystrokes. This is known as "Vim Golf." There is even an entire website devoted to this.

In this post, we share some of the interesting tricks that were shown, and also some links to further learning about Vim.

Tips & Tricks

  • Indenting text: there are multiple ways to do this, but a few are:
    • Visually-select the lines of text to indent (Ctrl v or Shift v), then > to indent, or < to outdent. Press . to perform this action again and again.
    • Locate the line numbers for the lines you wish to change (:set number to turn on line numbering), then :17,36>> to indent lines 17-36 two indentation levels.
    • Define the width of a tab: :set tabstop=4 would, for example, set a tab to 4 spaces (see the sample settings after this list).
    • Use the spaces defined in tabstop instead of an actual tab character (^I) when the Tab key is pressed: :set expandtab or :set et
    • Replace tab settings for current line :retab
    • Replace tab settings for current document :retab!
  • Visually-selecting text: Ctrl v will perform a visual column selection, while Shift v will do a row selection.
  • :set will show all the currently-set options.
  • For paging up or down, use Ctrl b and Ctrl f. You can also use PgUp and PgDn keys if you want to move your hands a bit :-)
  • Moving the cursor around the page:
    • Type M to move the cursor to the middle of the screen
    • Type H to move the cursor to the top of the screen
    • Type L to move the cursor to the bottom of the screen
    • Type gg to move the cursor to the top of the document
    • Type G to move the cursor to the bottom of the document
  • Moving the page around the cursor:
    • Type zz to make the current position float to the middle of the screen
    • Type zt to make the current position float to the top of the screen
    • Type zb to make the current position float to the bottom of the screen
  • Search and replace:
    • Find and replace all instances of a string: %s/find_this/replace_with_this/g
    • Case-insensitive find and replace all instances of a string: %s/find_this/replace_with_this/gi
    • Find then ask confirmation before replacing: %s/find_this/replace_with_this/c
    • Search history: Vim maintains search history which is easy to access using / or ? then navigation through the list using the up and down arrows.
  • Deleting from the current position to the bottom of the file: dG
  • Jumping to the first position in the current line: 0
  • Find the next occurrence of a character in the current line: f then the character. To search backwards, use F
  • Undo a command: u (run this multiple times for additional undo steps)
  • Redo your undo: Ctrl r
  • Travel back in time to see the document as it was 30 mins ago :earlier 30m then revert with :later 30m
  • Reselect the last visual selection gv
  • Run a system command from within Vim :! [command]
  • Review your previous vim command history q:

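The tab and indentation settings mentioned in the list above can also be made permanent in a vimrc; here is a small illustrative example (the values are just a personal preference, not a recommendation):

" ~/.vimrc (illustrative settings only)
set number        " show line numbers
set tabstop=4     " a tab character displays as 4 columns
set shiftwidth=4  " indent/outdent commands shift by 4 columns
set expandtab     " insert spaces instead of a tab character
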
For Further Learning

With contributions from Sam Batschelet

Connect Multiple JPA repositories using Static and Dynamic Methods

The JPA Repository is a useful Spring Framework library that provides object-relational mapping, allowing Java web applications to connect to a wide variety of databases. Most applications establish a connection with a single database to store and retrieve data, though sometimes more than one database needs to be read from and written to. There are also cases where the application needs to choose which database to use dynamically, based on each request's parameters. Let's see how to configure and establish connections for these three cases.

1. Single Static Connection

In order to use JPA, the following configuration is required to get the database connection handle and to define the interface that maps a database table by extending the JpaRepository interface.

UserRepository.java - this part of the code configures how to map the user table
package com.domain.data;

import com.domain.User;
import org.springframework.data.jpa.repository.JpaRepository;

public interface UserRepository extends JpaRepository <User, Integer> {
}
persistent-context.xml - the dataSourceReadWrite bean defines the database connection, while the entityManagerFactoryReadWrite bean provides access to the database for the entities under the base package com.domain
...
<jpa:repositories base-package="com.domain" entity-manager-factory-ref="entityManagerFactoryReadWrite" transaction-manager-ref="transactionManager">
</jpa:repositories>

<bean abstract="true" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close" id="abstractDataSource" p:driverclassname="${jdbc.driverClassName}" p:maxactive="20" p:maxidle="20" p:minidle="20" p:testonborrow="true" p:validationquery="SELECT 1" />

<bean id="dataSourceReadWrite" p:password="${jdbc.password}" p:url="${jdbc.url}" p:username="${jdbc.username}" parent="abstractDataSource" />

<bean abstract="true" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean" id="abstractEntityManagerFactory" p:jpadialect-ref="hibernateJpaDialect" p:jpavendoradapter-ref="jpaAdapter">
  <property name="jpaProperties">
    <props>
      <prop key="hibernate.use_sql_comments">true</prop>
      <prop key="hibernate.temp.use_jdbc_metadata_defaults">false</prop>
    </props>
  </property>
</bean>

<bean id="entityManagerFactoryReadWrite" p:datasource-ref="dataSourceReadWrite" p:persistenceunitname="readWritePU" parent="abstractEntityManagerFactory">
  <property name="packagesToScan">
    <list>
      <value>com.domain</value>
    </list>
  </property>
</bean>

<bean class="org.springframework.orm.jpa.JpaTransactionManager" id="transactionManager" p:datasource-ref="dataSourceReadWrite" p:entitymanagerfactory-ref="entityManagerFactoryReadWrite" />
...
UserController.java - the userRepository object access defines how to use a static database configuration to fetch the User object record
@Api(name = "User", description = "User API Service")
@Controller
public class UserController {

  @Autowired
  private UserRepository userRepository;


  @ApiMethod(
    description = "Return the user object using the userId in the URL",
    produces = {MediaType.APPLICATION_JSON_VALUE},
    roles = {"read", "write"}
  )
  @RequestMapping(value = "/users/{userId}", method = RequestMethod.GET, produces = "application/json")
  @ResponseBody
  public UserModel getUser(@PathVariable @ApiParam(name = "userId", description = "User ID") Integer userId) throws ServiceException {
    User user = (userRepository.findOne(userId));
    if (user != null) {
      return new UserModel(user);
    }
    else {
      throw new ResourceNotFoundServiceException("UserId " + userId + " was not found");
    }
  }
}

2. Multiple Static Connections

In some cases, we may need to connect to more than one database in our application. Usually there will be a primary database and a secondary one which syncs data from the primary, most likely as a read-only replica in a load-balancing approach. In this case the application needs to be configured to establish connections with two different datasources.

To achieve this result it's possible to define ReadWrite and ReadOnly datasources in the spring configuration and then declare the specific Repository classes for each specific datasource.

UserRepository.java - ReadWrite repository definition under the package com.domain.data
package com.domain.data;

import com.domain.User;
import org.springframework.data.jpa.repository.JpaRepository;

public interface UserRepository extends JpaRepository<User, Integer> {
}
UserReadonlyRepository.java - ReadOnly repository definition under the package com.domain.data.readonly
package com.domain.data.readonly;

import com.domain.User;
import org.springframework.data.jpa.repository.JpaRepository;

public interface UserReadonlyRepository extends JpaRepository<User, Integer> {
}
persistent-context.xml - this file defines two different datasources (dataSourceReadWrite and dataSourceReadOnly), while the jpa:repositories elements specify the package path for each set of repositories
...
<jpa:repositories base-package="com.domain" entity-manager-factory-ref="entityManagerFactoryReadWrite" transaction-manager-ref="transactionManager">
  <repository:exclude-filter expression="com.domain.data.readonly" type="regex"></repository:exclude-filter>
</jpa:repositories>

<jpa:repositories base-package="com.domain.data.readonly" entity-manager-factory-ref="entityManagerFactoryReadOnly" transaction-manager-ref="transactionManagerReadOnly">

<bean abstract="true" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close" id="abstractDataSource" p:driverclassname="${jdbc.driverClassName}" p:maxactive="20" p:maxidle="20" p:minidle="20" p:testonborrow="true" p:validationquery="SELECT 1" />
<bean id="dataSourceReadWrite" p:password="${jdbc.password}" p:url="${jdbc.url}" p:username="${jdbc.username}" parent="abstractDataSource" />
<bean id="dataSourceReadOnly" p:password="${jdbc.readonly.password}" p:url="${jdbc.readonly.url}" p:username="${jdbc.readonly.username}" parent="abstractDataSource" />
<bean abstract="true" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean" id="abstractEntityManagerFactory" p:jpadialect-ref="hibernateJpaDialect" p:jpavendoradapter-ref="jpaAdapter">
  <property name="jpaProperties">
    <props>
      <prop key="hibernate.use_sql_comments">true</prop>
      <prop key="hibernate.temp.use_jdbc_metadata_defaults">false</prop>
    </props>
  </property>
</bean>

<bean id="entityManagerFactoryReadWrite" p:datasource-ref="dataSourceReadWrite" p:persistenceunitname="readWritePU" parent="abstractEntityManagerFactory">
  <property name="packagesToScan">
    <list>
      <value>com.domain</value>
    </list>
  </property>
</bean>

<bean id="entityManagerFactoryReadOnly" p:datasource-ref="dataSourceReadOnly" p:persistenceunitname="readOnlyPU" parent="abstractEntityManagerFactory">
  <property name="packagesToScan">
    <list>
      <value>com.domain</value>
    </list>
  </property>
</bean>

<bean class="org.springframework.orm.jpa.JpaTransactionManager" id="transactionManager" p:datasource-ref="dataSourceReadWrite" p:entitymanagerfactory-ref="entityManagerFactoryReadWrite" />

<bean class="org.springframework.orm.jpa.JpaTransactionManager" id="transactionManagerReadOnly" p:datasource-ref="dataSourceReadOnly" p:entitymanagerfactory-ref="entityManagerFactoryReadOnly" />
...
UserController.java - in this definition it's interesting to note the readOnly flag, which determines whether the ReadWrite or the ReadOnly database connection is used, based on that flag's value
@Api(name = "User", description = "User API Service")
@Controller
public class UserController {

  @Autowired
  private UserRepository userRepository;
  @Autowired
  private UserReadonlyRepository userReadonlyRepository;

  @ApiMethod(
  description = "Return the user object using the userId in the URL",
  produces = {MediaType.APPLICATION_JSON_VALUE},
  roles = {"read", "write"}
  )
  @RequestMapping(value = "/users/{userId}", method = RequestMethod.GET, produces = "application/json")
  @ResponseBody
  public UserModel getUser(@PathVariable @ApiParam(name = "userId", description = "User ID") Integer userId, @ApiParam(name = "readOnly", description = "Param to set data source to read from") Boolean readOnly) throws ServiceException {
    User user = (readOnly ?
      userReadonlyRepository.findOne(userId) : userRepository.findOne(userId));

    if (user != null) {
      return new UserModel(user);
    }
    else {
      throw new ResourceNotFoundServiceException("UserId " + userId + " was not found");
    }
  }
}

3. Multiple Dynamic Connections

Recently there was an application that needed to choose its database during API request processing. Unfortunately, defining multiple datasources and choosing the database based on hard-coded checks in the code is really cumbersome. Instead, it's possible to use Spring's AbstractRoutingDataSource, which provides a way to override the database lookup dynamically when a request is sent to the application.

UserRepository.java - defines mapping to the user table
package com.domain.data;

import com.domain.User;
import org.springframework.data.jpa.repository.JpaRepository;

public interface UserRepository extends JpaRepository<User, Integer> {
}
persistence-context.xml - the dataSourceRootDB and dataSourceLiveDB beans define two different databases. MultitenantRoutingDataSource holds the datasources available to choose from dynamically in the code
...
<jpa:repositories base-package="com.domain" entity-manager-factory-ref="genericEntityManagerFactory" transaction-manager-ref="transactionManager">
</jpa:repositories>

<bean abstract="true" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close" id="abstractDataSource" p:driverclassname="${jdbc.driverClassName}" p:maxactive="20" p:maxidle="20" p:minidle="20" p:testonborrow="true" p:validationquery="SELECT 1" />

<bean id="dataSourceRootDB" p:password="${jdbc.password}" p:url="${jdbc.url}" p:username="${jdbc.username}" parent="abstractDataSource" />

<bean id="dataSourceLiveDB" p:password="${jdbc.livedb.password}" p:url="${jdbc.livedb.url}" p:username="${jdbc.livedb.username}" parent="abstractDataSource" />

<bean class="com.domain.route.MultitenantRoutingDataSource" id="dataSource">
  <property name="targetDataSources">
    <map key-type="java.lang.String">
      <entry key="rootdb" value-ref="dataSourceRootDB"></entry>
      <entry key="livedb" value-ref="dataSourceLiveDB"></entry>
    </map>
  </property>
  <property name="defaultTargetDataSource" ref="dataSourceRootDB">
  </property>
</bean>

<bean class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean" id="genericEntityManagerFactory" p:datasource-ref="dataSource" p:jpadialect-ref="hibernateJpaDialect" p:jpavendoradapter-ref="jpaAdapter" p:persistenceunitname="readWriteDynamicPU">
  <property name="jpaProperties">
    <props>
      <prop key="hibernate.use_sql_comments">true</prop>
      <prop key="hibernate.temp.use_jdbc_metadata_defaults">false</prop>
    </props>
  </property>
  <property name="packagesToScan">
    <list>
      <value>com.domain</value>
    </list>
  </property>
</bean>

<bean class="org.springframework.orm.jpa.JpaTransactionManager" id="transactionManager" p:datasource-ref="dataSource" p:entitymanagerfactory-ref="genericEntityManagerFactory" />
...
UserController.java - this class chooses the datasource dynamically based on the request and calls the service to complete the action
...
@Api(name = "User", description = "User API Service") 
@Controller public class UserController {
    @Autowired     private UserService userService;

    @ApiMethod(
            description = "Return the user object using the userId in the URL",
            produces = {MediaType.APPLICATION_JSON_VALUE},
            roles = {"oms-read", "oms-write"}
    )
    @RequestMapping(value = "/users/{userId}", method = RequestMethod.GET, produces = "application/json")
    @ResponseBody
    public UserModel getUser(@PathVariable @ApiParam(name = "userId", description = "User ID") Integer userId, @RequestHeader(value="X-Database", defaultValue= DatabaseEndpointUtils.ROOT_DB, required=false) String databaseEndpoint) throws ServiceException {
        MultiTenantContext.setCurrentTenant(databaseEndpoint);
        return userService.getUser(userId, true);
    }
}
...
MultiTenantContext.java - this class stores the chosen datasource endpoint, based on the request received by the controller
package com.domain.common;

import com.domain.util.DatabaseEndpointUtils;
import com.domain.supplychain.app.ws.exceptions.InvalidDatabaseEndpointException;
import com.domain.exceptions.ServiceException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MultiTenantContext {
    private static ThreadLocal<Object> currentTenant = new ThreadLocal<>();

    public static Logger logger = LoggerFactory.getLogger(MultiTenantContext.class.getName());
    public static void setCurrentTenant(Object tenant) throws ServiceException {
        logger.info("MultiTenantContext setCurrentTenant: [{}]", tenant);
        if(DatabaseEndpointUtils.isValidEndpoint(tenant.toString())) {
            currentTenant.set(tenant);
        } else {
            throw new InvalidDatabaseEndpointException("Invalid database endpoint");
        }
    }

    public static Object getCurrentTenant() {
        logger.info("MultiTenantContext getCurrentTenant: [{}]", currentTenant.get());
        return currentTenant.get();
    }

}
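
The DatabaseEndpointUtils helper referenced above is not shown in the original post; a minimal sketch of it, assuming the endpoint names simply mirror the keys configured in targetDataSources, could look like this:

package com.domain.util;

import java.util.Arrays;
import java.util.List;

public class DatabaseEndpointUtils {
    public static final String ROOT_DB = "rootdb";
    public static final String LIVE_DB = "livedb";

    private static final List<String> VALID_ENDPOINTS = Arrays.asList(ROOT_DB, LIVE_DB);

    // Returns true when the requested endpoint matches one of the configured datasource keys
    public static boolean isValidEndpoint(String endpoint) {
        return endpoint != null && VALID_ENDPOINTS.contains(endpoint);
    }
}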
MultitenantRoutingDataSource.java - here is the definition that determines which datasource is used to establish the connection. Specifically, it returns the datasource key that was set earlier based on the request parameters
package com.domain.route;

import com.domain.common.MultiTenantContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

public class MultitenantRoutingDataSource extends AbstractRoutingDataSource {

    private Logger logger = LoggerFactory.getLogger(MultitenantRoutingDataSource.class.getName());
    @Override
    protected Object determineCurrentLookupKey() {
        logger.info("MultitenantRoutingDataSource determineCurrentLookupKey: [{}]", MultiTenantContext.getCurrentTenant());
        return MultiTenantContext.getCurrentTenant();
    }

}
DefaultUserService.java - Fetch the user data from the dynamically chosen database.
@Service
public class DefaultUserService implements UserService {

    @Autowired
    private UserRepository userRepository;

    @Override
    @Transactional
    public UserModel getUser(Integer userId, boolean readOnly) throws ServiceException {
        User user = (userRepository.findOne(userId));
        if (user != null) {
            return new UserModel(user);
        }
        else {
            throw new ResourceNotFoundServiceException("UserId " + userId + " was not found");
        }
    }
}

Conclusion

The application establishes a database connection through any one of these methods, based on the requirement. Single or multiple static connections are commonly used in most applications. But when there is a requirement to choose the database dynamically when establishing a connection, the AbstractRoutingDataSource class in the Spring framework offers a clean way to implement that functionality, as explained above.

DNS and BIND Training with MyNIC

This is yet another story from yesteryear!

I had a chance to attend DNS/BIND training organized by Malaysia's domain registry (MyNIC). The training took place over two days, November 23 to 24, 2015, in Bangi, Selangor, Malaysia, and was packed with technical training on the Domain Name System (DNS) using the BIND software. Our trainer was Mr Amir Haris, who runs his own DNS specialist company named Localhost Sendirian Berhad (Sendirian Berhad is equivalent to "Private Limited").

Day One

On day one the trainer, Mr Amir Haris, taught us the theoretical details of DNS. For a start, Mr Amir explained DNS operation: the basics of what DNS is and the function of the root servers. Later he went further into the root servers' functions and locations, followed by an explanation of the query process.


Mr Amir also explained the differences in DNS implementations across operating system platforms. Since we were using BIND as the name server software for the training, we were also given the historical background of BIND.

The concept of master and slave DNS servers was also taught. The master notifies the secondary server of any change via a NOTIFY message, which serves as a method to inform the slave(s) that the zone file has changed. The NS records in the zone file are used to determine who the slaves are. The benefit of NOTIFY is that it cuts down the delay before changes propagate.
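
As a rough illustration of that master/slave relationship (these named.conf fragments are made up for this post; the zone name and addresses are placeholders), the configuration looks something like this:

// On the master (192.0.2.1):
zone "example.com" {
    type master;
    file "example.com.fwd";
    also-notify { 192.0.2.2; };
    allow-transfer { 192.0.2.2; };
};

// On the slave (192.0.2.2):
zone "example.com" {
    type slave;
    file "slaves/example.com.fwd";
    masters { 192.0.2.1; };
};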

Day Two

The second day was mostly about practical DNS implementation. Each of us was given access to a virtual machine in order to do our own BIND setup.

The contents of our lab training are as follows:

  • Installing BIND 9
  • Setting up RNDC
  • Setting up logging
  • Recursive and Cache DNS
  • Authoritative NS - Master and Slave
  • Delegation
  • IPv6
  • DNS Security Extensions (DNSSEC)
  • Stealth (split) DNS
  • Hardening DNS systems

Recursive and Cache Nameserver

A minimum of three zone files is needed:

  • localhost (forward zone)
  • localhost (reverse zone)
  • root

Forward zone file

;filename 127.0.0.fwd
$TTL 345600
@               IN      SOA     localhost. hostmaster.localhost. (
                                2015112401      ; Serial
                                3600            ; Refresh
                                900             ; Retry
                                3600000         ; Expire
                                3600            ; Min TTL
                                )
                IN      NS      localhost.
localhost.      IN      A 127.0.0.1

Reverse zone file

; filename 127.0.0.rev
$TTL 345600
@               IN      SOA     localhost. hostmaster.localhost.        (
                                2015112401      ; Serial
                                3600            ; Refresh
                                900             ; Retry
                                3600000         ; Expire
                                3600            ; Min TTL
                                )
                IN      NS      localhost.
1               IN      PTR     localhost.

We also had a chance to "get our hands dirty" setting up a domain name with DNSSEC keys.

At the end of the training we were given a brief explanation of DNS hardening. In short, the points are as follows (a small illustrative named.conf fragment appears after the list):

  • Isolate DNS from other services
  • Run named as non-root
  • Hide BIND version
  • Use logging
  • Control zone transfer and queries
  • Prevent the DNS server from responding to DNS traffic from certain networks
  • Patch BIND whenever a patch is available or when the current BIND version has vulnerabilities
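
For example, several of those points map directly onto named.conf options (the values here are only illustrative):

options {
    version "none";                  // hide the BIND version string
    allow-transfer { 192.0.2.2; };   // restrict zone transfers to known slaves
    allow-query { any; };            // or restrict to trusted networks
    recursion no;                    // an authoritative-only server should not recurse
};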

In conclusion, we were exposed to the introductory parts of DNS. Honestly, two days are not enough to cover everything in detail, but the training was well done and gave attendees a good start for further self-study and experimentation.

At End Point we are experienced in hosting our own and clients' DNS services using BIND (including in more exotic split-horizon setups) and nsd, and in using common SaaS DNS providers such as UltraDNS, SoftLayer, Amazon Web Services Route 53, Hurricane Electric, etc.

DNS has largely become an unseen commodity service to Internet users in general, but that makes it all the more important to have the skill to handle DNS changes safely, given the occasional need for unusual configurations and for coping with DDoS attacks such as the recent major attack on Dyn.