News

Welcome to End Point’s blog

Ongoing observations by End Point people

Using PostgreSQL cidr and inet types, operators, and functions

A common problem in database design is how to properly store IP addresses: They are essentially positive 32 or 128 bit integers, but they’re more commonly written as blocks of 8 or 16 bit integers written in decimal and separated by periods (in IPv4) or written in hexadecimal and separated by colons (in IPv6). That may lead many people to store IP addresses as strings, which comes with a host of downsides.

For example, it’s difficult to do subnet comparisons and validation requires complex logic or regular expressions, especially when considering the variety of ways IPv6 addresses can be stored with upper- or lower-case hexadecimal letters, :0000: or :0: for a group of zeros, or :: as several groups of zeros (but only once in an address).

Storing them as integers or a fixed number of bytes forces parsing and a uniform representation, but is otherwise no better.

To solve these problems in PostgreSQL, try using the inet and cidr types, which are designed for storing host and network addresses, respectively, in either IPv4 or IPv6 or both.

In many cases, these types could end up simply being used as glorified integers: they display nicely as IP addresses should be, and support addition, subtraction, and bitwise operations, and basic numeric comparisons:

phin=> select inet '192.168.0.1' + 256;
  ?column?
-------------
 192.168.1.1
(1 row)

phin=> select inet '::2' - 1;
 ?column?
----------
 ::1
(1 row)

phin=> select inet '192.168.0.1' - inet '192.168.0.0';
 ?column?
----------
        1
(1 row)

phin=> select inet 'Fff0:0:007a::' > inet '192.168.0.0';
 ?column?
----------
 t
(1 row)

(It’s worth noting that there is no addition operation for two inet or cidr values.)

But if you take a look at the fantastic official documentation, you’ll see that these types support also support helpful containment operators and utility functions which can be used when working with historical IP addresses of ecommerce customers stored in a database.

These include operators to check if one value contains or is contained by another (>> and <<), or the same with equality (>>= and <<=), or containment going either way (&&). Take this very real™ table of ecommerce orders, where order #3 from IP address 23.239.26.78 has committed or attempted some sort of ecommerce fraud:

phin=> select * from orders order by id;
 id |        ip        | fraud
----+------------------+-------
  1 | 23.239.26.161/24 | f
  2 | 23.239.232.78/24 | f
  3 | 23.239.26.78/24  | t
  4 | 23.239.23.78/24  | f
  5 | 23.239.26.78/24  | f
  6 | 23.239.232.78/24 | f
(6 rows)

By using the network(inet) function, we can identify other orders from the same network:

phin=> select * from orders where network(ip) = (
    select network(ip) from orders where id=3
);
 id |        ip        | fraud
----+------------------+-------
  1 | 23.239.26.161/24 | f
  5 | 23.239.26.78/24  | f
  3 | 23.239.26.78/24  | t
(3 rows)

Or, if we’ve identified 23.239.20.0/20 as a problematic network, we can use the <<= "is contained by or equals" operator:

phin=> select * from orders where ip <<= inet '23.239.20.0/20'
 id |        ip        | fraud
----+------------------+-------
  1 | 23.239.26.161/24 | f
  4 | 23.239.23.78/24  | f
  5 | 23.239.26.78/24  | f
  3 | 23.239.26.78/24  | t
(4 rows)

This is a bit of a juvenile example, but given a larger and more real database, these tools can be used for a variety of security and analytics purposes. Identifying common networks for fraud was already given as an example. They can also be used to find common network origins of attacks; tie historic data to known problematic networks, addresses, and proxies; help correlate spam submissions on forums, blogs, or in email; and create and analyze complex network or security auditing records.

At any rate, they’re a useful tool and should be taken advantage of whenever the opportunity presents itself.

Working on production systems


(Not as easy as it looks)
Photo by Flickr user 'edenpictures'

As consultants, the engineers at End Point are often called upon to do work on production systems - in other words, one or more servers that are vital to our client's business. These range from doing years of planning to perform a major upgrade for a long-standing client, down to jumping into a brand-new client for an emergency fix. Either way, the work can be challenging, rewarding, and a little bit nerve-wracking.

Regardless of how you end up there, following some basic good practices may reduce the chance of problems coming up, keep you calm during the chaos, and make the process easier for both you and the client.

Preparation

Unless this is a true emergency, doing major work on a production system (e.g. upgrading a database server) should involve a good amount or preparation. By the time the big night arrives (yes, it is always late at night!) you should have gone through this list:

  • Do lots and lots of testing. Use systems as close as possible to production, and run through the process until it becomes second nature.
  • Know the impact on the business, and the downtime window anticipated by the rest of the company.
  • Have a coworker familiar with everything on standby, ready to call in if needed.
  • Document the process. For a large project, a spreadsheet or similar document can be quite helpful.
    • Who are the people involved, how to contact them, and what general role do they play?
    • Which task is being done at what time, who is the primary and backup for it, and what other tasks does it block?
    • How is success for each stage measured?
    • What is the rollback plan for each step? When is the point of no return reached?
  • Setup a shared meeting space (IRC, Skype, Slack, Zulip, HipChat, etc.)
  • Confirm connections (e.g. VPN up and running? Passwords not set to expire soon? Can everyone get to Slack? SSH working well?)

Screen Up

The night usually begins with connecting to a client server via SSH. The first order of business should be to invoke 'screen' or 'tmux', which are persistent session managers (aka terminal multiplexers). These keep your connections; if your network drops, so you can easily pick up where you left off. They also allow other people to view and/or join in on what you are doing. Finally, they enable you to easily view and control multiple windows. Some screen-specific tips:

  • Name your screen something obvious to the task such as "bucardo-production-rollout". Always give it a name to prevent people from joining it by accident. I often use just my email, i.e. screen -S greg@endpoint.com or tmux new -s greg_endpoint_com.
  • Keep organized. Try to keep each window to one general task, and give each one a descriptive name:
    • screen: Ctrl-a A     tmux: Ctrl-b ,
    • I find that uppercase names stand out nicely from your terminal traffic.
    • Splitting windows by task also helps scrollback searching.
  • Boost your scrollback buffer so you can see what happened a while ago. The default value is usually much too low.
    • Inside /etc/screenrc or ~/.screenrc: defscrollback 50000
    • Inside /etc/tmux.conf or ~/.tmux.conf: set-option -g history-limit 50000
  • Develop a good configuration file. Nothing too fancy is needed, but being able to see all the window names at once on the bottom of the screen makes things much easier.
  • Consider logging all your screen output.

Discovery and Setup

If you don't already know, spend a little bit of time getting to know the server. A deep analysis is not needed, but you should have a rough idea how powerful it is (CPU, memory), how big it is (disk space), what OS it is (distro/version), and what else is running on it. Although one should be able to easily work on any unmodified *nix server, I almost always make a few changes:

  • Install minimal support software. For me, that usually means apt-get install git-core emacs-nox screen mlocate.
  • Put in some basic aliases to help out. Most important being rm='rm -i'.
  • Switch to a lower-priv account and do the work there when possible. Use sudo instead of staying as root.

For Postgres, you also want to get some quick database information as well. There are many things you could learn, but at a bare minimum check out the version, and per-user settings, the databases and their sizes, and what all the non-default configuration settings are:

select version();
\drds
\l+
select name, setting, source from pg_settings where source <> 'default' order by 1;

Version Control (git up)

One of the earliest steps is to ensure all of your work is done in a named subdirectory in a non-privileged account. Everything in this directory should be put into version control. This not only timestamps file changes for you, but allows quick recovery from accidentally removing important files. All your small scripts, your configuration files (except .pgpass), your SQL files - put them all in version control. And by version control, I of course mean git, which has won the version control wars (a happy example of the most popular tool also being the best one). Every time you make a change, git commit it.

psql

As a Postgres expert, 99% of my work is done through psql, the canonical command-line client for Postgres. I am often connecting to various database servers in quick succession. Although psql is an amazing tool, there are important considerations to keep in mind.

The .psql_history file, along with readline support, is a wonderful thing. However, it is also a great footgun, owing to the ease of using the "up arrow" and "Ctrl-r" rerun SQL statements. This can lead to running a command on server B that was previously run on server A (and which should never, ever be run on server B!). Here are some ways around this issue:

One could simply remove the use of readline when on a production database. In this way, you will be forced to type everything out. Although this has the advantage of not accidentally toggling back to an older command, the loss of history is very annoying - and it greatly increases the chance of typos, as each command needs to be typed anew.

Another good practice is to empty out the psql_history file if you know it has some potentially damaging commands in it (e.g. you just did a lot of deletions on the development server, and are now headed to production.). Do not simply erase it, however, as the .psql_history provides a good audit trail of exactly what commands you ran. Save the file, then empty it out:

$ alias clean_psql_history='cat ~/.psql_history >> ~/.psql_history.log; truncate -s0 ~/.psql_history'

Although the .psql_history file can be set per-database, I find it too confusing and annoying in practice to use. I like being able to run the same command on different databases via the arrow keys and Ctrl-r.

It is important to exit your psql sessions as soon as possible - never leave it hanging around while you go off and do something else. There are multiple reasons for this:

  • Exiting psql forces a write of the .psql_history file. Better to have a clean output rather than allowing all the sessions to interleave with each other. Also, a killed psql session will not dump its history!
  • Exiting frees up a database connection, and prevents you from unintentionally leaving something idle in transaction.
  • Coming back to an old psql session risks a loss of mental context - what was I doing here again? Why did I leave this around?
  • Less chance of a accidental paste into a window with a psql prompt (/voice_of_experience).

Another helpful thing for psql is a good custom prompt. There should always be some way of telling which database - and server - you are using just by looking at the psql prompt. Database names are often the same (especially with primaries and their replicas), so you need to add a little bit more. Here's a decent recipe, but you can consult the documentation to design your own.

$ echo "\set PROMPT1 '%/@%m%R%#%x '" >> ~/.psqlrc
$ psql service=PROD
product_elements@topeka=> 

Connection Service File

Using a connection service file to access Postgres can help keep things sane and organized. Connection files allow you to associate a simple name with multiple connection parameters, allowing abstraction of those details in a manner much safer and cleaner than using shell aliases. Here are some suggestions on using connection files:

  • If possible, use the local user's file (~/.pg_service.conf), but the global file is fine too. Whichever you choose, do not put the password inside them - use the .pgpass file for that.
  • Use short but descriptive service names. Keep them very distinct from each other, to reduce the chance of typing in the wrong name.
  • Use uppercase names for important connections (e.g. PROD for the connection to an important production database). This provides another little reminder to your brain that this is not a normal psql connection. (Remember, PROD = Please Respect Our Data)
  • Resist the temptation to alias them further. Force yourself to type everything out each time, e.g. "psql service=PROD" each time. This keeps your head focused on where you are going.

Dangerous SQL

Some actions are so dangerous, it is a good idea to remove any chance of direct invocation on the wrong server. The best example of this is the SQL 'truncate' command. If I find myself working on multiple servers in which I need to truncate a table, I do NOT attempt to invoke the truncate command directly. Despite all precautions, there are many ways to accidentally run the same truncate command on the wrong serve, whether via a .psql_history lookup, or simply an errant cut-n-paste error. One solution is to put the truncate into a text file, and then invoke that text file, but that simply adds the chance that this file may be invoked on the wrong database. Another solution is to use a text file, but change it when done (e.g. search and replace "truncate" to "notruncate"). This is slightly better, but still error prone as it relies on someone remembering to change the file after each use, and causes the file to no longer represent what was actually run.

For a better solution, create a function that only exists on one of the databases. For example, if you have a test database and a production database, you can run truncate via a function that only exists on the test database. Thus, you may safely run your function calls on the test server, and not worry if the functions accidentally get run on the production server. If they do, you end up with a "function not found error" rather than realizing you just truncated a production table. Of course, you should add some safeguards so that the function itself is never created on the production server. Here is one sample recipe:

\set ON_ERROR_STOP on

DROP FUNCTION IF EXISTS safetruncate(text);

CREATE FUNCTION safetruncate(tname text)
RETURNS TEXT
LANGUAGE plpgsql
AS $ic$
  BEGIN
    -- The cluster_name setting can be quite handy!
    PERFORM 1 FROM pg_settings WHERE name = 'cluster_name' AND setting = 'testserver';
    IF NOT FOUND THEN
      RAISE EXCEPTION 'Cannot create this function on this server!';
    END IF;

    EXECUTE FORMAT('truncate table %s', tname);

    RETURN 'Truncated table ' || tname;
  END;
$ic$;

Now you can create text files full of indirect truncate calls - filled with SELECT safetruncate('exultations'); instead of literal truncate calls. This file may be safely checked into git, and copy and pasted without worry.

Record keeping

The value of keeping good records of what you are doing on production systems cannot be overstated. While you should utimately use whatever system is best for you, I like to keep a local text file that spells out exactly what I did, when I did it, and what happened. These should be detailed enough that you can return to them a year later and explain to the client exactly what happened.

The primary way to document things as you go along is with good old fashioned cut and paste. Both the .bash_history and .psql_history files provide automatic tracking of entered commands, as a nice backup to your notes. Make liberal use of the tee(1) command to store command output into discrete files (which reduces the need to rely on scrollback).

Rather than entering commands directly into a psql prompt, consider putting them into a text file and then feeding that file to psql. It's a little extra work, but the file can be checked into git, giving an excellent audit trail. Plus, you may have to run those commands again someday.

If you find yourself doing the same thing over and over, write a shell script. This even applies to psql commands. For example, I recently found myself having to examine tables on two servers, and needed to quickly examine a table's size, indexes, and if anyone was currently modifying it. Thus a shall script:

psql service=PROD -c "\\d $1"
psql service=PROD -Atc "select pg_relation_size('$1')"
psql service=PROD -Atc "select query from pg_stat_activity where query ~ '$1'"

Wrap Up

When you are done with all your production tasks, write up some final thoughts on how it went, and what the next steps are. Read through all your notes while it is all fresh in your mind and clean them up as needed. Revel in your success, and get some sleep!

Liquid Galaxy Goes to Chile for IMPAC4



Marine Protected Areas: Bringing the people and ocean together


Earlier this month, The Marine Conservation Institute and the Waitt Foundation brought a Liquid Galaxy to Chile to showcase interactive and panoramic ocean content at the Fourth International Marine Protected Areas Congress in La Serena.  This conference was a convergence of marine biologists, ocean agencies, and environmentalists from around the globe to discuss the state of and future development initiatives of Marine Protected Areas worldwide.

 The Marine Conservation Institute is working on a mapping project called MPAtlas that visually catalogs the development of Marine Protected Areas across the globe as well as the Global Ocean Refuge System which pushes for elevated standards for Marine Protected Areas and advocates for a 30% protected marine ecosystem by 2030.

mpatlas.org

We built new content to showcase the GLORES areas in Google Earth as well as data visualizations of the global system of Marine Protected Areas. In addition, we collected any past oceanographic related content we’ve developed for the Liquid Galaxy platform. This included underwater panoramic media from the Catlin Seaview Survey, Mission Blue Hope Spots, Google Street View collections of underwater photos such as this project which catalogs the Great Barrier Reef.



Minister of the Environment Marcelo Mena Carrasco gives the
Liquid Galaxy a go

Aside from showcasing this curated content, the Liquid Galaxy served as a comprehensive interactive tool for scientists and environmentalists to showcase their research areas during the break periods between lectures and events occurring at the Congress. Since the Liquid Galaxy utilizes the entire globe, conference attendees were able to free fly to their respective research and/or proposed protected areas as an impromptu presentation aid and further explore underwater terrain and panoramic media in their location.

Liquid Galaxy at IMPAC4 - La Serena, Chile 2017

The Liquid Galaxy platform is featured in museums and aquariums around the world, and we are thrilled that it is being used as a tool to study and conserve oceans and nature. We recently had the opportunity to participate in The Ocean Conference at the United Nations, and are excited the system continues to be utilized to study and prevent future detrimental changes to our planet. We hope to have more opportunities to create content geared toward nature conservation, as well as opportunities to share the Liquid Galaxy with environmentalists so that the system will continue be used as a tool for visualizing research data in new and interesting ways.

Web Security Services Roundup

Security is often a very difficult thing to get right, especially when it’s not easy to find reliable or up-to-date information or the process of testing can be confusing and complicated. We have a lot of history and experience working on the security of websites and servers, and we’ve found many tools and websites to be very helpful. Here is a collection of those.

Server-side security

There are a number of tools available that can scan your website to check for common vulnerabilities and the quality of SSL/TLS configuration, as well as give great tips on how to improve security for your website.

  • Qualys SSL Labs Server Test takes a simple domain name, performs a series of tests from a variety of clients, and returns a simple letter grade (from A+ down to F) indicating the quality of your SSL/TLS configuration, as well as a detailed summary for a host of configuration options. It covers certificates key and algorithms; TLS and SSL configurations; cipher suites; handshakes on a wide variety of platforms including Android, iOS, Chrome, Firefox, Internet Explorer and Edge, Safari, and others; common protocols and vulnerabilities; and other details.
  • HTTP Security Report does a similar scan, but provides a much more simplified summary of a website, with a numeric score from 0 to 100. It gives a simple, easy to understand list of results, with a green check mark or a red X to indicate whether something is configured for security or not. It also provides short paragraphs explaining settings and recommended configurations.
  • HT-Bridge SSL/TLS Server Test is very similar to Qualys SSL Labs Server Test, but provides some valuable extra information, such as PCI-DSS, HIPAA, and NIST guidelines compliance, as well as industry best practices and basic analysis of third-party content.
  • securityheaders.io is another letter-grade scan, but focuses on server headers only. It provides simple explanations for each recommended server header and links to guides on how to configure them correctly.
  • Observatory by Mozilla scans and gives information on HTTP, TLS, and SSH configuration, as well as simple summaries from other websites, including Qualys, HT-Bridge, and securityheaders.io as covered above.
  • SSL-Tools is focused on SSL and TLS configuration and certificates, with tools to scan websites and mail servers, check for common vulnerabilities, and decode certificates.
  • Microsoft Site Scan performs a series of simple tests, focused more on general website guidelines and best practices, including tests for outdated libraries and plugins which can be a security issue.
  • testssl.sh, the final website scanning tool I’ll cover, is a more advanced bash script that covers many of the same things these other websites do, but provides lots of options for fine-tuning test methods, returned information, and testing abnormal configurations. It’s also open source and doesn’t rely on any third parties.

These websites provide valuable information on SSL/TLS which can be used to create a secure, fast, and functional server configuration:

  • Security/Server Side TLS on the Mozilla wiki is a fantastic page which provides great summaries, recommendations, and reference information on many TLS topics, including handshakes, OCSP Stapling, HSTS, HPKP, certificate ciphers, and common attacks.
  • Mozilla SSL Configuration Generator is a simple tool that generates boilerplate server configuration files for common servers, including Apache and Nginx, and specific server and OpenSSL versions. It also allows you to target “modern”, “intermediate”, or “old” clients and servers, which will give the best configuration possible for each level.
  • Is TLS Fast Yet? is a great, simple, and to-the-point informational website which explains why TLS is so important and how to improve its performance so it has the smallest impact possible on your website’s speed.

Client-side security

These websites provide information and diagnostic tools to ensure that you are using a secure browser.

  • badssl.com gives a list of links to subdomains with various SSL configurations, including badly configured SSL, so you can have a good idea of what a well-configured website looks like versus one with errors in configuration, weak ciphers or key exchange protocols, or insecure HTTP forms.
  • IPv6 Test checks your network and browser for IPv6 support, showing you your ISP, reverse DNS pointers, both your IPv4 and IPv6 addresses, and giving an idea of when your computer or network may have problems with dual-stack IPv4 + IPv6 remote hosts or DNS.
  • How’s My SSL? and Qualys Labs SSL Client Test both check your browser for support of SSL/TLS versions, protocols, ciphers, and features, as well as susceptibility to common vulnerabilities.

General Tools

  • NeverSSL is a simple website that promises to never use SSL. Many public wifi networks require you to go through a payment or login page, which can be blocked when trying to access a well-secured website such as Google, Facebook, Twitter, or Amazon, which can cause trouble connecting to that website. NeverSSL provides an easy and simple way to access that login website.
  • crt.sh is a search engine for public TLS certificate information. It provides a history of certificates for a given domain name, with information including issuer and issue date, as well as an advanced search.
  • Digital Attack Map is an interactive map showing DDoS attacks across the world.
  • The Internet-Wide Scan Data Repository is a public archive of scans across the internet, intended for research and provided by the University of Michigan Censys Team.
  • take-a-screenshot.org is a simple website that shows how to take a screenshot on a variety of operating systems and desktop environments. It’s a fantastic tool to help less technically-minded people share their screens or issues they’re having.