
Liquid Galaxy at ASTC 2017



End Point is pleased to be participating in ASTC 2017, alongside our partner BWC Visual Technology. ASTC, the Association of Science-Technology Centers, is holding its annual conference at The Tech Museum of Innovation in San Jose, CA. We were excited to hear that the conference takes place at The Tech Museum, as we have a Liquid Galaxy set up in the museum!

A 3-screen desktop Liquid Galaxy display will be set up by Liquid Galaxy Engineer Josh Ausborne at Booth 1103. This display will be showcasing content that includes Sketchfab and Unity 3D models, Cesium content with interactive weather data, 360° panoramic video, Google Earth/Google Street View content, and engaging presentations about National Parks and National Marine Sanctuaries.

We are very excited to be showcasing our technology alongside BWC Visual Technology, a distributor and licensed reseller of state-of-the-art interactive exhibit technology for museums and science centers. We have great respect for their team and their technologies.

Liquid Galaxy is currently featured in many science and technology centers around the world. To learn more, pick up our handout, which will be available to all attendees at ASTC at Booth 1103.

PKIX path validation failed - Debugging

I recently ran into a PKIX path validation error while working on an application that talks to a site with a valid certificate. I was able to track down and solve the issue using OpenSSL.

Typically, a PKIX path validation error arises because an SSL certificate has expired, but I ran into the same error even though the system was configured with a valid certificate. There are two web applications in our scenario, AppX and AppY. AppX uses AppY's authentication mechanism so that users can log in with the same user account. AppX sends a POST request with the necessary arguments, using HttpClient, to the SSL-enabled AppY and logs the user in based on the response.
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;

// Apache HttpClient 4.x; DefaultHttpClient is deprecated in newer releases
HttpClient httpclient = new DefaultHttpClient();
// ... (POST parameters, headers, etc. set up here)
HttpPost httppost = new HttpPost("https://app2domain.com/sessions");

HttpResponse resp;
try {
    resp = httpclient.execute(httppost);
}
catch (Exception e) {
    throw new Exception("Exception: ", e);
}

Error

AppX was moved to a new server, and it started throwing a "PKIX path validation failed" error when sending requests to AppY:
Exception: javax.net.ssl.SSLHandshakeException: 
sun.security.validator.ValidatorException: PKIX path validation failed: 
java.security.cert.CertPathValidatorException: timestamp check failed
PKIX (Public-Key Infrastructure X.509) is the standard for public-key certificate-based security. PKIX path errors come up when a connection to an SSL-enabled application fails because the server's certificate chain cannot be validated.

Debug

It is good to identify the root cause of the problem, since there are a few possible causes for the same error. Let's start debugging...

  • Check the certificate status and expiration date in your browser
    • The browser reports that the certificate for AppY's domain name is valid and expires at a future date. So now on to more detailed debugging with OpenSSL.
  • OpenSSL validation
    • The openssl tool is a handy utility for validating the SSL certificate of any domain. Here it reports an error with return code 20 (unable to get local issuer certificate) when checking the status of the certificate - in contrast with the browser's report on the same certificate.
$ echo -n | openssl s_client -CApath /etc/ssl/certs/ -connect app2domain.com:443 
...
Start Time: 1482921042
Timeout   : 300 (sec)
Verify return code: 20 (unable to get local issuer certificate)


$ echo -n | openssl s_client -CApath /etc/ssl/certs/ -connect app2domain.com:443 </dev/null | openssl x509 -noout -dates

verify error:num=20:unable to get local issuer certificate
DONE
notBefore=May  4 00:00:00 2013 GMT
notAfter=May 14 23:59:59 2015 GMT
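
One more openssl option that can help here is -showcerts, which dumps every certificate the server actually sends, so that each one's subject, issuer, and dates can be inspected (a sketch, using the same placeholder domain as above):

$ echo -n | openssl s_client -CApath /etc/ssl/certs/ -connect app2domain.com:443 -showcerts

$ echo -n | openssl s_client -connect app2domain.com:443 2>/dev/null | openssl x509 -noout -subject -issuer -dates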

Root Cause

The openssl tool reported the certificate details of another domain, one that is unused and has an expired certificate. Since that certificate is configured on the same server, it caused the error in our case: when AppX sent requests to AppY, the connection may have been established against the expired certificate. So the lesson here is that it is necessary to clean up expired certificates on the server when connections are established through HttpClient utilities. Also, a specific domain name can be validated by passing the -servername option (for SNI) to openssl, which in this case reports that AppY's domain has a valid certificate.
$ echo -n | openssl s_client -CApath /etc/ssl/certs/ -connect app2domain.com:443 -servername app2domain.com
...
Start Time: 1482920942
Timeout   : 300 (sec)
Verify return code: 0 (ok)

$ echo -n | openssl s_client -CApath /etc/ssl/certs/ -connect app2domain.com:443 -servername app2domain.com </dev/null | openssl x509 -noout -dates
...
verify return:0
DONE
notBefore=Sep 26 11:52:51 2015 GMT
notAfter=Apr  1 12:35:52 2018 GMT

Conclusion

In most cases, the PKIX path validation error comes up because the SSL certificate for the domain has expired, but there can be other reasons, such as the server presenting the wrong certificate. It is always helpful to debug with the openssl tool to identify the root cause. This specific issue was fixed by removing the unused, expired certificate.

Disaster Recovery - Miami to Dallas in one hit

Hurricane Irma came a knocking but didn’t blow Miami away.

Hurricanes are never fun, and neither is knowing that your business servers may be in the direct path of a major outage due to a natural disaster.

Recently we helped a client prepare a disaster recovery environment four days before Hurricane Irma was expected to reach Miami, where, as it happens, their entire server ecosystem sat. In recent months we had been discussing a long-term disaster recovery plan for this client's infrastructure, but the final details hadn't been worked out yet, and no work had begun on it, when news of the impending storm started to arrive.

Although the Miami datacenter they are using is highly regarded and well-rated, the client had the foresight to think about an emergency project to replicate their entire server ecosystem out of Miami to somewhere a little safer.

Fire suits on, battle helmets ready, GO! Two team members jumped in and did an initial review of the ecosystem: six Linux servers, one Microsoft SQL Server database with about 50 GB of data, and several little minions, all hosted on a KVM virtualization platform. OK, easy, right? Export the VM disks and stand them up in a new datacenter on a similar KVM-based host.

But no downtime could be scheduled on such short notice, and cloning the database would take too long. If we couldn't clone the VM disks without downtime, we needed another plan. Enter our save-the-day idea: use the database clone from the development machine.

So, first things first: build the servers in the new environment using the cloned copy and rsync the entire disk across, hoping that a restart would return each machine to a working state. Then take the development database, use the Microsoft tools at our disposal to replicate one current snapshot, and after that replicate row-level changes, to reduce the amount of data lost should the Miami datacenter drop its connection.
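
For the disk copies, the general shape of that rsync step looks something like the sketch below. This is not our exact command: the destination host is a placeholder, and the excludes simply keep pseudo-filesystems and other non-copyable paths out of the transfer.

$ rsync -aHAXx --numeric-ids \
      --exclude={"/dev/*","/proc/*","/sys/*","/run/*","/tmp/*","/lost+found"} \
      / root@new-dc-host:/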

Over the course of the next 16 hours, with 2 DevOps engineers, we replicated 6 servers, migrated 100+ GB of data twice and had a complete environment ready for a manual cutover.

Luckily the hurricane didn’t cause any damage to the datacenter, which never lost power or connectivity, so the sites never went down.

Even though it wasn’t used in this case, this emergency work did have a silver lining: Until this client’s longer-term disaster recovery environment is built, these system replicas can serve as cold standby, which is much better than having to rebuild from backups if their primary infrastructure goes out.

That basic phase is out of the way, so the client can breathe easier. Along the way, we produced some updated documentation, and gained deeper knowledge of the client’s software ecosystem. That’s a win!

The key takeaways here: Don't panic, and do a little planning beforehand. Leaving things to the last minute rarely works well.
And preferable to all of the above: Plan and build your long-term disaster recovery environment now, well before any possible disaster is on the horizon!

To all the people affected by Hurricane Irma we wish a speedy rebuild.

AngularJS is not Angular

Why am I doing this?

I often see people use the names AngularJS and Angular interchangeably. It's an obvious mistake to anyone familiar with front-end programming.

If you don't believe me (why would you?) just go to the Angular team repository or Wikipedia (Angular and AngularJS).

AngularJS (1.X)

AngularJS is a popular open-source JavaScript framework created by Miško Hevery and released in 2010, which was subsequently taken over by Google. It's not the same thing as Angular. The latest stable version of the framework is 1.6.6.

Angular (2+)

Angular is an open-source TypeScript-based framework created by Google from scratch. It has been totally rewritten and is not compatible with AngularJS. The latest stable version of the framework is 4.4.4. The team started from version 2 to avoid even more confusion.

Summary

I feel I have made the world better. Go now and correct everyone (it's such fun)!

Using PostgreSQL cidr and inet types, operators, and functions

A common problem in database design is how to properly store IP addresses: They are essentially positive 32 or 128 bit integers, but they’re more commonly written as blocks of 8 or 16 bit integers written in decimal and separated by periods (in IPv4) or written in hexadecimal and separated by colons (in IPv6). That may lead many people to store IP addresses as strings, which comes with a host of downsides.

For example, it’s difficult to do subnet comparisons and validation requires complex logic or regular expressions, especially when considering the variety of ways IPv6 addresses can be stored with upper- or lower-case hexadecimal letters, :0000: or :0: for a group of zeros, or :: as several groups of zeros (but only once in an address).

Storing them as integers or a fixed number of bytes forces parsing and a uniform representation, but is otherwise no better.

To solve these problems in PostgreSQL, try using the inet and cidr types, which are designed for storing host and network addresses, respectively, in either IPv4 or IPv6 or both.
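
As a quick illustration of that built-in validation and normalization, here is a shell-level sanity check (the psql -A and -t flags just trim the output formatting, and the error output is abbreviated):

$ psql -Atc "select inet '192.168.0.256'"
ERROR:  invalid input syntax for type inet: "192.168.0.256"

$ psql -Atc "select inet '0:0:0:0:0:0:0:1'"
::1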

In many cases, these types could end up simply being used as glorified integers: they display nicely as IP addresses should, and they support addition, subtraction, bitwise operations, and basic numeric comparisons:

phin=> select inet '192.168.0.1' + 256;
  ?column?
-------------
 192.168.1.1
(1 row)

phin=> select inet '::2' - 1;
 ?column?
----------
 ::1
(1 row)

phin=> select inet '192.168.0.1' - inet '192.168.0.0';
 ?column?
----------
        1
(1 row)

phin=> select inet 'Fff0:0:007a::' > inet '192.168.0.0';
 ?column?
----------
 t
(1 row)

(It’s worth noting that there is no addition operation for two inet or cidr values.)

But if you take a look at the fantastic official documentation, you'll see that these types also support helpful containment operators and utility functions, which come in handy when working with, say, the historical IP addresses of ecommerce customers stored in a database.

These include operators to check if one value contains or is contained by another (>> and <<), or the same with equality (>>= and <<=), or containment going either way (&&). Take this very real™ table of ecommerce orders, where order #3 from IP address 23.239.26.78 has committed or attempted some sort of ecommerce fraud:

phin=> select * from orders order by id;
 id |        ip        | fraud
----+------------------+-------
  1 | 23.239.26.161/24 | f
  2 | 23.239.232.78/24 | f
  3 | 23.239.26.78/24  | t
  4 | 23.239.23.78/24  | f
  5 | 23.239.26.78/24  | f
  6 | 23.239.232.78/24 | f
(6 rows)

By using the network(inet) function, we can identify other orders from the same network:

phin=> select * from orders where network(ip) = (
    select network(ip) from orders where id=3
);
 id |        ip        | fraud
----+------------------+-------
  1 | 23.239.26.161/24 | f
  5 | 23.239.26.78/24  | f
  3 | 23.239.26.78/24  | t
(3 rows)

Or, if we’ve identified 23.239.20.0/20 as a problematic network, we can use the <<= "is contained by or equals" operator:

phin=> select * from orders where ip <<= inet '23.239.20.0/20';
 id |        ip        | fraud
----+------------------+-------
  1 | 23.239.26.161/24 | f
  4 | 23.239.23.78/24  | f
  5 | 23.239.26.78/24  | f
  3 | 23.239.26.78/24  | t
(4 rows)

This is a bit of a juvenile example, but given a larger and more real database, these tools can be used for a variety of security and analytics purposes. Identifying common networks for fraud was already given as an example. They can also be used to find common network origins of attacks; tie historic data to known problematic networks, addresses, and proxies; help correlate spam submissions on forums, blogs, or in email; and create and analyze complex network or security auditing records.
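
On tables of any real size, these containment operators can also be backed by an index: PostgreSQL 9.4 and later ship a GiST operator class, inet_ops, that supports them. A minimal sketch against the same hypothetical orders table:

$ psql -c "CREATE INDEX orders_ip_gist ON orders USING gist (ip inet_ops)"
$ psql -c "EXPLAIN SELECT * FROM orders WHERE ip <<= inet '23.239.20.0/20'"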

At any rate, they’re a useful tool and should be taken advantage of whenever the opportunity presents itself.

Working on production systems


(Not as easy as it looks)
Photo by Flickr user 'edenpictures'

As consultants, the engineers at End Point are often called upon to do work on production systems - in other words, one or more servers that are vital to a client's business. These jobs range from years of planning for a major upgrade for a long-standing client, down to jumping in for an emergency fix on a brand-new client. Either way, the work can be challenging, rewarding, and a little bit nerve-wracking.

Regardless of how you end up there, following some basic good practices may reduce the chance of problems coming up, keep you calm during the chaos, and make the process easier for both you and the client.

Preparation

Unless this is a true emergency, doing major work on a production system (e.g. upgrading a database server) should involve a good amount of preparation. By the time the big night arrives (yes, it is always late at night!), you should have gone through this list:

  • Do lots and lots of testing. Use systems as close as possible to production, and run through the process until it becomes second nature.
  • Know the impact on the business, and the downtime window anticipated by the rest of the company.
  • Have a coworker familiar with everything on standby, ready to call in if needed.
  • Document the process. For a large project, a spreadsheet or similar document can be quite helpful.
    • Who are the people involved, how to contact them, and what general role do they play?
    • Which task is being done at what time, who is the primary and backup for it, and what other tasks does it block?
    • How is success for each stage measured?
    • What is the rollback plan for each step? When is the point of no return reached?
  • Set up a shared meeting space (IRC, Skype, Slack, Zulip, HipChat, etc.)
  • Confirm connections (e.g. VPN up and running? Passwords not set to expire soon? Can everyone get to Slack? SSH working well?)

Screen Up

The night usually begins with connecting to a client server via SSH. The first order of business should be to invoke 'screen' or 'tmux', which are persistent session managers (aka terminal multiplexers). These keep your connections alive if your network drops, so you can easily pick up where you left off. They also allow other people to view and/or join in on what you are doing. Finally, they let you easily view and control multiple windows. Some screen-specific tips (a short example of starting such a session follows this list):

  • Name your screen session something relevant to the task, such as "bucardo-production-rollout". Always give it a name to prevent people from joining it by accident. I often just use my email, e.g. screen -S greg@endpoint.com or tmux new -s greg_endpoint_com.
  • Keep organized. Try to keep each window to one general task, and give each one a descriptive name:
    • screen: Ctrl-a A     tmux: Ctrl-b ,
    • I find that uppercase names stand out nicely from your terminal traffic.
    • Splitting windows by task also helps scrollback searching.
  • Boost your scrollback buffer so you can see what happened a while ago. The default value is usually much too low.
    • Inside /etc/screenrc or ~/.screenrc: defscrollback 50000
    • Inside /etc/tmux.conf or ~/.tmux.conf: set-option -g history-limit 50000
  • Develop a good configuration file. Nothing too fancy is needed, but being able to see all the window names at once on the bottom of the screen makes things much easier.
  • Consider logging all your screen output.
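
For example, kicking off the night's session might look like this (a sketch only; the session and log file names are arbitrary, and screen's logging options vary a bit between versions):

# GNU Screen: named session with output logging (-L writes to screenlog.0)
$ screen -S bucardo-production-rollout -L

# tmux: named session, then log the current pane to a file
$ tmux new -s bucardo_production_rollout
$ tmux pipe-pane -o 'cat >> ~/rollout.log'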

Discovery and Setup

If you don't already know, spend a little bit of time getting to know the server. A deep analysis is not needed, but you should have a rough idea how powerful it is (CPU, memory), how big it is (disk space), what OS it is (distro/version), and what else is running on it. Although one should be able to easily work on any unmodified *nix server, I almost always make a few changes:

  • Install minimal support software. For me, that usually means apt-get install git-core emacs-nox screen mlocate.
  • Put in some basic aliases to help out. Most important being rm='rm -i'.
  • Switch to a lower-priv account and do the work there when possible. Use sudo instead of staying as root.
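
For that quick look at an unfamiliar server, a handful of standard commands covers the basics (adjust for the distro at hand):

$ nproc                  # CPU count
$ free -h                # memory
$ df -hT                 # disk space and filesystem types
$ cat /etc/os-release    # distro and version
$ uptime                 # load, and how long it has been up
$ ps auxf | less         # what else is running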

For Postgres, you will also want to gather some quick database information. There are many things you could learn, but at a bare minimum check the version, the per-user settings, the databases and their sizes, and all the non-default configuration settings:

select version();
\drds
\l+
select name, setting, source from pg_settings where source <> 'default' order by 1;

Version Control (git up)

One of the earliest steps is to ensure all of your work is done in a named subdirectory in a non-privileged account. Everything in this directory should be put into version control. This not only timestamps file changes for you, but allows quick recovery from accidentally removing important files. All your small scripts, your configuration files (except .pgpass), your SQL files - put them all in version control. And by version control, I of course mean git, which has won the version control wars (a happy example of the most popular tool also being the best one). Every time you make a change, git commit it.
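
In practice, getting started can be as simple as the following (the directory and file names are only placeholders):

$ mkdir -p ~/acme_pg_upgrade && cd ~/acme_pg_upgrade
$ git init
$ git add notes.txt *.sql *.sh
$ git commit -m "Snapshot before starting the production work"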

psql

As a Postgres expert, 99% of my work is done through psql, the canonical command-line client for Postgres. I am often connecting to various database servers in quick succession. Although psql is an amazing tool, there are important considerations to keep in mind.

The .psql_history file, along with readline support, is a wonderful thing. However, it is also a great footgun, owing to the ease of using the up arrow and Ctrl-r to rerun SQL statements. This can lead to running a command on server B that was previously run on server A (and which should never, ever be run on server B!). Here are some ways around this issue:

One could simply remove the use of readline when on a production database. In this way, you will be forced to type everything out. Although this has the advantage of not accidentally toggling back to an older command, the loss of history is very annoying - and it greatly increases the chance of typos, as each command needs to be typed anew.
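
If you go that route, psql makes it easy: the -n (--no-readline) flag starts a session with readline disabled, so there is no history to recall. A one-line sketch (the connection details are placeholders):

$ psql -n -h db1.example.com -U greg product_elements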

Another good practice is to empty out the psql_history file if you know it has some potentially damaging commands in it (e.g. you just did a lot of deletions on the development server, and are now headed to production). Do not simply erase it, however, as the .psql_history provides a good audit trail of exactly what commands you ran. Save the file, then empty it out:

$ alias clean_psql_history='cat ~/.psql_history >> ~/.psql_history.log; truncate -s0 ~/.psql_history'

Although the .psql_history file can be set per-database, I find it too confusing and annoying in practice to use. I like being able to run the same command on different databases via the arrow keys and Ctrl-r.

It is important to exit your psql sessions as soon as possible - never leave it hanging around while you go off and do something else. There are multiple reasons for this:

  • Exiting psql forces a write of the .psql_history file. Better to have a clean output rather than allowing all the sessions to interleave with each other. Also, a killed psql session will not dump its history!
  • Exiting frees up a database connection, and prevents you from unintentionally leaving something idle in transaction.
  • Coming back to an old psql session risks a loss of mental context - what was I doing here again? Why did I leave this around?
  • Less chance of an accidental paste into a window with a psql prompt (/voice_of_experience).

Another helpful thing for psql is a good custom prompt. There should always be some way of telling which database - and server - you are using just by looking at the psql prompt. Database names are often the same (especially with primaries and their replicas), so you need to add a little bit more. Here's a decent recipe, but you can consult the documentation to design your own.

$ echo "\set PROMPT1 '%/@%m%R%#%x '" >> ~/.psqlrc
$ psql service=PROD
product_elements@topeka=> 

Connection Service File

Using a connection service file to access Postgres can help keep things sane and organized. Connection files allow you to associate a simple name with multiple connection parameters, allowing abstraction of those details in a manner much safer and cleaner than using shell aliases. Here are some suggestions on using connection files:

  • If possible, use the local user's file (~/.pg_service.conf), but the global file is fine too. Whichever you choose, do not put the password inside it - use the .pgpass file for that.
  • Use short but descriptive service names. Keep them very distinct from each other, to reduce the chance of typing in the wrong name.
  • Use uppercase names for important connections (e.g. PROD for the connection to an important production database). This provides another little reminder to your brain that this is not a normal psql connection. (Remember, PROD = Please Respect Our Data)
  • Resist the temptation to alias them further. Force yourself to type everything out each time, e.g. "psql service=PROD". This keeps your head focused on where you are going.
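
A minimal service entry might look like the sketch below; the host, database, and user are placeholders, and the password belongs in ~/.pgpass, not here.

$ cat >> ~/.pg_service.conf <<'EOF'
[PROD]
host=db1.example.com
port=5432
dbname=product_elements
user=greg
EOF

$ psql service=PROD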

Dangerous SQL

Some actions are so dangerous, it is a good idea to remove any chance of direct invocation on the wrong server. The best example of this is the SQL 'truncate' command. If I find myself working on multiple servers on which I need to truncate a table, I do NOT attempt to invoke the truncate command directly. Despite all precautions, there are many ways to accidentally run the same truncate command on the wrong server, whether via a .psql_history lookup or simply an errant cut-and-paste. One solution is to put the truncate into a text file and then invoke that text file, but that simply adds the chance that this file may be invoked on the wrong database. Another solution is to use a text file, but change it when done (e.g. search and replace "truncate" to "notruncate"). This is slightly better, but still error prone, as it relies on someone remembering to change the file after each use, and causes the file to no longer represent what was actually run.

For a better solution, create a function that only exists on one of the databases. For example, if you have a test database and a production database, you can run truncate via a function that only exists on the test database. Thus, you may safely run your function calls on the test server, and not worry if the functions accidentally get run on the production server. If they do, you end up with a "function not found error" rather than realizing you just truncated a production table. Of course, you should add some safeguards so that the function itself is never created on the production server. Here is one sample recipe:

\set ON_ERROR_STOP on

DROP FUNCTION IF EXISTS safetruncate(text);

CREATE FUNCTION safetruncate(tname text)
RETURNS TEXT
LANGUAGE plpgsql
AS $ic$
  BEGIN
    -- The cluster_name setting can be quite handy!
    PERFORM 1 FROM pg_settings WHERE name = 'cluster_name' AND setting = 'testserver';
    IF NOT FOUND THEN
      RAISE EXCEPTION 'This function may only be run on the test server!';
    END IF;

    EXECUTE FORMAT('truncate table %s', tname);

    RETURN 'Truncated table ' || tname;
  END;
$ic$;

Now you can create text files full of indirect truncate calls - filled with SELECT safetruncate('exultations'); instead of literal truncate commands. Such a file may be safely checked into git, and copied and pasted without worry.

Record keeping

The value of keeping good records of what you are doing on production systems cannot be overstated. While you should ultimately use whatever system works best for you, I like to keep a local text file that spells out exactly what I did, when I did it, and what happened. These notes should be detailed enough that you can return to them a year later and explain to the client exactly what happened.

The primary way to document things as you go along is with good old fashioned cut and paste. Both the .bash_history and .psql_history files provide automatic tracking of entered commands, as a nice backup to your notes. Make liberal use of the tee(1) command to store command output into discrete files (which reduces the need to rely on scrollback).

Rather than entering commands directly into a psql prompt, consider putting them into a text file and then feeding that file to psql. It's a little extra work, but the file can be checked into git, giving an excellent audit trail. Plus, you may have to run those commands again someday.
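
Combined with git and tee, that gives a tidy little audit loop (the file names are only examples):

$ git add add_indexes.sql && git commit -m "Index changes to run on PROD"
$ psql service=PROD -f add_indexes.sql 2>&1 | tee add_indexes.out
$ git add add_indexes.out && git commit -m "Output from running add_indexes.sql on PROD"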

If you find yourself doing the same thing over and over, write a shell script. This even applies to psql commands. For example, I recently found myself having to examine tables on two servers, and needed to quickly check each table's definition, its size, and whether anyone was currently modifying it. Thus, a shell script:

#!/bin/bash
# Quick look at a table on PROD: definition, size, and any queries touching it
psql service=PROD -c "\\d $1"
psql service=PROD -Atc "select pg_relation_size('$1')"
psql service=PROD -Atc "select query from pg_stat_activity where query ~ '$1'"

Wrap Up

When you are done with all your production tasks, write up some final thoughts on how it went, and what the next steps are. Read through all your notes while it is all fresh in your mind and clean them up as needed. Revel in your success, and get some sleep!

Liquid Galaxy Goes to Chile for IMPAC4



Marine Protected Areas: Bringing the people and ocean together


Earlier this month, the Marine Conservation Institute and the Waitt Foundation brought a Liquid Galaxy to Chile to showcase interactive and panoramic ocean content at the Fourth International Marine Protected Areas Congress (IMPAC4) in La Serena. The conference was a convergence of marine biologists, ocean agencies, and environmentalists from around the globe, gathered to discuss the current state of, and future development initiatives for, Marine Protected Areas worldwide.

The Marine Conservation Institute is working on a mapping project called MPAtlas, which visually catalogs the development of Marine Protected Areas across the globe, as well as the Global Ocean Refuge System (GLORES), which pushes for elevated standards for Marine Protected Areas and advocates for 30% of the marine ecosystem being protected by 2030.

mpatlas.org

We built new content to showcase the GLORES areas in Google Earth, as well as data visualizations of the global system of Marine Protected Areas. In addition, we collected the past ocean-related content we've developed for the Liquid Galaxy platform, including underwater panoramic media from the Catlin Seaview Survey, Mission Blue Hope Spots, and Google Street View collections of underwater photos, such as a project that catalogs the Great Barrier Reef.



Minister of the Environment Marcelo Mena Carrasco gives the
Liquid Galaxy a go

Aside from showcasing this curated content, the Liquid Galaxy served as a comprehensive interactive tool for scientists and environmentalists to present their research areas during the breaks between lectures and events at the Congress. Since the Liquid Galaxy displays the entire globe, conference attendees were able to fly freely to their respective research and/or proposed protected areas as an impromptu presentation aid, and to further explore the underwater terrain and panoramic media in those locations.

Liquid Galaxy at IMPAC4 - La Serena, Chile 2017

The Liquid Galaxy platform is featured in museums and aquariums around the world, and we are thrilled that it is being used as a tool to study and conserve oceans and nature. We recently had the opportunity to participate in The Ocean Conference at the United Nations, and are excited that the system continues to be utilized to study and prevent future detrimental changes to our planet. We hope to have more opportunities to create content geared toward nature conservation, as well as opportunities to share the Liquid Galaxy with environmentalists, so that the system will continue to be used as a tool for visualizing research data in new and interesting ways.