Welcome to End Point’s blog

Ongoing observations by End Point people

jQuery and Long-Running Web App Processes: A Case Study

I was recently approached by a client's system administrator with a small but interesting training/development project. The sys-admin, named Rod, had built a simple intranet web application that used a few PHP pages, running on an Amazon EC2 instance, to accept some user input and kick off multiple long-running server-side QA deployment processes. He wanted to use Ajax to start the process and to incrementally display its output, line by line, on the same web page. Waiting for the entire process to finish before displaying any output made for a poor user experience, but he wasn't able to get an Ajax call to return any output incrementally over the lifetime of the request.

Rod asked me to help him get his web app working and train him on what I did to get it working. He admitted that this project was a good excuse for him to learn a bit of jQuery (a good example of keeping your tools sharp) even if it wasn't necessarily the best solution in this case. I have always enjoyed training others, so we fired up Skype, got into an IRC channel, and dove right in.

First, I started with the javascript development basics:
  1. Install the Firebug add-on for Firefox
  2. Use Firebug's Console tab to watch for javascript errors and warnings
  3. Use Firebug's Net tab to monitor what ajax calls your app is making, and what responses they are getting from the server
  4. Replace all those hateful debug alert() calls with console.log() calls, especially for ajax work

A special note about console.log(): Some browsers (including any Firefox that doesn't have Firebug installed) do not natively supply a console object. To work around this, we defined the following console.log stub at the very top of our single javascript file:

if (typeof console === 'undefined') {
    console = { log: function() {} };
}

Rod's javascript was a mix of old-school javascript that he had remembered from years ago, and new-school jQuery he had recently pulled from various tutorials. His basic design was this: When the user clicked the "Deploy" button, it should kick off two separate Ajax requests. "Request A" would make a POST request to deploy.php, which made a number of system calls to slow-running external scripts and logged their output to a temporary logfile on the server. "Request B" would make a GET request to getoutput.php (which simply returned the contents of said logfile) every 2 seconds and display the result in a scrollable div element on the page.

Hearing Rod describe it to me, I wondered if he might be headed down the wrong path with his design. But, he already had put time into getting the server-side code working and did not want to change direction at this point. Discussing it with him further, it became clear that he did not want to re-write the server-side code and that we could in fact make his current design produce working code with teachable concepts along the way.

To start, Rod told me that his "ajax POST request (Request A) wasn't firing." As the Russian proverb says, "Trust, but verify." So, we opened Firebug's Net tab, clicked the web app's Deploy button (actually its only button - Steve Jobs look out) and saw that the ajax request was in fact firing. However, it was not getting back a successful HTTP 200 status code and as such, was not getting handled by jQuery as Rod expected. Expanding the ajax request in the Net tab let us see exactly what name/value data was getting POSTed. We spotted a typo in one of his form input names and fixed it. Now Request A was clearly firing, POSTing the correct data to the correct URL, and getting recognized as successful by jQuery. (More on this in a bit.)

Rod's code was making Request A from within a jQuery event handler defined for his form's Deploy button. But, he was making Request B via an HTML onClick attribute within that same HTML tag. He was getting all sorts of strange results with that setup, depending on which request returned first, whether Request B's function call correctly returned false to prevent the entire form from getting POSTed to itself, and so on. Consolidating logic and control into event handlers that are defined in one place is preferable to peppering a web page with HTML onClick, onChange, etc. javascript calls. So, we refactored his original jQuery event handler and onClick javascript call into the following code snippet:

//global variable for display_output() interval ID
var poll_loop;

$(".deploy_button").click(function() {
    $.ajax({
        beforeSend: function() {
            $('#statusbox').html("Running deployment...");
        },
        type: "POST",
        url: "deploy.php",
        data: build_payload(),
        success: function() {
            console.log('Qa-run OK');
            //previously called via an onClick
            poll_loop = setInterval(function() {
                display_output("#statusbox", 'getoutput.php');
            }, 2000);
        },
        error: function() {
            console.log('Qa-run failed.');
        }
    });
});

That $.ajax(...) call is our jQuery code that initiates the Request A ajax call and defines anonymous functions to call based on the HTTP status code of Request A. If Request A returns a successful HTTP status code from the server (jQuery treats any 2xx code, as well as 304, as success), the anonymous function defined for the 'success:' key will be executed. If any other HTTP code is returned, or the request fails outright, the anonymous function defined for the (optional) 'error:' key is executed. We refactored the onClick's call to display_output() into the 'success:' function above. Now, it only gets called if Request A is successful, which is the only time we'd want it to execute.

The body of the 'success:' anonymous function calls setInterval() to create an asynchronous (in that it does not block other javascript execution) javascript loop that calls display_output() every 2 seconds. The setInterval() function returns an "interval ID" that is essentially a reference to that interval. We save that interval ID to the 'poll_loop' variable that we intentionally make global (by declaring it with 'var' outside any function) so we can cancel the interval later.
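
As an aside, a global variable is not the only way to hold on to an interval ID. The following is a minimal sketch, not what Rod and I actually wrote, of keeping the ID inside a closure instead. The startPolling() helper name is hypothetical.

```javascript
// Hypothetical helper (not part of Rod's code): starts a repeating task and
// returns a function that cancels it, so the interval ID stays private.
function startPolling(task, intervalMs) {
    var intervalId = setInterval(task, intervalMs);
    return function stopPolling() {
        clearInterval(intervalId);
    };
}

// Usage sketch, mirroring the 2-second poll above:
// var stopPolling = startPolling(function() {
//     display_output("#statusbox", "getoutput.php");
// }, 2000);
// ...later, when polling should end:
// stopPolling();
```

The trade-off is that whatever code needs to stop the loop must be handed the returned function, whereas a global is reachable from anywhere.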

Here is the display_output() function that makes Request B and gets called every 2 seconds:

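function display_output(elementSelector, sourceUrl) {
    // .load() is asynchronous, so check the new content in its completion
    // callback; checking right after the call would see the previous content.
    $(elementSelector).load(sourceUrl, function() {
        var html = $(elementSelector).html();
        // indexOf() returns -1 when the marker is absent (0 is a valid match)
        if (html.indexOf("EODEPLOY") !== -1) {
            window.clearInterval(poll_loop);
            alert('Deployment Finished.');
        }
        if (html.indexOf("DEPLOY_ERROR") !== -1) {
            window.clearInterval(poll_loop);
            alert('Deployment FAILED.');
        }
    });
}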

That .load() method is jQuery shorthand for making an ajax GET request and assigning the returned HTML/text into the element object on which it's called. Because the display_output() function is responsible for terminating the interval that calls it, we need to define our end cases. If either "EODEPLOY" (for a successful deployment) or "DEPLOY_ERROR" (for a partially failed deployment) appears as a string within the resulting HTML, we call clearInterval() to stop the infinite loop, and alert the user accordingly. If neither of our end cases is encountered, display_output() will be executed again in 2 seconds.
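
The end-case check can also be pulled out into a small function with no DOM dependency, which makes it easy to try out in a console. This is only a sketch: the deploymentState() name and its return values are hypothetical, and giving the failure marker priority when both markers are present is my assumption.

```javascript
// Hypothetical helper: classify the log text fetched by Request B.
// Assumption: a failure marker takes priority if both markers are present.
function deploymentState(logText) {
    if (logText.indexOf("DEPLOY_ERROR") !== -1) {
        return "failed";
    }
    if (logText.indexOf("EODEPLOY") !== -1) {
        return "finished";
    }
    return "running";
}
```

Note that indexOf() returns -1 when the marker is missing, so a marker sitting at the very start of the text (index 0) is still detected.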

As it stands, the poll_loop interval will run indefinitely if the server-side code somehow fails to ever return either of the two strings we're looking for. I left that end case as an exercise for Rod, but suggested he add a global variable that could be used to measure the number of display_output() calls or the elapsed time since the Deploy button was clicked, and end the loop once an upper limit was hit.
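
For the curious, here is one way that exercise might be approached. It is a sketch that counts calls in a closure rather than in a global variable. The makePollGuard() name and the limit of 150 polls are assumptions, not anything Rod settled on.

```javascript
// Hypothetical limit: 150 polls at 2 seconds each is roughly 5 minutes.
var MAX_POLLS = 150;

// Returns a function that counts its own calls and reports true once the
// limit has been exceeded.
function makePollGuard(maxPolls) {
    var calls = 0;
    return function limitReached() {
        calls += 1;
        return calls > maxPolls;
    };
}

var limitReached = makePollGuard(MAX_POLLS);

// At the top of display_output(), one might then add:
// if (limitReached()) {
//     window.clearInterval(poll_loop);
//     alert('Deployment timed out.');
//     return;
// }
```

An elapsed-time check would work just as well; counting calls simply avoids any dependence on the client's clock.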

Other suggested features that Rod and I discussed but I've omitted from this article include:
  1. Client-side input validation using javascript regular expressions
  2. Matching server-side input validation because sometimes the call is coming from inside the house
  3. Adding a unique identifier that is passed as part of both Request A and Request B to better identify requests and to prevent temp file naming conflicts from multiple concurrent users.
  4. Packaging display_output()'s "Deployment FAILED" output and providing a button to easily send the output to Rod's team
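
To illustrate the unique identifier idea from the list above, here is a minimal sketch. The helper names and the "run_id" parameter name are hypothetical, and the ID is only meant to avoid collisions between users, not to be secure.

```javascript
// Hypothetical helper: build a reasonably unique ID for one deployment run
// from a timestamp plus a random suffix. Not cryptographically secure.
function makeRunId() {
    var timePart = new Date().getTime().toString(36);
    var randomPart = Math.random().toString(36).slice(2, 10);
    return timePart + "-" + randomPart;
}

// Append the ID to a URL as a query parameter ("run_id" is an assumed name).
function withRunId(url, runId) {
    var separator = url.indexOf("?") === -1 ? "?" : "&";
    return url + separator + "run_id=" + encodeURIComponent(runId);
}

// Usage sketch: generate one ID per click of the Deploy button, include it in
// Request A's POST data, and poll withRunId("getoutput.php", runId).
```

The server side would still need to validate the ID before using it in a temp file name, for the same reason as item 2.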

I'm sure there are a ton of other possible solutions for a project like this. For example, I know that Jon and Sonny developed a more advanced polling solution for another client, www.locateexpress.com, using YUI's AsyncQueue. Without getting too deeply into the server-side design, I'm curious to hear how other people might approach this problem. What do you think?

5 comments:

Anonymous said...

This is a great article, and I am working on something similar. However, I would like to start "Process A", which is a long (hours) running PHP script, and keep it running from the javascript call. This way, when the user leaves the page, the script is killed. However, I cannot get my head around how this would be accomplished (a never ending AJAX call, combined with an infinite Javascript div updating loop!) Do you have any ideas on this?

Brian Gadoury said...

Dear Anonymous,

I hope this letter finds you in good spirits. It's been almost a year to the day that you last wrote me here, but I'm afraid I've only just found your post now!

I can't help but wonder (as I did in the original blog post) if you might be headed down the wrong path with your design. Naturally, I don't know the details, but to be blunt, PHP is not really designed for this type of job, so trying to make it jump through these hoops might be an exercise in frustration.

If changing the long-running PHP script is not an option, I would probably think about the following:

Any pending/processing ajax requests (Process A in this case) should be killed by the browser when you leave the page. I believe this will be detected quickly by the web server, which will kill the PHP process. So, that part is easy. (As an interesting aside, you can save the return value of an $.ajax(...) call and later call abort() on it to kill it, if you wanted to do this in-page, but I digress.)
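
A rough sketch of that aside, for anyone who wants to try it. The startCancellableDeploy() wrapper is hypothetical, and it takes the jQuery object as a parameter only so the sketch stays self-contained.

```javascript
// Hypothetical wrapper: `jq` is expected to be the jQuery object ($).
// $.ajax() returns a request object that has an abort() method.
function startCancellableDeploy(jq, payload) {
    var request = jq.ajax({
        type: "POST",
        url: "deploy.php",
        data: payload
    });
    // Calling the returned function cancels the in-flight request.
    return function cancelDeploy() {
        request.abort();
    };
}

// Usage sketch:
// var cancelDeploy = startCancellableDeploy($, build_payload());
// ...later, e.g. from a Cancel button's click handler:
// cancelDeploy();
```

Keep in mind that abort() only cancels the request on the client; whether the server-side process stops depends on the server noticing the dropped connection.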

You may run into limits either on the server or the client that will limit how long Process A could run for. The server might decide the PHP script has been running too long, or is using too much memory, etc. This is especially likely if you are on a shared hosting setup. Adding set_time_limit(0); to your PHP script may help, as well.

If I still can't dissuade you, then perhaps you're really only missing one thing. What I didn't show in this article is how deploy.php uses PHP's shell_exec() to kick off the long-running process. More importantly, it passes an ampersand at the end of the command line string passed to shell_exec(), which causes it to background the long-running process.

This means deploy.php very quickly returns a successful response to the client while the long-running process continues running on the server. If you do this, then you'll need a way to kill that background task when the user leaves the page. Define an event handler with jQuery's unload() method, and have it make a quick call to a different PHP script that will kill the long-running process. Naturally, this is messy, potentially insecure, and less than ideal.

Other than that little missing piece of the puzzle, I don't see how what you've described is any different than what I described in this article.

Good luck!

Brian Gadoury said...

Here's part of a comment I recently received via email, along with part of my response:

> You said you wondered if Rod might be headed down the wrong path with his design.
>
> I'm currently working on a similar
> design. I've done most of the code and it works, but it is quite complex
> and maybe not the best.
>
> Could you please tell me in brief your concern with Rod's design? Do you
> know of a better way to launch,
> manage and monitor long running server processes from a browser, how would
> you do it?
>

You said your solution is quite complex. Tell me more about those pain points. Is it complex because it has to be, or because you haven't come up with a simpler implementation yet? (This will also give me some technical details about your environment.)

Overall, my concern about Rod's ideas was that he was trying to force these two ideas (long running server-centric processes/jobs, and HTTP) to work together when they aren't really designed to. So, my first question to him was, "Are you trying to solve the right problem?"

What if the "right problem" was really that his deploy jobs were too monolithic and slow and he was trying to force a solution rather than come up with a better idea? Sometimes, we get stuck on a weird workflow idea that doesn't work very well and we have a hard time stepping back and gaining more perspective on the problem at hand. Conversely, sometimes the problem space really is just tough and ugly, so our solution ends up looking a little ugly too.

sswam said...

Well, I decided to fork the long-running jobs from the web-server, rather than creating another 'process server' daemon and sending messages to it. That might have been simpler.

Of course it would be much simpler to abandon the whole task and just run the jobs from the command-line in 'screen'. I would do that, but it's not my decision.

There are several 'pain points' with forking from a web server such as Apache / mod_perl:
- have to close all file descriptors
- have to start a new process session
- have to consider whether it will work with a multi-threaded rather than pre-forked server
- there was plenty of relevant reading and thinking to do

I decided to do this 'safe fork' stuff myself, so that the solution would not depend on using Apache. We use a different server during development.

There are more complexities tracking how the tasks are progressing, checking whether they are still running, etc.

It's probably about as complex as it needs to be, or a little more, but if there is a vastly better way I'd like to know it.

I could attempt to speed up the jobs, but they necessarily will take quite a while. Much longer than I can tie up an Apache worker.

Thanks for listening!

Anonymous said...

Nice write up, gave me an idea of how to implement. I am a newbie to jquery, this was really helpful.