Simple bash shell script for running batch MySQL jobs

The other day I needed to run a simple MySQL job to back up and delete some database records on a live server. On a live server, it is important to make sure you aren't asking the database to take on a job that could lock it up. Better to run the job in small batches. Running a batch is simple. You can call it right from the mysql console with:

source [path_to]/[the_batch_script].sql

But what if there are millions of records that need deleting? Bash shell script to the rescue.

Here is the idea of the SQL job that needed to be run a few times:

START TRANSACTION;

/* Find what you want to delete and put a LIMIT on your batch size */
CREATE TEMPORARY TABLE records_to_delete_temp SELECT id FROM `records` WHERE ..... LIMIT 1000;

/* Creating backup table to archive spam orders */
CREATE TABLE IF NOT EXISTS `records_backup` LIKE `records`;
INSERT INTO `records_backup` SELECT * FROM `records` WHERE id IN (SELECT id FROM `records_to_delete_temp`);

/* Delete Dependents - If your records have foreign key dependencies, delete them first */
DELETE FROM `dependent_1` WHERE record_id IN (SELECT id FROM `records_to_delete_temp`);
DELETE FROM `dependent_2` WHERE record_id IN (SELECT id FROM `records_to_delete_temp`);

/* Delete the records */
DELETE FROM `records` WHERE id IN (SELECT id FROM `records_to_delete_temp`);

/* Return count of remaining records - the WHERE clause should be the same as the original SELECT for the records to delete */
SELECT COUNT(*) FROM `records` WHERE .... ;

COMMIT;

Note:

SELECT COUNT(*)

This will return the remaining record count to the shell script.

And the shell script...

ret=$(mysql -u [user] -p[password] --skip-column-names [database_name]  < [the_batch_script].sql)

while [ $ret -gt 0 ]
do
  echo "rows left: $ret"
  sleep 3
  ret=$(mysql -u [user] -p[password] --skip-column-names [database_name]  < [the_batch_script].sql)
done
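
Saved to a file and run, the loop just churns through the batches until the count reaches zero. The file name and counts below are made up purely for illustration:

$ bash run_batch_deletes.sh
rows left: 2476
rows left: 1476
rows left: 476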

Notes:

--skip-column-names

This is the little nugget that gives you clean output from the mysql script. Without --skip-column-names you will have to get more creative with parsing the returned table.
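
For illustration, here is roughly the difference when the output is piped or captured by the shell (mysql switches to tab-separated output when it isn't writing to a terminal; the count shown is made up):

$ mysql -u [user] -p[password] [database_name] < [the_batch_script].sql
COUNT(*)
4200

$ mysql -u [user] -p[password] --skip-column-names [database_name] < [the_batch_script].sql
4200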

-p[password]

Just a friendly reminder, no space after the -p option.
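
In other words (user, password, and database are placeholders):

mysql -u [user] -p[password] [database_name]    # correct: the password follows -p directly
mysql -u [user] -p [database_name]              # with a space, -p just means "prompt for the password"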

Should you really be running batches to remove millions of records from a live server with a shell script?

Just because you can doesn't mean you should. Before pulling the trigger, back up, consider what could go wrong, and have a plan in place to address the possible failures.
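
A dump of the affected table before you start is cheap insurance. Something along these lines, where the output file name is just an example:

mysqldump -u [user] -p[password] [database_name] records > records_dump.sql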

Find something fun for your hands to do while bash takes care of running the jobs. At the least, cross your fingers!

9 comments:

Tim Finch said...

Great idea.

Just a thought... If you are embedding the password in the script with -p, then you'd want to ensure you lock down access to the shell script so that the password doesn't leak, and ensure there are no vulnerabilities with the password ending up in bash history files or similar.

Justin Swanhart said...

pt-archiver

Brian Gadoury said...

The ret=... line is duplicated twice, which violates D.R.Y., no? :)

Jon Jensen said...

Thanks, Barrett. If any locks will be held, it can be good to minimize the duration of a transaction, as you're doing.

With the mysql client, putting the password in a command line with -p has another risk: It can be seen in ps/top output by other users on the system while it's running.

The safest way is to put the password in a [client] section of ~/.my.cnf which is readable only by the user, not group or other.
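
For example, something like this in ~/.my.cnf (youruser and yourpassword are placeholders), with the file itself restricted with chmod 600 ~/.my.cnf:

[client]
user=youruser
password=yourpassword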

Barrett Griffith said...

-@Tim, thank you for identifying a security concern, and @Jon, +1 for giving a solid solution for locking down MySQL passwords.

-@Brian, good catch. This is my first bash script. I was also curious about how to store the command in a variable and then evaluate it. I did a little research. It looks like I could do something like RET="the command" and then: eval $RET. Haven't tested, but will remember this next time I write a bash script.

-I wanted to mention this well-written article that I just stumbled on by Baron Schwartz, a.k.a. Xaprb. Baron discusses strategies around deleting and archiving data one row at a time (i.e. LIMIT 1), which has benefits for avoiding locks, but also notes that "tons of tiny transactions are actually a bottleneck." It should be noted that Baron's work on mk-archiver was the basis for the Pacona Toolkit pt-archiver, source here.

Barrett Griffith said...

And of course by "Pacona" I mean "Percona"

Sheeri K. Cabral said...

I've also successfully done this using a stored procedure - you give it the starting and ending points and an optional limit, and have the procedure do the work.

The safest way to handle passwords is not to handle them at all. Run it within the database. And then it's stored, so if it's a monthly purge, you just run the stored procedure again (or if you're brave, make a MySQL event).

I don't know what millennium Jon is from, but since MySQL 4.0 or so, the mysql command-line client doesn't show the password in ps; it just shows xxxx.

Brian Gadoury said...

Jon is actually from the future.

Additionally, all passwords in the future are literally just xxxx, so the joke's on you.

;)

Jon Jensen said...

Sheeri, I stand corrected that the password is no longer left in the process's command line value, so casual snoopers won't be able to see other users' passwords that way. Thanks for pointing that out.

However, there's still a race condition between the invocation of mysql and when it scrubs the password. During that time a program watching the process list can sniff out the password before it's scrubbed.

That leads me to conclude that ~/.my.cnf is still a better way to handle authentication with the mysql client.

This blog post gives more detail on the race condition: http://www.lenzg.net/archives/256-Basic-MySQL-Security-Providing-passwords-on-the-command-line.html