Welcome to End Point's blog

Ongoing observations by End Point people.

File test comparison table for shell, Perl, Ruby, and Python

A few days ago, my co-worker Richard asked how in Python you would do the -x Bourne shell and Perl file test that checks whether a file is executable. This is (for me, at least) a really commonly used function but one I hadn't needed to do yet in Python, so I looked it up.

That wasn't so hard to find, but then I wondered about the other shell and Perl file tests that I use all the time. Finding equivalents for those was harder than I expected. A web search turned up little aside from language holy wars and limited answers, and I didn't find any exhaustive list.

So I made my own. Below is a table comparing file test operators in the original Bourne shell-compatibles bash, ksh, and zsh; Perl's expanded set; Ruby's set, which was derived from Perl's; and equivalent Python code.

There are still some blanks where I didn't find a good equivalent. Of course I'm sure it's possible with enough custom logic to achieve the same end, but I have tried to stick with relatively simple formulations using built-in functions for now. I'll be happy to fill in the blanks if any readers make suggestions.

Performance notes on avoiding multiple stats of the same file:

  • Starting with Perl 5.9.1, file tests can be "stacked" and will use a single stat for all tests, e.g. -f -x file. In older versions of Perl you can do -f file && -x _ instead.
  • Ruby's File::Stat class can be used to cache a stat for multiple tests.
  • Python's os.stat(file).st_mode can be stored and used for multiple tests.
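As a rough illustration of the Python note above, here is a minimal sketch that calls os.stat() once and reuses the result for several tests. The helper name describe and the choice of tests are hypothetical, picked only for this example, and os.geteuid() assumes a Unix-like system.

```python
import os
import stat
import tempfile


def describe(path):
    """Run several file tests against a single cached os.stat() result."""
    st = os.stat(path)   # one stat call; follows symbolic links
    mode = st.st_mode    # reused for every type test below
    return {
        "is_file": stat.S_ISREG(mode),
        "is_dir": stat.S_ISDIR(mode),
        "is_empty": st.st_size == 0,
        "owned_by_euid": st.st_uid == os.geteuid(),
    }


# Demonstrate on a freshly created (and therefore empty) temporary file.
with tempfile.NamedTemporaryFile() as tmp:
    print(describe(tmp.name))
```

The same stat result could be passed around to other helpers if more tests are needed, which avoids touching the filesystem again.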

Unless otherwise specified, these tests follow symbolic links and operate on the target of the link, rather than the link itself.
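In Python, the difference comes down to os.stat() (follows the link) versus os.lstat() (examines the link itself). The following is only a sketch on a Unix-like system; the helper names is_symlink and target_is_regular_file are made up for the example.

```python
import os
import stat
import tempfile


def is_symlink(path):
    """Test the path itself: os.lstat() does not follow symbolic links."""
    return stat.S_ISLNK(os.lstat(path).st_mode)


def target_is_regular_file(path):
    """Test what the path points to: os.stat() follows symbolic links."""
    return stat.S_ISREG(os.stat(path).st_mode)


with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "target.txt")
    link = os.path.join(tmpdir, "link")
    with open(target, "w") as fh:
        fh.write("data")
    os.symlink(target, link)

    print(is_symlink(link))              # True: the link itself is a symlink
    print(target_is_regular_file(link))  # True: the link's target is a plain file
    print(is_symlink(target))            # False: the target is not a symlink
```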

All tests return boolean true or false unless otherwise noted.
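One case worth noting: Python expressions of the form os.stat('file').st_mode & stat.S_ISVTX yield an integer (zero or the bit value) rather than True or False. That works fine in an if statement, but if a real boolean is wanted, a small wrapper along these lines may help. The name has_mode_bit is hypothetical.

```python
import os
import stat
import tempfile


def has_mode_bit(path, bit):
    """Return True or False depending on whether `bit` is set in the file mode."""
    return bool(os.stat(path).st_mode & bit)


# NamedTemporaryFile creates its file with mode 0600: owner read/write only.
with tempfile.NamedTemporaryFile() as tmp:
    print(has_mode_bit(tmp.name, stat.S_IRUSR))  # True: owner-read bit is set
    print(has_mode_bit(tmp.name, stat.S_ISUID))  # False: setuid bit is not set
```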

Alternatives within a cell are separated by semicolons.

| Test | bash/ksh/zsh | Perl | Ruby | Python |
| --- | --- | --- | --- | --- |
| File is readable by effective uid/gid | | -r 'file' | test ?r, 'file'; File.readable?('file') | |
| File is writable by effective uid/gid | | -w 'file' | test ?w, 'file'; File.writable?('file') | |
| File is executable by effective uid/gid | | -x 'file' | test ?x, 'file'; File.executable?('file') | |
| File is owned by effective uid | -O file | -o 'file' | test ?o, 'file'; File.owned?('file') | os.stat('file').st_uid == os.geteuid() |
| File is owned by the effective gid | -G file | (stat('file'))[5] == $) | test ?G, 'file'; File.grpowned?('file') | os.stat('file').st_gid == os.getegid() |
| File is readable by real uid/gid | -r file | -R 'file' | test ?R, 'file'; File.readable_real?('file') | os.access('file', os.R_OK) |
| File is writable by real uid/gid | -w file | -W 'file' | test ?W, 'file'; File.writable_real?('file') | os.access('file', os.W_OK) |
| File is executable by real uid/gid | -x file | -X 'file' | test ?X, 'file'; File.executable_real?('file') | os.access('file', os.X_OK) |
| File is owned by real uid | | -O 'file' | test ?O, 'file' | os.stat('file').st_uid == os.getuid() |
| File exists | -e file; -a file | -e 'file' | test ?e, 'file'; File.exist?('file') | os.path.exists('file') |
| File has zero size (is empty) | -f file -a ! -s file | -z 'file' | test ?z, 'file'; File.zero?('file') | os.path.getsize('file') == 0; os.stat('file').st_size == 0 |
| File exists and has size greater than zero | -s file | -s 'file' (boolean and returns size in bytes) | test ?s, 'file' (boolean: returns nil if doesn't exist or has zero size, size of the file otherwise); File.size?('file') (same) | os.path.getsize('file') > 0; os.stat('file').st_size > 0 |
| File exists, return size in bytes | | -s 'file' | File.size('file') | os.path.getsize('file'); os.stat('file').st_size |
| File is a plain file | -f file | -f 'file' | test ?f, 'file'; File.file?('file') | os.path.isfile('file'); stat.S_ISREG(os.stat('file').st_mode) |
| File is a directory | -d file | -d 'file' | test ?d, 'file'; File.directory?('file') | os.path.isdir('file'); stat.S_ISDIR(os.stat('file').st_mode) |
| File is a symbolic link | -h file; -L file | -l 'file' | test ?l, 'file'; File.symlink?('file') | os.path.islink('file'); stat.S_ISLNK(os.lstat('file').st_mode) |
| File is a named pipe (FIFO) | -p file | -p 'file' (can also be used on a filehandle) | test ?p, 'file'; File.pipe?('file') | stat.S_ISFIFO(os.stat('file').st_mode) |
| File is a socket | -S file | -S 'file' | test ?S, 'file'; File.socket?('file') | stat.S_ISSOCK(os.stat('file').st_mode) |
| File is a block special file | -b file | -b 'file' | test ?b, 'file'; File.blockdev?('file') | stat.S_ISBLK(os.stat('file').st_mode) |
| File is a character special file | -c file | -c 'file' | test ?c, 'file'; File.chardev?('file') | stat.S_ISCHR(os.stat('file').st_mode) |
| File type (returns string 'file', 'directory', 'characterSpecial', 'blockSpecial', 'fifo', 'link', 'socket', or 'unknown') | | | File.ftype('file') | |
| Filehandle or descriptor is opened to a tty | -t fd | -t $fh | fd.isatty; fd.tty? | os.isatty(fd) |
| File has setuid bit set | -u file | -u 'file' | test ?u, 'file'; File.setuid?('file') | os.stat('file').st_mode & stat.S_ISUID |
| File has setgid bit set | -g file | -g 'file' | test ?g, 'file'; File.setgid?('file') | os.stat('file').st_mode & stat.S_ISGID |
| File has sticky bit set | -k file | -k 'file' | test ?k, 'file'; File.sticky?('file') | os.stat('file').st_mode & stat.S_ISVTX |
| File is an ASCII text file (heuristic guess) | | -T 'file' | | |
| File is a "binary" file (opposite of -T) | | -B 'file' | | |
| File modification time | | (stat('file'))[9]; -M 'file' (script start time minus file modification time, in days) | test ?M, 'file' (returns Time object); File.mtime('file') (same) | os.stat('file').st_mtime |
| File access time | | (stat('file'))[8]; -A 'file' (script start time minus file access time, in days) | test ?A, 'file' (returns Time object); File.atime('file') (same) | os.stat('file').st_atime |
| Inode change time (Unix) | | (stat('file'))[10]; -C 'file' (script start time minus inode change time, in days) | test ?C, 'file' (returns Time object); File.ctime('file') (same) | os.stat('file').st_ctime |
| File has been modified since it was last read | -N file | | | |
| file1 is newer (according to modification date) than file2, or file1 exists and file2 does not | file1 -nt file2 | (stat('file1'))[9] > (stat('file2'))[9] | test ?>, 'file1', 'file2' | os.path.exists('file1') and (not os.path.exists('file2') or os.stat('file1').st_mtime > os.stat('file2').st_mtime) |
| file1 is older than file2, or file2 exists and file1 does not | file1 -ot file2 | (stat('file1'))[9] < (stat('file2'))[9] | test ?<, 'file1', 'file2' | os.path.exists('file2') and (not os.path.exists('file1') or os.stat('file1').st_mtime < os.stat('file2').st_mtime) |
| file1 and file2 refer to the same device and inode numbers | file1 -ef file2 | join(':', (stat('file1'))[0,1]) eq join(':', (stat('file2'))[0,1]) | test ?-, 'file1', 'file2' | os.path.samefile('file1', 'file2') |
| file1 and file2 have the same modification times | | (stat('file1'))[9] == (stat('file2'))[9] | test ?=, 'file1', 'file2' | os.stat('file1').st_mtime == os.stat('file2').st_mtime |
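The longer Python expressions for -nt and -ot stat each file more than once. One possible way to tidy that up, sketched here under the assumption that any OSError from os.stat() can be treated as "file does not exist", is a pair of small helpers. The names mtime_or_none and is_newer are hypothetical.

```python
import os
import tempfile


def mtime_or_none(path):
    """Return the file's modification time, or None if it cannot be stat'ed."""
    try:
        return os.stat(path).st_mtime
    except OSError:
        return None


def is_newer(file1, file2):
    """Approximate the shell's `file1 -nt file2` with one stat per file."""
    m1 = mtime_or_none(file1)
    m2 = mtime_or_none(file2)
    if m1 is None:
        return False          # file1 must exist
    return m2 is None or m1 > m2


with tempfile.TemporaryDirectory() as tmpdir:
    old = os.path.join(tmpdir, "old")
    new = os.path.join(tmpdir, "new")
    for name in (old, new):
        open(name, "w").close()
    # Set explicit (atime, mtime) pairs so the comparison is deterministic.
    os.utime(old, (1000, 1000))
    os.utime(new, (2000, 2000))

    print(is_newer(new, old))                               # True
    print(is_newer(old, new))                               # False
    print(is_newer(new, os.path.join(tmpdir, "missing")))   # True
```

The -ot case follows by swapping the arguments: is_newer(file2, file1).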

Complete details are in the manuals for each language.

5 comments:

Mikel Ward said...

Note that by default Perl's filetest operators don't take ACLs into account, so it doesn't always give the correct answer.

See http://perldoc.perl.org/filetest.html for more details.

Mikel Ward said...

file1 -ef file2 is os.path.samefile(path1, path2) in Python.

And the others of the last five are obviously doable using two tests with an and in the middle.

Jon Jensen said...

Mikel, thanks for your comments.

I added your link to Perl's new filetest pragma and mtime/atime/ctime from stat; Python os.path.samefile, os.isatty, and complex tests for those last tests; and Ruby isatty & tty? for -t.

The table has far fewer blanks now.

Andrew Dalke said...

I think you should include Python's helper functions for some of those more common tasks. As described here, Python seems to be a convoluted language.

For example, to test if a file exists you can use

>>> import os
>>> os.path.exists("/etc/passwd")
True
>>> os.path.exists("/blah/blah")
False
>>>

instead of "os.access('file', os.F_OK)".

Similarly, use os.path.getsize("file") instead of os.stat('file').st_size. The implementation of getsize is exactly that latter code, but I think it's easier to read. Of course, if you want to minimize stats then you should save and reuse the result of the stat call.

I see that you use a mix of stat return value lookups. Starting with Python 2.2 the preferred form for os.stat('file')[stat.ST_UID] became os.stat('file').st_uid.

That is, stat now returns an object with both index-style lookup (for backwards compatibility) and attribute-style lookup (preferred). Your comparison table uses both styles but should be consistent and use new-style attributes.

Jon Jensen said...

Those were helpful suggestions, Andrew.

I looked for friendlier versions of os.access() calls but just missed os.path.exists() and have updated that. I don't know of any other way to do the remaining os.access() calls I have listed.

Same thing with os.path.getsize() -- I've added that, but left the manual stat version as an alternate because it's useful for caching purposes.

I switched the rest of the stat calls to attribute-style lookups. Is there some equivalent new style that would replace the cumbersome forms like: stat.S_ISSOCK(os.stat('file').st_mode) ?

Thanks again and please let me know of any other improvements you recommend.