Tuesday, December 29, 2009

Recursively Listing Files

Update: It was pointed out to me that the recursive-directory-files word in io.directories.search solves this problem. Good to know, and the below article can be thought of as a learning exercise! :)
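
For reference, calling the library word looks roughly like this. It is a minimal sketch that assumes the stack effect ( path bfs? -- paths ), where the boolean asks for breadth-first traversal, and that the word lives in io.directories.search as noted above; both may differ between Factor releases.

```factor
USING: io.directories.search prettyprint ;

! Assumption: recursive-directory-files ( path bfs? -- paths ).
! Passing t requests breadth-first order, f depth-first.
"/tmp/foo" t recursive-directory-files .
```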

Factor has several vocabularies for interacting with the filesystem, and quite a lot of work has been done on making these useful, as several interesting blog posts from early 2009 describe.

One feature that I haven't found yet is a word to simply (and recursively) list files within a directory. This is often useful when you intend to process some (or all) of the files in some way.

This feature exists in the standard libraries of other languages under different names. In Python, it is the os.walk() function. In Perl, it is the File::Find module. Even where it doesn't exist, it can usually be built relatively easily.

I'm going to show you how to create a word to do that with Factor.

Many words within the standard library are implemented as a public word that defers to a private word for part of the functionality. In this case, I want to separate the "setup" functionality of the word from the "work" functionality.

Let's define a word list-files that will recursively list all the files within a directory. First, we need to create a growable sequence (in this case a vector) to hold the result, and then we want to call a word that will be used to do the actual work.

: list-files ( path -- seq )
    [ V{ } clone ] dip (list-files) ;

We want to handle each path that we process differently, depending on the type of file it points to:

  1. If a symbolic link, we want to read the link and recurse into it.
  2. If a directory, we want to recurse into each of the files within it.
  3. If a regular file, we want to add it to the list of files.

Using some of the words in io.directories, io.files, io.files.info, io.files.links, io.files.types, and io.pathnames, we can build such a word:

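: (list-files) ( seq path -- seq )
    normalize-path dup link-info type>> {
        ! Follow the link and process whatever it points to.
        { +symbolic-link+ [ read-link (list-files) ] }
        ! Join each entry name onto the directory path, then recurse.
        { +directory+ [
            [ directory-files ] keep
            '[ _ prepend-path (list-files) ] each ] }
        ! Collect the file, leaving the vector on the stack.
        { +regular-file+ [ over push ] }
        [ "unsupported" throw ]
    } case ;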
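
Entered in the order shown, the two definitions won't load, because list-files refers to (list-files) before it exists. Here is a sketch of how the pieces might be assembled into a single vocabulary file, with the private word defined first. The vocabulary name list-files and the USING: list are my assumptions (in particular that normalize-path and prepend-path come from io.pathnames, that type>> comes from accessors, and that the fry vocabulary supplies the '[ syntax), so adjust them to your Factor release. Entry names are joined onto their directory with prepend-path, which inserts the path separator.

```factor
! Sketch: the words from this post assembled into one loadable vocabulary.
USING: accessors combinators fry io.directories io.files.info
io.files.links io.files.types io.pathnames kernel sequences ;
IN: list-files

<PRIVATE

! The "work" word: appends every regular file under path to seq.
: (list-files) ( seq path -- seq )
    normalize-path dup link-info type>> {
        { +symbolic-link+ [ read-link (list-files) ] }
        { +directory+ [
            [ directory-files ] keep
            '[ _ prepend-path (list-files) ] each ] }
        { +regular-file+ [ over push ] }
        [ "unsupported" throw ]
    } case ;

PRIVATE>

! The "setup" word: creates the result vector, then delegates.
: list-files ( path -- seq )
    [ V{ } clone ] dip (list-files) ;
```

Keeping (list-files) between <PRIVATE and PRIVATE> means only list-files is visible to code that uses the vocabulary.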

We can set up a directory structure like so:

$ tree -f /tmp/foo
/tmp/foo
|-- /tmp/foo/bar
`-- /tmp/foo/baz
    `-- /tmp/foo/baz/foo

1 directory, 2 files
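
One way to create that layout without leaving Factor is sketched below. It assumes that make-directories and touch-file from io.directories behave like mkdir -p and touch, respectively.

```factor
USING: io.directories ;

! Create /tmp/foo and /tmp/foo/baz, including missing parents.
"/tmp/foo/baz" make-directories

! Create the two empty regular files shown in the tree output.
"/tmp/foo/bar" touch-file
"/tmp/foo/baz/foo" touch-file
```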

And then use our new word from Factor:

( scratchpad ) "/tmp/foo" list-files .
V{ "/tmp/foo/bar" "/tmp/foo/baz/foo" }

This could be improved further by handling file permissions issues, infinite recursion, and lazily generating the list of files (for better performance with large directory trees).
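
As a rough illustration of the infinite-recursion point, here is one possible sketch that remembers which directories it has already entered, so a symbolic link pointing back at an ancestor is not walked twice. The vocabulary name list-files.guarded, the visited symbol, and the first-visit? helper are all hypothetical names of mine, and the code assumes a Factor release that has hash sets (the HS{ syntax and the sets vocabulary). Unlike the version above, it silently skips file types it doesn't recognize instead of throwing, and it still does nothing about permission errors or laziness.

```factor
! Sketch only: a cycle-guarded variant of list-files.
USING: accessors combinators fry io.directories io.files.info
io.files.links io.files.types io.pathnames kernel namespaces
sequences sets ;
IN: list-files.guarded

<PRIVATE

! Hypothetical: the set of directories entered during the current walk.
SYMBOL: visited

! t the first time a path is seen, f on every later visit.
: first-visit? ( path -- ? )
    dup visited get in? [ drop f ] [ visited get adjoin t ] if ;

: (list-files) ( seq path -- seq )
    normalize-path dup link-info type>> {
        { +symbolic-link+ [ read-link (list-files) ] }
        { +directory+ [
            dup first-visit? [
                [ directory-files ] keep
                '[ _ prepend-path (list-files) ] each
            ] [ drop ] if ] }
        { +regular-file+ [ over push ] }
        ! Default branch: drop the type and the path, keep the vector.
        [ 2drop ]
    } case ;

PRIVATE>

! Each call gets a fresh visited set, bound only for its own walk.
: list-files ( path -- seq )
    HS{ } clone visited [
        [ V{ } clone ] dip (list-files)
    ] with-variable ;
```

The guard compares normalized path strings, so it only catches links whose target text normalizes to a directory that was already entered; relative link targets are still resolved against the current directory rather than the link's own directory.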

3 comments:

Unknown said...

"One feature that I haven't found yet, is a word to simply (and recursively) list files within a directory."

Like recursive-directory-files?

mrjbq7 said...

Thanks! It's sometimes hard to search for words.

It is a little odd that directory-files is in io.directories, but recursive-directory-files is in io.directories.search.

Maybe there should be a better way of navigating all these small namespaces -- io.files has the same problem.

At least, they could link to each other in the docs.

Jim Mack (1963) said...

New to factor, I often use this google search to find stuff in the documentation:

site:http://docs.factorcode.org/ recursive

It popped up recursive-directory-files 2nd.