Saturday, August 20, 2011

Unique

A few days ago, I noticed this example from the Racket website, for reporting "each unique line from stdin":
;; Report each unique line from stdin
(let ([saw (make-hash)])
  (for ([line (in-lines)])
    (unless (hash-ref saw line #f)
      (displayln line))
    (hash-set! saw line #t)))
We can implement the same functionality in Factor, reading each unique line from an input stream:
: unique-lines ( -- )
    lines members [ print ] each ;
The lines word acts on the "current input stream", so we can use a file reader as an input stream to print out all unique lines in a file:
: unique-file ( path -- )
    utf8 [ unique-lines ] with-file-reader ;
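Because unique-lines only cares about the current input stream, it can be tried out without a file by rebinding that stream to a string. The sketch below is not from the original example: it assumes a Factor release where lines still reads from the current input stream (as it does in this post), and the vocabulary name unique-sketch is made up for illustration.

```factor
USING: io io.streams.string kernel sequences sets ;
IN: unique-sketch ! hypothetical vocabulary name

! Same definition as above, repeated so the snippet is self-contained.
: unique-lines ( -- )
    lines members [ print ] each ;

! with-string-reader makes the string the current input stream
! for the duration of the quotation.
"b\na\nb\nc\na" [ unique-lines ] with-string-reader
```

Each distinct line should be printed once, in the order it first appears in the input.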
If we wanted to print and flush each unique line as soon as it is read, we could instead use the each-line word and process the input line by line:
: unique-lines ( -- )
    HS{ } clone [
        dup pick in? [ drop ] [
            [ over adjoin ]
            [ print flush ] bi
        ] if
    ] each-line drop ;
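The membership test and the insertion can also be folded into one step. This is only a sketch under an assumption: it relies on the ?adjoin word from the sets vocabulary, which reports whether the element was newly added and may not exist in older Factor releases. The word name unique-lines-streaming and the vocabulary name are hypothetical.

```factor
USING: hash-sets io kernel sets ;
IN: unique-sketch ! hypothetical vocabulary name

: unique-lines-streaming ( -- )
    HS{ } clone [
        ! Stack here: set line
        ! ?adjoin ( elt set -- ? ) yields t only for a first-time line.
        2dup swap ?adjoin [ print flush ] [ drop ] if
    ] each-line drop ;
```

The behavior is meant to match the each-line version above: a line is printed and flushed the first time it is seen and silently dropped afterwards.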

2 comments:

Jon said...

I prefer the version with less stack gymnastics, where you curry the hash set into the quotation:

HS{ } clone [
    2dup in? [ 2drop ] [
        [ adjoin ]
        [ drop print flush ] 2bi
    ] if
] curry each-line

Cheers,
Jon

CertainlyNotWeston said...

Nice article!

I'd say the longer version is actually closer to the Racket example, which is written in a surprisingly imperative style (so as not to scare the newcomers, I guess...). When you write the above in a more functional Racket, both solutions look quite similar (ummm... backwards, of course):


(define (unique-lines)
  (for-each displayln (remove-duplicates (port->lines))))

(define (unique-file f)
  (with-input-from-file f unique-lines))

I'd say this is one more example of how related the concatenative and functional paradigms are...

Cheers!

Joan.
PS: Instead of with-input-from-file, you can just pass a parameter to port->lines, or use file->lines in the first place, but this is closer to the Factor version...