3  Putting Everything Together

At this point we are ready to put everything together and write the most useful parentheses-addition program. The first improvement to our program is that it uses an empty default CONVERSION-TABLE; the second improvement is that the shell script accepts a command-line argument: a file that contains the table.

Naturally we will specify the table as a data file with parentheses, e.g.,

(("," " comma ")
 (";" " semi ")
 ("\\\\" "\\\\\\\\"))

so that a single call to read can consume it. Thus, assuming that this table is in a file called table, we want to type

   > ./addparens -f table < pre1958-grades.dat

and get the following output:

   (
   ( Adam 78 comma  88 comma  69 comma  66)
   ( Brad 88 comma  87 comma  86 comma  22)
   ( Carr 99 comma  88 comma  88 comma  90)
   ( Dave 77 comma  78 comma  77 comma  78)
   ( Fawn 90 comma  89 comma  81 comma  60)
   ( Gege 67 comma  78 comma  81 comma  85)
   )

That is, the program adds parentheses and performs all the substitutions specified in table. In this case, the output contains " comma " instead of "," and no other conversions apply.

To add this new functionality, we modify the script since the extension concerns a command-line argument:

#!/bin/sh
string=? ; exec mzscheme -g -l mzlib.ss -r $0 "$@"
(load "addparens3.ss")

(define MinusF "-f")

(cond
  ((= (vector-length argv) 0) (void))
  ((and (= (vector-length argv) 2) 
        ;; now we know that two arguments were passed on the command line
        (string=? (vector-ref argv 0) MinusF)) 
   (set! CONVERSION-TABLE (call-with-input-file (vector-ref argv 1) read)))
  (else (error 'addparens "bad format")))

(add-parens-to-file)

The revised script acts as before if no command-line arguments are present. If the command line specifies a table via "-f <filename>", the script must change the default conversion table to the one in the specified file. In all other cases, we signal an error.

The Scheme function call-with-input-file is the only new element in the revised script. Since the table is specified via a command-line argument as a filename, we cannot use plain read to get hold of the table. Instead we connect the file with read via the function call-with-input-file. Roughly speaking,

(call-with-input-file "file.dat" read)

makes read take its data from "file.dat" instead of the standard input.
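
To make the connection between the file and read more explicit, the following is a minimal sketch of the same idea with the port spelled out. The name read-table is hypothetical and not part of addparens; the sketch assumes only the standard functions call-with-input-file and read.

```scheme
;; Sketch only: read-table is a hypothetical helper, not part of addparens.
;; call-with-input-file opens the named file, passes the resulting input
;; port to the given one-argument procedure, and returns that procedure's
;; result.
(define (read-table filename)
  (call-with-input-file filename
    (lambda (port)     ; port is connected to the file, not to standard input
      (read port))))   ; read one complete datum from that port
```

Because read accepts an input port as an optional argument, passing read itself, as in (call-with-input-file "file.dat" read), is a shorter way of writing the same thing.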

Exercise 3.0.4.   Add the -t option to our script. Using the -t option, a script user can specify a single conversion-table entry directly on the command line. For example,

   > ./addparens -f table < pre1958-grades.dat

could also be written as

   > ./addparens -t "," " comma " < pre1958-grades.dat

and would still produce the same output as above.

#!/bin/sh
string=? ; exec mzscheme -g -l mzlib.ss -r $0 "$@"

(load "addparens3.ss")

(define MinusF "-f")

(define (pair-of-strings? obj)
  (and (list? obj)
       (= (length obj) 2)
       (string? (first obj))
       (string? (second obj))))

(cond
  ((= (vector-length argv) 0) (void))
  ((and (= (vector-length argv) 2) 
        ;; now we know that two arguments were passed on the command line
        (string=? (vector-ref argv 0) MinusF)) 
   (let ((table (call-with-input-file (vector-ref argv 1) read)))
     (if (and (list? table) (andmap pair-of-strings? table))
         (set! CONVERSION-TABLE table)
         (error 'addparens "table must specify a list of pairs of strings; see man"))))
  (else (error 'addparens "bad format")))

(add-parens-to-file)

Figure 14:  The final version of addparens

Last, but not least, we can also start worrying about the validity of arguments to our programs, especially if we want to make them available to our friends. Consider the addparens program. First, we should check that the command-line arguments are of the proper shape. At the moment, we only know that the first of two arguments is "-f". We should also check that the second one is the name of an existing file. To this end we can use

  (file-exists? (vector-ref argv 1))

The Scheme function file-exists? consumes a string and returns #t if, and only if, the string names an existing file.
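
One way to use this check is sketched below, under the assumption that we wrap the reading step in a helper. The name load-table and the wording of the error message are hypothetical; the script itself could just as well add the file-exists? test to the conditions of its cond clause.

```scheme
;; Sketch only: load-table is a hypothetical helper, not part of addparens.
;; It reads the table only if the named file exists and signals an error,
;; in the same style as the script, otherwise.
(define (load-table filename)
  (if (file-exists? filename)
      (call-with-input-file filename read)
      (error 'addparens "table file does not exist")))
```

Checking first lets the script report the problem in its own words instead of relying on whatever error call-with-input-file raises for a missing file.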

Second, we should also check that the file specifies a table, i.e., a list of pairs of strings. The change to the program is again straightforward: see figure 14. Instead of just assigning the value of (call-with-input-file (vector-ref argv 1) read) to CONVERSION-TABLE, we first check that it is a list and that each element in the list is a list with two strings.
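
To see which inputs such a check accepts and rejects, here is a small self-contained sketch. It performs the same test as figure 14, but the name valid-table? is hypothetical, and the sketch uses car, cadr, and explicit recursion in place of first, second, and andmap so that it does not depend on mzlib.ss.

```scheme
;; Sketch only: valid-table? is a hypothetical name. It performs the same
;; check as figure 14 with car, cadr, and a loop instead of mzlib functions.
(define (pair-of-strings? obj)
  (and (list? obj)
       (= (length obj) 2)
       (string? (car obj))
       (string? (cadr obj))))

(define (valid-table? obj)
  (and (list? obj)
       (let loop ((entries obj))
         (or (null? entries)
             (and (pair-of-strings? (car entries))
                  (loop (cdr entries)))))))

(valid-table? '(("," " comma ") (";" " semi ")))  ; => #t
(valid-table? '(("," comma)))                     ; => #f, comma is a symbol
(valid-table? "not a list")                       ; => #f
```

The second case is the kind of mistake a table file can easily contain: a replacement written without string quotes is read as a symbol, not as a string.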

In general, we should develop error-checking code for all data that enters our computation through read. For many problems, this step is easy because we can get away with the simple input techniques we have developed here. On occasion we may need to process more complex forms of text. In that case, we must study more general parsing techniques, such as those typically taught in compiler courses.