A perl Tutorial

by Will Morse, BHP Petroleum






                       Landmark Graphics Corporation
                     World Wide Technology Conference

                              Houston, Texas

                             December 1, 1994




Copyright 1994 Will Morse.  Permission is given to freely copy and
distribute this paper as long as there is no charge except real and
actual mechanical copying costs and as long as this notice is kept
with each copy so others may copy it as well.

The opinions expressed are the author's own and do not necessarily
reflect the opinions or policies of The Broken Hill Proprietary
Company, Limited, or its various divisions.  Your mileage may vary.

INTRODUCTION:

     perl is the "Swiss Army Chainsaw of Systems Administration".

     This tutorial covers:

     *    What is perl?

     *    What does a perl program look like?

     *    A brief, hopelessly incomplete, overview of perl syntax
          and features.

          -    Variable naming
          -    Assignments
          -    Arithmetic
          -    If-then-else
          -    Loops
          -    Files
          -    Special functions
          -    Regular expressions
          -    Report Writer
          -    Debugger

     *    An example using an exported horizon file from SeisWorks.

     *    Information on how to get perl.

     *    A list of books on perl.

     *    Internet contacts and support for perl.

     *    Some handy Landmark-related perl programs.
               histograms,         SEG-Y Headers,
               nulls to spaces,    HPGL

     *    A little about perl version 5.

     *    Other free software you should know about.
          expect,        tcl/tk,        GNUtar,        gzip

     This paper specifically addresses perl 4.x.  perl 5.x is out
     now, but most people still use version 4.  The current books
     and documentation for perl all address version 4.

     There are versions of perl, such as tkperl and oraperl, that
     address X-windows programming, Oracle database access, and
     other specific features.  There is not time enough to cover
     these versions in this short tutorial.

WHAT IS PERL?

     perl stands for Practical Extraction and Report Language.
     perl is a high level programming language combining elements
     of C-shell, awk, and many other programming languages and
     utilities.

     perl is free.  There are no license fees.  There are no
     royalties.  It is NOT "Public Domain".  When you get perl, you
     will find a README file that explains your license.  If you
     give perl to someone else, you have to give them this README
     file.  Please read and understand this file.

     There are at least two publicly available books explaining the
     use of perl in detail (see BOOKS ON PERL).  There is excellent
     support for perl on the internet in the comp.lang.perl
     newsgroup (see SUPPORT FOR PERL).

     perl is better than shell script programming and awk or sed
     programming because:

     *    It does not have all the shell initiation overhead, which
          makes it faster.

     *    It can read and write binary data files.

     *    It can have many files for input or output at one time.

     *    It has a report writer.

     *    It has extended regular expressions.

     *    It has both linear and associative arrays.

     *    It has powerful defaults that simplify programming.

     *    It can process very large files without record size
          limits.

     perl is better than C, C++, or Fortran programming (for
     sysadmin and data admin tasks) because:

     *    It does not have visible compile or link stages.  The
          program is kept in a source module, like a shell script.

     *    There is a very rich set of character string manipulation
          and array handling commands.

     *    It has a more forgiving and easier to use syntax.

     *    It has both linear and associative arrays.

     *    It has a report writer.

     *    Some versions of perl include X-windows features (tkperl)
          or access to Oracle databases (oraperl).

     In fairness, perl has some down points.

     *    perl programs typically take 1.7 times as long to execute
          as an equivalent C program.  This is okay for utility
          programs, but would be a problem for a program like
          SeisWorks.

     *    There are a number of "gotchas" that are pretty typical
          Unix characteristics, but would confuse non-programmers.
          For instance, a number starting with 0 is assumed to be
          octal.

                    $num = 010;
                    print "The number is $num. \n";

          will print

                    The number is 8.

     *    Mostly of interest to professional programmers, perl does
          not have a CASE construct.  It does not have pointers or
          let you take the address of anything (until perl 5).

     *    perl has a dozen different ways to do anything.  Some
          people don't like this, others do.

     *    perl has a lot of capabilities to do things most
          sysadmins and data admins will not understand.  Don't
          worry about it, just use what you need.  You don't have
          to know everything about a computer language to make good
          use of it.

     perl does not come on the standard SunOS 4.x distribution, but
     it is a very standard language.  perl is listed in the job
     descriptions compiled by the Systems Administrators Guild
     (SAGE) of Usenix, the Unix User's Group.  There are currently
     negotiations to include perl in the Solaris 2.5 release.

WHAT DOES A SIMPLE PERL PROGRAM LOOK LIKE?

     This simple program copies records from a file and prefixes
     each line with a sequential number.  The numbers at the left
     are not in the program, they just help the explanation.


                1   #! /usr/local/bin/perl
                2   while (<>)
                3   {
                4        print STDOUT ++$i, $_;
                5   }


     Explanation:

      1   #! is the Unix method for specifying a shell program.

          /usr/local/bin/perl is the standard place to put perl.

      2   while () {} creates a loop that continues while the
          statement in the () is true.  The statements in the loop
          are enclosed in {}.

          <> is a special default.  It tells perl to look at the
          calling command line to see if any files are specified.
          If they are, read each file in turn.  If no files are
          specified, read from standard input.  In either case, put
          the characters read into the special variable $_.  When
          <> reaches end-of-file, it returns false, which
          terminates the while loop.

      4   print is a simple, unformatted, printing method.

          STDOUT is the standard filehandle for Standard Output.
          Filehandles are specified in all caps in perl.

          ++$i says to increment the value of $i and make that
          value available to the print statement.  All scalar
          values (anything but a command, linear array, associative
          array, filehandle, or procedure name) start with $.

          $_ is the default operand of any command.  In this case,
          $_ contains the last record read by the <> statement.

          ; terminates each command in perl.

A BRIEF OVERVIEW OF PERL SYNTAX:

     In perl, as in Unix generally, character case is significant.
     X and x are not the same character.  It is common to name
     variables and other items in mixed case:

               $thisIsMixedCase

     It is also permissible to use underscores:

               $variable_with_underscores.

     Do not use names that start with a number, as these are often
     perl special symbols, $1, $2, etc.

     All perl commands end with a semicolon, ;.

Variables:

     perl identifies each type of variable - or data name - with a
     prefix character or identifying style.  These characters are:

          $    scalar              a single number (integer or
                                   real) or character string

          @    linear array        an array referenced by an index
                                   number

          %    associative array   an array referenced by a
                                   textual key

          UC   file-handle         a file handle is uppercase

          &    procedure           a subroutine

          xx:  label               object of goto, or marker for
                                   escape from a loop.

     "Subscripts" enclosed in [] apply to linear arrays.

          @items         refers to the entire array items.

          $items[1]      refers to the scalar value which is the
                         second item in the array items.  Linear
                         arrays start with the index 0.

          $#items        is the index of the last item in @items,
                         which is one less than the number of
                         items because indexes start from 0.

     Subscripts enclosed in {} apply to associative arrays.

          %items         refers to the entire associative array
                         items.

          $items{"x"}    refers to the scalar value matching the
                         key "x"
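     The subscript forms above can be tried out with a short
     sketch like this one.  The array names and values are made up
     for illustration only.

```perl
# Linear array: indexes start at 0.
@items = ("sand", "shale", "salt");
$second = $items[1];       # the second item
$last   = $#items;         # index of the last item
$count  = @items;          # number of items

# Associative array: values are looked up by a textual key.
%depth = ("sand", 1200, "shale", 1850);
$depth{"salt"} = 2400;

print "second=$second last=$last count=$count\n";
print "salt is at $depth{'salt'}\n";
```

     Note that @items, $items[1], and %depth all use different
     prefix characters even when they refer to parts of the same
     named item.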

     Values enclosed in () are lists.  Lists are often used as
     arguments to a subroutine or built-in function call.  It is
     not necessary to enclose arguments in () if there is only one
     argument or the program knows the limit of the list.

     There can be completely separate and unrelated variables $x,
     @x, %x, and &x, not to mention $X, @X, %X and &X.

     There are special variables, the most important of which are
     $_, @_, and @ARGV.

          $_ is the default scalar value.  If you do not specify a
          variable name in a function where a scalar variable goes,
          the variable $_ will be used.   This is a very heavily
          used feature of perl.

          @_ is the list of arguments to a subroutine.

          @ARGV is the list of arguments specified on the command
          line when the program is executed.


Basic Commands and Control:

     Braces, {}, are used to contain a block of program statements.
     It is possible to have local variables within a block.  Blocks
     are used for the objects of most control commands.

     Simple Assignment:

          Simple, scalar, assignment is what you might expect:

               $var = 1;
               $str = "This is a string.";

          One can also assign lists of scalars in one statement:

               ($rock, $jock, $crock) =
                    ("Plymouth", "Warren Moon", "Solaris 2.x");

          One can assign a list to an array:

               @items = (1, 2, "Cambodia", 4);

          or an array to a list:

               ($a, $b, $c, $d) = @items;

          Associative arrays need a key, but otherwise work as you
          would expect:

               $aa{"able"} = "x";

               %aa = ("able", "x", "baker", "y", "aardvark", "z");

          Assigning an ARRAY to a SCALAR will give the number of
          items in the ARRAY.

               @items = (10, 20, 30);
               $i = @items;
               print "$i";

          will print "3".

     Arithmetic Operations:

          perl has the usual operations, and many more:

               $c = $a + $b      addition
               $c = $a - $b      subtraction
               $c = $a * $b      multiplication
               $c = $a / $b      division
               $c = $a % $b      remainder
               $c = $a ** $b     exponentiation
               $c = $a . $b      concatenation

               ++$a, $a++        increment by 1
               --$a, $a--        decrement by 1

               $a += $b          increment by $b
               $a -= $b          decrement by $b
               $a .= $b          append $b to $a
               $c = "*" x $b     make $b *'s

          Of course, there are many more.

          There are also modifiers like these, which work inside
          double-quoted strings:

               $a = "Big And Little";
               $c = "\L$a";
               print $c;

          prints "big and little".

               \l        convert next character to lower case
               \u        convert next character to upper case
               \L        lowercase until \E
               \U        uppercase until \E
               \E        end case modification
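          A minimal sketch of the case modifiers, assuming they are
          written inside double-quoted strings (that is where perl
          interprets them).  The variable names are made up for
          the illustration.

```perl
$str   = "big And LITTLE";
$lower = "\L$str";             # whole string to lower case
$upper = "\U$str";             # whole string to upper case
$first = "\u$str";             # only the next character is changed
$mixed = "\U$str\E and $str";  # \E ends the upper-casing

print "$lower\n$upper\n$first\n$mixed\n";
```

          Only the text between \U (or \L) and \E is converted; the
          rest of the string is left alone.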

          There are functions for math including:

               log($x)
               exp($x)
               sqrt($x)
               sin($x)
               cos($x)
               atan2($y,$x)

          The only trig functions are sin, cos, and atan2; however,
          these can easily be used to compute the others.  The
          ERUUG Unix Cookbook (see BOOKS ON PERL) has a list of the
          formulas for the conversions.
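          As a sketch of how the other functions can be built, here
          are three of them using the usual textbook identities.
          The subroutine names tangent, arcsine, and arccosine are
          my own choices, not perl built-ins, and the inverse
          functions assume an argument strictly between -1 and 1.

```perl
$pi = atan2(1, 1) * 4;

# tan(x) = sin(x) / cos(x)
sub tangent   { local($x) = @_; sin($x) / cos($x); }

# asin(x) = atan2(x, sqrt(1 - x*x))
sub arcsine   { local($x) = @_; atan2($x, sqrt(1 - $x * $x)); }

# acos(x) = atan2(sqrt(1 - x*x), x)
sub arccosine { local($x) = @_; atan2(sqrt(1 - $x * $x), $x); }

printf "tan(pi/4) = %.4f\n", &tangent($pi / 4);
printf "asin(0.5) = %.4f\n", &arcsine(0.5);
printf "acos(0.5) = %.4f\n", &arccosine(0.5);
```

          All angles are in radians, as with sin and cos.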

     If-Then-Else:

          The basic if-then-else command is fairly typical of all
          computer languages.

               if ( condition )
               {
                    true branch
               }
               else
               {
                    false branch
               }

          There is also

               if    (condition) {commands}
               elsif (condition) {commands}
               elsif (condition) {commands}

          which simplifies a lot of complex nested if statements.
          Note that it is elsif, not elseif or else if.

          Both the true and false branches may contain any number
          of nested if statements.

          There is also another form of if statement, which runs
          its block when the condition is false:

               unless (condition)
               {
                    false branch
               }

          The condition has a wide range of comparison operators.
          It is important to observe the distinction between
          numeric comparisons and string comparisons.


               numeric   string    meaning
               ==        eq        equals
               !=        ne        not equal
               >         gt        greater than
               <         lt        less than


          Strings that do not consist of numbers have a value of
          zero.

               if ("abc" == "def")

          is TRUE, because the strings are numerically zeros.  To
          make this work right you have to have

               if ("abc" eq "def")
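          A short sketch of the difference, using made-up variable
          names to hold the result of each comparison:

```perl
# Numeric comparison: both strings count as 0, so they are "equal".
$numeric = ("abc" == "def") ? "true" : "false";

# String comparison: compared character by character.
$string  = ("abc" eq "def") ? "true" : "false";

# The reverse trap: the same number written two ways.
$sameNumber = ("10" == "10.0") ? "true" : "false";
$sameString = ("10" eq "10.0") ? "true" : "false";

print "abc == def is $numeric, abc eq def is $string\n";
print "10 == 10.0 is $sameNumber, 10 eq 10.0 is $sameString\n";
```

          Use the numeric operators for numbers and the string
          operators for text, and the surprises go away.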

          perl has file test operators like shell scripts.  perl
          has an extended set of tests such as:


                       -T     true if file is text
                       -B     true if file is binary
                       -M     days since file modified
                       -A     days since file accessed
                       -C     days since file inode changed


          Other forms of the if-command are not common in other
          computer languages, but can be quite useful.  A good
          example is the postfix if.

               next if $var == 1;

          A useful form of logic uses || or && in a command:

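               open (IN,"<$filein") || die "No file $filein $!\n";

     Files:

          Opening:

               open (FILEHANDLE,"XFY");

                    X = <     to open file F for read only.
                    X = >     to open file F for write only.
                    X = >>    to append to file F.
                    X = |     to WRITE to a pipe to PROGRAM F.
                    Y = |     to READ from a pipe from PROGRAM F.

               If only the filename is provided, the file is
               opened for read only.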

          Reading:

          The most basic reading mechanism is to enclose the
          filehandle in <>, like this

               $record = <IN>;

          A special case of this goes like this:

               $record = <>;

          This special case looks for filenames on the program
          command line and reads any files it finds, one after the
          other.  If it finds no filenames on the program command
          line, the program will assign <> to STDIN.

          It is important NOT to use the array form:

               @record = <IN>;

          as this will read the entire file into the array @record,
          which may take up an awful lot of memory.

          Reading is often done using a while loop, like this:

               while (<IN>)
               {
                    commands
               }

          When the last record is read, the <IN> returns
          the value FALSE, which terminates the while loop.  Since
          a scalar variable has not been supplied for the record,
          the record is stored in $_.

          Writing:

          Most writing is done using the print or the printf
          commands.  These commands are used to write to files even
          if the results are never actually printed on a hardcopy
          device.

          print writes a line with default line spacing.  It is
          used when the output has no particular column spacing to
          comply with:

               print STDOUT "The X is $x and Y is $y\n";

          printf is just like the printf in C and other similar
          languages.  It is a formatted print.  The first variable
          or string contains the format.

               $fmt = " X =  %8.2f  Y = %8.2f  Flag = %s\n";
               printf STDOUT ($fmt, $x, $y, $flag);

          The \n is the new line character.  The % indicates the
          beginning of a format character, the f is the format for
          floating point numbers.  The 8.2 indicates the number is
          8 characters long with a decimal point in the sixth
          character, and two decimal places in the seventh and
          eighth characters.  The %s is a character string with no
          length specified.
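          To see the column layout for yourself, the same format
          can be handed to sprintf, which returns the formatted
          string instead of printing it.  The values here are made
          up for the illustration.

```perl
$x = 1234.5678;
$y = -3.14159;
$flag = "ok";

$fmt  = " X =  %8.2f  Y = %8.2f  Flag = %s\n";
$line = sprintf($fmt, $x, $y, $flag);
print $line;

# Each %8.2f field is padded on the left to 8 characters.
$xField = sprintf("%8.2f", $x);
$yField = sprintf("%8.2f", $y);
print "[$xField] [$yField]\n";
```

          The brackets in the second print make the leading spaces
          visible.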

          Closing:

          perl will automatically close any open files when it
          exits.  There are some occasions where it is useful to
          close a file before perl exits, so the there is an
          explicit close.

               close FILEHANDLE;
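          Putting the pieces together, here is a sketch of a full
          open, write, append, read, and close cycle.  The scratch
          file name under /tmp is an assumption for the example;
          use any file you are allowed to write.

```perl
$file = "/tmp/perl_tutorial_demo.$$";   # hypothetical scratch file

# > creates the file (or empties an existing one) for writing.
open (OUT, ">$file") || die "Cannot make $file $!\n";
print OUT "first line\n";
close OUT;

# >> appends to the end of the file.
open (OUT, ">>$file") || die "Cannot append to $file $!\n";
print OUT "second line\n";
close OUT;

# < opens the file for reading.
open (IN, "<$file") || die "No file $file $!\n";
$count = 0;
while (<IN>)
{
     $count++;
     $last = $_;          # each record arrives in $_
}
close IN;
unlink $file;             # remove the scratch file

print "Read $count records, the last was: $last";
```

          The file is closed after each step so that the next open
          sees everything that was written.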

Other Important Functions:

     Error Messages:

          die is used to print an error message and then exit.

          warn is used to print an error message, but continue.

     String Handling:

          split is used to split tokens (fields) from a character
          string into an array.

          If you have a line:

               $line = "Now is the time for all good men";

          you can put each word into an array with the command:

               @token = split(/\s+/,$line);

          sort sorts a list or array.

          study, an instruction I issue many times to my 12-year-
          old, optimizes string operations.
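          A small sketch combining split and sort on the line above.
          The default sort compares strings, so the capitalized
          word sorts ahead of the lower case ones.

```perl
$line = "Now is the time for all good men";

@token  = split(/\s+/, $line);   # one word per array element
$words  = @token;                # number of words
@sorted = sort @token;           # default sort is by string order

print "$words words: @sorted\n";
```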

     Binary Encoding:

          pack      packs values into a string using a template.

                         $pi = pack("f",3.1415926);

                    packs pi into a 32 bit binary floating point value.

          unpack    extracts values from a string using a
                    template.

                         $pi2 = unpack("f",$pi);

          There is a long list of templates you can use.  You can
          use more than one template at a time to build up or
          extract binary data from a record.

                    l    long      32 bit signed integer
                    L    long      32 bit unsigned integer
                    s    short     16 bit signed integer
                    S    short     16 bit unsigned integer
                    f    float     32 bit floating point
                    d    double    64 bit floating point
                    A    ASCII     ASCII string
                    c    char      a single byte (character)
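          As a sketch of using several templates at once, this
          builds a small binary record and takes it apart again.
          The record layout (an 8 character name, a short, a long,
          and a float) is invented for the example and is not any
          particular Landmark format.

```perl
# A8 = 8 character ASCII field, s = short, l = long, f = float
$template = "A8slf";

$record = pack($template, "HORIZON", 12, 345678, 2.5);
$bytes  = length($record);       # 8 + 2 + 4 + 4 bytes

($name, $line, $trace, $amp) = unpack($template, $record);

print "$bytes bytes: $name $line $trace $amp\n";
```

          The same template is used in both directions, which is
          the easiest way to keep pack and unpack in step.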

     System:

          There are many system oriented functions including:

          chmod     change file permissions

          fcntl     sets file control options

          fork      creates an independent sub-process.

          mkdir     make a directory


Regular Expressions:

     Regular expressions and pattern matching are an important part
     of all Unix programming.  perl adds a set of extended regular
     expression characters to the standard set.

     There are two ways regular expressions are used:

     Match          m/regexp/
                    m is optional, you can use /regexp/

               next if m/^\s*$/;  will skip blank lines.

     Substitute     s/regexp/new/
                    If the regexp matches, replace it with new.

               s/\s*$//;  will trim trailing spaces from a line.


                    Standard Set (not complete)


               a         match a
               a*        match zero or more character a's
               .         match any character
               .*        match zero or more of .
               [a-m]     match characters a through m only
               [^n-z]    do not match letters n to z
               [a-m]*    match zero or more letters a to m
               ^         match the beginning of the line
               $         match the end of the line
               \t        matches a tab character




                    perl extensions (not complete)

               \d        same as [0-9]
               \D        same as [^0-9]
               \s        matches white space (space, tab, newline, etc.)
               \S        matches anything but white space
               \w        same as [0-9a-zA-Z_]
               \W        same as [^0-9a-zA-Z_]
               .+        same as ..*
               [a-m]+    match one or more letters a to m
               a{n,m}    at least n a's, not more than m a's
               a?        zero or one a, not more than one
               \cD       matches control-D
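     A few of these can be tried in a short sketch.  The sample
     strings are made up.  The =~ operator, not otherwise covered
     in this paper, applies a match to a value other than $_.

```perl
# Substitute: trim trailing spaces and tabs from $_.
$_ = "Line 120   Trace 45   \t ";
s/\s*$//;
$trimmed = $_;

# Match: a line holding only white space counts as blank.
$blank = ("   \t" =~ m/^\s*$/) ? "blank" : "not blank";

# \d+ between ^ and $ accepts only strings made of digits.
$isNumber = ("3150" =~ /^\d+$/) ? "yes" : "no";

# One or more letters a to m, followed by two to four digits.
$isCode = ("ab12" =~ /^[a-m]+\d{2,4}$/) ? "yes" : "no";

print "[$trimmed] $blank $isNumber $isCode\n";
```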


     An important use of regular expressions is the use of () to
     select subsets of the regular expression.  This is actually a
     standard part of regular expressions and can be used in vi,
     awk, sed, and anywhere regular expressions are found.  perl
     makes it especially easy to use the ().

     For instance, if you had the character string:

          "SeisWorks 3D"   "s3d 2> /dev/null"

     as is found in launcher.dat, you could use the regular
     expression:


          $_ = <IN>;
          if ( m/^\t"(.+)"\s*"(\S+)\s+2>\s*(.+)"$/ )
          {
               ($title, $program, $errorFile) = ($1, $2, $3);
          }

     to extract the title, program name, and the error file name.

     The way this works is:

          $_ = <IN>;     reads a record from the filehandle IN
                         and stores it in $_.

          m/.../         matches a regular expression.  Since it
                         doesn't say what variable to use, it uses
                         $_.

          ^              matches the beginning of the line
          \t             matches the initial tab.
          "              matches the first "
          (              starts the first extracted string
          .+             matches one or more of any character
          )              closes the first extraction, placing it
                         in $1
          "              matches the second "
          \s*            matches zero or more spaces or tabs
          "              matches the third "
          (              starts the second extraction
          \S+            matches one or more characters other than
                         space or tab
          )              closes the second extraction, placing it
                         in $2
          \s+            matches one or more spaces or tabs.
          2              matches 2
          >              matches >
          \s*            matches zero or more spaces or tabs
          (              starts the third extraction
          .+             matches one or more characters
          )              closes the third extraction, placing it
                         in $3
          "              matches the fourth "
          $              matches the end of the line

          $title = $1;   puts the value from $1 into $title.
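     The walk-through can be checked with a sketch that puts the
     sample line straight into $_ instead of reading it from
     launcher.dat.  The pattern here includes the closing quote
     before the end of the line, as the walk-through describes.

```perl
# The sample record, with its leading tab and embedded quotes.
$_ = "\t\"SeisWorks 3D\"   \"s3d 2> /dev/null\"";

if ( m/^\t"(.+)"\s*"(\S+)\s+2>\s*(.+)"$/ )
{
     ($title, $program, $errorFile) = ($1, $2, $3);
}

print "$title | $program | $errorFile\n";
```

     Copy $1, $2, and $3 into your own variables right away, since
     the next successful match will overwrite them.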

Report Writer:

     The report writer feature lets you define how your page should
     look and do all the necessary assignments with a single
     command.  The report writer takes care of page breaks, page
     numbers, and other issues for you.


          format STDOUT_TOP =
                    Projects Using Too Much Disk     page @##
          $%
              Project      Owner     Last Used       Cost
          --------------  --------  ------------- -----------
          .
          format STDOUT =
          @<<<<<<<<<<<<<  @<<<<<<<  @>>>>>>>>>>>> @#######.##
          $project,       $owner,   $lastUsed,    $cost
          .

          while (<>)
          {
               ($project, $owner, $lastUsed, $cost) = split;
               write;
          }


     _TOP      indicates a heading
     .         ends a format description
     $%        is the page number
     @<<<<     is a left justified field
     @>>>>     is a right justified field
     @###.##   is a right justified, two decimal number


Debugger:

     perl has a built-in debugging system.

     To use the debugger, all you have to do is add a -d to the
     first line of the program.

          #! /usr/local/bin/perl -d
          commands

     When you run the program, it will start in debug mode.  You
     then have many debugging commands you can use including:

          h         help on debugger
          s         step
          c         continue to next break
          c line    continue until line
          n         next (does not step into subroutines)
          l range   list program statements in the range
          b line    sets a breakpoint at line
          p expr    prints expr, which is usually a variable

AN EXAMPLE USING AN EXPORTED HORIZON FILE:

Background:

     A typical exported horizon file from SeisWorks is in the form:

          Line    Trace    X    Y    Z

     where  Z is often the time, but in this example, we are going
     to export the amplitude as Z.

     What we want to do in this example is to clip the amplitudes
     to some specific range of values.  Anything below the range
     will be set to the lowest value in the range, anything above
     will be clipped back to the highest value in the range.

     This can also be done using bcm.  We chose this example
     because it is easy to follow and can be extended to do things
     bcm cannot do.  This simplified example is taken from a
     program used by BHP Petroleum (Americas) to suppress tuning
     effects resulting from a formation thickness being close to
     the size of a seismic wavelength.

     This example is also kept simple.  An experienced perl
     programmer would use more sophisticated programming to write
     a shorter, faster, program.


Usage:

     Before using this program the first time, you must use

          chmod +x horizonClip

     to make it an executable file.  There is no compile step or
     link step as in C, Fortran or other languages.

     You have to extract the file using the data export feature of
     SeisWorks.

     The program is called by typing:

          horizonClip   low   high  filein  fileout

     You can then re-import the horizon using the data import
     feature of SeisWorks.

Program:

     Note:  The line numbers do not appear in the file or in the
     program, they are just used in this paper to help you follow
     the program:


         1   #! /usr/local/bin/perl
         2   die "Usage: horizonClip low high in out\n"
         3         if $#ARGV != 3;
         4   $low  = $ARGV[0];
         5   $high = $ARGV[1];
         6   if ($low > $high)
         7   {
         8        $tmp = $low;
         9        $low = $high;
        10        $high = $tmp;
        11   }
        12   $filein  = $ARGV[2];
        13   $fileout = $ARGV[3];
        14   if ($filein eq "-")
        15   {
        16        open (IN,"<&STDIN");
        17   }
        18   else
        19   {
        20        open (IN,"<$filein")
        21            || die "No file $filein $!\n";
        22   }
        23   if ($fileout eq "-")
        24   {
        25        open (OUT,">&STDOUT");
        26   }
        27   else
        28   {
        29        open (OUT,">$fileout")
        30            || die "Cannot make $fileout $!\n";
        31   }
        32   while (<IN>)
        33   {
        34        ($line, $trace, $x, $y, $z) = split(/\s+/);
        35        if ($z < $low)  {$z = $low;}
        36        if ($z > $high) {$z = $high;}
        37        printf OUT ("%20s %12s %12s %12s %12.2f\n",
        38              $line, $trace, $x, $y, $z);
        39        $count++;
        40   }
        41   print STDOUT "Processed $count records\n";


Details:

      1        The first line of all perl programs (on Unix
               platforms).

      2 -  3   Post-fix if.  There are four arguments.  $#ARGV is
               3 because the indexes of @ARGV start at 0.

      4 -  5   We could have as easily said:

               ($low, $high, $filein, $fileout) = @ARGV;

     16, 25    We can assign filehandles to other filehandles
               (merge the output of the filehandles) using the
               open statement and the &.

     20, 29    These are more standard opens.

     32 - 40   The while loop continues until <IN> becomes false.

     39        $count++ adds one to the value of $count.
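     One way to extend the program is to move the clipping into a
     subroutine so it can be reused or replaced.  This is only a
     sketch: the subroutine name clip and the sample amplitudes
     are my own, not part of the BHP program.

```perl
# Return $value limited to the range $low .. $high.
sub clip
{
     local($value, $low, $high) = @_;
     ($low, $high) = ($high, $low) if $low > $high;   # reversed range
     $value = $low  if $value < $low;
     $value = $high if $value > $high;
     $value;
}

@amplitudes = (-50, 12.5, 300);    # made-up Z values
@clipped = ();
foreach $z (@amplitudes)
{
     push(@clipped, &clip($z, 0, 100));
}
print "@clipped\n";
```

     In the main program, lines 35 and 36 would then become a
     single call: $z = &clip($z, $low, $high);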


HOW TO GET PERL:

     perl is free, but it is NOT PUBLIC DOMAIN.  Public Domain means
     there is no identifiable owner or the public at large is the
     owner.  perl is owned by Larry Wall.  Larry gives everybody a
     free LICENSE to use perl.  That is not the same as ownership.
     If Larry let you use his lawnmower for free, you wouldn't own
     it.  There is a license file that comes with perl.  READ AND
     UNDERSTAND THE LICENSE.  Read it again before selling any
     software based on perl.

     Many of the CD-ROMs available have perl on them.  In many
     cases you can get the perl executable binary so you don't even
     have to compile it.  These CD-ROMs are advertised in most
     Unix trade magazines.

     Some Walnut Creek and some Prime Time Freeware CD-ROMs have
     perl in SunOS executable form.  There is a book available at
     BookStop and other bookstores called:

               Prime Time Freeware for Unix, $60.00
               ISBN 1-881957-04-7

     The CD-ROM in the book Unix Power Tools has perl on it, and is
     available from several bookstores in the Houston area.

               Unix Power Tools, $59.95
               ISBN 0-679-79073-X

     The best way to get perl is via the Internet.  This will get
     you the latest version with the latest bug fixes.  One place
     to get perl on the Internet is:


          ftp   ftp.uu.net
          login: anonymous
          password: your-internet-name@your-internet-site
          ftp> cd /gnu
          ftp> binary        ----- DON'T FORGET THIS LINE
          ftp> get perl-4xxxx.tar.Z
          ftp> bye


     When you get it back to your machine, you will need to
     uncompress it, un-tar it, and execute the make command.

     It is a good idea to get gcc (also free) rather than using the
     bundled C compiler on SunOS.  gcc will make a much faster
     executable of perl.


BOOKS ON PERL

     The main reference is

               Programming Perl, usually called "The Camel Book",
               by Larry Wall and Randal Schwartz.
               Published by O'Reilly & Associates,
               ISBN 0-937175-64-1.

     A more tutorial, but less complete, book is

               Learning Perl, usually called "The Llama Book",
               by Randal Schwartz.
               Published by O'Reilly & Associates,
               ISBN 1-56592-042-2.

     A book giving examples of perl for systems administration is
     supposed to come out soon, but I have been unable to get
     details about it.

     The Energy Related Unix User's Group (ERUUG) Unix Cookbook has
     several example programs related to petroleum.  This book is
     available to ERUUG members and to guests.  It is also
     available on the Internet at this World Wide Web location:

                          http://www.glg.ed.ac.uk/



SUPPORT FOR PERL:

     The main source of support for perl is the Internet newsgroup

                              comp.lang.perl

     All the big names in perl follow this newsgroup and many
     people on the net will answer questions.  I usually get an
     answer in a few hours.  That is better than any computer
     department, vendor help desk, or on-site support
     representative I have ever dealt with.

          Some big names on the Internet for perl include:

               Larry Wall (author of perl)
               lwall@netlabs.com

               Randal Schwartz
               merlyn@stonehenge.com

               Tom Christiansen
               tchrist@perl.com
               (303) 444-3212

          Randal Schwartz and Tom Christiansen are consultants.

     The Energy Related Unix User's Group (ERUUG) has several
     members with at least some perl experience.

     There are several consulting services that can install and
     support perl.  Sometimes MIS Departments insist that all
     programs acquired must be paid for and have paid support.  Of
     course no one supports most programs in Unix, particularly not
     awk or the bundled C compiler, but most MIS Departments are
     still learning about Unix and want to run things the way they
     did on the VAX or IBM mainframe.

     You can usually get around the "MIS shuffle" by buying the CD.
     Cygnus Support sells support for many free software packages,
     and unlike most software vendors supporting their own
     packages, Cygnus Support actually provides support.

                                APPENDIX I

SOME USEFUL PERL SCRIPTS:

     These scripts have been written to illustrate points made in
     this paper.  They are not always the most efficient, compact,
     or best way to write the particular script.

          Correct bcm2d histogram:           page 23
          Dumping SEG-Y Headers:             page 24
          Change nulls to spaces in file:    page 25
          Splitting an HPGL file:            page 25

Program to correct bcm histogram:

     The histogram feature of the bcm program has a small but
     annoying round-off error.  It also does not make a visual
     histogram.  This program reads a bcm listing, selects the
     histogram portion, recalculates the percentages and draws a
     histogram to the side.

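          #! /usr/local/bin/perl
          if ($#ARGV != 1)
          {
               print STDERR "Usage: histofix in.file out.file\n";
               exit 1;
          }
          open (IN,  "<$ARGV[0]") || die "Can't read $ARGV[0]: $!\n";
          open (OUT, ">$ARGV[1]") || die "Can't write $ARGV[1]: $!\n";
          # copy everything up to and including the summary heading
          while (<IN>)
          {
               print OUT;
               last if /\*\*\* *\.STATS *: *Summary/;
          }
          # copy the next two lines, adding a blank line after each
          $skip = <IN>; print OUT "$skip\n";
          $skip = <IN>; print OUT "$skip\n";
          # read the interval and count columns up to a blank line
          while (<IN>)
          {
               last if m/^ *$/;
               m/^\s*(\d+\.\d*)\s+(\d+) /;
               ($interval[$i],$count[$i]) = ($1, $2);
               $total += $count[$i];
               $big = $count[$i] if $count[$i] > $big;
               $i++;
          }
          # rewrite each row with recalculated percentages and a bar
          while ($i > $j)
          {
               $pc = ($count[$j] * 100.0) / $total;
               $pct += $pc;
               $graph = "X" x int((($count[$j] / $big) * 20) + 1);
               printf OUT "%12.4f %15.0f %7.2f %7.2f %s\n",
                    $interval[$j], $count[$j], $pc, $pct, $graph;
               $j++;
          }
          # copy the rest of the listing unchanged
          while (<IN>) {print OUT;}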
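
     The arithmetic in the second loop can be tried on its own.
     The following is a minimal sketch, not part of the original
     program: the subroutine name histogram_rows and the four
     counts are made up for illustration.  It recalculates the
     percent and cumulative percent for each count and scales the
     bar the same way the program above does.

```perl
#! /usr/local/bin/perl
# Return one formatted line per interval: count, percent,
# cumulative percent, and a bar of X's scaled to the largest count.
sub histogram_rows
{
     local(@values) = @_;
     local($total, $big, $pct, $pc, $graph, @rows) = (0, 0, 0);
     foreach $c (@values)
     {
          $total += $c;
          $big = $c if $c > $big;
     }
     foreach $c (@values)
     {
          $pc = ($c * 100.0) / $total;
          $pct += $pc;
          $graph = "X" x int((($c / $big) * 20) + 1);
          push(@rows,
               sprintf("%6d %7.2f %7.2f %s", $c, $pc, $pct, $graph));
     }
     return @rows;
}

# hypothetical counts for four histogram intervals
foreach $row (&histogram_rows(5, 20, 10, 5)) { print "$row\n"; }
```

     Because of the "+ 1", the largest interval always gets 21 X's
     and even a very small count gets at least one.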


Dumping SEG-Y Headers:

     This program is around 500 lines long and thus too long to
     include here in its entirety.  The program reads the EBCDIC,
     Binary, and first Trace header of a SEG-Y file on disk or
     tape.

          #! /usr/local/bin/perl
          for $i (0..255) {$ebcdic{$i} = "_";}
          $ebcdic{  0} = "~";
          $ebcdic{ 64} = " ";
               ...
          $ebcdic{129} = "a";
               ...
          $ebcdic{193} = "A";
               ...
          $ebcdic{249} = "9";
          $binHeadTemplate = "l3s25s170";
          $traceHeadTemplate = "l7s4l8s2l4s13S2s31f5ss17";
               ...
          sysread (IN,$ebcdicHeader,3200);
          sysread (IN,$binaryHeader,400);
          sysread (IN,$traceHeader,240);
          print STDOUT "--------------EBCDIC---------\n";
          for $i (0..3199) {substr($asciiHeader,$i)
               = $ebcdic{ord(substr($ebcdicHeader,$i,1))} };
          for $i(0..39)
          {
               $line = substr($asciiHeader,$i*80,80);
               print STDOUT "$line\n";
          }
          (    $jobid,
               $lineid,
               $reel,
               ...
               $vibratoryPolarity
          ) = unpack($binHeadTemplate,$binaryHeader);
          print STDOUT "--------------Binary---------\n";
          print STDOUT "jobid              $jobid             \n";
          print STDOUT "lineid             $lineid            \n";
          print STDOUT "reel               $reel              \n";
          ...
          print STDOUT "vibratory polarity $vibratoryPolarity \n";
          (    $traceLine,
               $traceReel,
               ...
               $overTravelTaper,
          ) = unpack($traceHeadTemplate,$traceHeader);
          print STDOUT "--------------Trace----------\n";
          print STDOUT "Trace Line         $traceLine         \n";
               ....
          print STDOUT "Over Travel Taper  $overTravelTaper   \n";
          exit;

     The information here should give anyone who is familiar with
     SEG-Y enough information to reconstruct the program.  If you
     are not familiar with SEG-Y, obtain the Seismic Unix (SU)
     package from the Center for Wave Phenomena at the Colorado
     School of Mines.  This package contains more than enough
     information to complete this program.
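
     The unpack step can be tried without a real SEG-Y file.  The
     sketch below builds a made-up 400-byte header with pack and
     reads the leading fields back.  The layout used here (three
     4-byte fields, three 2-byte fields, then padding) is a
     simplified assumption for illustration, not the full SEG-Y
     definition, and the subroutine names are hypothetical.  The N
     and n formats (unsigned, big-endian) are used in place of the
     l and s formats of the program above so that the result does
     not depend on the byte order of the machine.

```perl
#! /usr/local/bin/perl
# Build a synthetic 400-byte binary header (values are made up).
sub make_bin_header
{
     local($jobid, $lineid, $reel) = @_;
     # three 4-byte fields, three 2-byte fields, 382 null bytes
     return pack("N3n3x382", $jobid, $lineid, $reel, 1, 2, 3);
}

# Unpack the six leading fields of a binary header.
sub read_bin_header
{
     local($header) = @_;
     return unpack("N3n3", $header);
}

$binaryHeader = &make_bin_header(1001, 42, 7);
($jobid, $lineid, $reel) = &read_bin_header($binaryHeader);
print "header length ", length($binaryHeader), "\n";
print "jobid $jobid  lineid $lineid  reel $reel\n";
```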

Program to convert nulls to spaces in bcm output:

     bcm2d and bcm3d have some sloppy code that prints nulls
     instead of spaces in their output.  This is usually okay for
     vi and more, but interferes with the correct operation of aXe
     or Xless.  It also makes it harder to read the file into a
     spreadsheet.  This program looks in any file for null
     characters and changes them to spaces:

          #! /usr/local/bin/perl
          open (IN,"<$ARGV[0]");
          open (OUT,">$ARGV[1]");
          while (!eof(IN))
          {
               $c = getc(IN);
               $c = " " if ord($c) == 0;
               print OUT $c;
          }
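
     The same conversion can be done on a whole string at once
     with the tr operator instead of reading one character at a
     time.  This is a sketch of that alternative, not the program
     above; the subroutine name nulls_to_spaces and the sample
     string are made up for illustration.

```perl
#! /usr/local/bin/perl
# Return a copy of $text with every null byte changed to a space.
sub nulls_to_spaces
{
     local($text) = @_;
     $text =~ tr/\0/ /;
     return $text;
}

# a made-up record with nulls where the spaces should be
$sample = join("\0", "interval", "count", "percent");
print &nulls_to_spaces($sample), "\n";
```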


Converting a monolithic HPGL file to records:

     It is common to find HPGL and other plotter control files
     given as one long record with no new-lines.  These files are
     hard to troubleshoot or transfer between programs.  The fold
     command can split the file into arbitrary length records, but
     what you want is to be able to make sense of the commands.

     HPGL files contain plotter commands separated by semicolons.
     HPGL ignores embedded new-lines.

     A perl program to fix this can be as simple as:

          #! /usr/local/bin/perl
          while (<>)
          {
               s/;/;\n/g;
               print;
          }

     It could actually be done as simply as this one-liner, which
     edits hpgl.file in place and keeps the original as
     hpgl.file.bak:

          perl -pi.bak -e 's/;/;\n/g' hpgl.file
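
     To see what the substitution does, here it is applied to a
     short string.  This is only a sketch: the subroutine name
     split_hpgl and the plot commands in the sample are made up
     for illustration.

```perl
#! /usr/local/bin/perl
# Return the plot data with a new-line after every semicolon.
sub split_hpgl
{
     local($plot) = @_;
     $plot =~ s/;/;\n/g;
     return $plot;
}

# a made-up monolithic record of four plotter commands
$sample = "IN;SP1;PU0,0;PD100,0,100,100;";
print &split_hpgl($sample);
```

     Each command ends up on a line of its own, which is much
     easier to search with grep or to compare with diff.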


                                APPENDIX II

PERL 5:

     perl 5 was released just as this report was being prepared.
     These are a few new features we expect to see in perl 5.

     *    awk-like BEGIN and END sections.

     *    Better access to system function calls.

     *    Pointers (references) and structures.

     *    Object-oriented programming features.

     *    Additional regular expression features.
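
     A short sketch of two of these features follows.  It needs
     perl 5 and will not run under perl 4.  The variable names and
     the values are made up for illustration.

```perl
#! /usr/local/bin/perl
# BEGIN runs as soon as it is compiled, before the main program.
BEGIN { $started = 1; }

# A reference to an anonymous hash works much like a pointer to
# a structure; one field here is itself a reference to a list.
$horizon = { 'name' => 'top_sand', 'picks' => [1200, 1210, 1195] };

$npicks = scalar(@{ $horizon->{'picks'} });
print "$horizon->{'name'} has $npicks picks\n";

# END runs just before the program exits.
END { print "done\n"; }
```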


Tkperl 5:

     Tk is a set of libraries and functions to create X-windows
     "widgets", picture elements such as scroll bars, pull-down
     menus, and even "canvas" graphics.

     Tkperl 4 used embedded Tcl (Tool Command Language) to use Tk.
     Tkperl 5 has native access to Tk.

                               APPENDIX III

OTHER FREE SOFTWARE YOU SHOULD KNOW ABOUT:

expect:

     expect is a program that lets you run, unattended, the kind
     of programs that ask you stupid questions every fifteen
     minutes or so.  An expect script can read what the program
     prints and give it an answer according to your instructions.

     A good example is bcm3d.  Part way through the program, bcm3d
     asks you for a reel number.  Later on in the program, it asks
     if you want to run the program.  This makes it hard to put
     together a shell script of ten or twenty bcm3d jobs and run
     them overnight.  expect can anticipate these questions and
     answer them for you.  You can start the job overnight and go
     home to your family.

     This is an example:


               #! /usr/local/bin/expect -f
               #       disable timeouts
               set timeout -1
               #       start the bcm3d program using the
               #       parameter specifying a .pcl file
               spawn bcm3d [lindex $argv 0]
               #       wait for the reel number question
               expect "*number :*"
               exec sleep 1
               send 1\r
               #       wait for the ready question
               expect "*Ready, or A to Abort :*"
               exec sleep 1
               send r\r
               #        wait for completion message
               expect "*ended normally*"
               exec sleep 1
               exit

     There is a book coming out about expect that you will want to
     read.

               Exploring Expect
               by Don Libes
               Published by O'Reilly & Associates,
               ISBN: 1-56592-090-2

Tcl/Tk:

     Tcl/Tk is a shell-script-like language for writing X-windows
     applications.  It is not terribly easy, but is much easier
     than writing C, C++, Motif and X programs.


     There is a new book out about Tcl/Tk.

               Tcl and the Tk Toolkit
               by John Ousterhout
               Published by Addison-Wesley
               ISBN: 0-201-63337-X

GNUtar:

     GNUtar is like regular tar except:

     *    It can write tapes across the network (no more RFS or dd
          to worry about).

     *    It can compress files as it backs them up.

     *    It strips the leading slash off the path, so you don't
          have absolute paths in your tar archive.

     If you get nothing else, get GNUtar.


gzip / gunzip:

     gzip, gunzip, and znew are programs that work basically like
     the standard SunOS compress and uncompress, except that they
     typically get 40% more compression.  znew takes a file
     compressed with compress and turns it into a gzip file.  Files
     compressed with gzip have a .gz extension rather than the .Z
     extension of compress.