Read-Ahead and Python Generators

One of the early classics of program design is Michael Jackson’s Principles of Program Design (1975), which introduced (what later came to be known as) JSP: Jackson Structured Programming.

Back in the 1970′s, most business application programs did their work by reading and writing sequential files of records stored on tape. And it was common to see programs whose top-level control structure looked like (what I will call) the “standard loop”:

open input file F

while not EndOfFile on F:
    read a record
    process the record

close F

Jackson showed that this way of processing a sequence almost always created unnecessary problems in the program logic, and that a better way was to use what he called a “read-ahead” technique. 

In the read-ahead technique, a record is read from the input file immediately after the file is opened, and then a second “read” statement is executed after each record is processed.

This technique produces a program structure like this:

open input file F
read a record from F     # get first

while not EndOfFile on F:
    process the record
    read the next record from F  # get next

close F

I won’t try to explain when or why the read-ahead technique is preferable to the standard loop. That’s out of scope for this blog entry, and a good book on JSP can explain that better than I can. So for now, let’s just say that there are some situations in which the standard loop is the right tool for the job, and there are other situations in which read-ahead is the right tool for the job.

One of the joys of Python is that Python makes it so easy to do “standard loop” processing on a sequence such as a list or a string.

for item in sequence:
    processItem(item)

There are times, however, when you have a sequence that you need to process with the read-ahead technique.

With Python generators, it is easy to do. Generators make it easy to convert a sequence into a kind of object that provides both a get next method and an end-of-file mark.  That kind of object can easily be processed using the read-ahead technique.

Suppose that we have a list of items (called listOfItems) and we wish to process it using the read-ahead technique.

First, we create the “read-ahead” generator:

def ReadAhead(sequence):
    for item in sequence:
        yield item
    yield None # return the "end of file mark" after the last item

Then we can write our code this way:

items = ReadAhead(listOfItems)
item = items.next()  # get first
while item:
    processItem(item)
    item = items.next()  # get next

Here is a simple example.

We have a string (called “line”) consisting of characters. Each line consists of zero or more indent characters, some text characters, and (optionally) a special SYMBOL character followed by some suffix characters. For those familiar with JSP, the input structure diagram looks like this.

line
    - indent
        * one indent char
    - text
        * one text char
    - possible suffix
        o no suffix
        o suffix
            - suffix SYMBOL
            - suffix
                - one suffix char

We want to parse the line into 3 groups: indent characters, text characters, and suffix characters.

indentCount = 0
textChars = []
suffixChars = []

# convert the line into a list of characters
# and feed the list to the ReadAhead generator
chars = ReadAhead(list(line))

c = chars.next() # get first

while c and c == INDENT_CHAR:
    # process indent characters
    indentCount += 1
    c = chars.next()

while c and c != SYMBOL:
    # process text characters
    textChars.append(c)
    c = chars.next()

if c and c == SYMBOL:
    c = chars.next() # read past the SYMBOL
    while c:
        # process suffix characters
        suffixChars.append(c)
        c = chars.next()

Python runs JSD

Of possible interest to those interested in systems analysis methodologies.

http://master.dl.sourceforge.net/project/pyjsd/MissGrantsControllerInJSD.pdf

A synopsis…

Miss Grant’s Controller: A JSD specification

For the purpose of writing computer software specifications, it is useful to view a computer software system as a software “machine” that transitions from state to state under the control of an input stream of events.

Traditionally, computer system specifications focus on the states and the transitions between the states: they view the system as a state machine and use state transition diagrams (STDs) to specify the behavior of the machine.

In contrast, Jackson System Development (JSD) specifications focus on the events and the sequence in which the events may occur. JSD views a software machine as a simulation in which model processes (coroutines running inside the system) simulate real-world processes. The model processes running in the machine are synchronized with their real-world counterparts by means of events – events sent from the real-world processes into the machine. JSD uses action structure diagrams to represent model processes.

I (and others, of course) believe that JSD-style specifications are a more useful tool than STDs and state machines for specifying many systems.

In this paper, I will present a small argument-by-example for JSD event-oriented specifications.

My example problem will be “Miss Grant’s controller”, which is based on an example problem from the introduction to Martin Fowler’s new (2010) book Domain-Specific Languages.
….

Since the 1980′s, JSD experts have had a vision of executable JSD specifications. They were frustrated by the fact that a JSD model process is a coroutine, but COBOL – the programming language in use in the business community where JSD was most popular – did not support coroutines.

That situation has changed in the last few years, with the increasing acceptance ofPython. Python, it turns out, is the ideal language for creating executable JSD specifications.

This is what Miss Grant’s controller looks like when the action structure diagram is translated into Python.

To run the model, we need to create some test data – a stream of sensor events – that we can feed to the controller. So here is the code for a Python driver program that creates a sequence of event objects and feeds them to the Python specification for Miss Grant’s controller.

Here is the output produced by a test run.