Python Style Decomposition

Why?

Why are CS106A programs structured the way the are - the many functions, the main(), the Doctests. These structures are not an accident. There is a reason it's all done a particular way, and here it is.

Software Engineering Crisis

Suppose you need to finish a 500 line coding project, or 1000 lines or 10,000 or 100,000 lines. How many hours do you suppose it will take to be finished?

As computers grew more powerful in the early days of computing, a disturbing pattern emerged. Anyone could finish a 500 line project, but evidently projects did scale up in size so nicely. A 10x larger project took more than 10x longer to complete. Very large projects might seemingly never be finished. The project might be abandoned or perhaps the users will just have to suffer along with the not-quite-done version. I think of this when I'm trying to complete a form online, and do something it doesn't like, and the system dumps me back at a blank form to re-enter the information. I pause at that moment to appreciate that somehow, a lot of software in the world is not really finished.

Software Costs - N²

As rough rule of thumb, the difficulty of completing a software project of N lines seems to be proportionate to N²

alt: n-squared costs per line

The N² curve gets extremely steep, piling incredible costs on the project as the number of lines goes up. Past some point, the project is practically impossible to complete. And yet you live in a world with many, enormous and pretty functional projects. how did CS escape the N² trap?

Divide and Conquer - Functions - Modularity

CS has this one trick to avoid the N² trap. Say we have this 500 line program to write. What are we going to do?

One Big main()

The naive approach would be to write all the code as one 500 line main() function. This is a terrible strategy, getting the worst of the N² curve - too much code all in one piece

alt:one big main()

Function Decomposition

The key step is not working on a 500 line project. Work on a series of 50 line functions - each nice and cheap - and bundle them together to make the 500 line. Or put another way, get left on the N² curve.

alt: program divided into separate functions

This is why functions are such a central part of every programming language. Programming is always done at this scale, trying to get left on the curve. Striving to keep parts of the program separate from each other is also known as the strategy of "modularity".

Functions Collaborate - Data Black Box

We can divide the program into functions, but how do the functions work with each other? With the black-box model, we use the input and output data for each function as its contact point with all the other functions.

alt: black box function: parameters in, return value out

Aside: thought it appears to be a detail, keeping a strict structure for data-in and data-out turns out to be important. If functions are allowed to access data used by other functions arbitrarily, in effect it blends the lines of those functions together. It's possible in Python and other languages to use "global variables" to move data between functions not just at call/return boundaries. We'll never do that in CS106A, concentrating on the proper function-call, black-box model that keeps the functions separated from each other.

Whole Program Picture - Data Pipeline

If the functions are all separate, how do they work with each other? The black box model helps here - we connect the output of one function to the input of the next. The functions are separate, but work together.

You can think of the whole program as a pipeline of functions taking in, say, a text file of data to start, each function returning some refined version of the data, until we get nice graphs or conclusions out at the end, which then get adoring tweets!

alt: pipeline of functions, take in data file to start, putting out conclusions at the end

The functions are all separate, and yet they are working together to solve the whole problem. Their input/output interactions are kept as narrow and simple as possible.

Black Box - Abstraction and Implementation

You are already familiar with the input/output framing of a function. Here we'll add on CS terms "abstraction" and "implementation". These ideas are crucial for computer systems, but you may find them handy for many parts of life.

alt: abstraction vs. implementation

Black Box 1 - Abstraction

The abstraction of a function is what it accomplishes - what it requires as input and what it will produce as output. We can think of this as the "contract" for the function: what is required to go in. What the function promises to provide. The abstraction contract is also basically what is written in the triple quoted """Pydoc""" at the top of a function.

alt: abstraction vs. implementation

Black Box 2 - Implementation

The implementation detail of a function is all the code and complexity within the function that does the actual work. The word "detail" is sometimes used as a blanket term for all the implementation features hidden inside a function.

Usually, the abstraction for a function is relatively simple compared to its implementation.

Key: Calling A Function - Abstraction

What do you need to know to call a function correctly? Just the abstraction. The implementation can be hidden inside the function. Our strategy is to hide "implementation detail" inside the function so the rest of the program does not need to know or depend on it. This is howe we fight the N² curve.

alt: implementation is hidden

Example of Decomposition, Abstraction, Implementation

1. function1()

Suppose you are writing a big program, and now it's time to work on function1():

def function1(s):
   """Given string s ..."""
   .. lots of detail ..
   .. in here ..

Work on function1. At this time, your mind is focussed on the function1 abstraction, and you are wrestling with the detail and bugs and whatnot of its implementation. Eventually you get it working perfectly.

2. function2()

Now it's time to work on function2() which calls function1() as a helper. Look at the key line below.

def function2(s):
    """..."""
    ....
    part = function1(s)  # the key line
    ...

What is your state of mind writing the key line? The abstraction of function1. Do not think about the implementation details of function1, though you were just working on it.

One Implementation in Mind At a Time

What the N² trap tells us, is that keeping all of the function implementations in mind at one time is hopeless. Here we only think about one implementation at a time. Once a function is done, we work only in terms of its abstraction.

The Power Of Not Knowing

We have sectioned off some of the program complexity inside function1. When it's time to call function1() we could think about how it is implemented. Instead, we embrace not knowing what's going on in there. Just call it, and it should meet its contract. Leverage the abstraction to only know what's needed as we go.

This is why Python and other languages have the """PyDoc""" documentation, allowing the contract to be written out and easily accessible on the fly, so programmers can access just the abstraction they need, not looking at the implementation details.

Abstraction in CS

This technique of using the abstraction, isolating complexity within units is the central trick by which all large computer systems are built. As an example, later in CS106A we will see function calls like the ones below

# Get list of filenames inside a directory
filenames = os.listdir('Downloads')

# Get the current date and time
now = datetime.now()

We know the abstraction of os.listdir() which is pretty simple - a list of string filenames from within a directory (folder). But the implementation .. who knows. The code within os.listdir() is quite complicated, accessing your operating system, perhaps accessing a USB key plugged in to your computer. Calling the function, we do not think about that. We assume the function does its job correctly and returns a list of valid filename strings, and we just carry on and use them. This is the way.

Ride To Airport Abstraction vs. Implementation

We use abstraction all the time in life - it's the natural way to get multiple people to coordinate on something.

Suppose you are getting a ride to the airport. What is the abstraction?

Ride to airport abstraction - a pretty short list

Pick up time and place.
Drop off time and place.
Ride shared with others y/n.

Ride to airport implementation details, don't care about:

Car has alloy wheels
Car Car has LED headlights
Color of the car seats
Is the driver wearing a hat
Car has enough fuel
You do care about this in a way, but it's covered by "drop off time/place"

The point: the abstraction is much simpler than the implementation. There is endless detail to the implementation that the abstraction does not need to care about. Calling a function is just the abstraction.

Mechanics: Pydoc, Doctests

So the abstration ideas translate to Python code designing a function. Choose a good function name, summarizing what it does. The parameters list its inputs. The """Pydoc""" at the top of a function summarizes its abstraction in words. What does it require as input? What does it promise to return as output? We generally use the word "given" in here to refer to the parameters, like "Given values x and y, returns something something."

You can delete the ":param s: " stuff PyCharm puts in. That syntax is seldom used at this time. You can summarize the abstraction with the "Given x ..." Pydoc.

The Doctests are another way to summarize the abstraction - not with words, but with a series of input/output examples. They also have the benefit of helping you debug your code.

def del_chars(s, target):
    """
    Given string s and a "target" string,
    return a version of s with all chars that
    appear in target removed, e.g. s 'abc'
    with target 'bx', returns 'ac'.
    (Not case sensitive)
    >>> del_chars('abC', 'acx')
    'b'
    >>> del_chars('ABc', 'aCx')
    'B'
    >>> del_chars('', 'a')
    ''
    """
    result = ''
    target = target.lower()
    for ch in s:
        if ch.lower() not in target:
            result += ch
    return result

Program - Big Picture Checklist

This is the backstory for every CS106A program
Divide the program into functions
Work on the helper function first
Focus on that function, testing it with Doctests
Test as you go
Move on to bigger functions
Call the helper, use its abstraction
Build up the whole program
Typically main() at the bottom, calling helpers
e.g. look at our crazycat example
Do not: write all the code first, THEN start testing
Pre doc-Test CS106A easy for students to do this
A disaster!
Doctests are just a small feature
But they enable the right strategy at the big picture level
That's why we love them