The Architect’s Napkin

Software Architecture on the Back of a Napkin

In need of more abstraction

The ultimate product of software development is this: CPU executable binary code.

image

Decades ago we used to “write” this more or less directly into memory. But that was very tedious and error-prone. Code was hard to reason about and hard to change.

Abstractions in code

So we looked for ways to make coding easier. Enter a higher level of abstraction: Assembler.

image

By representing machine code instructions as text and throwing in macros, productivity increased. It was easier to read programs, easier to think them up in the first place, and they were quicker to write with fewer errors.

According to Jack W. Reeves, Assembler source code was a design of the ultimate code, which got built by an automatic transformation step.

Soon, though, software was deemed hard to write even with Assembler. So many details needed to be taken care of again and again. Why not hide them behind some abstractions?

That was when 3GLs like Fortran and Algol were invented, and later C, Pascal, and others. But I want to paint this evolution differently, because from today’s point of view the next level of abstraction on top of Assembler is not a 3GL like Java or C#, but an intermediate language like Java Byte Code (JBC) or the .NET Intermediate Language (IL).[1]

image

Solving the problem of overwhelming details of concrete hardware machines was accomplished by putting a software machine on top of it, a virtual machine (VM) with its own machine code and Assembler language.

Where Assembler provided symbolic instructions and names and macros to abstract from bits and bytes, VMs for example provided easier memory management. Not having to deal with CPU registers or memory management anymore made programming a lot easier.

Now IL Assembler source code was a design for IL byte code, which was source code for machine code Assembler, which was source code for the ultimate machine code. OK, not really, but in principle.

IL made things simpler - but not simple enough. Programming still left much to be desired in terms of readable source code and productivity. Libraries were part of the solution. But another level of abstraction was needed, too. Enter 3GLs with all their control structures and syntactic sugar for memory management and sub-program access.

image

That’s where we are today. Source code written in Java or C# is the design for some IL Assembler, which is the design for IL VM byte code, which is the design for machine code Assembler, which is the design for the ultimate machine code. OK, not really, but in principle.[2]
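The chain "3GL source code is the design for VM byte code" can be observed directly in any language that compiles to a virtual machine. The following is a minimal sketch that uses Python instead of Java or C#, which is an assumption made purely for brevity: CPython also compiles source code to byte code for its own VM, and the standard `dis` module lists the resulting instructions. The function `add` and the helper `opcode_names` are hypothetical names chosen for illustration.

```python
import dis


def add(a, b):
    """A tiny piece of 3GL-level 'design'."""
    return a + b


def opcode_names(func):
    """Return the names of the VM instructions the compiler produced for func."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


if __name__ == "__main__":
    # The exact instruction names differ between CPython versions,
    # so no particular output is promised here.
    for name in opcode_names(add):
        print(name)
```

One line of source code turns into several lower-level instructions that load values, operate on them, and return. Yesterday's source code is today's target code.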

Abstraction beyond code

We like to think of 3GL source code as the design for the executable machine code. As it turned out, though, yesterday’s design, yesterday’s source code became today’s target code.

Yesterday’s abstractions became today’s details. Nobody wants to reason about software on the level of abstraction of any Assembler language. That’s why Flow Charts and Nassi-Shneiderman diagrams were invented.

And what once was pseudo-code is now a real programming language.

Taking this evolution as a whole into view raises the question: What’s next?

There is a pattern so far. No matter how many levels of abstraction have been stacked on top of each other, one aspect hasn’t changed: all those languages - Assembler, IL, 3GL - are about control flow.

Mainstream reasoning about software hasn’t changed. Today as in the 1950s it’s about algorithms. It’s about putting together logic statements to create behavior.

So how can this be extended? What’s our current “pseudo-code” about to be turned into source code of some future IDE?

My impression is: It’s over.

Control flow thinking, the imperative style of programming is at its limit.

There won’t be another level of abstraction in the same vein. I mean language-wise. The number of frameworks to be glued together to form applications will increase. There will be more levels of abstraction.

But to actually design behavior, we will need to switch to another paradigm.

Accessing data has become hugely more productive through the introduction of declarative programming languages like SQL (and modern derivatives like LINQ) or Regular Expressions.
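To illustrate the difference, here is a small sketch in Python (the language choice and both function names are assumptions for illustration, not part of the original post). Both functions extract the runs of digits from a text. The first one spells out how to do it with explicit state; the second one only states what is wanted, using a regular expression.

```python
import re

DIGITS = "0123456789"


def find_numbers_imperative(text):
    """Spell out *how*: walk the characters and track state by hand."""
    numbers = []
    current = ""
    for ch in text:
        if ch in DIGITS:
            current += ch
        elif current:
            numbers.append(current)
            current = ""
    if current:
        numbers.append(current)
    return numbers


def find_numbers_declarative(text):
    """State *what* is wanted: every run of ASCII digits."""
    return re.findall(r"[0-9]+", text)
```

For a text like "Order 66 shipped 3 items in 2014" both return the same list of strings. The declarative version leaves the iteration strategy entirely to the regex engine.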

So my guess is, we need to go more in that direction. Programming has to become more declarative. We have to stave off imperative details as long as possible.

Functional Programming (FP) seems to be hinting in that direction. Recursion is a declarative solution compared to loops. Also, simple data flows such as x |> f |> g in F# have declarative power because they leave open whether control flows along with data. f could (in theory) still be active while g already works on some output from f.
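The idea that f may still be active while g already works on its output can be hinted at with lazy generators. This is a minimal sketch in Python rather than F#, under the assumption that single-threaded lazy evaluation is a fair stand-in for the idea; it is not real concurrency. The stages `f` and `g`, the helper `run_pipeline`, and the `log` list are hypothetical and exist only to make the interleaving visible.

```python
def f(numbers, log):
    """Producer stage: yields squares one at a time."""
    for n in numbers:
        log.append(f"f produced {n * n}")
        yield n * n


def g(values, log):
    """Consumer stage: starts working as soon as the first value arrives."""
    for v in values:
        log.append(f"g consumed {v}")
        yield v + 1


def run_pipeline(numbers):
    """Connect f to g and record the order in which the stages do their work."""
    log = []
    result = list(g(f(numbers, log), log))
    return result, log
```

Running `run_pipeline([1, 2, 3])` yields a log in which "f produced ..." and "g consumed ..." entries alternate: g handles the first value before f has produced the second. The pipeline only says that data goes from f to g; when each stage runs is left open.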

Still, though, even with FP there is one question unanswered: How do you think about code?

Is there a way for us to express solutions without encoding them as heaps of text right away? Is there a way to communicate solutions without and before actually programming them? Can we describe software behavior in a systematic way on a next level of abstraction - and then systematically translate this description into Groovy or Haskell?

Object-orientation (OO) has given us more ways to describe data structures than most developers know. Think of all the relationship types defined in UML.

But software is first of all not about data structures. It’s about functionality, about behavior, about activities. How can that be described, planned, designed above today’s source code, even FP source code?

Because if Assembler, the code design of the 1950s, nowadays is just the output of a compiler translating today’s 3GL source code design… then what kind of design can be translated into today’s source code as a target?

Model-Driven Software Development (MDSD) seems to be trying to answer this question. But despite all efforts it has not been widely adopted. My guess is, that’s because the design of a modelling language is even harder than the design of a decent framework. Not many developers can do that. Also, not many domains lend themselves to this. And it’s not worth the effort in many cases.

But still, MDSD has gotten something right, I guess. Because what I’ve seen of it so far mostly is about declarative languages.

So the question seems to be: What’s a general purpose way to describe software behavior in a declarative manner?

Only by answering this question will we be able to enter a next level of abstraction in programming - even if that currently only means enabling more systematic designs before 3GL code and without automatic translation.

We have done that before. That’s how we started with object-orientation or querying data. First there was a model, a way of thinking, the abstraction. Then, later, there was a tool to translate abstract descriptions (designs) into machine code.

The above images all show the same code.[3] The same solution on different levels of abstraction.

However, can you imagine the solution on yet another level of abstraction above the 3GL/C# source code?

That’s what I’m talking about. Programming should not begin with source code. It should begin with thinking. Thinking in terms of models, i.e. even more abstract descriptions of solutions than source code.

As long as we’re lacking a systematic way of designing behavior before 3GL source code - be it OO or FP - we’ll be suffering from limited productivity. Like programmers suffered from limited productivity in the 1950s or 1990s before the invention of Assembler, IL, 3GLs.

And what’s the next level of abstraction?

In my view it’s data flow orientation. We have to say goodbye to control flow and embrace data flows. Control flow will always have its place. But it’s for fleshing out details of behavior. The big picture of software behavior has to be painted in a declarative manner.

Switching from OO languages to FP languages won’t help, though. Both are limited by textual representation. They are great means to encode data flows. But they are cumbersome to think in. Nobody wants to design software in machine code or byte code. Nobody wants to even do it in Assembler. And why stop with 3GLs?

No, think visually. Think in two or three or even more dimensions.

And once we’ve designed a solution in that “space”, we can translate it into lesser textual abstractions - which then will look different.

This solution

image

surely wasn’t translated from a design on a higher level of abstraction. How the problem “wrap long lines” is approached conceptually is not readily understandable. Even if there were automatic tests to be seen they would not explain the solution. Tests just check for certain behavior.

So, as an exercise, can you imagine a solution to the problem “Word Wrap Kata” on a higher level of abstraction? Can you depict how the expected behavior could be produced? In a declarative manner?

That’s what I mean. To that level of discussion about software we have to collectively rise.

 

PS: OK, even though I did not want to elaborate on how I think designing with data flows can work – you can find more information for example in my blog series on “OOP as if you meant it” –, I guess I should at least give you a glimpse of it.

So this is a flow design for the above word wrapping problem:

 

image

This shows in a declarative manner how I envision a process for “producing” the desired behavior. The top level/root of the hierarchical flow represents the function in question. The lower level depicts the “production process” to transform a text:

  • First split the text into words,
  • then split words longer than the max line length up into “syllables” (slices).
  • Slices then are put together to form the new lines of the given max length.
  • Finally all those lines are combined into the new text.

This sounds like control flow – but that’s only due to the simplicity of the problem. With slight changes the flow design could be made async, though. Then control would not flow along with the data anymore.

The data flow tells a story of what needs to be done, not how it exactly should happen. Refinement of a flow design stops when each leaf node seems to be easy enough to be written down in imperative source code.

Here’s a translation of the flow design into C# source code:

 

image

You see, the design is retained. The solution idea is clearly visible in code. The purpose of Wrap() is truly single: it just integrates functions into a flow. The solution can be read from top to bottom like the above bullet point list. The code is “visually honest”.
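Since the C# listing is only available as an image, here is a rough sketch of the same flow design, written in Python for brevity. It is not the original code: all function names, the splitting rules, and the assumption that `max_len` is at least 1 are illustrative guesses derived from the bullet list above.

```python
def split_into_words(text):
    """Step 1: split the text into words."""
    return text.split()


def split_long_words(words, max_len):
    """Step 2: cut words longer than max_len into slices of at most max_len."""
    slices = []
    for word in words:
        for start in range(0, len(word), max_len):
            slices.append(word[start:start + max_len])
    return slices


def build_lines(slices, max_len):
    """Step 3: put slices together to form lines no longer than max_len."""
    lines = []
    current = ""
    for piece in slices:
        if not current:
            current = piece
        elif len(current) + 1 + len(piece) <= max_len:
            current += " " + piece
        else:
            lines.append(current)
            current = piece
    if current:
        lines.append(current)
    return lines


def join_lines(lines):
    """Step 4: combine all lines into the new text."""
    return "\n".join(lines)


def wrap(text, max_len):
    """Integration only: wires the steps into a flow, contains no logic itself."""
    words = split_into_words(text)
    slices = split_long_words(words, max_len)
    lines = build_lines(slices, max_len)
    return join_lines(lines)
```

In this sketch `wrap("the quick brown fox", 10)` produces two lines, "the quick" and "brown fox". All loops and conditions live in the leaf functions; `wrap` itself reads like the bullet list.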

Such a flow design can easily be drawn on a flipchart. With it I can quickly communicate my idea of how to create a certain behavior to my fellow team members. It’s easy to translate into code. And since it does not contain imperative logic, it leads to very clean code, too. Logic is confined to functionally independent leaves in the decomposition tree of the flow design. Read more about this in my blog series “The Incremental Architect’s Napkin”.


  1. I even remember P-Code already used in the 1970s as the target of Pascal compilers.

  2. Of course, this is not what happens with all 3GLs. Some are interpreted, some are compiled to real machine code. Still, this is the level of abstraction we’ve reached in general.

  3. Well, to be honest, the first image is just some arbitrary binary code. I couldn’t figure out how to get it for the Assembler code in the second image.

Posted on Wednesday, December 31, 2014 12:39 PM | Filed under [ Thinking outside of the box, Software modelling ]

Feedback


# Thanks

Hi Ralf.
I just want to say thank you for a great blog. I am very impressed by your articles about messaging, informed TDD and architecture. I am especially excited about data flow and messaging as a programming model. You have really opened my eyes to a better way of designing and coding and I will definitely try and use this paradigm in my own coding.

Thanks again and please keep on writing,
Anders
1/5/2015 12:50 PM | Anders Baumann