These machines essentially execute a single stream of instructions
in a deterministic fashion.
If you contrast this with how nature builds brains, the difference is
dramatic: Nature uses massive fine-grained parallel machines for its main
computing tasks. Theoretical analysis suggests that nature is doing things
right - and that modern computers have it all wrong: parallel computers are
intrinsically much more powerful on many common tasks - searching, sorting,
classification, decision theory, playing games - and so on.
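One way to see the theoretical advantage: summing n numbers serially needs n - 1 dependent steps, while combining them pairwise in parallel needs only about log2(n) rounds. A minimal Python sketch that counts rounds - it models the depth of the parallel computation rather than actually running in parallel:

```python
import math

# Pairwise ("tree") reduction: summing n numbers serially takes n - 1
# dependent additions, but if each round of pairwise additions is done
# in parallel, only ceil(log2(n)) rounds are needed.

def tree_sum_rounds(values):
    values = list(values)
    rounds = 0
    while len(values) > 1:
        # Every addition in this round is independent of the others,
        # so parallel hardware could perform them all simultaneously.
        values = [values[i] + values[i + 1] if i + 1 < len(values)
                  else values[i]
                  for i in range(0, len(values), 2)]
        rounds += 1
    return values[0], rounds

total, rounds = tree_sum_rounds(range(16))
assert total == sum(range(16))             # 120
assert rounds == math.ceil(math.log2(16))  # 4 rounds, versus 15 serial steps
```

The same pairwise trick applies to finding a maximum, merging sorted runs, and many other reductions - which is why these tasks parallelise so well.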
So, why have we got things wrong? Why hasn't the problem been fixed? And
what can we do about it? To start with, why we have got things wrong:
Part of the answer is that parallel machines are intrinsically harder to
program. Parallel computing introduces problems involving race conditions,
indeterminism, coordination and I/O resource locks that do not crop up
with a serial machine.
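As a minimal sketch of one such problem - a race condition - consider several threads incrementing a shared counter in Python. The read-modify-write in `counter += 1` is not atomic, so without a lock, updates can be lost nondeterministically; holding a lock around the critical section makes the result deterministic (the thread and iteration counts here are arbitrary):

```python
import threading

N_THREADS = 4
N_INCREMENTS = 10_000

counter = 0
lock = threading.Lock()

def safe_increment():
    global counter
    for _ in range(N_INCREMENTS):
        # Without this lock, two threads could both read the same old
        # value of `counter`, both add 1, and one update would be lost.
        with lock:
            counter += 1

threads = [threading.Thread(target=safe_increment) for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With the lock held around each increment, no updates are lost:
assert counter == N_THREADS * N_INCREMENTS
```

None of this machinery is needed on a serial machine: with a single instruction stream, the increments simply cannot interleave.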
Another part of the answer is that parallel programs are a poor match for
the linguistic parts of the human brain: the human brain communicates most
things using language - and spoken language is a serial phenomenon - because
sound waves constitute a one-dimensional stream of information. So it is
natural to tell machines what to do using a sequential stream of verbal or
written instructions.
Another part of the answer is "lock in" created by existing serial
software: much existing software is intrinsically difficult to parallelise -
because it consists of a low-level sequential series of instructions.
Frequently, such software does not benefit much from running on parallel
hardware - and indeed it often runs more slowly there. So, this
"backwards-compatibility" issue reduces the benefits of using a parallel
machine - existing software works more slowly on a parallel machine than on a
serial one of the same cost.
Lastly, part of the answer is that computers are primarily used to augment
human intelligence. As such, they compensate for its deficits, and avoid
competing with its strengths. The brain is already good at
massively-parallel processing - so there is no need to do very much of that.
Rather it is rapid sequential work that computers have specialised in - an
area where humans are very weak.
However, now computers are not merely augmenting human intelligence,
they are substituting for it. It seems reasonable to expect that this
will eventually create demand for massively parallel hardware, capable of
performing tasks similar to those humans perform.
The problem of lock in
The lack of parallel software reduces the interest in building parallel hardware.
Also, the lack of parallel hardware means there is little motivation to create
parallel software. Together these effects combine to produce a vicious circle
that keeps computer science stuck in a primitive, backwards serial world.
This "lock in" created by existing serial software can be visualised on a graph:
Serial Hill is on the left - representing the current state of affairs -
and Parallel Mountain is on the right - which is where we would like to
be. The relatively poor performance that much existing software exhibits on
parallel machines means that the cost of additional processing units is
largely wasted - and so there is little motivation to cross the saddle between
them.
The way to fix things
What needs to be done is really pretty obvious. Rather than telling machines
what to do in a sequential programming language, and having them compile that
to a sequential stream of machine-code instructions, programmers should be
able to specify what they want the machine to do in a very high-level
language, and have the machine translate that into circuitry.
Letting computer programs write software for you will ultimately be a
compelling solution to the problem of how to write parallel programs when the
human brain is so wired up to think in a serial fashion. The human would write
their software in an appropriate high-level language, and then let the
compiler do all the work of parallelising it.
For example, instead of spelling out a serial algorithm specifying how you
want a list sorted, you would simply tell the computer what kind of a list you
have - and how you want it sorted - and then let the compiler
transform your specification into a circuit that performs the task for you.
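A rough flavour of the idea, using Python's built-in `sorted` as a stand-in for such a compiler (the records and field names here are purely illustrative):

```python
# The programmer declares WHAT is wanted - these records, ordered by
# age, oldest first - and says nothing about HOW to sort them. The
# implementation is free to pick any algorithm, including a parallel one.
records = [
    {"name": "Ada", "age": 36},
    {"name": "Grace", "age": 85},
    {"name": "Alan", "age": 41},
]

by_age_desc = sorted(records, key=lambda r: r["age"], reverse=True)

assert [r["name"] for r in by_age_desc] == ["Grace", "Alan", "Ada"]
```

Because the specification says nothing about the order of operations, nothing prevents a sufficiently clever compiler from mapping it onto a parallel sorting network instead of a serial loop.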
The higher-level the language you use, the better the compiler can do at
parallelising it, and the less programming work humans have to do - but the
more complicated and difficult the compiler becomes to create.
The expected performance boost this kind of strategy would provide at a given
level of hardware technology would clearly be enormous.
Existing parallel computing machines
There are existing massively-parallel computing machines. The field
of "programmable logic" has produced what are known as
Field-Programmable Gate Arrays (FPGAs) - which are essentially a form of
massively-parallel hardware. These are used mainly in industrial settings
for rapid prototyping of circuitry.
The field combining conventional microprocessors with programmable logic
is currently known as "reconfigurable computing". Manufacturers in this
area are currently targeting various high-performance applications.
For programmable logic to go mainstream, several things have to happen.
Parallel hardware has to become available widely enough for people to start
writing parallel programs in reasonable numbers - and programming languages
that properly support parallelism need to become more widespread.
I am not talking about threads here - threads are usually a heavy-weight
form of parallelism that preserves the concept of an instruction stream.
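To make the contrast concrete, here is a small Python sketch of the data-parallel alternative: instead of managing instruction streams, the programmer declares that a pure function should be applied to every element. Because the function has no side effects, there is no ordering to preserve - the shuffled execution order below just simulates the freedom that parallel hardware would exploit:

```python
import random

def square(x):          # a pure function: no side effects, no shared state
    return x * x

data = list(range(10))

# The declarative specification: apply `square` to every element.
serial = [square(x) for x in data]

# Simulate an arbitrary execution order, as parallel hardware might use.
# The result is identical no matter what order the elements are visited in.
indices = list(range(len(data)))
random.shuffle(indices)
out = [None] * len(data)
for i in indices:
    out[i] = square(data[i])

assert out == serial
```

It is this order-independence - not the presence of multiple workers - that makes a program genuinely parallelisable.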
Also, the hardware engineers need to let go of the concept of a deterministic
synchronous system. Determinism and global synchrony are practical in small
systems - but become a burden in larger ones. To maintain deterministic global
synchrony, a system can only switch as fast as its slowest component. The
correct thing to do is to run as quickly as you can, and allow software to
deal with the issues of maintaining synchrony and ensuring reliability. That
way, programmers retain the option of choosing whichever tradeoff between
performance and reliability they prefer.
The nature of parallel machines
Parallel hardware can vary in several dimensions which do not crop up
much when considering serial hardware:
One is configurability: a measure of a system's flexibility - how
reprogrammable it is. A conventional CPU represents one end of this spectrum -
it specialises in doing one thing well and its architecture is pretty fixed. A
Field-Programmable Gate Array lies at the other end of the spectrum -
being extremely programmable. However, there's a range of
intermediate systems one can imagine which vary in how general-purpose they
are.
Another is memory density: sometimes you need a
little processing power to accompany a lot of memory - whereas
sometimes you need a lot of concentrated processing power
and very little memory. In the latter case, you may need more power
and a good heat sink.
Then there is the issue of ease of reconfiguration. Is it necessary
to be able to rapidly create new circuit configurations? If not, reprogramming
need not be fast - and that can produce performance gains in the operation of
the rest of the chip.
Since hardware at different points in the resulting multi-dimensional space
is best suited to different applications, it would be nice to have a range of
different configurations present in each machine.
However, you would then need sophisticated software to manage the heterogeneous hardware and
correctly exploit its strengths.
What can be done?
Parallelism potentially represents a substantial source of improvement in
computing performance - for a given level of hardware capability.
However, only very slow progress towards parallelism is being made - and it
would be good to get things moving forwards a bit faster.
A roadmap seems like one thing that would help - develop a concrete
plan that lays out the road to attaining decent parallel computing
machines.
Stop wasting resources on dead-end paths towards parallel machines. Much
effort that goes into producing parallel machines is expended on projects that
do not really lead towards the target - because they are dead ends. As an
example of that, take the concept of Very Long Instruction Words.
That is part of a long tradition of trying to prop up and improve existing
serial architectures by parallelising them using techniques such as critical
path analysis. It has little or nothing to do with the real road
that actually leads towards parallel machines.
Identify the areas where positive action can be taken, and apply force at
those points. The development of computing hardware is a bit like a large
juggernaut, which has a lot of momentum, and is difficult for humans to
control. However, as with many such systems, there will be areas where
relatively little effort can have a significant positive effect. Smart RAM in
graphics cards may be one such area. Customised high-performance systems may
be another. Embedded systems which perform specialised tasks which need
parallelism may be another.
Actively take steps to simplify the creation of parallel software. The gap
between high level languages and hardware description languages needs to be
bridged. High level language authors should not just ignore issues to do with
parallel performance - since some day their code may be executing on parallel
hardware.
Lastly: raise consciousness about the problem. Not everyone in the
industry is properly aware of the scale of the problem, and the factors
preventing its solution.