These machines essentially execute a single stream of instructions
in a deterministic fashion.
If you contrast this with how nature builds brains, the difference is
dramatic: Nature uses massive fine-grained parallel machines for its main
computing tasks. Theoretical analysis suggests that nature is doing things
right - and that modern computers have it all wrong: parallel computers are
intrinsically much more powerful on many common tasks - searching, sorting,
classification, decision theory, playing games - and so on.
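One way to see the theoretical advantage: summing n numbers serially needs n - 1 dependent steps, while combining them pairwise in parallel needs only about log2(n) rounds. A minimal Python sketch that counts rounds - it models the depth of the parallel computation rather than actually running in parallel:

```python
import math

# Pairwise ("tree") reduction: summing n numbers serially takes n - 1
# dependent additions, but if each round of pairwise additions is done
# in parallel, only ceil(log2(n)) rounds are needed.

def tree_sum_rounds(values):
    values = list(values)
    rounds = 0
    while len(values) > 1:
        # Every addition in this round is independent of the others,
        # so parallel hardware could perform them all simultaneously.
        values = [values[i] + values[i + 1] if i + 1 < len(values)
                  else values[i]
                  for i in range(0, len(values), 2)]
        rounds += 1
    return values[0], rounds

total, rounds = tree_sum_rounds(range(16))
assert total == sum(range(16))             # 120
assert rounds == math.ceil(math.log2(16))  # 4 rounds, versus 15 serial steps
```

The same pairwise trick applies to finding a maximum, merging sorted runs, and many other reductions - which is why these tasks parallelise so well.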
So, why have we got things wrong? Why hasn't the problem been fixed? And
what can we do about it? To start with, why we have got things wrong:
Part of the answer is that parallel machines are intrinsically harder to
program. Parallel computing introduces problems involving race conditions,
indeterminism, coordination and I/O resource locks that do not crop up
with a serial machine.
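As a minimal sketch of one such problem - a race condition - consider several threads incrementing a shared counter in Python. The read-modify-write in `counter += 1` is not atomic, so without a lock, updates can be lost nondeterministically; holding a lock around the critical section makes the result deterministic (the thread and iteration counts here are arbitrary):

```python
import threading

N_THREADS = 4
N_INCREMENTS = 10_000

counter = 0
lock = threading.Lock()

def safe_increment():
    global counter
    for _ in range(N_INCREMENTS):
        # Without this lock, two threads could both read the same old
        # value of `counter`, both add 1, and one update would be lost.
        with lock:
            counter += 1

threads = [threading.Thread(target=safe_increment) for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With the lock held around each increment, no updates are lost:
assert counter == N_THREADS * N_INCREMENTS
```

None of this machinery is needed on a serial machine: with a single instruction stream, the increments simply cannot interleave.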
Another part of the answer is that parallel programs are a poor match for
the linguistic parts of the human brain: the human brain communicates most
things using language - and spoken language is a serial phenomenon - because
sound waves constitute a one-dimensional stream of information. So it is
natural to tell machines what to do using a sequential stream of verbal or
written instructions.
Another part of the answer is "lock in" created by existing serial
software: much existing software is intrinsically difficult to parallelise -
because it consists of a low-level sequential series of instructions.
Frequently, such software does not benefit much from running on parallel
hardware - and indeed it often runs more slowly there. So, this
"backwards-compatibility" issue reduces the benefits of using a parallel
machine - existing software works more slowly on a parallel machine than on a
serial one of the same cost.
Lastly, part of the answer is that computers are primarily used to augment
human intelligence. As such, they compensate for its deficits, and avoid
competing with its strengths. The brain is already good at
massively-parallel processing - so there is no need to do very much of that.
Rather it is rapid sequential work that computers have specialised in - an
area where humans are very weak.
However, now computers are not merely augmenting human intelligence,
they are substituting for it. It seems reasonable to expect that this
will eventually create demand for massively parallel hardware, capable of
performing tasks similar to those humans perform.
The problem of lock in
The lack of parallel software reduces the interest in building parallel hardware.
Also, the lack of parallel hardware means there is little motivation to create
parallel software. Together these effects combine to produce a vicious circle
that keeps computer science stuck in a primitive, backwards serial world.
This "lock in" created by existing serial software can be visualised on a graph:
Serial Hill is on the left - representing the current state of affairs -
and Parallel Mountain is on the right - which is where we would like to
be. The relatively poor performance that much existing software exhibits on
parallel machines means that the cost of additional processing units is
largely wasted - and so there is little motivation to cross the saddle between
them.
The way to fix things
What needs to be done is really pretty obvious. Rather than telling machines
what to do in a sequential programming language, and having them compile that
to a sequential stream of machine-code instructions, programmers should be
able to specify what they want the machine to do in a very high-level
language, and have the machine translate that into circuitry.
Letting computer programs write software for you will ultimately be a
compelling solution to the problem of how to write parallel programs when the
human brain is so wired up to think in a serial fashion. The human would write
their software in an appropriate high-level language, and then let the
compiler do all the work of parallelising it.
For example, instead of spelling out a serial algorithm specifying how you
want a list sorted, you would simply tell the computer what kind of a list you
have - and how you want it sorted - and then let the compiler
transform your specification into a circuit that performs the task for you.
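A rough flavour of the idea, using Python's built-in `sorted` as a stand-in for such a compiler (the records and field names here are purely illustrative):

```python
# The programmer declares WHAT is wanted - these records, ordered by
# age, oldest first - and says nothing about HOW to sort them. The
# implementation is free to pick any algorithm, including a parallel one.
records = [
    {"name": "Ada", "age": 36},
    {"name": "Grace", "age": 85},
    {"name": "Alan", "age": 41},
]

by_age_desc = sorted(records, key=lambda r: r["age"], reverse=True)

assert [r["name"] for r in by_age_desc] == ["Grace", "Alan", "Ada"]
```

Because the specification says nothing about the order of operations, nothing prevents a sufficiently clever compiler from mapping it onto a parallel sorting network instead of a serial loop.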
The higher-level the language you use, the better the compiler can do at
parallelising it, and the less programming work humans have to do - but the
more complicated and difficult the compiler becomes to create.
The expected performance boost this kind of strategy would provide at a given
level of hardware technology would clearly be enormous.
Existing parallel computing machines
There are existing massively-parallel computing machines. The field
of "programmable logic" has produced what are known as
Field-Programmable Gate Arrays (FPGAs) - which are essentially a form of
massively-parallel hardware. These are used mainly in industrial settings
for rapid prototyping of circuitry.
The field combining conventional microprocessors with programmable logic
is currently known as "reconfigurable computing". Manufacturers in this
area are currently targeting various high-performance applications.
For programmable logic to go mainstream, several things have to happen.
Parallel hardware has to become available widely enough for people to start
writing parallel programs in reasonable numbers - and programming languages
that properly support parallelism need to become more widespread.
I am not talking about threads here - threads are usually a heavy-weight
form of parallelism that preserves the concept of an instruction stream.
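To make the contrast concrete, here is a small Python sketch of the data-parallel alternative: instead of managing instruction streams, the programmer declares that a pure function should be applied to every element. Because the function has no side effects, there is no ordering to preserve - the shuffled execution order below just simulates the freedom that parallel hardware would exploit:

```python
import random

def square(x):          # a pure function: no side effects, no shared state
    return x * x

data = list(range(10))

# The declarative specification: apply `square` to every element.
serial = [square(x) for x in data]

# Simulate an arbitrary execution order, as parallel hardware might use.
# The result is identical no matter what order the elements are visited in.
indices = list(range(len(data)))
random.shuffle(indices)
out = [None] * len(data)
for i in indices:
    out[i] = square(data[i])

assert out == serial
```

It is this order-independence - not the presence of multiple workers - that makes a program genuinely parallelisable.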
Also, the hardware engineers need to let go of the concept of a deterministic
synchronous system. Determinism and global synchrony are practical in small
systems - but become a burden in larger ones. To maintain deterministic global
synchrony, a system can only switch as fast as its slowest component. The
correct thing to do is to run as quickly as you can, and allow software to
deal with the issues of maintaining synchrony and ensuring reliability. That
way, programmers retain the option of choosing whichever tradeoff between
performance and reliability they prefer.
The nature of parallel machines
Parallel hardware can vary in several dimensions which do not crop up
much when considering serial hardware:
One is configurability: a measure of a system's flexibility - how
reprogrammable it is. A conventional CPU represents one end of this spectrum -
it specialises in doing one thing well and its architecture is pretty fixed. A
Field-Programmable Gate Array lies at the other end of the spectrum -
being extremely programmable. However, there's a range of
intermediate systems one can imagine which vary in how general-purpose they
are.
Another is memory density: sometimes you need a
little processing power to accompany a lot of memory - whereas
sometimes you need a lot of concentrated processing power
and very little memory. In the latter case, you may need more power
and a good heat sink.
Then there is the issue of ease of reconfiguration. Is it necessary
to be able to rapidly create new circuit configurations? If not, reprogramming
need not be fast - and that can produce performance gains in the operation of
the rest of the chip.
Since hardware at different points in the resulting multi-dimensional space
is best suited to different applications, it would be nice to have a range of
different configurations present in each machine.
However, you would then need sophisticated software to manage the heterogeneous hardware and
correctly exploit its strengths.
What can be done?
Parallelism potentially represents a substantial source of improvement in
computing performance - for a given level of hardware capability.
However, only very slow progress towards parallelism is being made - and it
would be good to get things moving forwards a bit faster.
A roadmap seems like one thing that would help - develop a concrete
plan that lays out the road to attaining decent parallel computing
machines.
Stop wasting resources on dead-end paths towards parallel machines. Much
effort that goes into producing parallel machines is expended on projects that
do not really lead towards the target - because they are dead ends. As an
example of that, take the concept of Very Long Instruction Words.
That is part of a long tradition of trying to prop up and improve existing
serial architectures by parallelising them using techniques such as critical
path analysis. It has little or nothing to do with the real road
that actually leads towards parallel machines.
Identify the areas where positive action can be taken, and apply force at
those points. The development of computing hardware is a bit like a large
juggernaut, which has a lot of momentum, and is difficult for humans to
control. However, as with many such systems, there will be areas where
relatively little effort can have a significant positive effect. Smart RAM in
graphics cards may be one such area. Customised high-performance systems may
be another. Embedded systems which perform specialised tasks which need
parallelism may be another.
Actively take steps to simplify the creation of parallel software. The gap
between high level languages and hardware description languages needs to be
bridged. High level language authors should not just ignore issues to do with
parallel performance - since some day their code may be executing on parallel
hardware.
Lastly: raise consciousness about the problem. Not everyone in the
industry is properly aware of the scale of the problem, and the factors
preventing its solution.