March 17, 2003

The following document is
"Planning a Computer System - Project Stretch"
edited by
Werner Buchholz
Systems Consultant
Corporate Staff, Research and Engineering
International Business Machines Corporation
published by
McGraw-Hill Book Company
New York, ... 1962

Copyright status
--------------------------------------------
----- Original Message -----
From: Plikerd, Scott
To: '[email protected]'
Sent: Friday, February 28, 2003 12:02 PM
Subject: (c) owner of Buchholz/PLANNING A COMPUTER SYSTEM

Dear Mr. Thelen:

According to our records, the copyright registration for above-referenced
title published in 1962, was not renewed with the Copyright Office at the
Library of Congress. Because this title was published before 1964, it did
not receive an automatic renewal and appears to have fallen into the public
domain. It is possible that IBM or even the author renewed this title in
1990, when it came up for renewal, but McGraw-Hill did not. To be
absolutely sure, you will have to check with the Copyright Office to see if
the copyright registration was renewed.

Regards,

Scott W. Plikerd
Manager
Permissions Department
McGraw-Hill Education
Two Penn Plaza, 9th Floor
New York, NY 10121-2298
(212) 904-2614 (phone)
(212) 904-6285 (fax)
-------------------------------------------

Editor's permission
--------------------------------------------
----- Original Message -----
From: "Werner Buchholz"
To: "Ed Thelen"
Cc: "Williams, Mike" ; "Spicer, Dag"

Sent: Wednesday, March 12, 2003 5:33 AM
Subject: Re: your book "Planning a Computer System - Project Stretch"

> At 03:43 AM 3/12/2003 -0800, Ed Thelen wrote:
> >I presume your book is now "in the public domain". However, I think it
> >proper to ask your permission
> >to place a representation of your book on my web site.
>
> I certainly have no objection.
>
> Werner Buchholz

The book was kindly loaned by
The Computer History Museum
1401 Shoreline Blvd.
Mountain View, California
and scanned by
Ed Thelen [email protected]
--------------------------------------------
Chapter 1
PROJECT STRETCH
by W. Buchholz

The computer that is discussed in this book was developed by the
International Business Machines Corporation at Poughkeepsie, N.Y.,
under Project Stretch. The project started toward the end of 1954.
By then IBM was producing several stored-program digital computers:
the IBM 650, a medium-sized computer; the IBM 704, a large-scale
computer primarily for scientific applications; and the IBM 705, a large-
scale computer primarily for business data processing. The 704 and 705
had already superseded the 701 and 702, which were IBM's first com-
mercial entries into the large-computer field. Since the entire field was
still new, there had been little experience on which to base the design of
these machines, but by 1954 such experience was building up rapidly.
This experience showed that the early computers were basically sound
and eminently usable, but it was also obvious that many of the early
decisions would have been made quite differently in 1954 and that many
improvements had become possible.
At the same time, solid-state components were rapidly being developed
to the point where it appeared practical to produce computers entirely
out of transistors and diodes, together with magnetic core memories. A
computer made only of solid-state components promised to surpass its
vacuum-tube predecessors with higher reliability, lower power consump-
tion, smaller size, lower cost made possible by automatic assembly, and
eventually greater speed. The imminence of new technology, together
with the knowledge of shortcomings in existing designs, gave impetus to
a new computer project.
In 1955 the project was directed more specifically toward achieving,
on very large mathematical computing problems, the highest perform-
ance possible within certain limits of time and resources. If mostly
on-the-shelf components were used, a factor-of-10 improvement over the
IBM 704, the fastest computer then in production, appeared feasible.
Although this level of improvement would have been a respectable
achievement, it was rejected as not being a large enough step. Instead,
an over-all performance of 100 times that of the 704 was set as the target.
The purpose of setting so ambitious a goal was to stimulate innovation
in all aspects of computer design. The technology available in 1955 was
clearly not adequate for the task. New transistors, new cores, new logi-
cal features, and new manufacturing techniques were needed, which,
although they did not yet exist, were known to be at least physically
possible. Even though the goal might not be reached in all respects, the
resultant machine would set a new standard of performance and make
available the best technology that could be achieved by straining the
technical resources of the laboratory. Hence the name Project Stretch.
The need for a computer of the power envisioned was clear. A num-
ber of organizations in the country had many important computing prob-
lems for which the fastest existing computers were completely inadequate,
and some had other problems for which even the projected computer of
100 times the speed of the existing ones would not be enough. Negoti-
ations with such organizations resulted in a contract with the U.S. Atomic
Energy Commission in late 1956 to build a Stretch system for the Los
Alamos Scientific Laboratory.
The early design objectives were described in 1956¹ in terms of certain
technological and organizational goals:
Performance
An over-all performance level of 100 times that of the fastest machines
then in existence was the general objective. (It has since become evi-
dent that speed comparisons of widely different machines are very diffi-
cult to make, so that it is hard to ascertain how well this target has been
achieved. Using the IBM 704 as the reference point, and assuming
problems that can easily be fitted to the shorter word size, the smaller
memory, and the more limited repertoire of the 704, the speed ratio for
the computer actually built falls below the target of 100. On the other
hand, for large problems which strain the facilities of the 704 in one or
more ways, the ratio may exceed 100.)
Reliability
Solid-state components promised the much higher reliability needed
for satisfactory operation of a necessarily complex machine.
Checking
Extensive automatic checking facilities were intended to detect any
errors that occurred and to locate faults within narrow limits. Storage
devices were also to be equipped with error-correction facilities to ensure
that data could be recovered in spite of an occasional error. The pur-
pose was again to increase performance by reducing the rerun time often
needed in unchecked computers.

¹ S. W. Dunwell, Design Objectives for the IBM Stretch Computer, Proc. Eastern
Joint Computer Conf., December, 1956, pp. 20-22.

Generality
To broaden the area of application of the system and to increase the
effectiveness of the system on secondary but time-consuming portions
of any single job, it was felt desirable to include in one system the best
features of scientific, data-processing, and real-time control computers.
Furthermore, the input-output controls were to be sufficiently general to
permit considerable future expansion and attachment of new input-output
devices.
High-speed Arithmetic
A high-speed parallel arithmetic unit was to execute floating-point
additions in 0.8 microsecond and multiplications in 1.4 microseconds.
(The actual speeds are not as high; see Chap. 14.) This unit would not
be responsible for instruction preparation, indexing, and operand fetch-
ing, which were to be carried out by other sections of the system whose
operation would overlap the arithmetic.
Editing
A separate serial computer unit with independent instruction sequen-
cing was visualized to edit input and output data of variable length in a
highly flexible manner. (It was later found desirable to combine the
serial and parallel units to a greater degree, so that they are no longer
independent, but the functional capability of both units was retained.)

Memory
The main memory was to have a cycle time of only 2 microseconds.
(All but the early production memories will indeed be capable of work-
ing at 2.0 μsec, but computer timing dictates a slightly longer cycle of
2.1 μsec.) The capacity was to be 8,192 (later raised to 16,384) words
per unit.*
Input-Output Exchange
A unit resembling somewhat a telephone exchange was to provide
simultaneous operation of all kinds of input-output, storage, and data-
transmission devices.

* A second set of faster, though smaller, memory units was also postulated, but it
was later omitted because the larger units were found to give about the same over-all
performance with a greater capacity per unit cost. These units are still used, however,
to satisfy more specialized requirements of the 7951 Processing Unit described in
Chap. 17.

High-speed Disks
Magnetic disk units were to be used for external storage to supplement
the internal memory. The target was a capacity of 1 (later raised to 2)
million words with a transfer rate of 250,000 (later lowered to 125,000)
words per second. These disk units permit a very high data flow rate
(even at the lower figure) on problems for which data cannot be con-
tained in memory.
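
As a rough illustration of what these figures imply, the short Python sketch below works out the data rates involved. It uses only the revised numbers quoted above (2.1-microsecond cycle, 16,384 words per memory unit, 2 million words of disk, 125,000 words per second). The assumption that one memory unit delivers one word per cycle is made here only for the sake of the arithmetic and is not a statement from the text.

```python
# Figures quoted in the text (revised values).
MEMORY_CYCLE_US = 2.1            # main-memory cycle time, microseconds
WORDS_PER_MEMORY_UNIT = 16_384   # capacity of one memory unit, words
DISK_CAPACITY_WORDS = 2_000_000  # capacity of one disk unit, words
DISK_RATE_WORDS_PER_S = 125_000  # disk transfer rate, words per second

# Assumption (not from the text): one word per cycle from a single memory unit.
memory_rate = 1_000_000 / MEMORY_CYCLE_US  # words per second

# Time to read a whole disk, and to fill one memory unit from disk.
full_disk_read_s = DISK_CAPACITY_WORDS / DISK_RATE_WORDS_PER_S
fill_one_unit_s = WORDS_PER_MEMORY_UNIT / DISK_RATE_WORDS_PER_S

# Fraction of one memory unit's cycles consumed by a disk at full rate.
disk_share_of_memory = DISK_RATE_WORDS_PER_S / memory_rate

print(f"memory unit rate : {memory_rate:,.0f} words/s")
print(f"read whole disk  : {full_disk_read_s:.1f} s")
print(f"fill memory unit : {fill_one_unit_s * 1000:.1f} ms")
print(f"disk share       : {disk_share_of_memory:.1%} of one unit's cycles")
```

Under these assumptions, reading an entire disk would take 16 seconds, and a disk running at full rate would occupy roughly a quarter of the cycles of a single memory unit.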
As the understanding of the task deepened, this tentative plan was
modified in many ways. The functional characteristics of the actual
computer were developed in the years 1956 to 1958. This planning
phase, which is likened in Chap. 2 to the work of an architect planning
a building, culminated in a detailed programmer's manual late in 1958.
During the same period the basic technology was also established. A
number of changes were subsequently made as design and construction
progressed, but the basic plan remained as in 1958.
The Stretch computer is now called the IBM 7030. It was delivered to
Los Alamos in April, 1961. Several other 7030 systems were under con-
struction in 1961 for delivery to other organizations with a need for very
large computers. We shall leave it to others to judge, on the basis of
subsequent operating experience, how close the computer comes to satis-
fying the original objectives of Project Stretch.
Chapter 2
ARCHITECTURAL PHILOSOPHY
by F. P. Brooks, Jr.

Computer architecture, like other architecture, is the art of determin-
ing the needs of the user of a structure and then designing to meet those
needs as effectively as possible within economic and technological con-
straints. Architecture must include engineering considerations, so that
the design will be economical and feasible; but the emphasis in architec-
ture is upon the needs of the user, whereas in engineering the emphasis is
upon the needs of the fabricator. This chapter describes the principles
that guided the architectural phase of Project Stretch and the rationale
of some of the features of the IBM 7030 computer which emerged.
2.1. The Two Objectives of Project Stretch
High Performance
The objective of obtaining a major increase in over-all performance
over previous computers had a triple motivation.
1. There were some real-time tasks with deadlines so short that they
demanded very high performance.
2. There were a number of very important problems too large to be
tackled on existing computers. In principle, any general-purpose com-
puter can do any programmable problem, given enough time. In prac-
tice, however, a problem can require so much time for solution that the
program may never be "debugged" because of machine malfunctions and
limited human patience. Moreover, problem parameters may change,
or a problem may cease to be of interest while it is running.
3. Cost considerations formed another motivation for high perform-
ance. It has been observed that, for any given technology, performance
generally increases faster than cost. A very important corollary is that,
for a fully utilized computer, the cost per unit of computation declines
with increasing performance. It appeared that the Stretch computer
would show accordingly an improved performance-to-cost ratio over
earlier computers. It appeared, further, that some computer users did
indeed have sufficient work to occupy fully an instrument of the pro-
posed power and could, therefore, obtain economic advantage by using
a Stretch computer.
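
The corollary in item 3 can be illustrated numerically. The sketch below assumes, purely for illustration, that performance grows as a power of cost with an exponent greater than 1. The exponent value of 2 and the function name cost_per_unit are hypothetical choices, not figures from the text.

```python
def cost_per_unit(cost: float, exponent: float = 2.0) -> float:
    """Cost per unit of computation for a fully utilized machine.

    Assumes performance = cost ** exponent in arbitrary units.
    The exponent is a hypothetical parameter, not a figure from the text.
    """
    performance = cost ** exponent
    return cost / performance


# Doubling the cost more than doubles the performance, so each unit
# of computation becomes cheaper as the machine gets larger.
for cost in (1.0, 2.0, 4.0):
    print(f"cost {cost:>4}: performance {cost ** 2.0:>5}, "
          f"cost per unit {cost_per_unit(cost):.2f}")
```

With an exponent of exactly 1 the cost per unit would stay flat. The advantage exists only while performance rises faster than cost, and only if the machine is kept fully occupied.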

Generality
In addition to being fast, the Stretch computer was to be truly a
general-purpose computer, readily applicable to scientific computing,
business data processing, and various large information-processing tasks
encountered by the military. In 1955 and 1956, when the general objec-
tives of Project Stretch were set, it was apparent that there existed a few
applications for a very-high-performance computer in each of these areas.
There is no question that the new computer could have been made at
least twice as fast, with perhaps no more hardware, if it had been special-
ized for performing a very few specific computing algorithms. This
possibility was rejected in favor of a general-purpose computer for four
reasons, each of which would have sufficed:
1. No prospective user had all his work confined to so few programs,
nor could any user be sure that his needs would not change significantly
during the life of the machine.
2. If a computer were designed to perform well on the entire class of
problems encountered by any one user, the shift in balance required to
make it readily applicable to other users would be quite small.
3. Since there existed only a few applications in each specialized area
and since the development costs of a computer of very high performance
are several times the fabrication costs, each user would in fact be acquir-
ing a general-purpose computer (containing some hardware he did not
especially need) more cheaply than he could have acquired a machine
more precisely specialized for his needs.
4. Since there are real limitations on the skilled manpower and other
facilities available for development efforts, it would not have been possi-
ble to develop several substantially different machines of this performance
class at once, whereas it was possible to meet a variety of needs for very-
high-performance computers with a single machine.
In sum, then, Project Stretch was to result in a very-high-performance,
general-purpose information-processing system.
2.2. Resources
A sharp increase in computer performance does not spring solely from
a strong justification for it; new technology is indispensable. It appeared
that expected technological advances would permit the design to be based
upon new core memories with a 2-microsecond cycle time, new transistor
circuits with delays of 10 to 20 nanoseconds (billionths of a second) per
stage, and corresponding new packaging techniques. The new transistor
technology offered not only high speeds but a new standard of reliability,
which made it not unreasonable to contemplate a machine with hundreds
of thousands of components.

¹ M. C. Sangren, Role of Digital Computers in Nuclear Design, Nucleonics, vol. 15,
no. 5, pp. 56-60, May, 1957.

In order to complete the computer within the desired time span, it was
decided to accept the risks that would be involved in (1) developing the
technology and (2) designing the machine simultaneously.
The new circuits would be only ten to twenty times as fast as those of
the 704, and the new memories would be only six times as fast. Obvi-
ously, a new system organization was required if there was to be a major
increase in performance. It was clear that the slow memory speed would
be the principal concern in system design and the principal limitation on
performance. This fact influenced many decisions, among them the
selection of a long memory word, and prompted the devotion of con-
siderable effort to maximizing the use of each instruction bit.
Project Stretch benefited greatly from practical experience gained with
the first generation of large-scale electronic computers, such as the IBM
700 series. Decisions made in the design of these earlier computers had
necessarily been made without experience in the use of such machines.
At the beginning of Project Stretch the design features of earlier machines
were reviewed in the light of subsequent experience. It should not be
surprising that a number of features were found inadequate: some con-
siderations had increased in significance, others had diminished. Thus
it was decided not to constrain Stretch to be program-compatible with
earlier computers or to follow any existing plan. A completely fresh
start meant extra architectural effort, but this freedom permitted many
improvements in system organization.
A wealth of intensive experience in the application of existing com-
puters was made available by the initial customers for Stretch computers.
From these groups came ideas, insight, counsel, and often, because the
groups had quite diverse applications, conflicting pressures. The diver-
sity of these pressures was itself no small boon, for it helped ensure adher-
ence to the objective of general applicability.
2.3. Guiding Principles
The universal adoption of several guiding principles helped ensure the
conceptual integrity of a plan whose many detailed decisions were made
by many contributors.
Over-all Optimization
The objective of economic efficiency was understood to imply mini-
mizing the cost of answers, not just the cost of hardware. This meant
repeated consideration of the costs associated with programming, compi-
lation, debugging, and maintenance, as well as the obvious cost of machine
time for production computation. A consequent objective was to make
programming easier, not necessarily for trivial problems, but for prob-
lems worthy of the computer, problems whose coding in machine language
would usually be generated automatically by a compiler from statements
in the user's language.
A corollary of this principle was the recognition that complex tasks
always entail a price in information (and therefore money) and that this
price is minimized by selecting the proper form of payment: sometimes
extra hardware, sometimes extra instruction executions, and sometimes
harder thought in developing programming systems. For example, the
price of processing data with naturally diverse lengths and structures is
easily recognized (see Chap. 4). This price appeared to be paid most
economically in hardware; so very flexible hardware for this purpose was
provided. Similarly, protection of memory locations from unwanted
alteration was accomplished much more economically with equipment
than it would have been with programming. A final minor example is
the STORE VALUE IN ADDRESS* operation, which inserts index values into
addresses of different lengths; by using address-length-determining hard-
ware already provided for other reasons, this instruction performs a task
that would be rather painful to program. For other tasks, such as pro-
gram relocation, exception-condition fix-up, and supervisory control of
input-output, hardware was considered, but programming techniques
were selected as more economical.
Power instead of Simplicity
The user was given power rather than simplicity whenever an equal-
cost choice had to be made. It was recognized in the first place that
the new computer would have many highly sophisticated and experienced
users. It would have been presumptuous as well as unwise for the com-
puter designers to "protect" such users from equipment complexities that
might be useful for solving complex problems. In the second place, the
choice is asymmetric. Powerful features can be ignored by a user who
wishes to confine himself to simple techniques. But if powerful features
were not provided, the skillful and motivated user could not wring their
power from the computer.
For these reasons, the user is given programmed access to the hardware
wherever possible. He is given, for example, an interruption and address-
protection system whose use can be simple or very complex. He is given
an indexing system that can be used simply or in some rather complex
ways. If he chooses and if his problems are simple, he can write pro-
grams using floating-point arithmetic without regard for precision, over-
flow, or underflow; but if he needs to concern himself with these often
complex matters, he is given full facilities for doing so.

* Names of actual 7030 operations are printed in SMALL CAPS in this book. When
a name is used to denote a class of operations of which this operation is a member, it
is printed in italics; also italicized are operations that exist in some computers but not
in this one. For example, operations of the add type built into the 7030 include ADD,
ADD TO MEMORY, ADD TO MAGNITUDE, etc., but not add absolute, which is provided in a
different manner by modifier bits.

Generalized Features
Wherever specific programming problems were considered worthy of
hardware, ad hoc solutions were avoided and general solutions sought.
This principle came from a strong faith that important variants of the
same problem would surely arise and that generality and flexibility would
amply repay any extra cost. There was also certainty that the architects
could hardly imagine, much less predict, the many unexpected uses for
general operations and facilities. This principle, for example, explains
the absence of special operations to edit output: the problem is solved
by the general and powerful logical-connective operations. Similarly, a
single uniform interruption technique is used for input-output communi-
cation, malfunction warning, program-fault indication, and routine detec-
tion of expected but rare exceptional conditions.
Specialized Equipment for Frequent Tasks
There is also an antithetical principle. For tasks of great frequency
in important applications, specialized equipment and operations are pro-
vided in addition to general techniques. This, of course, accounts for
the provision of floating-point arithmetic and automatic index modifi-
cation of addresses.
To maximize instruction density, however, specialized operations of
less than the highest frequency are specified by extra instructions for
such operations rather than by extra bits in all instructions. In short,
the information price of specifying a less usual operation is paid when it
is used rather than all the time. For example, indirect addressing,
multiple indexing, and instruction-counter storing on branching each
require half-word instructions when they are used, but no bits in the
basic instructions are used for such purposes. As a result of such detailed
optimization, the 7030 executes a typical scientific program with about
20 per cent fewer instructions of 32 bits than does the 704 with 36-bit
instructions on a corresponding program.
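
The instruction-density claim can be turned into a quick bit count. The sketch below uses the three figures from the text (32-bit instructions, 36-bit instructions, about 20 per cent fewer instructions). The program size of 1,000 instructions is a hypothetical value chosen only to make the arithmetic concrete.

```python
BITS_PER_INSTRUCTION_7030 = 32   # instruction unit quoted in the text
BITS_PER_INSTRUCTION_704 = 36    # instruction length quoted in the text
PERCENT_FEWER_INSTRUCTIONS = 20  # "about 20 per cent fewer"

# Hypothetical program size on the 704, used only for illustration.
instructions_704 = 1000
instructions_7030 = instructions_704 * (100 - PERCENT_FEWER_INSTRUCTIONS) // 100

# Total instruction bits that must be fetched from memory on each machine.
bits_704 = instructions_704 * BITS_PER_INSTRUCTION_704
bits_7030 = instructions_7030 * BITS_PER_INSTRUCTION_7030
bit_saving = 1 - bits_7030 / bits_704

print(f"704 : {instructions_704} instructions, {bits_704} bits")
print(f"7030: {instructions_7030} instructions, {bits_7030} bits")
print(f"instruction bits saved: {bit_saving:.1%}")
```

On these figures the 7030 program occupies roughly 29 per cent fewer instruction bits than the corresponding 704 program, which matters on a machine whose speed is limited by memory accesses.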
Systematic Instruction Set
Because the machine would be memory-limited, it was important to
provide a very rich instruction set so that the memory accesses for an
instruction and its operand would accomplish as much as possible. As it
has developed, the instruction set contains several thousand distinguish-
able operations. Such a wealth of function could be made conceptually
manageable only by strong systematization. For example, there is only
one conditional branch instruction for testing the machine indicators, but
this is accompanied by a 6-bit code to select any one of the 64 machine
indicators, a bit to specify testing for either the on or the off condition,
and another bit to permit resetting of the indicator. Thus there are only
a few basic operations and a few modifiers. In all, the number of oper-
ations and modifiers is less than half the number of operations in the
IBM 709 (or 7090), although the number of different instruction actions
is over five times that of the 709.
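
The way a handful of modifier bits multiplies the number of distinguishable actions can be sketched in a few lines of Python. The field widths (6 bits, 1 bit, 1 bit) come from the text. The bit positions, the field order, and the names IndicatorBranch, encode, and decode are hypothetical and do not describe the actual 7030 instruction format.

```python
from typing import NamedTuple


class IndicatorBranch(NamedTuple):
    indicator: int      # which of the 64 machine indicators to test (0..63)
    branch_if_on: bool  # test for the "on" (True) or "off" (False) condition
    reset: bool         # reset the indicator after testing it


def encode(mod: IndicatorBranch) -> int:
    """Pack the three modifier fields into 8 bits (hypothetical layout)."""
    if not 0 <= mod.indicator < 64:
        raise ValueError("indicator must fit in 6 bits")
    return (mod.indicator << 2) | (int(mod.branch_if_on) << 1) | int(mod.reset)


def decode(bits: int) -> IndicatorBranch:
    """Unpack the 8-bit modifier field produced by encode()."""
    return IndicatorBranch((bits >> 2) & 0x3F, bool((bits >> 1) & 1), bool(bits & 1))


# One basic operation plus three modifier fields yields many distinct actions.
variants = {
    encode(IndicatorBranch(i, on, rst))
    for i in range(64)
    for on in (False, True)
    for rst in (False, True)
}
print(len(variants))
```

A single conditional branch operation thus stands for 64 x 2 x 2 = 256 distinguishable actions, which is how a few operations and a few modifiers can add up to several thousand instruction actions.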
Such systematization, of course, implies symmetry in the operation
code set: each modifier can be validly used with all the operations for
which it can be indicated in the instruction, and, for most operations, the
logical converses or counterparts are also provided. Thus the floating-
point-arithmetic set includes not only the customary DIVIDE, where the
addressed operand constitutes the divisor, but also a RECIPROCAL DIVIDE,
which addresses the dividend.
Provision for New Operating Techniques
Experience with the IBM 650 and 704 computers had demonstrated
that two computers whose speeds differ by more than one order of magni-
tude are different in kind as well as in degree. This confirmed the sus-
picion that the 7030 would be more than a super-704 and would be
operated in a different way. An early effort was made, therefore, to
anticipate some of the operating techniques appropriate for such an
instrument, so that suitable hardware could be provided.
The most significant conclusion from these investigations was that an
important operating technique would be multiprogramming, or time-
sharing of the central computer among several independent problem
programs. This now familiar (but yet unexploited) concept was new in
1956 and viewed widely with suspicion.
A second conclusion was that the proposed high-capacity, high-data-
rate disk storage would contribute substantially to system performance
and would permit the 7030 to be operated as a scientific computer with-
out very-high-speed magnetic tapes.

2.4. Contemporary Trends in Computer Architecture
Over the years computer designs have gone through a constant and
gradual evolution shaped largely by experience gained in many active
computing centers. This experience has heavily influenced the architec-
ture of Stretch. In several instances the attack on a problem exposed
by experience with existing computers differs in Stretch from the solution
presently adopted in most computer installations. For example, with
existing large computers the only way to meet the high cost of human
intervention is to minimize such intervention; in the Stretch design the
attempt has been, instead, to make human intervention much cheaper.
The effect of several of these contemporary design trends on the Stretch
architecture will be examined here.
Concurrency
Most new computer designs achieve higher performance by oper-
ating various parts of the computer system concurrently. Concurrent
operation of input-output and the central computer has been available
for some years, but some contemporary designs go considerably beyond
this and allow various elements of the central computer to operate
concurrently.
A distinction may be made (see Chap. 13) between local concurrency,
providing overlapped execution of instructions that are immediate neigh-
bors in the instruction stream of a single program, and nonlocal con-
currency, where the overlap is between nonadjacent instructions that
may belong to different programs. The usual input-output concurrency
is of the nonlocal type; since the instructions undergoing simultaneous
execution are not closely related to one another, the need for interlocks
and safeguards is not severe and may, to a large extent, be accomplished
by supervisory programming.
Local concurrency is used extensively in the central processing unit of
the 7030 to achieve a high rate of instruction flow within a single instruc-
tion sequence. Unlike another scheme,² in which each specialized unit
performs its task and returns its result to memory to await call by the
next unit, the 7030 uses registers; this is because memory speed is the
main limitation on 7030 computer speed. Several of these registers form
[...] receives instructions and operands from the real memory in advance of
execution by the arithmetic unit and receives the results for storing while
the arithmetic unit proceeds with the next operation. Up to eleven suc-
cessive instructions may be in the registers of the central processing unit
at various stages of execution: undergoing address modification, awaiting
access to operands in memory, waiting for and being executed by the
arithmetic units, or waiting for a result to be returned to memory.

² P. Dreyfus, Programming Design Features of the GAMMA 60 Computer, Proc.
Eastern Joint Computer Conf., December, 1958, pp. 174-181.
³ Ibid.

Considerable effort was expended on automatic interlocks and safe-
guards, so that the programmer would not have to concern himself with
the intricate logic of local concurrency. The programmer writes his pro-
gram as if it were to be executed sequentially, one instruction at a time.
To make a computer with automatic program-interruption facilities
behave this way was not an easy matter, because the number of instruc-
tions in various stages of processing when an interrupting signal occurs
may be large. The signal may have been the result of one of these
instructions, requiring interruption before the next instruction is exe-
cuted. Since the next several instructions may already be under way,
it must be possible to go back and cancel their effects. The amount of
overlap varies dynamically and may even be different for two executions
of the identical instruction sequence; so it would be almost impossible
for the programmer to do the backtracking. Therefore, the elaborate
safeguards provided to ensure sequential results from nonsequential oper-
ation do more than satisfy a desire to simplify programming; the pro-
grammer would be lost without them.
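
The principle behind these safeguards can be sketched as follows. This is a toy model, not a description of the 7030 hardware: the function run_with_lookahead, the window parameter, and the representation of instructions as (destination, function) pairs are all assumptions made for illustration. Results computed ahead of time are held in a pending list and stored strictly in program order, so that when an instruction faults, the work already done on later instructions can simply be discarded.

```python
def run_with_lookahead(program, memory, window=4):
    """Run `program` with up to `window` instructions in flight.

    Returns (index of the faulting instruction or None,
             indices of later instructions whose work was cancelled).
    """
    pending = []  # computed but not yet stored: (index, dest, value, fault)
    pc = 0
    while pc < len(program) or pending:
        # Look ahead: start later instructions before earlier ones have stored.
        while pc < len(program) and len(pending) < window:
            dest, op = program[pc]
            view = dict(memory)  # memory plus results still waiting to be stored
            for _, d, v, fault in pending:
                if fault is None:
                    view[d] = v
            try:
                pending.append((pc, dest, op(view), None))
            except ArithmeticError as exc:
                pending.append((pc, dest, None, exc))
            pc += 1
        # Store the oldest result, strictly in program order.
        index, dest, value, fault = pending.pop(0)
        if fault is not None:
            cancelled = [i for i, *_ in pending]
            pending.clear()  # discard everything begun after the fault
            return index, cancelled
        memory[dest] = value
    return None, []


memory = {"a": 6, "b": 0, "c": 0, "d": 0}
program = [
    ("c", lambda m: m["a"] + 1),         # 0: completes normally
    ("d", lambda m: m["a"] // m["b"]),   # 1: divides by zero and faults
    ("a", lambda m: 99),                 # 2: already under way at the fault
    ("c", lambda m: m["c"] * 2),         # 3: already under way at the fault
]
faulting, cancelled = run_with_lookahead(program, memory)
print(faulting, cancelled, memory)
```

In this run instruction 1 faults after instructions 2 and 3 have already been computed. Their results are dropped, and memory shows exactly the state a strictly sequential machine would have reached at the point of interruption.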
Multiprogramming
Time-sharing (as of a computer by multiprogramming) and concur-
rency are two sides of one coin: to overcome imbalance in a computer
system, faster elements are time-shared and slower elements are made to
operate concurrently. In the 7030, for example, the single central com-
puter uses several concurrently operating memory boxes, and the single
computer-memory system may control in turn many concurrently oper-
ating input-output devices.
Even though per-operation cost tends to decrease as system perform-
ance increases, per-second cost increases, and it therefore becomes more
important to avoid delaying the calculator for input-output. To
take full advantage of concurrent input-output operation for a computer
of very high performance demands that input data for one program be
entered while a preceding program is in control of calculation and that
output take place after calculation is complete. For this reason alone,
it was apparent from the beginning that multiprogramming facilities
would be needed for Project Stretch.
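
The gain from this overlap can be estimated with a small timing model. The sketch below is an illustration under stated assumptions, not a description of any Stretch supervisory program: each job has an input, a calculation, and an output phase, one input device, one calculator, and one output device each serve a single job at a time, and the job times and the function names serial_time and overlapped_time are hypothetical.

```python
def serial_time(jobs):
    """Elapsed time when each job's input, calculation, and output run in turn."""
    return sum(inp + calc + out for inp, calc, out in jobs)


def overlapped_time(jobs):
    """Elapsed time when input for the next job and output for the previous
    job proceed while the calculator works on the current job."""
    input_done = calc_done = output_done = 0
    for inp, calc, out in jobs:
        input_done = input_done + inp                   # inputs read back to back
        calc_done = max(calc_done, input_done) + calc   # needs data and a free calculator
        output_done = max(output_done, calc_done) + out # needs results and a free printer
    return output_done


# Hypothetical jobs: (input, calculation, output) times in seconds.
jobs = [(2, 5, 1), (3, 4, 2), (1, 6, 2)]
calculation = sum(calc for _, calc, _ in jobs)

print("serial    :", serial_time(jobs), "s")
print("overlapped:", overlapped_time(jobs), "s")
print("calculator busy for", calculation, "s in both cases")
```

For these sample jobs the elapsed time falls from 26 to 19 seconds while the calculator does the same 15 seconds of work, so far less of its expensive time is spent waiting on input-output.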
A second motivation for multiprogramming is the need for a closer man-
machine relationship. As computers have become faster, the increasing
cost of wasted seconds has dictated increasing separation between the
problem sponsor and the solution process. This has reduced the over-all
efficiency of the problem-solving process; for, in fact, the more complex
problems solved on faster calculators are harder, not easier, for the spon-
sor to comprehend and therefore need more, not less, dynamic interaction
between solution process and sponsor. There can be no doubt that much
computer time and more printer time has been wasted because the prob-
lem sponsor cannot observe and react as his program is being run on large
computers like the IBM 704. This difficulty promised to become more
acute with the even more complex problems for which Stretch was needed.
With multiprogramming it becomes economically practical for a person
seated at a console to observe his program during execution and interrupt
it while considering the next step. Since the computer can immediately
be switched to another waiting program, the user is not charged with the
cost of an idle computer. Thus the extension of multiprogramming to
manual operation offers, once the technique has been mastered, a tremendous
economic breakthrough: it provides a general technique for
solving the problem of loss of contact between sponsor and solution. A
sponsor can now interact with his problem at his own speed, paying only
the cost of delaying the problem, not that of delaying the machine. This
should materially accelerate that large proportion of scientific compu-
tation which is expended on continual and perpetual refinement and
debugging of mathematical models and the programs that embody them.
The solution of most such problems is characterized more closely by a
fixed number of interactions between computer and sponsor than by a
fixed amount of computer time.
Multiprogramming also makes it economically practical to enter new
data and to print or display results on line, that is, via directly connected
input and output devices; whereas the economics of previous computers
forced card-to-tape and tape-to-printer conversion off line, that is, with
physically separate devices, so that only the fastest possible medium,
magnetic tape, would be used on the computer. On-line operation of
input and output is emphasized in the Stretch philosophy, because it
removes much of the routine operator intervention and reduces the over-
all elapsed time for each run of a problem.
Multiprogramming makes several demands upon system organization.
Most obvious is the requirement of ample and fast storage, both internal
and external. Of equal importance is an adequate and flexible inter-
ruption system. Also, in the real world, time-sharing of a computer
among users with ordinary human failings requires memory protection,
so that each user can feel secure within his assigned share of the machine.
Debugging is difficult enough at best, and most users would sacrifice
efficiency rather than tolerate difficulties caused by the errors in other
programs. It proved possible in the 7030 to provide a rudimentary but
sufficient form of memory protection without affecting speed and with a
modest amount of hardware.
The equipment for multiprogramming was, however, limited to two
essential features: program interruption and address monitoring, and
these were designed to be as flexible as possible. Other multiprogramming
functions are left to the supervisory program, partly because that arrangement
appeared to be efficient, but primarily because no one could be sure
which further facilities would prove useful and which would prove merely
expensive and overly rigid inconveniences. Several years of actual multi-
programming experience will undoubtedly demonstrate the value of other
built-in features.
If multiprogramming is to be an operating technique, a radically different
design is needed for the operator's console. If several independent
programs are to be run, each with active operator intervention, there
must be provision for multiple independent consoles. Each console must
be incapable of altering any program other than the associated problem
program. For active intervention by the problem sponsor (rather than
by a special machine operator), the console must be especially convenient
to use. Finally, if a supervisory program is to exercise complete control
in scheduling programs automatically, it must be able to ignore unused
console facilities. Although intelligent human intervention is prized
highly, routine human intervention is to be minimized, so as to reduce
delays and opportunities for error.
The operating console was designed to be simply another input-output
device with a convenient assortment of switches, keys, lights, digital dis-
plays, and a typewriter. A console interpretive program assigns mean-
ing to the bits generated by each switch and displayed by each light.
There are no maintenance facilities on the operator's console, and com-
pletely separate maintenance consoles are provided.
Automatic Programming
Undoubtedly the most important change in computer application tech-
nique in the past several years has been the appearance of symbolic
assemblers and problem-language compilers. Studies showed that for
Stretch at least half of all computer time would be used by compiler-produced
programs; all programs would be at least initially translated
by an assembler.
A most important implication of symbolic-language programming is
that the addressing radix and structure need not be determined for coder
convenience. Fairly complex instruction formats can be used without
causing coding errors, and operation sets with hundreds of diverse oper-
ations can be used effectively.
Many proposals for amending system architecture to simplify com-
pilers were considered. The most far-reaching of these concerned the
number of index registers, which should be infinity or unity for greatest
ease of assignment during compilation. The alternatives were investi-
gated in considerable detail, and both turned out to reduce computer
performance rather sharply. Indeed, reduced performance was implied
by most such proposals. These studies resulted in a belief which is not
shared by all who construct compilers; this is that total cost to the user is
minimized not by restricting system power to keep compilers simple but
by enhancing facilities for the task of compilation itself, so that com-
pilers can operate more rapidly and efficiently.
Information Processing
The arithmetic power of a computer is often only ancillary to its power
of assembling, rearranging, testing, and otherwise manipulating information.
To an increasing extent, bits in even a scientific computer
represent things other than numerical quantities: elements of a pro-
gram metalanguage, alphabetic material, representations of graphs, bits
scanned from a pattern, etc. In the light of this trend, it was therefore
important to match powerful arithmetical with powerful manipulative
facilities. These are provided in the variable-field-length arithmetic
and, in unique form, in the variable-field-length connective operations,
which operate upon bits as entities rather than components of numbers.
Good variable-field-length facilities are, of course, particularly important
for business and military data processing.

2.5. Hindsight
As the actual shape of the 7030 began to emerge from the initial
planning and design stages, it became apparent that some of the earlier
thoughts had to be revised. (Some of these changes have already been
noted parenthetically in Chap. 1.) The bus unit for linking and schedul-
ing traffic between many memory boxes and many memory-using units
turned out to be a key part of the design. The original algorithms for
multiplication and division proved inadequate with available circuits,
and new approaches were devised. It became clear that division, especi-
ally, could not be improved by the same factor as multiplication. Serial
(variable-field-length) operation turned out to be considerably slower
than expected; so serial multiplication and division were abandoned, and
the variable-field-length multiplication and division operations were rede-
signed to use the faster parallel unit.
The two separate computer sections that were postulated originally
were later combined (see Chap. 1), and both sets of facilities were placed
under the control of one instruction counter. Although the concept of
multiple computing units, closely coupled into one system, was not found
practical for the 7030 system, this concept still seems promising.* In
fact, the input-output exchange coupled to the main computer in the
7030 is a simplified example, since the exchange is really another computer,
albeit a highly specialized one with an extremely limited instruction
vocabulary.
* A. L. Leiner, W. A. Notz, J. L. Smith, and A. Weinberger, PILOT: A New Multiple
Computer System, J. ACM, vol. 6, no. 3, pp. 313-335, July, 1959.
Some architectural features proved unworkable. Rather late in the
design period, for example, it became clear that the method of handling
zero quantities in floating-point arithmetic was ill-conceived; so this
method was abandoned, and a better concept was devised.
Two excellent features, each of which contributes markedly to system
performance, were found to have inherently conflicting requirements;
their interaction prevents either feature from realizing its full potential.
The program-interrupt system is intended to permit unpredicted changes
in instruction sequencing. The instruction look-ahead unit, on the other
hand, depends for its effectiveness on the predictability of instruction
sequences; each interruption drains the look-ahead and takes time to
recover. This destroyed the usefulness of the interrupt system for fre-
quent one-instruction fix-ups and required the addition of built-in excep-
tion handling in such cases as floating-point underflow.
On the other hand, some improvements became possible as the design
progressed. It turned out, for example, that the equipment for perform-
ing variable-field-length binary multiplication with the parallel arithmetic
unit could easily be made to do binary-decimal and format conversions;
so this facility was added.
There are in the 7030 architectural features whose usefulness is still
unmeasured. A few are probably mistakes. Others seem to be innovations
that will find redefinition and refinement in future computers, large
and small. Still other features appear now to be wise for very-high-
performance computers, but must be considerably scaled down for more
modest machines. Experience has, however, reinforced the system archi-
tects' belief in the guiding principles of the design and in the general
applicability of these principles to other computer-planning projects.
Chapter 3
SYSTEM SUMMARY OF IBM 7030
by W. Buchholz

3.1. System Organization
The IBM 7030 is composed of a central processing unit, one or more
memory units, a memory bus unit, an input-output exchange, and input-
output devices. Optionally, high-speed magnetic disk storage units and
a disk control unit may be added for external storage. A typical system
configuration is shown in Fig. 3.1.
Information moves between the input-output devices and the memo-
ries under control of the exchange. The central processing unit (CPU)
actually consists of several units that may operate concurrently: an
instruction unit, which controls the fetching and indexing of instructions
and executes the instructions concerned with indexing arithmetic; a look-
ahead unit, which controls fetching and storing of data for several instruc-
tions ahead of the one being executed, so as to minimize memory traffic
delays; a parallel arithmetic unit, for performing binary arithmetic on
floating-point numbers at very high speed; and a serial arithmetic unit,
for performing binary and decimal arithmetic, alphanumeric operations,
and logical-connective operations on fields of varying lengths.
Logically the CPU operates as one coordinated unit upon a succession
of instructions under the control of a single instruction counter. Care is
taken in the design so that the user need not concern himself with the
intricacies of overlapped operations within the CPU.
The memory bus unit coordinates all traffic between the various
memory units on the one side and, on the other side, the exchange, the
disk control, and the various parts of the CPU.
3.2. Memory Units
The main magnetic core memory units have a read-write cycle time of
2.1 microseconds. A memory word consists of 64 information bits and
8 check bits for automatic single-error correction and double-error
detection.
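The 64 + 8 bit arrangement matches the size of a textbook extended Hamming code, which can be sketched as follows. This is only an illustration of single-error correction with double-error detection in general: the bit positions, the function names `encode` and `decode`, and the status strings are assumptions made here, and nothing below should be read as the 7030's actual check-bit assignment.

```python
# Generic extended Hamming (72,64) code. NOT the 7030's real check-bit layout;
# positions and names are assumptions chosen for illustration.
PARITY_POS = (1, 2, 4, 8, 16, 32, 64)                        # 7 Hamming check bits
DATA_POS = [p for p in range(1, 72) if p not in PARITY_POS]  # 64 data positions


def encode(word):
    """Return a 72-bit codeword (list of 0/1) for a 64-bit integer."""
    bits = [0] * 72
    for i, p in enumerate(DATA_POS):
        bits[p] = (word >> i) & 1
    for pp in PARITY_POS:
        # Each check bit makes the parity even over positions containing pp.
        bits[pp] = sum(bits[p] for p in range(1, 72) if p & pp and p != pp) % 2
    bits[0] = sum(bits[1:]) % 2          # eighth check bit: overall parity
    return bits


def decode(bits):
    """Return (word, status); word is None when the error is uncorrectable."""
    bits = list(bits)
    syndrome = 0
    for p in range(1, 72):
        if bits[p]:
            syndrome ^= p                # XOR of the positions of all 1 bits
    overall_odd = sum(bits) % 2
    if overall_odd:                      # odd number of errors: assume one
        if syndrome > 71:
            return None, "uncorrectable"
        bits[syndrome] ^= 1              # syndrome 0 means the parity bit itself
        status = "corrected"
    elif syndrome:                       # even parity but nonzero syndrome
        return None, "double error"
    else:
        status = "ok"
    word = sum(bits[p] << i for i, p in enumerate(DATA_POS))
    return word, status
```

A single flipped bit anywhere in the 72 is repaired silently, while two flipped bits are reported rather than miscorrected, which is the behavior the text describes.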
The address part of every instruction provides for addressing directly
any of 262,144 (2^18) word locations. Addresses are numbered from 0
up to the amount of memory provided in a particular system, but
addresses 0 to 31 refer to index words and special registers instead of
general-purpose memory locations.

[Figure: block diagram showing the memory units, the memory in and out buses, the memory bus unit, the disk synchronizer with high-speed disk units, the channels for input-output units (magnetic tapes, magnetic disks, printers, readers, consoles, displays, inquiry stations, data transmission, etc.), and the central processing unit with its index and arithmetic units.]
FIG. 3.1. 7030 system.

Each unit of memory consists of 16,384 (2^14) words. A system may
contain one, two, or a multiple of two such units, up to a maximum of
sixteen units. Each memory unit operates independently. In systems
with two units or more, several memory references may be in process
at the same time. In order to take better advantage of this simultaneity,
successive addresses are distributed among different boxes. When a system
comprises two units, successive addresses alternate between the two.
When a system comprises four or more units, the units are arranged in
groups of four, and successive addresses rotate to each of the four units
in one group, except for the last group which may consist of only two
units with alternating addresses.
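The distribution of successive addresses among memory units can be sketched as a small mapping function. The function name `memory_unit`, the numbering of units from 0, and the choice of the low-order address bits as the unit selector within a group are assumptions for illustration; the text states only the alternation and rotation pattern. The sketch also ignores the fact that addresses 0 to 31 refer to registers rather than core storage.

```python
WORDS_PER_UNIT = 16_384  # 2**14 words in each memory unit


def memory_unit(address, n_units):
    """Return the (assumed, 0-based) unit number holding a word address.

    Units are taken in groups of four; within a group, successive
    addresses rotate among the units. A final group of two alternates.
    """
    if n_units != 1 and (n_units % 2 or not 2 <= n_units <= 16):
        raise ValueError("a system has 1, 2, or a multiple of 2 units, up to 16")
    if not 0 <= address < n_units * WORDS_PER_UNIT:
        raise ValueError("address outside installed memory")
    first_unit = 4 * (address // (4 * WORDS_PER_UNIT))  # first unit of the group
    units_in_group = min(4, n_units - first_unit)       # 4, or 2 (or 1) at the end
    return first_unit + address % units_in_group
```

With six units, for example, addresses 0 to 65,535 rotate among units 0 to 3, and addresses from 65,536 upward alternate between units 4 and 5.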

3.3. Index Memory
A separate fast magnetic core memory is used for index registers.
Since index words are normally read out much more often than they are
altered, this memory has a short, nondestructive read cycle of 0.6 µsec.
The longer clear-and-write cycle of 1.2 µsec is taken only when needed.
The index memory is directly associated with the instruction unit of
the computer. It cannot be used to furnish instructions, nor can it be
used directly with input or output.
The sixteen index registers have regular addresses 16 to 31, which
correspond to abbreviated 4-bit index addresses 0 to 15. The first
register cannot participate in automatic address modification since an
index address of 0 is used to indicate no indexing.
3.4. Special Registers
Many of the registers of the machine are directly addressable. Some
of these are composed of transistor flip-flops; others are in the fast index
memory or in main memory. The addressable registers are assigned
addresses 0 to 15. These locations cannot be used for instructions or for
input or output data.
Address 0 always contains zero. It is a bottomless pit; regardless of
what is put in, nothing comes out. The program may attempt to store
data at address 0, but any word fetched from there will contain only 0
data bits.*
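The behavior of address 0 can be pictured with a toy memory model. The class name `Memory` and its methods are hypothetical, and the sketch says nothing about how the hardware achieves the effect.

```python
class Memory:
    """Toy word store in which address 0 is a "bottomless pit"."""

    def __init__(self, size):
        self.words = [0] * size

    def store(self, address, word):
        # A store to address 0 is accepted but the data are discarded.
        if address != 0:
            self.words[address] = word

    def fetch(self, address):
        # A fetch from address 0 always yields all-zero data bits.
        return 0 if address == 0 else self.words[address]
```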
The remaining fifteen addresses correspond to machine registers, time
clocks, and control bits. They are listed in the Appendix.
3.5. Input and Output Facilities
Input to the system passes from the input devices to memory through
the exchange. The exchange assembles successive 64-bit words from the
flow of input information and stores the assembled words in successive
memory locations without tying up the central processing unit. The
CPU specifies only the number of input words to be read and their loca-
tion in memory; the exchange then completes the operation by itself.
The exchange operates in a similar manner for output, fetching successive
memory words and disassembling them for the output devices
independently of the CPU. External storage devices, such as tapes and
disks, are operated via the exchange as if they were input and output.
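The assembly and disassembly performed by the exchange can be sketched as follows. The 8-bit unit of transfer, the most-significant-first packing order, and the function names are assumptions for illustration; the text specifies only that 64-bit words are built from the input flow and stored in successive locations, given a word count and a starting location.

```python
def assemble_words(byte_stream, word_count, memory, start):
    """Pack 8-bit bytes into 64-bit words stored at successive locations.

    The caller (standing in for the CPU) supplies only the count and the
    starting location; the byte width of 8 is an assumption of this sketch.
    """
    source = iter(byte_stream)
    for n in range(word_count):
        word = 0
        for _ in range(8):                        # 8 bytes x 8 bits = 64 bits
            word = (word << 8) | (next(source) & 0xFF)
        memory[start + n] = word


def disassemble_words(memory, start, word_count):
    """Yield the bytes of successive 64-bit words, most significant first."""
    for n in range(word_count):
        word = memory[start + n]
        for shift in range(56, -8, -8):
            yield (word >> shift) & 0xFF
```

Disassembling the words written by `assemble_words` reproduces the original byte sequence, which is the round trip the exchange performs between devices and memory.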
The exchange has the basic capability of operating eight independent
input-output units. This eight-channel exchange can be enlarged by
adding more eight-channel groups. Each of these channels can handle
information at a rate of over 500,000 bits per second. The exchange as a
whole can reach a peak data rate of 6 million information bits per second.
* A distinctive type (0, 1) is used in the text for the bits of binary numbers or codes,
and regular type (0, 1, 2, ...) for decimal digits. For example, 10 is a binary
number (two) and 10 a decimal number (ten).
A wide variety of input-output units can be operated by the exchange.
These include card readers and punches, printers, magnetic tapes, operator's
consoles, and typewriter inquiry stations. Several of some kinds
of units can be attached to a single exchange channel; of the several units
on a single channel, only one can be operated at a time.
Provisions have been made in the design of the exchange for adding up
to 64 more channels operating simultaneously but at a much lower data
rate per channel. This extension is intended for tying the computer eco-
nomically into a large network of low-speed units, such as manually
operated inquiry stations.
3.6. High-speed Disk Units
For many large problems, the amount of core storage that it is practical
to provide is not nearly large enough to hold all the data needed during
computation. Earlier systems have been severely limited by the rela-
tively low data rates of magnetic tapes or the relatively low capacities of
magnetic drums available for back-up storage. To avoid having the
over-all 7030 performance limited by the same devices, it was essential
to develop an external storage medium with high capacity and high data-
transfer rates. A magnetic disk storage unit was designed for this
purpose.
The disk units read or write at a rate of 125,000 words per second, or
8 million bits per second over a single channel (a rate 90 times that of the
IBM 727 tape available with the 704). One or more units, each with a
capacity of 2 million words, may be attached. Access to any location of
any disk unit requires of the order of 150 milliseconds. Once data transmission
has started it continues at top speed for as many consecutive
words as desired, without further delays for access to successive tracks.
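The figures quoted for the disk units can be checked with a little arithmetic. The constant and function names are hypothetical, and the access time is treated as exactly 150 milliseconds although the text gives it only as an order of magnitude.

```python
WORD_BITS = 64
WORDS_PER_SECOND = 125_000
ACCESS_SECONDS = 0.150             # "of the order of 150 milliseconds"
UNIT_CAPACITY_WORDS = 2_000_000


def transfer_seconds(n_words):
    """Rough time to reach a location and then stream n consecutive words."""
    return ACCESS_SECONDS + n_words / WORDS_PER_SECOND


bits_per_second = WORDS_PER_SECOND * WORD_BITS               # 8,000,000
full_unit_seconds = UNIT_CAPACITY_WORDS / WORDS_PER_SECOND   # 16.0
```

On these numbers the word rate and the bit rate agree (125,000 x 64 = 8 million), and streaming an entire unit takes about 16 seconds once transmission has begun.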
The control unit, or disk synchronizer, functions like the input-output
exchange except that it is a single-channel device designed specifically to
handle the high data rate of the disks. The exchange and the disk synchronizer
can operate independently and simultaneously at full speed.
An error-correcting code is used on the disks, and any single errors in data
read from the disks are corrected automatically by the control unit before
transfer to memory.
3.7. Central Processing Unit
The central processing unit performs arithmetical and logical oper-
ations upon operands taken from memory. The results are generally
left in accumulator registers to be further operated on or to be stored in
memory subsequently. Operations are specified one at a time by instructions,
which are also taken from memory. Each instruction usually
specifies an operation and an operand or result. The operand specification
is made up of an address and an index address. Part of the index
word contents are added to the address in the instruction to obtain an
effective address. The effective address designates the actual location of
the operand or result. The additions needed to derive the effective
address and to modify index words are performed in an index-arithmetic
unit which is separate from the main arithmetic unit.
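The derivation of an effective address can be sketched as follows, using the index-register addressing of Sec. 3.3 (index addresses 1 to 15 select the registers at addresses 17 to 31, and index address 0 means no indexing). Treating the index word as a single signed integer, working in whole-word addresses, and wrapping the sum modulo 2^18 are simplifying assumptions; the function and constant names are hypothetical.

```python
ADDRESS_MODULUS = 2 ** 18   # word addresses; wraparound is an assumption
INDEX_BASE = 16             # index registers occupy addresses 16 to 31


def effective_address(address, index_address, memory):
    """Add the selected index value to the instruction's address field.

    `memory` is any sequence whose entries 16..31 hold the index values,
    modeled here as plain signed integers rather than full index words.
    """
    if not 0 <= index_address <= 15:
        raise ValueError("index address is a 4-bit field")
    if index_address == 0:                 # 0 means "no indexing"
        return address % ADDRESS_MODULUS
    index_value = memory[INDEX_BASE + index_address]
    return (address + index_value) % ADDRESS_MODULUS
```

Note that the register at address 16 is never consulted here, which mirrors the remark that the first index register cannot take part in automatic address modification.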

3.8. Instruction Controls
An instruction may be one word or one half word in length. Full-
and half-length instructions can be intermixed without regard to word
boundaries in memory.
Instructions are taken in succession under control of an instruction
counter. The sequence of instructions may be altered by branching operations,
which can be made to depend on a wide variety of conditions.
Automatic interruption of the normal sequence can also be caused by
many conditions. The conditions for interruption and control of branching
are represented by bits in an indicator register. The interrupt system
also includes a mask register for controlling interruption and an
interrupt address register for selecting the desired set of alternate programs.
When it is needed, the address of the input or output unit
causing an interruption can be read from a channel address register which
can be set up only by the exchange.
The interpretation and execution of instructions is monitored to make
sure that the effective addresses are within boundaries defined by two
boundary registers.
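The roles of the indicator, mask, and boundary registers can be sketched in a few lines. Several details here are assumptions not given in the text: that the leftmost pending indicator is served first, that bits are numbered from the left starting at 0, and that the permitted region is the half-open range from the lower to the upper boundary. The function names are hypothetical.

```python
def select_interrupt(indicators, mask, width=64):
    """Return the bit number of the interrupt to take, or None.

    An indicator causes interruption only where the mask bit is also 1.
    Leftmost-first priority and left-to-right numbering are assumptions.
    """
    pending = indicators & mask
    if pending == 0:
        return None
    return width - pending.bit_length()


def address_allowed(effective, lower, upper):
    """Address monitoring: is the effective address inside the boundaries?

    Treating the region as lower <= address < upper is an assumption.
    """
    return lower <= effective < upper
```

In this picture an indicator that is set but masked off simply waits; it can still be tested by a branching instruction without forcing an interruption.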

3.9. Index-arithmetic Unit
The index-arithmetic unit, which is part of the instruction-control unit,
contains registers for holding the instructions to be modified and the index
words used in the modification. When index words themselves are operated
on, some of these registers also hold the operand data. The indexing
operations include loading, storing, adding, and comparing. The
index-arithmetic unit has gates for selecting the necessary fields in index
and instruction words and a 24-bit algebraic adder.

3.10. Instruction Look-ahead
After initiating a reference to memory for a data word, the instruction
unit passes the modified instruction on to the look-ahead unit. This unit
holds the relevant parts of the instruction until the data arrive, so that
both the operation and its operand can be sent to the arithmetic unit
together. Since access to the desired memory unit takes a relatively long
time, the look-ahead will accept several instructions at a time and
initiate their memory references, so as to smooth out the memory traffic
and obtain a high degree of overlap between memory units. Thus
the unit "looks" several instructions ahead of the instruction being
executed and anticipates the memory references needed. This reduces
delays and keeps the arithmetic unit in as nearly continuous operation
as possible.
Indexing and branching instructions are completed by the instruction
unit without involving the main arithmetic unit. The instruction unit
receives its own operands, whereas the look-ahead receives operands for
the main arithmetic unit. The look-ahead, however, is responsible for
storing all results for both units, so that permanent modification of stored
information is done in the proper logical sequence. Interlocks in the
look-ahead unit ensure that nothing is altered permanently until all pre-
ceding instructions have been executed successfully.
3.11. Arithmetic Unit
The arithmetic unit consists of a parallel and a serial section. The
parallel section essentially performs floating-point arithmetic at high
speed, and the serial section performs fixed-point arithmetic and logical
operations on fields of variable length. Both sections share the same
basic registers and much of the control equipment; so they may be treated
as one unit.
For simplicity, the arithmetic unit may be considered to be composed
of four one-word registers and a short register. This conceptual structure is
shown in Fig. 3.2, where the full-length registers are labeled A, B, C, and
D, and the short register is labeled S. The registers marked A and B
constitute the left and right halves of the accumulator. The registers
marked C and D serve only as temporary-storage registers, receiving
words from memory and (in serial operations only) assembling results to
be stored in memory. The short register S stores the accumulator sign
bit and certain other indicative bits.
In floating-point addition the operand from memory is sent to register
C. (Since floating-point operands will fit into register C, register D is not
needed here.) This operand is then added to the contents of register A
or of both registers A and B, depending on whether single- or double-length
addition has been specified. The result is placed in A or in A
and B. As an alternative (adding to memory), the result may be
returned to the location of the memory operand instead.

[Figure: two register diagrams, the second labeled SERIAL OPERATION. Labels: from memory; exponent, fraction, fraction (continued); left half accumulator, right half accumulator, accumulator sign.]
FIG. 3.2. Simplified register structure of arithmetic unit.

In floating-point multiplication one factor is the number in accumulator
register A. The other factor comes from memory and is transferred
to register C. The factors are now multiplied together, and the
product is returned to the accumulator register, replacing the previous
contents. In cumulative multiplication one factor must have been previously
loaded into a separate factor register (not shown). The other factor
again comes from memory and goes to C. The factors are multiplied
as in ordinary multiplication, but the product is added to the contents of
the accumulator register.
In floating-point division the dividend is in the accumulator, and the
divisor is brought from memory to register C. The quotient is returned
to the accumulator, and the remainder, if any, goes to a remainder register
(not shown).
In serial variable-field-length operations the operand field may occupy
parts of two adjacent memory words, and both words if necessary are
fetched and placed in registers C and D. The other operand field comes
from A and B. The operands are selected a few bits at a time and
processed in serial fashion. The result field may replace A and B, or it
may replace selected bits of C and D whose contents are then returned to
memory. Binary multiplication and division operands are stepped into
the parallel mechanism a few bits at a time, but the actual operation is
performed in parallel.
Other registers are the transit register, a full-word location, which may
be used for automatic subroutine entry; and two 7-bit registers, the all-ones
counter and the left-zeros counter, which are used in connective operations
to hold bit counts developed from the results.
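The two bit counts can be illustrated with a single connective, logical AND, applied to a field of given width. The function name and the choice of AND as the example are assumptions; the text says only that the counters hold counts developed from the results of connective operations.

```python
def connective_and(a, b, width):
    """AND two fields of `width` bits and develop the two bit counts.

    Returns (result, all_ones_count, left_zeros_count). With fields of at
    most 64 bits, both counts fit comfortably in a 7-bit counter.
    """
    field_mask = (1 << width) - 1
    result = a & b & field_mask
    all_ones = bin(result).count("1")           # number of 1 bits in the result
    left_zeros = width - result.bit_length()    # zeros to the left of the first 1
    return result, all_ones, left_zeros
```

For the 8-bit fields 00111100 and 00011010 the result is 00011000, with an all-ones count of 2 and a left-zeros count of 3.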
All registers mentioned above, except memory registers C and D, are
also addressable as explicit operands.

3.12. Instruction Set
The operations available may be divided into these categories:
Data arithmetic
1. Floating-point arithmetic
2. Variable-field-length arithmetic
Radix conversion
Connectives
Index arithmetic
Branching
Transmission
Input-Output
The categories are briefly described in the next few sections.
3.13. Data Arithmetic
The arithmetical instruction set includes the conventional operations
LOAD, ADD, STORE, MULTIPLY, and DIVIDE. Modifier bits are available to
change the operand sign. The operations subtract and add absolute are
obtained by use of sign modifiers to the ADD instruction and are not pro-
vided as separate operations. The same modifiers make it possible to
change the sign of a number that is to be loaded, stored, multiplied, or
divided.
A convenient feature of the MULTIPLY operation is that one of the fac-
tors is taken from the accumulator rather than from a separate register,
and this factor may be the result of previous computation. Similarly,
DIVIDE places the quotient in the accumulator, and so the quotient is
available directly for further arithmetical steps.
Extensions of the basic set of arithmetical operations permit adding
and counting in memory, rounding, cumulative multiplication, comparison,
and further variations of the standard ADD operation.
One of these variations is called ADD TO MAGNITUDE. This operation
differs from ADD in that, when the signs and modifiers are set for subtraction,
it does not allow the result sign to change. When the result
sign would change, the result is set instead to zero. This operation is
useful in dealing with nonnegative numbers or in computing with discontinuous
rates.
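The rule for ADD TO MAGNITUDE can be sketched on sign-and-magnitude pairs. The representation (a sign of +1 or -1 with a nonnegative magnitude), the function name, and the decision to keep the accumulator sign on a zero result are assumptions; field lengths, modifiers, and overflow are ignored.

```python
def add_to_magnitude(acc_sign, acc_mag, op_sign, op_mag):
    """Return (sign, magnitude) of the accumulator after the operation.

    Signs are +1 or -1. When the effective operation is a subtraction that
    would carry the result through zero, the magnitude is set to zero
    instead of letting the sign change.
    """
    if acc_sign == op_sign:              # effective addition: behaves like ADD
        return acc_sign, acc_mag + op_mag
    if op_mag > acc_mag:                 # sign would change: clamp to zero
        return acc_sign, 0
    return acc_sign, acc_mag - op_mag    # ordinary subtraction
```

For a nonnegative quantity such as a stock level, repeatedly subtracting withdrawals in this way stops at zero rather than going negative.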
The important arithmetical operations are available in the floating-point
mode as well as in the (fixed-point) variable-field-length mode.
Floating-point-arithmetic Operations
Floating-point (FLP) arithmetic uses a 64-bit floating-point word consisting
of a signed 48-bit binary fraction, a signed 10-bit binary exponent,
and an exponent flag to indicate numbers that have exceeded the available
exponent range. Arithmetic can be performed in either normalized
or unnormalized form.
The 48-bit fraction (mantissa) is longer than those available in earlier
computers, so that many problems can be computed in single precision,
which would previously have required much slower double precision.
When multiple-precision computation is required, however, it is greatly
facilitated by operations that produce double-length results.
To aid in significance studies, a noisy mode is provided in which the
low-order bits of results are modified. Running the same problem twice,
first in the normal mode and then in the noisy mode, gives an estimate
of the significance of the results.
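The idea of running a problem twice can be imitated on a modern machine. The sketch below is an analogy only: it uses IEEE double precision and nudges each result one unit in the last place away from zero, which is a stand-in technique and not the 7030's noisy-mode mechanism. The names `nudge` and `thirds_residual` are hypothetical, and `math.nextafter` requires Python 3.9 or later.

```python
import math


def nudge(x, noisy):
    """In noisy mode, perturb the low-order bit of a nonzero result."""
    if noisy and x != 0.0:
        return math.nextafter(x, math.copysign(math.inf, x))
    return x


def thirds_residual(noisy):
    """Compute (1/3) * 3 - 1, a result with no significant digits at all."""
    a = nudge(1.0 / 3.0, noisy)
    b = nudge(a * 3.0, noisy)
    return nudge(b - 1.0, noisy)


normal_result = thirds_residual(noisy=False)
noisy_result = thirds_residual(noisy=True)
```

The normal run returns exactly zero while the noisy run returns a small positive number, so the two runs share no leading digits; that disagreement is the signal that the computed value carries no significance.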
Variable-field-length-arithmetic Operations
The class of variable-field-length (VFL) arithmetic is used for data
arithmetic on other than the specialized floating-point numbers. The
emphasis here is on versatility and