fire-side-chat
Jeff Fox (firstname.lastname@example.org)
Sun, 17 Nov 1996 02:22:00 -0800
Dear MISC readers,
You get this first. I will post it to c.l.f tomorrow. I will
follow this up later with a post about the status of F21.
November 16, 1996
Chuck Moore gave his annual fireside chat to the combined Forth day
meeting of the Silicon Valley, Sacramento, and North Bay chapters
of the Forth Interest Group today.
I was one of the speakers in the morning session and gave a
progress report on the developments at the iTV Corporation. For
those who didn't know the story I gave some background explaining
that the iTV Corporation was developing a low cost set-top box
for browsing the internet. This box will contain iTV's i21 chip
designed by Chuck Moore, and software written by a number of
programmers at iTV.
I showed the iTV glossies of its Pegasus product and its i21 chip.
I noted that I was pleased to see that the competition has now started
running ads on TV. Not only do you see articles in the paper and
magazines all the time about browsing the web on your tv but now
you see commercials for browsing the internet on your tv on your tv.
I noted also that I was pleased to see the inside of the Sony and
Philips boxes and see why their boxes will be so much more expensive.
I recalled that in one of our meetings at iTV Joe Zott was told
that one connector would cost a few cents more than another and he
said something like "but a penny is a lot of money on our board. When
you are going to make ... boards it is a lot of money."
I said that I was pleased to see MISC technology moving into a
product where it can take advantage of low cost.
So often over the last few years many people have said to me
"worry about efficiency or memory use, or cost? My workstation
(or PC) has lots of power and lots of memory and who cares if the
hardware and software is fat and bloated." They really just don't
get the idea behind MISC chips.
Our product is not yet ready for market and we expect to ramp up
sales in the first quarter of '97. I said that it was fun to browse
and do email with Forth programs and it was fun working with
the people at iTV and with MISC chips.
In the afternoon Chuck gave his annual fireside chat. I took several
pages of notes which I will present here. There will be a few
direct quotes, but I am summing up.
Howdy. I am now working for iTV and we are making a
set-top box using what some people have called "Moore technology,"
in this case the i21 chip.
iTV is currently in the process of raising money. It has been
in this process since it was founded and will probably always be
in this process. :-) As such we have a lot of people who drop by
to take a tour. Mostly they are from Taiwan, Korea, and Japan.
I put on a demo and show the co-evolution of the chips and
design tools. This situation is very satisfying for me as
each makes the other possible.
There are two classes of visitors: investors or management types, and
engineers. You never really know what the investors or management
types think, but the engineers listen, ask a few questions, blink,
and go away impressed.
I have given various numbers for the speed of our chip; my latest
measurements show we are now running at 500mips for a sequence
of instructions in a word. So it executes each instruction in
2 nanoseconds then waits a long time for memory. We have 16k
transistors and five processors on the chip.
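(An aside for readers, not from Chuck's talk: the arithmetic behind
these figures can be sketched in a few lines of Python. The function
name and the 32 ns stall are assumptions made up for illustration;
the talk gives only the 2 ns instruction time and, later, four
opcodes per word.)

```python
def effective_mips(instr_per_word, ns_per_instr, memory_wait_ns):
    """Sustained rate in MIPS for words executed back to back.

    memory_wait_ns is a hypothetical stall per word fetch; the talk
    only says the chip "waits a long time for memory".
    """
    ns_per_word = instr_per_word * ns_per_instr + memory_wait_ns
    # instructions per nanosecond, times 1000, is millions per second
    return 1000.0 * instr_per_word / ns_per_word


# Peak rate inside a word: 2 ns per instruction, no stall.
peak = effective_mips(4, 2.0, 0.0)
# With an assumed 32 ns stall per word the sustained rate is far lower.
stalled = effective_mips(4, 2.0, 32.0)
print(peak, stalled)
```

With no stall the function gives the 500 MIPS peak figure; the longer
the wait for memory, the further the sustained rate falls below it.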
We have produced i21a,b,c,d,e,f,i and are about to submit j,k so
it is not a one try process. Sooner or later I will make one
that works perfectly. J and k go in next week for multi wafer
fab. So instead of making 25 parts we are making thousands
this time. It is very exciting to get thousands of chips that
don't work. :-)
J, like previous versions, has 65 pads, but k will use 68. In quantities
of 25, ceramic packaging is cheaper, but in larger quantities you
want to do plastic. We are pad limited so going to 68 pads meant
expanding the die. This means we will have much more empty space.
The extra pads will be used for power and ground. On the previous
chips we had 3 power and 3 ground but on the k we will have 16 power
and ground pins. This is because inductance on the power and ground
leads produces a drop in voltage or a ground bounce on chip. This
can be very bad.
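(An aside for readers, not from Chuck's talk: the reason extra power
and ground pins help can be sketched with V = L * di/dt. Every number
below is a made-up illustration, not a measurement from i21, and the
sketch assumes identical leads with no mutual inductance, so n leads
in parallel behave like L/n.)

```python
def supply_bounce_volts(lead_inductance_nh, n_leads, delta_i_amps, rise_time_ns):
    """Voltage developed across n identical parallel leads, V = L * di/dt.

    Assumes the leads share the current equally and ignores mutual
    inductance; all inputs are hypothetical illustration values.
    """
    effective_l_nh = lead_inductance_nh / n_leads
    # nH * A / ns works out to volts, so no unit conversion is needed
    return effective_l_nh * delta_i_amps / rise_time_ns


# Hypothetical: 5 nH per lead, a 150 mA current step in 1 ns.
for n in (3, 16):
    print(n, "leads:", round(supply_bounce_volts(5.0, n, 0.15, 1.0), 3), "V")
```

Under these assumptions the bounce scales inversely with the number
of leads, which is the motivation for spending new pads on power and
ground.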
We have spent a lot of time and money down at Acurel trying to look
at things on our chips. We use an electron beam probe to look at
what was happening on the chip and see pulse widths and such but
pretty much failed to get any useful information this way.
It's not just our problem; everyone puts lots of power and ground
pins on their chips because of this problem. On the h chip I
mis-estimated the power dissipation and could see about a 1 volt
voltage drop from this problem.
I was using .07 ohms per square for metal and 70 ohms per square
for diffusion and figured that the .07 ohms per square was
going to be negligible. I figured it is four orders of magnitude
lower than the impedance of the diffusion so I didn't think it
would matter. But there are so many tiles that it added up to
an unacceptable level. The tiles are 2.6 micrometers on a
side and the metal wires are 1 micron wide. A 2mm chip is
2000 tiles long.
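(An aside for readers, not from Chuck's talk: sheet resistance is
quoted per square, so a wire's resistance is the sheet value times
its length divided by its width. The sketch below uses the .07 ohms
per square, 1 micron width, and 2mm length mentioned above; the
function name is my own, and treating a power line as one uniform
1 micron wire across the whole chip is a simplifying assumption.)

```python
def wire_resistance_ohms(sheet_ohms_per_square, length_um, width_um):
    """Resistance of a uniform wire from its sheet resistance.

    The number of "squares" is length / width; each square contributes
    one sheet-resistance unit in series.
    """
    squares = length_um / width_um
    return sheet_ohms_per_square * squares


# A 1 micron wide metal wire running 2 mm: 2000 squares in series.
narrow = wire_resistance_ohms(0.07, 2000.0, 1.0)
# Doubling the width halves the number of squares and the resistance.
wide = wire_resistance_ohms(0.07, 2000.0, 2.0)
print(narrow, wide)
```

The per-square value looks negligible, but a couple of thousand
squares in series comes to roughly 140 ohms for the narrow wire,
which is how a "negligible" metal resistance adds up.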
Last month I beefed up the power busses. I was not simulating the
power resistance in OKAD so it was the prime candidate for the
problem we were seeing. The worst case was where I write all
1s on the data bus and all 1s on the address bus when these
were all zero. This draws about 400ma for 2ns. This results
in a voltage drop of 5V. :-) The drop is only to the i/o pads
and at that time nothing is happening on the chip because we
are waiting for memory access but it is still too much.
OKAD now draws the traces of four current signals across the
screen and at the top I have added a trace for the current use.
It is very interesting to see where the power use happens. It
is sort of black magic. We have P transistors to power and N
transistors to ground and when you construct a complementary
transistor pair like this you get some parasitic capacitance.
This can lead to circulating currents. So the amount of current
going in and out of the transistor isn't the total current. The
amount of current on chip vs off chip is hard to predict.
What happens is that there is virtually zero power until an
instruction then there are 4 peaks as the four opcodes in a
word execute at 2ns intervals and each peak is higher until
it reaches a maximum of about 150ma on the last peak. As
a result it is not always safe to execute four instructions
in one word. It is always safe if you have 3 nops and one
other opcode per word. Sometimes you can use 4 instructions,
sometimes 2, sometimes one. It is both dependent on the
instruction and with some instructions the data. So it is
only safe in the general case now to run with 3 nops in a
word. The new chip will hopefully fix this.
We now have great wide power busses. I used to think that narrow
power lines were pretty but now they are ugly. Wide ones are
pretty. I would bulldoze a path for a wider power bus across
the chip like building a freeway across a city, plowing down
transistors. The stacks were a problem because there was
very little space there. I wrote a program that would widen
the power bus. I had to notch it in places to fit around some
stack circuit edges, but now the program produces some 0 width
rectangles so I have to fix it.
The power busses are now twice as wide and we bring power in
from the top and bottom so we have four times as much in
the stacks and six times as much as we did elsewhere. So I
expect this will solve the problem.
I do believe in "satisficing," that is, doing a design that is
just good enough.
We have a box, inside is an i21, some dram, some flash. It
has video output so you can connect it to a TV or monitor and
it has a serial internet interface over a modem. The hardware
is not complex, the software is not complex, it is an appliance.
You may not know, but the .gif file format is patented. It's one of
those terrible patents. If you decode an image in .gif format
you have to pay. The charge is $.25 for each box. $.25 is a
lot when Joe is concerned about $.01 extra on a connector. The
patent runs out in a couple of years.
We generate video in 384x480 format. We have a different aspect
ratio than the PC so we must resample images when we display them.
You can code in a high level Forth with stacks in memory or
use assembler. Assembler restricts you to assembler opcodes,
there is no OR, no ROT, no SWAP, and memory addressing is a
little different than Forth's @ and !. You can code in
high level and convert critical routines to assembler. You
can gain an order of magnitude in speed by using the
on chip stacks and assembler but you must live with some
restrictions. You can carefully craft the assembler code
to get it to run fast by keeping data access onpage. You
pay a 3 times penalty when data access is not onpage.
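(An aside for readers, not from Chuck's talk: the effect of the
3 times penalty on a routine can be estimated with a simple cost
model. The function name and the unit cost of 1 for an onpage access
are assumptions for illustration; only the factor of 3 comes from
the talk.)

```python
def relative_access_cost(n_accesses, onpage_fraction, offpage_penalty=3.0):
    """Total data access cost in units of one onpage access.

    onpage_fraction is the share of accesses that stay on the current
    memory page; the rest pay offpage_penalty each.
    """
    onpage = n_accesses * onpage_fraction
    offpage = n_accesses - onpage
    return onpage * 1.0 + offpage * offpage_penalty


# Compare a routine that keeps all data onpage with one that never does.
for fraction in (1.0, 0.5, 0.0):
    print(fraction, relative_access_cost(100, fraction))
```

The model ignores instruction fetch and everything else a routine
does, so it only bounds how much careful data placement can gain.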
There is documentation for OKAD which few people have seen. In it
there was a section that read "it would be easy to record the time
when signals transit 2.5 volts." I would read it and think yes
that might be useful. Then I decided to take that out of the
documentation and just about that time I added it to OKAD. Now I
can measure pulse width in OKAD. When I did I was horrified.
Pulses that were supposed to be 1 to 2ns were 700 picoseconds. I
was off by about 2 to 1. Now I can point to any circuit and
see pulse width or capacitance.
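(An aside for readers, not from Chuck's talk: recording when a signal
transits 2.5 volts and taking the difference between two transits is
one way to get a pulse width. The sketch below is my own guess at the
idea using linear interpolation between samples; it is not OKAD's
code, and the sample waveform is invented.)

```python
def threshold_crossings(times_ns, volts, threshold=2.5):
    """Interpolated times at which a sampled waveform crosses threshold."""
    crossings = []
    for i in range(1, len(volts)):
        v0, v1 = volts[i - 1], volts[i]
        # A crossing happens when the two samples straddle the threshold.
        if (v0 < threshold) != (v1 < threshold):
            t0, t1 = times_ns[i - 1], times_ns[i]
            frac = (threshold - v0) / (v1 - v0)
            crossings.append(t0 + frac * (t1 - t0))
    return crossings


def pulse_width_ns(times_ns, volts, threshold=2.5):
    """Time between the first two threshold crossings, or None."""
    c = threshold_crossings(times_ns, volts, threshold)
    if len(c) < 2:
        return None
    return c[1] - c[0]


# Invented samples: a 0 to 5 volt pulse that rises and falls in 100 ps.
times = [0.0, 0.2, 0.3, 0.9, 1.0, 1.2]
volts = [0.0, 0.0, 5.0, 5.0, 0.0, 0.0]
print(pulse_width_ns(times, volts))
```

For this invented waveform the two transits are about 0.7 ns apart.
Measuring at the midpoint of the swing keeps the result from
depending much on how sharp the edges are.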
I know that the ideal pulse for my shift register is about 650ps,
750ps for a counter, and 900ps would be too much.
John Rible has been re-engineering the OKAD code. I have been
saying for years that "the map is not the territory." It is time
to revisit this issue. There was no source for OKAD to start
with. We went from object code to a MASM source. 12,000 bytes
of object, 12,000 lines of MASM. I'm disappointed on this one.
I wanted object code to stand on its own, but I failed. It is
the biggest most complex program I have ever done and I have
spent far more time using it than any other thing I have ever
We purchased a subset of the Mentor VLSI tools to see what it
could do. We paid $160k. It can read my chip layout, it has
spice, schematic capture, and can simulate the entire chip.
But they can't really. Their simulator can't simulate this
chip and mine can because mine was designed to do it.
One of my favorite circuits is the phase lock loop. We extracted
just a pll set of transistors and imported it into Mentor and
did a schematic capture and simulation. Mentor returns garbage.
They can't even simulate a simple circuit from i21 let alone
the entire chip. It is a question of the number of man years
it would take to get Mentor to simulate the chip. The original
estimate was man weeks, it has been man months, and who knows
how long it would take.
This is a chip without a description. But Mentor needs schematic,
they NEED schematic capture, they can't do anything without it.
It is backwards to get a schematic from a chip. I am dubious that
iTV will invest the effort needed, and I question the value.
My simulator is at least as accurate as theirs. Things I don't
have they don't have either.
We factor things so that manufacturing process details have a
file, chip parameters have a file, and there is a file
for simulation parameters. But really each chip has its own
version of OKAD. I could have done it with one version but
it would have required lots of state variables at run time.
It is better to resolve them at compile time, thus many versions.
I am still the only user of OKAD. We have the OKAD hardware
simulation of the chip, Jeff's software simulator, the Mentor
simulator, and the actual chip. We run code and test routines
through all of them.
There is a delay in getting parts fabricated and it results in
a sort of pipeline. We have had the pipeline filled and I have
been putting in chips with changes before seeing the actual
chips come back from 3 previously submitted designs. The
pipeline is now empty. I think our wafer run will be faster
than the 25 part runs, 4 weeks plus 2 weeks to package.
(someone asked how big was the box and Chuck showed with his hands)
Marketing said the box had to be a minimal size for customer
perception. From the engineering standpoint it could be a bulge
in the cable. But the box has to be a certain size and heavier
Other design tools start with a design then generate schematics
then transistors and finally vlsi components. People spend a lot
of time making pretty schematics but then the software that
does everything else with the schematic does a poor job.
There is a tendency today for everything to be text. I looked
at what was being done with the Mentor tools. It was all ascii
text manipulations. Everything seems to be going this way and
it is wrong. We need to work more closely with the "thing"
not with an ascii description of it.
We have plans for 32 bit chips. We plan to put ram on the chip.
That is what you do with all the white space on the die that
we are not currently using. With on chip ram you could
actually sustain that 500mips operation. With external DRAM
it will be more like 100. With the fab pipeline empty I may
work on getting ram on the chip.
The lesson we have learned here is that things must be kept
simple. We have a simple chip, a simple board, and simple
software.
(Chuck was asked if the competition was aware of iTV)
The word we got was that Sony did know about us and they
don't believe it is possible. I have the .8 micron process
running at 500mips with 650ps pulse widths etc. Engineers
will tell you that this is not possible. Conventional
engineering says the limit is ten times lower than this
and that these numbers are just not possible, but there
is the chip.