FPGA SoC On-Chip Buses
Usenet Postings
Subject: Re: Microcomputer buses for use inside FPGA/ASIC devices?
Newsgroups: comp.arch.fpga
Date: Sat, 24 Jul 1999 21:36:20 -0700
Wade D. Peterson wrote in message <7ndpnl$pcu$1@news1.tc.umn.edu>...
>I'm working on a project where we're doing a microcomputer bus (kind of like
>VMEbus or PCIbus) for use *INSIDE* of FPGAs and ASICs. It's for hooking
>system-on-chip (SOC) components together. If anyone has done this before, or
>know of any references to this kind of project, I'd like to hear about it.
>If anybody knows of similar technology, I'd like to hear about it. If there
>are more, then my intention is to start a FAQ database on our website for
>all to use.
My 1995 J32 system had a 32-bit on-chip peripheral bus. The left 60% of the
XC4010 was a 32-bit RISC processor, using a 32-bit long line bus to
multiplex amongst the various execution stage results (including add/sub,
logic, 1-, 2-, 4-bit shifts left and right, load data, sign extension data,
return address). This used approximately 16x11=176 TBUFs.
The right half of the XC4010 was a 32-bit long line peripheral bus. It had
4 byte-wide lanes. The processor was byte addressable with byte, 16-bit
halfword, and 32-bit word data types.
Call the processor result bus P[31:0], the peripheral data bus D[31:0], and
the external RAM data bus XD[31:0]. I used these sets of TBUFs (approx.
144 TBUFs + 32 OBUFTs):
* store byte, halfword, word:
    D[7:0] <- P[7:0]
    D[15:8] <- P[15:8]
    D[31:16] <- P[31:16]
* load byte, halfword, word:
    P[7:0] <- D[7:0]
    P[15:8] <- D[15:8]
    P[31:16] <- D[31:16]
* store various byte lanes to external RAM (OBUFTs)
    XD[7:0] <- D[7:0]
    XD[15:8] <- D[15:8]
    XD[23:16] <- D[23:16]
    XD[31:24] <- D[31:24]
* load various byte lanes from external RAM
    D[7:0] <- XD[7:0]
    D[15:8] <- XD[15:8]
    D[23:16] <- XD[23:16]
    D[31:24] <- XD[31:24]
* copy bytes/halfwords to upper byte lanes
    D[15:8] <- D[7:0]
    D[23:16] <- D[7:0]
    D[31:24] <- D[15:8]
* copy bytes from upper byte lanes
    D[7:0] <- D[15:8]
    D[7:0] <- D[23:16]
    D[15:8] <- D[31:24]
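The lane copies in the last two groups can be modelled in software. The following is a minimal C++ sketch of that byte-lane steering, not a transcription of the J32 logic. It assumes halfword and word accesses are naturally aligned and that loads are zero-extended (sign extension is a separate result bus source in the description above). The names `Size`, `Lanes`, `store_align` and `load_align` are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

enum class Size { Byte, Half, Word };

// lanes[0] models D[7:0], lanes[1] D[15:8], lanes[2] D[23:16], lanes[3] D[31:24].
using Lanes = std::array<std::uint8_t, 4>;

Lanes to_lanes(std::uint32_t v) {
    return Lanes{{static_cast<std::uint8_t>(v), static_cast<std::uint8_t>(v >> 8),
                  static_cast<std::uint8_t>(v >> 16), static_cast<std::uint8_t>(v >> 24)}};
}

std::uint32_t from_lanes(const Lanes& l) {
    return std::uint32_t(l[0]) | std::uint32_t(l[1]) << 8 |
           std::uint32_t(l[2]) << 16 | std::uint32_t(l[3]) << 24;
}

// Store: P drives D lane-for-lane, then data is copied up to the addressed lanes.
// Lanes that are not addressed carry don't-care data in this model.
Lanes store_align(std::uint32_t p, Size size, unsigned addr) {
    assert(size != Size::Half || (addr & 1u) == 0);  // assumption: aligned halfwords
    assert(size != Size::Word || (addr & 3u) == 0);  // assumption: aligned words
    const unsigned a = addr & 3u;
    const bool byte = size == Size::Byte;
    const bool half = size == Size::Half;
    Lanes d = to_lanes(p);
    if (byte && (a & 1u)) d[1] = d[0];                          // D[15:8]  <- D[7:0]
    if ((byte || half) && a == 2) d[2] = d[0];                  // D[23:16] <- D[7:0]
    if ((byte && a == 3) || (half && (a & 2u))) d[3] = d[1];    // D[31:24] <- D[15:8]
    return d;
}

// Load: the addressed lanes are copied down to the low lanes, then P <- D.
std::uint32_t load_align(Lanes d, Size size, unsigned addr) {
    assert(size != Size::Half || (addr & 1u) == 0);
    assert(size != Size::Word || (addr & 3u) == 0);
    const unsigned a = addr & 3u;
    const bool byte = size == Size::Byte;
    const bool half = size == Size::Half;
    if ((byte && a == 3) || (half && (a & 2u))) d[1] = d[3];    // D[15:8] <- D[31:24]
    if ((byte || half) && a == 2) d[0] = d[2];                  // D[7:0]  <- D[23:16]
    if (byte && (a & 1u)) d[0] = d[1];                          // D[7:0]  <- D[15:8]
    const std::uint32_t v = from_lanes(d);
    if (byte) return v & 0xFFu;      // zero-extended in this model
    if (half) return v & 0xFFFFu;
    return v;
}
```

Note that a byte at address 3 takes two hops in each direction (through D[15:8]), which is why only six lane-copy groups are needed.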
In case you are interested, here is some of the source code which generated
this. It is my own "CNets HDL", a C++ class library for emitting XNF. ff()
is a flip-flop, tbuf() is a tbuf. Note the use of tlocs (LOCs for TBUFs).
void Mem::emit(Control& c) {
    net(zad24n) = adn(23,20) == 0U;
    net(zad20n) = adn(19,16) == 0U;
    ff(selROM, zad24n & zad20n, c.marce, _, init(1));
    ff(selRAM, ~adn[23] & ~(zad24n & zad20n), c.marce);
    ackROM = start & selROM;
    ack = ackROM | ackRAM | ackUART;
    for (unsigned i = 0; i < 4; i++)
        bytesel[i] = (byte & ad(1,0) == i) | (half & ad(1,1) == (i>>1)) | word;

    // processor to internal dbus interface
    ff(doutbytet, ~write, start, _, init(1));
    ff(douthalft, ~(write & (byte|half)), start, _, init(1));
    ff(doutwordt, ~(write & (byte|half|word)), start, _, init(1));

    // dbus internal/external interface:
    // emit 3state drivers to copy external dbus to/from internal dbus
    bus(dbusin, cbit);
    bus(dpads, cbit);
    for (i = 0; i < cbit; i++) {
        tsIgnore(dpads[i]);
        iopad(dpads[i], ploc(dpadlocs[i]));
        ibuf(dbusin[i], dpads[i]);
        unsigned t = 1 + even(i);
        tbuf(xd[i], dbusin[i], dinbyteextt[i / 8]);
        obuft(dpads[i], xd[i], doutextt);
    }

    // byte/halfword load/store alignment logic
    ff(b1b0t, ~( write & byte & ad[0]), start, _, init(1));
    ff(b2b0t, ~( write & (byte|half) & ad(1,0) == 2), start, _, init(1));
    ff(b3b1t, ~( write & ((byte&(ad(1,0)==3))|(half&ad[1]))), start, _, init(1));
    ff(b0b1t, ~(~write & byte & ad[0]), start, _, init(1));
    ff(b0b2t, ~(~write & (byte|half) & ad(1,0) == 2), start, _, init(1));
    ff(b1b3t, ~(~write & ((byte&(ad(1,0)==3))|(half&ad[1]))), start, _, init(1));
    for (i = 0; i < 8; i++) {
        unsigned t = 1 + even(i);
        tbuf(xd[i+ 8], xd[i   ], b1b0t, tloc(rowForBit(i+ 8),20,t));
        tbuf(xd[i+16], xd[i   ], b2b0t, tloc(rowForBit(i+16),20,t));
        tbuf(xd[i+24], xd[i+ 8], b3b1t, tloc(rowForBit(i+24),19,t));
        tbuf(xd[i   ], xd[i+ 8], b0b1t, tloc(rowForBit(i   ),19,t));
        tbuf(xd[i+ 8], xd[i+24], b1b3t, tloc(rowForBit(i+ 8),18,t));
        tbuf(xd[i   ], xd[i+16], b0b2t, tloc(rowForBit(i   ),17,t));
    }
}
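For readers without CNets, the address decode and byte-select expressions at the top of Mem::emit can be restated in ordinary C++. This is a hedged sketch only: it assumes adn(23,20) means address bits 23 down to 20, it ignores the flip-flops (so it models the combinational decode, not the registered selects), and `Target`, `Size`, `decode` and `byte_select` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

enum class Target { Rom, Ram, Other };
enum class Size { Byte, Half, Word };

// Mirrors: selROM = zad24n & zad20n;  selRAM = ~adn[23] & ~(zad24n & zad20n)
Target decode(std::uint32_t addr) {
    const bool zad24 = ((addr >> 20) & 0xFu) == 0;  // adn(23,20) == 0
    const bool zad20 = ((addr >> 16) & 0xFu) == 0;  // adn(19,16) == 0
    const bool rom = zad24 && zad20;                // address bits 23:16 all zero
    const bool ram = ((addr >> 23) & 1u) == 0 && !rom;
    return rom ? Target::Rom : (ram ? Target::Ram : Target::Other);
}

// Mirrors the bytesel[i] expression: a byte access selects lane ad(1,0),
// a halfword selects the lane pair chosen by ad[1], a word selects all four.
std::array<bool, 4> byte_select(Size size, unsigned addr) {
    const unsigned lo = addr & 3u;
    std::array<bool, 4> sel{};
    for (unsigned i = 0; i < 4; ++i) {
        sel[i] = (size == Size::Byte && lo == i) ||
                 (size == Size::Half && (lo >> 1) == (i >> 1)) ||
                 size == Size::Word;
    }
    return sel;
}
```

Under these assumptions the lowest 64 KB decodes as ROM and the rest of the lower 8 MB as RAM; addresses with bit 23 set select neither.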
The on-chip "peripherals" were a UART and on-chip RAM and ROM, enough to boot
and print a "hello world" message. There was also an integrated DRAM
controller.
You can see a floorplan of this at http://www3.sympatico.ca/jsgray/sld021.htm.
Old articles which touched on this subject:
http://deja.com/getdoc.xp?AN=120389301&fmt=text
http://deja.com/getdoc.xp?AN=136481723&fmt=text
http://deja.com/getdoc.xp?AN=280290025&fmt=text
http://deja.com/getdoc.xp?AN=398007481&fmt=text
Recently I designed another on-chip bus with particular
CPU-to-bus-controller and bus-controller-to-peripheral interfaces. Please
write me for more information.
Jan Gray
Subject: Re: Microcomputer buses for use inside FPGA/ASIC devices?
Newsgroups: comp.arch.fpga
Date: Sat, 24 Jul 1999 21:48:41 -0700
I wrote:
>...The left 60% of the XC4010 was a 32-bit RISC processor.
>...This used approximately 16x11=176 TBUFs.
Sigh. Rather, 32x11 = 352 TBUFs.
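The corrected figure is consistent with one TBUF per bit for each source on the 32-bit result bus. Below is a minimal C++ sketch of that arithmetic, together with a software model of a 3-state bus. It assumes the 11 corresponds to the eleven execution stage results listed in the earlier post; the enumerator names and `resolve_bus` are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names for the result sources listed in the earlier post.
enum class ResultSource : std::size_t {
    AddSub, Logic,
    ShiftLeft1, ShiftLeft2, ShiftLeft4,
    ShiftRight1, ShiftRight2, ShiftRight4,
    LoadData, SignExtension, ReturnAddress,
    Count  // number of sources, not a source itself
};

constexpr std::size_t kSources  = static_cast<std::size_t>(ResultSource::Count);
constexpr std::size_t kBusWidth = 32;
constexpr std::size_t kTbufs    = kSources * kBusWidth;  // one TBUF per bit per source

static_assert(kSources == 11, "eleven result sources");
static_assert(kTbufs == 352, "32 x 11 = 352");

// Software model of a 3-state bus: bit i of `enables` set means source i drives.
// Returns false (bus floating or in contention) unless exactly one source is enabled.
bool resolve_bus(const std::array<std::uint32_t, kSources>& drivers,
                 std::uint32_t enables, std::uint32_t& out) {
    assert((enables >> kSources) == 0);  // no enable bits beyond the last source
    if (enables == 0 || (enables & (enables - 1)) != 0)
        return false;                    // zero or more than one driver enabled
    for (std::size_t i = 0; i < kSources; ++i) {
        if ((enables >> i) & 1u) {
            out = drivers[i];
            return true;
        }
    }
    return false;
}
```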
Jan Gray
Subject: Re: Microcomputer buses for use inside FPGA/ASIC devices?
Newsgroups: comp.arch.fpga
Date: Mon, 26 Jul 1999 21:34:30 -0700
Wade D. Peterson wrote in message <7nf1rv$5r$1@news1.tc.umn.edu>...
>1) When you say "on-chip peripheral bus" is this your terminology, or are you
>referring to a so-called 'OPB' bus that I'm seeing on some cores? For example,
>I believe that ARM processors use something called an 'OPB' bus.
My terminology, just a descriptive phrase. (It hosted on-chip memory
elements and peripheral elements and interfaced to off-chip memory.)
>2) Do you think your peripheral bus is portable across multiple FPGA
>architectures, or is it limited to Xilinx?
It is port-able, but not especially so; portability was not a design goal.
1. design tool: the CNets C++ class library would need to be retargeted.
Easy for Orca or Virtex, somewhat less so for other families.
2. implementation: used generic logic expressions and flip-flops, but there
were lots of 3-state buffers, and the design was optimized using LOC
constraints that would not apply to a non-XC4000.
3. interfaces (signaling): would work unchanged across architectures.
(I do not propose the J32 bus for any purpose. I thought it might be of
historical interest.)
>> Old articles which touched on this subject:
>I tried these links, but they appear to be dead.
Try again!
>> Recently I designed another on-chip bus with particular
>> CPU-to-bus-controller and bus-controller-to-peripheral interfaces. ...
>Do you have anything written up on these?
Sorry, the docs are not yet ready for publication. But I think some of the
design space issues are:
* zero, one, or more processors? on-chip or off-chip processor? :-)
* clocking -- do CPU clocks equal bus clocks? 1-1? 2-1? 1-2?
* processor has one memory port or two (Harvard)?
* one bus (share processor result bus with on-chip data bus) or two?
* any access to processor resources (e.g. reg file ports)?
* byte addressing? byte/halfword/word types? byte-lane shifting?
* is the on-chip bus connected to an off-chip I/O or memory bus? same
width? same clock discipline?
* wait state insertion?
* multi-master? arbitration?
* interrupt requests?
* DMA requests?
* pipelined bus transactions?
* split transactions?
In my current work-in-progress, the bus is: 1-1 with on-chip CPU's clock,
Harvard, one bus, byte addressable, byte/16-bit-word data types, attached to
a double-cycled external data bus, with arbitrary wait-states, interrupts,
DMA, and pipelined bus transactions.
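One way to make such a design point explicit is to record it as a configuration value. The sketch below is illustrative only: `BusConfig`, its field names, and `kWorkInProgress` are hypothetical, and it captures only the choices stated in the paragraph above, leaving unstated dimensions (multi-master, split transactions) out rather than guessing them.

```cpp
#include <cassert>

enum class ClockRatio { OneToOne, TwoToOne, OneToTwo };  // CPU clocks : bus clocks

// Hypothetical record of the design-space choices listed above.
struct BusConfig {
    ClockRatio cpu_to_bus_clock;
    bool harvard;                  // separate instruction and data ports
    bool shares_result_bus;        // "one bus": result bus shared with on-chip data bus
    bool byte_addressable;
    unsigned widest_data_bits;     // widest supported data type
    bool external_bus_double_cycled;
    bool wait_states;
    bool interrupts;
    bool dma;
    bool pipelined;
};

// The work-in-progress bus as described in the text.
constexpr BusConfig kWorkInProgress = {
    ClockRatio::OneToOne,  // 1-1 with the on-chip CPU's clock
    true,                  // Harvard
    true,                  // one bus
    true,                  // byte addressable
    16,                    // byte and 16-bit word data types
    true,                  // attached to a double-cycled external data bus
    true,                  // arbitrary wait states
    true,                  // interrupts
    true,                  // DMA
    true                   // pipelined bus transactions
};

// Number of byte lanes implied by the widest data type.
constexpr unsigned byte_lanes(const BusConfig& c) {
    return c.widest_data_bits / 8;
}

static_assert(byte_lanes(kWorkInProgress) == 2, "two byte lanes for 16-bit data");
```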
Other comments.
FPGA Device Architects: this on-chip bus stuff is so much easier if you
follow the XC4000 lead and provide the abstraction of long, wide,
partitionable buses with *abundant* 3-state drivers -- one per logic cell is
good. The bus control itself can be built in programmable logic.
Finally, in designing an on-chip bus with an eye on standardization, note
some interesting design tensions:
1. malleable or fixed bus topologies and clocking disciplines? -- why not
take advantage of FPGA flexibility and define a general bus architecture
space, making allowance for one or more 8-, 16-, 32-, even arbitrary k-bit
buses, and other dimensions of the design space I described above? Then
customers can specialize designs to suit. -- Oops, that adds complexity and
makes validation much harder.
2. lightweight or heavyweight? My current bus has a control overhead of ~2
CLBs per peripheral. At the opposite extreme, imagine an on-chip PCI bus.
The latter would offer many features, like configuration registers, but
these would be of little value in a cheap SOC in an XCS10XL or 20XL.
I can't wait to see an on-chip bus standard (or standards) for FPGAs -- then
we might finally see a marketplace of plug-and-play processor and
peripheral cores.
Jan Gray
Copyright © 2000, Gray Research LLC. All rights reserved.