|Technology||Bitslice / PLD|
The Mark 2 was inspired by the PISC (Pathetic Instruction Set Computer) described by Bradford J. Rodriguez of McMaster University in his article "A Minimal TTL Processor for Architecture Exploration".
This diagram is reproduced subject to the following copyright notice: From the Proceedings of the 1994 ACM Symposium on Applied Computing. Copyright (c) 1994, Association for Computing Machinery. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
I came across the PISC whilst building my Mark 1. The PISC appeared to be not only simpler but more powerful and I felt a bit foolish. When I tabulated their comparative performance cycle-for-cycle, the Mark 1 wasn't so bad after all.
Whilst the original PISC 1 used the rare and obsolete 74172 register file, the Mark 2, like the PISC 2, uses the also obsolete but not quite so rare Am29705, a 16-word by 4-bit dual-port RAM from the AMD Am2900 bit-slice family.
IR = Instruction Register
The ALU bottleneck was the worst feature of the Mark 1. Here, the ALU sits between the register file and the system buses.
Instruction execution takes two cycles: FETCH and EXECUTE. The program counter (PC) is placed on the address bus during FETCH and IR is loaded from memory. Simultaneously, PC is incremented (by 2). The ALU is used to increment the program counter, which can be any of the general-purpose registers.
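The FETCH half of the cycle can be modelled in a few lines of Python. This is only a sketch: the memory is represented as a list of 16-bit words indexed by byte address divided by 2, and the choice of register 15 as the program counter is a hypothetical one made for the example, since any general-purpose register can serve.

```python
def fetch(regs, pc_index, memory):
    """Model one FETCH cycle: load IR from memory and advance the PC by 2.

    regs     -- list of 16 register values (the register file holds 16 words)
    pc_index -- which register acts as the program counter (assumed here)
    memory   -- list of 16-bit words; word n lives at byte address 2*n
    """
    pc = regs[pc_index]                  # PC is placed on the address bus
    ir = memory[pc >> 1]                 # IR is loaded from memory
    regs[pc_index] = (pc + 2) & 0xFFFF   # PC is incremented by 2 in the same cycle
    return ir


regs = [0] * 16
memory = [0x7012, 0x3045]                # hypothetical instruction words
first = fetch(regs, 15, memory)
second = fetch(regs, 15, memory)
```

After two fetches the chosen PC register holds 0004h and the two words have been returned in order.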
The Mark 2 is a 16-bit machine. Words are aligned on even address boundaries. To support byte addressing, there are three HC245 bus transceivers on the memory board. The middle (HL) transceiver is used to access bytes at odd addresses, and the lower (LL) transceiver to access bytes at even addresses. 00H is forced onto the upper half of the data bus during byte reads.
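A byte read can be sketched in Python as follows. The sketch assumes that "HL" means the high byte of the memory word is steered onto the low half of the data bus, so that odd addresses select the high byte and even addresses the low byte; the function names are invented for the illustration.

```python
def read_word(memory, addr):
    """Read an aligned 16-bit word; memory is a list of words."""
    return memory[addr >> 1]


def read_byte(memory, addr):
    """Read one byte, zero-extended to 16 bits.

    Assumption: odd addresses use the HL path (high byte of the word moved
    to the low half of the data bus), even addresses use the LL path.
    The upper half of the result is always 00h, as on the real data bus.
    """
    word = memory[addr >> 1]
    if addr & 1:
        return (word >> 8) & 0x00FF   # HL: high byte onto the low half
    return word & 0x00FF              # LL: low byte passes straight through


memory = [0xBEEF]
low = read_byte(memory, 0)    # byte at the even address
high = read_byte(memory, 1)   # byte at the odd address
```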
PISC memory was organised as words on unit address boundaries with no support for byte addressing. This made it easy to increment the program counter during a FETCH. The 74181 has an "A plus Carry" function. By asserting the carry input, this function is effectively "A+1". It is not so easy to perform "A+2".
A spare section of a 74HC244 is used to force the binary value 0010 onto the lowest 4 bits of the B bus during FETCH. The function "A+B+Carry" is used on the lower 181 (with no carry-in). The function "A+Carry" is used on the other three, allowing the upper 12 bits of the B bus to be left floating.
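The trick is easy to check with a small arithmetic model of four 4-bit slices. This is a sketch of the arithmetic only: it ignores the 74181's select-line encoding and carry polarity, and the helper names are made up for the example.

```python
def slice_add(a, b, carry_in):
    """One 4-bit ALU slice: return (4-bit sum, carry-out)."""
    total = a + b + carry_in
    return total & 0xF, total >> 4


def pc_plus_2(pc):
    """Add 2 to a 16-bit value the way the FETCH cycle does.

    Lowest slice: A + B with B forced to binary 0010 and no carry-in.
    Upper three slices: A + Carry, so their B inputs are never used.
    """
    nibble, carry = slice_add(pc & 0xF, 0b0010, 0)
    result = nibble
    for i in range(1, 4):
        a = (pc >> (4 * i)) & 0xF
        nibble, carry = slice_add(a, 0, carry)   # B ignored ("A plus Carry")
        result |= nibble << (4 * i)
    return result


next_pc = pc_plus_2(0x0FFE)   # the carry ripples through two slices
```

Because only the lowest slice ever looks at B, the carry chain alone propagates the increment into the upper 12 bits.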
The Mark 2 has a highly encoded instruction set. I don't consider it a microcoded machine.
|Memory Read Word||0||0||1||1||RB||RA|
|Memory Read Byte||0||1||0||0||RB||RA|
|Memory Write Word||0||1||0||1||RB||RA|
|Memory Write Byte||0||1||1||0||RB||RA|
|Register - Register||0||1||1||1||ALU Function||RB||RA|
|Test Memory Word||1||0||0||1||RA|
|I/O Read Byte||1||1||0||0||RB||RA|
|I/O Write Byte||1||1||0||1||RB||RA|
|Branch if IRQ||1||1||1||0||*D5||D4||D3||D2||D1||D0|
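As a rough illustration, the opcode field can be decoded in Python. The opcode occupies the top four bits of the instruction word (IR15...12); only the opcodes listed in the table above are included. The field positions used for the Register - Register format (ALU function, RB and RA as three consecutive 4-bit fields) are an assumption made for this sketch, not something the table spells out.

```python
# Opcodes taken from the rows listed above; any other code decodes to None.
OPCODES = {
    0b0011: "Memory Read Word",
    0b0100: "Memory Read Byte",
    0b0101: "Memory Write Word",
    0b0110: "Memory Write Byte",
    0b0111: "Register - Register",
    0b1001: "Test Memory Word",
    0b1100: "I/O Read Byte",
    0b1101: "I/O Write Byte",
    0b1110: "Branch if IRQ",
}


def decode(ir):
    """Split a 16-bit instruction word into opcode and, where assumed, fields."""
    opcode = (ir >> 12) & 0xF
    fields = {"opcode": opcode, "name": OPCODES.get(opcode)}
    if opcode == 0b0111:
        # Assumed layout: opcode | ALU function | RB | RA, four bits each.
        fields["alu"] = (ir >> 8) & 0xF
        fields["rb"] = (ir >> 4) & 0xF
        fields["ra"] = ir & 0xF
    return fields


example = decode(0x7A31)   # hypothetical Register - Register instruction
```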
Mark 2 assembly language looks weird. It's implemented using MASM macros:
RdWord MACRO rb, ra
RdByte MACRO rb, ra
WrWord MACRO rb, ra
WrByte MACRO rb, ra
R2R    MACRO alu, rb, ra:=<0>
TestR  MACRO r
TestM  MACRO r
Immed  MACRO alu, rb, d
Cond   MACRO cond, dest
IOR    MACRO rb, ra
IOW    MACRO rb, ra
I also make extensive use of the following:
$PUSH MACRO rb, sp
    immed alu_sub, sp, 2
    wrword rb, sp
ENDM

$PULL MACRO rb, sp
    rdword rb, sp
    immed alu_add, sp, 2
ENDM
alu_add and alu_sub are ALU function codes.
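The effect of the two stack macros can be modelled in Python to make the ordering explicit: $PUSH decrements the stack pointer before writing, and $PULL reads before incrementing. The sketch assumes that wrword and rdword transfer register rb to or from the memory word addressed by the second register; the register numbers and memory size are arbitrary.

```python
def push(regs, memory, rb, sp):
    """Model $PUSH rb, sp: pre-decrement the stack pointer, then store."""
    regs[sp] = (regs[sp] - 2) & 0xFFFF     # immed alu_sub, sp, 2
    memory[regs[sp] >> 1] = regs[rb]       # wrword rb, sp


def pull(regs, memory, rb, sp):
    """Model $PULL rb, sp: load, then post-increment the stack pointer."""
    regs[rb] = memory[regs[sp] >> 1]       # rdword rb, sp
    regs[sp] = (regs[sp] + 2) & 0xFFFF     # immed alu_add, sp, 2


regs = [0] * 16
memory = [0] * 8          # eight words, byte addresses 0..15
regs[14] = 16             # hypothetical stack pointer, just past the top of memory
regs[1] = 0x1234
push(regs, memory, 1, 14)
pull(regs, memory, 2, 14)
```

The stack therefore grows downwards, and the stack pointer always addresses the most recently pushed word.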
The source code is available for download here:
|Mk2.INC||MASM include file declaring constants and macros|
|Monitor.ASM||ROM Resident monitor for downloading code via the serial port|
|Split.cpp||Split binary into odd/even ROM images|
|m.bat||Batch file to automate assembly and splitting|
|m4000.bat||Assemble for download to RAM at 4000h|
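Splitting a binary into odd and even ROM images amounts to de-interleaving its bytes. The Python below is a minimal sketch of that idea, not a transcription of Split.cpp; the function name is invented for the example.

```python
def split_image(data):
    """De-interleave a binary image into (even-address bytes, odd-address bytes)."""
    even = data[0::2]   # bytes at addresses 0, 2, 4, ...
    odd = data[1::2]    # bytes at addresses 1, 3, 5, ...
    return even, odd


image = bytes([0x10, 0x32, 0x54, 0x76])   # hypothetical assembled output
even_rom, odd_rom = split_image(image)
```

Each result would then be programmed into its own 8-bit ROM so that the pair together supplies a 16-bit word.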
The bootstrap code which brings the system up following a hard reset can be found at the beginning of Monitor. The initial program counter is 0000h. NEXT is located at address 0004h, within the range of an absolute jump (load PC immediate).
† Stack ordering of remainders and quotients left by division words is reverse of standard fig-FORTH.
Programmable logic devices (PLDs) are used everywhere: three on the controller, and one on each of the other cards.
On the controller, at the heart of the system, 22V10.SEQ is the state machine which controls the FETCH-EXECUTE cycle (amongst other things). 22V10.ALU and 22V10.REG assist with instruction decoding. The former, as its name suggests, is concerned with generating control signals for the ALU. The latter controls the address select inputs of the register file. Sometimes, these control signals are derived from the instruction register (IR) and simply pass transparently through the PLD. Other times, e.g. during FETCH, the PLD supplies specific codes.
Instead of conventional control signals like MR and MW, everything is controlled by the imaginatively named 'BUS' bus. This 4-bit code, generated by 22V10.SEQ, and fed to every other PLD in the system, is nearly the same as the instruction set:
|0000||Float||All bus signals go tri-state|
|0001||Start||Load PC with 0000h|
|0011 - 1110||Execute||Same as IR15...12|
The PLD source and compiled JEDEC files are available for download here:
I've again used a 64-way DIN 41612 backplane.
|3||S0||0V / Screen|
|30||0V / Screen||IRQ / FLAG0|
I've successfully clocked the Mark 2 up to 4 MIPS via the external clock input (16 MHz divided by 4) but the bus signals do not look pretty at that speed. The bus is not terminated and it has no ground plane! I generally run it at a conservative 1 MHz as I do the Mark 1.
I had some RAM corruption problems during development due to tri-state bus contention. Basically, I was not allowing the bus to go tri-state between cycles. The outputs were fighting one another for a few tens of nanoseconds, producing spikes on the bus and power supplies. Due to propagation delays, de-selected outputs take a while to go tri-state. What's worse, unequal propagation delays through the logic cause weird things to happen while BUS is changing. Much of the decoding is essentially asynchronous combinatorial logic. The solution was to hold off output enable on the memory board for the first quarter of each cycle. This allows time for everything to settle down.
Fortunately, I was able to make this change by modifying the CUPL hardware description language (HDL) source for the 16V8 GAL on the memory board.
The Mark 2 is not that much faster than the Mark 1. Having twice the data width gives it a factor of 2 advantage, which it promptly throws away by taking two cycles to execute an instruction (FETCH-EXECUTE). However, it has more registers to play with and it wipes the floor with the Mark 1 at multiplication and division. The following table compares the number of cycles consumed by each FORTH primitive:
|(LOOP)||* 29/32||* 24/26|
|U*||* 325/613||* 118/182|
The Mark 1 is actually faster at 32-bit negation! Doh! Ah well, never mind, back to the drawing board ...
|Copyright © Andrew Holme, 2003.|