

Hardware Assistance

We have devised a hardware implementation of the dynamic binding mechanism used in Dreams. This hardware technique determines the currently active binding of any execution token without adding any processor clock cycles. Establishing a new binding takes only a few instructions on a processor equipped with the Dreams hardware, and using any such binding costs no CPU time at all, so this invention gives real-time embedded systems a very effective way to incorporate object-oriented programming. The invention can also switch the binding environment to a totally different environment in only one machine cycle, which is advantageous in multi-tasking, interrupt-driven embedded real-time systems.

For example, the Harris RTX-2000 series of Forth-based RISC microcontrollers can service an interrupt by performing a context switch on the processor registers in 400 ns, which is 4 machine cycles [Har88,Har90]. With the Dreams hardware, only 1 additional machine cycle is needed to switch the complete active binding environment as well. Performing a complete context switch between two processes in an object-oriented program in only 5 machine cycles, or 500 ns, is impressive.

The hardware itself is composed of a bank of fast static RAM inserted into the instruction fetch logic of the processor. This RAM performs a programmable address translation on the destination address of each subroutine call. The RAM, called the token translation table or TTT, is divided into pages, with one page for each task. When a task switch occurs, the active TTT is changed by switching the active page in RAM. The processor spends a significant portion of a clock cycle determining the type of an instruction, and the TTT uses that time to translate the destination of a subroutine call: the destination field of the original call instruction addresses the TTT, and the contents fetched from the TTT become the actual subroutine address. This translated address is then placed into the program counter to determine the next instruction address. Because the translation is performed in parallel with the instruction decode, no time penalty is paid.
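The translation step described above can be sketched in software. This is only an illustrative model, not the hardware design: the names `make_ttt` and `translate_call`, the page size, and the example addresses are all assumptions introduced here, and a real TTT performs the lookup in parallel with instruction decode rather than as a function call.

```python
# Illustrative software model of a paged token translation table (TTT).
# All names and sizes here are hypothetical; the source describes hardware.

TOKENS_PER_PAGE = 1024  # assumed page size (the text suggests about 1 K tokens per task)


def make_ttt(num_pages, tokens_per_page=TOKENS_PER_PAGE):
    """Build one page of token slots per task; 0 is an arbitrary 'unbound' placeholder."""
    return [[0] * tokens_per_page for _ in range(num_pages)]


def translate_call(ttt, active_page, destination):
    """Use the call's destination field as an index into the active page.

    The value fetched from the table is what would be loaded into the
    program counter in place of the original destination.
    """
    return ttt[active_page][destination]


# Two tasks bind the same execution token (5) to different routines.
ttt = make_ttt(num_pages=2)
ttt[0][5] = 0x1200  # binding of token 5 in task 0 (made-up address)
ttt[1][5] = 0x3400  # binding of token 5 in task 1 (made-up address)

pc_task0 = translate_call(ttt, active_page=0, destination=5)
pc_task1 = translate_call(ttt, active_page=1, destination=5)
```

In this model the same call instruction reaches a different routine depending only on which page is active, which is the property the hardware exploits.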

To change the binding of an execution token, all that is needed is to store the new binding address into the associated token slot in the TTT. Of course, to support the Dreams system, the old binding must first be read and pushed onto the active bindings stack, but this is a detail that is handled by the software in the Dreams programming system. To perform a task switch, the bindings active in the old task must be saved, and the new task's bindings must be restored. This is done very quickly by providing enough pages in the RAM to support all the time-critical tasks in the application and, at task switch time, performing an output operation to select the active page of the RAM. This saves the old bindings and restores the new ones in a single machine cycle. Typical embedded applications should not need more than 1 K tokens in RAM per task. The RAM must be fast enough to keep up with the processor's instruction decode operation, and the TTT must be wide enough to hold a destination address. On the RTX-2000, this is 15 bits for the small memory model and 19 bits for the large memory model.
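The rebinding and task-switch operations described above might be modelled as follows. This is a hedged sketch under several assumptions: the class name `PagedBindings` and its methods are hypothetical, keeping one bindings stack per task is a modelling choice, and the `unbind` operation (restoring the saved binding) is inferred from the fact that old bindings are pushed onto a stack; the source does not spell it out.

```python
# Hypothetical model of rebinding and page-select task switching.
# Not the Dreams implementation; names and structure are assumptions.


class PagedBindings:
    def __init__(self, num_pages, tokens_per_page):
        # One page of token slots per task; 0 stands for "unbound" here.
        self.pages = [[0] * tokens_per_page for _ in range(num_pages)]
        # Assumed: each task keeps its own stack of saved (token, old address) pairs.
        self.stacks = [[] for _ in range(num_pages)]
        self.active = 0

    def bind(self, token, address):
        """Save the old binding on the bindings stack, then store the new one."""
        page = self.pages[self.active]
        self.stacks[self.active].append((token, page[token]))
        page[token] = address

    def unbind(self):
        """Restore the most recently saved binding (assumed inverse of bind)."""
        token, old_address = self.stacks[self.active].pop()
        self.pages[self.active][token] = old_address

    def switch_task(self, page):
        """Model of the single output operation that selects the active page."""
        if not 0 <= page < len(self.pages):
            raise ValueError("no such TTT page")
        self.active = page

    def lookup(self, token):
        """Return the currently active binding of a token."""
        return self.pages[self.active][token]


b = PagedBindings(num_pages=2, tokens_per_page=1024)
b.bind(7, 0x0100)      # task 0 binds token 7
b.bind(7, 0x0200)      # task 0 rebinds token 7, saving 0x0100
b.switch_task(1)       # nothing is copied; only the active page changes
b.bind(7, 0x0900)      # task 1 has its own binding for token 7
task1_binding = b.lookup(7)
b.switch_task(0)
after_switch = b.lookup(7)   # task 0's bindings are intact
b.unbind()
after_unbind = b.lookup(7)   # previous binding restored
```

Note that `switch_task` touches no table entries, which is why the hardware equivalent can complete in one machine cycle regardless of how many tokens are bound.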

On the RTX-2000, the machine cycle is 100 ns, and 25 ns static RAM would do the job just fine. Harris has indicated (footnote 3.1) that with the RAM on the same chip as the processor, 40 ns RAM would do. They say it is straightforward to put 8 K cells of such RAM on the same chip as the processor. At 1 K tokens per task, this would support 8 tasks with a 500 ns context switching overhead running an object-oriented program. The cost for a commercial grade chip should be around $65.00 in quantity. We do not believe that any other processor comes anywhere near this price/performance ratio for a real-time microcontroller running object-oriented applications.
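The figures quoted above follow from simple arithmetic, which the sketch below reproduces. The constants are taken from the text; the variable names and the `ttt_bits` helper are introduced here for illustration, and the total bit counts are derived values, not numbers stated by Harris.

```python
# Back-of-the-envelope check of the timing and sizing figures in the text.
# Variable names and the helper are hypothetical.

CYCLE_NS = 100                 # RTX-2000 machine cycle
REGISTER_SWITCH_CYCLES = 4     # register context switch (400 ns)
PAGE_SELECT_CYCLES = 1         # selecting the active TTT page
TTT_CELLS = 8 * 1024           # 8 K cells of on-chip RAM
TOKENS_PER_TASK = 1024         # 1 K tokens per task

context_switch_ns = (REGISTER_SWITCH_CYCLES + PAGE_SELECT_CYCLES) * CYCLE_NS
tasks_supported = TTT_CELLS // TOKENS_PER_TASK


def ttt_bits(cells, address_width_bits):
    """Total RAM bits needed if each cell holds one destination address."""
    return cells * address_width_bits


small_model_bits = ttt_bits(TTT_CELLS, 15)  # small memory model, 15-bit addresses
large_model_bits = ttt_bits(TTT_CELLS, 19)  # large memory model, 19-bit addresses
```

Under these assumptions the model gives a 500 ns context switch and 8 task pages, matching the text, with roughly 120 Kbit or 152 Kbit of table RAM for the two memory models.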

The same ideas are readily adaptable to the Johns Hopkins University Applied Physics Laboratory Forth-on-a-chip project [HL89], and to Phil Koopman's WISC machine [Koo86,Koo89]. The WISC has a writable instruction set, so the dynamic binding mechanism could be implemented in writable firmware. In that case the additional hardware would not be needed, but execution would be a bit slower.


Robert J. Brown
1999-09-26