Finitron - HDL Artistry

 

 

   

October 16, 2016
The latest batch of work has been on a simple .MNG file viewer. It is capable of viewing MNG files in the simplest format. Finray has been extended with the ability to loop back and parse multiple frames of information. From this it can generate simple animation. It stores sequences of PNG bitmaps which can then be loaded with FNG (The MNG file viewer) and turned into simple MNG files.
April 27, 2016
For the past couple of weeks bitmap controllers have been on my mind. It's amazing how something fundamentally simple can get to be fairly complex. The basic operation can be summed up in a single line: a bitmap controller reads through memory in a linear fashion and outputs to a display. However, once you throw in options to support multiple display resolutions and color depths, things start to get complex. The latest bitmap controller adds pixel plotting and fetching on top of simple display capability. Pixel plot / fetch is a reasonable operation to perform in a bitmap controller as bus arbitration for memory is already present. Depending on the color depth, pixels may fit unevenly into memory locations. This can result in complicated software to fetch or store a pixel. Software can be made simpler by providing a hardware pixel plot and fetch.
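To illustrate why uneven color depths complicate software, here is a minimal sketch of the address arithmetic a software pixel plot / fetch needs. It assumes 32 bit memory words and pixels packed from the low bits with no pixel straddling a word. Those choices, and the names pixel_location, plot and fetch, are assumptions for illustration and are not taken from the controller itself.

```python
WORD_BITS = 32  # assumed memory word width, not taken from the controller

def pixel_location(x, y, width, bpp):
    """Return (word_index, bit_shift, mask) for pixel (x, y)."""
    per_word = WORD_BITS // bpp              # whole pixels that fit in one word
    words_per_line = -(-width // per_word)   # ceiling division
    word_index = y * words_per_line + x // per_word
    shift = (x % per_word) * bpp
    mask = (1 << bpp) - 1
    return word_index, shift, mask

def plot(mem, x, y, width, bpp, color):
    """Read-modify-write the word holding the pixel."""
    idx, shift, mask = pixel_location(x, y, width, bpp)
    mem[idx] = (mem[idx] & ~(mask << shift)) | ((color & mask) << shift)

def fetch(mem, x, y, width, bpp):
    """Extract the pixel from the word holding it."""
    idx, shift, mask = pixel_location(x, y, width, bpp)
    return (mem[idx] >> shift) & mask
```

At 12 bits per pixel only two pixels fit in each assumed 32 bit word, so every plot needs a divide, a modulus, a shift and a masked read-modify-write. That is the work a hardware plot / fetch takes off the software.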
March 08, 2016
I've been experimenting with ray-tracing and have come up with a "simple" ray-tracing program. The program uses a ray-tracing script file (.finray) to generate images. The script language supports generation of random vectors so that random colors and positions may be used. It also supports composite objects and repeat blocks. The display of a group of objects may be repeated a number of times. The image below shows some sample output.
March 03, 2016
I took a break from FPGA cpu's for a bit to develop some games. I created a rendition of the venerable asteroids game. It's available for download in the software directory.
Jan 09, 2016
I've been experimenting with error-correction for the memory components of the latest system. I found a bad bit in the host system and the way to work around it was to use error correcting memory components. The diagram below shows the error correction associated with DRAM memory. It stores an eight bit byte plus five syndrome bits in a sixteen bit memory cell. The reason I chose to error correct on a byte basis rather than a word basis is that correcting on a byte basis doesn't require implementation of read-modify-write cycles.

Once error checking is included there is some justification for using bytes larger than eight bits in size. A five bit syndrome can provide error correction information for up to eleven data bits, not just eight. Eleven bit bytes plus five bits for error checking fit nicely into 16 bits. One would likely be using a 16 bit path to store an eight bit byte plus five bit syndrome to memory anyway. So why not use all the bits and go with eleven bit bytes instead?
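The eleven-plus-five arithmetic can be checked with a small sketch. It assumes the five check bits are a standard extended Hamming code (four Hamming parity bits plus one overall parity bit), which corrects single bit errors and detects double bit errors. The actual code used in the memory component is not described here, so treat the bit layout and the names encode and decode as hypothetical.

```python
# Positions 1..15 that are not powers of two carry the 11 data bits.
DATA_POS = [p for p in range(1, 16) if p & (p - 1)]

def encode(data):
    """Encode an 11 bit value into a 16 bit code word."""
    bits = [0] * 16
    for i, pos in enumerate(DATA_POS):
        bits[pos] = (data >> i) & 1
    for p in (1, 2, 4, 8):                    # Hamming parity bits
        bits[p] = sum(bits[i] for i in range(1, 16) if i & p) % 2
    bits[0] = sum(bits) % 2                   # overall parity bit
    return sum(b << i for i, b in enumerate(bits))

def decode(word):
    """Return (data, status): status is 'ok', 'corrected' or 'uncorrectable'."""
    bits = [(word >> i) & 1 for i in range(16)]
    syndrome = 0
    for i in range(1, 16):
        if bits[i]:
            syndrome ^= i                     # XOR of the positions of set bits
    overall = sum(bits) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        bits[syndrome] ^= 1                   # syndrome 0 means bit 0 itself flipped
        status = "corrected"
    else:
        return None, "uncorrectable"          # two bits flipped
    data = sum(bits[pos] << i for i, pos in enumerate(DATA_POS))
    return data, status
```

With eight data bits the same five check bits leave three of the sixteen bits unused, which is the waste the eleven bit byte would reclaim.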

 

October 24, 2015
Most recently I've been working on porting Fig Forth 6502 to the RTF6809 and converting it to use 32 bit Forth words. It doesn't quite work yet, but it's close. Forth is an interpretive computer language. I hope it will be able to make use of the RTF6809's 32 bit address space. The work is posted on my github account.
June 25, 2015
I've started yet another FPGA processor project called Dark Star Dragon One (D.S.D.1). It features variable length instructions, segment registers, branch registers, and multiple condition code registers. Yes, this does mean I'm shelving the FISA64 project for now.

The author is of the opinion that any serious processor will have variable length instructions; the improvement in code density and cache usage is just too great to pass up. 16 bit instructions were added to FISA64 and improved code density by about 20%. Having an inherently variable length architecture should improve things even more.

Segment registers do get used in general purpose applications. DSD1 will be reusing some of the segmentation model from the Table888 project.

The branch register set is really just a collection of registers that are specially defined in most instruction sets. This set includes the program counter, exceptioned program counter, return address register and others. In this design they are given their own explicit register array.

April 21, 2015
FISA64 is continuing to occupy my time. I've been posting about it frequently in blog style at anycpu.org. Yesterday's work was on the compiler try/catch mechanism and getting CTRL-C events to be handed to tasks. In the past month I've written a system emulator for the FISA64 test system and have been using it to test out software. I added then removed bounds registers from the processor design, then added a simpler check (CHK) instruction instead.
March 15, 2015

Tonight's quandary is a design decision that leaves the same FISA64 branch instruction branching to one of two different locations depending on whether or not it's predicted taken. FISA64 makes use of immediate prefixes to extend immediate values beyond a 15 bit limit set in the instruction.

Branch instructions can't make proper use of an immediate prefix because, to keep the hardware simpler, they don't detect an immediate prefix at the IF stage. (There is no requirement for conditional branching more than 15 bits.) However, in the EX stage a branch instruction just uses the same immediate value that is calculated for other instructions. This could lead to branches branching to two different locations if an immediate prefix is used for a branch.

For example, suppose a prefix is used with a branch, BEQ *+$100010 for instance (the $100000 displacement would require a prefix). Then the branch will branch to *+$10 if it is predicted taken (ignoring the prefix), but to *+$100010 if it's predicted not taken, then taken later in the EX stage.

If the branch is predicted taken, it'll branch using the 15 bit displacement field from the instruction. If the branch is predicted not taken, but is taken later in the EX stage, it'll branch using the full immediate value, which with prefixes could be up to 64 bits. The solution is that the assembler never outputs branches with prefixes. There is no hardware protection against using an immediate prefix with a branch.

In the IF stage, rather than look at the previous instructions for an immediate prefix, the processor simply ignores the fact a prefix is present, and sign extends the branch displacement in the instruction without taking into account a prefix.

IF stage:

    if (iopcode==`Bcc && predict_taken) begin
        pc <= pc + {{47{insn[31]}},insn[31:17],2'b00};   // Ignores potential immediate prefix
        dbranch_taken <= TRUE;
    end

However, the EX stage uses a full immediate including any prefix, also to simplify hardware.

EX stage:

    `Bcc:   if (takb & !xbranch_taken)
                update_pc(xpc + {imm,2'b00});   // This uses a "full" immediate value
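The two target calculations can be modelled in a few lines to reproduce the BEQ *+$100010 example. This is only a sketch: it assumes the full immediate is available as a single word-scaled integer and that both stages add it to the address of the branch itself. The helper names sext, if_stage_target and ex_stage_target are made up for illustration.

```python
MASK64 = (1 << 64) - 1

def sext(value, bits):
    """Sign-extend the low `bits` bits of value."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def if_stage_target(pc, imm):
    """Predicted-taken path: only the 15 bit field in the instruction is seen."""
    return (pc + (sext(imm, 15) << 2)) & MASK64

def ex_stage_target(pc, imm):
    """Resolved-in-EX path: the full immediate, prefix included, is used."""
    return (pc + (imm << 2)) & MASK64

pc = 0x1000
imm = 0x100010 >> 2   # displacement is stored in units of four bytes
print(hex(if_stage_target(pc, imm)), hex(ex_stage_target(pc, imm)))
```

For a displacement that fits in 15 bits both functions agree. They only disagree once a prefix supplies upper bits, which is why it is enough for the assembler to never emit a prefixed branch.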

 

December 28, 2014
Addressing modes in a modern processor are boring. For the typical RISC processor only a single address mode is supported because it's the minimum needed. That address mode is register indirect with displacement: a register is added to a displacement to form the memory address. Sometimes indexed addressing using two registers is also supported. Few new processors offer memory indirect addressing modes. The plethora of addressing modes on an older processor like the 680x0 series made the processor interesting. The key benefit of memory indirect addressing modes is that they allow pointers stored in memory to be larger than the size of a register. This is put to good use in the 6502 processor. In the latest ISA, FISA64, memory indirect address modes are available to experiment with. A 128 bit address space is supported using memory indirect address modes.
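The 6502 case makes the point concrete: its registers are eight bits wide, yet the (zp),Y mode reaches a 16 bit address because the pointer lives in memory. A minimal sketch of that effective address calculation follows. The function name is made up, and memory is modelled as a plain byte array.

```python
def indirect_indexed(mem, zp_addr, y):
    """6502 (zp),Y: fetch a 16 bit pointer from zero page, then add 8 bit Y."""
    lo = mem[zp_addr & 0xFF]
    hi = mem[(zp_addr + 1) & 0xFF]   # pointer fetch wraps within zero page
    return (((hi << 8) | lo) + (y & 0xFF)) & 0xFFFF

mem = bytearray(65536)
mem[0x20], mem[0x21] = 0x00, 0x80    # pointer $8000 stored at $20/$21
print(hex(indirect_indexed(mem, 0x20, 0x05)))
```

The same idea scales up: a pointer wider than any register can still be used, as long as the pointer itself sits in memory.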
December 12, 2014
Tonight's lesson is one about clock gating. When a clock is gated it introduces a buffer delay to that clock tree. If the ungated version of the clock is also being used, the buffer delay in the ungated version needs to be matched with that of the gated clock. Otherwise the P&R tools may have a heck of a time trying to meet timing requirements.

December 10, 2014
IEEE standard for floating point isn't the simplest thing to get working, or so I'm finding out. I've spent some time recently working with floating point units both standard and non-standard. One can do a lot of computing without floating point. Many early micro-processors didn't support floating point at all. How to incorporate floating point into an older system using an eight bit micro came to mind. FT816Float is a memory mapped floating point device oriented towards byte oriented processing. It's a bit non-standard and makes use of a two's complement mantissa rather than a sign-magnitude one.
November 7, 2014
Yet another ISA was born this past week. FISA64 is a 64 bit ISA that attempts to overcome the shortcomings run into with the Scarerob-V ISA. Rather than having a segmentation model that works automatically behind the scenes, the FISA64 ISA requires "manual" manipulation of the segment registers. This is possible by supporting two modes of operation: kernel and application. In kernel mode the address space is a flat unsegmented one. This allows the segment values to be manipulated without affecting the processor's addressing. The segmentation model supports up to a 128 bit address. The processor does not support a paging system.
November 3, 2014
I spent the past week or so working on a new ISA. Well, I synthesised an implementation of it, and it's too big. Too big, at 122% of the size of the FPGA. It's a shame because it had a nice segmentation and protection model, similar to the x86 series. Projects tend to get bigger with bug fixes, so there's no way to shoehorn it into the FPGA. So for now it's another project that's being shelved. Time to get back to a basic simple 32 bit ISA. Why not RISC-V? I'm not overly fond of the ISA layout and the branch model. There are also fewer instructions than I like to see in the base model. Sure, the ISA can be extended with brownfield or greenfield extensions, but then there's the issue of compatibility. If one is going to go to the trouble of extending the ISA and developing toolset changes to support the extended ISA, why not just start one's own ISA? The reason to use an existing ISA is to leverage that ISA's toolset.
October 31, 2014
Scary Halloween. They're back. The nightmare of segment registers. I wasn't going to include them in the latest ISA design, but I've changed my mind after reading up on how they are used in a modern OS. Normally segment registers (CS, DS, SS) are initialized to zero and left alone. However other segment registers (FS, GS) can be used like an additional index register in an instruction to quickly point to thread local storage and global storage areas. So I've added segment base registers to be used in this fashion to the latest ISA design. The latest ISA in the works is called Scarerob-V given that it's Halloween, and other recent events. The Scarerob-V ISA makes use of variable length instructions which are much shorter than those of Table888.
October 22, 2014
The RISC-V ISA (riscv.org) has a lot going for it: variable length instructions, extensibility, and 32, 64 and 128 bit versions. There is a simple base ISA and a number of standard extensions. It seems to be one worth studying and I've spent some time studying it recently. It's become an implementation project on my todo list. The RISC-V ISA is an ISA that attempts to please all. It'll be interesting to see how well it works in practice.
October 4, 2014
A couple of Flash-Fiction stories have been added recently to the website. A page for character descriptions has also been added. The Finitron verse is slowly expanding.
August 19, 2014
Back to the drawing board. I've started working on yet another soft processor core, expanding my toybox further. The instruction set will be similar to Table888's. Support for a segmentation model is not going to be provided. Also dropped is index scaling on the indexed addressing mode. The new core will stay with a 40 bit fixed size opcode, and 256 registers.
July 15, 2014
I've taken a break from my normal HDL artistry to work on a piece of software that generates artificial maps. The basic map generator is based on something called a Voronoi fracture map. The fracture map simulates lumps of matter composing the planet. Previously the map generator was based on a fractal generator which generated nice looking maps but they weren't very realistic (it placed mountains in the centre of continents). Now mountains are along the coast and where there is extreme difference in elevation, more in line with reality.
July 10, 2014
Learning more about the .ELF file format and how to link object files together was the order of the day. .ELF files are a popular standard file format used to represent executable and relocatable files. I was looking at the extended ELF64 file format developed by HP/Intel with the intent of supporting the format for the Table888 project. The A64 assembler can output .ELF files in addition to binary and listing files. In theory the L64 linker can link together .ELF relocatable files produced by the assembler. It's the first time I ever wrote a linker, and there's still a couple of issues to resolve with it.
July 01, 2014
Got hung up on mnemonics. The compiler called the exclusive or function XOR and the assembler recognized only EOR. A quick fix to the assembler allows it to recognize XOR as well as EOR as the same instruction. I can never make up my mind on that one, so I'll just support it both ways. All kinds of different mnemonics are used to represent essentially the same instructions in different assembly languages. Is that ADDC the same as the ADC in another instruction set? One has to research carefully sometimes while working with assembler code. Does SED set the decimal mode or set the direction flag?
June 21, 2014
I needed something small and simple to test the C64 compiler with and I needed some sort of file system available for my system. Luckily I found ChaN's FatFs which fits the bill. ChaN's system provides all the basics for a FAT file system operating in an embedded system. All one needs to do is to supply a few interface routines to the low level disk access. I've been busy working towards a simple SD Card access system. My current goal is to be able to load and run a file from the card. After a few compiler fixes I've got as far as being able to display a directory. It's a slow going circus dance.
June 12, 2014
FPP (Finch's macro pre-processor) has been updated with some bug fixes. It's undergoing testing by compiling the MINIX system. The fixes include an operator precedence problem fix and a macro expansion bug fix. The pre-processor was originally written in 1992 so it's now 22 years old. Recent work has been on the C64 compiler, modifying it to support the Table888 processor.
June 8, 2014
Tonight's escape is clock throttling. Clock throttling, or controlling the clock rate, can be used to control power consumption. The lower the clock frequency is, the less power is used: dynamic power is roughly proportional to switching frequency. Being able to control power consumption is one place where a gated clock might be used. Generally speaking gating clocks is not a good idea but occasionally it is done. Fortunately the FPGA vendor provides a clock gate specifically for handling gated clocks. Incorporated into Table888 (the latest processor work) is a clock gating register. This register is filled with a pattern that controls the clock gate, for power control.
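A rough model of a pattern-driven clock gate is sketched below. It assumes the pattern register is stepped through one bit per source clock cycle, with a 1 bit letting that clock pulse through. The register width and the stepping order are assumptions, and the function names are made up for illustration.

```python
def gated_clock(pattern, width, cycles):
    """Yield True on source clock cycles where the gate passes the clock."""
    for n in range(cycles):
        yield bool((pattern >> (n % width)) & 1)

def effective_mhz(source_mhz, pattern, width):
    """Average clock rate seen by the gated logic."""
    ones = bin(pattern & ((1 << width) - 1)).count("1")
    return source_mhz * ones / width

print(effective_mhz(100, 0x5555, 16))   # every other pulse passes
```

Under these assumptions the number of 1 bits in the pattern sets the average clock rate, and so roughly the dynamic power, in steps of one part in the register width.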
June 3, 2014
NOP Ramps are my latest craze. In order to avoid really complicated hardware, the concept of NOP ramps can be used. I'm talking about what happens when instructions cross page boundaries in a system with memory management. The problem with instructions spanning page boundaries is classic. If there is a memory management page miss, the instruction needs to be re-executed once the missing page is brought into memory. In order to ensure proper operation both the missing page and the previous page need to be in memory. Re-executing instructions can be a non-trivial problem. Fortunately what I'm working on only has a handful of instructions that can cross page boundaries. Rather than attempt to re-execute the instructions, the assembler just forces the instruction into the next page of memory by inserting NOP instructions. Hence it's the NOP instructions that span the page boundary. If there is a need to re-execute them, then it is trivial to do so. NOP ramp example:

00008FF0      41 F8 2A 90 00             bne           fl0,kbdi2
00008FF5      16 01 24 00 00             ldi           r1,#36
00008FFA      EA EA EA EA EA       ; imm
00009000      EA EA EA EA EA
00009005      FD 70 FF 03 10
0000900A      A0 00 01 00 18             sb            r1,LEDS
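The assembler-side decision can be sketched as follows. The sketch assumes 4 KiB pages and pads with whole five byte NOP instructions, as the EA EA EA EA EA rows in the listing suggest. It does not try to reproduce the listing's exact addresses, and the names crosses_page and place are hypothetical.

```python
PAGE_SIZE = 0x1000   # assumed page size
NOP_LEN = 5          # one NOP instruction (EA EA EA EA EA in the listing)

def crosses_page(addr, length):
    """True when a group of `length` bytes at `addr` spans two pages."""
    return addr // PAGE_SIZE != (addr + length - 1) // PAGE_SIZE

def place(addr, length):
    """Return (nop_count, new_addr) for an instruction group of `length` bytes.

    Assumes length <= PAGE_SIZE so the loop always terminates.
    """
    nops = 0
    while crosses_page(addr, length):
        addr += NOP_LEN      # emit one NOP and try again
        nops += 1
    return nops, addr

# A ten byte group (immediate prefix plus store) that would start at $8FFA
print(place(0x8FFA, 10))
```

Only NOPs ever straddle the boundary, so a page miss in the middle of the ramp costs nothing to replay.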

May 15, 2014
One can write a lot of code using just three registers if one codes in assembly language and is careful. With just a few registers to work with, a byte-code processor can offer high code density. This is great for microcontroller type applications where memory space may be constrained. What if one wants more registers in order to support a compiled language? RISC processors were originally designed for high performance with compiled languages. The typical RISC processor uses a fixed size instruction format. Unfortunately, one size does not fit all instructions, and the result is that code density for the typical RISC style suffers. To improve code density one can look at the typical operations performed and encode them in as few bits as possible. Allowing the size of instructions to vary in a design based on a RISC processor results in a kind of hybrid processor; the worst of both worlds. Lower code density and higher complexity. Unfortunately processors become complex anyway when they have to support legacy systems. Trends for currently popular architectures include variable instruction sizes (ARM, INTEL) and flags registers (ARM, INTEL, SPARC). If one removes the limitations of a fixed size instruction set, one can optimize instructions for code density. It's amazing how adequate a branch instruction composed of an eight bit opcode and an eight bit displacement is. This sixteen bit instruction covers about 90+% of the cases where a branch would be used. The RTF65003 strives to have a good mix of legacy support, while adding additional registers and increasing the addressing space. It is necessarily more complex than a new design.
May 14, 2014
What's better than the RTF65002? The RTF65003. There are several things I don't like about the RTF65002 so I started working on a better version. One item is the branch target address. In native mode on the RTF65002 the target address is computed relative to the address of the instruction; this is different from the '02 and '816 where the address is computed relative to the address of the next instruction. The '003 follows the convention set by the '02. Another issue is the different code and data addresses of the RTF65002. The RTF65002 is a word addressed machine for most data operations and this makes it difficult to use with a compiled language like 'C'. I decided, before putting a lot more work into porting software, to create an improved version of the processor. The RTF65003 has byte addressable memory operations, and greater support for different operand sizes. Byte (8-bit) or character (16-bit) prefix codes can be applied for memory operations to override the default of a word sized operation. Prefix codes are used to modify the behaviour of following instructions rather than creating a whole bunch of rarely used instructions.
May 12, 2014
Compiled code for the RTF65002 generated signed multiply instructions which hadn't been added to the processor. Two possible solutions were to either change the compiler so that it generated code to perform sign adjustments or modify the processor to include signed multiplies. Signed multiplies had been left out on the assumption that they would add too much overhead to the processor, but I decided to try adding them. Well, lo and behold, adding the functionality made the processor smaller and faster (by about 5%!). I guess adding the opcodes simplified the instruction decoder. Encouraged by this good fortune I decided to try adding signed division and modulus operations as well. Doing this resulted in almost no impact to the size or speed of the processor. So the RTF65002 now supports both signed and unsigned multiply / divide / modulus operations.
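For comparison, this is roughly what the compiler-side alternative looks like: a signed multiply built from an unsigned one plus sign adjustments. The sketch assumes 32 bit operands and a full 64 bit product. Whether the RTF65002 multiply keeps the upper half is not stated here, and the function names are made up.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def mulu64(a, b):
    """Unsigned 32x32 -> 64 bit multiply, standing in for the hardware op."""
    return (a & MASK32) * (b & MASK32)

def muls64(a, b):
    """Signed multiply of two 32 bit two's complement patterns using mulu64."""
    negative = ((a ^ b) >> 31) & 1        # result sign: signs of operands differ
    if a & 0x80000000:
        a = (-a) & MASK32                 # take magnitude of a
    if b & 0x80000000:
        b = (-b) & MASK32                 # take magnitude of b
    p = mulu64(a, b)
    return (-p) & MASK64 if negative else p

print(hex(muls64(0xFFFFFFFE, 3)))         # -2 * 3 as a 64 bit pattern
```

That is two conditional negates, a sign test and a final conditional negate around every multiply, which gives some feel for why moving it into the hardware was worth trying.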
May 11, 2014
I love today's machines. They make it possible to do things that were impossible on those of yesteryear. Take for example a string handling library. The descriptions of the strings can consume more memory than ever before. The current string library I've got makes each string a member of an all strings list, so that all the strings can be garbage collected at once. Making a list like this isn't practical on a small machine because it would consume too much in the way of memory resources. I also blithely load entire text files into strings, rather than process a line at a time. It seems like poor programming practice, but it's really in the interest of simple algorithms.
May 9, 2014
In order to implement firstcall blocks in a compiled language, auto-converting branches are used. An auto-converting branch (ACBR) acts like a NOP instruction (a branch never) and a store the first time it is executed, and it changes itself into a branch always (BRA) instruction for subsequent execution. In order for this to work properly any instruction cache has to be disabled; this is likely desirable anyway for one-time executing code, so that it doesn't fill up the cache. Shown below is a sample usage and resulting compiled code.

// High level language
firstcall {
     printf("This appears the first time only.\r\n");
}
start_tick = get_tick();

; Generated assembler code:
    icoff            ; turns off instruction cache
    acbr L_9         ; auto-converting branch into a bra
    ld r5,#L_3>>2    ; get parameter for printf
    push r5
    jsr printf       ; call the printf() routine
    sub sp,#-1       ; dump the parameter
    ld r5,r1
L_9:

This is an excerpt taken from a prime number sieve program written in C32, a C-like language. It has been successfully compiled and run on the RTF65002 processor. The program was compiled, assembled, then the resulting binary placed on SD card. It was subsequently loaded and run as a task in the test system.
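The self-modifying behaviour of the ACBR can be modelled with a toy interpreter. This is a sketch only: the program is a list of (op, argument) tuples, and the op names print_once and get_tick are stand-ins for the generated code, not real RTF65002 instructions.

```python
def run(program, calls):
    """Execute `program` from the top `calls` times; return the ops executed."""
    log = []
    for _ in range(calls):
        pc = 0
        while pc < len(program):
            op, arg = program[pc]
            if op == "acbr":
                program[pc] = ("bra", arg)   # store: rewrite itself as a branch always
                pc += 1                      # first time through it acts like a NOP
            elif op == "bra":
                pc = arg                     # later passes skip the firstcall block
            else:
                log.append(op)
                pc += 1
    return log

program = [("acbr", 2), ("print_once", None), ("get_tick", None)]
print(run(program, 3))
```

Because the instruction is rewritten in memory, a stale copy held in an instruction cache would keep behaving like the original ACBR, which is why the cache is turned off around it.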
May 5, 2014
Supermon816 is now running on the RTF65002 in 65C816 emulation mode. Supermon816 is a monitor program contributed by BDD (Big Dumb Dinosaur) at 6502.org. It allows one to assemble / disassemble programs, dump memory, search for data and more. The program can be activated by typing 'SU' at the '$' prompt on the test system. The 65C816 is an 8/16 bit processor found in a few systems like the Apple IIgs and SNES. The RTF65002 test system has its own monitor program for native mode, which is slightly different than Supermon. Supermon816 is one of the first programs to run in '816 mode, and helped to verify that emulation is working correctly.
April 17, 2014
Doing some analysis on filesystems, I wanted a breakdown of the number of files of different sizes. To do this I've modified dir2html to report this information. For a really simple file system, the disk may be broken up into slots of various sizes, the number of slots of a given size determined as a percentage of the disk size. With enough files on the disk there's bound to be some averages. For instance an analysis of c: drive revealed that 27% of files were under 512 bytes in size. 90+% of files were under 64kB in size. I want the file system primarily to store BASIC programs, plus some executables for my system, so I don't expect massive multimedia files.
April 14, 2014
Spent some time working on a file system. I've been studying file systems and decided to create a derivative of the ext2 filesystem. Information on the ext2 file system can be found here. Information on the derived system can be found here. Currently the only tool I have to work with is an assembler, so it's been in assembly language so far. The file system code is complex enough that it really requires a compiled language of some sort. I've been hesitant to finish porting a compiler for the RTF65002 because whew! It's a lot of work.
April 8, 2014
This past week has been spent updating the RTF65002 core to include a backwards compatible 65C816 emulation. The core now supports three modes of operation: RTF65002, 65C816 and 65C02. The core uses about 1/2 of an xc6SLX45 FPGA with all options included. The current sources are on GitHub.
March 15, 2014
Today's study is INTEL / AMD architecture and register renaming. Briefly, register renaming is a trick used to enhance performance, accomplished by making more registers available than are visible in the programming model. Renaming target registers can allow more instructions to complete in parallel than could be done without renaming. Register renaming requires a rename map which translates logical register names to physical register names. It may also require a fifo containing the names of physical write registers, depending on the implementation. Needless to say, a lot of trickery is involved.
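A bare-bones model of the rename map shows the mechanism. It assumes a simple free list of physical registers handed out in order and leaves out freeing old registers at retirement. It is not a description of any particular INTEL or AMD implementation, and the class and method names are made up.

```python
from collections import deque

class Renamer:
    def __init__(self, n_logical, n_physical):
        self.map = list(range(n_logical))                 # logical -> physical
        self.free = deque(range(n_logical, n_physical))   # unused physical regs

    def rename(self, dst, srcs):
        """Rename one instruction; returns (physical_dst, physical_srcs)."""
        psrcs = [self.map[s] for s in srcs]   # read sources before remapping dst
        pdst = self.free.popleft()            # fresh physical register for the write
        self.map[dst] = pdst
        return pdst, psrcs

r = Renamer(n_logical=4, n_physical=8)
print(r.rename(1, [2, 3]))   # r1 = r2 op r3
print(r.rename(1, [1]))      # r1 = op r1, gets a different physical register
```

Each write to r1 lands in a different physical register, so a later write no longer has to wait for earlier readers of the old value.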
Dec 24, 2013
Merry Christmas ! I added a couple of new stories under the flash fiction folder. I continue my studies of processors.
Dec 16, 2013
A query about producing a video display for a 20MHz 6502 system has got me thinking about video subsystems again. In days gone by this would have been handled using a CRT controller chip like a 6845 or 8275 and associated support logic. These days a trend for simple video is to use a microcontroller to generate an NTSC composite signal. Unfortunately the NTSC standard is an older standard. Modern video systems use DVI or HDMI interfaces, which are more difficult to drive using just a micro-controller. However, a video system isn't too bad to implement in an FPGA. There are several "free" cores for DVI or HDMI video generation. While an FPGA could be used as just a video generator, the trend with FPGA's is to place the entire system on the chip in order to reduce the component count. I've built several system-on-chips and usually include at least a text video display. One simple solution is to use dual-port RAMs in the FPGA with the microcontroller accessing one side and the video display circuitry accessing the other.
Dec 14, 2013
The CAS instruction, an acronym for Compare-And-Swap, performs an atomic read-modify-write operation to swap a value in memory with one in a processor register. Only if the value read from memory matches a set value does the write portion of the read-modify-write take place. This can be implemented with a 'classic' read-modify-write cycle supported by the WISHBONE bus spec. The CAS instruction is used for semaphore operations. Typically the pid (process id) or task id is written to the memory if a semaphore is available to indicate a new owner of a semaphore. Semaphores are important in multi-processor systems where they are used to control access to shared resources. Most cpu's that might be used in multi-processor systems have some form of hardware support for semaphores.
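How a semaphore is built on CAS can be sketched in a few lines. The model below is single-threaded, so it shows the logic only; the atomicity comes from the hardware read-modify-write cycle. The value 0 for a free semaphore and the names cas, acquire and release are assumptions for illustration.

```python
FREE = 0   # assumed value marking an unowned semaphore

def cas(mem, addr, expected, new):
    """Model of compare-and-swap; returns True when the write took place."""
    if mem[addr] == expected:
        mem[addr] = new
        return True
    return False

def acquire(mem, sem, task_id):
    """Claim the semaphore by writing our task id, only if it is free."""
    return cas(mem, sem, FREE, task_id)

def release(mem, sem, task_id):
    """Give the semaphore back, only if we are the owner."""
    return cas(mem, sem, task_id, FREE)

mem = [FREE] * 16
print(acquire(mem, 3, task_id=7), acquire(mem, 3, task_id=9))
```

Writing the task id rather than a plain flag means the semaphore also records who holds it, and a release by a non-owner simply fails.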
Dec 12, 2013
I've been working with pipelined burst memory accesses using the WISHBONE constant address burst cycle, cycle type 001. This isn't really the intent of the constant address burst cycle, but it's the only cycle type that allows ignoring the address bus during a transfer. Constant address bursts are intended for interfacing to I/O ports and FIFO's. But now I've run into a situation where a real constant address is needed during the burst (to implement a CAS instruction). So I need to define a new bus cycle type (011) to support the older pipelined burst accesses that were previously accomplished with cycle type 001. The WISHBONE spec supports an incrementing address burst cycle, cycle type 010. However this type of cycle waits for a bus acknowledgement response before the address can change, and doesn't work well with pipelined memory access. The cycle type I'd like to define differs from the incrementing burst cycle type in that it doesn't wait for an acknowledgement back from the memory before incrementing the address. Instead the bus cycle operates by streaming a constantly incrementing address to the memory, then latching data as ack_i's come back. A diagram is here. The WISHBONE bus is a "free" bus specification available at OpenCores.org.