Hitachi's SH series was supposedly named that because the chips were "super hot". Their marketing department never managed to propagate that interpretation, but I liked the architecture.
> The SH-4 is probably most famous for being the processor behind the Sega Dreamcast.
It was always amusing to visit Sega HQ back in those days and see how all-in they were on Hitachi: from the Hitachi elevator in the building to the conference rooms with Hitachi OHP, Hitachi wall clock, Hitachi conference table, Hitachi pens, hitachi computers and of course Hitachi chips... Those huge conglomerates (Keiretsu) are crazy!
It's not because of their temperature. One version of the SH3 (single-issue, at 80 MHz) is specced at 260 mW. The SH4 used in the Dreamcast (dual-issue at 200 MHz) seems to be around 4 W.
They definitely don't run hot. In addition to Dreamcasts, they were used in many appliance-type devices like NAS boxes, and I've never seen or heard of any that required cooling.
Undoubtedly retronymized from "Super Hitachi", as in the successor to the Hitachi H8... So even if it weren't supposed to stand for "hot", it would still have had to be something with an H.
One thing I remember is how good Hitachi's documentation was! It was clear and concise, and every quirk was well explained. The pseudo-code they used to describe each instruction was also good: C-like to the point that I was able to copy-paste the snippets with minor changes to write an emulator.
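The per-instruction descriptions in those manuals really were nearly compilable C. From memory, the style looked roughly like this (the `R`/`PC` globals and exact formatting are my reconstruction, not a verbatim copy from any Hitachi manual):

```c
#include <stdint.h>

/* Reconstructed from memory, not verbatim: each instruction was
   described as a small C-like function over the architectural
   state, paste-ready for an interpreter loop. */
static int32_t R[16];    /* general-purpose registers R0-R15 */
static uint32_t PC;      /* program counter */

void ADD(int m, int n)   /* ADD Rm,Rn */
{
    R[n] += R[m];
    PC += 2;             /* every SH opcode is 2 bytes */
}
```

Multiply that by a hundred-odd instructions and most of an interpreter core writes itself.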
However, the architecture itself was a bit annoying to work with because of the small opcodes. Something as simple as loading a 32-bit immediate into a register ended up taking 7-9 instructions. More realistically, the compiler would put the value in a literal pool (LTORG) and we'd load it from there in 1 or 2 instructions. Relative jumps were an issue for the same reason. It also inherited the infamous delay slots, where half the instructions are illegal and the other half are quirky, so we just NOPed them, though I know smarter compilers do useful work there.
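To illustrate where the 7-9 instructions come from, here's a sketch (mine, not from the post) of materializing a 32-bit constant without a literal pool, using only what SH's 16-bit opcodes can encode: an 8-bit sign-extended immediate MOV plus SHLL8/ADD pairs. Bytes >= 0x80 sign-extend negative and need fixups, hence the range up to 9.

```c
#include <stdint.h>

/* Roughly: MOV #imm8; (SHLL8; ADD #imm8) x 3 = 7 instructions,
   assuming every byte of the constant is below 0x80. */
int32_t build_imm32(void)
{
    int32_t r = 0x12;        /* MOV   #0x12, Rn */
    r <<= 8;                 /* SHLL8 Rn        */
    r += 0x34;               /* ADD   #0x34, Rn */
    r <<= 8;                 /* SHLL8 Rn        */
    r += 0x56;               /* ADD   #0x56, Rn */
    r <<= 8;                 /* SHLL8 Rn        */
    r += 0x78;               /* ADD   #0x78, Rn */
    return r;                /* = 0x12345678    */
}
```

Compare that with a single PC-relative `MOV.L @(disp,PC),Rn` pulling the value out of a nearby literal pool, and it's obvious why compilers prefer the pool.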
That being said, I do admire the elegance of how they fit their ~100 instructions and 16/24 registers into a 16-bit opcode; it's a beautiful application of engineering trade-offs :).
Yes, I do Dreamcast homebrew, and even with GCC 12.2, the compiler does such stupid, stupid things.
The SH has addressing modes for post-increment reads and pre-decrement writes, so something like
var = *ptr++;
can be compiled down to a single instruction. But GCC almost always does something moronic. I've seen it generate code equivalent to:
ptr++; tmp = ptr; tmp -= 16; var = tmp[15];
Insane.
Older versions of GCC (3.x) are generally much better at low-level instruction selection, but newer versions are better at big-picture stuff like optimizing across function calls, so speed-wise, across a whole program, they aren't too dissimilar. Newer versions are still worth it for the feature set (better warnings, modern standards like C23/C++23, etc.).
The article mentions that ARM licensed SuperH patents for their Thumb instruction set, and ARM did that for improved code density, so that seems to be what they got in return.
If you read through all 10 parts of the series, Raymond later mentions that while it might look like a lot of code, the instructions are half the size of its RISC contemporaries', so it's better than it looks.
> instructions are half as big as other RISC contemporaries
That's a bit of a gimmick, tbh. Yes you can just about make it work if you have a really barebones insn set, as the early SuperH chips had. (You do need to start from a 2-register insn format of course.) But modern chips - even quite small chips nowadays - benefit from having more than that, so you're left with a really limited niche.
(I do think that the calculus changes if you have a strictly Harvard architecture chip/core, since these might be able to afford a "weird" instruction length. A core with, e.g. 20-bit or 24-bit fixed-length instructions designed along the same lines gains some much needed space for ISA extension.)
Alas, j-core development has kind of dried up. There was a roadmap for this open SuperH-compatible core up through SH4 and then a 64-bit version, tied to key patents expiring.
There's a j32 & j32smp core now, but I'm not sure where-ish that would slot into the old roadmap, and it seems like development isn't active. https://j-core.org/roadmap.html
Meanwhile, support keeps bitrotting. Someone managed to get a Debian port going in 2015, and even that sounded semi-adventurous to get working. https://lwn.net/Articles/647636/
Chipmaker Renesas - which effectively was SuperH for a while - meanwhile seems to be starting down the RISC-V path, alongside a variety of ARM cores and their own RL78 architecture (small microcontrollers) that I don't know much about. https://www.cnx-software.com/news/renesas/
I was really looking forward to the j-core project. I got into SH4 through the Dreamcast port of NetBSD, which is still maintained in the latest release (9.3, I believe).
There are also lots of interesting things going on with NAOMI emulation on the gaming side. All of it is very hackable and j-core would have made a great addition to that.
I did a lot of work for both Hitachi and Renesas over the years and used the SH-1, 2, 3 and 4 with various OSes. I liked the SH architecture; I first learnt assembly programming on PDP-11s, and SH felt like an extended, RISCified PDP-11.
The SH-3 is also the CPU of the Cave CV1000 arcade board, which gave us the best bullet hell games. (Though by most accounts, Cave's developers thought the board was way underpowered. Still, they did an amazing job with the games.)
SH also powers many of Casio's programmable calculators. Prizm is especially interesting in that it's a fairly traditional device (powered by AAA batteries!) with a decently large color screen that has a full-fledged C SDK available for it - and while Casio doesn't provide any official docs or other support, they don't block third party software, either. FX-CG50 is powerful enough to run a NES emulator (https://github.com/TSWilliamson/nesizm).
Please do note how these units do division. I was very stumped as to why div1 and rotcl appeared in unrolled form until Raymond explained it, and that's how I reverse-engineered one of the many division functions in an SH2 firmware.
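For the curious: SH2 has no full divide instruction, so compilers emit one setup instruction and then long runs of unrolled div1/rotcl pairs, each producing a single quotient bit. A rough C model of the idea (mine; simplified to restoring division for clarity, where the real hardware sequence is non-restoring):

```c
#include <stdint.h>

/* One quotient bit per iteration, mirroring the unrolled
   div1/rotcl pairs you see in SH2 firmware. Caller must
   ensure divisor != 0, just as the hardware sequence does. */
uint32_t div_bit_by_bit(uint32_t dividend, uint32_t divisor)
{
    uint32_t rem = 0, quot = 0;
    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((dividend >> i) & 1);  /* shift next bit in */
        quot <<= 1;                                /* rotcl: make room  */
        if (rem >= divisor) {                      /* div1: trial step  */
            rem -= divisor;
            quot |= 1;
        }
    }
    return quot;
}
```

Once you've seen this shape, the walls of repeated div1/rotcl in a disassembly stop looking mysterious and start looking like a loop the compiler unrolled 32 times.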
I had fun with it almost 20 years ago during my internship at Ricoh: they were porting Linux to it (mostly as research, since the device was already nearing its EOL, but it had a touch screen with a stylus and 2 PCMCIA ports, which made it possible to put a WiFi card in it) and I made some small demonstration programs. I spent a lot of my time fighting to get libraries to compile for it ^^;
I thoroughly enjoy these deep dives into CPU architectures. I forget most of it within seconds of reading, because I have no practical use for it, but it's fun to get a look underneath the pavement, so to speak.
I actually got interested in SH3 again recently when one of my viewers donated some hardware so I could try porting Gentoo to it. It's more of a passion project, but I did manage to get a PoC image to semi-boot over a weekend, so I hope to get it into a state where it can be merged into Gentoo to officially support the architecture again.