The SuperH-3, part 1: Introduction (2019) (devblogs.microsoft.com/oldnewthing)
81 points by ksec on Jan 21, 2024 | hide | past | favorite | 33 comments


Hitachi's SH series were supposedly named that because they were "super hot". Their marketing department didn't manage to propagate that interpretation, but I liked the architecture.

> The SH-4 is probably most famous for being the processor behind the Sega Dreamcast.

It was always amusing to visit Sega HQ back in those days and see how all-in they were on Hitachi: from the Hitachi elevator in the building to the conference rooms with a Hitachi OHP, Hitachi wall clock, Hitachi conference table, Hitachi pens, Hitachi computers and of course Hitachi chips... Those huge conglomerates (keiretsu) are crazy!


> Hitachi's SH series were supposedly named that because they were "super hot".

Super hot like they're great chips, or super hot like they're space heaters that incidentally do math?


It's not because of their temperature. One version of the SH3 (single-issue, at 80 MHz) is specced at 260 mW. The SH4 used in the Dreamcast (dual-issue at 200 MHz) seems to be around 4 W.

Hitachi apparently had a trademark on "Cool Engine" for the SuperH (https://www.cpushack.com/CIC/embed/announce/HitachiSH7709.ht...).


As in the "hot chips" conference -- supposed to mean "amazing and a big hit". I mean, would the marketing department promote something negative?


They definitely don't run hot. In addition to Dreamcasts, they were used in many appliance-type devices like NAS boxes, and I've never seen or heard of any that required cooling.


Undoubtedly retronymized from "Super Hitachi", as in the successor to the Hitachi H8... So even if it weren't supposed to stand for "hot", it would still have had to be something with an H.


Super Hot like they were named in Japan in the early 90s


Whatever happened to them? Why don't Fujitsu and Hitachi design chips anymore?


The Japanese chip manufacturers weren't all that competitive on their own, so they rolled everything together into Renesas to gain better scale.


And Renesas continues to produce the existing chips/architectures (like SuperH), but also adopted ARM due to customer demand.



I worked on SuperH3/4 compilers back in the day.

One thing I remember is how good Hitachi's documentation was! It was clear and concise, and every quirk was well explained. The pseudo-code they used to describe each instruction was also good: C-like to the point that I was able to copy-paste the snippets, with minor changes, into an emulator.
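For flavor, the manual's per-instruction pseudo-code read roughly like this (a reconstruction from memory; the register-file names and layout here are assumed, not quoted from the manual):

```c
#include <stdint.h>

/* Sketch in the style of Hitachi's per-instruction pseudo-code:
   each instruction is a tiny C function over the register file.
   R[], PC and the exact names are illustrative assumptions. */
static uint32_t R[16]; /* general-purpose registers r0..r15 */
static uint32_t PC;    /* program counter */

/* ADD Rm,Rn: Rn += Rm. Every SH instruction is 2 bytes, so PC += 2. */
static void ADD(int m, int n)
{
    R[n] += R[m];
    PC += 2;
}
```

Transcribe a whole instruction set in that style and you have most of an interpreter core almost for free, which is exactly why the copy-paste trick worked.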

However, the architecture itself was a bit annoying to work with because of the small opcodes. Something as simple as loading a 32-bit immediate into a register took 7-9 instructions. More realistically, the compiler would put the values in a literal pool (LTORG) and we'd load them from there in 1 or 2 instructions. Relative jumps were also an issue for the same reason. It also inherited the infamous delay slots, where half the instructions are illegal and the other half are quirky, so we just NOPed them, though I know smarter compilers do useful work there.
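To illustrate the immediate problem: SH's `mov #imm` only takes a sign-extended 8-bit immediate, so materializing an arbitrary 32-bit value inline takes a chain like the following (a C sketch with the rough corresponding instructions in comments; the literal-pool alternative is a single PC-relative `mov.l`):

```c
#include <stdint.h>

/* Building 0x12345678 without a literal pool: each statement maps to
   roughly one SH instruction (mov/or take 8-bit immediates, on r0). */
static uint32_t build_const(void)
{
    uint32_t r0 = 0x12;  /* mov   #0x12, r0 */
    r0 <<= 8;            /* shll8 r0        */
    r0 |= 0x34;          /* or    #0x34, r0 */
    r0 <<= 8;            /* shll8 r0        */
    r0 |= 0x56;          /* or    #0x56, r0 */
    r0 <<= 8;            /* shll8 r0        */
    r0 |= 0x78;          /* or    #0x78, r0 */
    return r0;           /* seven instructions for one constant */
}
```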

That being said, I do admire the elegance of how they fitted their 100 instructions with 16/24 registers into a 16-bit opcode; it's a beautiful application of engineering trade-offs :).


Have you seen how terrible SH code generation is? At least in code compiled in 2005-2009. Unnecessary moves:

  mov r0, r3     ! result from a subroutine
  mov.l r3, @r4
  mov.l @r4, r3
  ! do something with the value

And many, many more examples like this, wasting cycles and space.


Yes, I do Dreamcast homebrew, and even with GCC 12.2, the compiler does such stupid, stupid things.

The SH has addressing modes for postincrement reads and predecrement writes, so something like

  var = *ptr++;
can be compiled down to a single instruction. But GCC almost always does something moronic. I've seen it generate code equivalent to:

  ptr++; tmp = ptr; tmp -= 16; var = tmp[15];
Insane.

Older versions of GCC (3.x) are generally much better at low level instruction selection, but newer versions are better at big-picture stuff like optimizing across function calls, so speed-wise, across a whole program, they aren't too dissimilar. Newer versions are still worth it for the feature set (better warnings, modern standards like C23/C++23, etc).
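For reference, the whole statement should compile to a single `mov.l @r4+, r3` (register assignment assumed); its semantics, sketched in C:

```c
#include <stdint.h>

/* Semantics of SH's post-increment load `mov.l @rM+, rN`:
   load, then advance the pointer by the access size. */
static uint32_t load_postinc(uint32_t **pp)
{
    uint32_t v = **pp; /* rN = *rM */
    (*pp)++;           /* rM += 4  */
    return v;
}
```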


>> engineering trade-offs

It's not obvious what they traded off and what they got in return.


The article mentions that ARM licensed SuperH patents for their Thumb instruction set, and ARM did that for improved code density, so that seems to be what they got in return.

If you read through all 10 parts of the series, Raymond later mentions that it might look like a lot of code, but the instructions are half as big as those of other RISC contemporaries, so it's better than it looks.


> instructions are half as big as other RISC contemporaries

That's a bit of a gimmick, tbh. Yes you can just about make it work if you have a really barebones insn set, as the early SuperH chips had. (You do need to start from a 2-register insn format of course.) But modern chips - even quite small chips nowadays - benefit from having more than that, so you're left with a really limited niche.

(I do think that the calculus changes if you have a strictly Harvard architecture chip/core, since these might be able to afford a "weird" instruction length. A core with, e.g. 20-bit or 24-bit fixed-length instructions designed along the same lines gains some much needed space for ISA extension.)
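Concretely, the density comes from squeezing two 4-bit register fields plus the operation into 16 bits. A decode sketch in C (the `ADD Rm,Rn` encoding `0011nnnnmmmm1100` is from memory, so treat it as illustrative):

```c
#include <stdint.h>

/* Decode a 16-bit SH opcode of the 2-register form: 4-bit major opcode,
   4-bit Rn, 4-bit Rm, 4-bit minor opcode. */
static int decode_add(uint16_t op, int *m, int *n)
{
    if ((op & 0xF00F) != 0x300C) /* not ADD Rm,Rn */
        return 0;
    *n = (op >> 8) & 0xF; /* destination register Rn */
    *m = (op >> 4) & 0xF; /* source register Rm      */
    return 1;
}
```

With only 16 bits to spend, there is no room left for a third register field or large inline immediates, which is exactly the trade-off under discussion.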


Alas, j-core development kind of dried up. There was a roadmap for this open SuperH-compatible core going up to SH4, then a 64-bit version, kind of tied to key patents expiring.

There's a J32 & J32 SMP core now, but I'm not sure where that would slot into the old roadmap, and it seems like development isn't active. https://j-core.org/roadmap.html

Meanwhile, support keeps bitrotting. Someone managed to get a Debian port going in 2015, and it sounded semi-adventurous to get working. https://lwn.net/Articles/647636/

Chipmaker Renesas - which kind of was SuperH for a while - meanwhile seems to be starting down the RISC-V path, alongside a variety of ARM cores and their own RL78 architecture (small microcontrollers) that I don't know much about. https://www.cnx-software.com/news/renesas/


I was really looking forward to the j-core project, and got into SH4 by emulating the Dreamcast port of NetBSD, which is still available in the latest version (9.3, I believe).

http://wiki.netbsd.org/ports/dreamcast/

GXemul is what I used at the time.

https://gavare.se/gxemul/gxemul-stable/doc/machine_dreamcast...

There are also lots of interesting things going on with NAOMI emulation on the gaming side. All of it is very hackable and j-core would have made a great addition to that.


I’m maintaining Debian for sh4 and I’m also the kernel maintainer for SuperH.

If you’re interested in SuperH Linux, feel free to join #linux-sh on Libera IRC.


Yeah, shame J-core didn't pan out. SH4 is such an interesting chip with some minimal vector facilities.


I did a lot of work for both Hitachi and Renesas over the years and used the SH-1, 2, 3 and 4 with various OSes. I liked the SH architecture; I first learnt assembly programming on PDP-11s, and SH felt like an extended, RISCified PDP-11.


The SH-3 is also the CPU of the Cave CV1000 arcade board, which gave us the best bullet hell games. (Though arguably, Cave's developers thought the board was way underpowered. Still, they did an amazing job with the games)


SH also powers many of Casio's programmable calculators. Prizm is especially interesting in that it's a fairly traditional device (powered by AAA batteries!) with a decently large color screen that has a full-fledged C SDK available for it - and while Casio doesn't provide any official docs or other support, they don't block third party software, either. FX-CG50 is powerful enough to run a NES emulator (https://github.com/TSWilliamson/nesizm).


SH-3 also powers the Korg Electribe ESX-1 and EMX-1 which are still the best grooveboxes ever made to this day.


Do note how these units do division. I was very stumped as to why div1 and rotcl were used in unrolled form until Raymond explained it; that's how I reverse-engineered one of the many division functions in an SH2 firmware.
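For readers who haven't met it: `div1` produces one quotient bit per execution and `rotcl` rotates that bit in through the T flag, so the firmware unrolls 32 of them. The underlying algorithm is plain restoring shift-and-subtract division; here is a C sketch of that idea (not the exact SH T-bit semantics, and `den` is assumed nonzero):

```c
#include <stdint.h>

/* Restoring division, one quotient bit per step -- the idea behind
   the unrolled div1/rotcl sequences (flag bookkeeping omitted). */
static uint32_t udiv32(uint32_t num, uint32_t den, uint32_t *rem)
{
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {      /* the firmware unrolls this loop */
        r = (r << 1) | ((num >> i) & 1); /* shift next dividend bit in */
        if (r >= den) {                  /* trial subtract */
            r -= den;
            q |= 1u << i;                /* set this quotient bit */
        }
    }
    if (rem)
        *rem = r;
    return q;
}
```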


You can find many of these in the GCC SH target


Where is that explained?


It also powered Ricoh's RDC-i700 ( https://www.ricoh-imaging.co.jp/english/r_dc/rdc/i700/ )

I had fun with it almost 20 years ago during my internship at Ricoh: they were porting Linux to it (mostly as research; the device was already nearing its EOL, but it had a touch screen with a stylus and two PCMCIA ports, which made it possible to put a WiFi card in it), and I made some small demonstration programs. I spent a lot of my time fighting to get libraries to compile on it ^^;


I thoroughly enjoy these deep dives into CPU architectures. I forget most of it within seconds after reading it, because I have no practical interest in it. But it is fun to get a look underneath the pavement so to speak.


This was a great read, thanks for sharing.

I actually got interested in SH3 again recently, as one of my viewers donated some hardware for me to see if I could port Gentoo to run on it. It's more of a passion project; however, I did manage to get a PoC image to semi-boot over a weekend, so I hope I'll get it into a state where I can merge it into Gentoo and officially support the architecture again.


Please join #linux-sh on Libera IRC if you’re interested in SuperH Linux.

I’m one of the kernel maintainers for SuperH Linux.


I've been idling there for a while now, actually. All of your work is what made me think this was worth trying in the first place.





