Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

As someone who worked for a while and still works in HPC, my impression from this field as compared to eg. programming in finance sector or programming for storage sector is that... HPC is so backwards and far behind, it's really amazing how it's portrayed as some sort of a champion of the field.

That's not to say that new things don't happen there, it's just that I find a lot of old stuff that was shown to be bad decades ago still being in vogue in HPC. Probably because it's a relatively small field with a lot of people there being academics and not a lot of migration to/from other fields.

You've probably never heard of `module` (either Tcl or Lmod). This is a staple of HPC world. What this thing does is it sources or (tries to) remove some shell variables and functions into the shell used either interactively or by a batch job. This is a beyond atrocious idea to handle your working environment. The information leaks, becomes stale, you often end up loading the wrong thing into your environment. It's simply amazing how bad this thing is. And yet, it's just everywhere in HPC.

Another example: running anything in HPC, basically, means running Slurm batch jobs. There are alternatives, but those are even worse (eg. OpenPBS). When you dig into the configuration of these tools, you realize they've been written for pre-systemd Linux and are held together by a shoestring of shell scripting. They seldom if at all do the right thing when it comes to logging or general integration with the environment they run in. They can be simultaneously on the bleeding edge (eg. cgroup integration or accelerator driver integration) and be completely backwards when it comes to having a sensible service definition for systemd (eg. try to manage their service dependencies on their own instead of relying on systemd to do that for them).

In other words, imagine a steam-punk world, but now it's in software. That's sort of how HPC feels like after a decade or so in more popular programming fields.

Also, a lot of code written for HPC is written the way it is not because the writer chose the language or the environment. The typical setup is: university IT created a cluster with whatever tools they managed to put there eons ago, and you, the code writer, have to deal with... using CentOS6 by authenticating to university's AD... in your browser... through JupyterLab interface. And there's nothing you can do about it because the IT isn't there, is incompetent to the bone and as long as you can get your work done somehow, you'd prefer that over fighting to perfect your toolchain.

Bottom line, unless a language somehow becomes indispensable in this world, no matter its advantages, it's not going to be used because of the huge inertia and general unwillingness to do beyond the minimum.



HPCs never loved the inefficiencies of anything virtualized (VMs or any containers really), so the shell hacks of module enabled a (limited, but workable) level of reproducibility that was sufficiently composable and usable by researchers who understood the shell. I am not going to defend this tcl hack any further, but I can see how it was the path of least resistance when people tried to stay close to the raw metal of their large clusters while keeping some level of sanity. Slurm is a more defensible choice, but I agree that these tools are from a different era of compute. I grew to love and hate these tools, but they definitely represent an acquired taste, like a dorian fruit; not like an apple.

Your centos6 references made me chuckle :-)


I promise you that the main reason HPC is behind on virtualization is not because of the little bit of overhead. There are a dozen other inefficiencies in the average HPC workload that are more significant.

Most centers don't even have good real-time observability systems to diagnose systemic inefficiencies, leaving application/workload profiling purely up to user-space.

The HP in HPC has really been watered down over the last couple decades, and "IT for computational research" would be a more accurate name. You can do genuinely high-performance computing there, but you'll be an outlier.


It's a mixture of legacy and reality.

For one, the assumption has been that you had dedicated use of all the nodes and communication network. It would kill your performance if your local node CPU scheduler was interfering with having your actual HPC program active when the messages were coming in from its peer tasks on the other nodes, since parallel jobs are limited in the end by the critical path latency of the cross-node communications.

It's only on the most "embarrassingly parallel" end of the spectrum where you can tolerate a bunch of virtualization and non-determinism, because the tasks communicate so infrequently or via such asynchronous mechanisms that they don't really impact the throughput of the whole job if they are asleep at random times.

But HPC systems also were very "unique". It wasn't just all Linux but a dozen different vendors' Unix variants with very different personalities. And for the bleeding-edge systems, each deployment was practically its own dialect of that vendor OS. Running a job was like cross-compiling to a one of a kind target. There was no generic platform where you could expect to build an app once and ship it around to whichever supercomputer was available.


Agreed on all points and this captures the history well.


Containers are an OS sandboxing/namespacing primitive, they don't involve any overhead on their own. The overhead is dependent on what's inside the container besides a single deployed binary.


What you way is true after the container starts. Typical HPC codes are tuned to raw hardware so they assume full ownership of the hardware anyways. When HPC was developing 30 years ago we didnt have clean ways to avoid overheads in the regime of 10k nodes. Instead we got parallel filesystems, caching, and shell, with module, which technically did the job for reproducible runs at a huge human cost.


How should it be better? Most environments offer Apptainer which can import Docker containers. Plus a lot of theae languages like Julia and Chapel are pretty self contained and programmed against eg ancient libc for these very reasons.


As you dig deeper I think you'll find a method behind the madness.

Sure modules just play with env variables. But it's easy to inspect (module show), easy to document "use modules load ...", allows admins to change the default when things improve/bug fixed, but also allows users to pin the version. It's very transparent, very discover-able, and very "stale". Research needs dictate that you can reproduce research from years past. It's much easier to look at your output file and see the exact version of compiler, MPI stack, libraries, and application than trying to dig into a container build file or similar. Not to mention it's crazy more efficient to look at a few lines of output than to keep the container around.

As for slurm, I find it quite useful. Your main complaint is no default systemd service files? Not like it's hard to setup systemd and dependencies. Slurms job is scheduling, which involves matching job requests for resources, deciding who to run, and where to run it. It does that well and runs jobs efficiently. Cgroup v2, pinning tasks to the CPU it needs, placing jobs on CPU closest to the GPU it's using, etc. When combined with PMIX2 it allows impressive launch speeds across large clusters. I guess if your biggest complaint is the systemd service files that's actually high praise. You did mention logging, I find it pretty good, you can increase the verbosity and focus on server (slurmctld) or client side (slurmd) and enable turning on just what you are interested, like say +backfill. I've gotten pretty deep into the weeds and basically everything slurm does can be logged, if you ask for it.

Sounds like you've used some poorly run clusters, I don't doubt it, but I wouldn't assume that's HPC in general. I've built HPC clusters and did not use the university's AD, specifically because it wasn't reliable enough. IMO a cluster should continue to schedule and run jobs, even if the uplink is down. Running a past EoL OS on an HPC cluster is definitely a sign that it's not run well and seems common when a heroic student ends up managing a cluster and then graduates leaving the cluster unmanaged. Sadly it's pretty common for IT to run a HPC cluster poorly, it's really a different set of contraints, thus the need for a HPC group.

Plenty of HPC clusters out there a happy to support the tools that helps their users get the most research done.


It's been years since I last used `slurm`. Thanks for the blast from the past.


> you realize they've been written for pre-systemd Linux

So still retaining some kind of sanity and good engineering practices?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: