A RISC'y cluster - Part I
A house without a cluster is just a house.
In order to run mail, a website, git, batch processing and whatever other services you can imagine needing servers in the basement for, I set out to find an interesting and exciting, yet still economical, solution.
Bit of background
Through my career I have worked with many hardware architectures. I have written assembly for x86, SPARC, UltraSPARC, PA-RISC, POWER2, PowerPC, Alpha and other processors over the years, and I have written high performance software for all of them in higher level languages too (Fortran, C, C++). There is something genuinely interesting about processor architectures and their differences. It would be cool to run a home setup on something that’s not amd64 - and actually, given that so much runs on ARM these days, it would be cool to not do that either. I have nothing against ARM or amd64 of course, they’re great - they are just not exciting.
Then there’s the space concern. I want multiple identical systems running in a cluster. Ideally I want these to all fit in a single unit of a 19" rack. Rack space in my basement is very limited. Something like a compact single board computer (SBC) would be perfect so I could fit multiple nodes in a single 1U chassis. Like a Raspberry Pi, except, everyone is using those and while I have nothing against them, they are just not exciting.
The solution
It so happens that development of RISC-V processors has picked up, and that realistic processors and boards featuring them started becoming available a few years back. The way it works, as far as I understand, is that there is no licensing cost on the RISC-V architecture, which opens up the possibility of low cost chips. Now, they still need to be designed and manufactured, they are not going to be free, but the cost of a processor using the RISC-V architecture is potentially going to align more closely with the cost of short term chip development and production than with the historical cost of developing a modern processor architecture.
I am certainly not against paying for development, but in a world where “fast enough” processor architectures have been widely available for many decades, I think there is merit to the idea of a “fast enough” processor manufactured to a low total cost (and a minimal power envelope).
You can absolutely run your e-mail and web site on something less fast than the top of the line from AMD or Intel. One solution could be to simply purchase some 10 year old data centre servers - they might be cheap in acquisition but they would still require the space and the power they did when they were top of the line. I rather like the idea of a new, low power, affordable processor that is simply not meant to be top of the line performance wise.
And this is - as I understand it - very much the niche the newer RISC-V systems fill right now. Yes, they are getting vector extensions; yes, the architecture can support deep pipelines, out of order execution and any combination of threads, cores and cells. But the manufacturers are not quite there yet. Yet… Right now, there are affordable, low power, low performance RISC-V systems available - and that actually sounds exciting!
Enter the StarFive VisionFive 2 board. The footprint is certainly Raspberry Pi inspired, but it carries a quad core 1.5GHz RISC-V processor with 8GB of memory, an M.2 connector and two gigabit Ethernet ports. These are the things I need.
The system also has a GPU on board, except as of this writing support is still not in the mainline Linux kernel, which makes it useless. This is generally something new manufacturers need to understand about working with the community - providing a patched up kernel in an outdated copy of some random Linux distro simply doesn’t cut it. If the hardware is not supported by the mainline kernel in the common distros, the hardware doesn’t work. Nobody wants a patched up, outdated, hardware vendor Franken-distro.
Making it work
Step one in the cluster project is to get a single node running a standard Debian Trixie installation. No patches. No quirks. No hand-holding. Debian 13 (Trixie) is the first Debian release to natively support the riscv64 architecture, so the timing is perfect.
Hooking up
The kit I bought my first VisionFive 2 board in came with a USB-to-RS232 adapter with connectors that fit onto the board headers. This gives me serial console access to the system from early boot. I don’t know if you can access everything you need with only a monitor on the HDMI output - I didn’t try, so this guide is not for that.
First hook up the board TX/RX and GND wires to your USB adapter. In my case the black wire is GND, board TX (adapter RX) is white and board RX (adapter TX) is green. Since the adapter is powered from the PC and the SBC is powered from a USB-C power adapter, there is no need to connect the red +5V power lead.

On your PC, connect to the board using
sudo minicom -D /dev/ttyUSB0 -b 115200
or similar.
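If you are not sure which device node the adapter got (it is not necessarily ttyUSB0), check the kernel log on the PC right after plugging it in, or look at the stable names under /dev/serial/by-id/:
sudo dmesg | grep ttyUSB
ls -l /dev/serial/by-id/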
If you power on the board now, you should see console output from the early boot process.
Hardware
To use this board as a more or less regular computer you should definitely install NVMe storage. The M.2 slot on the bottom of the VisionFive 2 board takes a full length NVMe card - I use 500GB Crucial SSDs on my boards but there are many options (and honestly, storage has become shockingly cheap).
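Once the drive is fitted, you can quickly check that the firmware sees it before going any further - interrupt the boot with a keypress and run the following at the bootloader prompt:
pci enum
nvme scan
nvme device
If the drive is listed there, the installer and Linux will see it too.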
Preparing for Debian Trixie install
Use a USB stick for the Debian installer. Simply get the NetInstall image (from here) and write it to the stick.
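On a Linux PC, writing the image could look something like this - /dev/sdX and the image file name are placeholders for whatever your stick and downloaded image are actually called, and dd will destroy everything on the target device, so double-check with lsblk first:
sudo dd if=debian-13-riscv64-netinst.iso of=/dev/sdX bs=4M status=progress oflag=sync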
Now edit the partition table on the stick and create a third partition. Format that one as FAT and copy over the mini.efi image (from here), as well as the u-boot and SPL firmware files (sources for those are described in the next section).
This will later prove useful for booting the installer through an EFI image that works on the VisionFive 2, because the GRUB bootloader currently does not work there - and the installer seems to assume that it does.
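For reference, the stick preparation on a Linux PC looks roughly like the following sketch - /dev/sdX is again a placeholder, and the exact firmware file names (u-boot-spl.bin.normal.out and visionfive2_fw_payload.img here) depend on the firmware release you downloaded. In fdisk, create a new partition (n) in the free space after the installer image and write the table (w):
sudo fdisk /dev/sdX
sudo mkfs.vfat /dev/sdX3
sudo mount /dev/sdX3 /mnt
sudo cp mini.efi u-boot-spl.bin.normal.out visionfive2_fw_payload.img /mnt/
sudo umount /mnt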
Updating firmware
Before proceeding, make sure your board firmware is up to date. There are features you will need that may be missing if the firmware is too old.
Updated firmware images are available here. To perform the update, follow the excellent guide here.
At the time I did the install, firmware version 5.13.1 was the most recent stable release. I know that this guide works with that version.
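For orientation, the update from the bootloader prompt boils down to probing the SPI flash and writing the SPL and firmware payload images from the FAT partition on the stick - but take the exact file names and flash offsets from the guide above; the ones below are only indicative:
pci enum
usb start
sf probe
fatload usb 0:3 ${loadaddr} u-boot-spl.bin.normal.out
sf update ${loadaddr} 0x0 ${filesize}
fatload usb 0:3 ${loadaddr} visionfive2_fw_payload.img
sf update ${loadaddr} 0x100000 ${filesize}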
Launching the installer
Reset the board and interrupt the boot process with a keypress. You should now be at the bootloader prompt. We now need to
- have all PCI devices enumerated
- start up the USB subsystem
- load the mini.efi image into memory
- load the DTB file (hardware description) into memory
- instruct the bootloader to boot the image
This looks like:
pci enum
usb start
fatload usb 0:3 0x40200000 mini.efi
fatload usb 0:2 0x46000000 /dtb/starfive/jh7110-starfive-visionfive-2-v1.3b.dtb
bootefi 0x40200000 0x46000000
In case you are wondering about the addresses, don’t worry, they just happen to be the standard addresses for where these things get loaded on this board.
What should happen now is that the system boots the mini.efi image and the installer launches. I ran the installation over the serial connection and this works just fine - I am sure something fancier could be achieved with the HDMI output, but I didn’t bother because I already had the serial console.
Notes on disk layout
I generally run XFS filesystems on LVM logical volumes if I can get
away with it. Simply because of flexibility and habit. For the /boot
filesystem I run ext4 and I do not place this filesystem under LVM -
it is the only plain partition on the disk other than my LVM physical
volume partition. Swap too is placed on a logical volume.
Basically the disk setup on my nodes looks like
joe@dale:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
nvme0n1 259:0 0 465.8G 0 disk
├─nvme0n1p1 259:1 0 953M 0 part /boot
└─nvme0n1p2 259:2 0 464.8G 0 part
├─dale--vg-root 254:0 0 13G 0 lvm /
├─dale--vg-var 254:1 0 5.6G 0 lvm /var
├─dale--vg-swap 254:2 0 8G 0 lvm [SWAP]
├─dale--vg-brick0 254:3 0 100G 0 lvm /data/brick0
└─dale--vg-brick1 254:4 0 50G 0 lvm /data/brick1
joe@dale:~$
Ignore the brick partitions for now, they are used in a
gluster clustered filesystem setup that I
use. You may want to do this differently, that’s fine. Different
strokes for different folks.
Finalising the install
One of the last steps in the Debian installer is installing the GRUB bootloader. That will not work - don’t do that. Instead, we need to set up a structure so that the u-boot system, which is part of the board firmware, can load a boot menu with configurations from the /boot/extlinux/extlinux.conf file (which doesn’t exist on your newly installed system yet).
You will select “continue without bootloader” - and upon finishing the installation you should see a message that says something like:
You will need to boot manually with the /vmlinuz kernel on partition /dev/nvme0n1p1 and root=/dev/mapper/dale--vg-root passed as a kernel argument.
This is fine. Before quitting the installer you need to do one last thing though - you need to start a shell so that we can prepare your system for booting.
In the shell you need to
chroot /target /bin/bash
in order to enter the installed environment.
Now we need to install the u-boot menu:
apt install u-boot-menu
Edit the file /etc/default/u-boot so that it looks like this:
## /etc/default/u-boot - configuration file for u-boot-update(8)
U_BOOT_UPDATE="true"
U_BOOT_FDT="starfive/jh7110-starfive-visionfive-2-v1.3b.dtb"
U_BOOT_FDT_DIR="/linux-image-"
#U_BOOT_ALTERNATIVES="default recovery"
#U_BOOT_DEFAULT="l0"
#U_BOOT_PROMPT="1"
#U_BOOT_ENTRIES="all"
#U_BOOT_MENU_LABEL="Debian GNU/Linux"
#U_BOOT_PARAMETERS="ro quiet"
#U_BOOT_ROOT=""
#U_BOOT_TIMEOUT="50"
#U_BOOT_FDT=""
#U_BOOT_FDT_DIR="/usr/lib/linux-image-"
#U_BOOT_FDT_OVERLAYS=""
#U_BOOT_FDT_OVERLAYS_DIR="/boot/dtbo/"
#U_BOOT_SYNC_DTBS="false"
Unfortunately, u-boot-update as currently shipped in Trixie will not honour the U_BOOT_SYNC_DTBS flag if set to true, so we must manually copy the DTB hierarchy to our boot filesystem (I suppose running the system off one huge partition outside of LVM would allow avoiding this copy - but I want my LVM). Therefore, let us copy the DTB hierarchy from the installed kernel:
cp -rv /usr/lib/linux-image-* /boot/
Until u-boot-update is fixed to actually honour the U_BOOT_SYNC_DTBS flag, you must manually run this copy after every kernel upgrade. This is the only boot quirk relative to a completely standard amd64 Linux system.
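If you get tired of remembering that, one way to automate it - a sketch using Debian’s standard kernel hook directory, not something from any official VisionFive documentation - is to drop a small script into /etc/kernel/postinst.d/ (and make it executable), which the kernel packages run after every kernel installation:
#!/bin/sh
# /etc/kernel/postinst.d/zz-sync-dtbs
# Copy the DTB hierarchy to /boot and regenerate the u-boot menu
# after every kernel installation or upgrade.
set -e
cp -r /usr/lib/linux-image-* /boot/
u-boot-update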
With this out of the way, you can now run
u-boot-update
This will read the configuration, look at available kernels and
generate the /boot/extlinux/extlinux.conf file that the firmware
u-boot system needs for booting.
Just to be sure, verify the contents of this file and check that it looks something like the following (but of course with the current Linux kernel version and the LVM volume names as you configured them):
## /boot/extlinux/extlinux.conf
##
## IMPORTANT WARNING
##
## The configuration of this file is generated automatically.
## Do not edit this file manually, use: u-boot-update
default l0
menu title U-Boot menu
prompt 1
timeout 50
label l0
menu label Debian GNU/Linux 13 (trixie) 6.12.57+deb13-riscv64
linux /vmlinux-6.12.57+deb13-riscv64
initrd /initrd.img-6.12.57+deb13-riscv64
fdt /linux-image-6.12.57+deb13-riscv64/starfive/jh7110-starfive-visionfive-2-v1.3b.dtb
append root=/dev/mapper/dale--vg-root ro quiet
...
Now, finally, you can exit the shell and pick the reboot option from the installer menu. You have a system that can boot. Don’t forget to remove the USB drive by the way.
Configuring the firmware boot loader
Once again, interrupt the early boot process with a keypress. We now need to set up the environment variables for booting the installed system.
We can test-run a boot without saving the environment variables - once it works, save the variables and your system will boot unassisted as it should.
Assuming that the /boot partition is an ext4 file system on
partition 1 on the NVME device, the following should work:
pci enum
nvme scan
nvme device
env set bootpart 1
env set bootcmd_distro 'sysboot ${bootdev} ${devnum}:${bootpart} ext2 ${scriptaddr} /${boot_syslinux_conf};'
boot
The boot command should cause your system to boot into your freshly installed Trixie system. If it does not, look at the error messages and see if you can figure it out. Maybe Google helps; if not, you’re welcome to send me an e-mail.
Assuming this works, reboot the system, interrupt the boot process again, and set the bootpart and bootcmd_distro variables like above - but this time save the environment with:
env save
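If you want to double-check what was actually stored before rebooting, you can inspect the variables with
env print bootpart bootcmd_distro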
Now your system should simply boot up on its own when powered on. Just like you would expect any system to.
And that’s it!
Final remarks
Debian Trixie works and runs amazingly well on the VisionFive 2 - it is every bit as solid, reliable and natural as an amd64 or other “normal” system. And there are no special kernels, no patched up drivers, nothing out of the ordinary. Bog standard Trixie - like it should be. Fantastic job by all the folks who made that happen!
However, as a newcomer to the platform I found it unbelievably difficult to install a “proper” Debian system (see Don’t Break Debian). The install process, and the boot configuration in particular, while not difficult at all once you have all the answers, was largely undocumented (or documented in places I could not find). The internet is full of bad advice, and sorting the wheat from the chaff was not trivial.
Known shortcomings
The only quirk on this system is that after a kernel upgrade, I need to manually copy over the new DTB files and re-run u-boot-update. The script does have an option that should make this happen automatically as part of the u-boot update, it just isn’t implemented as far as I can see. This seems to me to be a trivial fix which would remove the quirk.
It seems the Debian folks have their minds set on using GRUB as the boot loader everywhere, meaning that a u-boot setup is something only experts run. Unfortunately, GRUB as it is today will not work on the VisionFive 2 - it consistently fails to boot a Linux image with an “error: out of memory” message. If this were fixed, the u-boot issues would of course go away since the u-boot-menu setup would no longer be needed - this is probably an even better fix.
On the hardware side, the GPU is still (as of this writing) unsupported in mainline kernels. Now, I was never interested in this system as a desktop, but on a server with a relatively low performance CPU it could - maybe - be cool to have a GPU to offload vectorised FP64 operations onto. With the GPU chosen (BXE-4-32MC1) that won’t work - it does FP16 and FP32 only (and honestly, at 32 FP32 FLOPs/clock at 500MHz it would not exactly be a speed demon either). But whether that would be useful or not, we won’t find out while the GPU is unsupported… I do not understand why the company behind the GPU (Imagination Technologies Ltd.) and the company behind the board (StarFive) have not pushed the needed code for mainline kernel inclusion - it has been years. And honestly, for a GPU at this level, I find it hard to believe that they would have so much secret sauce that needs protection. What are they afraid of? NVIDIA is going to steal their IP? Let’s be realistic.
Other than that…
This is a compact, economical, reliable, fully featured little Linux server system with a fully supported major distribution working flawlessly (basically). For my purposes, this is perfect. And while it would have been simpler to do this with amd64 or ARM, there is a different kind of satisfaction in getting this system going:
joe@dale:~$ free -h
total used free shared buff/cache available
Mem: 7.7Gi 1.2Gi 137Mi 32Mi 6.7Gi 6.6Gi
Swap: 8.0Gi 1.0Mi 8.0Gi
joe@dale:~$ uname -a
Linux dale 6.12.57+deb13-riscv64 #1 SMP Debian 6.12.57-1 (2025-11-05) riscv64 GNU/Linux
joe@dale:~$ lscpu | head
Architecture: riscv64
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: 0x489
Model name: sifive,u74-mc
CPU family: 0x8000000000000007
Model: 0x4210427
Thread(s) per core: 1
Core(s) per socket: 4
joe@dale:~$
This was stage one. Obviously no basement is complete with only one server.