elektito programming & stuff

Moving away from Github pages

I’m moving away from Github services, so I’m also moving my website off Github Pages. The website is still available on my own domain name, elektito.com.

The new home for my source code will be sourcehut. My website is also available at elektito.srht.site, using their sourcehut pages service.

I'm on Gemini now!

I’ve created a Gemini capsule at this same domain. If you have a Gemini browser, you can visit my capsule at gemini://elektito.com.

I’m not sure whether that means I’ll stop writing here. Probably not. But I’m definitely more active there at the moment.

What’s Gemini?

If you’re not familiar with it, Gemini is a new Internet protocol that is heavier than Gopher and (a lot) lighter than the web. It’s a small space right now, with no more than a few thousand capsules (a capsule being Gemini’s equivalent of a website on the web, or a gopherhole on Gopher).

It’s part of a recent movement of the small (or smol!) Internet, which strives for a slower, more human-scale Internet. A counterculture, you could say, to what the web has become. In many ways, it resembles the web of the early ’90s.

The project name, unfortunately, is rather ungooglable. Search for Gemini and you’ll mostly find stuff about the constellation, astrology, a crypto/scam exchange, or, if you’re lucky, the NASA project. Here’s the more or less official website for Project Gemini.

LRFS Part 4: Early Userspace: initrd and initramfs

This is part of a series of articles. You can find the first part here.

Although init is considered the beginning of the Linux userspace, this is not technically the case. There are other facilities that are part of the userspace and run even before init. Using these is not mandatory, and as it happens we are not going to use them in our distro (not now, at least), but I thought it might be educational to examine them here.

Why?

So why would we need something to run before init? Here are a few cases where it might be necessary:

  • Mounting the root file system might require drivers that are not built into the kernel, but are instead built as loadable modules that reside on the very file system we are trying to mount. It is, of course, possible to build these into the kernel, but we might want to keep the kernel from becoming too large. This was especially important in the past, when RAM was more limited than it is now.

  • The root file system might be encrypted and need to be decrypted before it can be mounted.

  • The system might be resuming from hibernation, which needs special handling before it wakes up.

The kernel provides two facilities for running a small userspace before the actual init: these are called initrd and initramfs, the latter being a more recent addition to the kernel, although both have been there for quite some time now.

The names initrd and initramfs, although referring to distinct facilities, are frequently used interchangeably.

  • initrd is a file system image that is mounted as root. It usually contains an executable named /linuxrc that is run after the image is mounted. After performing any necessary preparations and mounting the real root file system in a temporary location, this program then uses the pivot_root system call to switch to the new root and then unmounts initrd.

  • initramfs is a file archive that becomes the root file system. An executable called /init on this archive is then run by the kernel, effectively becoming init (that is, PID 1). This can continue as an init, or later mount a new root and exec the real init.
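To make the initrd switch step more concrete, here is a minimal C sketch of it, assuming the real root device and its file system type are already known and that an empty directory (called initrd here) exists on the real root to receive the old root. The function name switch_to_real_root and its parameters are hypothetical. It follows the sequence given in the pivot_root(8) manual page (change into the new root, pivot, then chroot), and calls pivot_root through syscall(2) because glibc does not provide a wrapper for it.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper for an initrd-style early init (needs root).
 * dev, fstype: the real root device and its type, e.g. "/dev/sda", "ext2"
 * new_root:    an existing, empty directory on the initrd to mount it on
 * put_old:     a directory on the real root, relative to it (e.g. "initrd"),
 *              where the old initrd root ends up after the pivot
 * Returns 0 on success, with "/" now being the real root file system. */
static int
switch_to_real_root(const char *dev, const char *fstype,
                    const char *new_root, const char *put_old)
{
  char old_path[256];

  if (mount(dev, new_root, fstype, 0, NULL) == -1) {
    perror("mount");
    return -1;
  }
  if (chdir(new_root) == -1) {
    perror("chdir");
    return -1;
  }
  /* the current root moves to new_root/put_old; new_root becomes "/" */
  if (syscall(SYS_pivot_root, ".", put_old) == -1) {
    perror("pivot_root");
    return -1;
  }
  if (chroot(".") == -1 || chdir("/") == -1) {
    perror("chroot");
    return -1;
  }
  /* lazily detach the old initrd root from the mount tree */
  snprintf(old_path, sizeof(old_path), "/%s", put_old);
  if (umount2(old_path, MNT_DETACH) == -1)
    perror("umount2");
  return 0;
}
```

After a successful call, the program would normally exec the real init, for example with execl("/sbin/init", "/sbin/init", (char *)NULL). The device name, file system type and directory names are assumptions for illustration.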

In the next two sections, we will examine each of these carefully and see some examples.

initrd

We will be using the following C program as the “early init”:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int
main(int argc, char *argv[])
{
  printf("Early Init\n");
  printf("----------\n");
  printf("PID=%d UID=%d\n", getpid(), getuid());
  printf("----------\n");
  for(;;);

  return 0;
}

Save this as earlyinit.c and compile it statically:

gcc earlyinit.c -static -Wl,-s -o linuxrc

Now create a file system image and copy linuxrc to it:

dd if=/dev/zero of=image.img bs=20M count=1
mke2fs image.img
sudo mount image.img /mnt
sudo cp linuxrc /mnt
sudo umount /mnt

You can also compress this file:

gzip -9 image.img

You probably won’t be able to use your distribution’s kernel for this, since the pieces an image-based initrd needs (RAM disk support in particular, along with support for the file system in the image, ext2 here) are likely not built into it. Build a new kernel from source and make sure the CONFIG_BLK_DEV_INITRD and CONFIG_BLK_DEV_RAM options are set to y in the .config file.

As in previous sections, we will be using qemu to run the kernel and initrd. qemu has an -initrd option that we can use:

qemu-system-x86_64 -kernel /path/to/bzImage \
                   -initrd image.img.gz \
                   -enable-kvm \
                   -append "console=ttyS0" \
                   -nographic

Take a look at the output and notice the reported PID: it is not 1. An initrd is not an init.

A real initrd would prepare whatever is needed to access the real root file system. When /linuxrc exits, the kernel mounts the root device specified with the root= parameter as the real root file system and proceeds to run init from it (/sbin/init, for example).

Let’s try this. Update the C program above like this:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <string.h>

int
main(int argc, char *argv[])
{
  printf("Init\n");
  printf("----------\n");
  printf("PID=%d UID=%d argv[0]=%s\n", getpid(), getuid(), argv[0]);
  printf("----------\n");
  if (strcmp(argv[0], "linuxrc") == 0) {
    /* running as initrd */
    return 0;
  } else {
    for(;;);
  }
}

We are going to use this as both linuxrc and init. Recompile as before, and update the image like this:

gunzip image.img.gz
sudo mount image.img /mnt
sudo cp linuxrc /mnt
sudo mkdir /mnt/sbin
sudo cp linuxrc /mnt/sbin/init
sudo umount /mnt

We won’t be compressing the image this time, since qemu does not accept a compressed image as an argument to -hda. Run qemu like this:

qemu-system-x86_64 -kernel /path/to/bzImage \
                   -initrd image.img \
                   -enable-kvm \
                   -hda image.img \
                   -append "console=ttyS0 root=/dev/sda" \
                   -nographic

Again, our linuxrc does not mount anything itself. When it exits, the kernel mounts the root device given with root= (/dev/sda, which we pointed at the same image with -hda) and runs /sbin/init from it. In the output, you will see two invocations of our program, one as linuxrc, the other as /sbin/init.

initramfs

Originally, initramfs was supposed to be an archive embedded into the Linux kernel itself. This archive is extracted into the initial root file system, and a /init file inside it is executed as init (i.e. with PID 1).

What this incarnation of early init does is slightly different from initrd. First of all, initramfs cannot be unmounted, and pivot_root cannot be used on it. Instead, at the end of its work it usually deletes all of its contents (to free the memory they occupy), chroots into the real root file system and invokes the real init using one of the exec system calls. klibc and busybox each have a utility (called run-init and switch_root respectively) that helps initramfs writers with these usual tasks (deleting files, chroot and exec, among other things).
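The chroot-and-exec sequence can be sketched in C as below. This is a simplified take on what tools like switch_root and run-init do, assuming the real root has already been mounted on some directory (for example a hypothetical /newroot). Unlike those tools, it skips deleting the initramfs contents. The function name run_real_init is hypothetical. Since the initramfs root cannot be pivoted away, the sketch moves the real root's mount on top of / with MS_MOVE and then chroots into it.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

/* Hypothetical helper for the end of an initramfs /init (needs root).
 * new_root:  directory where the real root is already mounted
 * init_path: the real init inside it, e.g. "/sbin/init"
 * Only returns (with -1) if one of the steps fails. */
static int
run_real_init(const char *new_root, const char *init_path)
{
  if (chdir(new_root) == -1) {
    perror("chdir");
    return -1;
  }
  /* move the real root's mount on top of "/" */
  if (mount(".", "/", NULL, MS_MOVE, NULL) == -1) {
    perror("mount");
    return -1;
  }
  /* make the moved mount our root directory */
  if (chroot(".") == -1 || chdir("/") == -1) {
    perror("chroot");
    return -1;
  }
  /* replace ourselves (still PID 1) with the real init */
  execl(init_path, init_path, (char *)NULL);
  perror("execl");
  return -1;
}
```

Because exec keeps the PID, the real init inherits PID 1 from the early init.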

In order to try this, we’ll revert to the original version of our simple early init:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int
main(int argc, char *argv[])
{
  printf("Early Init\n");
  printf("----------\n");
  printf("PID=%d UID=%d\n", getpid(), getuid());
  printf("----------\n");
  for(;;);

  return 0;
}

Compile this as init:

gcc earlyinit.c -o init -static -Wl,-s

Now we have to create an archive, instead of a file system image. This is a cpio archive. The concept is very similar to the more widely used tar archive. Let’s create an archive with init as its only content:

echo init | cpio -o -H newc | gzip -9 >initramfs

The -H newc option selects the archive format the kernel expects (the “new” portable format, one of several cpio variants). Now, as we said before, initramfs is an archive embedded inside the kernel, and indeed when building a kernel you can configure an initramfs to be built into it. However, there is a simpler way: when the kernel receives a cpio archive instead of a file system image as its initrd, it uses the archive as if it were a built-in initramfs.

So in effect, you can pass the initramfs archive like an initrd to qemu. Let’s try it:

qemu-system-x86_64 -kernel /path/to/bzImage \
                   -initrd initramfs \
                   -enable-kvm \
                   -append "console=ttyS0" \
                   -nographic

You will see that this time our program is run with PID 1. It can simply do its work and exec the real init in the end.

Tools for initramfs writers

When an early init is written in C, an alternative C standard library is often used in place of glibc, which is feature-rich but very large. musl is one popular implementation of the C standard library that is used when size matters. klibc is another; although it does not implement the full standard library, it was written specifically for early userspace programs like these. Both provide wrapper scripts for building against them. For musl, you can use the musl-gcc script:

musl-gcc earlyinit.c -static -Wl,-s -o linuxrc

while for klibc you can use klcc:

klcc earlyinit.c -static -Wl,-s -o linuxrc

Both produce much smaller executables than linking against glibc does.

In many cases, the early init program is in fact a shell script, so a shell and a number of utilities are included. You can use bash and GNU coreutils for this, but again these are quite large, and most of their features are probably unnecessary for a small initramfs script.

busybox is one alternative; it includes a shell and a large number of utilities, including the previously mentioned switch_root.

klibc also comes with a number of utilities, which are more limited, and smaller, than the ones in busybox. It also includes the run-init utility, which helps with wrapping up the work in initramfs.

A note on Ubuntu’s initramfs

If you try taking a peek at the initramfs on an Ubuntu system with cpio, a few files are extracted and then you’ll receive an error message. At least, that’s how it is on Ubuntu 16.04 and 18.04 where I tried this. This is because the Ubuntu initramfs is actually two cpio archives put one after the other in a single file.

Ubuntu 18.04 comes with an unmkinitramfs utility (installed with the initramfs-tools-core package) capable of extracting the contents of this initramfs. It’s a shell script so you can take a look at it and see how it actually works.
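The reason cpio stops early is that an archive ends at its TRAILER!!! entry, so anything appended after it is never looked at. The C sketch below (function names hypothetical) walks the newc headers of the first archive in a buffer and returns the offset just past its trailer, which is where a second archive, compressed or not, would begin. It assumes the newc layout (110-byte headers, with names and data padded to four bytes); a real tool may also need to skip zero padding between the archives.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NEWC_HDR 110

/* Read the index-th 8-digit hex field that follows the 6-byte magic. */
static unsigned long
newc_field(const unsigned char *hdr, int index)
{
  char tmp[9];

  memcpy(tmp, hdr + 6 + 8 * index, 8);
  tmp[8] = '\0';
  return strtoul(tmp, NULL, 16);
}

static size_t
align4(size_t n)
{
  return (n + 3) & ~(size_t)3;
}

/* Hypothetical helper: offset just past the TRAILER!!! entry of the newc
 * archive at the start of buf, or 0 if no complete archive is found. */
static size_t
newc_archive_end(const unsigned char *buf, size_t len)
{
  size_t off = 0;

  while (off + NEWC_HDR <= len && memcmp(buf + off, "070701", 6) == 0) {
    unsigned long filesize = newc_field(buf + off, 6);  /* c_filesize */
    unsigned long namesize = newc_field(buf + off, 11); /* c_namesize */
    size_t name = off + NEWC_HDR;
    size_t data = align4(name + namesize);

    if (namesize == 0 || name + namesize > len)
      return 0;
    if (namesize == 11 && memcmp(buf + name, "TRAILER!!!", 11) == 0) {
      size_t end = align4(data + filesize);
      return end < len ? end : len;
    }
    if (data + filesize > len)
      return 0;
    off = align4(data + filesize);   /* next entry */
  }
  return 0;
}
```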

Wrapping up

supermin contains a small initramfs program that can be very educational to look at. Just get the source code and open the init/init.c file.

I also found the following links very informative:

As I said in the beginning, we are not going to use either initrd or initramfs in our distro-to-be. So we’ll just carry on.

LRFS Part 3: Init

This is part of a series of articles. You can find the first part here.

In the first part of this series we built a kernel and ran it with a very minimal (and useless) init program. We then built bash and used that as init. Let’s go back to our init program and see how we can turn it into a more proper init system.

What is init?

The first process to be started by the kernel is called init. This process always has the Process ID (PID) of 1 and has a number of special properties:

  • It should keep running until the system shuts down. If init terminates, the kernel panics.

  • All orphan processes are re-parented to init*. These are processes whose parents have terminated before them. When an orphan terminates, it becomes a zombie, and init is tasked with “reaping” it, so that its remaining resources can be freed.

  • Signals for which init has not installed a handler do not trigger their default behavior. For example, a normal process that does not handle SIGTERM is terminated by default when it receives that signal. If init receives SIGTERM and has no handler for it, however, the signal is simply ignored.

Init is the process that starts the Linux userland. Everything, from the login prompt to your shell or your desktop environment, is started by init, directly or indirectly.

Note, however, that technically speaking, the only thing init “has to” do is reap zombie processes. Today’s init systems do a lot more, though; starting and managing services is among the most important of those tasks.
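To see what an unreaped process looks like, the following C sketch (Linux-specific, since it reads /proc; function names hypothetical) forks a child that exits immediately. It uses waitid() with WNOWAIT to wait for the exit without reaping the child, reads the process state from /proc/<pid>/stat, where a zombie shows up as Z, and then reaps it with waitpid(), after which the /proc entry disappears.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* State letter from /proc/<pid>/stat ('R', 'S', 'Z', ...), or 0 if the
 * process no longer exists. */
static char
proc_state(pid_t pid)
{
  char path[64], line[512];
  char *paren;
  FILE *f;

  snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
  f = fopen(path, "r");
  if (!f)
    return 0;
  paren = fgets(line, sizeof(line), f) ? strrchr(line, ')') : NULL;
  fclose(f);
  /* the line starts with "pid (comm) state"; comm may itself contain
     spaces or parentheses, hence the search for the last ')' */
  return (paren && paren[1] == ' ') ? paren[2] : 0;
}

/* Hypothetical demo: report a child's state before and after reaping. */
static void
zombie_demo(char *before, char *after)
{
  siginfo_t info;
  pid_t child;

  *before = *after = 0;
  child = fork();
  if (child < 0)
    return;
  if (child == 0)
    _exit(0);                        /* child: exit right away */

  /* wait for the child to exit, but leave it unreaped */
  if (waitid(P_PID, child, &info, WEXITED | WNOWAIT) == -1)
    return;
  *before = proc_state(child);       /* a zombie: 'Z' */

  waitpid(child, NULL, 0);           /* reap it */
  *after = proc_state(child);        /* entry gone: 0 */
}
```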

Our init system, called hello, does far less. For now, it is going to:

  • Run a startup script.
  • Start a shell.
  • Keep running and reap zombie processes.

Let’s do it then.

The code

Here is our expanded init system, in all its glory:

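#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <stdio.h>

static void
handle_sigchld(int sig) {
  int saved_errno = errno;

  /* reap orphaned children, passed away. rip. */
  while (waitpid((pid_t)(-1), 0, WNOHANG) > 0) {}

  errno = saved_errno;
}

static int
run_program(const char *path)
{
  pid_t child;
  siginfo_t info;
  sigset_t block, old;
  int ret;

  /* Block SIGCHLD while we wait, so that the handler cannot reap this
     child before waitid() collects its exit status. */
  sigemptyset(&block);
  sigaddset(&block, SIGCHLD);
  sigprocmask(SIG_BLOCK, &block, &old);

  child = fork();
  if (child < 0) {
    printf("Could not fork: %s\n", strerror(errno));
    sigprocmask(SIG_SETMASK, &old, NULL);
    return -1;
  } else if (child > 0) {
    do {
      ret = waitid(P_PID, child, &info, WEXITED);
    } while (ret == -1 && errno == EINTR);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return ret == -1 ? -1 : info.si_status;
  } else {
    /* the child must not keep SIGCHLD blocked */
    sigprocmask(SIG_SETMASK, &old, NULL);
    execl(path, path, (char *)NULL);
    printf("Could not run: %s\n", path);
    printf("    %s\n", strerror(errno));
    exit(255);
  }
}

static void
launch_login(void)
{
  pid_t child = fork();

  if (child == 0) {
    execl("/bin/bash", "/bin/bash", (char *)NULL);
    /* never fall through into init's own code if exec fails */
    printf("Could not run /bin/bash: %s\n", strerror(errno));
    exit(255);
  } else if (child < 0) {
    printf("Could not fork: %s\n", strerror(errno));
  }
}

int
main(int argc, char *argv[])
{
  struct sigaction sa;

  /* register sigchld handler */
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = &handle_sigchld;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
  if (sigaction(SIGCHLD, &sa, 0) == -1) {
    printf("Could not install signal handler. Aborting.\n");
    return 1;
  }

  printf("Hello, World!\n");
  printf("This is hello, your friendly init system!\n");

  printf("Attempting to run your rc.local...\n");
  run_program("/etc/rc.local");

  printf("Launching your shell...\n");
  launch_login();

  for (;;) {
    pause();  /* sleep until a signal (such as SIGCHLD) arrives */
  }

  return 0;
}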

As you can see, we start by adding a SIGCHLD handler. SIGCHLD is sent to a process when one of its children terminates, stops, or continues. With SA_NOCLDSTOP, we ask not to be notified when a child stops or continues, so we only hear about children that have exited.

We then display a friendly startup message and run the script located at /etc/rc.local. After that, we launch bash as the shell and go to sleep. From here on, the only thing init does is handle SIGCHLD and wait on child processes so that the kernel can free their resources.
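The reaping pattern in handle_sigchld can be tried outside of PID 1. The C sketch below installs a handler of the same shape (with a counter added, which hello does not have) and uses sigsuspend() to sleep until every child has been reaped. Keeping SIGCHLD blocked outside of sigsuspend() avoids missing a signal that arrives between checking the counter and going to sleep. Since several exits can be merged into a single SIGCHLD, the handler loops on waitpid() with WNOHANG instead of assuming one child per signal. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped;

/* Same shape as hello's handle_sigchld, plus a counter. */
static void
counting_sigchld(int sig)
{
  int saved_errno = errno;

  (void)sig;
  while (waitpid((pid_t)-1, NULL, WNOHANG) > 0)
    reaped++;
  errno = saved_errno;
}

/* Hypothetical demo: start n children that exit immediately and return
 * how many the handler reaped (-1 if it could not be installed). */
static int
reap_with_sigchld(int n)
{
  struct sigaction sa;
  sigset_t block, old, wait_mask;
  int started = 0;

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = counting_sigchld;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
  if (sigaction(SIGCHLD, &sa, NULL) == -1)
    return -1;

  /* keep SIGCHLD blocked except while sleeping in sigsuspend() */
  sigemptyset(&block);
  sigaddset(&block, SIGCHLD);
  sigprocmask(SIG_BLOCK, &block, &old);
  wait_mask = old;
  sigdelset(&wait_mask, SIGCHLD);

  reaped = 0;
  while (started < n) {
    pid_t pid = fork();
    if (pid == 0)
      _exit(0);
    if (pid < 0)
      break;
    started++;
  }
  while (reaped < started)
    sigsuspend(&wait_mask);   /* atomically unblock SIGCHLD and wait */

  sigprocmask(SIG_SETMASK, &old, NULL);
  return reaped;
}
```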

Try it

As before, rebuild the package and add the contents to the image file and then run the result in qemu:

qemu-system-x86_64 -kernel /path/to/bzImage \
              -append "root=/dev/sda init=/bin/bash console=ttyS0" \
              -hda /path/to/image.img \
              -enable-kvm \
              -nographic \
              -serial mon:stdio

You can use the tools in the /tools directory for building the package and creating the rootfs.

The source

You can find the source code for hello and all the other tools and packages talked about in this series here on Github.

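* Technically that is not always correct. Since Linux 3.4, a process can mark itself as a “child subreaper” with prctl(PR_SET_CHILD_SUBREAPER), and orphaned descendants are then re-parented to the nearest living subreaper ancestor instead. In the absence of subreapers, though, orphans are re-parented to init.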
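The footnote can be checked with a short C sketch (Linux-specific; the function name is hypothetical): the process marks itself as a subreaper with prctl(PR_SET_CHILD_SUBREAPER), then creates a grandchild whose parent exits. The grandchild waits until it has been re-parented and reports its new parent through a pipe.

```c
#define _GNU_SOURCE
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if an orphaned grandchild was re-parented to this process,
 * 0 if it went elsewhere, -1 on error. */
static int
orphan_goes_to_subreaper(void)
{
  int fds[2];
  pid_t me = getpid(), middle, seen = 0;

  if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) == -1 || pipe(fds) == -1)
    return -1;

  middle = fork();
  if (middle < 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  if (middle == 0) {
    pid_t parent = getpid();           /* the middle process */
    if (fork() == 0) {
      /* grandchild: wait until the middle process is gone */
      pid_t now;
      while ((now = getppid()) == parent)
        usleep(1000);
      if (write(fds[1], &now, sizeof(now)) != (ssize_t)sizeof(now))
        _exit(1);
      _exit(0);
    }
    _exit(0);                          /* orphan the grandchild */
  }

  close(fds[1]);
  waitpid(middle, NULL, 0);
  if (read(fds[0], &seen, sizeof(seen)) != (ssize_t)sizeof(seen))
    seen = 0;
  close(fds[0]);
  waitpid(-1, NULL, 0);                /* the grandchild is now our child */
  prctl(PR_SET_CHILD_SUBREAPER, 0, 0, 0, 0);
  return seen == me;
}
```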

LRFS Part 2: Adding a shell and packages

This is part of a series of articles. You can find the first part here.

In the first part of this series we built a kernel and ran it with a very minimal (and useless) init program. That’s not very useful. Let’s add a shell.

But before that, let’s backtrack a bit. Remember we briefly talked about repeatability in the last post. Let’s get back to that and later see what it has to do with us wanting to add a shell to our distro.

Repeatability

So what’s repeatability and why is it important? Repeatability is the quality of a process that assures us we arrive at the same results every time we follow it, whether we do it now or ten years from now.

How are we supposed to achieve that? By keeping a record of every little thing that has a meaningful impact on our results. This includes configuration, build options, patches, environment, etc.

I am going to put everything in a git repository. Every piece (which we are going to call a package) will be in its own sub-directory. In that sub-directory, we’ll have a file named pkg.json which describes the package, and a Makefile that contains build and installation instructions. The pkg.json file will look something like this:

{
    "version": "1.0.0",
    "source": {
        "type": "local"
    }
}

This tells the build script what the current package version is and how to obtain the source code (local in this case, since the code is included right there in the package directory).

The Makefile should contain at least the following targets: all, which builds the package, and install, which installs the package files under the path given in the INSTDIR environment variable.

But this is not all that repeatable builds need. Another factor that might affect the build over longer periods of time is the set of tools in use. For example, a new version of gcc might add a warning that breaks a build in which all warnings are treated as errors.

However, having tools like compilers and linkers be part of each package sounds a bit impractical to me. A better approach could be to use the same version of the tools for all packages at any given point in time. To do so, we’ll simply describe the build environment for all packages in the root of the pkgs directory, using a build.json file like this:

{
    "env": {
        "name": "ubuntu",
        "version": "18.04"
    }
}

Given that development tools generally see no breaking changes within a single Ubuntu release, this should work for now.

The directory tree looks like this at the moment:

/
/pkgs/
/pkgs/build.json
/pkgs/kernel/
/pkgs/kernel/pkg.json
/pkgs/kernel/Makefile
/pkgs/kernel/config
/pkgs/hello/
/pkgs/hello/pkg.json
/pkgs/hello/Makefile
/pkgs/hello/hello.c
/tools/
/tools/build
/tools/build-rootfs
/tools/run
/README.md

As you can see, we have two packages right now, kernel and hello, both of which are in the pkgs directory. We’ll also have a top-level tools directory, which contains a script for building the packages, a script for building a root file system from a list of packages, and a script to run everything in qemu.

The build script

The build script (aptly named build), located in the /tools directory, builds one or more packages, according to the instructions in the pkg.json file and the Makefile. Apart from “local” source code, it also supports downloading a source tarball or obtaining it from a git repository.

The build script also supports applying one or more patches to the code before building it. For each package, a .tar.xz file is created which contains all the files needed for installation.

The script needs a number of tools to run:

  • jq: A versatile, command-line JSON parser.
  • awk: For text processing.
  • lsb_release: For getting information about the build environment.
  • fakeroot: Runs another program in an environment with fake root privileges for file manipulation. Install scripts sometimes need to change a file’s owner or group, which generally requires root, but we do not want our build system to run as root, so we use fakeroot when creating the package. Extracting the packages, however, will obviously need real root permissions.

In order to build the hello package, for example, go to the tools directory and run ./build hello. When the script finishes, you should have a hello-1.0.0.tar.xz package in your current directory.

And a shell

Here we are at last. We are going to build bash as our shell. This is the pkg.json file to use:

{
    "version": "4.4.23",
    "source": {
        "type": "dl",
        "location": "https://ftp.gnu.org/gnu/bash/bash-4.4.tar.gz",
        "inner_dir": "bash-4.4"
    },
    "patches": {
        "options": "-p0",
        "apply_dir": ".",
        "files": [
            "bash44-001",
            "bash44-002",
            "bash44-003",
            "bash44-004",
            "bash44-005",
            "bash44-006",
            "bash44-007",
            "bash44-008",
            "bash44-009",
            "bash44-010",
            "bash44-011",
            "bash44-012",
            "bash44-013",
            "bash44-014",
            "bash44-015",
            "bash44-016",
            "bash44-017",
            "bash44-018",
            "bash44-019",
            "bash44-020",
            "bash44-021",
            "bash44-022",
            "bash44-023"
        ]
    }
}

This is a bit more complicated than the one we saw before. Let’s see what it does. First, we are saying this is the 4.4.23 release of bash, which happens to be the latest release at the moment. You can find all bash releases here.

After that, we determine how the source code is to be obtained. The value dl for type means that a tarball is going to be downloaded. The location field determines the download address and the inner_dir field tells the build system where in the tarball the source code resides.

We then have a list of patches, twenty-three of them, that need to be applied to the 4.4 release, so that we arrive at 4.4.23 (all of these are downloaded from the bash release page mentioned before).
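Since the patch files follow a fixed naming scheme (bash44-001 through bash44-023 for the 4.4 release), the files list could be generated rather than typed by hand. Here is a tiny C sketch of that, with hypothetical function names; nothing in the build script described here does this.

```c
#include <stdio.h>

/* Hypothetical helper: name of the n-th official patch for a bash
 * release, e.g. release "44" and n = 23 gives "bash44-023". */
static int
bash_patch_name(char *buf, size_t size, const char *release, int n)
{
  return snprintf(buf, size, "bash%s-%03d", release, n);
}

/* Print the "files" entries for patch levels 1..count as JSON strings. */
static void
print_patch_list(const char *release, int count)
{
  char name[32];
  int i;

  for (i = 1; i <= count; i++) {
    bash_patch_name(name, sizeof(name), release, i);
    printf("\"%s\"%s\n", name, i < count ? "," : "");
  }
}
```

Calling print_patch_list("44", 23) prints the same 23 entries as the files array above.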

Then there’s the Makefile:

all:
	cd ../_src_ && ./configure --prefix=/usr --exec-prefix= --enable-static-link && $(MAKE)

install:
	$(MAKE) -C ../_src_ install DESTDIR=${INSTDIR}

.PHONY: all install


As you can see the all target configures and builds the code, while the install target actually installs the files. There are a few points to explain:

  • For building every package, a temporary directory is created, and the package directory is copied into it and renamed to pkg. The source code, if obtained from an external source, is fetched and put in a _src_ directory inside the temporary directory.

  • We configure bash so that it is linked statically, since we don’t have glibc or any of the other needed shared libraries yet.

  • We use the $(MAKE) special variable instead of running make directly. This way, some of the properties of the parent make are communicated to the sub-makes (like the -j argument, so that the right number of parallel jobs are used).

In order to actually run the built shell, you need to add the contents of the package file (the one built by the build script) to the image file we created before. After that, run the VM like this:

qemu-system-x86_64 -kernel /path/to/bzImage \
              -append "root=/dev/sda init=/bin/bash console=ttyS0" \
              -hda /path/to/image.img \
              -enable-kvm \
              -nographic \
              -serial mon:stdio

Note the init parameter passed to the kernel. Here, we are telling the kernel to use bash as the init program.

The source

Everything I’ve talked about in this article can be found in this repository on Github.