26 Apr 2023
I’m moving away from GitHub services. As such, I’m also moving my website away from
GitHub Pages. The website is still available on my own domain name,
elektito.com.
The new home for my source code will be sourcehut. My website is also
available as elektito.srht.site using their sourcehut pages service.
06 Apr 2023
I’ve created a Gemini capsule at this same domain. If you have a Gemini browser,
you can visit my capsule at gemini://elektito.com.
Not sure if that means I will not be writing here anymore. Probably not. But I’m
definitely more active there at the moment.
What’s Gemini?
If you’re not familiar with it, Gemini is a new Internet protocol that is
heavier than Gopher and (a lot) lighter than the web. It’s a small space right
now, with no more than a few thousand capsules (which is what we call what would
be a website on the web, or a gopherhole on Gopher).
It’s part of a recent movement of the small (or smol!) Internet, which strives
for a slower and more human-scale Internet. A counterculture, you could say, to
what the web has become. It’s a lot like what the web was like back in the early
’90s.
The project name, unfortunately, is rather ungooglable. Try searching for Gemini,
and you’ll mostly find stuff about the constellation, astrology, a crypto/scam
exchange, or the NASA project if you’re lucky. Here’s the, more or less,
official website for Project Gemini.
31 Dec 2018
This is part of a series of articles. You can find the first part
here.
Although init is considered the beginning of the Linux userspace,
this is not technically the case. There are other facilities that are
part of the userspace and run even before init. Using these is not
mandatory, and as it happens we are not going to use them in our
distro (not now, at least), but I thought it might be educational to
examine them here.
Why?
So why do we need something to run before init? Here are a few cases
in which this might be necessary:
-
Mounting the root file system might need drivers that are not built
into the kernel, but are instead built as kernel modules and reside
on the very file system we are going to mount. It is, of course,
possible to build these into the kernel, but we might want to keep
the kernel from becoming too large. This was especially the case
in the past, when RAM used to be more limited than it is now.
-
The root file system might be encrypted and might need to be
decrypted before being mounted.
-
The system might be in hibernation and need special treatment
before waking up.
The kernel provides two facilities for running a small userspace
before the actual init: these are called initrd and initramfs, the
latter being a more recent addition to the kernel, although both have
been there for quite some time now.
The names initrd and initramfs, although referring to distinct
facilities, are frequently used interchangeably.
-
initrd is a file system image that is mounted as root. It usually
contains an executable named /linuxrc that is run after the image
is mounted. After performing any necessary preparations and
mounting the real root file system in a temporary location, this
program then uses the pivot_root system call to switch to the new
root and then unmounts initrd.
-
initramfs is a file archive that becomes the root file system. An
executable called /init on this archive is then run by the
kernel, effectively becoming init (that is, PID 1). This can
continue as an init, or later mount a new root and exec the real
init.
In the next two sections, we will examine each of these carefully and
see some examples.
initrd
We will be using the following C program as the “early init”:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int
main(int argc, char *argv[])
{
    printf("Early Init\n");
    printf("----------\n");
    printf("PID=%d UID=%d\n", getpid(), getuid());
    printf("----------\n");

    /* never return; just spin so that we can read the output */
    for(;;);

    return 0;
}
Save this as earlyinit.c and compile it statically:
gcc earlyinit.c -static -Wl,-s -o linuxrc
Now create a file system image and copy linuxrc to it:
dd if=/dev/zero of=image.img bs=20M count=1
mke2fs image.img
sudo mount image.img /mnt
sudo cp linuxrc /mnt
sudo umount /mnt
You can also compress this file with gzip, which produces the
image.img.gz file used in the qemu command below:
gzip image.img
You probably won’t be able to use your distribution’s kernel for this
as support for initrd is probably not built into the kernel. Build a
new kernel from source and make sure the CONFIG_BLK_DEV_INITRD and
CONFIG_BLK_DEV_RAM options are set to y in the .config file.
Like in previous sections, we will be using qemu to run the kernel
and initrd. qemu has a -initrd option that we can use:
qemu-system-x86_64 -kernel /path/to/bzImage \
-initrd image.img.gz \
-enable-kvm \
-append "console=ttyS0" \
-nographic
Take a look at the output. Notice the PID reported in the output. It
is not 1. An initrd is not an init.
A real initrd would mount the root file system as part of its
work. When initrd returns, the kernel assumes that root is already
mounted and so proceeds to running init (from /sbin/init, for
example).
Let’s try this. Update the C program above like this:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <string.h>

int
main(int argc, char *argv[])
{
    printf("Init\n");
    printf("----------\n");
    printf("PID=%d UID=%d argv[0]=%s\n", getpid(), getuid(), argv[0]);
    printf("----------\n");

    if (strcmp(argv[0], "linuxrc") == 0) {
        /* running as initrd */
        return 0;
    } else {
        /* running as init; never return */
        for(;;);
    }
}
We are going to use this as both linuxrc and init. Recompile as
before and update the image like this:
gunzip image.img.gz
sudo mount image.img /mnt
sudo cp linuxrc /mnt
sudo mkdir /mnt/sbin
sudo cp linuxrc /mnt/sbin/init
sudo umount /mnt
We won’t be compressing the image this time, since qemu does not
accept a compressed image as an argument to -hda. Run qemu like
this:
qemu-system-x86_64 -kernel /path/to/bzImage \
-initrd image.img \
-enable-kvm \
-hda image.img \
-append "console=ttyS0 root=/dev/sda" \
-nographic
Again, we are not actually mounting the real root file system here. So
in this case, when initrd returns, the kernel runs /sbin/init from
the already mounted RAM disk (i.e. initrd itself). In the output, you
will see two invocations of our program, one as linuxrc, the other
as /sbin/init.
initramfs
Originally, initramfs was supposed to be an archive embedded into
the Linux kernel itself. This archive is mounted as root and a /init
file inside it is executed as init (i.e. with PID 1).
What this incarnation of early init does is slightly different from
initrd. First of all, initramfs cannot be unmounted. Instead, it usually
deletes all of its contents in the end, chroots into the real
root file system and invokes init using one of the exec system
calls. pivot_root cannot be used in initramfs. klibc and busybox
each have a utility (called run-init and switch_root respectively)
that helps initramfs writers with the usual tasks (deleting files,
chroot and exec, among other things).
In order to try this, we’ll revert to the original version of our
simple early init:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int
main(int argc, char *argv[])
{
    printf("Early Init\n");
    printf("----------\n");
    printf("PID=%d UID=%d\n", getpid(), getuid());
    printf("----------\n");

    /* we are PID 1 this time, so we must never return */
    for(;;);

    return 0;
}
Compile this as init:
gcc earlyinit.c -o init -static -Wl,-s
Now we have to create an archive, instead of a file system image. This
is a cpio archive. The concept is very similar to the more widely used
tar archive. Let’s create an archive with init as its only content:
echo init | cpio -o -H newc | gzip -9 >initramfs
The -H flag specifies the variant of cpio that the kernel uses. Now,
as we said before, initramfs is an archive that is embedded inside the
kernel, and indeed when building a kernel you can configure an
initramfs to be built into it. However, there is a simpler
way of doing this: when the kernel receives a cpio archive instead
of a file system image for its initrd parameter, it uses the archive
as if it were a built-in initramfs archive.
So in effect, you can pass the initramfs archive like an initrd to
qemu. Let’s try it:
qemu-system-x86_64 -kernel /path/to/bzImage \
-initrd initramfs \
-enable-kvm \
-append "console=ttyS0" \
-nographic
You will see that this time our program is run with PID 1. It can
simply do its work and exec the real init in the end.
If writing an initramfs in C, in many cases, alternative C standard
libraries are used in place of glibc which is feature-rich and very
large. musl is one popular implementation of the C standard library
that is used when size is important. klibc is another which, although
not implementing the full extent of the standard library, has been
specifically written for writing an early init. Both provide wrapper
scripts for building against them. For musl, you can use the
musl-gcc script:
musl-gcc earlyinit.c -static -Wl,-s -o linuxrc
while for klibc you can use klcc:
klcc earlyinit.c -static -Wl,-s -o linuxrc
Both produce much smaller executables than linking against glibc does.
In many cases, the early init program is in fact written as a shell
script, so a shell and a number of utilities are included. You can use
bash and GNU coreutils for this, but again these are quite large and
most of their features are probably not necessary for the small
initramfs script.
busybox is one alternative, which includes a shell and a large
number of utilities, including the previously-mentioned switch_root.
klibc also comes with a number of utilities, which are more limited,
and smaller, than the ones with busybox. It also includes the
run-init utility which helps with wrapping up the work in initramfs.
A note on Ubuntu’s initramfs
If you try taking a peek at the initramfs on an Ubuntu system with
cpio, a few files are extracted and then you’ll receive an error
message. At least, that’s how it is on Ubuntu 16.04 and 18.04 where I
tried this. This is because the Ubuntu initramfs is actually two cpio
archives put one after the other in a single file.
Ubuntu 18.04 comes with an unmkinitramfs utility (installed with the
initramfs-tools-core package) capable of extracting the contents of
this initramfs. It’s a shell script so you can take a look at it and
see how it actually works.
Wrapping up
supermin contains a small initramfs program that can be very
educational to look at. Just get the source code and open the
init/init.c file.
I also found the following links very informative:
As I said in the beginning, we are not going to use either initrd or
initramfs in our distro-to-be. So we’ll just carry on.
21 Dec 2018
This is part of a series of articles. You can find the first part
here.
In the first part of this series we built a kernel and ran it
with a very minimal (and useless) init program. We then built bash
and used that as init. Let’s go back to our init program and see how
we can make a more proper init system.
What is init?
The first process to be started by the kernel is called init. This
process always has the Process ID (PID) of 1 and has a number of
special properties:
-
It should keep running up until the system shuts down. If init is
terminated, the kernel will panic.
-
All orphan processes are re-parented to init*. These are
the processes whose parents have terminated before them. The
orphans, when terminated, become zombies. Init is tasked with “reaping”
these processes, so that their resources can be freed.
-
Signals without a signal handler do not have any default behavior
for init. As an example, a process that does not handle SIGTERM
is terminated by default if it receives that signal. If init,
however, receives SIGTERM and has no signal handler for it, the
signal is just ignored.
Init is the process that starts the Linux userland. Everything from
the login prompt to your shell or your desktop environment is directly
or indirectly started by init.
Note however, that technically speaking, the only thing that init “has
to” do is reaping zombie processes. Today’s init systems though do a
lot more. Starting and managing services is among the most important
of those.
Our init system, called hello, does little though. For now, it is
going to:
- Run a startup script.
- Start a shell.
- Keep running and reap zombie processes.
Let’s do it then.
The code
Here is our expanded init system, in all its glory:
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <stdio.h>

static void
handle_sigchld(int sig)
{
    int saved_errno = errno;

    (void)sig;

    /* reap orphaned children, passed away. rip. */
    while (waitpid((pid_t)(-1), 0, WNOHANG) > 0) {}

    errno = saved_errno;
}

static int
run_program(const char *path)
{
    pid_t child;
    siginfo_t info;

    child = fork();
    if (child < 0) {
        printf("Could not fork: %s\n", strerror(errno));
        return -1;
    }

    if (child) {
        /* the SIGCHLD handler might reap the child before we do, in
           which case waitid fails and there is no status to return */
        if (waitid(P_PID, child, &info, WEXITED) == -1)
            return -1;
        return info.si_status;
    } else {
        execl(path, path, NULL);
        printf("Could not run: %s\n", path);
        printf(" %s\n", strerror(errno));
        exit(255);
    }
}

static void
launch_login(void)
{
    if (!fork()) {
        execl("/bin/bash", "/bin/bash", NULL);
        /* only reached if exec failed; do not fall back into init's
           main loop as a second copy of init */
        exit(255);
    }
}

int
main(int argc, char *argv[])
{
    struct sigaction sa;

    /* register sigchld handler */
    sa.sa_handler = &handle_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &sa, 0) == -1) {
        printf("Could not install signal handler. Aborting.\n");
        return 1;
    }

    printf("Hello, World!\n");
    printf("This is hello, your friendly init system!\n");

    printf("Attempting to run your rc.local...\n");
    run_program("/etc/rc.local");

    printf("Launching your shell...\n");
    launch_login();

    /* sleep forever; we only wake up to handle SIGCHLD */
    for (;;) {
        pause();
    }

    return 0;
}
As you can see, we start by adding a SIGCHLD handler. SIGCHLD can be
sent to a process whenever something interesting happens to its
children. Here we explicitly ask to only be informed when one of the
children has exited, and not when one is stopped or resumed
(SA_NOCLDSTOP).
We then display a friendly startup message and run the script
located at /etc/rc.local. We then launch bash as the shell and go
to sleep. From here on, the only thing init does is handle
SIGCHLD and wait on the child processes so that their resources can
be freed by the kernel.
Try it
As before, rebuild the package and add the contents to the image file
and then run the result in qemu:
qemu-system-x86_64 -kernel /path/to/bzImage \
-append "root=/dev/sda init=/bin/bash console=ttyS0" \
-hda /path/to/image.img \
-enable-kvm \
-nographic \
-serial mon:stdio
You can use the tools in the /tools directory for building the
package and creating the rootfs.
The source
You can find the source code for hello and all the other tools and
packages talked about in this series here on GitHub.
* Technically that is not always correct. In more recent kernel
versions (Linux 3.4 and later), there can be “sub-reapers” that an
orphan might be re-parented to. In the absence of sub-reapers though,
orphans are re-parented to init.
20 Dec 2018
This is part of a series of articles. You can find the first part
here.
In the first part of this series we built a kernel and ran it
with a very minimal (and useless) init program. That’s not very
useful. Let’s add a shell.
But before that, let’s backtrack a bit. Remember we briefly talked
about repeatability in the last post. Let’s get back to that and later
see what it has to do with us wanting to add a shell to our distro.
Repeatability
So what’s repeatability and why is it important? Repeatability is the
quality of a process that assures us we arrive at the same
results every time we follow it, whether we do it now or ten years
later.
How are we supposed to do that? By documenting a record of every
little thing that has a meaningful impact on our results. This
includes configuration, build options, patches, environment, etc.
I am going to put everything in a git repository. Every piece (which
we are going to call a package) will be in its own sub-directory. In
that sub-directory, we’ll have a file named pkg.json which describes
the package, and a Makefile that contains build and installation
instructions. The pkg.json file will look something like this:
{
    "version": "1.0.0",
    "source": {
        "type": "local"
    }
}
This tells the build script what the current package version number
is and how to obtain the source code (local in this case, since the code
is included right there in the package directory).
The Makefile should contain at least the following targets: all,
which will be used for building the package, and install, which will
install the package files to a path determined by the INSTDIR
environment variable.
But this is not all that is needed for repeatable builds. Another
factor that might affect the build over longer periods of time is the
set of tools in use. It might so happen, for example, that a warning added in
a new version of gcc breaks a build in which all warnings are
considered errors.
However, it sounds a bit impractical to me to have tools like
compilers and linkers as part of the package. A better approach could
be to use the same version of tools for all the packages at every
point in time. In order to do so, we’ll simply describe the build
environment for all packages in the root of the pkgs directory. We’ll
use a build.json file like this:
{
    "env": {
        "name": "ubuntu",
        "version": "18.04"
    }
}
Given that there are generally no breaking changes in development
tools in a single version of Ubuntu, this should work for now.
The directory tree looks like this at the moment:
/
/pkgs/
/pkgs/build.json
/pkgs/kernel/
/pkgs/kernel/pkg.json
/pkgs/kernel/Makefile
/pkgs/kernel/config
/pkgs/hello/
/pkgs/hello/pkg.json
/pkgs/hello/Makefile
/pkgs/hello/hello.c
/tools/
/tools/build
/tools/build-rootfs
/tools/run
/README.md
As you can see, we have two packages right now, kernel and hello,
both of which are in the pkgs directory. We’ll also have a top-level
tools directory, which contains a script for building the packages,
a script for building a root file system from a list of packages, and
a script to run everything in qemu.
The build script
The build script (aptly named build), located in the /tools
directory, builds one or more packages, according to the instructions
in the pkg.json file and the Makefile. Apart from “local” source
code, it also supports downloading a source tarball or obtaining it
from a git repository.
The build script also supports applying one or more patches to the
code before building it. For each package, a .tar.xz file is created
which contains all the files needed for installation.
The script needs a number of tools to run:
- jq: A versatile, command-line JSON parser.
- awk: For text processing.
- lsb_release: For getting information about the build environment.
- fakeroot: Runs another program in an environment with fake root
privileges for file manipulation. This is needed because sometimes
install scripts need to change a file’s group, something that only
root can do, but we do not want our build system to run as root,
hence we use fakeroot for creating the package. Extracting the
packages, however, will obviously need root permissions.
In order to build the hello package, for example, go to the tools
directory and run ./build hello. When the script is done running,
you should have a hello-1.0.0.tar.xz package in your current
directory.
And a shell
Here we are at last. We are going to build bash as our shell. This
is the pkg.json file to use:
{
    "version": "4.4.23",
    "source": {
        "type": "dl",
        "location": "https://ftp.gnu.org/gnu/bash/bash-4.4.tar.gz",
        "inner_dir": "bash-4.4"
    },
    "patches": {
        "options": "-p0",
        "apply_dir": ".",
        "files": [
            "bash44-001",
            "bash44-002",
            "bash44-003",
            "bash44-004",
            "bash44-005",
            "bash44-006",
            "bash44-007",
            "bash44-008",
            "bash44-009",
            "bash44-010",
            "bash44-011",
            "bash44-012",
            "bash44-013",
            "bash44-014",
            "bash44-015",
            "bash44-016",
            "bash44-017",
            "bash44-018",
            "bash44-019",
            "bash44-020",
            "bash44-021",
            "bash44-022",
            "bash44-023"
        ]
    }
}
This is a bit more complicated than the one we saw before. Let’s see
what it does. First, we are saying this is the 4.4.23 release of
bash, which happens to be the latest release at the moment. You can
find all bash releases here.
After that, we determine how the source code is to be obtained. The
value dl for type means that a tarball is going to be
downloaded. The location field determines the download address and
the inner_dir field tells the build system where in the tarball the
source code resides.
We then have a list of patches, twenty-three of them, that need to be
applied to the 4.4 release, so that we arrive at 4.4.23 (all of
these are downloaded from the bash release page mentioned
before).
Then there’s the Makefile:
all:
	cd ../_src_ && ./configure --prefix=/usr --exec-prefix= --enable-static-link && $(MAKE)

install:
	$(MAKE) -C ../_src_ install DESTDIR=${INSTDIR}

.PHONY: all install
(Yes, my code coloring tool messes up the colors for this make
snippet. Ugly, but can’t do nuffing about that right now!)
As you can see, the all target configures and builds the code, while
the install target actually installs the files. There are a few
points to explain:
-
For building every package, a temporary directory is created and the
package directory is copied there and renamed to pkg. The source
code, if obtained from an external source, is fetched and put in an
_src_ directory inside the temporary directory.
-
We configure bash so that it is linked statically. We still don’t
have glibc or any of the other shared libraries needed.
-
We use the $(MAKE) special variable instead of running make
directly. This way, some of the properties of the parent make are
communicated to the sub-makes (like the -j argument, so that the
right number of parallel jobs is used).
In order to actually run the built shell, you need to add the contents
of the package file (the one built by the build script) to the image
file we created before. After that, run the VM like this:
qemu-system-x86_64 -kernel /path/to/bzImage \
-append "root=/dev/sda init=/bin/bash console=ttyS0" \
-hda /path/to/image.img \
-enable-kvm \
-nographic \
-serial mon:stdio
Note the init parameter passed to the kernel. Here, we are telling the
kernel to use bash as the init program.
The source
Everything I’ve talked about in this article can be found in this
repository on GitHub.