Linux from Scratch is one of those projects that I’ve always
been interested in but never got around to actually working
through. I think I downloaded the book a few years ago and
read a couple of chapters, but never went past that.
So for one reason or another, during the last few days I’ve been
looking at how Linux is actually booted and how the user space is
launched, and I thought maybe I could do it all by myself, without
even going through the LFS book.
So in this article, hopefully the first of a series, I’m going to
document the process of building a very basic Linux system, with all
the pieces built from scratch. This will be an iterative process in
which we do things step by step, adding a bit more complexity each time.
This is what I’m trying to achieve with this series:
- We’ll take a look at how Linux actually boots.
- We’ll see the building blocks of the user space.
- We’ll try to see what it is the distros do for us. Hopefully we’re
  going to get a lot more respect for the folks who do all that hard
  work for us!
- I’ll try to keep everything deterministic and repeatable.
My main focus will be creating an image that is run in a VM and
accessed with SSH. So no graphics, at least, not for some time. SSH
will also take a while to arrive, but I’ll try to get there as soon
as possible, since I really hate working in a console without a
proper terminal.
Where to begin?
We will start with two pieces: the Linux kernel and a tiny init
program. This will be our init:
#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("Hello, World!\n");
    printf("This is your friendly init system.\n");
    printf("Just hanging here...\n");
    for (;;);
    return 0;
}
This will simply print out a message and then loop indefinitely, since
an init is not supposed to ever exit. We need to compile it
statically, since our system doesn’t have glibc or any other shared
libraries yet. Save it as hello.c and compile it by running:
gcc hello.c -static -o hello
strip hello
The resulting executable, hello, will be our primitive init.
Then we need to build the kernel. Get the source code for the stable
branch of the kernel:
You need to fix the path to the kernel and the disk image. The kernel
should be in the arch/x86_64/boot directory after the build is
complete.
If everything is okay, you’ll see the boot messages and at the end
you’ll get the “Hello, World!” message from our “init system.”
What’s in this command?
- -kernel /path/to/bzImage: specifies the kernel image to
  use. With this option we don’t have to create a bootable disk; we
  provide the kernel to boot directly.
- -append "root=/dev/sda init=/init console=ttyS0": adds a few
  options to the kernel command line. root is the root file system
  that is mounted on /, init is the path to the init program to
  use, and console specifies the output console device (needed in
  combination with the -serial option).
- -enable-kvm: use KVM for virtualization.
- -nographic: do not open the graphical window that is used as the
  VM display by default.
- -serial mon:stdio: redirect the serial port to stdio. This, in
  combination with the console parameter passed to the kernel,
  causes kernel output to be displayed on the current terminal.
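To see how the options above fit together, here is a minimal sketch that assembles them into a single command line. It rests on a few assumptions that are not spelled out above: the emulator binary is taken to be qemu-system-x86_64, the disk image is attached with -hda (which should make it show up as /dev/sda in the guest), and qemu_command is just a hypothetical helper name. The script only prints the command; it does not launch anything.

```python
import shlex


def qemu_command(kernel, disk_image, init="/init"):
    """Assemble a QEMU argument list from the options described above."""
    # Kernel command line: root device, init program and serial console.
    kernel_args = f"root=/dev/sda init={init} console=ttyS0"
    return [
        "qemu-system-x86_64",    # assumption: emulator binary for an x86_64 guest
        "-kernel", kernel,       # boot this kernel image directly
        "-hda", disk_image,      # assumption: attach the disk image as the first disk
        "-append", kernel_args,  # extra kernel command-line options
        "-enable-kvm",           # use KVM for virtualization
        "-nographic",            # no graphical display window
        "-serial", "mon:stdio",  # serial port (and monitor) on this terminal
    ]


# Placeholder paths; replace them with your own kernel and disk image.
command = qemu_command("/path/to/bzImage", "/path/to/disk.img")
print(shlex.join(command))
```

Building the command as a list rather than a string keeps the quoting of the -append value unambiguous, which matters because it contains spaces.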
It’s just the beginning
I had actually prepared a lot more material, especially about
repeatability and build automation, but since the article was getting
too long, I’ll leave those for another article. Stay tuned.
I’ve recently moved to the small but beautiful town of Enschede in The
Netherlands. I’ll just leave you with a view from my desk at work for
now. Pretty colorful, isn’t it?!
Just imagine for a second that people were calling for a law requiring
that the nationality of whoever calls them be made clear. “I need to
know if a Mexican is calling me,” they would say. What would you call
those people? Racists, right?
Now people are doing something similar, except they call it “ethics”.
They are outraged that we can’t be sure if it’s a human or a robot on
the other end of the line. Take a look at this video, from Google I/O
2018, if you don’t know what the fuss is all about:
My question is, why would you need to know that? One common argument
so far has been that scammers can make convincing robo-calls using
this technology. Well, excuse me, but scam calls were invented by
humans and they are still made, sometimes on pretty large scales, by
human callers.
And besides, say a law was passed that robots had to introduce
themselves as such over the phone. Then what? Let me let you in on a
little secret. Scammers are already doing something illegal. You think
they care? So what happens is legitimate calls, for which you have
nothing to worry about anyway, will start with “Hi! I’m Google
Assistant calling on behalf of Bob,” while scam calls will still start
with “Hey this is Bob from…”, you get the idea.
So let me give you a bit of advice. When someone calls you, listen to what
they are saying. If it makes sense, go ahead. If not, end the call
immediately. Doesn’t make much of a difference if it’s a robot calling
you or not.
I’ve written a small benchmarking tool for some of the different XML
parsers available to Python programmers. It calculates each option’s
throughput by sending a large amount of XML data to each parser. You
need to provide it with some XML input.
As you can see, lxml rocks. Although, to be honest, I’m still
looking for something faster than that!
A word of warning. I don’t claim this is in any way a fair and
scientific benchmark. I just wanted to see how these parsers compare
to each other and cooked up this script to get me some numbers.
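The general idea of measuring parser throughput can be sketched as follows. This is not the actual benchmarking tool, just a minimal illustration that assumes only parsers from the standard library and a synthetic input document; the names make_sample, throughput and run_benchmark are made up for this example. If lxml is installed, lxml.etree.fromstring could be added to the PARSERS dictionary in the same way.

```python
import time
import xml.etree.ElementTree as ET
import xml.sax
from xml.dom import minidom


def make_sample(records=2000):
    """Build a synthetic XML document (a stand-in for real input data)."""
    rows = "".join(
        f'<item id="{i}"><name>item {i}</name><value>{i * 2}</value></item>'
        for i in range(records)
    )
    return f"<items>{rows}</items>".encode("utf-8")


def parse_with_sax(data):
    # The default ContentHandler callbacks do nothing, so this times parsing only.
    xml.sax.parseString(data, xml.sax.ContentHandler())


PARSERS = {
    "ElementTree": ET.fromstring,
    "minidom": minidom.parseString,
    "sax": parse_with_sax,
}


def throughput(parse, data, repeat=5):
    """Return MiB of XML parsed per second, averaged over `repeat` runs."""
    start = time.perf_counter()
    for _ in range(repeat):
        parse(data)
    elapsed = time.perf_counter() - start
    return len(data) * repeat / elapsed / (1024 * 1024)


def run_benchmark(data):
    """Measure every registered parser against the same input."""
    return {name: throughput(parse, data) for name, parse in PARSERS.items()}


results = run_benchmark(make_sample())
for name, rate in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} {rate:8.2f} MiB/s")
```

The numbers will vary with the machine and with the shape of the input, so they are only useful for rough relative comparison.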
For a long time now, I’ve been using Vagrant to quickly launch a VM or
two when I need to. Recently, I’ve been less and less satisfied with
Vagrant. It’s usually slow, and I have to edit the Vagrantfile if I
want to change the machine specs. The slowness might be partially due
to using VirtualBox by default. There is a vagrant-libvirt plugin that
lets you use libvirt/KVM, but the plugin seems to be a hit-and-miss
affair and I haven’t been able to make it work reliably.
There is always the option of using virsh and other libvirt
utilities, of course, to launch VMs, but that is not as simple as I’d
like. I finally decided to write some sort of wrapper script for
libvirt and here it is: spinup, a simple utility to launch VMs
as fast as possible.
You need to clone the repository, run prepare.sh and you’re set to
use spinup. I’ll also assume that you’ve made a symlink to
spinup.py as spinup in an appropriate place, and installed the
dependencies, so that the utility is always easily available to
you. There’s of course the option of installing dependencies in a
virtualenv and running ./spinup.py from there. You will obviously
need libvirtd available, too.
The easiest way to launch a VM is by running this:
$ spinup
This will create an Ubuntu-based VM with 1 GiB of RAM and one CPU core,
downloading the Ubuntu cloud image the first time you run it. To land
inside the VM, simply run:
$ spinup ssh
The created VM is tied to the directory you create it in (although no
files are created in that directory). So you need to be in that
directory in order to have access to the VM.
In order to destroy the VM, simply run:
$ spinup destroy
You can create a VM with different specs like this:
$ spinup coreos 4G 2cpus
This will create a CoreOS-based VM with 4 GiB of RAM and two CPU cores.
It’s also possible to launch multiple VMs at the same time:
$ spinup :foo ubuntu 2G -- :bar coreos 4G 2cpus
Here we have created two VMs, naming them foo and bar
respectively. In order to ssh into bar simply run:
$ spinup ssh bar
Running spinup destroy will destroy both VMs.
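To make the command-line shape above concrete, here is a small sketch of how such arguments could be split into per-VM specs. It is not spinup's actual parser. The VMSpec class, the parse_specs function and the "default" VM name are hypothetical, and the token rules (a leading colon for the name, a number followed by G for memory, a number followed by cpus for cores, anything else for the image) are inferred only from the examples shown here. The default values follow the plain spinup invocation described earlier.

```python
import re
from dataclasses import dataclass


@dataclass
class VMSpec:
    name: str = "default"  # hypothetical name for an unnamed VM
    image: str = "ubuntu"  # default image, as in a plain `spinup`
    memory_gib: int = 1    # default RAM in GiB
    cpus: int = 1          # default number of CPU cores


def parse_specs(argv):
    """Split arguments on `--` and turn each group into a VMSpec."""
    groups, current = [], []
    for token in argv:
        if token == "--":
            groups.append(current)
            current = []
        else:
            current.append(token)
    groups.append(current)

    specs = []
    for group in groups:
        spec = VMSpec()
        for token in group:
            if token.startswith(":"):
                spec.name = token[1:]
            elif m := re.fullmatch(r"(\d+)G", token):
                spec.memory_gib = int(m.group(1))
            elif m := re.fullmatch(r"(\d+)cpus?", token):
                spec.cpus = int(m.group(1))
            else:
                spec.image = token  # anything else is taken as the image name
        specs.append(spec)
    return specs


for spec in parse_specs(":foo ubuntu 2G -- :bar coreos 4G 2cpus".split()):
    print(spec)
```

Because every token carries its own marker (colon, G suffix, cpus suffix), the order of tokens within a group does not matter in this sketch.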
One area in which spinup is sorely lacking at the moment is
networking. The created VMs are connected to libvirt’s default
network, but there are no other options. I’m hoping to fix this in the
near future. (Update: network configuration is now available, although
you might need to create the appropriate libvirt networks first.)
spinup is in its very early stages of development, released in the
“release early, release often” spirit. If you have any questions, you
can send me an email at mostafa(at)sepent.com or create an issue or
send a pull request over at GitHub.