elektito programming & stuff

Linux Really from Scratch: Part 1

Linux from Scratch has been one of the projects that I’ve always been interested in, yet I never got around to actually going through with it! I think I downloaded the book a few years ago and read a couple of chapters, but never got past that.

So for one reason or another, during the last few days I’ve been looking at how Linux is actually booted and how the user space is launched, and I thought maybe I could do it all by myself, without even going through the LFS book.

In what I hope will become a series of articles, I’m going to document the process of building a very basic Linux system, with all the pieces built from scratch. This will be an iterative process in which we do things step by step, adding a bit more complexity in each step.

This is what I’m trying to achieve with this series:

  • We’ll take a look at how Linux actually boots.

  • We’ll see the building blocks of the user space.

  • We’ll try to see what it is the distros do for us. Hopefully we’re going to get a lot more respect for the folks who do all that hard work for us!

  • I’ll try to keep everything deterministic and repeatable.

  • My main focus will be creating an image that is run in a VM and accessed with SSH. So no graphics, at least, not for some time. SSH will also take a while to arrive, but I’ll try to get there as soon as possible, since I really hate working in a console without a proper terminal.

Where to begin?

We will start with two pieces: the Linux kernel and a tiny init program. This will be our init:

#include <stdio.h>

int
main(int argc, char *argv[])
{
    printf("Hello, World!\n");
    printf("This is your friendly init system.\n");
    printf("Just hanging here...\n");
    for (;;);

    return 0;
}

This will simply print out a message and then loop indefinitely, since an init is not supposed to ever exit. We will need to link it statically, since our root file system won’t contain glibc or any other shared libraries right now. Save the program as hello.c and compile it by running:

gcc hello.c -static -o hello
strip hello

The resulting executable, hello, will be our primitive init.

Then we need to build the kernel. Get the source code for the stable branch of the kernel:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git

The latest stable version is v4.19.8 right now. Enter the source directory and check out that version:

cd linux
git checkout v4.19.8

I decided to simply use the config of the current Ubuntu 18.04 kernel as the basis. (Thinking about repeatability? You’re right. More on that later.)

Configure and build the kernel:

cp /boot/config-$(uname -r) ./.config
make oldconfig
make
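One thing worth keeping in mind: we boot without an initramfs, so whatever the kernel needs in order to mount the root file system has to be built in (=y) rather than built as a module (=m), because modules live on the very file system we are trying to mount. Here is a rough Python sketch that checks a .config for that. The list of options is my assumption for this particular setup (an ext4 root, QEMU’s default PIIX IDE controller, and a serial console), not something taken from the Ubuntu config, and the helper names are made up for illustration.

```python
import re

# Assumed list of options this setup needs built in; adjust for your own image.
REQUIRED_BUILTIN = [
    "CONFIG_EXT4_FS",              # root file system (assuming ext4)
    "CONFIG_ATA_PIIX",             # QEMU's default IDE controller
    "CONFIG_BLK_DEV_SD",           # SCSI disk support, provides /dev/sda
    "CONFIG_SERIAL_8250_CONSOLE",  # console=ttyS0
]


def parse_config(text):
    """Return a dict of CONFIG_* options that are set in a kernel .config."""
    options = {}
    for line in text.splitlines():
        # Lines like "# CONFIG_FOO is not set" start with '#' and are skipped.
        match = re.match(r"(CONFIG_\w+)=(.*)", line)
        if match:
            options[match.group(1)] = match.group(2)
    return options


def not_builtin(text, required=REQUIRED_BUILTIN):
    """List the required options that are missing or not set to 'y'."""
    options = parse_config(text)
    return [name for name in required if options.get(name) != "y"]


if __name__ == "__main__":
    # A made-up fragment standing in for a real .config file.
    sample = "CONFIG_EXT4_FS=y\nCONFIG_ATA_PIIX=m\n# CONFIG_BLK_DEV_SD is not set\n"
    for name in not_builtin(sample):
        print(f"{name} is not built in")
```

To check a real build, read the .config file in the kernel tree and pass its contents to not_builtin. Anything it reports can be switched to y with make menuconfig before building.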

Running what we have

We’re going to use qemu to run what we have. But first, let’s create the root image.

Create an image file, put a file system on it (ext4 here), and mount it:

qemu-img create image.img 2G
mkfs.ext4 image.img
sudo mount image.img /mnt

Now copy the hello executable to the mount directory and rename it to init. Then unmount.

sudo cp hello /mnt/init
sudo umount /mnt

All right. We’re all set. Launch it all by running:

qemu-system-x86_64 -kernel /path/to/bzImage \
                   -append "root=/dev/sda init=/init console=ttyS0" \
                   -hda /path/to/image.img \
                   -enable-kvm \
                   -nographic \
                   -serial mon:stdio

You need to adjust the paths to the kernel and the disk image. The kernel image, bzImage, should be in the arch/x86_64/boot directory of the kernel tree after the build is complete.

If everything is okay, you’ll see the boot messages and at the end you’ll get the “Hello, World!” message from our “init system.”

What’s in this command?

  • -kernel /path/to/bzImage: specifies the kernel image to use. Using this option, we won’t have to create a bootable disk, we just directly provide the kernel to boot.

  • -append "root=/dev/sda init=/init console=ttyS0": adds a few options to the kernel command-line. root is the root file-system that is mounted on /, init is the path to the init program to use, and console specifies the output console device (needed in combination with the -serial option).

  • -enable-kvm: use KVM for virtualization.

  • -nographic: do not show the SDL window that is used as VM display by default.

  • -serial mon:stdio: redirect the serial port to stdio. This, in combination with the console parameter passed to the kernel, causes kernel output to be displayed on the current terminal.
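If you end up launching the VM from a script, the same command can be assembled programmatically. This is only a sketch in Python: the qemu_command function and its use_kvm switch are my own invention for illustration, while the options themselves are exactly the ones explained above.

```python
import shlex


def qemu_command(kernel, image, use_kvm=True):
    """Build the QEMU argument list used in this article."""
    # Kernel command line: root device, init program, and console device.
    kernel_args = ["root=/dev/sda", "init=/init", "console=ttyS0"]
    argv = [
        "qemu-system-x86_64",
        "-kernel", kernel,
        "-append", " ".join(kernel_args),
        "-hda", image,
    ]
    if use_kvm:
        argv.append("-enable-kvm")
    # No graphical window; kernel output goes to the current terminal.
    argv += ["-nographic", "-serial", "mon:stdio"]
    return argv


if __name__ == "__main__":
    # Print a copy-pasteable shell command (shlex.join needs Python 3.8+).
    print(shlex.join(qemu_command("/path/to/bzImage", "/path/to/image.img")))
```

Keeping the arguments in a list means the result can be handed straight to subprocess.run without worrying about shell quoting of the -append string.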

It’s just the beginning

I had actually prepared a lot more material, especially about repeatability and build automation, but since the article was getting too long, I’ll leave those for another article. Stay tuned.

Moved!

I’ve recently moved to the small but beautiful town of Enschede in The Netherlands. I’ll just leave you with a view from my desk at work for now. Pretty colorful, isn’t it?!

Office View

So you want to know you're talking to a robot?

Just imagine for a second that people were calling for a law requiring the nationality of whoever calls them to be made clear. “I need to know if a Mexican is calling me,” they would say. What would you call those people? Racists, right?

Now people are doing something similar, but they call what they do “ethics”. They are outraged that we can’t be sure if it’s a human or a robot on the other end of the line. Take a look at this video, from Google I/O 2018, if you don’t know what the fuss is all about:

My question is, why would you need to know that? One common argument so far has been that scammers can make convincing robo-calls using this technology. Well, excuse me, but scam calls were invented by humans and they are still made, sometimes on pretty large scales, by human callers.

And besides, say a law was passed that robots had to introduce themselves as such over the phone. Then what? Let me let you in on a little secret. Scammers are already doing something illegal. You think they care? So what happens is legitimate calls, for which you have nothing to worry about anyway, will start with “Hi! I’m Google Assistant calling on behalf of Bob,” while scam calls will still start with “Hey this is Bob from…”, you get the idea.

So let me give you a bit of advice. When someone calls you, listen to what they are saying. If it makes sense, go ahead. If not, end the call immediately. It doesn’t make much of a difference whether it’s a robot calling you or not.

Benchmarking Python XML Parsers

I’ve written a small benchmarking tool for some of the different XML parsers available to Python programmers. It calculates each option’s throughput by sending a large amount of XML data to each parser. You need to provide it with some XML input.

$ ./pyxmlperftests.py 1.xml 2.xml 3.xml 4.xml

You can find the source code here on GitHub.

These are the results on my computer:

Results:
   xml.dom.minidom: 7.49 MBps
   lxml.etree: 89.63 MBps
   xml.etree.ElementTree.iterparse: 31.77 MBps
   xml.etree.ElementTree: 58.43 MBps
   xml.sax: 25.68 MBps

As you can see, lxml rocks. Although, to be honest, I’m still looking for something faster than that!

A word of warning. I don’t claim this is in any way a fair and scientific benchmark. I just wanted to see how these compare relative to each other and cooked up this script to get me some numbers.
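To give an idea of what “throughput” means here, this is a minimal sketch of the measurement. It is not the actual pyxmlperftests.py script: it only covers parsers from the standard library, it generates its own made-up input instead of reading files, and I’m assuming 1 MB means 10^6 bytes.

```python
import io
import time
import xml.dom.minidom
import xml.etree.ElementTree as ET


def make_sample_xml(n_items=2000):
    """Generate a synthetic XML document as bytes (made-up test data)."""
    items = "".join(
        f'<item id="{i}"><name>item {i}</name></item>' for i in range(n_items)
    )
    return f"<root>{items}</root>".encode("utf-8")


def throughput_mbps(parse, data, repeats=3):
    """Parse `data` `repeats` times and return megabytes parsed per second."""
    start = time.perf_counter()
    for _ in range(repeats):
        parse(data)
    elapsed = time.perf_counter() - start
    return len(data) * repeats / elapsed / 1e6


PARSERS = {
    "xml.etree.ElementTree": ET.fromstring,
    # iterparse is lazy, so the events have to be consumed to do the parsing.
    "xml.etree.ElementTree.iterparse":
        lambda data: sum(1 for _ in ET.iterparse(io.BytesIO(data))),
    "xml.dom.minidom": xml.dom.minidom.parseString,
}

if __name__ == "__main__":
    data = make_sample_xml()
    for name, parse in PARSERS.items():
        print(f"   {name}: {throughput_mbps(parse, data):.2f} MBps")
```

The numbers from a toy like this depend heavily on the shape of the input, so don’t expect them to match the results above.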

Launch Virtual Machines Quickly with spinup

For a long time now, I’ve been using Vagrant to quickly launch a VM or two when I need to. Recently, I’ve been less and less satisfied with Vagrant. It’s usually slow and needs editing the Vagrantfile if I want to change the machine specs. The slowness might be partially due to using VirtualBox by default. There is a vagrant-libvirt plugin that lets you use libvirt/KVM but the plugin seems to be a hit-and-miss affair and I’ve not been able to make it work all the time.

There is always the option of using virsh and other libvirt utilities, of course, to launch VMs, but that is not as simple as I’d like. I finally decided to write some sort of wrapper script for libvirt and here it is: spinup, a simple utility to launch VMs as fast as possible.

You need to clone the repository, run prepare.sh and you’re set to use spinup. I’ll also assume that you’ve made a symlink to spinup.py as spinup in an appropriate place, and installed the dependencies, so that the utility is always easily available to you. There’s of course the option of installing dependencies in a virtualenv and running ./spinup.py from there. You will obviously need libvirtd available, too.

The easiest way to launch a VM is by running this:

$ spinup

This will create an Ubuntu-based VM with 1GiB of RAM and one CPU core, downloading the Ubuntu cloud image the first time you run it. To land inside the VM, simply run:

$ spinup ssh

The created VM is tied to the directory you create it in (although no files are created in that directory). So you need to be in that directory in order to have access to the VM.

In order to destroy the VM, simply run:

$ spinup destroy

You can create a VM with different specs like this:

$ spinup coreos 4G 2cpus

This will create a CoreOS-based VM with 4GiB of RAM and two CPU cores.

It’s also possible to launch multiple VMs at the same time:

$ spinup :foo ubuntu 2G -- :bar coreos 4G 2cpus

Here we have created two VMs, naming them foo and bar respectively. In order to ssh into bar simply run:

$ spinup ssh bar

Running spinup destroy will destroy both VMs.
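To make the command-line format above a bit more concrete, here is a toy parser for it. This is a hypothetical sketch and not the code inside spinup: the parse_specs function, the dictionary keys, and the exact matching rules are all assumptions based only on the examples shown here, with defaults taken from the plain spinup invocation (Ubuntu, 1G, one CPU).

```python
import re


def parse_specs(args):
    """Split arguments on '--' and turn each group into a VM description."""
    groups, current = [], []
    for arg in args:
        if arg == "--":
            groups.append(current)
            current = []
        else:
            current.append(arg)
    groups.append(current)

    vms = []
    for group in groups:
        # Assumed defaults, matching what a bare `spinup` creates.
        vm = {"name": None, "image": "ubuntu", "memory": "1G", "cpus": 1}
        for word in group:
            cpus = re.fullmatch(r"(\d+)cpus?", word)
            if word.startswith(":"):
                vm["name"] = word[1:]          # :foo names the machine
            elif re.fullmatch(r"\d+[MG]", word):
                vm["memory"] = word            # 2G, 512M, ...
            elif cpus:
                vm["cpus"] = int(cpus.group(1))
            else:
                vm["image"] = word             # anything else is the OS image
        vms.append(vm)
    return vms


if __name__ == "__main__":
    for vm in parse_specs(":foo ubuntu 2G -- :bar coreos 4G 2cpus".split()):
        print(vm)
```

The nice thing about this kind of format is that the order of the words inside a group doesn’t matter, since each word is recognized by its shape.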

One area in which spinup is sorely lacking at the moment is networking. The created VMs are connected to libvirt’s default network, but there are no other options. I’m hoping to fix this in the near future. (Update: network configuration is now available, although you might need to create the appropriate libvirt networks first.)

spinup is in its very early stages of development, released in the “release early, release often” spirit. If you have any questions, you can send me an email at mostafa(at)sepent.com or create an issue or send a pull request over at GitHub.