From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.34)
	id 1BSDYA-0005kW-As
	for qemu-devel@nongnu.org; Mon, 24 May 2004 07:23:22 -0400
Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.34)
	id 1BSDXS-0005aP-8n
	for qemu-devel@nongnu.org; Mon, 24 May 2004 07:23:10 -0400
Received: from [66.54.152.27] (helo=jive.SoftHome.net)
	by monty-python.gnu.org with smtp (Exim 4.34) id 1BSD6a-0000s1-Vz
	for qemu-devel@nongnu.org; Mon, 24 May 2004 06:54:54 -0400
From: Mulyadi Santosa <a_mulyadi@softhome.net>
Date: Mon, 24 May 2004 17:53:13 +0700
MIME-Version: 1.0
Content-Type: Multipart/Mixed;
  boundary="Boundary-00=_ZQdsAI7mxACFxYK"
Message-Id: <200405241753.13144.a_mulyadi@softhome.net>
Subject: [Qemu-devel] Qemu+openMosix HOWTO
Reply-To: a_mulyadi@softhome.net, qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://mail.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://mail.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://mail.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org


--Boundary-00=_ZQdsAI7mxACFxYK
Content-Type: text/plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

hello all

This is my newest Dojo. As the subject mentions, it is a HOWTO about
setting up a virtual cluster using Qemu and openMosix.

As this is version 1.0, I might have written something wrong or missed
something you think should be included, so feel free to send corrections
or criticism.

happy exercising ur Chi !! :-)

regards
--Boundary-00=_ZQdsAI7mxACFxYK
Content-Type: text/plain; charset="us-ascii"; name="Qemu-openMosix.doc.txt"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="Qemu-openMosix.doc.txt"

OpenMosix Dojo (version 1.0. Copyright of Mulyadi Santosa)

Qemu and OpenMosix: The Internal Power of Virtualization
       =20
	Taking your first adventure into the clustering arena? Then maybe
you began gathering old PCs from your garage, borrowing PCs from your
friends, or even sneaking into your neighbour's home trying to
"pick" their PC? :-) All just because "oh boy, I just have one
PC... and I want to play with openMosix for a while, but I have
no more PCs...".

Or maybe you are a brave spirit trying to "conquer" openMosix, so you
install openMosix on 4 PCs and then "booommmm", you get nasty
segfaults. Then someone suggests you download and try a new
version of the openMosix patch... now it's time for another leg and
hand workout, moving around between PCs to update the kernel. Well,
LTSP might help, but maybe it's not a good idea.

So, it's time to gather your strength. If you know Chi practice in
kung fu, now we do the same for your lonely PC... :-) If you have ever
heard of tools like VMware, Bochs, Xen, Plex86, User Mode Linux and
the rest of the gang, then it is time to meet Qemu
(http://fabrice.bellard.free.fr/qemu/). Grab the source tarball at
http://fabrice.bellard.free.fr/qemu/qemu-0.5.5.tar.gz; this is the
latest version (0.5.5) at the time of writing.

Unpack the tarball (using tar -xzvf). Now, before you do the actual
"make", apply the following patch:

=2D--------------------CUT Start of the Patch-----------------------
=2D-- ./before-diff/sdl.c	2004-05-18 10:33:05.000000000 +0700
+++ ./sdl.c	2004-05-18 10:40:55.000000000 +0700
@@ -130,6 +130,7 @@
 static void sdl_process_key(SDL_KeyboardEvent *ev)
 {
     int keycode, v;
+    static int modif;
    =20
     /* XXX: not portable, but avoids complicated mappings */
     keycode =3D ev->keysym.scancode;
@@ -150,6 +151,78 @@
     } else {
         keycode =3D 0;
     }
+    /* Adjust shift-key states when leaving window */
+
+    if (ev->keysym.scancode =3D=3D 0) {
+        if ((modif ^ ev->keysym.mod) & KMOD_LSHIFT)
+            kbd_put_keycode(0x2a | (modif & KMOD_LSHIFT ? 0x80 : 0));
+        if ((modif ^ ev->keysym.mod) & KMOD_RSHIFT)
+            kbd_put_keycode(0x36 | (modif & KMOD_RSHIFT ? 0x80 : 0));
+        if ((modif ^ ev->keysym.mod) & KMOD_LCTRL)
+            kbd_put_keycode(0x1d | (modif & KMOD_LCTRL ? 0x80 : 0));
+        if ((modif ^ ev->keysym.mod) & KMOD_RCTRL) {
+            kbd_put_keycode(0xe0 );
+            kbd_put_keycode(0x1d | (modif & KMOD_RCTRL ? 0x80 : 0));
+        }
+        if ((modif ^ ev->keysym.mod) & KMOD_LALT)
+            kbd_put_keycode(0x38 | (modif & KMOD_LALT ? 0x80 : 0));
+        if ((modif ^ ev->keysym.mod) & KMOD_RALT) {
+            kbd_put_keycode(0xe0 );
+            kbd_put_keycode(0x38 | (modif & KMOD_RALT ? 0x80 : 0));
+        }
+        modif =3D ev->keysym.mod;
+    }
+
+    /* remember shift-key state */
+
+    switch (keycode) {
+    case 0x2a:                          /* Left Shift */
+        if (ev->type =3D=3D SDL_KEYUP)
+            modif &=3D ~KMOD_LSHIFT;
+        else
+            modif |=3D KMOD_LSHIFT;
+        break;
+    case 0x36:                          /* Right Shift */
+        if (ev->type =3D=3D SDL_KEYUP)
+            modif &=3D ~KMOD_RSHIFT;
+        else
+            modif |=3D KMOD_RSHIFT;
+        break;
+    case 0x1d:                          /* Left CTRL */
+        if (ev->type =3D=3D SDL_KEYUP)
+            modif &=3D ~KMOD_LCTRL;
+        else
+            modif |=3D KMOD_LCTRL;
+        break;
+    case 0x1de0:                        /* Right CTRL */
+        if (ev->type =3D=3D SDL_KEYUP)
+            modif &=3D ~KMOD_RCTRL;
+        else
+            modif |=3D KMOD_RCTRL;
+        break;
+    case 0x38:                          /* Left ALT */
+        if (ev->type =3D=3D SDL_KEYUP)
+            modif &=3D ~KMOD_LALT;
+        else
+            modif |=3D KMOD_LALT;
+        break;
+    case 0x38e0:                        /* Right ALT */
+        if (ev->type =3D=3D SDL_KEYUP)
+            modif &=3D ~KMOD_RALT;
+        else
+            modif |=3D KMOD_RALT;
+        break;
+    case 0x45:                          /* Num Lock */
+        kbd_put_keycode(0x45);
+        kbd_put_keycode(0xc5);
+        return;
+    case 0x3a:                          /* Caps Lock */
+        kbd_put_keycode(0x3a);
+        kbd_put_keycode(0xba);
+        return;
+
+    }
+
    =20
     /* now send the key code */
     while (keycode !=3D 0) {

=2D--------------CUT End of Patch------------------------------

Basically, this patch fixes a keyboard problem in the SDL graphical
output. It was adjusted for SDL-1.2.5-3 on Red Hat 9, so feel free to
adapt it to your distro/setup. Did I mention SDL? Yes, you need to
install the SDL and SDL devel packages if you want graphical output
(highly recommended... at least from my point of view).

Now, do the usual mantra. I assume that you will=20
install into /usr/local/qemu:
# ./configure --prefix=3D/usr/local/qemu/
# make && make install

Now we are ready to build the disk image. You can think of the disk
image as a virtual hard drive for Qemu. I assume you want to create the
disk image inside /mnt/qemu:
dd of=3D/mnt/qemu/myimage bs=3D1M seek=3D700 count=3D0

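The above command is an example that creates an empty 700 MB image.
Because "count" is 0, dd writes no data; it only extends the file to
the offset bs * seek, so on filesystems that support sparse files the
image takes almost no space on the host until the guest writes to it.
You can choose another size by changing the "seek" and "bs"
parameters. See "man dd" for the complete reference.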
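
As a quick check of the arithmetic: dd places the end of the file at
bs * seek bytes. The sketch below is only an illustration (image_bytes
is a made-up helper name, not part of dd or Qemu); it takes the block
size in MiB and the seek count and prints the resulting size in bytes.

```shell
#!/bin/sh
# image_bytes BS_MIB SEEK
# Hypothetical helper: size in bytes of an image created with a block
# size of BS_MIB MiB and the given seek count.
image_bytes() {
    echo $(( $1 * 1024 * 1024 * $2 ))
}

image_bytes 1 700    # the image from this Dojo; prints 734003200
```

For a bigger image, keeping "bs" at 1M and raising "seek" is the
simplest change; for example, a seek of 2048 gives a 2 GB image.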


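Export the tmpfs directory that Qemu will use for its temporary files
(/mytmpfs here; mounting it is shown further below) in the QEMU_TMPDIR
environment variable: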

        export QEMU_TMPDIR=3D/mytmpfs

After that, pick your Linux CD or ISO image and run the following
command (from now on, please adjust the actual path to the qemu and
qemu-fast binaries yourself; "-cdrom" expects the CD-ROM device or an
ISO image file, so adjust that path too):

# qemu -hda /mnt/qemu/myimage -cdrom /mnt/cdrom -boot d -m 64

This is relatively easy to understand: it tells qemu to boot from the
CD-ROM ("-boot d") with 64 MB of guest RAM ("-m 64") and to attach the
disk image, so you can start the installation. A couple of weeks ago I
installed Debian 3.0 woody inside the disk image because I think it is
relatively stable and compact. You can pick another distro to your
taste... just remember to leave enough room, because so far I don't
know how to resize the disk image :-)

Just install Linux as usual and don't forget to set up a swap
partition. When you finish installing Linux, the disk image should
contain the root partition and the swap partition.

The things you need to include are gcc/glibc, a shell (of course, who
can live without one ;-) ), automake/autoconf, tar and gzip/gunzip.

After finishing the Linux installation, quit Qemu first; now we move on
to compiling the openMosix kernel. Apply the patch below to your
openMosix-patched kernel to make it compatible with qemu-fast:

=2D---------------CUT Start of patch------------------------
diff -Naur ./linux/arch/i386/vmlinux.lds ./linux-qemu/arch/i386/vmlinux.lds
=2D-- ./linux/arch/i386/vmlinux.lds=092002-02-26 02:37:53.000000000 +0700
+++ ./linux-qemu/arch/i386/vmlinux.lds=092004-05-17 17:15:37.000000000 +0700
@@ -6,7 +6,7 @@
 ENTRY(_start)
 SECTIONS
 {
=2D  . =3D 0xC0000000 + 0x100000;
+  . =3D 0x90000000 + 0x100000;
   _text =3D .;			/* Text and read-only data */
   .text : {
 	*(.text)
diff -Naur ./linux/include/asm-i386/page.h =
=2E/linux-qemu/include/asm-i386/page.h
=2D-- ./linux/include/asm-i386/page.h=092004-05-14 12:26:48.000000000 +0700
+++ ./linux-qemu/include/asm-i386/page.h=092004-05-17 17:14:50.000000000 =
+0700
@@ -78,7 +78,7 @@
  * and CONFIG_HIGHMEM64G options in the kernel configuration.
  */
=20
=2D#define __PAGE_OFFSET		(0xC0000000)
+#define __PAGE_OFFSET		(0x90000000)
=20
 /*
  * This much address space is reserved for vmalloc() and iomap()
=2D-------------------------CUT end of patch---------------------------

This patch moves the kernel's page offset (and the matching link
address) from 0xC0000000 down to 0x90000000, so that the kernel becomes
compatible with qemu-fast.

Why do we need qemu-fast? Why not use plain Qemu? The answer is
(copied from the Qemu documentation):
"qemu-fast uses the host Memory Management Unit (MMU) to simulate the=20
x86 MMU. It is fast but has limitations because the whole 4 GB address=20
space cannot be used and some memory mapped peripherials cannot be=20
emulated accurately yet"

In other words, qemu-fast doesn't simulate an MMU; instead it uses the
host's MMU... should be faster, right? Yes, the guest cannot use the
whole 4 GB address space, but who wants 4 GB just for a simulation?
:-) It should be fine for the general case, AFAIK.

In the kernel configuration, remember to build the ne2k and ne2000
drivers into the kernel (not as modules). You can find them under
"Network device support" --> "Ethernet (10 or 100Mbit)":
CONFIG_NE2000=3Dy
CONFIG_NE2K_PCI=3Dy

I am not sure which one is actually needed for Qemu, but adding both
won't hurt :-) Enable any other options you think you will need.

Do the usual kernel compilation, and move the finished bzImage (I
prefer bzImage; the final type of kernel image is up to you), vmlinux
and System.map to a directory. Let's assume you move them to
/boot/qemu. If you built modules, we will move them into the disk image
later. Oh, BTW, it is also a good idea to give Qemu a tmpfs-mounted
directory for its temporary files (the one exported as QEMU_TMPDIR
earlier). Here I create a 1 gigabyte tmpfs:

mount -t tmpfs -o size=3D1G tmpfs /mytmpfs/

Now we need to test-drive the kernel. Put the following command in a
shell script:
/usr/local/qemu/bin/qemu-fast -hda /mnt/qemu/myimage -hdb /dev/hda \
    -kernel /boot/qemu/bzImage \
    -append "root=3D/dev/hda1 ide3=3Dnoprobe ide4=3Dnoprobe ide5=3Dnoprobe"

The above script assumes that you created the root filesystem on the
first partition of the disk image; that's why the root parameter is
"/dev/hda1". And what is "-hdb /dev/hda"? Well, we need to copy
several files from the host system, so we make the host's disk
visible inside Qemu as the guest's /dev/hdb :-) The host is still
using that disk, so only read from it inside the guest. If your layout
is different, again, feel free to modify the parameters.

You get the login prompt? Congratulations! Now log in and make sure
"dmesg" reports the following:
NE*000 ethercard probe at 0x300: 52 54 00 12 34 56
eth0: NE2000 found at 0x300, using IRQ 9.

These lines indicate that the kernel successfully detected the
emulated NE2000 card. So far I have had no problem with the "fake"
NE2000, so the only trick is... just make sure you include NE2000
support.

We are half-way there. Now we move the kernel modules... how? By
mounting the fake "/dev/hdb" inside Qemu, e.g.:
mount -t ext2 /dev/hdb1 /mnt/host

There... you can access the host filesystem. Now copy the kernel's
directory under /lib/modules straight into the disk image. After that,
"halt" the guest system and restart qemu using the above script. You
should find that "the missing modules" are now loaded successfully.

openMosix needs userland tools, right? As above, transfer the
userland tools tarball (use version 0.3.5) into the guest and compile
it there. This way, we make sure it is compiled against the correct
gcc/glibc. Oh wait, you need the oM kernel headers? Mount the host
filesystem and create a symlink at /usr/src/linux-openmosix pointing
to the openMosix kernel source; then the compilation will go
smoothly.

I will skip the oM-specific settings; just refer to the openMosix
HOWTO for how to set up /etc/inittab, the node map, etc. Also remember
to set up an IP address for eth0 (on Debian, you can configure it in
/etc/network/interfaces so that it comes up at boot).

Again, shut down the guest system. Now we move on to setting up the
second node. "What, doing the above steps again? You gotta be
kidding, right? I need a faster way!" OK, relax :-) That's why we will
create COW (copy-on-write) images. What is that? Think of it as a way
of sharing the original disk image between Qemu instances: each
instance stores only its own modifications in its COW file, and the
original image stays untouched.

Let's create two COWs (yes, COW, but not the cows which produce milk,
OK? :-))) ):

# qemu-mkcow -f /mnt/qemu/myimage /mnt/qemu/mycow1.cow
# qemu-mkcow -f /mnt/qemu/myimage /mnt/qemu/mycow2.cow

Not so difficult, right? After that, create a script for bringing up
the TUN/TAP device. "Wait, wait... TUN/TAP, why do I need it?" Well,
TUN/TAP is a virtual network device that links the guest system to
its host. So if you don't turn it on, there is no "network
connection" between the guest and its host.

Here is an example of the script:
#!/bin/sh
sudo /sbin/ifconfig $1 192.168.1.11 netmask 255.255.255.0

Modify the above script for each guest's TUN/TAP device, and remember
not to reuse the IP address of another TUN/TAP device or of a guest.

I suggest putting the TUN/TAP devices and the interfaces inside the
guests on a different subnet from the host's. I use this trick so I
won't mess much with the host's routing table. For this "Dojo" I use
the following topology:

                        host (10.1.1.1)
                        /        \
                       /          \
                      /            \
1st TUN/TAP (192.168.1.11)      2nd TUN/TAP (192.168.1.12)
        |                               |
        |                               |
1st guest (192.168.1.21)        2nd guest (192.168.1.22)

Got a clearer picture from the above diagram? I hope so... :-) So,
back to the TUN scripts: you should write two scripts:

=46or 1st TUN/TAP: (name it /mnt/qemu/qemu-ifup)
#!/bin/sh
sudo /sbin/ifconfig $1 192.168.1.11 netmask 255.255.255.0

=46or 2nd TUN/TAP: (name it /mnt/qemu/qemu-ifup2)
#!/bin/sh
sudo /sbin/ifconfig $1 192.168.1.12 netmask 255.255.255.0

eth0 inside 1st guest: 192.168.1.21
eth0 inside 2nd guest: 192.168.1.22

I use the above IP numbering so I can quickly remind myself of the
topology (addresses ending in 1 are for the 1st group, addresses
ending in 2 for the 2nd group). You don't have to follow my idea :-)
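
If you plan more than two guests, the per-guest ifup scripts can be
generated instead of typed. This is only a sketch under the numbering
scheme above; print_ifup is a hypothetical helper name, and it works
for guest numbers 1 to 9 only (the TUN/TAP address is 192.168.1.1N).

```shell
#!/bin/sh
# print_ifup N
# Hypothetical helper: print the ifup script for guest number N (1-9).
# The TUN/TAP side gets 192.168.1.1N, following the scheme above.
print_ifup() {
    printf '#!/bin/sh\n'
    # The single-quoted $1 stays literal: it is the interface name
    # that Qemu passes to the generated script.
    printf 'sudo /sbin/ifconfig $1 192.168.1.1%s netmask %s\n' \
        "$1" 255.255.255.0
}

print_ifup 2
```

To use it, redirect the output into a file and make it executable,
e.g. "print_ifup 2 > /mnt/qemu/qemu-ifup2" followed by
"chmod +x /mnt/qemu/qemu-ifup2".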

Because you already have two COWs, modify the qemu start script so it
becomes:
(for 1st guest)
/usr/local/qemu/bin/qemu-fast -hda /mnt/qemu/mycow1.cow \
    -macaddr 52:54:00:12:34:56 -kernel /boot/qemu/bzImage \
    -n /mnt/qemu/qemu-ifup \
    -append "root=3D/dev/hda1 ide3=3Dnoprobe ide4=3Dnoprobe ide5=3Dnoprobe"

(for 2nd guest)
/usr/local/qemu/bin/qemu-fast -hda /mnt/qemu/mycow2.cow \
    -macaddr 52:54:00:12:34:60 -kernel /boot/qemu/bzImage \
    -n /mnt/qemu/qemu-ifup2 \
    -append "root=3D/dev/hda1 ide3=3Dnoprobe ide4=3Dnoprobe ide5=3Dnoprobe"

"I noticed the -macaddr switch... why do we need it?" The answer: if
you don't set it explicitly, both guest systems get the same MAC
address, and that confuses the ARP resolution mechanism. So we need to
set a distinct MAC address for each guest. Still confused about what I
am talking about? Read the ARP RFC (RFC 826) or a TCP/IP reference on
how IP addresses are resolved to MAC addresses.
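
With more guests, one simple way to keep the MAC addresses distinct
is to derive the last byte from the guest number. A minimal sketch,
assuming guests are numbered from 1 and share the 52:54:00:12:34
prefix used above (guest_mac is a hypothetical helper; it yields
...:57 for the 2nd guest rather than the ...:60 picked by hand above,
and any distinct values will do):

```shell
#!/bin/sh
# guest_mac N
# Hypothetical helper: print a MAC address for guest number N (1-based).
# Guest 1 gets the last byte 0x56, guest 2 gets 0x57, and so on.
guest_mac() {
    printf '52:54:00:12:34:%02x\n' "$(( 0x56 + $1 - 1 ))"
}

guest_mac 1
guest_mac 2
```

In a start script this could be used as -macaddr "$(guest_mac 2)".
Since only the last byte changes, it is meant for small clusters.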

Now fire up both qemu instances and watch them load the openMosix
kernel until you get the login prompt. Back on the host system, we now
need to set up a bridge connecting these 2 guests. "Oh boy, more pain
is coming... when will it stop?" :))) Remember, a dojo is a place to
practice, not a place for instant skill like Neo getting his kung fu
inside the Matrix :-) The saying "no pain, no gain" applies here
:-)))

OK, back to the bridge. You can think of a bridge as "a hub connecting
the network interfaces attached to it". Copy the following script to
set up a bridge between tun0 and tun1
(let's name it start-bridge.sh):
#!/bin/bash
/sbin/modprobe bridge
/sbin/route del -net 192.168.1.0 netmask 255.255.255.0
/sbin/route del -net 192.168.1.0 netmask 255.255.255.0
/usr/sbin/brctl addbr br0
/usr/sbin/brctl addif br0 tun0
/usr/sbin/brctl addif br0 tun1
/sbin/ifconfig br0 192.168.1.13 netmask 255.255.255.0
/sbin/ifconfig tun0 0.0.0.0
/sbin/ifconfig tun1 0.0.0.0

The default kernel on Red Hat 9 (the one I use for this experiment)
comes with bridging capability as a module (bridge.o). If you don't
find it, recompile the kernel and make sure you include this:
CONFIG_BRIDGE=3Dm (as module) --> preferred
or
CONFIG_BRIDGE=3Dy (built into the kernel)

You can find them under "Networking options". It is named "802.1d=20
Ethernet Bridging".

Why do we need to do "route del"? Well, remember that we previously
brought up the TUN/TAP devices? On recent Linux, "ifconfig"
automatically sets up a route for each new IP address assigned to a
device. So basically we clean those routes up (one "route del" per
TUN/TAP device) because we don't need them!

The next lines set up the bridge itself. You need to install the
bridge-utils RPM (RH 9 includes these tools). If your distro doesn't
include it, go to http://www.math.leidenuniv.nl/~buytenh/bridge and
grab the tarball there. Actually, what I am going to explain is a short
version of the Bridge Mini Howto; you can find more about bridging by
searching for "bridge" on www.tldp.org. Many distributions include
these docs.

Let's analyze the commands:
/usr/sbin/brctl addbr br0
=2D-> here we create new bridge interface named "br0"

/usr/sbin/brctl addif br0 tun0
/usr/sbin/brctl addif br0 tun1

=2D-> here we attach tun0 and tun1 to the bridge, so both become
ports "inside" it

/sbin/ifconfig br0 192.168.1.13 netmask 255.255.255.0

=2D-> as you know, this assigns an IP address and netmask to the
bridge. You still need to make sure that the bridge is on the same
subnet as the guests...

/sbin/ifconfig tun0 0.0.0.0
/sbin/ifconfig tun1 0.0.0.0

=2D-> easy: just assign the IP 0.0.0.0 to the TUNs (but do not turn
them down, I repeat, DO NOT turn the TUNs down) :-)
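
The same steps extend to more than two guests. The sketch below only
prints the commands, so you can review them before running anything
as root; print_bridge_setup is a hypothetical helper, and the bridge
name and address are the ones used in this Dojo.

```shell
#!/bin/sh
# print_bridge_setup TAP...
# Hypothetical helper: print the bridge setup commands for the given
# TUN/TAP interfaces. Nothing is executed; the output is just text.
print_bridge_setup() {
    echo '/sbin/modprobe bridge'
    # One "route del" per TUN/TAP device, since ifconfig added a
    # route for each of them when the ifup scripts ran.
    for tap in "$@"; do
        echo '/sbin/route del -net 192.168.1.0 netmask 255.255.255.0'
    done
    echo '/usr/sbin/brctl addbr br0'
    for tap in "$@"; do
        echo "/usr/sbin/brctl addif br0 $tap"
    done
    echo '/sbin/ifconfig br0 192.168.1.13 netmask 255.255.255.0'
    for tap in "$@"; do
        echo "/sbin/ifconfig $tap 0.0.0.0"
    done
}

print_bridge_setup tun0 tun1 tun2
```

Called with "tun0 tun1" it prints the same commands as
start-bridge.sh. Once the output looks right, it can be piped to sh
as root.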

The topology becomes

                        host (10.1.1.1/24)
                        /        \
                       /          \
                      /            \
                    T H E   B R I D G E (192.168.1.13/24)
                        |                       |
                        |                       |
        1st TUN/TAP (0.0.0.0)              2nd TUN/TAP (0.0.0.0)
                |                               |
                |                               |
        1st guest (192.168.1.21/24)        2nd guest (192.168.1.22/24)

Now try to ping from guest 1 to guest 2 and vice versa... success? Now
start openMosix (just copy the openMosix start/stop script from the
openMosix userland tarball) and confirm that "mosmon" sees all the
nodes!

Let me state something before we go further. Something inside Qemu
throws off openMosix's auto-detection of the system's speed, so make
sure you include this line in your openMosix startup script:

mosctl setspeed 15000

Feel free to adjust the number, but make sure you set the same number
on all guest systems; if not, you will get weird load-balancing
behaviour... believe me :-) It took me 2 days just to find out about
this "speed" thing when I saw that openMosix didn't load-balance my
program :-)))

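After that, try migration between guests... this wouldn't be an
openMosix cluster if it couldn't migrate processes, right? :-) Just
compile a simple C program like the one below:

int main(void)
{
    /* volatile keeps an optimizing compiler from deleting the loops */
    volatile long a, b;

    for (a =3D 0; a <=3D 1000000; a++)
        for (b =3D 0; b <=3D 1000000; b++)
            ; /* busy loop: burn CPU so openMosix has work to migrate */
    return 0;
}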

Suppose you name it "silly.c": compile it as "silly" and run it in
the background (append "&"). Two instances of "silly" are sufficient
for a start; watch "mosmon" to see the load spread across the nodes.

Success? Then congratulations... you have exercised your Chi to the
highest level :-) Have fun with your new virtual cluster!

reference:
=2D Qemu user documentation and technical documentation
=2D openMosix HOWTO
=2D Ethernet Bridge mini HOWTO
=2D Documentation/filesystems/tmpfs.txt inside the kernel source
directory
--Boundary-00=_ZQdsAI7mxACFxYK--