From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.34)
	id 1BSDYA-0005kW-As for qemu-devel@nongnu.org; Mon, 24 May 2004 07:23:22 -0400
Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.34)
	id 1BSDXS-0005aP-8n for qemu-devel@nongnu.org; Mon, 24 May 2004 07:23:10 -0400
Received: from [66.54.152.27] (helo=jive.SoftHome.net) by monty-python.gnu.org with smtp (Exim 4.34)
	id 1BSD6a-0000s1-Vz for qemu-devel@nongnu.org; Mon, 24 May 2004 06:54:54 -0400
From: Mulyadi Santosa
Date: Mon, 24 May 2004 17:53:13 +0700
MIME-Version: 1.0
Content-Type: Multipart/Mixed; boundary="Boundary-00=_ZQdsAI7mxACFxYK"
Message-Id: <200405241753.13144.a_mulyadi@softhome.net>
Subject: [Qemu-devel] Qemu+openMosix HOWTO
Reply-To: a_mulyadi@softhome.net, qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: qemu-devel@nongnu.org

--Boundary-00=_ZQdsAI7mxACFxYK
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Hello all,

This is my newest Dojo. As the subject says, it is a HOWTO about
setting up a virtual cluster using Qemu and openMosix.

As this is version 1.0, I might have written something wrong or missed
something that you think should be included, so feel free to send
corrections or criticism.

Happy exercising your Chi!! :-)

Regards

--Boundary-00=_ZQdsAI7mxACFxYK
Content-Type: text/plain; charset="us-ascii"; name="Qemu-openMosix.doc.txt"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="Qemu-openMosix.doc.txt"

OpenMosix Dojo (version 1.0. Copyright of Mulyadi Santosa)

Qemu and OpenMosix: The Internal Power of Virtualization

Trying your first adventure in the clustering arena? Then maybe you
began to gather some old PCs from your garage, borrow PCs from your
friends, or even sneak into your neighbour's home trying to "pick"
their PC?
:-) All just because "oh boy, I just have one PC... and I want to play
with openMosix for a while, but I have no more PCs...".

Or maybe you are a brave spirit trying to "conquer" openMosix, so you
install openMosix on 4 PCs and then "booommmm", you get nasty
segfaults. Then someone suggests that you download and try a new
version of the openMosix patch... now it's time for another leg and
hand sport, moving around between PCs to update the kernel. Well, LTSP
might help, but maybe it's not a good idea.

So, it's time to gather your strength. If you know Chi practice in
kung fu, now we do the same for your lonely PC... :-) If you have ever
heard of tools like VMWare, Bochs, Xen, Plex86, User Mode Linux or the
rest of the gang, then it is time to meet Qemu
(http://fabrice.bellard.free.fr/qemu/). Grab the source tarball at
http://fabrice.bellard.free.fr/qemu/qemu-0.5.5.tar.gz; this is the
latest version (0.5.5) at the time of writing.

Unpack the tarball (using tar -xzvf). Now, before you do the actual
"make", apply the following patch:

---------------------CUT Start of the Patch-----------------------
--- ./before-diff/sdl.c 2004-05-18 10:33:05.000000000 +0700
+++ ./sdl.c 2004-05-18 10:40:55.000000000 +0700
@@ -130,6 +130,7 @@
 static void sdl_process_key(SDL_KeyboardEvent *ev)
 {
     int keycode, v;
+    static int modif;
 
     /* XXX: not portable, but avoids complicated mappings */
     keycode = ev->keysym.scancode;
@@ -150,6 +151,78 @@
     } else {
         keycode = 0;
     }
+    /* Adjust shift-key states when leaving window */
+
+    if (ev->keysym.scancode == 0) {
+        if ((modif ^ ev->keysym.mod) & KMOD_LSHIFT)
+            kbd_put_keycode(0x2a | (modif & KMOD_LSHIFT ? 0x80 : 0));
+        if ((modif ^ ev->keysym.mod) & KMOD_RSHIFT)
+            kbd_put_keycode(0x36 | (modif & KMOD_RSHIFT ? 0x80 : 0));
+        if ((modif ^ ev->keysym.mod) & KMOD_LCTRL)
+            kbd_put_keycode(0x1d | (modif & KMOD_LCTRL ? 0x80 : 0));
+        if ((modif ^ ev->keysym.mod) & KMOD_RCTRL) {
+            kbd_put_keycode(0xe0 );
+            kbd_put_keycode(0x1d | (modif & KMOD_RCTRL ? 0x80 : 0));
+        }
+        if ((modif ^ ev->keysym.mod) & KMOD_LALT)
+            kbd_put_keycode(0x38 | (modif & KMOD_LALT ? 0x80 : 0));
+        if ((modif ^ ev->keysym.mod) & KMOD_RALT) {
+            kbd_put_keycode(0xe0 );
+            kbd_put_keycode(0x38 | (modif & KMOD_RALT ? 0x80 : 0));
+        }
+        modif = ev->keysym.mod;
+    }
+
+    /* remember shift-key state */
+
+    switch (keycode) {
+    case 0x2a:                          /* Left Shift */
+        if (ev->type == SDL_KEYUP)
+            modif &= ~KMOD_LSHIFT;
+        else
+            modif |= KMOD_LSHIFT;
+        break;
+    case 0x36:                          /* Right Shift */
+        if (ev->type == SDL_KEYUP)
+            modif &= ~KMOD_RSHIFT;
+        else
+            modif |= KMOD_RSHIFT;
+        break;
+    case 0x1d:                          /* Left CTRL */
+        if (ev->type == SDL_KEYUP)
+            modif &= ~KMOD_LCTRL;
+        else
+            modif |= KMOD_LCTRL;
+        break;
+    case 0x1de0:                        /* Right CTRL */
+        if (ev->type == SDL_KEYUP)
+            modif &= ~KMOD_RCTRL;
+        else
+            modif |= KMOD_RCTRL;
+        break;
+    case 0x38:                          /* Left ALT */
+        if (ev->type == SDL_KEYUP)
+            modif &= ~KMOD_LALT;
+        else
+            modif |= KMOD_LALT;
+        break;
+    case 0x38e0:                        /* Right ALT */
+        if (ev->type == SDL_KEYUP)
+            modif &= ~KMOD_RALT;
+        else
+            modif |= KMOD_RALT;
+        break;
+    case 0x45:                          /* Num Lock */
+        kbd_put_keycode(0x45);
+        kbd_put_keycode(0xc5);
+        return;
+    case 0x3a:                          /* Caps Lock */
+        kbd_put_keycode(0x3a);
+        kbd_put_keycode(0xba);
+        return;
+
+    }
+
 
     /* now send the key code */
     while (keycode != 0) {
---------------CUT End of Patch------------------------------

Basically this is a patch that fixes a keyboard problem in the SDL
graphic output. The patch is adjusted for SDL-1.2.5-3 on Redhat 9, so
feel free to adjust it for your distro/setup. Did I mention SDL? Yes,
you need to install the SDL and SDL devel packages if you want
graphical output (it is heavily recommended... at least from my point
of view).

Now, do the usual mantra.
I assume that you will install into /usr/local/qemu:

    # ./configure --prefix=/usr/local/qemu/
    # make && make install

Now we are ready to build the disk image. You can think of the disk
image as a virtual hard drive for Qemu. I assume you want to create
the disk image inside /mnt/qemu:

    dd of=/mnt/qemu/myimage bs=1M seek=700 count=0

The above command is an example of creating a 700 MB empty image: it
copies no data (count=0) and only moves the end of the file to 700
blocks of 1 MB. You can set another size by changing the "seek" and
"bs" parameters. See "man dd" for the complete reference.

Export the tmpfs directory (/mytmpfs, the one mounted further below)
in the QEMU_TMPDIR environment variable:

    export QEMU_TMPDIR=/mytmpfs

After that, pick your Linux CD or ISO image and run the following
command (from now on, please adjust the actual path to the qemu and
qemu-fast binaries yourself):

    # qemu -hda /mnt/qemu/myimage -cdrom /dev/cdrom -boot d -m 64

This is relatively easy to understand: it tells qemu to boot from the
CD-ROM (-boot d) with 64 MB of RAM (-m 64) and to attach the disk
image as the first hard disk, so you can start the installation.
-cdrom takes a CD device or an image file, so give it the path to your
ISO file if you install from one. A couple of weeks ago I installed
Debian 3.0 woody inside the disk image because I think it is
relatively stable and compact. You can pick another distro of your
flavour... just remember to give it enough room, because so far I
don't know how to resize the disk image :-)

Just install Linux as usual and don't forget to set up a swap
partition. So, when you finish installing Linux, the disk image should
contain the root partition and the swap.

The things you need to include are gcc/glibc, shells (of course, who
can live without them ;-) ), automake/autoconf, tar and gzip/gunzip.

After finishing the Linux installation, quit Qemu first; now we move
on to the openMosix kernel compilation.
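A side note on the disk image step above: on filesystems that support sparse files, the dd command with count=0 does not really take the whole image size from your disk until the guest writes to it. Below is a minimal sketch to see this on a small 7 MB file. The make_image helper and the demo file name are made up for illustration (they are not part of Qemu), and if=/dev/null is an addition here so that dd never waits on standard input.

```shell
#!/bin/sh
# Hypothetical helper (not part of Qemu): create an empty raw image.
#   $1 = path of the image file, $2 = size in megabytes
make_image() {
    # count=0 copies no data; seek only moves the end of the output file,
    # so the result is a file of the requested size with no data written.
    dd if=/dev/null of="$1" bs=1048576 seek="$2" count=0 2>/dev/null
}

# Demo on a small throwaway file instead of the real 700 MB image.
img="${TMPDIR:-/tmp}/demo-image.$$"
make_image "$img" 7

# Apparent size in bytes (what the guest will see as disk capacity).
size=$(wc -c < "$img" | tr -d ' ')
echo "apparent size: $size bytes"

# Space actually allocated on the host, in kilobytes.
du -k "$img"

rm -f "$img"
```

The first line printed should report 7340032 bytes (7 x 1048576), while the du figure depends on the host filesystem. For the real image you would call make_image /mnt/qemu/myimage 700.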
Apply the patch below to your openMosix-patched kernel to make it
compatible with qemu-fast:

----------------CUT Start of patch------------------------
diff -Naur ./linux/arch/i386/vmlinux.lds ./linux-qemu/arch/i386/vmlinux.lds
--- ./linux/arch/i386/vmlinux.lds       2002-02-26 02:37:53.000000000 +0700
+++ ./linux-qemu/arch/i386/vmlinux.lds  2004-05-17 17:15:37.000000000 +0700
@@ -6,7 +6,7 @@
 ENTRY(_start)
 SECTIONS
 {
-  . = 0xC0000000 + 0x100000;
+  . = 0x90000000 + 0x100000;
   _text = .;                   /* Text and read-only data */
   .text : {
         *(.text)
diff -Naur ./linux/include/asm-i386/page.h ./linux-qemu/include/asm-i386/page.h
--- ./linux/include/asm-i386/page.h     2004-05-14 12:26:48.000000000 +0700
+++ ./linux-qemu/include/asm-i386/page.h        2004-05-17 17:14:50.000000000 +0700
@@ -78,7 +78,7 @@
  * and CONFIG_HIGHMEM64G options in the kernel configuration.
  */
 
-#define __PAGE_OFFSET          (0xC0000000)
+#define __PAGE_OFFSET          (0x90000000)
 
 /*
  * This much address space is reserved for vmalloc() and iomap()
-------------------------CUT end of patch---------------------------

This patch changes the kernel page offset from 0xC0000000 to
0x90000000, both in the linker script and in page.h, so that the
kernel becomes compatible with qemu-fast.

Why do we need qemu-fast? Why not use plain Qemu? The answer is
(copied from the Qemu documentation):

"qemu-fast uses the host Memory Management Unit (MMU) to simulate the
x86 MMU. It is fast but has limitations because the whole 4 GB address
space cannot be used and some memory mapped peripherials cannot be
emulated accurately yet"

In other words, qemu-fast doesn't simulate the MMU in software;
instead it uses the host's MMU... should be faster, right? Yes, the
guest cannot use the whole 4 GB address space, but who wants 4 GB just
for a simulation?
:-) It should be fine for the general case, AFAIK.

In the kernel configuration, remember to build the NE2000 drivers into
the kernel (not as modules). You can find them under "Network device
support" --> "Ethernet (10 or 100Mbit)":

    CONFIG_NE2000=y
    CONFIG_NE2K_PCI=y

I am not sure which one is actually needed for Qemu, but adding both
won't hurt :-) Enable other options if you think you will need them.

Do the usual kernel compilation, and move the finished bzImage (I
prefer bzImage; the final type of kernel is up to you), vmlinux and
System.map to a directory. If you have modules, we will move them into
the disk image later. Let's assume you move them to /boot/qemu.

Oh, BTW, it is also a good idea to set up a tmpfs-mounted directory
for Qemu's needs. Here I create a 1 gigabyte tmpfs:

    mount -t tmpfs -o size=1G tmpfs /mytmpfs/

Now we need to test-drive the kernel. Put the following command in a
shell script:

    /usr/local/qemu/bin/qemu-fast -hda /mnt/qemu/myimage -hdb /dev/hda \
        -kernel /boot/qemu/bzImage \
        -append "root=/dev/hda1 ide3=noprobe ide4=noprobe ide5=noprobe"

The above script assumes that you created the root filesystem on the
first partition of the disk image; that's why the root parameter is
"/dev/hda1". And what is "-hdb /dev/hda"? Well, we need to copy
several files from the host system, so we make the host's disk visible
inside Qemu as the guest's second hard disk :-) If your layout is
different, again feel free to modify the parameters.

You get the login prompt? Congratulations! Now log in and make sure
you have the following lines in the output of "dmesg":

    NE*000 ethercard probe at 0x300: 52 54 00 12 34 56
    eth0: NE2000 found at 0x300, using IRQ 9.

These lines indicate that the kernel successfully detected the
emulated NE2000 card. So far I have had no problem with the "fake"
NE2000, so the only trick is... just make sure you include NE2000
support.

We are halfway there. Now we move the kernel modules... how? By
mounting the fake "/dev/hdb" inside Qemu.
e.g.:

    mount -t ext2 /dev/hdb1 /mnt/host

There... you can access the host filesystem (adjust the partition and
filesystem type to your host's layout; it is safer to add "-o ro",
since the host is using that disk at the same time). Now copy the
/lib/modules directory straight into the disk image. After that,
"halt" the guest system and restart qemu using the above script. You
should find that "the missing modules" are now loaded successfully.

openMosix needs userland tools, right? Same as above: transfer the
userland tools tarball (use version 0.3.5) into the guest and compile
it there. This way we make sure that it is compiled against the
correct gcc/glibc. Oh wait, you need the oM kernel headers? Mount the
host filesystem and create a soft link from the openMosix kernel
source to /usr/src/linux-openmosix, and then the compilation will go
smoothly.

I will skip the oM-specific settings; just refer to the openMosix
HOWTO for how to set up /etc/inittab, the maps, etc. Also remember to
set up the IP address for eth0 (on Debian, you can have it brought up
at start-up using /etc/network/interfaces).

Again, shut down the guest system. Now we move on to setting up the
second node. "What, doing the above steps again? You gotta be kidding,
right? I need a faster way!" OK, relax :-) that's why we will create a
COW (Copy On Write) image. What is it? You can think of it as a way of
sharing the original disk image between Qemu instances, where each
instance keeps its own copy of whatever it modifies. The original
image stays safe...

Let's create two COWs (yes COW, but not the cows which produce milk,
OK? :-))) ):

    # qemu-mkcow -f /mnt/qemu/myimage /mnt/qemu/mycow1.cow
    # qemu-mkcow -f /mnt/qemu/myimage /mnt/qemu/mycow2.cow

Not so difficult, right? After that, create a script for enabling the
TUN/TAP device. "Wait wait... TUN/TAP, why do I need it?" Well,
TUN/TAP is a virtual network device that acts as the link between the
guest system and its host.
So, if you don't turn it on, there is no "network connection" between
the guest and its host. Here is an example of the script:

    #!/bin/sh
    sudo /sbin/ifconfig $1 192.168.1.11 netmask 255.255.255.0

Modify the above script for the TUN/TAP of each guest, and remember
not to assign the same IP to another TUN/TAP or to a guest.

I suggest putting the TUN/TAP devices and the interfaces inside the
guests on a different subnet from the host. I use this trick so I
won't mess a lot with the host's routing table. For this "Dojo" I use
the following topology:

                       host (10.1.1.1)
                      /               \
                     /                 \
                    /                   \
    1st TUN/TAP (192.168.1.11)   2nd TUN/TAP (192.168.1.12)
                |                           |
                |                           |
    1st guest (192.168.1.21)     2nd guest (192.168.1.22)

Got a brighter picture from the above diagram? I hope so... :-) So,
back to the TUN script; you should write two scripts (and make both of
them executable).

For the 1st TUN/TAP (name it /mnt/qemu/qemu-ifup):

    #!/bin/sh
    sudo /sbin/ifconfig $1 192.168.1.11 netmask 255.255.255.0

For the 2nd TUN/TAP (name it /mnt/qemu/qemu-ifup2):

    #!/bin/sh
    sudo /sbin/ifconfig $1 192.168.1.12 netmask 255.255.255.0

eth0 inside the 1st guest: 192.168.1.21
eth0 inside the 2nd guest: 192.168.1.22

I use the above IP numbering so I can quickly remind myself of the
topology (x.x.x.x1 is for the 1st group, x.x.x.x2 for the 2nd group).
You don't have to follow my idea :-)

Because you already have two COWs, modify the qemu start script so
that it becomes (run it from /mnt/qemu, or give the full path to the
ifup script):

(for the 1st guest)

    /usr/local/qemu/bin/qemu-fast -hda /mnt/qemu/mycow1.cow \
        -macaddr 52:54:00:12:34:56 \
        -kernel /boot/qemu/bzImage -n ./qemu-ifup \
        -append "root=/dev/hda1 ide3=noprobe ide4=noprobe ide5=noprobe"

(for the 2nd guest)

    /usr/local/qemu/bin/qemu-fast -hda /mnt/qemu/mycow2.cow \
        -macaddr 52:54:00:12:34:60 \
        -kernel /boot/qemu/bzImage -n ./qemu-ifup2 \
        -append "root=/dev/hda1 ide3=noprobe ide4=noprobe ide5=noprobe"

"I am noticing the -macaddr switch... Why do we need it?"
The answer: if you don't set it explicitly, you will get the same MAC
address on both guest systems... and that would confuse the ARP
resolution mechanism of TCP/IP. So we need to set a distinct MAC
address for each guest. Still confused about what I am talking about?
Go to the RFC about ARP (RFC 826) or a TCP/IP reference and read about
the IP to MAC address resolution mechanism.

Now fire up both qemu instances and watch them load the openMosix
kernel until you get the login prompt. Back on the host system, we now
need to set up a bridge connecting these 2 guests. "Oh boy, another
pain is coming... when will it stop?" :))) Remember, a dojo is the
place to practice, not a place for instant skill like Neo getting his
kung fu inside the Matrix :-) The quote "No pain, no gain" must be
followed here :-)))

OK, back to the bridge. You can think of a bridge as "a hub connecting
any target network interfaces". Copy the following script to set up a
bridge between tun0 and tun1 (let's name it start-bridge.sh):

    #!/bin/bash
    /sbin/modprobe bridge
    /sbin/route del -net 192.168.1.0 netmask 255.255.255.0
    /sbin/route del -net 192.168.1.0 netmask 255.255.255.0
    /usr/sbin/brctl addbr br0
    /usr/sbin/brctl addif br0 tun0
    /usr/sbin/brctl addif br0 tun1
    /sbin/ifconfig br0 192.168.1.13 netmask 255.255.255.0
    /sbin/ifconfig tun0 0.0.0.0
    /sbin/ifconfig tun1 0.0.0.0

Basically, the default kernel on Redhat 9 (the one I use for this
experiment) comes with bridging capability as a module (bridge.o). If
you can't find it, recompile the kernel and make sure you include
this:

    CONFIG_BRIDGE=m   (as module) --> preferred

or

    CONFIG_BRIDGE=y   (built into the kernel)

You can find it under "Networking options". It is named "802.1d
Ethernet Bridging".

Why do we need to do "route del" (twice, once per TUN/TAP device)?
Well, remember that we previously brought up the TUN/TAP devices? On
recent Linux kernels, assigning an IP address to a device with
"ifconfig" automatically sets up a route for that address's subnet.
So, basically we clean them up because we don't need them!

The next lines are about setting up the bridge itself. You need to
install the bridge-utils RPM (RH 9 includes this tool). If your distro
doesn't include it, go to http://www.math.leidenuniv.nl/~buytenh/bridge
and grab the tarball there. Actually, what I am going to explain is a
short version of the Bridge Mini HOWTO; you can find more about
bridging on www.tldp.org by searching for "bridge". Many distributions
also ship this document.

Let's analyze the commands:

    /usr/sbin/brctl addbr br0

--> here we create a new bridge interface named "br0"

    /usr/sbin/brctl addif br0 tun0
    /usr/sbin/brctl addif br0 tun1

--> here we "bond" tun0 and tun1 so they are attached "inside" the
bridge

    /sbin/ifconfig br0 192.168.1.13 netmask 255.255.255.0

--> as you know, this assigns an IP address and netmask to the bridge.
You still need to make sure that the bridge is on the same subnet as
the guests...

    /sbin/ifconfig tun0 0.0.0.0
    /sbin/ifconfig tun1 0.0.0.0

--> easy, just assign the IP 0.0.0.0 to the TUNs (but do not bring
them down, I repeat, DO NOT bring the TUNs down) :-)

The topology becomes:

                      host (10.1.1.1/24)
                     /                  \
                    /                    \
                   /                      \
         T H E   B R I D G E   (192.168.1.13/24)
                |                            |
                |                            |
    1st TUN/TAP (0.0.0.0)          2nd TUN/TAP (0.0.0.0)
                |                            |
                |                            |
    1st guest (192.168.1.21/24)    2nd guest (192.168.1.22/24)

Now try to ping from guest 1 to guest 2 and vice versa... success? Now
start openMosix (just copy the openMosix start/stop script from the
openMosix userland tarball) and confirm that "mosmon" sees all the
nodes!

Let me state something before we go further.
Something inside Qemu screws up openMosix's auto detection of the
system's speed, so make sure you include this line in your openMosix
startup script:

    mosctl setspeed 15000

Feel free to adjust the number, but make sure you set the same number
on all guest systems; if not, you will get weird load levelling
behaviour... believe me... :-) It took me 2 days just to find out
about this "speed" thing, when I saw that openMosix didn't load
balance my program :-)))

After that, try migration between the guests... this won't be an
openMosix cluster if it can't migrate processes, right? :-) Just
compile a simple C program like the one below:

    int main(void)
    {
        int a = 0, b = 0;

        /* empty nested loops: pure CPU burning */
        for (a = 0; a <= 1000000; a++)
            for (b = 0; b <= 1000000; b++) {
            }
        return 0;
    }

Suppose you name it "silly.c"; then compile it as "silly" (without
optimization, otherwise gcc may remove the empty loops) and run silly
in the background (add "&"). 2 instances of "silly" are sufficient for
a start.

Success? Then congratulations... you have exercised your Chi to the
highest level :-)

Have fun with your new virtual cluster!

Reference:
- Qemu user documentation and technical documentation
- openMosix HOWTO
- Ethernet Bridge mini HOWTO
- Documentation/filesystems/tmpfs.txt inside the kernel source
  directory

--Boundary-00=_ZQdsAI7mxACFxYK--