Re: Kernel setup() and initrd problems

public inbox for linux-kernel@vger.kernel.org

* Re: Kernel setup() and initrd problems
@ 2003-03-13  8:42 Oliver Tennert
  2003-03-13 17:16 ` Kai Germaschewski
  0 siblings, 1 reply; 9+ messages in thread
From: Oliver Tennert @ 2003-03-13  8:42 UTC (permalink / raw)
  To: linux-kernel

Seems some more have problems, too. It is possibly related.

> pivot_root() is the currently preferred method. Depending on where
> the initramfs is by the time Linux 2.6 comes out it may be replaced by
> then, but for 2.4, pivot_root() is the way to go.

OK, I am pretty aware of the fact that it is the DOCUMENTED way to go. But
can you tell me ONE SINGLE current distribution using the pivot_root()
call in their initrd to mount the realrootdev?

Have a look at the following linuxrc script:

<SHELLSCRIPT>

#! /bin/sh

export PATH=/bin

echo "Loading module sym53c8xx  ..."
insmod sym53c8xx

echo "Loading module jbd  ..."
insmod jbd

echo "Loading module ext3  ..."
insmod ext3

mount -n -t proc proc /proc
echo 0x0100 > /proc/sys/kernel/real-root-dev   ## <<<---- THIS LINE IS IMPORTANT!!
mount -n -t ext3 /dev/sda4 /mnt
rm -f /mnt/.initrd 2>/dev/null
mkdir -p /mnt/.initrd
cd /mnt
pivot_root . .initrd
umount -n /.initrd/proc
exec sh -c 'umount -n /.initrd ; rmdir /.initrd ; mount -n -oremount,ro /' </dev/console >/dev/console 2>&1

</SHELLSCRIPT>

The fact is, without the "echo 0x0100 ..." line this linuxrc script WILL
NOT be able to mount your root device for kernel >=2.4.19. This is
independent of the distribution used.

So why is that?

I always thought the pivot_root() would make this echo-stuff unnecessary.

The way used by virtually all latest distributions is getting rid of the
pivot stuff altogether, leaving the loading of the modules, and that's it.

Seems totally unclear (and unclean) to me.

best regards

Oliver Tennert

		   Dr. Oliver Tennert

  		   +49 -7071 -9457-598

 		   e-mail: O.Tennert@science-computing.de
  		   science + computing AG
  		   Hagellocher Weg 71
   		   D-72070 Tuebingen


* Re: Kernel setup() and initrd problems
  2003-03-13  8:42 Kernel setup() and initrd problems Oliver Tennert
@ 2003-03-13 17:16 ` Kai Germaschewski
  2003-03-13 18:05   ` Kevin P. Fleming
  2003-03-14 19:12   ` H. Peter Anvin
  0 siblings, 2 replies; 9+ messages in thread
From: Kai Germaschewski @ 2003-03-13 17:16 UTC (permalink / raw)
  To: Oliver Tennert; +Cc: linux-kernel

On Thu, 13 Mar 2003, Oliver Tennert wrote:

> Seems some more have problems, too. It is possibly related.
> 
> > pivot_root() is the currently preferred method. Depending on where
> > the initramfs is by the time Linux 2.6 comes out it may be replaced by
> > then, but for 2.4, pivot_root() is the way to go.
> 
> OK, I am pretty aware of the fact that it is the DOCUMENTED way to go. But
> can you tell me ONE SINGLE current distribution using the pivot_root()
> call in their initrd to mount the realrootdev?

Well, the script you attached shows one distro which does:

> [...]
> echo 0x0100 > /proc/sys/kernel/real-root-dev   ## <<<---- THIS LINE IS IMPORTANT!!
> mount -n -t ext3 /dev/sda4 /mnt
> rm -f /mnt/.initrd 2>/dev/null
> mkdir -p /mnt/.initrd
> cd /mnt
> pivot_root . .initrd
> umount -n /.initrd/proc
> exec sh -c 'umount -n /.initrd ; rmdir /.initrd ; mount -n -oremount,ro /' </dev/console >/dev/console 2>&1
> 
> </SHELLSCRIPT>
> 
> The fact is, without the "echo 0x0100 ..." line this linuxrc script WILL
> NOT be able to mount your root device for kernel >=2.4.19. This is
> independent of the distribution used.
> 
> So why is that?
> 
> I always thought the pivot_root() would make this echo-stuff unnecessary.
> 
> The way used by virtually all latest distributions is getting rid of the
> pivot stuff altogether, leaving the loading of the modules, and that's it.
> 
> Seems totally unclear (and unclean) to me.

I agree it's a mess and for all I can tell pivot_root is not used in the 
way it was originally designed.

Since I cleaned up the initrd stuff in 2.5 lately, I can at least explain
what's going on:

The kernel finds an initrd, loads that into /dev/ram, mounts that as root 
and then basically fork()s and execve()s /linuxrc. So we're still using 
the special initrd code, and the /linuxrc which is running has a pid != 1. 
At this point, we can do preparations like loading modules and mounting 
filesystems. As shown by the script above, we can also change the root 
filesystem. However, we can not finish the script by just exec'ing 
/sbin/init in the new root, since we're not pid 1. So instead, we need to 
exit from the script (actually, the above first exec's and then the new 
shell exits soon after).
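
A minimal sketch of that distinction, for illustration only: the helper
name handoff_mode is hypothetical, and the PID is passed in as an argument
so the logic can be tried outside of early boot.

```sh
#!/bin/sh
# Hypothetical helper: decide how a boot script should hand off control,
# based on the PID it runs as (a real script would pass "$$").
handoff_mode() {
    if [ "$1" -eq 1 ]; then
        # Running as init: it can exec the real /sbin/init directly.
        echo "exec-init"
    else
        # Started as /linuxrc: it must exit so the kernel can continue.
        echo "exit-to-kernel"
    fi
}
```

A classic /linuxrc would see "exit-to-kernel" from handoff_mode "$$", which
is why the script above cannot simply exec /sbin/init on the new root.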

Now kernel code takes control again, the "after-initrd-code" is run. 
Traditionally, that means we now mount the real root device, free the 
initrd mem (and also all __init mem a bit later), and then execve() 
/sbin/init, which then gets run as PID 1, and normal startup begins.

However, in the case above, we have already mounted the root device, so we 
don't want the kernel to mess with it. So we do echo 0x0100 > 
/proc/sys/kernel/real-root-dev, which tells the kernel that what it thinks 
is our current root, /dev/ram, is the real root too, so it skips unmounting 
the initrd root and mounting the real one.
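
To see why 0x0100 means /dev/ram, the value can be split into major and
minor numbers. This is only a sketch, assuming the 16-bit device numbers of
2.4-era kernels (high byte major, low byte minor); decode_rdev is a
hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: split an old-style 16-bit device number into
# "major:minor". Assumes high byte = major, low byte = minor.
decode_rdev() {
    dev=$(( $1 ))    # shell arithmetic accepts the 0x prefix
    echo "$(( (dev >> 8) & 0xff )):$(( dev & 0xff ))"
}

decode_rdev 0x0100   # prints 1:0 (RAM disk major 1, minor 0)
decode_rdev 0x0804   # prints 8:4 (SCSI disk major 8, minor 4, i.e. sda4)
```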

I think whoever came up with that just got the idea of pivot_root wrong. 
The idea was to get rid of the initrd special case. It should be possible 
to do the following, though I didn't work out the details: 

Tell the kernel that our root dev is /dev/ram and give it an initrd which 
isn't really a classical initrd (with /linuxrc on it), but instead has a 
/sbin/init which is similar to the linuxrc above.

Then, the kernel will load the image into /dev/ram, mount that as root and 
exec /sbin/init, skipping the special initrd code.

Now, we have to take care of all the remaining business in /sbin/init 
ourselves, i.e.

- load modules
- mount real root
- pivot root to real root
- execve /sbin/init on real root, pointing stdin/out/err to /dev/console 
  on the new root
- umount and free our first (ramdisk) root
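
The steps above could be sketched roughly as follows. This is not a tested
boot script: the module names, device and filesystem type are simply reused
from the linuxrc earlier in the thread, the run() wrapper and the DRY_RUN
variable are illustrative additions, and freeing the ramdisk memory after
the umount is left out.

```sh
#!/bin/sh
# Sketch of an initrd /sbin/init that pivots to the real root itself.

# Illustrative wrapper: print the command instead of running it when
# DRY_RUN=1, so the sequence can be inspected on a normal system.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

switch_to_real_root() {
    root_dev=$1
    root_fs=$2

    # load the modules needed to reach the real root
    for mod in sym53c8xx jbd ext3; do
        run insmod "$mod"
    done

    # mount the real root
    run mount -n -t "$root_fs" "$root_dev" /mnt

    # pivot: /mnt becomes /, the ramdisk ends up under /.initrd
    run mkdir -p /mnt/.initrd
    run cd /mnt
    run pivot_root . .initrd

    # exec the real init with the console taken from the new root; the
    # intermediate shell unmounts the old ramdisk root first
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "exec chroot . sh -c 'umount -n /.initrd; exec /sbin/init' <dev/console >dev/console 2>&1"
    else
        exec chroot . sh -c 'umount -n /.initrd; exec /sbin/init' <dev/console >dev/console 2>&1
    fi
}
```

With DRY_RUN=1, calling switch_to_real_root /dev/sda4 ext3 only prints the
commands in order. Because everything ends in exec, the real init keeps
PID 1 when this runs as /sbin/init on the initrd.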

--Kai


* Re: Kernel setup() and initrd problems
  2003-03-13 17:16 ` Kai Germaschewski
@ 2003-03-13 18:05   ` Kevin P. Fleming
  2003-03-14 19:12   ` H. Peter Anvin
  1 sibling, 0 replies; 9+ messages in thread
From: Kevin P. Fleming @ 2003-03-13 18:05 UTC (permalink / raw)
  To: Kai Germaschewski; +Cc: Oliver Tennert, linux-kernel

Kai Germaschewski wrote:
> I think whoever came up with that just got the idea of pivot_root wrong. 
> The idea was to get rid of the initrd special case. It should be possible 
> to do the following, though I didn't work out the details: 
> 
> Tell the kernel that our root dev is /dev/ram and give it an initrd which 
> isn't really a classical initrd (with /linuxrc on it), but instead has a 
> /sbin/init which is similar to the linuxrc above.
> 
> Then, the kernel will load the image into /dev/ram, mount that as root and 
> exec /sbin/init, skipping the special initrd code.
> 
> Now, we have to take care of all the remaining business in /sbin/init 
> ourselves, i.e.
> 
> - load modules
> - mount real root
> - pivot root to real root
> - execve /sbin/init on real root, pointing stdin/out/err to /dev/console 
>   on the new root
> - umount and free our first (ramdisk) root

I have used exactly this process, and it works as you expect. In this 
situation you're not really using the initrd as a "classic" initrd, it's 
just a temporary root filesystem. The kernel has no idea what the real 
root is going to be, and that determination isn't made until the 
initrd's scripts decide what to mount and then pivot_root to it.

Much cleaner than the old way.



* Re: Kernel setup() and initrd problems
  2003-03-13 17:16 ` Kai Germaschewski
  2003-03-13 18:05   ` Kevin P. Fleming
@ 2003-03-14 19:12   ` H. Peter Anvin
  2003-03-14 19:27     ` Chris Friesen
  1 sibling, 1 reply; 9+ messages in thread
From: H. Peter Anvin @ 2003-03-14 19:12 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <Pine.LNX.4.44.0303131051160.7342-100000@chaos.physics.uiowa.edu>
By author:    Kai Germaschewski <kai@tp1.ruhr-uni-bochum.de>
In newsgroup: linux.dev.kernel
> 
> I think whoever came up with that just got the idea of pivot_root wrong. 
> The idea was to get rid of the initrd special case. It should be possible 
> to do the following, though I didn't work out the details: 
> 
> Tell the kernel that our root dev is /dev/ram and give it an initrd which 
> isn't really a classical initrd (with /linuxrc on it), but instead has a 
> /sbin/init which is similar to the linuxrc above.
> 

It *is* possible, but you need to pass "root=/dev/ram0" to the kernel,
for backwards compatibility reasons.  That will incidentally make it
run /sbin/init, not /linuxrc, unless you pass init=/linuxrc as well.
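
As an illustration, an initrd script could check whether it was booted this
way by looking at the kernel command line. The helper name root_is_ramdisk
is hypothetical, and keeping the last root= argument seen is an assumption
of this sketch.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the given kernel command line sets
# root=/dev/ram0. Assumption: the last root= argument is the one that counts.
root_is_ramdisk() {
    root=
    for arg in $1; do    # unquoted on purpose: split on whitespace
        case "$arg" in
            root=*) root=${arg#root=} ;;
        esac
    done
    [ "$root" = /dev/ram0 ]
}
```

On a running system this could be called as
root_is_ramdisk "$(cat /proc/cmdline)".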

See SuperRescue for an example of working use of pivot_root.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64


* Re: Kernel setup() and initrd problems
  2003-03-14 19:12   ` H. Peter Anvin
@ 2003-03-14 19:27     ` Chris Friesen
  2003-03-14 19:43       ` H. Peter Anvin
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Friesen @ 2003-03-14 19:27 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel, kai

H. Peter Anvin wrote:
> Followup to:  <Pine.LNX.4.44.0303131051160.7342-100000@chaos.physics.uiowa.edu>
> By author:    Kai Germaschewski <kai@tp1.ruhr-uni-bochum.de>

>>I think whoever came up with that just got the idea of pivot_root wrong. 
>>The idea was to get rid of the initrd special case. It should be possible 
>>to do the following, though I didn't work out the details: 
>>
>>Tell the kernel that our root dev is /dev/ram and give it an initrd which 
>>isn't really a classical initrd (with /linuxrc on it), but instead has a 
>>/sbin/init which is similar to the linuxrc above.

> It *is* possible, but you need to pass "root=/dev/ram0" to the kernel,
> for backwards compatibility reasons.  That will incidentally make it
> run /sbin/init, not /linuxrc, unless you pass init=/linuxrc as well.


Below is the script that I used to pivot from a standard ramdisk (for which
the infrastructure is already in place in our build environment) to a tmpfs
filesystem.  This requires no changes to the boot args.

This script runs as /sbin/init, sets up the tmpfs filesystem, pivots, and
hands off control to the real init program.

One interesting bit is the rework of fstab so that mount and df show the
root filesystem as tmpfs.  freeramdisk simply tells the kernel that it's
okay to give up the space used by the ramdisk.

This seems to work fine, though it isn't actually in production yet, just
a private prototype.

Chris






#!/bin/bash
# Set up tmpfs filesystem as root and pivot into it.

echo "Setting up tmpfs..."

#mount the tmpfs filesystem
mount -t tmpfs tmpfs mnt -o size=37M

#copy the initrd into the tmpfs filesystem
cp -a `ls -1 | grep -v mnt | grep -v proc | grep -v lost+found` mnt

#change dirs and make some directories
cd mnt
mkdir proc mnt

#pivot the filesystems
/sbin/pivot_root . mnt

#set up the /etc/fstab file to have / listed as tmpfs
#this will be used instead of the ramdisk line
echo "tmpfs             /               tmpfs   rw,size=37M     0 0" > etc/fstab.new

#grab the rest of /etc/fstab after the first entry since we want to keep the info
tail -n +2 etc/fstab >> etc/fstab.new
mv etc/fstab.new etc/fstab

# remove this script, move the real init to the right place, and run it
mv /sbin/init.orig /sbin/init

#unmount the original ramdisk, free the memory, and run the real init
exec /sbin/chroot . sh -c 'umount /mnt; /sbin/freeramdisk; exec /sbin/init' <dev/console >dev/console 2>&1


-- 
Chris Friesen                    | MailStop: 043/33/F10
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com



* Re: Kernel setup() and initrd problems
  2003-03-14 19:27     ` Chris Friesen
@ 2003-03-14 19:43       ` H. Peter Anvin
  2003-03-14 20:04         ` Chris Friesen
  0 siblings, 1 reply; 9+ messages in thread
From: H. Peter Anvin @ 2003-03-14 19:43 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linux-kernel, kai

Chris Friesen wrote:
> 
> Below is the script that I used to pivot from a standard ramdisk (for which
> the infrastructure is already in place in our build environment) to a tmpfs
> filesystem.  This requires no changes to the boot args.
> 
> This script runs as /sbin/init, sets up the tmpfs filesystem, pivots, and
> hands off control to the real init program.
> 

... which means that you either have boot args or rdev so that /dev/ram0 
is the root filesystem (or it wouldn't work.)

	-hpa




* Re: Kernel setup() and initrd problems
  2003-03-14 19:43       ` H. Peter Anvin
@ 2003-03-14 20:04         ` Chris Friesen
  2003-03-14 20:42           ` H. Peter Anvin
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Friesen @ 2003-03-14 20:04 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel, kai

H. Peter Anvin wrote:
> Chris Friesen wrote:
> 
>>
>> Below is the script that I used to pivot from a standard ramdisk (for 
>> which
>> the infrastructure is already in place in our build environment) to a 
>> tmpfs
>> filesystem.  This requires no changes to the boot args.

> ... which means that you either have boot args or rdev so that /dev/ram0 
> is the root filesystem (or it wouldn't work.)

Yes, but after the pivot, /dev/ram0 isn't the real filesystem; it's tmpfs
mounted at /.  Isn't that what the original poster was talking about,
where the root on the final running system is not the same as what the
machine was booted with?

Maybe I'm just confused.


Chris

-- 
Chris Friesen                    | MailStop: 043/33/F10
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com



* Re: Kernel setup() and initrd problems
  2003-03-14 20:04         ` Chris Friesen
@ 2003-03-14 20:42           ` H. Peter Anvin
  2003-03-14 20:53             ` Chris Friesen
  0 siblings, 1 reply; 9+ messages in thread
From: H. Peter Anvin @ 2003-03-14 20:42 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linux-kernel, kai

Chris Friesen wrote:
> H. Peter Anvin wrote:
> 
>> Chris Friesen wrote:
>>
>>>
>>> Below is the script that I used to pivot from a standard ramdisk (for
>>> which
>>> the infrastructure is already in place in our build environment) to a
>>> tmpfs
>>> filesystem.  This requires no changes to the boot args.
> 
> 
>> ... which means that you either have boot args or rdev so that
>> /dev/ram0 is the root filesystem (or it wouldn't work.)
> 
> 
> Yes, but after the pivot, /dev/ram0 isn't the real filesystem; it's tmpfs
> mounted at /.  Isn't that what the original poster was talking about,
> where the root on the final running system is not the same as what the
> machine was booted with?
> 
> Maybe I'm just confused.
> 

I think so.

The fundamental problem is that the original initrd protocol considered
the initrd to be something different than "a real root", and its init
(linuxrc) to be something different than "a real init."

With pivot_root, all of that is historical baggage, and worse - it gets
in the way.

The way to get around the historical baggage is to tell the kernel that
the initrd is a "permanent" initrd by using the "root=/dev/ram0"
command-line option.  This has the side effect of bypassing all the
initrd historical crap and instead spawning /sbin/init using PID 1, like
any other system would do.  Now you can just pivot and "exec /sbin/init"
like you should be.

Of course, after the pivot_root, the root is something completely
different than the root= command-line option states, but that's
irrelevant.  The command-line option is there to disable the initrd
historical garbage, not for any other purpose.

	-hpa


* Re: Kernel setup() and initrd problems
  2003-03-14 20:42           ` H. Peter Anvin
@ 2003-03-14 20:53             ` Chris Friesen
  0 siblings, 0 replies; 9+ messages in thread
From: Chris Friesen @ 2003-03-14 20:53 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel, kai

H. Peter Anvin wrote:
> Chris Friesen wrote:

>>Maybe I'm just confused.

> I think so.
>
> The way to get around the historical baggage is to tell the kernel that
> the initrd is a "permanent" initrd by using the "root=/dev/ram0"
> command-line option.  This has the side effect of bypassing all the
> initrd historical crap and instead spawning /sbin/init using PID 1, like
> any other system would do.  Now you can just pivot and "exec /sbin/init"
> like you should be.

Thanks for that excellent explanation.

Chris

-- 
Chris Friesen                    | MailStop: 043/33/F10
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com


