linux-raid.vger.kernel.org archive mirror
From: Bill Davidsen <davidsen@tmr.com>
To: Doug Ledford <dledford@redhat.com>
Cc: Neil Brown <neilb@suse.de>, David Greaves <david@dgreaves.com>,
	Jeff Garzik <jeff@garzik.org>, John Stoffel <john@stoffel.org>,
	Justin Piszcz <jpiszcz@lucidpixels.com>,
	linux-raid@vger.kernel.org
Subject: Re: Time to deprecate old RAID formats?
Date: Sun, 28 Oct 2007 20:44:06 -0400
Message-ID: <47252CD6.9010804@tmr.com>
In-Reply-To: <1193530713.10336.389.camel@firewall.xsintricity.com>

Doug Ledford wrote:
> On Sat, 2007-10-27 at 11:20 -0400, Bill Davidsen wrote:
>   
>>> * When using lilo to boot from a raid device, it automatically installs
>>> itself to the mbr, not to the partition.  This can not be changed.  Only
>>> 0.90 and 1.0 superblock types are supported because lilo doesn't
>>> understand the offset to the beginning of the fs otherwise.
>>>   
>>>       
>> I'm reasonably sure that's wrong, I used to set up dual boot machines by 
>> putting LILO in the partition and making that the boot partition, by 
>> changing the active partition flag I could just have the machine boot 
>> Windows, to keep people from getting confused.
>>     
>
> Yeah, someone else pointed this out too.  The original patch to lilo
> *did* do as I suggest, so they must have improved on the patch later.
>
>   
>>> * When using grub to boot from a raid device, only 0.90 and 1.0
>>> superblocks are supported[1] (because grub is ignorant of the raid and
>>> it requires the fs to start at the start of the partition).  You can use
>>> either MBR or partition based installs of grub.  However, partition
>>> based installs require that all bootable partitions be in exactly the
>>> same logical block address across all devices.  This limitation can be
>>> an extremely hazardous limitation in the event a drive dies and you have
>>> to replace it with a new drive as newer drives may not share the older
>>> drive's geometry and will require starting your boot partition in an odd
>>> location to make the logical block addresses match.
>>>
>>> * When using grub2, there is supposedly already support for raid/lvm
>>> devices.  However, I do not know if this includes version 1.0, 1.1, or
>>> 1.2 superblocks.  I intend to find that out today.  If you tell grub2 to
>>> install to an md device, it searches out all constituent devices and
>>> installs to the MBR on each device[2].  This can't be changed (at least
>>> right now, probably not ever though).
>>>   
>>>       
>> That sounds like a good reason to avoid grub2, frankly. Software which 
>> decides that it knows what to do better than the user isn't my 
>> preference. If I wanted software which forces me to do things "their way" 
>> I'd be running Windows.
>>     
>
> It's not really all that unreasonable of a restriction.  Most people
> aren't aware that when you put a boot sector at the beginning of a
> partition, you only have 512 bytes of space, so the boot loader that you
> put there is basically nothing more than code to read the remainder of
> the boot loader from the file system space.  Now, traditionally, most
> boot loaders have had to hard code the block addresses of certain key
> components into these second stage boot loaders.  If a user isn't aware
> of the fact that the boot loader does this at install time (or at kernel
> selection update time in the case of lilo), then they aren't aware that
> the files must reside at exactly the same logical block address on all
> devices.  Without that knowledge, they can easily create an unbootable
> setup by having the various boot partitions in slightly different
> locations on the disks.  And intelligent partition editors like parted
> can compound the problem because as they insulate the user from having
> to pick which partition number is used for what partition, etc., they
> can end up placing the various boot partitions in different areas of
> different drives.  The requirement above is a means of making sure that
> users aren't surprised by a non-working setup.  The whole element of
> least surprise thing.  Of course, if they keep that requirement, then I
> would expect it to be well documented so that people know this going
> into putting the boot loader in place, but I would argue that this is at
> least better than finding out when a drive dies that your system isn't
> bootable.
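
The matching-address requirement described above can be checked mechanically. Here is a minimal sketch, assuming you have already collected the starting sector of each boot partition by some other means (for example from a partition table listing). The function name and the device layout are hypothetical and only illustrate the comparison.

```python
from collections import Counter
from typing import Dict


def mismatched_boot_partitions(start_lba_by_device: Dict[str, int]) -> Dict[str, int]:
    """Return the devices whose boot partition does not start at the
    most common starting LBA among all devices."""
    if not start_lba_by_device:
        return {}
    # Treat the most frequent start sector as the reference value.
    common_start, _ = Counter(start_lba_by_device.values()).most_common(1)[0]
    return {
        dev: lba
        for dev, lba in start_lba_by_device.items()
        if lba != common_start
    }


# Hypothetical layout: sdc is a replacement drive partitioned differently.
layout = {"/dev/sda1": 63, "/dev/sdb1": 63, "/dev/sdc1": 2048}
print(mismatched_boot_partitions(layout))
```

An empty result means every boot partition starts at the same logical block address, so a boot sector with hard-coded block addresses would point at the same place on each disk.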
>
>   
>>> So, given the above situations, really, superblock format 1.2 is likely
>>> to never be needed.  None of the shipping boot loaders work with 1.2
>>> regardless, and the boot loader under development won't install to the
>>> partition in the event of an md device and therefore doesn't need that
>>> 4k buffer that 1.2 provides.
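
The reasoning about which superblock formats a raid-unaware boot loader can cope with comes down to where the superblock sits on the member device. A rough sketch follows, assuming a RAID1 member (so each member holds a full copy of the filesystem) and using the usual placement of each format. The table and function names are illustrative and are not part of mdadm or any boot loader.

```python
# Where each md superblock format lives on the member device.
SUPERBLOCK_LOCATION = {
    "0.90": "end",       # near the end of the device
    "1.0": "end",        # near the end of the device
    "1.1": "start",      # at the very start of the device
    "1.2": "start+4k",   # 4 KiB in from the start of the device
}


def fs_starts_at_partition_start(version: str) -> bool:
    """True when array data (and so the filesystem) begins at offset 0
    of the member partition, which is what a boot loader that knows
    nothing about md needs in order to read the filesystem."""
    return SUPERBLOCK_LOCATION[version] == "end"


for version in SUPERBLOCK_LOCATION:
    print(version, fs_starts_at_partition_start(version))
```

Only 0.90 and 1.0 leave the start of the partition untouched, which matches the list of formats the existing boot loaders are said to support.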
>>>   
>>>       
>> Sounds right, although it may have other uses for clever people.
>>     
>>> [1] Grub won't work with either 1.1 or 1.2 superblocks at the moment.  A
>>> person could probably hack it to work, but since grub development has
>>> stopped in preference to the still under development grub2, they won't
>>> take the patches upstream unless they are bug fixes, not new features.
>>>   
>>>       
>> If the patches were available, "doesn't work with existing raid formats" 
>> would probably qualify as a bug.
>>     
>
> Possibly.  I'm a bit overbooked on other work at the moment, but I may
> try to squeeze in some work on grub/grub2 to support version 1.1 or 1.2
> superblocks.
>
>   
>>> [2] There are two ways to install to a master boot record.  The first is
>>> to use the first 512 bytes *only* and hardcode the location of the
>>> remainder of the boot loader into those 512 bytes.  The second way is to
>>> use the free space between the MBR and the start of the first partition
>>> to embed the remainder of the boot loader.  When you point grub2 at an
>>> md device, they automatically only use the second method of boot loader
>>> installation.  This gives them the freedom to be able to modify the
>>> second stage boot loader on a boot disk by boot disk basis.  The
>>> downside to this is that they need lots of room after the MBR and before
>>> the first partition in order to put their core.img file in place.  I
>>> *think*, and I'll know for sure later today, that the core.img file is
>>> generated during grub install from the list of optional modules you
>>> specify during setup.  Eg., the pc module gives partition table support,
>>> the lvm module lvm support, etc.  You list the modules you need, and
>>> grub then builds a core.img out of all those modules.  The normal amount
>>> of space between the MBR and the first partition is (sectors_per_track -
>>> 1).  For the usual 63 sectors per track, that basically leaves 62 sectors, or
>>> 31k of space.  This might not be enough for your particular needs if
>>> you have a complex boot environment.  In that case, you would need to
>>> bump at least the starting track of your first partition to make room
>>> for your boot loader.  Unfortunately, how is a person to know how much
>>> room their setup needs until after they've installed and it's too late
>>> to bump the partition table start?  They can't.  So, that's another
>>> thing I think I will check out today, what the maximum size of grub2
>>> might be with all modules included, and what a common size might be.
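
The (sectors_per_track - 1) arithmetic can be worked through as follows. This is only a sketch, assuming 512-byte sectors and a first partition that starts on the first track boundary. The function names and the core.img size in the example are made up for illustration. A 63-sector track leaves 62 sectors (31 KiB) after the MBR, while a 255-sector track would leave 254 sectors (127 KiB).

```python
SECTOR_SIZE = 512  # bytes, assumed


def post_mbr_gap_bytes(sectors_per_track: int) -> int:
    """Bytes available between the MBR (sector 0) and a first partition
    that starts at sector number `sectors_per_track`."""
    return (sectors_per_track - 1) * SECTOR_SIZE


def core_img_fits(core_img_size: int, sectors_per_track: int) -> bool:
    """Whether an embedded second stage of the given size fits in the gap."""
    return core_img_size <= post_mbr_gap_bytes(sectors_per_track)


print(post_mbr_gap_bytes(63) // 1024)    # gap in KiB for 63 sectors per track
print(post_mbr_gap_bytes(255) // 1024)   # gap in KiB for 255 sectors per track
print(core_img_fits(40 * 1024, 63))      # hypothetical 40 KiB core.img
```

If the check fails for the module set you need, the first partition has to start later on the disk, which is the point about not knowing the required size until after installation.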
>>>
>>>   
>>>       
>> Based on your description, it sounds as if grub2 may not have given 
>> adequate thought to what users other than the authors might need (that 
>> may be a premature conclusion). I have multiple installs on several of 
>> my machines, and I assume that the grub2 for 32 and 64 bit will be 
>> different. Thanks for the research.
>>     
>
> No, not really.  The grub command on the two is different, but they
> actually build the boot sector out of 16 bit non-protected mode code,
> just like DOS.  So either one would build the same boot sector given the
> same config.  And you can always use the same trick I've used in the
> past of creating a large /boot partition (say 250MB) and using that same
> partition as /boot in all of your installs.  Then they share a single
> grub config (while the grub binaries are in the individual / partitions)
> and from the single grub instance you can boot to any of the installs,
> as well as a kernel update in any install updates that global grub
> config.  The other option is to use separate /boot partitions and chain
> load the grub instances, but I find that clunky in comparison.  Of
>   

I just copy a stanza of the 64 bit grub file into the 32 bit grub file, 
and that seems to work okay, the 32 bit boot mounts /mnt/boot64, and the 
64 bit boot mounts /mnt/boot64 so I can just copy the data. I confess 
that the 64 bit stuff has little use recently, nothing I'm doing runs 
appreciably faster, and I know the 32 bit code is more used and 
therefore likely to be better debugged. Note "likely" in that. ;-)

> course, in my case I also made /lib/modules its own partition and also
> shared it between all the installs so that I could manually edit the
> various kernel boot params to specify different root partitions and in
> so doing I could boot a RHEL5 kernel using a RHEL4 install and vice
> versa.  But if you do that, you have to manually
> patch /etc/rc.d/rc.sysinit to mount the /lib/modules partition before
> ever trying to do anything with modules (and you have to mount it rw so
> they can do a depmod if needed), then remount it ro for the fsck, then
> it gets remounted rw again after the fs check.  It was a pain in the ass
> to maintain because every update to initscripts would wipe out the patch
> and if you forgot to repatch the file, the system wouldn't boot and
> you'd have to boot into another install, mount the / partition of the
> broken install, patch the file, then it would work again in that
> install.
>
>   
That sounds like *way* more complexity than appeals to me. I stand in 
awe, but have no urge to join you.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979

