All of lore.kernel.org
 help / color / mirror / Atom feed
From: Felix Zielcke <fzielcke@z-51.de>
To: The development of GRUB 2 <grub-devel@gnu.org>
Subject: Re: [PATCH] show an error instead of segfaulting on grub-probe -t partmap on a unsynced raid
Date: Sat, 02 Aug 2008 21:00:33 +0200	[thread overview]
Message-ID: <1217703633.3488.58.camel@fz.local> (raw)
In-Reply-To: <1217604123.4354.18.camel@fz-deb.local>

Now with trying if raid5 makes problem I looked further into this whole
raid stuff in grub.

Am Freitag, den 01.08.2008, 17:22 +0200 schrieb Felix Zielcke:
> Am Freitag, den 01.08.2008, 15:36 +0200 schrieb Robert Millan:
> 
> > The existing code is confusing.  What is total_devs for?  If we need to
> > iterate up to 32, sounds like this variable is pointless?
> > 
> > If it's useless, it should be removed (but maybe it isn't!).
> > 
> 
> total_devs is a fast check if the array is readable.

total_devs is the number of disks according to the superblock
nr_devs is the number grub found.

The value from the superblock is needed.
nr_devs could be removed but as I already said above it's faster with
it.
If it doestn't exist you have to iterate over all 32 devices and then
check the found number of disks with total_devs to find out if the array
is readable
(grub_is_array_readable in disk/raid.c)

And the raid5 has a check, if one disk in the array is missing and then
an error occurs while reading from a different one then stops reading
further from the whole array.

So I would suggest to just keep both, as long as you don't need the
sizeof(unsigned int) this would save.
But with the super 0.90 format both could be just an unsigned char, it
should never happen that we have more then 28 devices which 0.90
supports and the current code just assumes 32 so there'll be problems
anyway.

> ...

To make that a bit more clear:
mdadm allocates the disk number stored in superblock begining with 0
If you remove the disk with number 0 then the others still keep there
number.
If you add it back, then it gets while it's syncing the first highest
disk number which is free.
So if you make a raid1 with 2 disks then the numbers are 0 and 1
if you remove the one with has 0 and then add it back then it becomes 2 
If it's fully synced then it get's changed back to 0

And the problem in grub is that the disk number in superblock is used
for the array->device[] so there can be holes with missing disks.



I thought about changing grub_raid_scan_device so the disks get added to
a temporary array with the superblock number just as it's currently done
and then copy from that one to the used array->devices[] skipping all
invalid entrys.

This would be fine for Raid 0 and 1
But there's Raid 5 :(

On Raid 5 one disk can be missing and everything can be still read it's
just slower.
The problem is we need every disk which is in the array and not just the
working ones to calculate from which disk shall be read.

So the Raid 1 case can be fixed fine with my previous patch which
shouldn't slow down that much
All 32 devices are just checked on grub-probe
real grub stops when it found a working one which should be for the most
users always a very low number

Maybe I can find a solution for this Raid 5 problem, but I doubt.
So we should probable warn users if their /boot is on a raid 5

Raid 5 worked for me fine with qemu just using 2 of the 3 disk images,
when it was fully synced, the order of them didn't even matter.
But if I removed a disk with mdadm then grub failed with the other 2
used.




      reply	other threads:[~2008-08-02 19:00 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-28 23:05 grub-probe crashes on a linux softraid 1 with one disk beginning to sync Felix Zielcke
2008-07-29 20:20 ` [PATCH] show an error instead of segfaulting on grub-probe -t partmap on a unsynced raid Felix Zielcke
2008-07-30 10:37   ` Robert Millan
2008-07-30 11:39     ` Felix Zielcke
2008-07-30 15:11       ` Felix Zielcke
2008-08-01 13:36         ` Robert Millan
2008-08-01 15:22           ` Felix Zielcke
2008-08-02 19:00             ` Felix Zielcke [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1217703633.3488.58.camel@fz.local \
    --to=fzielcke@z-51.de \
    --cc=grub-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.