* Linux MD Raid Bug(?) w/Kernel sync_speed_min Option
@ 2007-05-08 12:27 Justin Piszcz
2007-05-08 13:03 ` Neil Brown
0 siblings, 1 reply; 11+ messages in thread
From: Justin Piszcz @ 2007-05-08 12:27 UTC (permalink / raw)
To: linux-raid
Kernel: 2.6.21.1
Here is the bug:
md2: RAID1 (works fine)
md3: RAID5 (only syncs at the sync_speed_min set by the kernel)
If I do not run this command:
echo 55000 > /sys/block/md3/md/sync_speed_min
I will get 2 megabytes per second check speed for RAID 5.
However, the odd part is that I can leave it at the default for RAID1 and
it will use the maximum I/O available between both drives to run the check.
I think there is some kind of bug, essentially with RAID5 checks: the
check only runs at the minimum value set (the default in the kernel for
RAID5 is ~2 MB/s).
md2 : active raid1 sdb3[1] sda3[0]
      55681216 blocks [2/2] [UU]
      [===========>.........]  check = 59.1% (32937536/55681216) finish=7.4min speed=50947K/sec
md3 : active raid5 sdl1[9] sdk1[8] sdj1[7] sdi1[6] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1] sdc1[0]
      1318686336 blocks level 5, 128k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
      [====>................]  check = 24.2% (35578816/146520704) finish=33.3min speed=55464K/sec
Set back to the default kernel setting, either 2000 or 2100:
echo 2000 > /sys/block/md3/md/sync_speed_min
Then,
md3 : active raid5 sdl1[9] sdk1[8] sdj1[7] sdi1[6] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1] sdc1[0]
      1318686336 blocks level 5, 128k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
      [======>..............]  check = 31.5% (46191744/146520704) finish=715.7min speed=2335K/sec
There is some kind of nasty bug going on here with RAID 5 devices in the
kernel. Also, in case you wondered, there is little to no I/O on the RAID
5 device when this check is being run, same for the root volume.
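
A minimal way to watch this behaviour from the command line, as a rough
sketch (assuming md3 is the RAID5 array; sync_action, sync_speed and
sync_speed_min are the standard md sysfs attributes):

  cat /sys/block/md3/md/sync_speed_min        # current floor, e.g. "1000 (system)"
  echo check > /sys/block/md3/md/sync_action  # start a scrub
  watch -n5 'cat /sys/block/md3/md/sync_speed; grep -A 2 "^md3" /proc/mdstat'

The md2 array can be watched the same way for comparison.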
Justin.
^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Linux MD Raid Bug(?) w/Kernel sync_speed_min Option
2007-05-08 12:27 Linux MD Raid Bug(?) w/Kernel sync_speed_min Option Justin Piszcz
@ 2007-05-08 13:03 ` Neil Brown
2007-05-08 13:13 ` Justin Piszcz
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Neil Brown @ 2007-05-08 13:03 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-raid

On Tuesday May 8, jpiszcz@lucidpixels.com wrote:
> Kernel: 2.6.21.1
>
> Here is the bug:
>
> md2: RAID1 (works fine)
> md3: RAID5 (only syncs at the sync_speed_min set by the kernel)
>
> If I do not run this command:
> echo 55000 > /sys/block/md3/md/sync_speed_min
>
> I will get 2 megabytes per second check speed for RAID 5.

I can only reproduce this if I set the stripe_cache_size somewhat
larger than the default of 256 - did you do this?

This code (is_mddev_idle) has always been a bit fragile, particularly
so since the block layer started accounting IO when it finished rather
than when it started.

This patch might help though.  Let me know if it does what you expect.

Thanks,
NeilBrown


Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/md.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2007-05-07 17:47:15.000000000 +1000
+++ ./drivers/md/md.c	2007-05-08 22:57:51.000000000 +1000
@@ -5095,7 +5095,7 @@ static int is_mddev_idle(mddev_t *mddev)
 		 *
 		 * Note: the following is an unsigned comparison.
 		 */
-		if ((curr_events - rdev->last_events + 4096) > 8192) {
+		if ((long)curr_events - (long)rdev->last_events > 8192) {
 			rdev->last_events = curr_events;
 			idle = 0;
 		}

^ permalink raw reply	[flat|nested] 11+ messages in thread
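A rough sketch of how to verify the two points above (whether the stripe
cache was enlarged, and whether a patched kernel still throttles a check
when sync_speed_min is left alone), using the same md3 device:

  cat /sys/block/md3/md/stripe_cache_size   # default is 256
  cat /sys/block/md3/md/sync_speed_min      # leave this at the default for the test
  echo check > /sys/block/md3/md/sync_action
  cat /sys/block/md3/md/sync_speed          # with the fix this should track idle disk bandwidth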
* Re: Linux MD Raid Bug(?) w/Kernel sync_speed_min Option
2007-05-08 13:03 ` Neil Brown
@ 2007-05-08 13:13 ` Justin Piszcz
2007-05-08 13:24 ` Justin Piszcz
2007-05-08 17:24 ` Recovery of software RAID5 using FC6 rescue? Mark A. O'Neil
2 siblings, 0 replies; 11+ messages in thread
From: Justin Piszcz @ 2007-05-08 13:13 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid

On Tue, 8 May 2007, Neil Brown wrote:

> On Tuesday May 8, jpiszcz@lucidpixels.com wrote:
>> Kernel: 2.6.21.1
>>
>> Here is the bug:
>>
>> md2: RAID1 (works fine)
>> md3: RAID5 (only syncs at the sync_speed_min set by the kernel)
>>
>> If I do not run this command:
>> echo 55000 > /sys/block/md3/md/sync_speed_min
>>
>> I will get 2 megabytes per second check speed for RAID 5.
>
> I can only reproduce this if I set the stripe_cache_size somewhat
> larger than the default of 256 - did you do this?
>
> This code (is_mddev_idle) has always been a bit fragile, particularly
> so since the block layer started accounting IO when it finished rather
> than when it started.
>
> This patch might help though.  Let me know if it does what you expect.
>
> Thanks,
> NeilBrown
>
>
> Signed-off-by: Neil Brown <neilb@suse.de>
>
> ### Diffstat output
>  ./drivers/md/md.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff .prev/drivers/md/md.c ./drivers/md/md.c
> --- .prev/drivers/md/md.c	2007-05-07 17:47:15.000000000 +1000
> +++ ./drivers/md/md.c	2007-05-08 22:57:51.000000000 +1000
> @@ -5095,7 +5095,7 @@ static int is_mddev_idle(mddev_t *mddev)
>  		 *
>  		 * Note: the following is an unsigned comparison.
>  		 */
> -		if ((curr_events - rdev->last_events + 4096) > 8192) {
> +		if ((long)curr_events - (long)rdev->last_events > 8192) {
> 			rdev->last_events = curr_events;
> 			idle = 0;
> 		}
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

> I can only reproduce this if I set the stripe_cache_size somewhat
> larger than the default of 256 - did you do this?

Yes, upon bootup I use:
echo 16384 > /sys/block/md3/md/stripe_cache_size

I have applied this patch and will test it now.

Justin.

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: Linux MD Raid Bug(?) w/Kernel sync_speed_min Option
2007-05-08 13:03 ` Neil Brown
2007-05-08 13:13 ` Justin Piszcz
@ 2007-05-08 13:24 ` Justin Piszcz
2007-05-09  9:13 ` Neil Brown
2007-05-08 17:24 ` Recovery of software RAID5 using FC6 rescue? Mark A. O'Neil
2 siblings, 1 reply; 11+ messages in thread
From: Justin Piszcz @ 2007-05-08 13:24 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid

On Tue, 8 May 2007, Neil Brown wrote:

> On Tuesday May 8, jpiszcz@lucidpixels.com wrote:
>
> This patch might help though.  Let me know if it does what you expect.
>
> Thanks,
> NeilBrown
>
>
> Signed-off-by: Neil Brown <neilb@suse.de>
>
> ### Diffstat output
>  ./drivers/md/md.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff .prev/drivers/md/md.c ./drivers/md/md.c
> --- .prev/drivers/md/md.c	2007-05-07 17:47:15.000000000 +1000
> +++ ./drivers/md/md.c	2007-05-08 22:57:51.000000000 +1000
> @@ -5095,7 +5095,7 @@ static int is_mddev_idle(mddev_t *mddev)
>  		 *
>  		 * Note: the following is an unsigned comparison.
>  		 */
> -		if ((curr_events - rdev->last_events + 4096) > 8192) {
> +		if ((long)curr_events - (long)rdev->last_events > 8192) {
> 			rdev->last_events = curr_events;
> 			idle = 0;
> 		}
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Neil, awesome patch -- what are the chances of it getting merged into
2.6.22?

md3 : active raid5 sdl1[9] sdk1[8] sdj1[7] sdi1[6] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1] sdc1[0]
      1318686336 blocks level 5, 128k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
      [>....................]  check =  0.5% (854084/146520704) finish=42.6min speed=56938K/sec

md0 : active raid1 sdb1[1] sda1[0]
      16787776 blocks [2/2] [UU]
      [=>...................]  check =  7.5% (1265984/16787776) finish=3.6min speed=70332K/sec

$ cat /sys/block/md2/md/sync_speed_min
1000 (system)
$ cat /sys/block/md3/md/sync_speed_min
1000 (system)

Working as advertised (utilizing all idle I/O)!

Justin.

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: Linux MD Raid Bug(?) w/Kernel sync_speed_min Option
2007-05-08 13:24 ` Justin Piszcz
@ 2007-05-09  9:13 ` Neil Brown
0 siblings, 0 replies; 11+ messages in thread
From: Neil Brown @ 2007-05-09 9:13 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-raid

On Tuesday May 8, jpiszcz@lucidpixels.com wrote:
>
> Neil, awesome patch -- what are the chances of it getting merged into
> 2.6.22?
>

Probably.  I want to think it through a bit more - to make sure I can
write a coherent and correct changelog entry.

NeilBrown

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Recovery of software RAID5 using FC6 rescue?
2007-05-08 13:03 ` Neil Brown
2007-05-08 13:13 ` Justin Piszcz
2007-05-08 13:24 ` Justin Piszcz
@ 2007-05-08 17:24 ` Mark A. O'Neil
2007-05-08 20:04 ` Michael Tokarev
2 siblings, 1 reply; 11+ messages in thread
From: Mark A. O'Neil @ 2007-05-08 17:24 UTC (permalink / raw)
To: linux-raid

Hello,

I hope this is the appropriate forum for this request; if not, please
direct me to the correct one.

I have a system running FC6, 2.6.20-1.2925, software RAID5, and a power
outage seems to have borked the file structure on the RAID.

Boot shows the following disks:
sda #first disk in raid5: 250GB
sdb #the boot disk: 80GB
sdc #second disk in raid5: 250GB
sdd #third disk in raid5: 250GB
sde #fourth disk in raid5: 250GB

When I boot the system the kernel panics with the following info displayed:
...
ata1.00: cd c8/00:08:e6:3e:13/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: (BMDMA stat 0x25)
ata1.00: cd c8/00:08:e6:3e:13/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
EXT3-fs error (device sda3) ext_get_inode_loc: unable to read inode block
-inode=8, block=1027
EXT3-fs: invalid journal inode
mount: error mounting /dev/root on /sysroot as ext3: invalid argument
setuproot: moving /dev failed: no such file or directory
setuproot: error mounting /proc: no such file or directory
setuproot: error mounting /sys: no such file or directory
switchroot: mount failed: no such file or directory
Kernel panic - not syncing: attempted to kill init!

At which point the system locks, as expected.

Another perhaps unrelated tidbit: when viewing sda1 using (I think - I
did not write down the command) mdadm --misc --examine device, I see (in
part) data describing the device in the array:

sda1 raid 4, total 4, active 4, working 4
and then a listing of disks sdc1, sdd1, sde1, all of which show

viewing the remaining disks in the list shows:
sdX1 raid 4, total 3, active 3, working 3
and then a listing of the disks, with the first disk being shown as removed.

It seems that the other disks do not have a reference to sda1? That in
itself is perplexing to me, but I vaguely recall seeing that before - it
has been a while since I set the system up.

Anyway, I think the ext3-fs error is less an issue with the software
RAID and more an issue that fsck could fix.  My problem is how to
non-destructively mount the RAID from the rescue disk so that I can run
fsck on the RAID.  I do not think mounting and running fsck on the
individual disks is the correct solution.

Some straightforward instructions (or a pointer to some) on doing this
from the rescue prompt would be most useful.  I have been searching the
last couple of evenings and have yet to find something I completely
understand.  I have little experience with software RAID and mdadm, and
while this is an excellent opportunity to learn a bit (and I am), I would
like to successfully recover my data in a more timely fashion rather
than mess it up beyond recovery as the result of a dolt interpretation
of a man page.  The applications and data themselves are replaceable -
just time consuming, as in days, rather than what I hope, with proper
instruction, will amount to an evening or two worth of work to mount the
RAID and run fsck.

I appreciate your time and any assistance you may be able to provide.
If the above is not sufficient let me know and I will try to get at
more info.

regards and thank you,

-m

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: Recovery of software RAID5 using FC6 rescue?
2007-05-08 17:24 ` Recovery of software RAID5 using FC6 rescue? Mark A. O'Neil
@ 2007-05-08 20:04 ` Michael Tokarev
2007-05-09  6:29 ` Nix
0 siblings, 1 reply; 11+ messages in thread
From: Michael Tokarev @ 2007-05-08 20:04 UTC (permalink / raw)
To: Mark A. O'Neil; +Cc: linux-raid

Mark A. O'Neil wrote:
> Hello,
>
> I hope this is the appropriate forum for this request; if not, please
> direct me to the correct one.
>
> I have a system running FC6, 2.6.20-1.2925, software RAID5, and a power
> outage seems to have borked the file structure on the RAID.
>
> Boot shows the following disks:
> sda #first disk in raid5: 250GB
> sdb #the boot disk: 80GB
> sdc #second disk in raid5: 250GB
> sdd #third disk in raid5: 250GB
> sde #fourth disk in raid5: 250GB
>
> When I boot the system the kernel panics with the following info displayed:
> ...
> ata1.00: cd c8/00:08:e6:3e:13/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
> exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> ata1.00: (BMDMA stat 0x25)
> ata1.00: cd c8/00:08:e6:3e:13/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
> EXT3-fs error (device sda3) ext_get_inode_loc: unable to read inode block
> -inode=8, block=1027
> EXT3-fs: invalid journal inode
> mount: error mounting /dev/root on /sysroot as ext3: invalid argument
> setuproot: moving /dev failed: no such file or directory
> setuproot: error mounting /proc: no such file or directory
> setuproot: error mounting /sys: no such file or directory
> switchroot: mount failed: no such file or directory
> Kernel panic - not syncing: attempted to kill init!

Wug.

> At which point the system locks, as expected.
>
> Another perhaps unrelated tidbit: when viewing sda1 using (I think - I
> did not write down the command) mdadm --misc --examine device, I see (in
> part) data describing the device in the array:
>
> sda1 raid 4, total 4, active 4, working 4
> and then a listing of disks sdc1, sdd1, sde1, all of which show
>
> viewing the remaining disks in the list shows:
> sdX1 raid 4, total 3, active 3, working 3

You sure it's raid4, not raid5?  Because if it really is raid4, but
before you had a raid5 array, you're screwed, and the only way to
recover is to re-create the array (without losing data), re-writing
the superblocks (see below).

BTW, --misc can be omitted - you only need mdadm -E /dev/sda1

> and then a listing of the disks, with the first disk being shown as removed.
> It seems that the other disks do not have a reference to sda1? That in
> itself is perplexing to me, but I vaguely recall seeing that before - it
> has been a while since I set the system up.

Check UUID values on all drives (also from mdadm -E output) - should be
the same.  And compare the "Events" field in there too.

Maybe you had a 4-disk array before, but later re-created it to be
3 disks?  Another possible cause is disk failures resulting in bad
superblock reads, but that's highly unlikely.

> Anyway, I think the ext3-fs error is less an issue with the software
> RAID and more an issue that fsck could fix.  My problem is how to
> non-destructively mount the RAID from the rescue disk so that I can run
> fsck on the RAID.  I do not think mounting and running fsck on the
> individual disks is the correct solution.
>
> Some straightforward instructions (or a pointer to some) on doing this
> from the rescue prompt would be most useful.  I have been searching the
> last couple of evenings and have yet to find something I completely
> understand.  I have little experience with software RAID and mdadm, and
> while this is an excellent opportunity to learn a bit (and I am), I would
> like to successfully recover my data in a more timely fashion rather
> than mess it up beyond recovery as the result of a dolt interpretation
> of a man page.  The applications and data themselves are replaceable -
> just time consuming, as in days, rather than what I hope, with proper
> instruction, will amount to an evening or two worth of work to mount the
> RAID and run fsck.

Not sure about pointers.  But here are some points.

Figure out which arrays/disks you really had.  The raid level and
number of drives are really important.

Now two "mantras":

  mdadm --assemble /dev/md0 /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1

This will try to bring the array up.  It will either come up ok, or it
will fail due to event count mismatches (more than 1 difference).  In
case you have more than 1 mismatch, you can try adding the --force
option, to tell mdadm to ignore mismatches and try the best it can.
The array won't resync; it will be started from the "best" (n-1) drives.

If there's a drive error, you can omit the bad drive from the command
and assemble a degraded array, but before doing so, see which drives
are more fresh (by examining Event counts in mdadm -E output).  If one
of the remaining drives has a (much) lower event count than the rest,
while the bad one is (more or less) good, there's a good chance the
filesystem is bad (unrecoverable).  This happens if the lower-events
drive was kicked off the array (for whatever reason) long before your
last disaster happened, and hence it contains very old data, and you
have very few chances to recover without the bad drive.

And another mantra, which can be helpful if assemble doesn't work for
some reason:

  mdadm --create /dev/md0 --level=5 --raid-devices=4 --layout=x --chunk=c \
    --assume-clean \
    /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1

This will re-create the superblocks, but not touch any data inside.
The magic word is --assume-clean - it stops the md subsystem from
starting any resync, assuming the array is already all ok.  For this to
work, you have to have all the parameters correct, including the order
of the component devices.  You can collect that information from your
existing superblocks, and you can experiment with different options
till you see something that looks like a filesystem.

Instead of giving all 4 devices, you can use the literal word "missing"
in place of any of them, like this:

  mdadm --create /dev/md0 --level=5 --raid-devices=4 --layout=x --chunk=c \
    /dev/sda1 missing /dev/sdd1 /dev/sde1

(no need to specify --assume-clean as there's nothing to resync on a
degraded array).  With the same note: you still have to specify all the
correct parameters (if you didn't specify chunk size and layout when
initially creating the array, you can omit them here as well, since
mdadm will pick the same defaults).

And finally, when everything looks ok, you can add the missing drive
by using

  mdadm --add /dev/md0 /dev/sdX1

(where sdX1 is the missing drive).  Or, in case of re-creating the
superblocks with --create --assume-clean, you probably should start a
repair on the array,

  echo repair > /sys/block/md0/md/sync_action

-- but I bet it will not work this way, i.e. such a build will not be
satisfactory.

And oh, in case you need to re-create the array (the 2nd "mantra"), you
probably will have to rebuild your initial ramdisk too.  Depending on
the way your initrd is built, it may use the UUID to find the parts of
the array, which will be rewritten.
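
One possible way to collect the superblock fields mentioned above (UUID,
event counts, level, device count) from all members in one go - a rough
sketch, adjust the device names to match your system:

  for d in /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1; do
      echo "== $d"
      mdadm -E "$d" | egrep 'UUID|Events|Raid Level|Raid Devices'
  done

Members whose Events counts differ by more than a small amount, or whose
UUIDs differ, are the ones to be suspicious of.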
One additional note.  You may have a hard time with ext3fs trying to
forcibly replay the journal while experimenting with different options.
It's a sad thing, but if ext3 isn't unmounted correctly, it insists on
replaying the journal and refuses to work (even fsck) without that.
But while trying different combinations to find the best set to work
with, writing to the array is a no-no.  To ensure it doesn't happen,
you can start the array read-only;

  echo 1 > /sys/module/md_mod/parameters/start_ro

will help here.  But I'm not sure if ext3 fsck will be able to do
anything with a read-only device...

BTW, for such recovery purposes, I use an initrd (initramfs really, but
that does not matter) with a normal (but tiny) set of commands inside,
thanks to busybox.  So everything can be done without any help from an
external "recovery CD".  Very handy at times, especially since all the
network drivers are there on the initramfs too, so I can even start a
netcat server while in initramfs and perform recovery from a remote
system... ;)

Good luck!

/mjt

^ permalink raw reply	[flat|nested] 11+ messages in thread
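Putting the read-only advice together with the assemble mantra, a
non-destructive first pass might look like the following sketch (device
names taken from the original report; fsck.ext3 -n only reports problems
and changes nothing):

  echo 1 > /sys/module/md_mod/parameters/start_ro   # new arrays stay read-only until first write
  mdadm --assemble /dev/md0 /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1
  fsck.ext3 -n /dev/md0                             # report-only check of the filesystem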
* Re: Recovery of software RAID5 using FC6 rescue?
2007-05-08 20:04 ` Michael Tokarev
@ 2007-05-09  6:29 ` Nix
2007-05-09 11:34 ` Michael Tokarev
0 siblings, 1 reply; 11+ messages in thread
From: Nix @ 2007-05-09 6:29 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Mark A. O'Neil, linux-raid

On 8 May 2007, Michael Tokarev told this:
> BTW, for such recovery purposes, I use an initrd (initramfs really, but
> that does not matter) with a normal (but tiny) set of commands inside,
> thanks to busybox.  So everything can be done without any help from an
> external "recovery CD".  Very handy at times, especially since all the
> network drivers are there on the initramfs too, so I can even start a
> netcat server while in initramfs and perform recovery from a remote
> system... ;)

What you should probably do is drop into the shell that's being used to
run init if mount fails (or, more generally, if after mount runs it
hasn't ended up mounting anything: there's no need to rely on mount's
success/failure status).

e.g. from my initramfs's init script (obviously this is not runnable
as is due to all the variables, but it should get the idea across):

if [ -n $root ]; then
  /bin/mount -o $OPTS -t $TYPE $ROOT /new-root
fi

if /bin/mountpoint /new-root >/dev/null; then :; else
  echo "No root filesystem given to the kernel or found on the root RAID array."
  echo "Append the correct 'root=', 'root-type=', and/or 'root-options='"
  echo "boot options."
  echo
  echo "Dropping to a minimal shell. Reboot with Ctrl-Alt-Delete."
  exec /bin/sh
fi

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: Recovery of software RAID5 using FC6 rescue?
2007-05-09  6:29 ` Nix
@ 2007-05-09 11:34 ` Michael Tokarev
2007-05-09 19:50 ` Nix
0 siblings, 1 reply; 11+ messages in thread
From: Michael Tokarev @ 2007-05-09 11:34 UTC (permalink / raw)
To: Nix; +Cc: Mark A. O'Neil, linux-raid

Nix wrote:
> On 8 May 2007, Michael Tokarev told this:
>> BTW, for such recovery purposes, I use an initrd (initramfs really, but
>> that does not matter) with a normal (but tiny) set of commands inside,
>> thanks to busybox.  So everything can be done without any help from an
>> external "recovery CD".
>
> What you should probably do is drop into the shell that's being used to
> run init if mount fails (or, more generally, if after mount runs it

That's exactly what my initscript does ;)

chk() {
  while ! "$@"; do
    warn "the following command failed:"
    warn "$*"
    p="** Continue(Ignore)/Shell/Retry (C/s/r)? "
    while : ; do
      if ! read -t 10 -p "$p" x 2>&1; then
        echo "(timeout, continuing)"
        return 1
      fi
      case "$x" in
        [Ss!]*) /bin/sh 2>&1 ;;
        [Rr]*) break;;
        [CcIi]*|"") return 1;;
        *) echo "(unrecognized response)";;
      esac
    done
  done
}

chk mount -n -t proc proc /proc
chk mount -n -t sysfs sysfs /sys
...
info "mounting $rootfstype fs on $root (options: $rootflags)"
chk mount -n -t $rootfstype -o $rootflags $root /root
if [ $? != 0 ] && ! grep -q "^[^ ]\\+ /root " /proc/mounts; then
  warn "root filesystem ($rootfstype on $root) is NOT mounted!"
fi
...

> hasn't ended up mounting anything: there's no need to rely on mount's
> success/failure status). [...]

Well, so far exitcode has been reliable.

/mjt

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: Recovery of software RAID5 using FC6 rescue?
2007-05-09 11:34 ` Michael Tokarev
@ 2007-05-09 19:50 ` Nix
2007-05-16 16:10 ` Mark A. O'Neil
0 siblings, 1 reply; 11+ messages in thread
From: Nix @ 2007-05-09 19:50 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Mark A. O'Neil, linux-raid

On 9 May 2007, Michael Tokarev spake thusly:
> Nix wrote:
>> On 8 May 2007, Michael Tokarev told this:
>>> BTW, for such recovery purposes, I use an initrd (initramfs really, but
>>> that does not matter) with a normal (but tiny) set of commands inside,
>>> thanks to busybox.  So everything can be done without any help from an
>>> external "recovery CD".
>>
>> What you should probably do is drop into the shell that's being used to
>> run init if mount fails (or, more generally, if after mount runs it
>
> That's exactly what my initscript does ;)

I thought so.  I was really talking to Mark, I suppose.

> chk() {
>   while ! "$@"; do
>     warn "the following command failed:"
>     warn "$*"
>     p="** Continue(Ignore)/Shell/Retry (C/s/r)? "

Wow.  Feature-rich :))  I may reuse this rather nifty stuff.

>> hasn't ended up mounting anything: there's no need to rely on mount's
>> success/failure status). [...]
>
> Well, so far exitcode has been reliable.

I guess I was being paranoid because I'm using busybox, and at various
times the exit codes of its internal commands have been...
unimplemented or unreliable.

-- 
`In the future, company names will be a 32-character hex string.'
  --- Bruce Schneier on the shortage of company names

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: Recovery of software RAID5 using FC6 rescue?
2007-05-09 19:50 ` Nix
@ 2007-05-16 16:10 ` Mark A. O'Neil
0 siblings, 0 replies; 11+ messages in thread
From: Mark A. O'Neil @ 2007-05-16 16:10 UTC (permalink / raw)
To: linux-raid

I want to thank everyone for their suggestions.  After much fiddling
about I eventually pushed things beyond repair, so I had to start from
scratch after all - no big deal, I had a backup, so that is good.

So I took the opportunity to play a bit with mdadm (adding, removing,
repairing, etc.) and I think a crisis will be averted should a similar
problem arise in the future.

regards,

-m

^ permalink raw reply	[flat|nested] 11+ messages in thread
end of thread, other threads:[~2007-05-16 16:10 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-05-08 12:27 Linux MD Raid Bug(?) w/Kernel sync_speed_min Option Justin Piszcz
2007-05-08 13:03 ` Neil Brown
2007-05-08 13:13   ` Justin Piszcz
2007-05-08 13:24   ` Justin Piszcz
2007-05-09  9:13     ` Neil Brown
2007-05-08 17:24   ` Recovery of software RAID5 using FC6 rescue? Mark A. O'Neil
2007-05-08 20:04     ` Michael Tokarev
2007-05-09  6:29       ` Nix
2007-05-09 11:34         ` Michael Tokarev
2007-05-09 19:50           ` Nix
2007-05-16 16:10             ` Mark A. O'Neil
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).