linux-raid.vger.kernel.org archive mirror
* Synchronous vs asynchronous mdadm operations
@ 2008-11-28 16:27 Chris Webb
  2008-11-28 16:41 ` Chris Webb
  2008-12-04 10:59 ` Chris Webb
  0 siblings, 2 replies; 4+ messages in thread
From: Chris Webb @ 2008-11-28 16:27 UTC
  To: linux-raid

I notice that some mdadm operations appear to be asynchronous. For instance,

  mdadm --fail /dev/md/shelf.51000 /dev/mapper/slot.51000.1
  mdadm --remove /dev/md/shelf.51000 /dev/mapper/slot.51000.1

will always fail at the --remove stage with

  mdadm: hot remove failed for /dev/mapper/slot.51000.1: Device or resource busy

whereas adding a short sleep in between will make it successful.
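
For example, this succeeds reliably here with an arbitrary pause (the
length needed is guesswork, which is exactly what makes it feel racy):

  mdadm --fail /dev/md/shelf.51000 /dev/mapper/slot.51000.1
  sleep 1
  mdadm --remove /dev/md/shelf.51000 /dev/mapper/slot.51000.1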

Is there a 'standard' way to wait for this operation to complete or to
perform both steps in one go, other than something horrible like:

  mdadm --fail /dev/md/shelf.51000 /dev/mapper/slot.51000.1
  MD=$((`stat -c '%#T' -L /dev/md/shelf.51000`))
  MAJOR=$((`stat -c '%#t' -L /dev/mapper/slot.51000.1`))
  MINOR=$((`stat -c '%#T' -L /dev/mapper/slot.51000.1`))
  for RD in /sys/block/md$MD/md/rd*; do
    [ -f $RD/block/dev ] || continue
    [ "`<$RD/block/dev`" = "$MAJOR:$MINOR" ] || continue
    while [ "< $RD/state" != "faulty ]; do sleep 0.1; done
  done
  mdadm --remove /dev/md/shelf.51000 /dev/mapper/slot.51000.1
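
(The idea being to locate the member's rdN entry in sysfs by its
major:minor and poll the state attribute until the kernel has actually
marked it faulty.)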


Also, is mdadm --stop asynchronous in the same way? If mdadm --stop succeeds
on one host and I immediately run mdadm --assemble on another host which is
able to access the same slots, am I at risk of corrupting the array?
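
One guard that suggests itself is to wait for the array to vanish from
/proc/mdstat before assembling elsewhere. This is only a sketch, and it
only helps if an array absent from /proc/mdstat has genuinely been
released by the kernel, which is precisely the property I'm unsure of:

  MD=$((`stat -c '%#T' -L /dev/md/shelf.51000`))
  mdadm --stop /dev/md/shelf.51000
  # active arrays appear in /proc/mdstat as "mdN : active raid6 ..."
  while grep -q "^md$MD : " /proc/mdstat; do sleep 0.1; done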

The reason for the question is that I'm seeing occasional cases of arrays which
won't reassemble following such an operation. dmesg alleges there is an invalid
superblock for all of the six slots which were originally part of the array:

  md: md126 stopped.
  md: etherd/e24.1 does not have a valid v1.1 superblock, not importing!
  md: md_import_device returned -22
  md: etherd/e24.4 does not have a valid v1.1 superblock, not importing!
  md: md_import_device returned -22
  md: etherd/e24.5 does not have a valid v1.1 superblock, not importing!
  md: md_import_device returned -22
  md: etherd/e24.2 does not have a valid v1.1 superblock, not importing!
  md: md_import_device returned -22
  md: etherd/e24.3 does not have a valid v1.1 superblock, not importing!
  md: md_import_device returned -22
  md: etherd/e24.0 does not have a valid v1.1 superblock, not importing!
  md: md_import_device returned -22

This array had been grown from 258MB slots to 13GB slots on the old host
shortly before we stopped it and tried to reassemble it on a new host, and
mdadm --examine on each of the slots shows a superblock reflecting the old
array size, rather than the new. Presumably there is other corruption too,
which I can't see.

  # mdadm --examine /dev/etherd/e24.3 
  /dev/etherd/e24.3:
            Magic : a92b4efc
          Version : 1.1
      Feature Map : 0x0
       Array UUID : 94de9400:e0cb45f4:36e50a70:184a6875
             Name : 3:shelf.24
    Creation Time : Fri Nov 21 18:22:38 2008
       Raid Level : raid6
     Raid Devices : 6

   Avail Dev Size : 27789808 (13.25 GiB 14.23 GB)
       Array Size : 2107392 (1029.17 MiB 1078.98 MB)
    Used Dev Size : 526848 (257.29 MiB 269.75 MB)
      Data Offset : 16 sectors
     Super Offset : 0 sectors
            State : clean
      Device UUID : d51aaa04:d51a524b:77b766d1:10eb7ec6

      Update Time : Fri Nov 28 13:18:19 2008
         Checksum : 9644dd7f - correct
           Events : 22

       Chunk Size : 4K

      Array Slot : 5 (0, 1, 2, 3, 4, 5)
     Array State : uuuuuU

The event count shown by mdadm --examine matches across all the slots.
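
For reference, the comparison is nothing cleverer than grepping the
--examine output on each slot, along these lines:

  for DEV in /dev/etherd/e24.[0-5]; do
    echo "$DEV: `mdadm --examine $DEV | grep Events`"
  done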


For what it's worth, the underlying aoe devices through which remote slots are
made visible to the old and new hosts should correctly handle synchronous
writes/fsync(). If the sync returns as completed, the written data should
genuinely be visible and consistent from every host which can see the device,
whether locally or remotely. (Obviously if I wasn't respecting fsync()
behaviour at the network block device level, I'd expect all sorts of
consistency problems in moving an array from host to host like this.)

Cheers,

Chris.



* Re: Synchronous vs asynchronous mdadm operations
  2008-11-28 16:27 Synchronous vs asynchronous mdadm operations Chris Webb
@ 2008-11-28 16:41 ` Chris Webb
  2008-12-04 10:59 ` Chris Webb
  1 sibling, 0 replies; 4+ messages in thread
From: Chris Webb @ 2008-11-28 16:41 UTC
  To: linux-raid

Chris Webb <chris@arachsys.com> writes:

> I notice that some mdadm operations appear to be asynchronous. For instance,
> 
>   mdadm --fail /dev/md/shelf.51000 /dev/mapper/slot.51000.1
>   mdadm --remove /dev/md/shelf.51000 /dev/mapper/slot.51000.1
> 
> will always fail at the --remove stage with
> 
>   mdadm: hot remove failed for /dev/mapper/slot.51000.1: Device or resource busy
> 
> whereas adding a short sleep in between will make it successful.
[...]
> Also, is mdadm --stop asynchronous in the same way? If mdadm --stop succeeds
> on one host and I immediately run mdadm --assemble on another host which is
> able to access the same slots, am I at risk of corrupting the array?
> 
> The reason for the question is that I'm seeing occasional cases of arrays which
> won't reassemble following such an operation. dmesg alleges there is an invalid
> superblock for all of the six slots which were originally part of the array:

I should say, both of these were seen with mdadm 2.6.7 and the md driver
from kernel 2.6.27. I notice that Neil released mdadm 2.6.8 while I was
writing my message, including a changelog entry:

  Fix an error when assembling arrays that are in the middle of a reshape. 

Perhaps I've just hit this bug in this case? It would certainly explain why
I'm seeing it so rarely.

Cheers,

Chris.


* Re: Synchronous vs asynchronous mdadm operations
  2008-11-28 16:27 Synchronous vs asynchronous mdadm operations Chris Webb
  2008-11-28 16:41 ` Chris Webb
@ 2008-12-04 10:59 ` Chris Webb
  2008-12-05  4:45   ` Neil Brown
  1 sibling, 1 reply; 4+ messages in thread
From: Chris Webb @ 2008-12-04 10:59 UTC
  To: linux-raid

Chris Webb <chris@arachsys.com> writes:

[Re: mdadm --stop being potentially asynchronous]
> The reason for the question is that I'm seeing occasional cases of arrays which
> won't reassemble following such an operation. dmesg alleges there is an invalid
> superblock for all of the six slots which were originally part of the array.

I tracked this one down to my scripts, which were failing to adjust the
available space on the rdevs in a particularly rare case. However, I'm still
wondering about the best way to do a fail/remove combination, given that
--fail appears to be asynchronous. The shell fragment I give below seems
way over the top, but I can't see any simpler route...

> I notice that some mdadm operations appear to be asynchronous. For instance,
> 
>   mdadm --fail /dev/md/shelf.51000 /dev/mapper/slot.51000.1
>   mdadm --remove /dev/md/shelf.51000 /dev/mapper/slot.51000.1
> 
> will always fail at the --remove stage with
> 
>   mdadm: hot remove failed for /dev/mapper/slot.51000.1: Device or resource busy
> 
> whereas adding a short sleep in between will make it successful.
> 
> Is there a 'standard' way to wait for this operation to complete or to
> perform both steps in one go, other than something horrible like:
> 
>   mdadm --fail /dev/md/shelf.51000 /dev/mapper/slot.51000.1
>   MD=$((`stat -c '%#T' -L /dev/md/shelf.51000`))
>   MAJOR=$((`stat -c '%#t' -L /dev/mapper/slot.51000.1`))
>   MINOR=$((`stat -c '%#T' -L /dev/mapper/slot.51000.1`))
>   for RD in /sys/block/md$MD/md/rd*; do
>     [ -f $RD/block/dev ] || continue
>     [ "`<$RD/block/dev`" = "$MAJOR:$MINOR" ] || continue
>     while [ "< $RD/state" != "faulty ]; do sleep 0.1; done
>   done
>   mdadm --remove /dev/md/shelf.51000 /dev/mapper/slot.51000.1

Cheers,

Chris.


* Re: Synchronous vs asynchronous mdadm operations
  2008-12-04 10:59 ` Chris Webb
@ 2008-12-05  4:45   ` Neil Brown
  0 siblings, 0 replies; 4+ messages in thread
From: Neil Brown @ 2008-12-05  4:45 UTC
  To: Chris Webb; +Cc: linux-raid

On Thursday December 4, chris@arachsys.com wrote:
> Chris Webb <chris@arachsys.com> writes:
> 
> [Re: mdadm --stop being potentially asynchronous]
> > The reason for the question is that I'm seeing occasional cases of arrays which
> > won't reassemble following such an operation. dmesg alleges there is an invalid
> > superblock for all of the six slots which were originally part of the array.
> 
> I tracked this one down to my scripts, which were failing to adjust the
> available space on the rdevs in a particularly rare case. However, I'm still
> wondering about the best way to do a fail/remove combination, given
> that fail appears to be asynchronous. The shell fragment I give below seems
> way over the top, but I can't see any simpler route....

Yes, --fail is asynchronous and I suspect it will remain that way.  It
is up to the raid array to decide when to let go of the device, and it
might never do so: if you fail the last working drive in a raid1, you
still cannot remove it.
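
For illustration only, a throwaway raid1 on loop devices rather than
anything you would run on real data:

  mdadm --create /dev/md9 --run --level=1 --raid-devices=2 \
        /dev/loop0 /dev/loop1
  mdadm --fail   /dev/md9 /dev/loop0
  mdadm --remove /dev/md9 /dev/loop0    # succeeds once md lets go
  mdadm --fail   /dev/md9 /dev/loop1    # now the last working drive
  mdadm --remove /dev/md9 /dev/loop1    # md may never allow this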


> 
> > I notice that some mdadm operations appear to be asynchronous. For instance,
> > 
> >   mdadm --fail /dev/md/shelf.51000 /dev/mapper/slot.51000.1
> >   mdadm --remove /dev/md/shelf.51000 /dev/mapper/slot.51000.1
> > 
> > will always fail at the --remove stage with
> > 
> >   mdadm: hot remove failed for /dev/mapper/slot.51000.1: Device or resource busy
> > 
> > whereas adding a short sleep in between will make it successful.
> > 
> > Is there a 'standard' way to wait for this operation to complete or to
> > perform both steps in one go, other than something horrible like:
> > 
> >   mdadm --fail /dev/md/shelf.51000 /dev/mapper/slot.51000.1
> >   MD=$((`stat -c '%#T' -L /dev/md/shelf.51000`))
> >   MAJOR=$((`stat -c '%#t' -L /dev/mapper/slot.51000.1`))
> >   MINOR=$((`stat -c '%#T' -L /dev/mapper/slot.51000.1`))
> >   for RD in /sys/block/md$MD/md/rd*; do
> >     [ -f $RD/block/dev ] || continue
> >     [ "`<$RD/block/dev`" = "$MAJOR:$MINOR" ] || continue
> >     while [ "< $RD/state" != "faulty ]; do sleep 0.1; done
> >   done
> >   mdadm --remove /dev/md/shelf.51000 /dev/mapper/slot.51000.1

I suspect
  until mdadm --remove /dev/md/shelf.51000 /dev/mapper/slot.51000.1
  do sleep 0.1
  done

might be slightly simpler.
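
And if you want it bounded rather than potentially spinning forever
(which, as above, it can), a retry cap is a small addition:

  TRIES=0
  until mdadm --remove /dev/md/shelf.51000 /dev/mapper/slot.51000.1
  do
    TRIES=$((TRIES + 1))
    [ $TRIES -lt 50 ] || { echo "still busy, giving up" >&2; break; }
    sleep 0.1
  done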

NeilBrown


> 
> Cheers,
> 
> Chris.

