From: Zdenek Kabelac <zkabelac@redhat.com>
To: lvm-devel@redhat.com
Subject: [PATCH v2]: Mirror: Fix hangs and lock-ups caused by attempting label reads of mirrors
Date: Wed, 23 Oct 2013 10:50:08 +0200
Message-ID: <52678DC0.3020903@redhat.com>
In-Reply-To: <1382485181.19061.3.camel@f16>

On 23.10.2013 01:39, Jonathan Brassow wrote:
> Changed some variable/function names and added more explanation to the
> config file.
>
> I will send a separate patch that contains a warning message if mirrors
> are activated and 'ignore_lvm_mirrors' is not set... We can talk about
> whether that is needed also.
>
>   brassow
>
> Mirror: Fix hangs and lock-ups caused by attempting label reads of mirrors
>
> There is a problem with the way mirrors have been designed to handle
> failures that is resulting in stuck LVM processes and hung I/O.  When
> mirrors encounter a write failure, they block I/O and notify userspace
> to reconfigure the mirror to remove failed devices.  This process is
> open to a couple races:
> 1) Any LVM process other than the one that is meant to deal with the
> mirror failure can attempt to read the mirror, fail, and block other
> LVM commands (including the repair command) from proceeding due to
> holding a lock on the volume group.
> 2) If there are multiple mirrors that suffer a failure in the same
> volume group, a repair can block while attempting to read the LVM
> label from one mirror while trying to repair the other.
>
> Mitigation of these races has been attempted by disallowing label reading
> of mirrors that are either suspended or are indicated as blocking by
> the kernel.  While this has closed the window of opportunity for hitting

Is a mirror read 'abortable' (e.g. with a SIGALRM timer) when it's blocked?
Our 'scan' routine could try to read a mirror which suddenly
gets 'frozen' by a write error.
If we used SIGALRM, we should be able to abort the read operation
(though I'm not sure where the read gets stuck - maybe it would need a change
in the kernel driver?). After the read failure we could detect the mirror
error condition through dm status and react to it?

A very similar thing needs to be added for scanning of e.g. thinly
provisioned devices, which may get stuck when the pool is overfilled. So
some solution in this direction is unavoidable (IMHO we should not hide the
problem by disabling scanning).


> 2) Instrument a way to allow asynchronous label reading - allowing
> blocked label reads to be ignored while continuing to process the LVM
> command.  This action would allow LVM commands to continue even
> though they would have otherwise blocked trying to read a mirror.  They
> can then release their lock and allow a repair command to commence.  In
> the event of #2 above, the repair command already in progress can continue
> and repair the failed mirror.

Async read is not the only problem here - we have other issues:

e.g. we activate a mirror and wait for confirmation (dmsetup udevcomplete),
but this may also run the watch rule, and blkid may also get blocked
(mirror error).

So now we get into fancy states, where our command is waiting for
semaphore completion (no timeout on the semaphore for now), which never
happens since the udev master process kills its udev scan completely,
without any 'finalization' step.
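
To illustrate what a timeout on that wait could look like, here is a minimal
sketch assuming the completion is signalled through a System V semaphore that
drops to zero. wait_zero_with_timeout() is a hypothetical name, not a
libdevmapper function, and the 10 ms polling step is an arbitrary choice. On
Linux, semtimedop(2) could do the same without polling.

```c
#include <errno.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper: wait until semaphore 0 of the System V set semid
 * reaches zero, but give up after roughly timeout_ms milliseconds.
 * Returns 0 on success, -1 with errno == ETIMEDOUT on timeout, or -1 with
 * errno set by semop() on any other error.
 */
static int wait_zero_with_timeout(int semid, long timeout_ms)
{
	struct sembuf wait_zero = {
		.sem_num = 0,
		.sem_op = 0,		/* "wait for zero" operation */
		.sem_flg = IPC_NOWAIT,	/* fail with EAGAIN instead of blocking */
	};
	const struct timespec step = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
	long waited_ms = 0;

	for (;;) {
		if (semop(semid, &wait_zero, 1) == 0)
			return 0;		/* counter reached zero */
		if (errno != EAGAIN)
			return -1;		/* real error, e.g. EIDRM */
		if (waited_ms >= timeout_ms) {
			errno = ETIMEDOUT;
			return -1;
		}
		nanosleep(&step, NULL);		/* poll again in 10 ms */
		waited_ms += 10;
	}
}
```

On ETIMEDOUT the command could report the stuck device and continue instead
of waiting forever; what to do with the unfinished udev processing is a
separate question this sketch does not answer.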

So we would probably also need to make a mirror device 'unscannable'??
(which makes it unusable for filesystems??)

Anyway - more trouble ahead...

Zdenek



