All of lore.kernel.org
 help / color / mirror / Atom feed
* multipath breaks with recent udev/systemd
@ 2014-12-15  9:31 Hannes Reinecke
  2014-12-15 12:53 ` Alexander E. Patrakov
  2014-12-16 22:10 ` Benjamin Marzinski
  0 siblings, 2 replies; 9+ messages in thread
From: Hannes Reinecke @ 2014-12-15  9:31 UTC (permalink / raw)
  To: device-mapper development
  Cc: Benjamin Marzinski, Kay Sievers, systemd Mailing List,
	Mike Snitzer

Hi all,

in commit 3ebdb81ef088afd3b4c72b516beb5610f8c93a0d
(udev: serialize/synchronize block device event handling with file
locks) udev started using flock() on the device node, supposedly to
synchronize with an ominous 'block event handling'.

The code looks like this:

                  if (d) {
                        fd_lock = open(udev_device_get_devnode(d),
O_RDONLY|O_CLOEXEC|O_NOFOLLOW|O_NONBLOCK);
                        if (fd_lock >= 0 && flock(fd_lock,
LOCK_SH|LOCK_NB) < 0) {
                             log_debug("Unable to flock(%s),
skipping event handling: %m", udev_device_get_devnode(d));
                             err = -EWOULDBLOCK;
                             goto skip;
                       }
                   }

However, multipath since several years is using a similar construct
to lock all devices belonging to a multipath device table before
creating a mulitpath dm-device:

	vector_foreach_slot (mpp->pg, pgp, i) {
		if (!pgp->paths)
			continue;
		vector_foreach_slot(pgp->paths, pp, j) {
			if (lock && flock(pp->fd, LOCK_SH | LOCK_NB) &&
			    errno == EWOULDBLOCK)
				goto fail;
			else if (!lock)
				flock(pp->fd, LOCK_UN);
		}
	}

So during bootup it's anyone's guess who's first, multipath or udev.
And depending on the timing either multipath will fail to setup
the device-mapper device or udev will simply ignore the device.
Neither of those is a good, but the first is an absolute killer for
modern systems which _rely_ on udev to configure devices.

So how it this supposed to work?
Why does udev ignore the entire event if it can't get the lock?
Shouldn't it rather be retried?
What is the supposed recovery here?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: multipath breaks with recent udev/systemd
  2014-12-15  9:31 multipath breaks with recent udev/systemd Hannes Reinecke
@ 2014-12-15 12:53 ` Alexander E. Patrakov
  2014-12-16 22:10 ` Benjamin Marzinski
  1 sibling, 0 replies; 9+ messages in thread
From: Alexander E. Patrakov @ 2014-12-15 12:53 UTC (permalink / raw)
  To: Hannes Reinecke, device-mapper development
  Cc: Benjamin Marzinski, Kay Sievers, systemd Mailing List,
	Mike Snitzer

15.12.2014 14:31, Hannes Reinecke wrote:
> Hi all,
>
> in commit 3ebdb81ef088afd3b4c72b516beb5610f8c93a0d
> (udev: serialize/synchronize block device event handling with file
> locks) udev started using flock() on the device node, supposedly to
> synchronize with an ominous 'block event handling'.

This is a duplicate of the issue that I have reported earlier:

https://www.redhat.com/archives/dm-devel/2014-October/msg00210.html

Therefore, I want to be CCed on replies.

>
> The code looks like this:
>
>                    if (d) {
>                          fd_lock = open(udev_device_get_devnode(d),
> O_RDONLY|O_CLOEXEC|O_NOFOLLOW|O_NONBLOCK);
>                          if (fd_lock >= 0 && flock(fd_lock,
> LOCK_SH|LOCK_NB) < 0) {
>                               log_debug("Unable to flock(%s),
> skipping event handling: %m", udev_device_get_devnode(d));
>                               err = -EWOULDBLOCK;
>                               goto skip;
>                         }
>                     }
>
> However, multipath since several years is using a similar construct
> to lock all devices belonging to a multipath device table before
> creating a mulitpath dm-device:
>
> 	vector_foreach_slot (mpp->pg, pgp, i) {
> 		if (!pgp->paths)
> 			continue;
> 		vector_foreach_slot(pgp->paths, pp, j) {
> 			if (lock && flock(pp->fd, LOCK_SH | LOCK_NB) &&
> 			    errno == EWOULDBLOCK)
> 				goto fail;
> 			else if (!lock)
> 				flock(pp->fd, LOCK_UN);
> 		}
> 	}
>
> So during bootup it's anyone's guess who's first, multipath or udev.
> And depending on the timing either multipath will fail to setup
> the device-mapper device or udev will simply ignore the device.
> Neither of those is a good, but the first is an absolute killer for
> modern systems which _rely_ on udev to configure devices.
>
> So how it this supposed to work?
> Why does udev ignore the entire event if it can't get the lock?
> Shouldn't it rather be retried?
> What is the supposed recovery here?
>
> Cheers,
>
> Hannes
>


-- 
Alexander E. Patrakov
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: multipath breaks with recent udev/systemd
  2014-12-15  9:31 multipath breaks with recent udev/systemd Hannes Reinecke
  2014-12-15 12:53 ` Alexander E. Patrakov
@ 2014-12-16 22:10 ` Benjamin Marzinski
  2014-12-16 22:18   ` [dm-devel] " Benjamin Marzinski
  1 sibling, 1 reply; 9+ messages in thread
From: Benjamin Marzinski @ 2014-12-16 22:10 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: device-mapper development, Kay Sievers, systemd Mailing List,
	Mike Snitzer

On Mon, Dec 15, 2014 at 10:31:44AM +0100, Hannes Reinecke wrote:
> Hi all,
> 
> in commit 3ebdb81ef088afd3b4c72b516beb5610f8c93a0d
> (udev: serialize/synchronize block device event handling with file
> locks) udev started using flock() on the device node, supposedly to
> synchronize with an ominous 'block event handling'.
> 
> The code looks like this:
> 
>                   if (d) {
>                         fd_lock = open(udev_device_get_devnode(d),
> O_RDONLY|O_CLOEXEC|O_NOFOLLOW|O_NONBLOCK);
>                         if (fd_lock >= 0 && flock(fd_lock,
> LOCK_SH|LOCK_NB) < 0) {
>                              log_debug("Unable to flock(%s),
> skipping event handling: %m", udev_device_get_devnode(d));
>                              err = -EWOULDBLOCK;
>                              goto skip;
>                        }
>                    }
> 
> However, multipath since several years is using a similar construct
> to lock all devices belonging to a multipath device table before
> creating a mulitpath dm-device:
> 
> 	vector_foreach_slot (mpp->pg, pgp, i) {
> 		if (!pgp->paths)
> 			continue;
> 		vector_foreach_slot(pgp->paths, pp, j) {
> 			if (lock && flock(pp->fd, LOCK_SH | LOCK_NB) &&
> 			    errno == EWOULDBLOCK)
> 				goto fail;
> 			else if (!lock)
> 				flock(pp->fd, LOCK_UN);
> 		}
> 	}
> 
> So during bootup it's anyone's guess who's first, multipath or udev.
> And depending on the timing either multipath will fail to setup
> the device-mapper device or udev will simply ignore the device.
> Neither of those is a good, but the first is an absolute killer for
> modern systems which _rely_ on udev to configure devices.
> 
> So how it this supposed to work?
> Why does udev ignore the entire event if it can't get the lock?
> Shouldn't it rather be retried?
> What is the supposed recovery here?

Hannes, are you against the idea that Alexander mentioned in his first
email, of just locking a file in /var/lock?  Multipathd doesn't create
devices in parallel. Multipath doesn't create files in parallel.  We are
explicitly trying to avoid multipath and multipathd creating files at
the same time. So, we should only need a single file to lock, and
/run/lock should always be there.

-Ben

> 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke		               zSeries & Storage
> hare@suse.de			               +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dm-devel] multipath breaks with recent udev/systemd
  2014-12-16 22:10 ` Benjamin Marzinski
@ 2014-12-16 22:18   ` Benjamin Marzinski
  2014-12-17 12:04     ` Hannes Reinecke
  0 siblings, 1 reply; 9+ messages in thread
From: Benjamin Marzinski @ 2014-12-16 22:18 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: device-mapper development, Kay Sievers, systemd Mailing List,
	Mike Snitzer

On Tue, Dec 16, 2014 at 04:10:44PM -0600, Benjamin Marzinski wrote:
> On Mon, Dec 15, 2014 at 10:31:44AM +0100, Hannes Reinecke wrote:
> > Hi all,
> > 
> > in commit 3ebdb81ef088afd3b4c72b516beb5610f8c93a0d
> > (udev: serialize/synchronize block device event handling with file
> > locks) udev started using flock() on the device node, supposedly to
> > synchronize with an ominous 'block event handling'.
> > 
> > The code looks like this:
> > 
> >                   if (d) {
> >                         fd_lock = open(udev_device_get_devnode(d),
> > O_RDONLY|O_CLOEXEC|O_NOFOLLOW|O_NONBLOCK);
> >                         if (fd_lock >= 0 && flock(fd_lock,
> > LOCK_SH|LOCK_NB) < 0) {
> >                              log_debug("Unable to flock(%s),
> > skipping event handling: %m", udev_device_get_devnode(d));
> >                              err = -EWOULDBLOCK;
> >                              goto skip;
> >                        }
> >                    }
> > 
> > However, multipath since several years is using a similar construct
> > to lock all devices belonging to a multipath device table before
> > creating a mulitpath dm-device:
> > 
> > 	vector_foreach_slot (mpp->pg, pgp, i) {
> > 		if (!pgp->paths)
> > 			continue;
> > 		vector_foreach_slot(pgp->paths, pp, j) {
> > 			if (lock && flock(pp->fd, LOCK_SH | LOCK_NB) &&
> > 			    errno == EWOULDBLOCK)
> > 				goto fail;
> > 			else if (!lock)
> > 				flock(pp->fd, LOCK_UN);
> > 		}
> > 	}
> > 
> > So during bootup it's anyone's guess who's first, multipath or udev.
> > And depending on the timing either multipath will fail to setup
> > the device-mapper device or udev will simply ignore the device.
> > Neither of those is a good, but the first is an absolute killer for
> > modern systems which _rely_ on udev to configure devices.
> > 
> > So how it this supposed to work?
> > Why does udev ignore the entire event if it can't get the lock?
> > Shouldn't it rather be retried?
> > What is the supposed recovery here?
> 
> Hannes, are you against the idea that Alexander mentioned in his first
> email, of just locking a file in /var/lock?  Multipathd doesn't create
> devices in parallel. Multipath doesn't create files in parallel.  We are
> explicitly trying to avoid multipath and multipathd creating files at
> the same time. So, we should only need a single file to lock, and
> /run/lock should always be there.

O.k. So if we want to keep our current nonblocking behavior, we'll need
more lockfiles, either one per path or one per wwid.  This still seems
like a reasonable idea, if there is a good reason for systemd doing what
it's doing.

-Ben

> 
> -Ben
> 
> > 
> > Cheers,
> > 
> > Hannes
> > -- 
> > Dr. Hannes Reinecke		               zSeries & Storage
> > hare@suse.de			               +49 911 74053 688
> > SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> > GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> > HRB 21284 (AG Nürnberg)
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dm-devel] multipath breaks with recent udev/systemd
  2014-12-16 22:18   ` [dm-devel] " Benjamin Marzinski
@ 2014-12-17 12:04     ` Hannes Reinecke
  2014-12-18 20:18       ` Kay Sievers
  2014-12-18 22:04       ` Benjamin Marzinski
  0 siblings, 2 replies; 9+ messages in thread
From: Hannes Reinecke @ 2014-12-17 12:04 UTC (permalink / raw)
  To: Benjamin Marzinski
  Cc: device-mapper development, Kay Sievers, systemd Mailing List,
	Mike Snitzer

On 12/16/2014 11:18 PM, Benjamin Marzinski wrote:
> On Tue, Dec 16, 2014 at 04:10:44PM -0600, Benjamin Marzinski wrote:
>> On Mon, Dec 15, 2014 at 10:31:44AM +0100, Hannes Reinecke wrote:
[ .. ]
>>> So during bootup it's anyone's guess who's first, multipath or udev.
>>> And depending on the timing either multipath will fail to setup
>>> the device-mapper device or udev will simply ignore the device.
>>> Neither of those is a good, but the first is an absolute killer for
>>> modern systems which _rely_ on udev to configure devices.
>>>
>>> So how it this supposed to work?
>>> Why does udev ignore the entire event if it can't get the lock?
>>> Shouldn't it rather be retried?
>>> What is the supposed recovery here?
>>
>> Hannes, are you against the idea that Alexander mentioned in his first
>> email, of just locking a file in /var/lock?  Multipathd doesn't create
>> devices in parallel. Multipath doesn't create files in parallel.  We are
>> explicitly trying to avoid multipath and multipathd creating files at
>> the same time. So, we should only need a single file to lock, and
>> /run/lock should always be there.
> 
> O.k. So if we want to keep our current nonblocking behavior, we'll need
> more lockfiles, either one per path or one per wwid.  This still seems
> like a reasonable idea, if there is a good reason for systemd doing what
> it's doing.
> 
The problem is as follows:

When multipathd is running we simply _cannot_ guarantee that no udev
events are currently running. This currently hits us especially bad
during system startup when device probing is still running during
multipathd startup.
Multipathd will then enumerate all block devices to setup the
initial topology.
But in doing so it might trip over device which are still processed
by udev (or, worse still, _not yet_ processed by udev).
(Yes, I know, libudev_enumerate should protect against this.
 But it doesn't. )

So it's anyone guess what'll happen now; either multipath trips over
the lock from udev when calling 'lock_multipath' (and consequently
failing to setup the multipath device), or udev
tripping over the lock from multipath and ignoring the event,
leaving us with a non-functioning device.

We can fixup the startup sequence (which we need to do anyway, given
the libudev enumerate bug) to just re-trigger all block device
events, but this still doesn't fix the actual issue.
Point is, there might be _several_ events for the same device being
queued (think of a flaky path with several PATH_FAILED /
PATH_REINSTATED events in a row), and so multipathd might be
processing one event for the device while udev is processing the
next event for the same device.

For this to work we need some synchronization with udev; _if_ there
would be a libudev callout for 'is there an event for this device
running' we can easily fail the 'lock_multipath' operation, knowing
that we'll be getting another event shortly for the same device.

Alternatively we can call flock(LOCK_EX) on that device, but that
will only work if udev would _not_ abort event handling for that
device, but rather issues a retry.
After all, there _is_ a device timeout in udev. It should be
relatively easy to retry the event and let it run into a timeout if
the lock won't be released.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dm-devel] multipath breaks with recent udev/systemd
  2014-12-17 12:04     ` Hannes Reinecke
@ 2014-12-18 20:18       ` Kay Sievers
  2014-12-18 22:04       ` Benjamin Marzinski
  1 sibling, 0 replies; 9+ messages in thread
From: Kay Sievers @ 2014-12-18 20:18 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Benjamin Marzinski, systemd Mailing List,
	device-mapper development, Mike Snitzer

On Wed, Dec 17, 2014 at 1:04 PM, Hannes Reinecke <hare@suse.de> wrote:
> On 12/16/2014 11:18 PM, Benjamin Marzinski wrote:
>> On Tue, Dec 16, 2014 at 04:10:44PM -0600, Benjamin Marzinski wrote:
>>> On Mon, Dec 15, 2014 at 10:31:44AM +0100, Hannes Reinecke wrote:
> [ .. ]
>>>> So during bootup it's anyone's guess who's first, multipath or udev.
>>>> And depending on the timing either multipath will fail to setup
>>>> the device-mapper device or udev will simply ignore the device.
>>>> Neither of those is a good, but the first is an absolute killer for
>>>> modern systems which _rely_ on udev to configure devices.
>>>>
>>>> So how it this supposed to work?
>>>> Why does udev ignore the entire event if it can't get the lock?
>>>> Shouldn't it rather be retried?
>>>> What is the supposed recovery here?
>>>
>>> Hannes, are you against the idea that Alexander mentioned in his first
>>> email, of just locking a file in /var/lock?  Multipathd doesn't create
>>> devices in parallel. Multipath doesn't create files in parallel.  We are
>>> explicitly trying to avoid multipath and multipathd creating files at
>>> the same time. So, we should only need a single file to lock, and
>>> /run/lock should always be there.
>>
>> O.k. So if we want to keep our current nonblocking behavior, we'll need
>> more lockfiles, either one per path or one per wwid.  This still seems
>> like a reasonable idea, if there is a good reason for systemd doing what
>> it's doing.
>>
> The problem is as follows:
>
> When multipathd is running we simply _cannot_ guarantee that no udev
> events are currently running.

All running block device events will take a shared lock of the device
node. This can be used?

> This currently hits us especially bad
> during system startup when device probing is still running during
> multipathd startup.
> Multipathd will then enumerate all block devices to setup the
> initial topology.
> But in doing so it might trip over device which are still processed
> by udev (or, worse still, _not yet_ processed by udev).
> (Yes, I know, libudev_enumerate should protect against this.
>  But it doesn't. )

There is a udev_enumerate_match_is_initialized(), which can help?

> So it's anyone guess what'll happen now; either multipath trips over
> the lock from udev when calling 'lock_multipath' (and consequently
> failing to setup the multipath device), or udev
> tripping over the lock from multipath and ignoring the event,
> leaving us with a non-functioning device.
>
> We can fixup the startup sequence (which we need to do anyway, given
> the libudev enumerate bug) to just re-trigger all block device
> events, but this still doesn't fix the actual issue.
> Point is, there might be _several_ events for the same device being
> queued (think of a flaky path with several PATH_FAILED /
> PATH_REINSTATED events in a row), and so multipathd might be
> processing one event for the device while udev is processing the
> next event for the same device.
>
> For this to work we need some synchronization with udev; _if_ there
> would be a libudev callout for 'is there an event for this device
> running' we can easily fail the 'lock_multipath' operation, knowing
> that we'll be getting another event shortly for the same device.
>
> Alternatively we can call flock(LOCK_EX) on that device, but that
> will only work if udev would _not_ abort event handling for that
> device, but rather issues a retry.

Exclusive locks signify "device ownership" and udev will suppress all
events to not disturb the owner's operations. There can't be any udev
dependencies or expectations of udev doing something during the time
an exclusive lock is held.

> After all, there _is_ a device timeout in udev. It should be
> relatively easy to retry the event and let it run into a timeout if
> the lock won't be released.

We should avoid any complex rules here, and if possible just reduce
the scheme to a simple ownership model.

Thanks,
Kay
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dm-devel] multipath breaks with recent udev/systemd
  2014-12-17 12:04     ` Hannes Reinecke
  2014-12-18 20:18       ` Kay Sievers
@ 2014-12-18 22:04       ` Benjamin Marzinski
  2014-12-19  7:21         ` Hannes Reinecke
  1 sibling, 1 reply; 9+ messages in thread
From: Benjamin Marzinski @ 2014-12-18 22:04 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: device-mapper development, Kay Sievers, systemd Mailing List,
	Mike Snitzer

On Wed, Dec 17, 2014 at 01:04:54PM +0100, Hannes Reinecke wrote:
> On 12/16/2014 11:18 PM, Benjamin Marzinski wrote:
> > On Tue, Dec 16, 2014 at 04:10:44PM -0600, Benjamin Marzinski wrote:
> >> On Mon, Dec 15, 2014 at 10:31:44AM +0100, Hannes Reinecke wrote:
> [ .. ]
> >>> So during bootup it's anyone's guess who's first, multipath or udev.
> >>> And depending on the timing either multipath will fail to setup
> >>> the device-mapper device or udev will simply ignore the device.
> >>> Neither of those is a good, but the first is an absolute killer for
> >>> modern systems which _rely_ on udev to configure devices.
> >>>
> >>> So how it this supposed to work?
> >>> Why does udev ignore the entire event if it can't get the lock?
> >>> Shouldn't it rather be retried?
> >>> What is the supposed recovery here?
> >>
> >> Hannes, are you against the idea that Alexander mentioned in his first
> >> email, of just locking a file in /var/lock?  Multipathd doesn't create
> >> devices in parallel. Multipath doesn't create files in parallel.  We are
> >> explicitly trying to avoid multipath and multipathd creating files at
> >> the same time. So, we should only need a single file to lock, and
> >> /run/lock should always be there.
> > 
> > O.k. So if we want to keep our current nonblocking behavior, we'll need
> > more lockfiles, either one per path or one per wwid.  This still seems
> > like a reasonable idea, if there is a good reason for systemd doing what
> > it's doing.
> > 
> The problem is as follows:
> 
> When multipathd is running we simply _cannot_ guarantee that no udev
> events are currently running. This currently hits us especially bad
> during system startup when device probing is still running during
> multipathd startup.
> Multipathd will then enumerate all block devices to setup the
> initial topology.
> But in doing so it might trip over device which are still processed
> by udev (or, worse still, _not yet_ processed by udev).
> (Yes, I know, libudev_enumerate should protect against this.
>  But it doesn't. )

But we start waiting for events before the initial multipath device
configuration, and don't process them until after that configuration
is compelete, so if there is ever a case where the initial configuration
is accessing the device to early, aren't we guaranteed to get an event
afterwards, assuming that udev doesn't drop it?

> 
> So it's anyone guess what'll happen now; either multipath trips over
> the lock from udev when calling 'lock_multipath' (and consequently
> failing to setup the multipath device), or udev
> tripping over the lock from multipath and ignoring the event,
> leaving us with a non-functioning device.

But my point above is that if we use a lockfile instead of locking the
path device itself, there won't be any lock contention, and udev won't
drop the events.

> We can fixup the startup sequence (which we need to do anyway, given
> the libudev enumerate bug) to just re-trigger all block device
> events, but this still doesn't fix the actual issue.
> Point is, there might be _several_ events for the same device being
> queued (think of a flaky path with several PATH_FAILED /
> PATH_REINSTATED events in a row), and so multipathd might be
> processing one event for the device while udev is processing the
> next event for the same device.
> 
> For this to work we need some synchronization with udev; _if_ there
> would be a libudev callout for 'is there an event for this device
> running' we can easily fail the 'lock_multipath' operation, knowing
> that we'll be getting another event shortly for the same device.

But if we can avoid the lock contention, then eventually all these
events will make it to multipathd, and we will be up to date. right?
Or am I missing something here?

-Ben

> Alternatively we can call flock(LOCK_EX) on that device, but that
> will only work if udev would _not_ abort event handling for that
> device, but rather issues a retry.
> After all, there _is_ a device timeout in udev. It should be
> relatively easy to retry the event and let it run into a timeout if
> the lock won't be released.
> 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke		               zSeries & Storage
> hare@suse.de			               +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dm-devel] multipath breaks with recent udev/systemd
  2014-12-18 22:04       ` Benjamin Marzinski
@ 2014-12-19  7:21         ` Hannes Reinecke
  2014-12-19 15:16           ` Benjamin Marzinski
  0 siblings, 1 reply; 9+ messages in thread
From: Hannes Reinecke @ 2014-12-19  7:21 UTC (permalink / raw)
  To: Benjamin Marzinski
  Cc: device-mapper development, Kay Sievers, systemd Mailing List,
	Mike Snitzer

On 12/18/2014 11:04 PM, Benjamin Marzinski wrote:
> On Wed, Dec 17, 2014 at 01:04:54PM +0100, Hannes Reinecke wrote:
>> On 12/16/2014 11:18 PM, Benjamin Marzinski wrote:
>>> On Tue, Dec 16, 2014 at 04:10:44PM -0600, Benjamin Marzinski wrote:
>>>> On Mon, Dec 15, 2014 at 10:31:44AM +0100, Hannes Reinecke wrote:
>> [ .. ]
>>>>> So during bootup it's anyone's guess who's first, multipath or udev.
>>>>> And depending on the timing either multipath will fail to setup
>>>>> the device-mapper device or udev will simply ignore the device.
>>>>> Neither of those is a good, but the first is an absolute killer for
>>>>> modern systems which _rely_ on udev to configure devices.
>>>>>
>>>>> So how it this supposed to work?
>>>>> Why does udev ignore the entire event if it can't get the lock?
>>>>> Shouldn't it rather be retried?
>>>>> What is the supposed recovery here?
>>>>
>>>> Hannes, are you against the idea that Alexander mentioned in his first
>>>> email, of just locking a file in /var/lock?  Multipathd doesn't create
>>>> devices in parallel. Multipath doesn't create files in parallel.  We are
>>>> explicitly trying to avoid multipath and multipathd creating files at
>>>> the same time. So, we should only need a single file to lock, and
>>>> /run/lock should always be there.
>>>
>>> O.k. So if we want to keep our current nonblocking behavior, we'll need
>>> more lockfiles, either one per path or one per wwid.  This still seems
>>> like a reasonable idea, if there is a good reason for systemd doing what
>>> it's doing.
>>>
>> The problem is as follows:
>>
>> When multipathd is running we simply _cannot_ guarantee that no udev
>> events are currently running. This currently hits us especially bad
>> during system startup when device probing is still running during
>> multipathd startup.
>> Multipathd will then enumerate all block devices to setup the
>> initial topology.
>> But in doing so it might trip over device which are still processed
>> by udev (or, worse still, _not yet_ processed by udev).
>> (Yes, I know, libudev_enumerate should protect against this.
>>  But it doesn't. )
> 
> But we start waiting for events before the initial multipath device
> configuration, and don't process them until after that configuration
> is compelete, so if there is ever a case where the initial configuration
> is accessing the device to early, aren't we guaranteed to get an event
> afterwards, assuming that udev doesn't drop it?
> 
That was the initial idea. Only it doesn't do it currently :-)

>>
>> So it's anyone guess what'll happen now; either multipath trips over
>> the lock from udev when calling 'lock_multipath' (and consequently
>> failing to setup the multipath device), or udev
>> tripping over the lock from multipath and ignoring the event,
>> leaving us with a non-functioning device.
> 
> But my point above is that if we use a lockfile instead of locking the
> path device itself, there won't be any lock contention, and udev won't
> drop the events.
> 
The underlying issue here is:

Why does multipath lock the devices _at all_?
If it were to protect against device disappearing while doing the
ioctl that's already proven not to work.
And for protecting against mounts a simple open(O_EXCL) would be
sufficient. So whom are we fooling here?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dm-devel] multipath breaks with recent udev/systemd
  2014-12-19  7:21         ` Hannes Reinecke
@ 2014-12-19 15:16           ` Benjamin Marzinski
  0 siblings, 0 replies; 9+ messages in thread
From: Benjamin Marzinski @ 2014-12-19 15:16 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: device-mapper development, Kay Sievers, systemd Mailing List,
	Mike Snitzer

On Fri, Dec 19, 2014 at 08:21:57AM +0100, Hannes Reinecke wrote:
> On 12/18/2014 11:04 PM, Benjamin Marzinski wrote:
> > On Wed, Dec 17, 2014 at 01:04:54PM +0100, Hannes Reinecke wrote:
> >> On 12/16/2014 11:18 PM, Benjamin Marzinski wrote:
> >>> On Tue, Dec 16, 2014 at 04:10:44PM -0600, Benjamin Marzinski wrote:
> >>>> On Mon, Dec 15, 2014 at 10:31:44AM +0100, Hannes Reinecke wrote:
> >> [ .. ]
> >>>>> So during bootup it's anyone's guess who's first, multipath or udev.
> >>>>> And depending on the timing either multipath will fail to setup
> >>>>> the device-mapper device or udev will simply ignore the device.
> >>>>> Neither of those is a good, but the first is an absolute killer for
> >>>>> modern systems which _rely_ on udev to configure devices.
> >>>>>
> >>>>> So how it this supposed to work?
> >>>>> Why does udev ignore the entire event if it can't get the lock?
> >>>>> Shouldn't it rather be retried?
> >>>>> What is the supposed recovery here?
> >>>>
> >>>> Hannes, are you against the idea that Alexander mentioned in his first
> >>>> email, of just locking a file in /var/lock?  Multipathd doesn't create
> >>>> devices in parallel. Multipath doesn't create files in parallel.  We are
> >>>> explicitly trying to avoid multipath and multipathd creating files at
> >>>> the same time. So, we should only need a single file to lock, and
> >>>> /run/lock should always be there.
> >>>
> >>> O.k. So if we want to keep our current nonblocking behavior, we'll need
> >>> more lockfiles, either one per path or one per wwid.  This still seems
> >>> like a reasonable idea, if there is a good reason for systemd doing what
> >>> it's doing.
> >>>
> >> The problem is as follows:
> >>
> >> When multipathd is running we simply _cannot_ guarantee that no udev
> >> events are currently running. This currently hits us especially bad
> >> during system startup when device probing is still running during
> >> multipathd startup.
> >> Multipathd will then enumerate all block devices to setup the
> >> initial topology.
> >> But in doing so it might trip over device which are still processed
> >> by udev (or, worse still, _not yet_ processed by udev).
> >> (Yes, I know, libudev_enumerate should protect against this.
> >>  But it doesn't. )
> > 
> > But we start waiting for events before the initial multipath device
> > configuration, and don't process them until after that configuration
> > is compelete, so if there is ever a case where the initial configuration
> > is accessing the device to early, aren't we guaranteed to get an event
> > afterwards, assuming that udev doesn't drop it?
> > 
> That was the initial idea. Only it doesn't do it currently :-)
> 
> >>
> >> So it's anyone guess what'll happen now; either multipath trips over
> >> the lock from udev when calling 'lock_multipath' (and consequently
> >> failing to setup the multipath device), or udev
> >> tripping over the lock from multipath and ignoring the event,
> >> leaving us with a non-functioning device.
> > 
> > But my point above is that if we use a lockfile instead of locking the
> > path device itself, there won't be any lock contention, and udev won't
> > drop the events.
> > 
> The underlying issue here is:
> 
> Why does multipath lock the devices _at all_?
> If it were to protect against device disappearing while doing the
> ioctl that's already proven not to work.
> And for protecting against mounts a simple open(O_EXCL) would be
> sufficient. So whom are we fooling here?
> 

The commit where the locking was added is
4bc295cef7a50ea46bfc7b98c65848babf56c822

It is from when the mutipath command was directly called in the udev
rules.  If two path devices with the same wwid got discovered at the
same time, two multipath commands would be racing to create the devices,
it's possible that neither would succeed without locking.

The chance of this happening now is greatly reduced. If someone ran
multipath while multipathd was creating a device, it could happen.  Or
(even less likely) if a multipath device got removed, with "multipath
-f", and then two mutipath commands were run at the same time to restore
it.

I do think that some sort of locking is reasonable, since we really
don't want a to ever get into a situation where running mutipath can
interfere with multipathd, but the multipath tools are only protecting
against themselves with this locking.

-Ben

> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke		               zSeries & Storage
> hare@suse.de			               +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-12-19 15:16 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-15  9:31 multipath breaks with recent udev/systemd Hannes Reinecke
2014-12-15 12:53 ` Alexander E. Patrakov
2014-12-16 22:10 ` Benjamin Marzinski
2014-12-16 22:18   ` [dm-devel] " Benjamin Marzinski
2014-12-17 12:04     ` Hannes Reinecke
2014-12-18 20:18       ` Kay Sievers
2014-12-18 22:04       ` Benjamin Marzinski
2014-12-19  7:21         ` Hannes Reinecke
2014-12-19 15:16           ` Benjamin Marzinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.