dm-devel.redhat.com archive mirror
 help / color / mirror / Atom feed
* dm-multipath performance patches
@ 2016-03-08  7:39 Hannes Reinecke
  2016-03-08 14:48 ` Mike Snitzer
  0 siblings, 1 reply; 5+ messages in thread
From: Hannes Reinecke @ 2016-03-08  7:39 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

Hi Mike,

to picking up an old topic, what's the status of your performance
patches for dm-multipath?

I've started looking a branch 'devel3', and was wondering if these
are the latest patches available.
Or would you have newer?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: dm-multipath performance patches
  2016-03-08  7:39 dm-multipath performance patches Hannes Reinecke
@ 2016-03-08 14:48 ` Mike Snitzer
  2016-03-08 15:27   ` Hannes Reinecke
  0 siblings, 1 reply; 5+ messages in thread
From: Mike Snitzer @ 2016-03-08 14:48 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Junichi Nomura, device-mapper development

On Tue, Mar 08 2016 at  2:39am -0500,
Hannes Reinecke <hare@suse.de> wrote:

> Hi Mike,
> 
> to picking up an old topic, what's the status of your performance
> patches for dm-multipath?
> 
> I've started looking a branch 'devel3', and was wondering if these
> are the latest patches available.
> Or would you have newer?

The 'devel3' was just a temporary branch I was using during
development.  I've since staged all the changes in linux-next via
linux-dm.git's 'for-next' branch:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=for-next

If you could test this 'for-next' and report back I'd appreciate it.
It'd build my confidence in pushing these changes once the 4.6 merge
window opens.

That said, these changes have held up to Junichi's test suite.  WHich
I've now staged here (until I get around to porting them over to
device-mapper-test-suite): https://github.com/snitm/mptest

As for code review, please have a good look at this commit:
   b3c39bf1d dm mpath: convert from spinlock to RCU for most locking

If you find anything feel free to share an incremental fix and I'll get
it reviewed/staged.

(Junichi I'd welcome your review of all these request-based DM core and
multipath changes too)

Thanks,
Mike

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: dm-multipath performance patches
  2016-03-08 14:48 ` Mike Snitzer
@ 2016-03-08 15:27   ` Hannes Reinecke
  2016-03-08 17:40     ` Mike Snitzer
  0 siblings, 1 reply; 5+ messages in thread
From: Hannes Reinecke @ 2016-03-08 15:27 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Junichi Nomura, device-mapper development

On 03/08/2016 03:48 PM, Mike Snitzer wrote:
> On Tue, Mar 08 2016 at  2:39am -0500,
> Hannes Reinecke <hare@suse.de> wrote:
> 
>> Hi Mike,
>>
>> to picking up an old topic, what's the status of your performance
>> patches for dm-multipath?
>>
>> I've started looking a branch 'devel3', and was wondering if these
>> are the latest patches available.
>> Or would you have newer?
> 
> The 'devel3' was just a temporary branch I was using during
> development.  I've since staged all the changes in linux-next via
> linux-dm.git's 'for-next' branch:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=for-next
> 
> If you could test this 'for-next' and report back I'd appreciate it.
> It'd build my confidence in pushing these changes once the 4.6 merge
> window opens.
> 
> That said, these changes have held up to Junichi's test suite.  WHich
> I've now staged here (until I get around to porting them over to
> device-mapper-test-suite): https://github.com/snitm/mptest
> 
> As for code review, please have a good look at this commit:
>    b3c39bf1d dm mpath: convert from spinlock to RCU for most locking
> 
> If you find anything feel free to share an incremental fix and I'll get
> it reviewed/staged.
> 
> (Junichi I'd welcome your review of all these request-based DM core and
> multipath changes too)
> 
Right.

Looks like it's essential the same as the devel3 branch, so here
hare some general comments:

- It _really_ has a creative RCU usage. From my understanding, RCU
is protecting access to the _pointer_, not the structure pointed to.
IOW changing fields in an RCU-proctected structure are not
necessarily visible to other CPUs, and I'm not sure if
synchronize_rcu() is sufficient here.
I'd be the first to admit that I'm no expert here, but I really
would like to have an explanation if one can do that.
- I've got some patches moving the bitfields into bitops, and use
atomic_t for counters. With that I guess we can get rid of most of
the creative RCU usage as we can use smp_mb__(after|before)_atomic()
with the same results.
That also enables us to get rid of the spinlock in most cases.
- Splitting 'struct multipath' off in an RCU-protected bit shouldn't
be required anymore with those patches, so maybe we should rework
that again.
- What I'm worried of is the list of pgs and pgpaths. We can protect
against changes within the pgs and pgpaths with bitops and atomic,
but not against changes of the lists. However, from what I've seen
the pg list and pgpath lists are pretty much immutable during normal
I/O, and can only be changed by a table load.
If the table load is protected by other means (eg by the
device-mapper core) than we wouldn't need any lock here, either.
But that needs to be clarified.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: dm-multipath performance patches
  2016-03-08 15:27   ` Hannes Reinecke
@ 2016-03-08 17:40     ` Mike Snitzer
  2016-03-12 13:45       ` Mike Snitzer
  0 siblings, 1 reply; 5+ messages in thread
From: Mike Snitzer @ 2016-03-08 17:40 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Junichi Nomura, device-mapper development

On Tue, Mar 08 2016 at 10:27am -0500,
Hannes Reinecke <hare@suse.de> wrote:

> On 03/08/2016 03:48 PM, Mike Snitzer wrote:
> > On Tue, Mar 08 2016 at  2:39am -0500,
> > Hannes Reinecke <hare@suse.de> wrote:
> > 
> >> Hi Mike,
> >>
> >> to picking up an old topic, what's the status of your performance
> >> patches for dm-multipath?
> >>
> >> I've started looking a branch 'devel3', and was wondering if these
> >> are the latest patches available.
> >> Or would you have newer?
> > 
> > The 'devel3' was just a temporary branch I was using during
> > development.  I've since staged all the changes in linux-next via
> > linux-dm.git's 'for-next' branch:
> > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=for-next
> > 
> > If you could test this 'for-next' and report back I'd appreciate it.
> > It'd build my confidence in pushing these changes once the 4.6 merge
> > window opens.
> > 
> > That said, these changes have held up to Junichi's test suite.  WHich
> > I've now staged here (until I get around to porting them over to
> > device-mapper-test-suite): https://github.com/snitm/mptest
> > 
> > As for code review, please have a good look at this commit:
> >    b3c39bf1d dm mpath: convert from spinlock to RCU for most locking
> > 
> > If you find anything feel free to share an incremental fix and I'll get
> > it reviewed/staged.
> > 
> > (Junichi I'd welcome your review of all these request-based DM core and
> > multipath changes too)
> > 
> Right.
> 
> Looks like it's essential the same as the devel3 branch, so here
> hare some general comments:
> 
> - It _really_ has a creative RCU usage. From my understanding, RCU
> is protecting access to the _pointer_, not the structure pointed to.
> IOW changing fields in an RCU-proctected structure are not
> necessarily visible to other CPUs, and I'm not sure if
> synchronize_rcu() is sufficient here.
> I'd be the first to admit that I'm no expert here, but I really
> would like to have an explanation if one can do that.

I understand your concern and do share it.  I'm no expert here either.

It was my understanding that dereferences of the pointer trigger the
read/write side of the RCU.  And by-virtue of the write-side having
exclussive access (after grace period, etc) the concurrent reads should
be consistent.

You'll note that all writes to the 'struct multipath_paths' members are
made while holding the m->lock.  I'd have thought visibility of any
writes from other CPUs is moot -- primarily because any reads of those
'struct multipath_paths' state flags or unsigned counters should be
effectively atomic (due to read vs write side I touched on above).  But
in practice a suppose it possible for a cpu to see incorrect values.

Explicit use of atomic_t for the counters and bitsets (with atomic
test_bit, set_bit) would be best.

> - I've got some patches moving the bitfields into bitops, and use
> atomic_t for counters. With that I guess we can get rid of most of
> the creative RCU usage as we can use smp_mb__(after|before)_atomic()
> with the same results.
> That also enables us to get rid of the spinlock in most cases.

The use of bitops implies the use of a spinlocks (see
_atomic_spin_lock_irqsave and _atomic_spin_unlock_irqrestore) but I have
to assume that it shouldn't be a bottleneck.

Please rebase those changes ontop of 'for-next'.  I really don't think
my changes and yours are mutually exclussive.

> - Splitting 'struct multipath' off in an RCU-protected bit shouldn't
> be required anymore with those patches, so maybe we should rework
> that again.

I actually like how the split out 'struct multipath_paths' worked out.
Sure it is churn but it gives more coherent glimpse into the
proected/heavy-use members.  The use of 'struct multipath_paths __rcu
*paths;' can be revisited after (e.g. '__rcu' dropped, etc) later.

But I've done quite a lot of testing and would like to move forward
rather than blow the code up yet again.

> - What I'm worried of is the list of pgs and pgpaths. We can protect
> against changes within the pgs and pgpaths with bitops and atomic,
> but not against changes of the lists. However, from what I've seen
> the pg list and pgpath lists are pretty much immutable during normal
> I/O, and can only be changed by a table load.
> If the table load is protected by other means (eg by the
> device-mapper core) than we wouldn't need any lock here, either.
> But that needs to be clarified.

Yes, those lists are immutable.  They are established during table load
and never changed.

There isn't any explicit use of locking to protect the lists (the use of
rcu_read_lock() around their accessess is a side-effect of getting the
multipath_paths).

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: dm-multipath performance patches
  2016-03-08 17:40     ` Mike Snitzer
@ 2016-03-12 13:45       ` Mike Snitzer
  0 siblings, 0 replies; 5+ messages in thread
From: Mike Snitzer @ 2016-03-12 13:45 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Junichi Nomura, device-mapper development, Jeff Moyer

On Tue, Mar 08 2016 at 12:40pm -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Tue, Mar 08 2016 at 10:27am -0500,
> Hannes Reinecke <hare@suse.de> wrote:
> 
> > On 03/08/2016 03:48 PM, Mike Snitzer wrote:
> > > On Tue, Mar 08 2016 at  2:39am -0500,
> > > Hannes Reinecke <hare@suse.de> wrote:
> > > 
> > >> Hi Mike,
> > >>
> > >> to picking up an old topic, what's the status of your performance
> > >> patches for dm-multipath?
> > >>
> > >> I've started looking a branch 'devel3', and was wondering if these
> > >> are the latest patches available.
> > >> Or would you have newer?
> > > 
> > > The 'devel3' was just a temporary branch I was using during
> > > development.  I've since staged all the changes in linux-next via
> > > linux-dm.git's 'for-next' branch:
> > > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=for-next
> > > 
> > > If you could test this 'for-next' and report back I'd appreciate it.
> > > It'd build my confidence in pushing these changes once the 4.6 merge
> > > window opens.
> > > 
> > > That said, these changes have held up to Junichi's test suite.  WHich
> > > I've now staged here (until I get around to porting them over to
> > > device-mapper-test-suite): https://github.com/snitm/mptest
> > > 
> > > As for code review, please have a good look at this commit:
> > >    b3c39bf1d dm mpath: convert from spinlock to RCU for most locking
> > > 
> > > If you find anything feel free to share an incremental fix and I'll get
> > > it reviewed/staged.
> > > 
> > > (Junichi I'd welcome your review of all these request-based DM core and
> > > multipath changes too)
> > > 
> > Right.
> > 
> > Looks like it's essential the same as the devel3 branch, so here
> > hare some general comments:
> > 
> > - It _really_ has a creative RCU usage. From my understanding, RCU
> > is protecting access to the _pointer_, not the structure pointed to.
> > IOW changing fields in an RCU-proctected structure are not
> > necessarily visible to other CPUs, and I'm not sure if
> > synchronize_rcu() is sufficient here.
> > I'd be the first to admit that I'm no expert here, but I really
> > would like to have an explanation if one can do that.
> 
> I understand your concern and do share it.  I'm no expert here either.
> 
> It was my understanding that dereferences of the pointer trigger the
> read/write side of the RCU.  And by-virtue of the write-side having
> exclussive access (after grace period, etc) the concurrent reads should
> be consistent.
> 
> You'll note that all writes to the 'struct multipath_paths' members are
> made while holding the m->lock.  I'd have thought visibility of any
> writes from other CPUs is moot -- primarily because any reads of those
> 'struct multipath_paths' state flags or unsigned counters should be
> effectively atomic (due to read vs write side I touched on above).  But
> in practice a suppose it possible for a cpu to see incorrect values.
> 
> Explicit use of atomic_t for the counters and bitsets (with atomic
> test_bit, set_bit) would be best.
> 
> > - I've got some patches moving the bitfields into bitops, and use
> > atomic_t for counters. With that I guess we can get rid of most of
> > the creative RCU usage as we can use smp_mb__(after|before)_atomic()
> > with the same results.
> > That also enables us to get rid of the spinlock in most cases.
> 
> The use of bitops implies the use of a spinlocks (see
> _atomic_spin_lock_irqsave and _atomic_spin_unlock_irqrestore) but I have
> to assume that it shouldn't be a bottleneck.
> 
> Please rebase those changes ontop of 'for-next'.  I really don't think
> my changes and yours are mutually exclussive.

After further review, from Jeff Moyer, and some introspection I've
decided to drop the commits that factor out 'struct multipath_paths' and
switch to using RCU for the 4.6 merge window.  My RCU changes (while not
obviously broken) were overly complex.  To the point that I'd rather
avoid the potential for difficult/subtle dm-mpath bugs in production.

As such, please rebase your bitops and atomic_t changes (hopefully those
are 2 separate commits) ontop of linux-dm.git's 'for-next'.  I'd like us
to be able to tackle dm-multipath's NUMA scalability issues in time for
the 4.7 merge window -- but develop the changes _now_.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-03-12 13:45 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-03-08  7:39 dm-multipath performance patches Hannes Reinecke
2016-03-08 14:48 ` Mike Snitzer
2016-03-08 15:27   ` Hannes Reinecke
2016-03-08 17:40     ` Mike Snitzer
2016-03-12 13:45       ` Mike Snitzer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).