* dm-multipath performance patches @ 2016-03-08 7:39 Hannes Reinecke 2016-03-08 14:48 ` Mike Snitzer 0 siblings, 1 reply; 5+ messages in thread From: Hannes Reinecke @ 2016-03-08 7:39 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development Hi Mike, to picking up an old topic, what's the status of your performance patches for dm-multipath? I've started looking a branch 'devel3', and was wondering if these are the latest patches available. Or would you have newer? Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: dm-multipath performance patches 2016-03-08 7:39 dm-multipath performance patches Hannes Reinecke @ 2016-03-08 14:48 ` Mike Snitzer 2016-03-08 15:27 ` Hannes Reinecke 0 siblings, 1 reply; 5+ messages in thread From: Mike Snitzer @ 2016-03-08 14:48 UTC (permalink / raw) To: Hannes Reinecke; +Cc: Junichi Nomura, device-mapper development On Tue, Mar 08 2016 at 2:39am -0500, Hannes Reinecke <hare@suse.de> wrote: > Hi Mike, > > to picking up an old topic, what's the status of your performance > patches for dm-multipath? > > I've started looking a branch 'devel3', and was wondering if these > are the latest patches available. > Or would you have newer? The 'devel3' was just a temporary branch I was using during development. I've since staged all the changes in linux-next via linux-dm.git's 'for-next' branch: https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=for-next If you could test this 'for-next' and report back I'd appreciate it. It'd build my confidence in pushing these changes once the 4.6 merge window opens. That said, these changes have held up to Junichi's test suite. WHich I've now staged here (until I get around to porting them over to device-mapper-test-suite): https://github.com/snitm/mptest As for code review, please have a good look at this commit: b3c39bf1d dm mpath: convert from spinlock to RCU for most locking If you find anything feel free to share an incremental fix and I'll get it reviewed/staged. (Junichi I'd welcome your review of all these request-based DM core and multipath changes too) Thanks, Mike ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: dm-multipath performance patches 2016-03-08 14:48 ` Mike Snitzer @ 2016-03-08 15:27 ` Hannes Reinecke 2016-03-08 17:40 ` Mike Snitzer 0 siblings, 1 reply; 5+ messages in thread From: Hannes Reinecke @ 2016-03-08 15:27 UTC (permalink / raw) To: Mike Snitzer; +Cc: Junichi Nomura, device-mapper development On 03/08/2016 03:48 PM, Mike Snitzer wrote: > On Tue, Mar 08 2016 at 2:39am -0500, > Hannes Reinecke <hare@suse.de> wrote: > >> Hi Mike, >> >> to picking up an old topic, what's the status of your performance >> patches for dm-multipath? >> >> I've started looking a branch 'devel3', and was wondering if these >> are the latest patches available. >> Or would you have newer? > > The 'devel3' was just a temporary branch I was using during > development. I've since staged all the changes in linux-next via > linux-dm.git's 'for-next' branch: > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=for-next > > If you could test this 'for-next' and report back I'd appreciate it. > It'd build my confidence in pushing these changes once the 4.6 merge > window opens. > > That said, these changes have held up to Junichi's test suite. WHich > I've now staged here (until I get around to porting them over to > device-mapper-test-suite): https://github.com/snitm/mptest > > As for code review, please have a good look at this commit: > b3c39bf1d dm mpath: convert from spinlock to RCU for most locking > > If you find anything feel free to share an incremental fix and I'll get > it reviewed/staged. > > (Junichi I'd welcome your review of all these request-based DM core and > multipath changes too) > Right. Looks like it's essential the same as the devel3 branch, so here hare some general comments: - It _really_ has a creative RCU usage. From my understanding, RCU is protecting access to the _pointer_, not the structure pointed to. IOW changing fields in an RCU-proctected structure are not necessarily visible to other CPUs, and I'm not sure if synchronize_rcu() is sufficient here. I'd be the first to admit that I'm no expert here, but I really would like to have an explanation if one can do that. - I've got some patches moving the bitfields into bitops, and use atomic_t for counters. With that I guess we can get rid of most of the creative RCU usage as we can use smp_mb__(after|before)_atomic() with the same results. That also enables us to get rid of the spinlock in most cases. - Splitting 'struct multipath' off in an RCU-protected bit shouldn't be required anymore with those patches, so maybe we should rework that again. - What I'm worried of is the list of pgs and pgpaths. We can protect against changes within the pgs and pgpaths with bitops and atomic, but not against changes of the lists. However, from what I've seen the pg list and pgpath lists are pretty much immutable during normal I/O, and can only be changed by a table load. If the table load is protected by other means (eg by the device-mapper core) than we wouldn't need any lock here, either. But that needs to be clarified. Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: dm-multipath performance patches 2016-03-08 15:27 ` Hannes Reinecke @ 2016-03-08 17:40 ` Mike Snitzer 2016-03-12 13:45 ` Mike Snitzer 0 siblings, 1 reply; 5+ messages in thread From: Mike Snitzer @ 2016-03-08 17:40 UTC (permalink / raw) To: Hannes Reinecke; +Cc: Junichi Nomura, device-mapper development On Tue, Mar 08 2016 at 10:27am -0500, Hannes Reinecke <hare@suse.de> wrote: > On 03/08/2016 03:48 PM, Mike Snitzer wrote: > > On Tue, Mar 08 2016 at 2:39am -0500, > > Hannes Reinecke <hare@suse.de> wrote: > > > >> Hi Mike, > >> > >> to picking up an old topic, what's the status of your performance > >> patches for dm-multipath? > >> > >> I've started looking a branch 'devel3', and was wondering if these > >> are the latest patches available. > >> Or would you have newer? > > > > The 'devel3' was just a temporary branch I was using during > > development. I've since staged all the changes in linux-next via > > linux-dm.git's 'for-next' branch: > > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=for-next > > > > If you could test this 'for-next' and report back I'd appreciate it. > > It'd build my confidence in pushing these changes once the 4.6 merge > > window opens. > > > > That said, these changes have held up to Junichi's test suite. WHich > > I've now staged here (until I get around to porting them over to > > device-mapper-test-suite): https://github.com/snitm/mptest > > > > As for code review, please have a good look at this commit: > > b3c39bf1d dm mpath: convert from spinlock to RCU for most locking > > > > If you find anything feel free to share an incremental fix and I'll get > > it reviewed/staged. > > > > (Junichi I'd welcome your review of all these request-based DM core and > > multipath changes too) > > > Right. > > Looks like it's essential the same as the devel3 branch, so here > hare some general comments: > > - It _really_ has a creative RCU usage. From my understanding, RCU > is protecting access to the _pointer_, not the structure pointed to. > IOW changing fields in an RCU-proctected structure are not > necessarily visible to other CPUs, and I'm not sure if > synchronize_rcu() is sufficient here. > I'd be the first to admit that I'm no expert here, but I really > would like to have an explanation if one can do that. I understand your concern and do share it. I'm no expert here either. It was my understanding that dereferences of the pointer trigger the read/write side of the RCU. And by-virtue of the write-side having exclussive access (after grace period, etc) the concurrent reads should be consistent. You'll note that all writes to the 'struct multipath_paths' members are made while holding the m->lock. I'd have thought visibility of any writes from other CPUs is moot -- primarily because any reads of those 'struct multipath_paths' state flags or unsigned counters should be effectively atomic (due to read vs write side I touched on above). But in practice a suppose it possible for a cpu to see incorrect values. Explicit use of atomic_t for the counters and bitsets (with atomic test_bit, set_bit) would be best. > - I've got some patches moving the bitfields into bitops, and use > atomic_t for counters. With that I guess we can get rid of most of > the creative RCU usage as we can use smp_mb__(after|before)_atomic() > with the same results. > That also enables us to get rid of the spinlock in most cases. The use of bitops implies the use of a spinlocks (see _atomic_spin_lock_irqsave and _atomic_spin_unlock_irqrestore) but I have to assume that it shouldn't be a bottleneck. Please rebase those changes ontop of 'for-next'. I really don't think my changes and yours are mutually exclussive. > - Splitting 'struct multipath' off in an RCU-protected bit shouldn't > be required anymore with those patches, so maybe we should rework > that again. I actually like how the split out 'struct multipath_paths' worked out. Sure it is churn but it gives more coherent glimpse into the proected/heavy-use members. The use of 'struct multipath_paths __rcu *paths;' can be revisited after (e.g. '__rcu' dropped, etc) later. But I've done quite a lot of testing and would like to move forward rather than blow the code up yet again. > - What I'm worried of is the list of pgs and pgpaths. We can protect > against changes within the pgs and pgpaths with bitops and atomic, > but not against changes of the lists. However, from what I've seen > the pg list and pgpath lists are pretty much immutable during normal > I/O, and can only be changed by a table load. > If the table load is protected by other means (eg by the > device-mapper core) than we wouldn't need any lock here, either. > But that needs to be clarified. Yes, those lists are immutable. They are established during table load and never changed. There isn't any explicit use of locking to protect the lists (the use of rcu_read_lock() around their accessess is a side-effect of getting the multipath_paths). ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: dm-multipath performance patches 2016-03-08 17:40 ` Mike Snitzer @ 2016-03-12 13:45 ` Mike Snitzer 0 siblings, 0 replies; 5+ messages in thread From: Mike Snitzer @ 2016-03-12 13:45 UTC (permalink / raw) To: Hannes Reinecke; +Cc: Junichi Nomura, device-mapper development, Jeff Moyer On Tue, Mar 08 2016 at 12:40pm -0500, Mike Snitzer <snitzer@redhat.com> wrote: > On Tue, Mar 08 2016 at 10:27am -0500, > Hannes Reinecke <hare@suse.de> wrote: > > > On 03/08/2016 03:48 PM, Mike Snitzer wrote: > > > On Tue, Mar 08 2016 at 2:39am -0500, > > > Hannes Reinecke <hare@suse.de> wrote: > > > > > >> Hi Mike, > > >> > > >> to picking up an old topic, what's the status of your performance > > >> patches for dm-multipath? > > >> > > >> I've started looking a branch 'devel3', and was wondering if these > > >> are the latest patches available. > > >> Or would you have newer? > > > > > > The 'devel3' was just a temporary branch I was using during > > > development. I've since staged all the changes in linux-next via > > > linux-dm.git's 'for-next' branch: > > > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=for-next > > > > > > If you could test this 'for-next' and report back I'd appreciate it. > > > It'd build my confidence in pushing these changes once the 4.6 merge > > > window opens. > > > > > > That said, these changes have held up to Junichi's test suite. WHich > > > I've now staged here (until I get around to porting them over to > > > device-mapper-test-suite): https://github.com/snitm/mptest > > > > > > As for code review, please have a good look at this commit: > > > b3c39bf1d dm mpath: convert from spinlock to RCU for most locking > > > > > > If you find anything feel free to share an incremental fix and I'll get > > > it reviewed/staged. > > > > > > (Junichi I'd welcome your review of all these request-based DM core and > > > multipath changes too) > > > > > Right. > > > > Looks like it's essential the same as the devel3 branch, so here > > hare some general comments: > > > > - It _really_ has a creative RCU usage. From my understanding, RCU > > is protecting access to the _pointer_, not the structure pointed to. > > IOW changing fields in an RCU-proctected structure are not > > necessarily visible to other CPUs, and I'm not sure if > > synchronize_rcu() is sufficient here. > > I'd be the first to admit that I'm no expert here, but I really > > would like to have an explanation if one can do that. > > I understand your concern and do share it. I'm no expert here either. > > It was my understanding that dereferences of the pointer trigger the > read/write side of the RCU. And by-virtue of the write-side having > exclussive access (after grace period, etc) the concurrent reads should > be consistent. > > You'll note that all writes to the 'struct multipath_paths' members are > made while holding the m->lock. I'd have thought visibility of any > writes from other CPUs is moot -- primarily because any reads of those > 'struct multipath_paths' state flags or unsigned counters should be > effectively atomic (due to read vs write side I touched on above). But > in practice a suppose it possible for a cpu to see incorrect values. > > Explicit use of atomic_t for the counters and bitsets (with atomic > test_bit, set_bit) would be best. > > > - I've got some patches moving the bitfields into bitops, and use > > atomic_t for counters. With that I guess we can get rid of most of > > the creative RCU usage as we can use smp_mb__(after|before)_atomic() > > with the same results. > > That also enables us to get rid of the spinlock in most cases. > > The use of bitops implies the use of a spinlocks (see > _atomic_spin_lock_irqsave and _atomic_spin_unlock_irqrestore) but I have > to assume that it shouldn't be a bottleneck. > > Please rebase those changes ontop of 'for-next'. I really don't think > my changes and yours are mutually exclussive. After further review, from Jeff Moyer, and some introspection I've decided to drop the commits that factor out 'struct multipath_paths' and switch to using RCU for the 4.6 merge window. My RCU changes (while not obviously broken) were overly complex. To the point that I'd rather avoid the potential for difficult/subtle dm-mpath bugs in production. As such, please rebase your bitops and atomic_t changes (hopefully those are 2 separate commits) ontop of linux-dm.git's 'for-next'. I'd like us to be able to tackle dm-multipath's NUMA scalability issues in time for the 4.7 merge window -- but develop the changes _now_. Thanks, Mike ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2016-03-12 13:45 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-03-08 7:39 dm-multipath performance patches Hannes Reinecke 2016-03-08 14:48 ` Mike Snitzer 2016-03-08 15:27 ` Hannes Reinecke 2016-03-08 17:40 ` Mike Snitzer 2016-03-12 13:45 ` Mike Snitzer
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).