* hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] [not found] ` <20170215145617.GA4241@infradead.org> @ 2017-02-16 2:53 ` Mike Snitzer 2017-02-16 5:00 ` [dm-devel] " Bart Van Assche 2017-02-16 14:26 ` [dm-devel] " Christoph Hellwig 0 siblings, 2 replies; 20+ messages in thread From: Mike Snitzer @ 2017-02-16 2:53 UTC (permalink / raw) On Wed, Feb 15 2017 at 9:56am -0500, Christoph Hellwig <hch@infradead.org> wrote: > On Tue, Feb 14, 2017@04:19:13PM -0500, Keith Busch wrote: > > These devices are mulitpath capable, and have been able to stack with > > dm-mpath since kernel 4.2. > > Can we make this conditional on something? I have native NVMe > multipathing almost ready for the next merge window which is a lot easier > to use and faster than dm. So I don't want us to be locked into this > mode just before that. You've avoided discussing this on any level (and I guess you aren't going to LSF/MM?). Yet you're expecting to just drop it into the tree without a care in the world about the implications. Nobody has interest in Linux multipathing becoming fragmented. If every transport implemented their own multipathing the end-user would be amazingly screwed trying to keep track of all the quirks/configuration/management of each. Not saying multipath-tools is great, nor that DM multipath is god's gift. But substantiating _why_ you need this "native NVMe multipathing" would go a really long way to justifying your effort. For starters, how about you show just how much better than DM multipath this native NVMe multipathing performs? NOTE: it'd imply you put effort to making DM multipath work with NVMe.. if you've sat on that code too that'd be amazingly unfortunate/frustrating. ^ permalink raw reply [flat|nested] 20+ messages in thread
* [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] 2017-02-16 2:53 ` hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] Mike Snitzer @ 2017-02-16 5:00 ` Bart Van Assche 2017-02-16 12:37 ` Mike Snitzer 2017-02-16 14:26 ` [dm-devel] " Christoph Hellwig 1 sibling, 1 reply; 20+ messages in thread From: Bart Van Assche @ 2017-02-16 5:00 UTC (permalink / raw) On 02/15/17 18:53, Mike Snitzer wrote: > Nobody has interest in Linux multipathing becoming fragmented. > > If every transport implemented their own multipathing the end-user would > be amazingly screwed trying to keep track of all the > quirks/configuration/management of each. > > Not saying multipath-tools is great, nor that DM multipath is god's > gift. But substantiating _why_ you need this "native NVMe > multipathing" would go a really long way to justifying your effort. > > For starters, how about you show just how much better than DM multipath > this native NVMe multipathing performs? NOTE: it'd imply you put effort > to making DM multipath work with NVMe.. if you've sat on that code too > that'd be amazingly unfortunate/frustrating. Another question is what your attitude is towards dm-mpath changes? Last time I posted a series of patches that significantly clean up and improve readability of the dm-mpath code you refused to take these upstream. Bart. ^ permalink raw reply [flat|nested] 20+ messages in thread
* hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] 2017-02-16 5:00 ` [dm-devel] " Bart Van Assche @ 2017-02-16 12:37 ` Mike Snitzer 2017-02-16 19:46 ` Bart Van Assche 0 siblings, 1 reply; 20+ messages in thread From: Mike Snitzer @ 2017-02-16 12:37 UTC (permalink / raw) On Thu, Feb 16 2017 at 12:00am -0500, Bart Van Assche <bart.vanassche@sandisk.com> wrote: > On 02/15/17 18:53, Mike Snitzer wrote: > >Nobody has interest in Linux multipathing becoming fragmented. > > > >If every transport implemented their own multipathing the end-user would > >be amazingly screwed trying to keep track of all the > >quirks/configuration/management of each. > > > >Not saying multipath-tools is great, nor that DM multipath is god's > >gift. But substantiating _why_ you need this "native NVMe > >multipathing" would go a really long way to justifying your effort. > > > >For starters, how about you show just how much better than DM multipath > >this native NVMe multipathing performs? NOTE: it'd imply you put effort > >to making DM multipath work with NVMe.. if you've sat on that code too > >that'd be amazingly unfortunate/frustrating. > > Another question is what your attitude is towards dm-mpath changes? > Last time I posted a series of patches that significantly clean up > and improve readability of the dm-mpath code you refused to take these upstream. Weird. I did push back on those changes initially (just felt like churn) but I ultimately did take them: $ git log --oneline --author=bart drivers/md/dm-mpath.c 6599c84 dm mpath: do not modify *__clone if blk_mq_alloc_request() fails 4813577 dm mpath: change return type of pg_init_all_paths() from int to void 9f4c3f8 dm: convert wait loops to use autoremove_wake_function() Did I miss any? But to be 100% clear, I'm very appreciative of any DM mpath (and request-based DM core) changes. I'll review them with a critical eye but if they hold up they get included. Mike ^ permalink raw reply [flat|nested] 20+ messages in thread
* hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] 2017-02-16 12:37 ` Mike Snitzer @ 2017-02-16 19:46 ` Bart Van Assche 2017-02-16 20:23 ` Mike Snitzer 0 siblings, 1 reply; 20+ messages in thread From: Bart Van Assche @ 2017-02-16 19:46 UTC (permalink / raw) On Thu, 2017-02-16@07:37 -0500, Mike Snitzer wrote: > Weird. I did push back on those changes initially (just felt like > churn) but I ultimately did take them: > > $ git log --oneline --author=bart drivers/md/dm-mpath.c > 6599c84 dm mpath: do not modify *__clone if blk_mq_alloc_request() fails > 4813577 dm mpath: change return type of pg_init_all_paths() from int to void > 9f4c3f8 dm: convert wait loops to use autoremove_wake_function() > > Did I miss any? Hello Mike, Thank you for having accepted these patches. However, I was not referring to these patches but to the eight patches available at https://github.com/bvanassche/linux/tree/dm-mpath Bart. ^ permalink raw reply [flat|nested] 20+ messages in thread
* hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] 2017-02-16 19:46 ` Bart Van Assche @ 2017-02-16 20:23 ` Mike Snitzer 2017-02-16 20:58 ` Bart Van Assche 0 siblings, 1 reply; 20+ messages in thread From: Mike Snitzer @ 2017-02-16 20:23 UTC (permalink / raw) On Thu, Feb 16 2017 at 2:46pm -0500, Bart Van Assche <Bart.VanAssche@sandisk.com> wrote: > On Thu, 2017-02-16@07:37 -0500, Mike Snitzer wrote: > > Weird. I did push back on those changes initially (just felt like > > churn) but I ultimately did take them: > > > > $ git log --oneline --author=bart drivers/md/dm-mpath.c > > 6599c84 dm mpath: do not modify *__clone if blk_mq_alloc_request() fails > > 4813577 dm mpath: change return type of pg_init_all_paths() from int to void > > 9f4c3f8 dm: convert wait loops to use autoremove_wake_function() > > > > Did I miss any? > > Hello Mike, > > Thank you for having accepted these patches. However, I was not referring > to these patches but to the eight patches available at > https://github.com/bvanassche/linux/tree/dm-mpath Some of these look familiar (the dm-mpath micro-optimize one made me cringe.. potential for regression, etc. But careful review should ease those concerns). But yeah, these certainly slipped through the cracks. I'll review these for 4.12 (too late for 4.11, sorry) Please rebase these ontop of linux-block.git's for-4.11/rq-refactor because Christoph significantly reworked the request-based DM code (all request allocation is done in DM multipath now), see: http://git.kernel.dk/cgit/linux-block/commit/?h=for-4.11/rq-refactor&id=eb8db831be80692bf4bda3dfc55001daf64ec299 Or you can wait to rebase on v4.11-rc1 in ~2 weeks. Thanks ^ permalink raw reply [flat|nested] 20+ messages in thread
* hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] 2017-02-16 20:23 ` Mike Snitzer @ 2017-02-16 20:58 ` Bart Van Assche 0 siblings, 0 replies; 20+ messages in thread From: Bart Van Assche @ 2017-02-16 20:58 UTC (permalink / raw) On Thu, 2017-02-16@15:23 -0500, Mike Snitzer wrote: > Or you can wait to rebase on v4.11-rc1 in ~2 weeks. Hello Mike, I will wait until v4.11-rc1 has been released. Thanks, Bart. ^ permalink raw reply [flat|nested] 20+ messages in thread
* [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] 2017-02-16 2:53 ` hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] Mike Snitzer 2017-02-16 5:00 ` [dm-devel] " Bart Van Assche @ 2017-02-16 14:26 ` Christoph Hellwig 2017-02-16 15:13 ` Mike Snitzer 1 sibling, 1 reply; 20+ messages in thread From: Christoph Hellwig @ 2017-02-16 14:26 UTC (permalink / raw) On Wed, Feb 15, 2017@09:53:57PM -0500, Mike Snitzer wrote: > going to LSF/MM?). Yet you're expecting to just drop it into the tree > without a care in the world about the implications. I am planning to post it for comments, and then plan to "drop it in the tree" exactly because I think of the implications. Keith did that. But once we already do the discovery of the path relations in the transport (e.g scsi_dh) we can just move the path selectors (for which I'm reusing the DM code anyway btw) and the bouncing of I/O to the block layer and cut out the middle man. The main reason is that things will just work (TM) instead of having to carry around additional userspace to configure a an unneded additional device layer that just causes confusion. Beyond the scsi_dh equivalent there is basically no new code in nvme, just a little new code in the block layer, and a move of the path selectors from dm to the block layer. I would not call this fragmentation. Anyway, there is very little point in having an abstract discussion here, I'll try to get the code ready ASAP, although until the end of next week I'm really pressed with other deadlines. ^ permalink raw reply [flat|nested] 20+ messages in thread
* hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] 2017-02-16 14:26 ` [dm-devel] " Christoph Hellwig @ 2017-02-16 15:13 ` Mike Snitzer 2017-02-16 17:38 ` Keith Busch ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: Mike Snitzer @ 2017-02-16 15:13 UTC (permalink / raw) On Thu, Feb 16 2017 at 9:26am -0500, Christoph Hellwig <hch@infradead.org> wrote: > On Wed, Feb 15, 2017@09:53:57PM -0500, Mike Snitzer wrote: > > going to LSF/MM?). Yet you're expecting to just drop it into the tree > > without a care in the world about the implications. > > I am planning to post it for comments, and then plan to "drop it in the > tree" exactly because I think of the implications. > > Keith did that Not following what you're saying Keith did. Please feel free to clarify. But we definitely need to devise a way for NVMe to inform DM multipath (and vice-versa): hands off this device. Awkward details to work through to be sure... > But once we already do the discovery of the path > relations in the transport (e.g scsi_dh) we can just move the path > selectors (for which I'm reusing the DM code anyway btw) and the > bouncing of I/O to the block layer and cut out the middle man. The middle man is useful if it can support all transports. If it only supports some then yeah the utility is certainly reduced. > The main reason is that things will just work (TM) instead of having > to carry around additional userspace to configure a an unneded > additional device layer that just causes confusion. Beyond the > scsi_dh equivalent there is basically no new code in nvme, I'm going to look at removing any scsi_dh code from DM multipath (someone already proposed removing the 'retain_attached_hw_handler' feature). Not much point having anything in DM multipath now that scsi discovery has the ability to auto-attach the right scsi_dh via scsi_dh's .match hook. As a side-effect it will fix Keith's scsi_dh crash (when operating on NVMe request_queue). My hope is that your NVMe equivalent for scsi_dh will "just work" (TM) like scsi_dh auto-attach does. There isn't a finished ALUA equivalent standard for NVMe so I'd imagine at this point you have a single device handler for NVMe to do error translation? Anyway, the scsi_dh equivalent for NVMe is welcomed news! > just a little new code in the block layer, and a move of the path > selectors from dm to the block layer. I would not call this > fragmentation. I'm fine with the path selectors getting moved out; maybe it'll encourage new path selectors to be developed. But there will need to be some userspace interface stood up to support your native NVMe multipathing (you may not think it needed but think in time there will be a need to configure _something_). That is the fragmentation I'm referring to. > Anyway, there is very little point in having an abstract discussion > here, I'll try to get the code ready ASAP, although until the end of > next week I'm really pressed with other deadlines. OK. FYI, I never wanted to have an abstract discussion. We need a real nuts and bolts discussion. Happy to have it play out on the lists. I'm not violently opposed to your native NVMe multipathing -- especially from a reference implementation point of view. I think that in practice it'll keep DM multipath honest -- help drive scalability improvements, etc. If over time the native NVMe multipathing _is_ the preferred multipathing solution for NVme, then so be it. It'll be on merits.. as it should be. But I'm sure you're well aware that I and Red Hat and our partners have a vested interest in providing a single multipath stack that "just works" for all appropriate storage. ^ permalink raw reply [flat|nested] 20+ messages in thread
* hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] 2017-02-16 15:13 ` Mike Snitzer @ 2017-02-16 17:38 ` Keith Busch 2017-02-16 17:37 ` Bart Van Assche 2017-02-16 18:05 ` Sagi Grimberg 2017-02-17 9:33 ` [dm-devel] " Christoph Hellwig 2 siblings, 1 reply; 20+ messages in thread From: Keith Busch @ 2017-02-16 17:38 UTC (permalink / raw) On Thu, Feb 16, 2017@10:13:37AM -0500, Mike Snitzer wrote: > On Thu, Feb 16 2017 at 9:26am -0500, > Christoph Hellwig <hch@infradead.org> wrote: > > > just a little new code in the block layer, and a move of the path > > selectors from dm to the block layer. I would not call this > > fragmentation. > > I'm fine with the path selectors getting moved out; maybe it'll > encourage new path selectors to be developed. > > But there will need to be some userspace interface stood up to support > your native NVMe multipathing (you may not think it needed but think in > time there will be a need to configure _something_). That is the > fragmentation I'm referring to. I'm not sure what Christoph's proposal looks like, but I have to agree that multipath support directly in the kernel without requiring user space to setup the mpath block device is easier for everyone. The only NVMe specific part, though, just needs to be how it reports unique identifiers to the multipath layer. Maybe I'm not seeing the bigger picture. Is there some part to multipath that the kernel is not in a better position to handle? ^ permalink raw reply [flat|nested] 20+ messages in thread
* hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] 2017-02-16 17:38 ` Keith Busch @ 2017-02-16 17:37 ` Bart Van Assche 2017-02-16 18:07 ` Keith Busch 0 siblings, 1 reply; 20+ messages in thread From: Bart Van Assche @ 2017-02-16 17:37 UTC (permalink / raw) On Thu, 2017-02-16@12:38 -0500, Keith Busch wrote: > Maybe I'm not seeing the bigger picture. Is there some part to multipath > that the kernel is not in a better position to handle? Does this mean that the code to parse /etc/multipath.conf will be moved into the kernel? Or will we lose the ability to configure the policies that /etc/multipath.conf allows to configure? Bart. ^ permalink raw reply [flat|nested] 20+ messages in thread
* hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] 2017-02-16 17:37 ` Bart Van Assche @ 2017-02-16 18:07 ` Keith Busch 2017-02-16 18:21 ` Mike Snitzer 0 siblings, 1 reply; 20+ messages in thread From: Keith Busch @ 2017-02-16 18:07 UTC (permalink / raw) On Thu, Feb 16, 2017@05:37:41PM +0000, Bart Van Assche wrote: > On Thu, 2017-02-16@12:38 -0500, Keith Busch wrote: > > Maybe I'm not seeing the bigger picture. Is there some part to multipath > > that the kernel is not in a better position to handle? > > Does this mean that the code to parse /etc/multipath.conf will be moved into > the kernel? Or will we lose the ability to configure the policies that > /etc/multipath.conf allows to configure? No, I'm just considering the settings for a device that won't work at all if multipath.conf is wrong. For example, the uuid attributes, path priority, or path checker. These can't be considered configurable policies if all but one of them are invalid for a specific device type. It shouldn't even be an option to let a user select TUR path checker for NVMe, and the only checkers multipath-tools provide that even make sense for NVMe are deprecated. ^ permalink raw reply [flat|nested] 20+ messages in thread
* hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] 2017-02-16 18:07 ` Keith Busch @ 2017-02-16 18:21 ` Mike Snitzer 2017-02-16 20:40 ` Keith Busch 2017-02-17 9:04 ` [dm-devel] " hch 0 siblings, 2 replies; 20+ messages in thread From: Mike Snitzer @ 2017-02-16 18:21 UTC (permalink / raw) On Thu, Feb 16 2017 at 1:07pm -0500, Keith Busch <keith.busch@intel.com> wrote: > On Thu, Feb 16, 2017@05:37:41PM +0000, Bart Van Assche wrote: > > On Thu, 2017-02-16@12:38 -0500, Keith Busch wrote: > > > Maybe I'm not seeing the bigger picture. Is there some part to multipath > > > that the kernel is not in a better position to handle? > > > > Does this mean that the code to parse /etc/multipath.conf will be moved into > > the kernel? Or will we lose the ability to configure the policies that > > /etc/multipath.conf allows to configure? > > No, I'm just considering the settings for a device that won't work > at all if multipath.conf is wrong. For example, the uuid attributes, > path priority, or path checker. These can't be considered configurable > policies if all but one of them are invalid for a specific device type. > > It shouldn't even be an option to let a user select TUR path checker > for NVMe, and the only checkers multipath-tools provide that even make > sense for NVMe are deprecated. Then undeprecate them. Decisions like marking a path checker deprecated were _not_ made with NVMe in mind. They must predate NVMe. multipath-tools has tables that specify all the defaults for a given target backend. NVMe will just be yet another. Yes some user _could_ shoot themselves in the foot by overriding the proper configuration but since when are we motivated by _not_ giving users the power to hang themselves? As for configurability (chosing between N valid configs/settings): At some point the user will want one behaviour vs another. Thinking otherwise is just naive. Think error timeouts, etc. Any multipath kernel implementation (which dm-multipath is BTW) will eventually find itself at a crossroads where the underlying fabric could be tweaked in different ways. Thinking you can just hardcode these attributes and settings is foolish. ^ permalink raw reply [flat|nested] 20+ messages in thread
* hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] 2017-02-16 18:21 ` Mike Snitzer @ 2017-02-16 20:40 ` Keith Busch 2017-02-17 9:04 ` [dm-devel] " hch 1 sibling, 0 replies; 20+ messages in thread From: Keith Busch @ 2017-02-16 20:40 UTC (permalink / raw) On Thu, Feb 16, 2017@01:21:29PM -0500, Mike Snitzer wrote: > Then undeprecate them. Decisions like marking a path checker deprecated > were _not_ made with NVMe in mind. They must predate NVMe. > > multipath-tools has tables that specify all the defaults for a given > target backend. NVMe will just be yet another. Yes some user _could_ > shoot themselves in the foot by overriding the proper configuration but > since when are we motivated by _not_ giving users the power to hang > themselves? > > As for configurability (chosing between N valid configs/settings): At > some point the user will want one behaviour vs another. Thinking > otherwise is just naive. Think error timeouts, etc. Any multipath > kernel implementation (which dm-multipath is BTW) will eventually find > itself at a crossroads where the underlying fabric could be tweaked in > different ways. Thinking you can just hardcode these attributes and > settings is foolish. Roger that, and I absolutely want to see this work with the existing framework. I just think it'd be easier for everyone if multipath were more like the generic block layer, in that devices are surfaced with configurable policies without userspace telling it which to use. The kernel knowing safe defaults for a particular device is probably the more common case, and userspace can still tune them as needed. Of course, I accept you're in a better position to know if this is folly. ^ permalink raw reply [flat|nested] 20+ messages in thread
* [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] 2017-02-16 18:21 ` Mike Snitzer 2017-02-16 20:40 ` Keith Busch @ 2017-02-17 9:04 ` hch 2017-02-17 14:43 ` Mike Snitzer 1 sibling, 1 reply; 20+ messages in thread From: hch @ 2017-02-17 9:04 UTC (permalink / raw) On Thu, Feb 16, 2017@01:21:29PM -0500, Mike Snitzer wrote: > multipath-tools has tables that specify all the defaults for a given > target backend. NVMe will just be yet another. No, if we get things right it won't. ALUA already got rid of most of the parameter people would have to set under normal conditions, and I plan to make sure the NVMe equivalent will do it for all parameters. I am active in the NVMe working group and will do my best to get there. There's a few others folks here that are more or less active there as well (Keith, Martin, Jens for example), so I think we have a chance. That beeing said Keith is right that we'll always have odd setups where we need to overrid things, and we will have to provide tunables for that. It's no different from any other kernel subsystem in that. ^ permalink raw reply [flat|nested] 20+ messages in thread
* hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] 2017-02-17 9:04 ` [dm-devel] " hch @ 2017-02-17 14:43 ` Mike Snitzer 0 siblings, 0 replies; 20+ messages in thread From: Mike Snitzer @ 2017-02-17 14:43 UTC (permalink / raw) On Fri, Feb 17 2017 at 4:04am -0500, hch@infradead.org <hch@infradead.org> wrote: > On Thu, Feb 16, 2017@01:21:29PM -0500, Mike Snitzer wrote: > > multipath-tools has tables that specify all the defaults for a given > > target backend. NVMe will just be yet another. > > No, if we get things right it won't. ALUA already got rid of most > of the parameter people would have to set under normal conditions, > and I plan to make sure the NVMe equivalent will do it for all > parameters. I am active in the NVMe working group and will do my > best to get there. There's a few others folks here that are more or > less active there as well (Keith, Martin, Jens for example), so I > think we have a chance. > > That beeing said Keith is right that we'll always have odd setups > where we need to overrid things, and we will have to provide tunables > for that. It's no different from any other kernel subsystem in that. Before ALUA fixed all that vendor specific fragmentation there was the even worse fragmentation where different vendors pushed multipathing into their FC drivers. James correctly pushed them toward a generic solution (and DM multipath was born). If every transport implements its own multipathing then it'll be a more generic, yet very similar, fragmentation problem. But if your native NVMe multipathing really is factored such that the actual IO fast path is implemented in block core, and transport specific hooks are called out to as needed, then you've simply reimplement DM multipath in block core. Pretty weird place to invest _so_ much energy before you've fully explored how unworkable DM multipath support for NVMe is. But I digress. ^ permalink raw reply [flat|nested] 20+ messages in thread
* hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] 2017-02-16 15:13 ` Mike Snitzer 2017-02-16 17:38 ` Keith Busch @ 2017-02-16 18:05 ` Sagi Grimberg 2017-02-17 9:05 ` [dm-devel] " Christoph Hellwig 2017-02-17 9:33 ` [dm-devel] " Christoph Hellwig 2 siblings, 1 reply; 20+ messages in thread From: Sagi Grimberg @ 2017-02-16 18:05 UTC (permalink / raw) > I'm fine with the path selectors getting moved out; maybe it'll > encourage new path selectors to be developed. > > But there will need to be some userspace interface stood up to support > your native NVMe multipathing (you may not think it needed but think in > time there will be a need to configure _something_). That is the > fragmentation I'm referring to. I guess one config option that we'd need is multibus vs. failover which are used per use-case. I'd have to say that having something much simpler than multipath-tools does sound appealing. ^ permalink raw reply [flat|nested] 20+ messages in thread
* [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] 2017-02-16 18:05 ` Sagi Grimberg @ 2017-02-17 9:05 ` Christoph Hellwig 2017-02-17 14:37 ` Mike Snitzer 0 siblings, 1 reply; 20+ messages in thread From: Christoph Hellwig @ 2017-02-17 9:05 UTC (permalink / raw) On Thu, Feb 16, 2017@08:05:36PM +0200, Sagi Grimberg wrote: > I guess one config option that we'd need is multibus vs. failover > which are used per use-case. Which fundamentally is a property of the target first, and it should tell us that. There might be the occasional need for an override, but certainly not locally maintained tables of targets and their preferences. ^ permalink raw reply [flat|nested] 20+ messages in thread
* hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] 2017-02-17 9:05 ` [dm-devel] " Christoph Hellwig @ 2017-02-17 14:37 ` Mike Snitzer 0 siblings, 0 replies; 20+ messages in thread From: Mike Snitzer @ 2017-02-17 14:37 UTC (permalink / raw) On Fri, Feb 17 2017 at 4:05am -0500, Christoph Hellwig <hch@infradead.org> wrote: > On Thu, Feb 16, 2017@08:05:36PM +0200, Sagi Grimberg wrote: > > I guess one config option that we'd need is multibus vs. failover > > which are used per use-case. > > Which fundamentally is a property of the target first, and it should > tell us that. There might be the occasional need for an override, > but certainly not locally maintained tables of targets and their > preferences. Think you're being idealistic to think that once NVMe is on the roadmap, and an engineering priority, of every player in the industry that your more-or-less prestine implementation isn't going to get stretched to its limits and ultimately need a fair amount of vendor specific tweaks. But hopefully I'm wrong. I just know what has happened with SCSI/FC/IB multipathing vendor hardware and the jockeying that all these vendors do ultimately results in the need for quirks because they raced to market and ultimately missed something or invented their own ways forward to allow NVMe to dove-tail into their preexisting legacy "enterprise" offerings. ^ permalink raw reply [flat|nested] 20+ messages in thread
* [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] 2017-02-16 15:13 ` Mike Snitzer 2017-02-16 17:38 ` Keith Busch 2017-02-16 18:05 ` Sagi Grimberg @ 2017-02-17 9:33 ` Christoph Hellwig 2017-02-17 14:32 ` Mike Snitzer 2 siblings, 1 reply; 20+ messages in thread From: Christoph Hellwig @ 2017-02-17 9:33 UTC (permalink / raw) On Thu, Feb 16, 2017@10:13:37AM -0500, Mike Snitzer wrote: > Not following what you're saying Keith did. Please feel free to > clarify. Keith demonstrated what it takes to support NVMe with dm. He also gave a couple presentations on it in addition to various ptches on the list. > The middle man is useful if it can support all transports. If it only > supports some then yeah the utility is certainly reduced. Again let's look at what multipathing involves: - discovery of multiple paths for a device, and path preferences: Storage protocol specific - handling of path state and grouping changes: Storage protocol specific - handling of path up / down events: Storage protocol / transport specific if provided - keep alive / path checking: Storage protocol specific with possible generic fallback - path selection: Generic, although building heavily on protocol / transport specific information So most of the hard work is transport specific anyway. And I fully agree that generic code should be, well generic. And with generic I mean right in the block layer instead of involving a layer block driver that relies on lots of low-level driver information and setup from user space. > I'm going to look at removing any scsi_dh code from DM multipath > (someone already proposed removing the 'retain_attached_hw_handler' > feature). Not much point having anything in DM multipath now that scsi > discovery has the ability to auto-attach the right scsi_dh via scsi_dh's > .match hook. Great. > As a side-effect it will fix Keith's scsi_dh crash (when > operating on NVMe request_queue). I think we'll need to have a quick fix for that ASAP, though. > My hope is that your NVMe equivalent for scsi_dh will "just work" (TM) > like scsi_dh auto-attach does. There isn't a finished ALUA equivalent > standard for NVMe so I'd imagine at this point you have a single device > handler for NVMe to do error translation? Yes, error translation for the block layer, but most importantly discovery of multiple paths to the same namespace. ^ permalink raw reply [flat|nested] 20+ messages in thread
* hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] 2017-02-17 9:33 ` [dm-devel] " Christoph Hellwig @ 2017-02-17 14:32 ` Mike Snitzer 0 siblings, 0 replies; 20+ messages in thread From: Mike Snitzer @ 2017-02-17 14:32 UTC (permalink / raw) On Fri, Feb 17 2017 at 4:33am -0500, Christoph Hellwig <hch@infradead.org> wrote: > On Thu, Feb 16, 2017@10:13:37AM -0500, Mike Snitzer wrote: > > Not following what you're saying Keith did. Please feel free to > > clarify. > > Keith demonstrated what it takes to support NVMe with dm. He also > gave a couple presentations on it in addition to various ptches on > the list. Well I know very well that Keith was an early consumer of DM multipath for NVMe (back when Intel was looking at doing dual port NVMe drives). Keith's interest in NVMe support in multipath is what motivated all the blk-mq request-based DM multipath advances. But I'll follow-up with Keith to see if he can sweep through whatever patches he has and make sure I'm aware of them. > > The middle man is useful if it can support all transports. If it only > > supports some then yeah the utility is certainly reduced. > > Again let's look at what multipathing involves: > > - discovery of multiple paths for a device, and path preferences: > Storage protocol specific > > - handling of path state and grouping changes: > Storage protocol specific > > - handling of path up / down events: > Storage protocol / transport specific if provided > > - keep alive / path checking: > Storage protocol specific with possible generic fallback > > - path selection: > Generic, although building heavily on protocol / transport specific > information > > So most of the hard work is transport specific anyway. And I fully > agree that generic code should be, well generic. And with generic > I mean right in the block layer instead of involving a layer block > driver that relies on lots of low-level driver information and setup > from user space. "Lots of low-level driver information?" You mean like SCSI's pg_init and such? > > I'm going to look at removing any scsi_dh code from DM multipath > > (someone already proposed removing the 'retain_attached_hw_handler' > > feature). Not much point having anything in DM multipath now that scsi > > discovery has the ability to auto-attach the right scsi_dh via scsi_dh's > > .match hook. > > Great. > > > As a side-effect it will fix Keith's scsi_dh crash (when > > operating on NVMe request_queue). > > I think we'll need to have a quick fix for that ASAP, though. Hannes' latest patch looked good to me. Should take care of it. I assume mkp will pick it up for the 4.11 merge window and mark for stable? ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2017-02-17 14:43 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1487107154-24883-1-git-send-email-keith.busch@intel.com>
[not found] ` <20170215145617.GA4241@infradead.org>
2017-02-16 2:53 ` hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme] Mike Snitzer
2017-02-16 5:00 ` [dm-devel] " Bart Van Assche
2017-02-16 12:37 ` Mike Snitzer
2017-02-16 19:46 ` Bart Van Assche
2017-02-16 20:23 ` Mike Snitzer
2017-02-16 20:58 ` Bart Van Assche
2017-02-16 14:26 ` [dm-devel] " Christoph Hellwig
2017-02-16 15:13 ` Mike Snitzer
2017-02-16 17:38 ` Keith Busch
2017-02-16 17:37 ` Bart Van Assche
2017-02-16 18:07 ` Keith Busch
2017-02-16 18:21 ` Mike Snitzer
2017-02-16 20:40 ` Keith Busch
2017-02-17 9:04 ` [dm-devel] " hch
2017-02-17 14:43 ` Mike Snitzer
2017-02-16 18:05 ` Sagi Grimberg
2017-02-17 9:05 ` [dm-devel] " Christoph Hellwig
2017-02-17 14:37 ` Mike Snitzer
2017-02-17 9:33 ` [dm-devel] " Christoph Hellwig
2017-02-17 14:32 ` Mike Snitzer
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox