From: Sagi Grimberg <sagig@dev.mellanox.co.il>
To: Hannes Reinecke <hare@suse.de>,
	"lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>
Cc: device-mapper development <dm-devel@redhat.com>,
	"linux-scsi@vger.kernel.org" <Linux-scsi@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Subject: Re: [LSF/MM ATTEND][LSF/MM TOPIC] Multipath redesign
Date: Wed, 13 Jan 2016 12:50:03 +0200
Message-ID: <56962BDB.4080509@dev.mellanox.co.il>
In-Reply-To: <56961493.5010901@suse.de>


> Hi all,
>
> I'd like to attend LSF/MM and would like to present my ideas for a
> multipath redesign.
>
> The overall idea is to break up the centralized multipath handling in
> device-mapper (and multipath-tools) and delegate to the appropriate
> sub-systems.

I agree that would be very useful. Great topic. I'd like to attend
this talk as well.

>
> Individually the plan is:
> a) use the 'wwid' sysfs attribute to detect multipath devices;
>     this removes the need of the current 'path_id' functionality
>     in multipath-tools

CC'ing Linux-nvme,

I've recently looked at multipathing support for nvme (and nvme over
fabrics) as well. For nvme the wwid equivalent is the nsid (namespace
identifier). I'm wondering if we can have a better abstraction for
user-space so it won't need to change its behavior for scsi/nvme.
The same applies to the timeout attribute, for example, which
assumes the scsi device sysfs structure.
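
To illustrate the kind of abstraction meant here, the following is a
minimal user-space sketch, not anything multipath-tools actually does.
It groups block devices by a unique identifier without caring which
transport they belong to. The attribute locations in ID_ATTRS
("device/wwid" for scsi, "wwid" for nvme) and the helper names
read_id() and group_paths() are assumptions made for the example; the
real sysfs layout may differ by kernel version.

```python
import os
from collections import defaultdict

# Candidate identifier attributes, relative to /sys/block/<dev>.
# ASSUMPTION: these locations are illustrative, not a stable ABI promise.
ID_ATTRS = ("device/wwid", "wwid")


def read_id(dev_dir, attrs=ID_ATTRS):
    """Return the first non-empty identifier found under dev_dir, or None."""
    for attr in attrs:
        try:
            with open(os.path.join(dev_dir, attr)) as f:
                value = f.read().strip()
        except OSError:
            # Attribute missing or unreadable for this device type.
            continue
        if value:
            return value
    return None


def group_paths(sysfs_block="/sys/block"):
    """Map identifier -> device names, keeping only ids seen on >1 device."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(sysfs_block)):
        ident = read_id(os.path.join(sysfs_block, name))
        if ident is not None:
            groups[ident].append(name)
    return {ident: devs for ident, devs in groups.items() if len(devs) > 1}
```

With a single lookup order like this, the caller never branches on
scsi versus nvme; only the attribute list would need to know about
transport-specific layouts.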

> b) leverage topology information from scsi_dh_alua (which we will
>     have once my ALUA handler update is in) to detect the multipath
>     topology. This removes the need of a 'prio' infrastructure
>     in multipath-tools

This would require further attention for nvme.

> c) implement block or scsi events whenever a remote port becomes
>     unavailable. This removes the need of the 'path_checker'
>     functionality in multipath-tools.

I'd prefer to have it in the block layer so we can have it for all
block drivers. Also, this assumes that port events are independent of
I/O. This assumption is incorrect in SRP, for example, which detects port
failures only by I/O errors (which makes path sensing a must).
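
The distinction can be sketched as a toy model. This is illustrative
only and is not kernel code; the Path and PathTracker names and the
has_port_events flag are hypothetical. A path on a transport that
delivers port events can be failed and reinstated by events alone,
while a path that only learns of failures through I/O errors never
receives a "port up" event and so has to be probed.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    # False for transports that only notice failures via I/O errors.
    has_port_events: bool
    up: bool = True


class PathTracker:
    def __init__(self, paths):
        self.paths = {p.name: p for p in paths}

    def on_port_event(self, name, available):
        """Transport reported that the remote port went away or came back."""
        self.paths[name].up = available

    def on_io_error(self, name):
        """An I/O failed on this path; treat the path as down."""
        self.paths[name].up = False

    def usable(self):
        return sorted(p.name for p in self.paths.values() if p.up)

    def paths_to_probe(self):
        """Down paths that no event will ever reinstate: these need sensing."""
        return sorted(
            p.name
            for p in self.paths.values()
            if not p.up and not p.has_port_events
        )
```

In this model the event-driven path recovers through on_port_event(),
whereas the I/O-error-only path stays in paths_to_probe() until some
checker tests it again.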

> d) leverage these events to handle path-up/path-down events
>     in-kernel
> e) move the I/O redirection logic out of device-mapper proper
>     and use blk-mq to redirect I/O. This is still a bit of
>     hand-waving, and definitely would need discussion to figure
>     out if and how it can be achieved.
>     This is basically the same topic Mike Snitzer proposed, but
>     coming from a different angle.

Another (adjacent) topic is multipath performance with blk-mq.

As I said, I've been looking at nvme multipathing support and
initial measurements show huge contention on the multipath lock
which really defeats the entire point of blk-mq...

I have yet to report this as my work is still in progress. I'm not sure
if it's a topic on its own but I'd love to talk about that as well...

> But in the end we should be able to strip down the current (rather
> complex) multipath-tools to just handle topology changes; everything
> else will be done internally.

I'd love to see that happening.
