* [dm-devel] [PATCH v2 1/4] dm: Allow dm_call_pr to be used for path searches
2022-07-17 22:45 [dm-devel] [PATCH v2 0/4] dm pr_ops fixes Mike Christie
@ 2022-07-17 22:45 ` Mike Christie
2022-07-17 22:45 ` [dm-devel] [PATCH v2 2/4] dm: Start pr_reserve from the same starting path Mike Christie
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Mike Christie @ 2022-07-17 22:45 UTC (permalink / raw)
To: dm-devel, snitzer, hch; +Cc: Mike Christie
The specs state that if you send a reserve down a path that is already
the holder success must be returned and if it goes down a path that
is not the holder reservation conflict must be returned. Windows failover
clustering will send a second reservation and expects that a device
returns success. The problem for multipathing is that for an All
Registrants reservation, we can send the reserve down any path but for all
other reservation types there is one path that is the holder.
To handle this we could add PR state to dm but that can get nasty. Look at
target_core_pr.c for an example of the type things we will have to track.
It will also get more complicated because other initiators can change the
state so we will have to add in async event/sense handling.
This patchset tries to keep dm simple and keep just doing passthrough.
This patch modifies dm_call_pr to be able to find the first usable path
that can execute our pr_op then return. When dm_pr_reserve is converted to
dm_call_pr in the next patch for the normal case we will use the same
path for every reserve.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
drivers/md/dm.c | 51 ++++++++++++++++++++++++++++++++++++-------------
1 file changed, 38 insertions(+), 13 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 2b75f1ef7386..67b46ae896b1 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -3047,10 +3047,11 @@ struct dm_pr {
u64 new_key;
u32 flags;
bool fail_early;
+ int ret;
};
static int dm_call_pr(struct block_device *bdev, iterate_devices_callout_fn fn,
- void *data)
+ struct dm_pr *pr)
{
struct mapped_device *md = bdev->bd_disk->private_data;
struct dm_table *table;
@@ -3070,7 +3071,8 @@ static int dm_call_pr(struct block_device *bdev, iterate_devices_callout_fn fn,
if (!ti->type->iterate_devices)
goto out;
- ret = ti->type->iterate_devices(ti, fn, data);
+ ti->type->iterate_devices(ti, fn, pr);
+ ret = 0;
out:
dm_put_live_table(md, srcu_idx);
return ret;
@@ -3084,10 +3086,24 @@ static int __dm_pr_register(struct dm_target *ti, struct dm_dev *dev,
{
struct dm_pr *pr = data;
const struct pr_ops *ops = dev->bdev->bd_disk->fops->pr_ops;
+ int ret;
+
+ if (!ops || !ops->pr_register) {
+ pr->ret = -EOPNOTSUPP;
+ return -1;
+ }
+
+ ret = ops->pr_register(dev->bdev, pr->old_key, pr->new_key, pr->flags);
+ if (!ret)
+ return 0;
+
+ if (!pr->ret)
+ pr->ret = ret;
+
+ if (pr->fail_early)
+ return -1;
- if (!ops || !ops->pr_register)
- return -EOPNOTSUPP;
- return ops->pr_register(dev->bdev, pr->old_key, pr->new_key, pr->flags);
+ return 0;
}
static int dm_pr_register(struct block_device *bdev, u64 old_key, u64 new_key,
@@ -3098,19 +3114,28 @@ static int dm_pr_register(struct block_device *bdev, u64 old_key, u64 new_key,
.new_key = new_key,
.flags = flags,
.fail_early = true,
+ .ret = 0,
};
int ret;
ret = dm_call_pr(bdev, __dm_pr_register, &pr);
- if (ret && new_key) {
- /* unregister all paths if we failed to register any path */
- pr.old_key = new_key;
- pr.new_key = 0;
- pr.flags = 0;
- pr.fail_early = false;
- dm_call_pr(bdev, __dm_pr_register, &pr);
- }
+ if (ret)
+ /* Didn't even get to register a path */
+ return ret;
+
+ if (!pr.ret)
+ return 0;
+ ret = pr.ret;
+
+ if (!new_key)
+ return ret;
+ /* unregister all paths if we failed to register any path */
+ pr.old_key = new_key;
+ pr.new_key = 0;
+ pr.flags = 0;
+ pr.fail_early = false;
+ dm_call_pr(bdev, __dm_pr_register, &pr);
return ret;
}
--
2.25.1
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply related [flat|nested] 6+ messages in thread* [dm-devel] [PATCH v2 2/4] dm: Start pr_reserve from the same starting path
2022-07-17 22:45 [dm-devel] [PATCH v2 0/4] dm pr_ops fixes Mike Christie
2022-07-17 22:45 ` [dm-devel] [PATCH v2 1/4] dm: Allow dm_call_pr to be used for path searches Mike Christie
@ 2022-07-17 22:45 ` Mike Christie
2022-07-17 22:45 ` [dm-devel] [PATCH v2 3/4] dm: Fix PR release handling for non All Registrants Mike Christie
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Mike Christie @ 2022-07-17 22:45 UTC (permalink / raw)
To: dm-devel, snitzer, hch; +Cc: Mike Christie
When an app does a pr_reserve it will go to whatever path we happen to
be using at the time. This can result in errors where the app does a
second pr_reserve call and expects success but gets a failure becuase
the reserve is not done on the holder's path. This patch has us always
start trying to do reserves from the first path in the first group.
Windows failover clustering will produce the type of pattern above. With
this patch, we will now pass its validation test for this case.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
drivers/md/dm.c | 46 ++++++++++++++++++++++++++++++++--------------
1 file changed, 32 insertions(+), 14 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 67b46ae896b1..cdd1656b99d6 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -3048,6 +3048,7 @@ struct dm_pr {
u32 flags;
bool fail_early;
int ret;
+ enum pr_type type;
};
static int dm_call_pr(struct block_device *bdev, iterate_devices_callout_fn fn,
@@ -3139,25 +3140,42 @@ static int dm_pr_register(struct block_device *bdev, u64 old_key, u64 new_key,
return ret;
}
+
+static int __dm_pr_reserve(struct dm_target *ti, struct dm_dev *dev,
+ sector_t start, sector_t len, void *data)
+{
+ struct dm_pr *pr = data;
+ const struct pr_ops *ops = dev->bdev->bd_disk->fops->pr_ops;
+
+ if (!ops || !ops->pr_reserve) {
+ pr->ret = -EOPNOTSUPP;
+ return -1;
+ }
+
+ pr->ret = ops->pr_reserve(dev->bdev, pr->old_key, pr->type, pr->flags);
+ if (!pr->ret)
+ return -1;
+
+ return 0;
+}
+
static int dm_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type,
u32 flags)
{
- struct mapped_device *md = bdev->bd_disk->private_data;
- const struct pr_ops *ops;
- int r, srcu_idx;
+ struct dm_pr pr = {
+ .old_key = key,
+ .flags = flags,
+ .type = type,
+ .fail_early = false,
+ .ret = 0,
+ };
+ int ret;
- r = dm_prepare_ioctl(md, &srcu_idx, &bdev);
- if (r < 0)
- goto out;
+ ret = dm_call_pr(bdev, __dm_pr_reserve, &pr);
+ if (ret)
+ return ret;
- ops = bdev->bd_disk->fops->pr_ops;
- if (ops && ops->pr_reserve)
- r = ops->pr_reserve(bdev, key, type, flags);
- else
- r = -EOPNOTSUPP;
-out:
- dm_unprepare_ioctl(md, srcu_idx);
- return r;
+ return pr.ret;
}
static int dm_pr_release(struct block_device *bdev, u64 key, enum pr_type type)
--
2.25.1
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply related [flat|nested] 6+ messages in thread* [dm-devel] [PATCH v2 3/4] dm: Fix PR release handling for non All Registrants
2022-07-17 22:45 [dm-devel] [PATCH v2 0/4] dm pr_ops fixes Mike Christie
2022-07-17 22:45 ` [dm-devel] [PATCH v2 1/4] dm: Allow dm_call_pr to be used for path searches Mike Christie
2022-07-17 22:45 ` [dm-devel] [PATCH v2 2/4] dm: Start pr_reserve from the same starting path Mike Christie
@ 2022-07-17 22:45 ` Mike Christie
2022-07-17 22:45 ` [dm-devel] [PATCH v2 4/4] dm: Start pr_preempt from the same starting path Mike Christie
2022-07-22 19:22 ` [dm-devel] [PATCH v2 0/4] dm pr_ops fixes Mike Snitzer
4 siblings, 0 replies; 6+ messages in thread
From: Mike Christie @ 2022-07-17 22:45 UTC (permalink / raw)
To: dm-devel, snitzer, hch; +Cc: Mike Christie
This patch fixes a bug where we are leaving the reservation in place
even though pr_release has run and returned success.
If we have a Write Exclusive, Exclusive Access, or Write/Exclusive
Registrants only reservation, the release must be sent down the path
that is the reservation holder. The problem is multipath_prepare_ioctl
most likely selected path N for the reservation, then later when we do
the release multipath_prepare_ioctl will select a completely different
path. The device will then return success becuase the nvme and scsi
specs say to return success if there is no reservation or if the release
is sent down from a path that is not the holder. We then think we have
released the reservation.
This patch has us loop over each path and send a release so we can make
sure the release is executed on the correct path. It has been tested with
windows failover clustering's validation test which checks this case, and
it has been tested manually (the libiscsi PGR tests don't have a test case
for this yet, but I will be adding one).
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
drivers/md/dm.c | 48 ++++++++++++++++++++++++++++++++++--------------
1 file changed, 34 insertions(+), 14 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index cdd1656b99d6..c31f99c9d2b2 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -3178,24 +3178,44 @@ static int dm_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type,
return pr.ret;
}
+/*
+ * If there is a non-All Registrants type of reservation, the release must be
+ * sent down the holding path. For the cases where there is no reservation or
+ * the path is not the holder the device will also return success, so we must
+ * try each path to make sure we got the correct path.
+ */
+static int __dm_pr_release(struct dm_target *ti, struct dm_dev *dev,
+ sector_t start, sector_t len, void *data)
+{
+ struct dm_pr *pr = data;
+ const struct pr_ops *ops = dev->bdev->bd_disk->fops->pr_ops;
+
+ if (!ops || !ops->pr_release) {
+ pr->ret = -EOPNOTSUPP;
+ return -1;
+ }
+
+ pr->ret = ops->pr_release(dev->bdev, pr->old_key, pr->type);
+ if (pr->ret)
+ return -1;
+
+ return 0;
+}
+
static int dm_pr_release(struct block_device *bdev, u64 key, enum pr_type type)
{
- struct mapped_device *md = bdev->bd_disk->private_data;
- const struct pr_ops *ops;
- int r, srcu_idx;
+ struct dm_pr pr = {
+ .old_key = key,
+ .type = type,
+ .fail_early = false,
+ };
+ int ret;
- r = dm_prepare_ioctl(md, &srcu_idx, &bdev);
- if (r < 0)
- goto out;
+ ret = dm_call_pr(bdev, __dm_pr_release, &pr);
+ if (ret)
+ return ret;
- ops = bdev->bd_disk->fops->pr_ops;
- if (ops && ops->pr_release)
- r = ops->pr_release(bdev, key, type);
- else
- r = -EOPNOTSUPP;
-out:
- dm_unprepare_ioctl(md, srcu_idx);
- return r;
+ return pr.ret;
}
static int dm_pr_preempt(struct block_device *bdev, u64 old_key, u64 new_key,
--
2.25.1
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply related [flat|nested] 6+ messages in thread* [dm-devel] [PATCH v2 4/4] dm: Start pr_preempt from the same starting path
2022-07-17 22:45 [dm-devel] [PATCH v2 0/4] dm pr_ops fixes Mike Christie
` (2 preceding siblings ...)
2022-07-17 22:45 ` [dm-devel] [PATCH v2 3/4] dm: Fix PR release handling for non All Registrants Mike Christie
@ 2022-07-17 22:45 ` Mike Christie
2022-07-22 19:22 ` [dm-devel] [PATCH v2 0/4] dm pr_ops fixes Mike Snitzer
4 siblings, 0 replies; 6+ messages in thread
From: Mike Christie @ 2022-07-17 22:45 UTC (permalink / raw)
To: dm-devel, snitzer, hch; +Cc: Mike Christie
pr_preempt has a similar issue as reserve where for all the reservation
types except the All Registrants ones the preempt can create a
reservation. And a follow up reservation or release needs to go down the
same path the preempt did. This has the pr_preempt work like reserve
and release where we always start from the first path in the first
group.
This patch has been tested with windows failover clustering's validation
test and libiscsi's PGR tests to check for regressions. They both don't
have tests to verify this case, so I tested it manually.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
drivers/md/dm.c | 45 +++++++++++++++++++++++++++++++--------------
1 file changed, 31 insertions(+), 14 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index c31f99c9d2b2..7f23dde45088 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -3046,6 +3046,7 @@ struct dm_pr {
u64 old_key;
u64 new_key;
u32 flags;
+ bool abort;
bool fail_early;
int ret;
enum pr_type type;
@@ -3218,25 +3219,41 @@ static int dm_pr_release(struct block_device *bdev, u64 key, enum pr_type type)
return pr.ret;
}
+static int __dm_pr_preempt(struct dm_target *ti, struct dm_dev *dev,
+ sector_t start, sector_t len, void *data)
+{
+ struct dm_pr *pr = data;
+ const struct pr_ops *ops = dev->bdev->bd_disk->fops->pr_ops;
+
+ if (!ops || !ops->pr_preempt) {
+ pr->ret = -EOPNOTSUPP;
+ return -1;
+ }
+
+ pr->ret = ops->pr_preempt(dev->bdev, pr->old_key, pr->new_key, pr->type,
+ pr->abort);
+ if (!pr->ret)
+ return -1;
+
+ return 0;
+}
+
static int dm_pr_preempt(struct block_device *bdev, u64 old_key, u64 new_key,
enum pr_type type, bool abort)
{
- struct mapped_device *md = bdev->bd_disk->private_data;
- const struct pr_ops *ops;
- int r, srcu_idx;
+ struct dm_pr pr = {
+ .new_key = new_key,
+ .old_key = old_key,
+ .type = type,
+ .fail_early = false,
+ };
+ int ret;
- r = dm_prepare_ioctl(md, &srcu_idx, &bdev);
- if (r < 0)
- goto out;
+ ret = dm_call_pr(bdev, __dm_pr_preempt, &pr);
+ if (ret)
+ return ret;
- ops = bdev->bd_disk->fops->pr_ops;
- if (ops && ops->pr_preempt)
- r = ops->pr_preempt(bdev, old_key, new_key, type, abort);
- else
- r = -EOPNOTSUPP;
-out:
- dm_unprepare_ioctl(md, srcu_idx);
- return r;
+ return pr.ret;
}
static int dm_pr_clear(struct block_device *bdev, u64 key)
--
2.25.1
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [dm-devel] [PATCH v2 0/4] dm pr_ops fixes
2022-07-17 22:45 [dm-devel] [PATCH v2 0/4] dm pr_ops fixes Mike Christie
` (3 preceding siblings ...)
2022-07-17 22:45 ` [dm-devel] [PATCH v2 4/4] dm: Start pr_preempt from the same starting path Mike Christie
@ 2022-07-22 19:22 ` Mike Snitzer
4 siblings, 0 replies; 6+ messages in thread
From: Mike Snitzer @ 2022-07-22 19:22 UTC (permalink / raw)
To: Mike Christie; +Cc: hch, dm-devel
On Sun, Jul 17 2022 at 6:45P -0400,
Mike Christie <michael.christie@oracle.com> wrote:
> The following patches were made over Linus's tree and fix a couple bugs
> in the pr_ops code when a reservation type other than one of the All
> Registrants types is used. They were tested with the Windows failover
> cluster verification tests and libiscsi's PGR tests.
>
> The current dm pr_ops code works well for All Registrants because any
> registered path is the reservation holder. Commands like reserve and
> release can go down any path and the behavior is the same. The problems
> these patches fix is when only one path is the holder as is the case
> for the other reservation types which is used by Window Failover Cluster
> and Linux Cluster (tools like pacemaker + scsi/multipath_fence agents).
> For example for Registrants Only the path that got the RESERVE command is
> the reservation holder. The RELEASE must be sent down that path to release
> the reservation.
>
> With our current design we send down non-registration PR commands down
> whatever path we are currenly using, and then later PR commands end
> up on different paths. To continue the current design where dm's pr_ops
> are just passing through requests, and to avoid adding PR state to dm
> these patches modify pr_reserve/release to work similar to pr_register
> where we loop over all paths or at least loop over all paths until we
> find the path we are looking for.
>
> v2:
> - Added info about testing.
> - Added patch for pr_preempt.
I picked this set up for 5.20 and staged in linux-next.
I tweaked the patch headers a bit while proof-reading and
understanding the scope of the changes.
I noticed that dm_pr_clear is the only remaining dm_pr_* method that
is using dm_{prepare,unprepare}_ioctl. I assume that'll be fine, but
the one gap it leaves is handling for the possibility that the DM
device is suspended. Shouldn't dm_call_pr() be enhanced to check:
if (dm_suspended_md(md)) ?
Mike
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply [flat|nested] 6+ messages in thread