* [PATCH v2 1/4] nfs/blocklayout: Fix premature PR key unregistration
2024-06-21 16:22 [PATCH v2 0/4] Fixes for pNFS SCSI layout PR key registration cel
@ 2024-06-21 16:22 ` cel
2024-06-22 5:03 ` Christoph Hellwig
2024-06-21 16:22 ` [PATCH v2 2/4] nfs/blocklayout: Use bulk page allocation APIs cel
` (3 subsequent siblings)
4 siblings, 1 reply; 15+ messages in thread
From: cel @ 2024-06-21 16:22 UTC (permalink / raw)
To: linux-nfs; +Cc: Christoph Hellwig, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
During generic/069 runs with pNFS SCSI layouts, the NFS client emits
the following in the system journal:
kernel: pNFS: failed to open device /dev/disk/by-id/dm-uuid-mpath-0x6001405e3366f045b7949eb8e4540b51 (-2)
kernel: pNFS: using block device sdb (reservation key 0x666b60901e7b26b3)
kernel: pNFS: failed to open device /dev/disk/by-id/dm-uuid-mpath-0x6001405e3366f045b7949eb8e4540b51 (-2)
kernel: pNFS: using block device sdb (reservation key 0x666b60901e7b26b3)
kernel: sd 6:0:0:1: reservation conflict
kernel: sd 6:0:0:1: [sdb] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
kernel: sd 6:0:0:1: [sdb] tag#16 CDB: Write(10) 2a 00 00 00 00 50 00 00 08 00
kernel: reservation conflict error, dev sdb, sector 80 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 2
kernel: sd 6:0:0:1: reservation conflict
kernel: sd 6:0:0:1: reservation conflict
kernel: sd 6:0:0:1: [sdb] tag#18 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
kernel: sd 6:0:0:1: [sdb] tag#17 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
kernel: sd 6:0:0:1: [sdb] tag#18 CDB: Write(10) 2a 00 00 00 00 60 00 00 08 00
kernel: sd 6:0:0:1: [sdb] tag#17 CDB: Write(10) 2a 00 00 00 00 58 00 00 08 00
kernel: reservation conflict error, dev sdb, sector 96 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
kernel: reservation conflict error, dev sdb, sector 88 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
systemd[1]: fstests-generic-069.scope: Deactivated successfully.
systemd[1]: fstests-generic-069.scope: Consumed 5.092s CPU time.
systemd[1]: media-test.mount: Deactivated successfully.
systemd[1]: media-scratch.mount: Deactivated successfully.
kernel: sd 6:0:0:1: reservation conflict
kernel: failed to unregister PR key.
This appears to be due to a race. bl_alloc_lseg() calls this:
static struct nfs4_deviceid_node *
bl_find_get_deviceid(struct nfs_server *server,
		const struct nfs4_deviceid *id, const struct cred *cred,
		gfp_t gfp_mask)
{
	struct nfs4_deviceid_node *node;
	unsigned long start, end;

retry:
	node = nfs4_find_get_deviceid(server, id, cred, gfp_mask);
	if (!node)
		return ERR_PTR(-ENODEV);
nfs4_find_get_deviceid() does a lookup without the spin lock first.
If it can't find a matching deviceid, it creates a new device_info
(which calls bl_alloc_deviceid_node, and that registers the device's
PR key).
Then it takes the nfs4_deviceid_lock and looks up the deviceid again.
If it finds it this time, bl_find_get_deviceid() frees the spare
(new) device_info, which unregisters the PR key for the same device.
Any subsequent I/O from this client on that device gets EBADE.
The umount later unregisters the device's PR key again.
To prevent this problem, register the PR key after the deviceid_node
lookup.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfs/blocklayout/blocklayout.c | 13 +++++++++--
fs/nfs/blocklayout/blocklayout.h | 8 ++++++-
fs/nfs/blocklayout/dev.c | 39 +++++++++++++++++++++++---------
3 files changed, 46 insertions(+), 14 deletions(-)
diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 6be13e0ec170..947b2c523440 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -564,6 +564,7 @@ bl_find_get_deviceid(struct nfs_server *server,
gfp_t gfp_mask)
{
struct nfs4_deviceid_node *node;
+ struct pnfs_block_dev *d;
unsigned long start, end;
retry:
@@ -571,9 +572,16 @@ bl_find_get_deviceid(struct nfs_server *server,
if (!node)
return ERR_PTR(-ENODEV);
- if (test_bit(NFS_DEVICEID_UNAVAILABLE, &node->flags) == 0)
- return node;
+ if (test_bit(NFS_DEVICEID_UNAVAILABLE, &node->flags))
+ goto transient;
+ d = container_of(node, struct pnfs_block_dev, node);
+ if (d->pr_register)
+ if (!d->pr_register(d))
+ goto out_put;
+ return node;
+
+transient:
end = jiffies;
start = end - PNFS_DEVICE_RETRY_TIMEOUT;
if (!time_in_range(node->timestamp_unavailable, start, end)) {
@@ -581,6 +589,7 @@ bl_find_get_deviceid(struct nfs_server *server,
goto retry;
}
+out_put:
nfs4_put_deviceid_node(node);
return ERR_PTR(-ENODEV);
}
diff --git a/fs/nfs/blocklayout/blocklayout.h b/fs/nfs/blocklayout/blocklayout.h
index f1eeb4914199..cc788e8ce909 100644
--- a/fs/nfs/blocklayout/blocklayout.h
+++ b/fs/nfs/blocklayout/blocklayout.h
@@ -110,12 +110,18 @@ struct pnfs_block_dev {
struct file *bdev_file;
u64 disk_offset;
+ unsigned long flags;
u64 pr_key;
- bool pr_registered;
bool (*map)(struct pnfs_block_dev *dev, u64 offset,
struct pnfs_block_dev_map *map);
+ bool (*pr_register)(struct pnfs_block_dev *dev);
+};
+
+/* pnfs_block_dev flag bits */
+enum {
+ PNFS_BDEV_REGISTERED = 0,
};
/* sector_t fields are all in 512-byte sectors */
diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c
index 519c310c745d..83753a08a19d 100644
--- a/fs/nfs/blocklayout/dev.c
+++ b/fs/nfs/blocklayout/dev.c
@@ -23,9 +23,9 @@ bl_free_device(struct pnfs_block_dev *dev)
bl_free_device(&dev->children[i]);
kfree(dev->children);
} else {
- if (dev->pr_registered) {
- const struct pr_ops *ops =
- file_bdev(dev->bdev_file)->bd_disk->fops->pr_ops;
+ if (test_and_clear_bit(PNFS_BDEV_REGISTERED, &dev->flags)) {
+ struct block_device *bdev = file_bdev(dev->bdev_file);
+ const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
int error;
error = ops->pr_register(file_bdev(dev->bdev_file),
@@ -226,6 +226,30 @@ static bool bl_map_stripe(struct pnfs_block_dev *dev, u64 offset,
return true;
}
+/**
+ * bl_pr_register_scsi - Register a SCSI PR key for @d
+ * @dev: pNFS block device, key to register is already in @d->pr_key
+ *
+ * Returns true if the device's PR key is registered, otherwise false.
+ */
+static bool bl_pr_register_scsi(struct pnfs_block_dev *dev)
+{
+ struct block_device *bdev = file_bdev(dev->bdev_file);
+ const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
+ int status;
+
+ if (test_and_set_bit(PNFS_BDEV_REGISTERED, &dev->flags))
+ return true;
+
+ status = ops->pr_register(bdev, 0, dev->pr_key, true);
+ if (status) {
+ pr_err("pNFS: failed to register key for block device %s.",
+ bdev->bd_disk->disk_name);
+ return false;
+ }
+ return true;
+}
+
static int
bl_parse_deviceid(struct nfs_server *server, struct pnfs_block_dev *d,
struct pnfs_block_volume *volumes, int idx, gfp_t gfp_mask);
@@ -367,14 +391,7 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
goto out_blkdev_put;
}
- error = ops->pr_register(file_bdev(d->bdev_file), 0, d->pr_key, true);
- if (error) {
- pr_err("pNFS: failed to register key for block device %s.",
- file_bdev(d->bdev_file)->bd_disk->disk_name);
- goto out_blkdev_put;
- }
-
- d->pr_registered = true;
+ d->pr_register = bl_pr_register_scsi;
return 0;
out_blkdev_put:
--
2.45.1
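As an illustration of the ordering problem described in the commit message, here is a minimal userspace sketch, not kernel code. The types fake_lun and fake_dev and the helpers fake_register(), fake_free() and replay_race() are hypothetical names invented for this example. The sketch replays the losing side of the lookup race sequentially rather than with real threads, and it assumes the LUN tracks a single registration per client key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a LUN that tracks one PR key per client. */
struct fake_lun {
	bool key_registered;
};

/* Hypothetical stand-in for struct pnfs_block_dev. */
struct fake_dev {
	struct fake_lun *lun;
	bool registered;	/* models the PNFS_BDEV_REGISTERED bit */
};

/* Idempotent registration: only the first call touches the LUN. */
static void fake_register(struct fake_dev *d)
{
	if (d->registered)
		return;
	d->registered = true;
	d->lun->key_registered = true;
}

/* Freeing a device unregisters the key only if this device registered it. */
static void fake_free(struct fake_dev *d)
{
	if (d->registered) {
		d->registered = false;
		d->lun->key_registered = false;
	}
}

/*
 * "cached" is the device that won the race and sits in the device ID
 * cache; "spare" was built by the racing lookup and is thrown away.
 * Returns whether the LUN still holds the client's key afterwards.
 */
bool replay_race(bool register_at_alloc)
{
	struct fake_lun lun = { .key_registered = false };
	struct fake_dev cached = { .lun = &lun };
	struct fake_dev spare = { .lun = &lun };

	if (register_at_alloc) {
		fake_register(&cached);	/* old order: register while parsing */
		fake_register(&spare);
	}

	fake_free(&spare);		/* the loser of the race is freed */

	if (!register_at_alloc)
		fake_register(&cached);	/* new order: register after lookup */

	return lun.key_registered;
}
```

With registration done at allocation time, replay_race(true) leaves the key unregistered even though the cached device is still in use. Deferring registration until after the lookup, replay_race(false), leaves it registered.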
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/4] nfs/blocklayout: Fix premature PR key unregistration
2024-06-21 16:22 ` [PATCH v2 1/4] nfs/blocklayout: Fix premature PR key unregistration cel
@ 2024-06-22 5:03 ` Christoph Hellwig
2024-06-22 17:26 ` Chuck Lever
0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2024-06-22 5:03 UTC (permalink / raw)
To: cel; +Cc: linux-nfs, Christoph Hellwig, Chuck Lever
On Fri, Jun 21, 2024 at 12:22:29PM -0400, cel@kernel.org wrote:
> @@ -367,14 +391,7 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
> goto out_blkdev_put;
> }
>
> - error = ops->pr_register(file_bdev(d->bdev_file), 0, d->pr_key, true);
> - if (error) {
> - pr_err("pNFS: failed to register key for block device %s.",
> - file_bdev(d->bdev_file)->bd_disk->disk_name);
> - goto out_blkdev_put;
> - }
> -
> - d->pr_registered = true;
> + d->pr_register = bl_pr_register_scsi;
I think this will break complex (slice, concat, stripe) volumes,
as we'll never call ->pr_register for them at all. We'll also need
a register callback for them, which then calls into the underlying
volumes, similar to how bl_parse_deviceid works. That would also
do away with the need for the d->pr_register callback; we could
just do a switch on the volume types, which might be more
efficient. (The same is actually true for the ->map callback,
but that's a separate cleanup.)
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/4] nfs/blocklayout: Fix premature PR key unregistration
2024-06-22 5:03 ` Christoph Hellwig
@ 2024-06-22 17:26 ` Chuck Lever
2024-06-23 7:36 ` Christoph Hellwig
0 siblings, 1 reply; 15+ messages in thread
From: Chuck Lever @ 2024-06-22 17:26 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: cel, linux-nfs
On Sat, Jun 22, 2024 at 07:03:25AM +0200, Christoph Hellwig wrote:
> On Fri, Jun 21, 2024 at 12:22:29PM -0400, cel@kernel.org wrote:
> > @@ -367,14 +391,7 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
> > goto out_blkdev_put;
> > }
> >
> > - error = ops->pr_register(file_bdev(d->bdev_file), 0, d->pr_key, true);
> > - if (error) {
> > - pr_err("pNFS: failed to register key for block device %s.",
> > - file_bdev(d->bdev_file)->bd_disk->disk_name);
> > - goto out_blkdev_put;
> > - }
> > -
> > - d->pr_registered = true;
> > + d->pr_register = bl_pr_register_scsi;
>
> I think this will break complex (slice, concat, stripe) volumes,
> as we'll never call ->pr_register for them at all. We'll also need
> a register callback for them, which then calls into the underlying
> volumes, similar to how bl_parse_deviceid works.
This patch currently adds the pr_reg callback to
bl_find_get_deviceid(), which has no visibility of the volume
hierarchy. Where should the registration be done instead? I'm
missing something.
> That would also
> do away with the need for the d->pr_register callback; we could
> just do a switch on the volume types, which might be more
> efficient. (The same is actually true for the ->map callback,
> but that's a separate cleanup.)
--
Chuck Lever
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/4] nfs/blocklayout: Fix premature PR key unregistration
2024-06-22 17:26 ` Chuck Lever
@ 2024-06-23 7:36 ` Christoph Hellwig
2024-06-24 15:08 ` Chuck Lever
0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2024-06-23 7:36 UTC (permalink / raw)
To: Chuck Lever; +Cc: Christoph Hellwig, cel, linux-nfs
On Sat, Jun 22, 2024 at 01:26:10PM -0400, Chuck Lever wrote:
> This patch currently adds the pr_reg callback to
> bl_find_get_deviceid(), which has no visibility of the volume
> hierarchy. Where should the registration be done instead? I'm
> missing something.
Something like the patch below (untested Sunday morning coding) should
do the trick:
diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 947b2c52344097..6db54b215066e0 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -564,34 +564,32 @@ bl_find_get_deviceid(struct nfs_server *server,
gfp_t gfp_mask)
{
struct nfs4_deviceid_node *node;
- struct pnfs_block_dev *d;
- unsigned long start, end;
+ int err = -ENODEV;
retry:
node = nfs4_find_get_deviceid(server, id, cred, gfp_mask);
if (!node)
return ERR_PTR(-ENODEV);
- if (test_bit(NFS_DEVICEID_UNAVAILABLE, &node->flags))
- goto transient;
+ if (test_bit(NFS_DEVICEID_UNAVAILABLE, &node->flags)) {
+ unsigned long end = jiffies;
+ unsigned long start = end - PNFS_DEVICE_RETRY_TIMEOUT;
- d = container_of(node, struct pnfs_block_dev, node);
- if (d->pr_register)
- if (!d->pr_register(d))
- goto out_put;
- return node;
-
-transient:
- end = jiffies;
- start = end - PNFS_DEVICE_RETRY_TIMEOUT;
- if (!time_in_range(node->timestamp_unavailable, start, end)) {
- nfs4_delete_deviceid(node->ld, node->nfs_client, id);
- goto retry;
+ if (!time_in_range(node->timestamp_unavailable, start, end)) {
+ nfs4_delete_deviceid(node->ld, node->nfs_client, id);
+ goto retry;
+ }
+ goto out_put;
}
+ err = bl_register_dev(container_of(node, struct pnfs_block_dev, node));
+ if (err)
+ goto out_put;
+ return node;
+
out_put:
nfs4_put_deviceid_node(node);
- return ERR_PTR(-ENODEV);
+ return ERR_PTR(err);
}
static int
diff --git a/fs/nfs/blocklayout/blocklayout.h b/fs/nfs/blocklayout/blocklayout.h
index cc788e8ce90933..7efbef9d10dba8 100644
--- a/fs/nfs/blocklayout/blocklayout.h
+++ b/fs/nfs/blocklayout/blocklayout.h
@@ -104,6 +104,7 @@ struct pnfs_block_dev {
u64 start;
u64 len;
+ enum pnfs_block_volume_type type;
u32 nr_children;
struct pnfs_block_dev *children;
u64 chunk_size;
@@ -116,7 +117,6 @@ struct pnfs_block_dev {
bool (*map)(struct pnfs_block_dev *dev, u64 offset,
struct pnfs_block_dev_map *map);
- bool (*pr_register)(struct pnfs_block_dev *dev);
};
/* pnfs_block_dev flag bits */
@@ -178,6 +178,7 @@ struct bl_msg_hdr {
#define BL_DEVICE_REQUEST_ERR 0x2 /* User level process fails */
/* dev.c */
+int bl_register_dev(struct pnfs_block_dev *d);
struct nfs4_deviceid_node *bl_alloc_deviceid_node(struct nfs_server *server,
struct pnfs_device *pdev, gfp_t gfp_mask);
void bl_free_deviceid_node(struct nfs4_deviceid_node *d);
diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c
index 16fb64d4af31db..72e061e87e145a 100644
--- a/fs/nfs/blocklayout/dev.c
+++ b/fs/nfs/blocklayout/dev.c
@@ -13,9 +13,74 @@
#define NFSDBG_FACILITY NFSDBG_PNFS_LD
+static void bl_unregister_scsi(struct pnfs_block_dev *dev)
+{
+ struct block_device *bdev = file_bdev(dev->bdev_file);
+ const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
+
+ if (!test_and_clear_bit(PNFS_BDEV_REGISTERED, &dev->flags))
+ return;
+
+ if (ops->pr_register(bdev, dev->pr_key, 0, false))
+ pr_err("failed to unregister PR key.\n");
+}
+
+static bool bl_register_scsi(struct pnfs_block_dev *dev)
+{
+ struct block_device *bdev = file_bdev(dev->bdev_file);
+ const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
+ int status;
+
+ if (test_and_set_bit(PNFS_BDEV_REGISTERED, &dev->flags))
+ return true;
+
+ status = ops->pr_register(bdev, 0, dev->pr_key, true);
+ if (status) {
+ pr_err("pNFS: failed to register key for block device %s.",
+ bdev->bd_disk->disk_name);
+ return false;
+ }
+ return true;
+}
+
+static void bl_unregister_dev(struct pnfs_block_dev *dev)
+{
+ if (dev->nr_children) {
+ for (u32 i = 0; i < dev->nr_children; i++)
+ bl_unregister_dev(&dev->children[i]);
+ return;
+ }
+
+ if (dev->type == PNFS_BLOCK_VOLUME_SCSI)
+ bl_unregister_scsi(dev);
+}
+
+int bl_register_dev(struct pnfs_block_dev *dev)
+{
+ if (dev->nr_children) {
+ for (u32 i = 0; i < dev->nr_children; i++) {
+ int ret = bl_register_dev(&dev->children[i]);
+
+ if (ret) {
+ while (i > 0)
+ bl_unregister_dev(&dev->children[--i]);
+ return ret;
+ }
+ }
+
+ return 0;
+ }
+
+ if (dev->type == PNFS_BLOCK_VOLUME_SCSI)
+ return bl_register_scsi(dev);
+ return 0;
+}
+
static void
bl_free_device(struct pnfs_block_dev *dev)
{
+ bl_unregister_dev(dev);
+
if (dev->nr_children) {
int i;
@@ -23,17 +88,6 @@ bl_free_device(struct pnfs_block_dev *dev)
bl_free_device(&dev->children[i]);
kfree(dev->children);
} else {
- if (test_and_clear_bit(PNFS_BDEV_REGISTERED, &dev->flags)) {
- struct block_device *bdev = file_bdev(dev->bdev_file);
- const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
- int error;
-
- error = ops->pr_register(file_bdev(dev->bdev_file),
- dev->pr_key, 0, false);
- if (error)
- pr_err("failed to unregister PR key.\n");
- }
-
if (dev->bdev_file)
fput(dev->bdev_file);
}
@@ -226,30 +280,6 @@ static bool bl_map_stripe(struct pnfs_block_dev *dev, u64 offset,
return true;
}
-/**
- * bl_pr_register_scsi - Register a SCSI PR key for @d
- * @dev: pNFS block device, key to register is already in @d->pr_key
- *
- * Returns true if the device's PR key is registered, otherwise false.
- */
-static bool bl_pr_register_scsi(struct pnfs_block_dev *dev)
-{
- struct block_device *bdev = file_bdev(dev->bdev_file);
- const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
- int status;
-
- if (test_and_set_bit(PNFS_BDEV_REGISTERED, &dev->flags))
- return true;
-
- status = ops->pr_register(bdev, 0, dev->pr_key, true);
- if (status) {
- pr_err("pNFS: failed to register key for block device %s.",
- bdev->bd_disk->disk_name);
- return false;
- }
- return true;
-}
-
static int
bl_parse_deviceid(struct nfs_server *server, struct pnfs_block_dev *d,
struct pnfs_block_volume *volumes, int idx, gfp_t gfp_mask);
@@ -392,7 +422,6 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
goto out_blkdev_put;
}
- d->pr_register = bl_pr_register_scsi;
return 0;
out_blkdev_put:
@@ -478,7 +507,9 @@ static int
bl_parse_deviceid(struct nfs_server *server, struct pnfs_block_dev *d,
struct pnfs_block_volume *volumes, int idx, gfp_t gfp_mask)
{
- switch (volumes[idx].type) {
+ d->type = volumes[idx].type;
+
+ switch (d->type) {
case PNFS_BLOCK_VOLUME_SIMPLE:
return bl_parse_simple(server, d, volumes, idx, gfp_mask);
case PNFS_BLOCK_VOLUME_SLICE:
@@ -490,7 +521,7 @@ bl_parse_deviceid(struct nfs_server *server, struct pnfs_block_dev *d,
case PNFS_BLOCK_VOLUME_SCSI:
return bl_parse_scsi(server, d, volumes, idx, gfp_mask);
default:
- dprintk("unsupported volume type: %d\n", volumes[idx].type);
+ dprintk("unsupported volume type: %d\n", d->type);
return -EIO;
}
}
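The recursive walk with rollback in the draft above can be modelled in userspace as follows. This is only a sketch under stated assumptions: struct vol, enum vol_type, the "fail" test hook, and the functions leaf_register(), vol_register(), vol_unregister() and demo() are hypothetical names, not kernel interfaces. Note that the draft's bl_register_scsi() returns bool while bl_register_dev() returns int; the sketch instead uses a single 0 or negative-errno convention throughout, which is a choice made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum vol_type { VOL_SIMPLE, VOL_CONCAT, VOL_SCSI };	/* hypothetical */

struct vol {
	enum vol_type type;
	size_t nr_children;
	struct vol *children;
	int registered;
	int fail;	/* test hook: make registration of this leaf fail */
};

/* Register one leaf; returns 0 on success or a negative errno. */
static int leaf_register(struct vol *v)
{
	if (v->registered)
		return 0;
	if (v->fail)
		return -EIO;
	v->registered = 1;
	return 0;
}

/* Walk the tree and drop the registration of every SCSI leaf. */
static void vol_unregister(struct vol *v)
{
	if (v->nr_children) {
		for (size_t i = 0; i < v->nr_children; i++)
			vol_unregister(&v->children[i]);
		return;
	}
	if (v->type == VOL_SCSI)
		v->registered = 0;
}

/* Register every SCSI leaf; on failure, undo the children already done. */
int vol_register(struct vol *v)
{
	if (v->nr_children) {
		for (size_t i = 0; i < v->nr_children; i++) {
			int ret = vol_register(&v->children[i]);

			if (ret) {
				while (i > 0)
					vol_unregister(&v->children[--i]);
				return ret;
			}
		}
		return 0;
	}
	if (v->type == VOL_SCSI)
		return leaf_register(v);
	return 0;
}

/*
 * Build a concat volume over three SCSI leaves, optionally failing one
 * (fail_index < 0 means no failure), and return how many leaves are
 * left registered after vol_register() returns.
 */
int demo(int fail_index)
{
	struct vol leaves[3] = {
		{ .type = VOL_SCSI }, { .type = VOL_SCSI }, { .type = VOL_SCSI },
	};
	struct vol top = {
		.type = VOL_CONCAT, .nr_children = 3, .children = leaves,
	};
	int ret, count = 0;

	if (fail_index >= 0)
		leaves[fail_index].fail = 1;
	ret = vol_register(&top);
	assert((ret == 0) == (fail_index < 0));
	for (size_t i = 0; i < 3; i++)
		count += leaves[i].registered;
	return count;
}
```

When no leaf fails, all three end up registered. When any leaf fails, the leaves registered before it are rolled back, so none remain registered.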
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/4] nfs/blocklayout: Fix premature PR key unregistration
2024-06-23 7:36 ` Christoph Hellwig
@ 2024-06-24 15:08 ` Chuck Lever
0 siblings, 0 replies; 15+ messages in thread
From: Chuck Lever @ 2024-06-24 15:08 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: cel, linux-nfs
On Sun, Jun 23, 2024 at 09:36:27AM +0200, Christoph Hellwig wrote:
> On Sat, Jun 22, 2024 at 01:26:10PM -0400, Chuck Lever wrote:
> > This patch currently adds the pr_reg callback to
> > bl_find_get_deviceid(), which has no visibility of the volume
> > hierarchy. Where should the registration be done instead? I'm
> > missing something.
>
> Something like the patch below (untested Sunday morning coding) should
> do the trick:
>
> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> index 947b2c52344097..6db54b215066e0 100644
> --- a/fs/nfs/blocklayout/blocklayout.c
> +++ b/fs/nfs/blocklayout/blocklayout.c
> @@ -564,34 +564,32 @@ bl_find_get_deviceid(struct nfs_server *server,
> gfp_t gfp_mask)
> {
> struct nfs4_deviceid_node *node;
> - struct pnfs_block_dev *d;
> - unsigned long start, end;
> + int err = -ENODEV;
>
> retry:
> node = nfs4_find_get_deviceid(server, id, cred, gfp_mask);
> if (!node)
> return ERR_PTR(-ENODEV);
>
> - if (test_bit(NFS_DEVICEID_UNAVAILABLE, &node->flags))
> - goto transient;
> + if (test_bit(NFS_DEVICEID_UNAVAILABLE, &node->flags)) {
> + unsigned long end = jiffies;
> + unsigned long start = end - PNFS_DEVICE_RETRY_TIMEOUT;
>
> - d = container_of(node, struct pnfs_block_dev, node);
> - if (d->pr_register)
> - if (!d->pr_register(d))
> - goto out_put;
> - return node;
> -
> -transient:
> - end = jiffies;
> - start = end - PNFS_DEVICE_RETRY_TIMEOUT;
> - if (!time_in_range(node->timestamp_unavailable, start, end)) {
> - nfs4_delete_deviceid(node->ld, node->nfs_client, id);
> - goto retry;
> + if (!time_in_range(node->timestamp_unavailable, start, end)) {
> + nfs4_delete_deviceid(node->ld, node->nfs_client, id);
> + goto retry;
> + }
> + goto out_put;
> }
>
> + err = bl_register_dev(container_of(node, struct pnfs_block_dev, node));
> + if (err)
> + goto out_put;
> + return node;
> +
> out_put:
> nfs4_put_deviceid_node(node);
> - return ERR_PTR(-ENODEV);
> + return ERR_PTR(err);
> }
>
> static int
> diff --git a/fs/nfs/blocklayout/blocklayout.h b/fs/nfs/blocklayout/blocklayout.h
> index cc788e8ce90933..7efbef9d10dba8 100644
> --- a/fs/nfs/blocklayout/blocklayout.h
> +++ b/fs/nfs/blocklayout/blocklayout.h
> @@ -104,6 +104,7 @@ struct pnfs_block_dev {
> u64 start;
> u64 len;
>
> + enum pnfs_block_volume_type type;
> u32 nr_children;
> struct pnfs_block_dev *children;
> u64 chunk_size;
> @@ -116,7 +117,6 @@ struct pnfs_block_dev {
>
> bool (*map)(struct pnfs_block_dev *dev, u64 offset,
> struct pnfs_block_dev_map *map);
> - bool (*pr_register)(struct pnfs_block_dev *dev);
> };
>
> /* pnfs_block_dev flag bits */
> @@ -178,6 +178,7 @@ struct bl_msg_hdr {
> #define BL_DEVICE_REQUEST_ERR 0x2 /* User level process fails */
>
> /* dev.c */
> +int bl_register_dev(struct pnfs_block_dev *d);
> struct nfs4_deviceid_node *bl_alloc_deviceid_node(struct nfs_server *server,
> struct pnfs_device *pdev, gfp_t gfp_mask);
> void bl_free_deviceid_node(struct nfs4_deviceid_node *d);
> diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c
> index 16fb64d4af31db..72e061e87e145a 100644
> --- a/fs/nfs/blocklayout/dev.c
> +++ b/fs/nfs/blocklayout/dev.c
> @@ -13,9 +13,74 @@
>
> #define NFSDBG_FACILITY NFSDBG_PNFS_LD
>
> +static void bl_unregister_scsi(struct pnfs_block_dev *dev)
> +{
> + struct block_device *bdev = file_bdev(dev->bdev_file);
> + const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
> +
> + if (!test_and_clear_bit(PNFS_BDEV_REGISTERED, &dev->flags))
> + return;
> +
> + if (ops->pr_register(bdev, dev->pr_key, 0, false))
> + pr_err("failed to unregister PR key.\n");
> +}
> +
> +static bool bl_register_scsi(struct pnfs_block_dev *dev)
> +{
> + struct block_device *bdev = file_bdev(dev->bdev_file);
> + const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
> + int status;
> +
> + if (test_and_set_bit(PNFS_BDEV_REGISTERED, &dev->flags))
> + return true;
> +
> + status = ops->pr_register(bdev, 0, dev->pr_key, true);
> + if (status) {
> + pr_err("pNFS: failed to register key for block device %s.",
> + bdev->bd_disk->disk_name);
> + return false;
> + }
> + return true;
> +}
> +
> +static void bl_unregister_dev(struct pnfs_block_dev *dev)
> +{
> + if (dev->nr_children) {
> + for (u32 i = 0; i < dev->nr_children; i++)
> + bl_unregister_dev(&dev->children[i]);
> + return;
> + }
> +
> + if (dev->type == PNFS_BLOCK_VOLUME_SCSI)
> + bl_unregister_scsi(dev);
> +}
> +
> +int bl_register_dev(struct pnfs_block_dev *dev)
> +{
> + if (dev->nr_children) {
> + for (u32 i = 0; i < dev->nr_children; i++) {
> + int ret = bl_register_dev(&dev->children[i]);
> +
> + if (ret) {
> + while (i > 0)
> + bl_unregister_dev(&dev->children[--i]);
> + return ret;
> + }
> + }
> +
> + return 0;
> + }
> +
> + if (dev->type == PNFS_BLOCK_VOLUME_SCSI)
> + return bl_register_scsi(dev);
> + return 0;
> +}
> +
> static void
> bl_free_device(struct pnfs_block_dev *dev)
> {
> + bl_unregister_dev(dev);
> +
> if (dev->nr_children) {
> int i;
>
> @@ -23,17 +88,6 @@ bl_free_device(struct pnfs_block_dev *dev)
> bl_free_device(&dev->children[i]);
> kfree(dev->children);
> } else {
> - if (test_and_clear_bit(PNFS_BDEV_REGISTERED, &dev->flags)) {
> - struct block_device *bdev = file_bdev(dev->bdev_file);
> - const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
> - int error;
> -
> - error = ops->pr_register(file_bdev(dev->bdev_file),
> - dev->pr_key, 0, false);
> - if (error)
> - pr_err("failed to unregister PR key.\n");
> - }
> -
> if (dev->bdev_file)
> fput(dev->bdev_file);
> }
> @@ -226,30 +280,6 @@ static bool bl_map_stripe(struct pnfs_block_dev *dev, u64 offset,
> return true;
> }
>
> -/**
> - * bl_pr_register_scsi - Register a SCSI PR key for @d
> - * @dev: pNFS block device, key to register is already in @d->pr_key
> - *
> - * Returns true if the device's PR key is registered, otherwise false.
> - */
> -static bool bl_pr_register_scsi(struct pnfs_block_dev *dev)
> -{
> - struct block_device *bdev = file_bdev(dev->bdev_file);
> - const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
> - int status;
> -
> - if (test_and_set_bit(PNFS_BDEV_REGISTERED, &dev->flags))
> - return true;
> -
> - status = ops->pr_register(bdev, 0, dev->pr_key, true);
> - if (status) {
> - pr_err("pNFS: failed to register key for block device %s.",
> - bdev->bd_disk->disk_name);
> - return false;
> - }
> - return true;
> -}
> -
> static int
> bl_parse_deviceid(struct nfs_server *server, struct pnfs_block_dev *d,
> struct pnfs_block_volume *volumes, int idx, gfp_t gfp_mask);
> @@ -392,7 +422,6 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
> goto out_blkdev_put;
> }
>
> - d->pr_register = bl_pr_register_scsi;
> return 0;
>
> out_blkdev_put:
> @@ -478,7 +507,9 @@ static int
> bl_parse_deviceid(struct nfs_server *server, struct pnfs_block_dev *d,
> struct pnfs_block_volume *volumes, int idx, gfp_t gfp_mask)
> {
> - switch (volumes[idx].type) {
> + d->type = volumes[idx].type;
> +
> + switch (d->type) {
> case PNFS_BLOCK_VOLUME_SIMPLE:
> return bl_parse_simple(server, d, volumes, idx, gfp_mask);
> case PNFS_BLOCK_VOLUME_SLICE:
> @@ -490,7 +521,7 @@ bl_parse_deviceid(struct nfs_server *server, struct pnfs_block_dev *d,
> case PNFS_BLOCK_VOLUME_SCSI:
> return bl_parse_scsi(server, d, volumes, idx, gfp_mask);
> default:
> - dprintk("unsupported volume type: %d\n", volumes[idx].type);
> + dprintk("unsupported volume type: %d\n", d->type);
> return -EIO;
> }
> }
Thanks. I've applied this as a separate patch (I can squash it into
1/4 once it passes testing). The first write I/O segfaults at L143
in do_add_page_to_bio():
142 if (!offset_in_map(disk_addr, map)) {
143 if (!dev->map(dev, disk_addr, map) || !offset_in_map(disk_addr, map))
144 return ERR_PTR(-EIO);
I'm looking into it.
--
Chuck Lever
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 2/4] nfs/blocklayout: Use bulk page allocation APIs
2024-06-21 16:22 [PATCH v2 0/4] Fixes for pNFS SCSI layout PR key registration cel
2024-06-21 16:22 ` [PATCH v2 1/4] nfs/blocklayout: Fix premature PR key unregistration cel
@ 2024-06-21 16:22 ` cel
2024-06-22 5:08 ` Christoph Hellwig
2024-06-21 16:22 ` [PATCH v2 3/4] nfs/blocklayout: Report only when /no/ device is found cel
` (2 subsequent siblings)
4 siblings, 1 reply; 15+ messages in thread
From: cel @ 2024-06-21 16:22 UTC (permalink / raw)
To: linux-nfs; +Cc: Christoph Hellwig, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
nfs4_get_device_info() frequently requests more than a few pages
when provisioning a nfs4_deviceid_node object. Make this more
efficient by using alloc_pages_bulk_array(). This API is known to be
several times faster than an open-coded loop around alloc_page().
release_pages() is folio-enabled so it is also more efficient than
repeatedly invoking __free_page().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfs/pnfs_dev.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
index 178001c90156..26a78d69acab 100644
--- a/fs/nfs/pnfs_dev.c
+++ b/fs/nfs/pnfs_dev.c
@@ -101,9 +101,8 @@ nfs4_get_device_info(struct nfs_server *server,
struct nfs4_deviceid_node *d = NULL;
struct pnfs_device *pdev = NULL;
struct page **pages = NULL;
+ int rc, i, max_pages;
u32 max_resp_sz;
- int max_pages;
- int rc, i;
/*
* Use the session max response size as the basis for setting
@@ -125,11 +124,9 @@ nfs4_get_device_info(struct nfs_server *server,
if (!pages)
goto out_free_pdev;
- for (i = 0; i < max_pages; i++) {
- pages[i] = alloc_page(gfp_flags);
- if (!pages[i])
- goto out_free_pages;
- }
+ i = alloc_pages_bulk_array(GFP_KERNEL, max_pages, pages);
+ if (i != max_pages)
+ goto out_free_pages;
memcpy(&pdev->dev_id, dev_id, sizeof(*dev_id));
pdev->layout_type = server->pnfs_curr_ld->id;
@@ -154,8 +151,8 @@ nfs4_get_device_info(struct nfs_server *server,
set_bit(NFS_DEVICEID_NOCACHE, &d->flags);
out_free_pages:
- while (--i >= 0)
- __free_page(pages[i]);
+ if (i)
+ release_pages(pages, i);
kfree(pages);
out_free_pdev:
kfree(pdev);
--
2.45.1
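The all-or-nothing pattern the patch adopts (ask for the whole array in one call, compare the returned count with the request, release whatever was populated on a shortfall) can be sketched in userspace like this. It is an analogy only: bulk_alloc(), bulk_release(), get_buffers() and demo_bulk() are hypothetical helpers built on malloc(), the "limit" parameter is a test hook to force a partial fill, and nothing here reproduces the kernel allocator's behaviour or performance.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical bulk allocator: fills slots[0..nr-1] and returns how many
 * entries were populated, which may be fewer than nr. "limit" caps the
 * number of successful allocations so a shortfall can be simulated.
 */
static size_t bulk_alloc(size_t nr, void **slots, size_t limit)
{
	size_t i;

	for (i = 0; i < nr && i < limit; i++) {
		slots[i] = malloc(4096);
		if (!slots[i])
			break;
	}
	return i;
}

/* Release the first nr entries and clear them. */
static void bulk_release(void **slots, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		free(slots[i]);
		slots[i] = NULL;
	}
}

/*
 * Returns 0 with all nr buffers populated, or -ENOMEM with nothing
 * left allocated.
 */
int get_buffers(size_t nr, void **slots, size_t limit)
{
	size_t got = bulk_alloc(nr, slots, limit);

	if (got != nr) {
		bulk_release(slots, got);
		return -ENOMEM;
	}
	return 0;
}

/* Run one request and verify that no buffer outlives the call. */
int demo_bulk(size_t nr, size_t limit)
{
	void **slots = calloc(nr, sizeof(*slots));
	int ret;

	if (!slots)
		return -ENOMEM;
	ret = get_buffers(nr, slots, limit);
	if (ret == 0)
		bulk_release(slots, nr);
	for (size_t i = 0; i < nr; i++)
		assert(slots[i] == NULL);
	free(slots);
	return ret;
}
```

The caller only ever sees a fully populated array or an error, which mirrors the "i != max_pages" check in the patch.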
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v2 2/4] nfs/blocklayout: Use bulk page allocation APIs
2024-06-21 16:22 ` [PATCH v2 2/4] nfs/blocklayout: Use bulk page allocation APIs cel
@ 2024-06-22 5:08 ` Christoph Hellwig
2024-06-22 16:29 ` Chuck Lever
0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2024-06-22 5:08 UTC (permalink / raw)
To: cel; +Cc: linux-nfs, Christoph Hellwig, Chuck Lever
On Fri, Jun 21, 2024 at 12:22:30PM -0400, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> nfs4_get_device_info() frequently requests more than a few pages
> when provisioning a nfs4_deviceid_node object. Make this more
> efficient by using alloc_pages_bulk_array(). This API is known to be
> several times faster than an open-coded loop around alloc_page().
>
> release_pages() is folio-enabled so it is also more efficient than
> repeatedly invoking __free_page().
This isn't really a pnfs fix, right? Just a little optimization.
It does look fine to me:
Reviewed-by: Christoph Hellwig <hch@lst.de>
But I'd really wish we could do better than this with lazy
decoding in ->alloc_deviceid_node, which (at least for blocklayout)
knows roughly how much we need to decode after the first value
parsed. Or at least cache it if it is that frequent (which it
really shouldn't be due to the device id cache, or am I missing
something?)
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 2/4] nfs/blocklayout: Use bulk page allocation APIs
2024-06-22 5:08 ` Christoph Hellwig
@ 2024-06-22 16:29 ` Chuck Lever
0 siblings, 0 replies; 15+ messages in thread
From: Chuck Lever @ 2024-06-22 16:29 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: cel, linux-nfs
On Sat, Jun 22, 2024 at 07:08:12AM +0200, Christoph Hellwig wrote:
> On Fri, Jun 21, 2024 at 12:22:30PM -0400, cel@kernel.org wrote:
> > From: Chuck Lever <chuck.lever@oracle.com>
> >
> > nfs4_get_device_info() frequently requests more than a few pages
> > when provisioning a nfs4_deviceid_node object. Make this more
> > efficient by using alloc_pages_bulk_array(). This API is known to be
> > several times faster than an open-coded loop around alloc_page().
> >
> > release_pages() is folio-enabled so it is also more efficient than
> > repeatedly invoking __free_page().
>
> This isn't really a pnfs fix, right? Just a little optimization.
It doesn't say "fix" anywhere and doesn't include a Fixes: tag.
And subsequent patches in the series are also clearly not fixes.
I can make it more clear that this one is only an optimization.
> It does look fine to me:
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
Thank you!
> But I'd really wish we could do better than this with lazy
> decoding in ->alloc_deviceid_node, which (at least for blocklayout)
> knows roughly how much we need to decode after the first value
> parsed.
Agreed. And it's not the only culprit in NFS and RPC of this kind
of temporary "just in case" overallocation.
> Or at least cache it if it is that frequent (which it
> really shouldn't be due to the device id cache, or am I missing
> something?)
It's not a frequent operation; it's done the first time pNFS
encounters a new block device. But the alloc_page() loop is slow and
takes and releases an IRQ spinlock repeatedly (IIRC) so it's an
opportunity for IRQs to run and delay get_device_info considerably.
--
Chuck Lever
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 3/4] nfs/blocklayout: Report only when /no/ device is found
2024-06-21 16:22 [PATCH v2 0/4] Fixes for pNFS SCSI layout PR key registration cel
2024-06-21 16:22 ` [PATCH v2 1/4] nfs/blocklayout: Fix premature PR key unregistration cel
2024-06-21 16:22 ` [PATCH v2 2/4] nfs/blocklayout: Use bulk page allocation APIs cel
@ 2024-06-21 16:22 ` cel
2024-06-21 16:22 ` [PATCH v2 4/4] nfs/blocklayout: SCSI layout trace points for reservation key reg/unreg cel
2024-06-21 18:03 ` [PATCH v2 0/4] Fixes for pNFS SCSI layout PR key registration Benjamin Coddington
4 siblings, 0 replies; 15+ messages in thread
From: cel @ 2024-06-21 16:22 UTC (permalink / raw)
To: linux-nfs; +Cc: Christoph Hellwig, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
Since commit f931d8374cad ("nfs/blocklayout: refactor block device
opening"), an error is reported when no multi-path device is found.
But this isn't a fatal error if the subsequent device open is
successful. On systems without multi-path devices, this message
always appears whether there is a problem or not.
Instead, generate less system journal noise by reporting an error
only when both open attempts fail. The new error message is more
actionable since it indicates that there is a real configuration
issue to be addressed.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfs/blocklayout/dev.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c
index 83753a08a19d..568f685dee4b 100644
--- a/fs/nfs/blocklayout/dev.c
+++ b/fs/nfs/blocklayout/dev.c
@@ -338,7 +338,7 @@ bl_open_path(struct pnfs_block_volume *v, const char *prefix)
bdev_file = bdev_file_open_by_path(devname, BLK_OPEN_READ | BLK_OPEN_WRITE,
NULL, NULL);
if (IS_ERR(bdev_file)) {
- pr_warn("pNFS: failed to open device %s (%ld)\n",
+ dprintk("failed to open device %s (%ld)\n",
devname, PTR_ERR(bdev_file));
}
@@ -367,8 +367,11 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
bdev_file = bl_open_path(v, "dm-uuid-mpath-0x");
if (IS_ERR(bdev_file))
bdev_file = bl_open_path(v, "wwn-0x");
- if (IS_ERR(bdev_file))
+ if (IS_ERR(bdev_file)) {
+ pr_warn("pNFS: no device found for volume %*phN\n",
+ v->scsi.designator_len, v->scsi.designator);
return PTR_ERR(bdev_file);
+ }
d->bdev_file = bdev_file;
d->len = bdev_nr_bytes(file_bdev(d->bdev_file));
--
2.45.1
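The reporting policy described in this patch (stay quiet on each individual open failure, warn once only when every candidate fails) can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel code: `try_open()` and `open_first_available()` are hypothetical names, plain `open(2)` stands in for `bdev_file_open_by_path()`, and the quiet path simply prints nothing where the patch uses `dprintk()`.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical helper: try one path quietly; return an fd or -errno. */
static int try_open(const char *path)
{
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	return fd < 0 ? -errno : fd;
}

/*
 * Try each candidate path in order. An individual failure is not
 * reported; a single warning is printed only if all candidates fail.
 */
static int open_first_available(const char *const *paths, size_t n)
{
	int ret = -ENOENT;
	size_t i;

	for (i = 0; i < n; i++) {
		ret = try_open(paths[i]);
		if (ret >= 0)
			return ret;	/* success: no journal noise */
	}
	fprintf(stderr, "no device found (last error %d)\n", ret);
	return ret;
}
```

With a candidate list such as a multi-path name followed by a WWN name, a system without multi-path devices produces no output at all as long as the second open succeeds.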
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 4/4] nfs/blocklayout: SCSI layout trace points for reservation key reg/unreg
2024-06-21 16:22 [PATCH v2 0/4] Fixes for pNFS SCSI layout PR key registration cel
` (2 preceding siblings ...)
2024-06-21 16:22 ` [PATCH v2 3/4] nfs/blocklayout: Report only when /no/ device is found cel
@ 2024-06-21 16:22 ` cel
2024-06-21 17:21 ` Anna Schumaker
2024-06-22 5:09 ` Christoph Hellwig
2024-06-21 18:03 ` [PATCH v2 0/4] Fixes for pNFS SCSI layout PR key registration Benjamin Coddington
4 siblings, 2 replies; 15+ messages in thread
From: cel @ 2024-06-21 16:22 UTC (permalink / raw)
To: linux-nfs; +Cc: Christoph Hellwig, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
An administrator cannot take action on these messages, but the
reported errors might be helpful for troubleshooting. Transition
them to trace points so these events appear in the trace log and
can be easily lined up with other traced NFS client operations.
Examples:
append_writer-6147 [000] 80.247393: bl_pr_key_reg: device=sdb key=0x666dcdabf29514fe
append_writer-6147 [000] 80.247842: bl_pr_key_unreg: device=sdb key=0x666dcdabf29514fe
umount.nfs4-6172 [002] 84.950409: bl_pr_key_unreg_err: device=sdb key=0x666dcdabf29514fe error=24
Christoph points out that:
> ... Note that the disk_name isn't really what
> we'd want to trace anyway, as it misses the partition information.
> The normal way to print the device name is the %pg printk specifier,
> but I'm not sure how to correctly use that for tracing which wants
> a string in the entry for binary tracing.
The trace points copy the pr_info() that they are replacing, and
show only the parent device name and not the partition. I'm still
looking into how to record both parts of the device name.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfs/blocklayout/dev.c | 30 +++++++-------
fs/nfs/nfs4trace.c | 7 ++++
fs/nfs/nfs4trace.h | 88 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 111 insertions(+), 14 deletions(-)
diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c
index 568f685dee4b..6c5d290ca81d 100644
--- a/fs/nfs/blocklayout/dev.c
+++ b/fs/nfs/blocklayout/dev.c
@@ -10,6 +10,7 @@
#include <linux/pr.h>
#include "blocklayout.h"
+#include "../nfs4trace.h"
#define NFSDBG_FACILITY NFSDBG_PNFS_LD
@@ -26,12 +27,14 @@ bl_free_device(struct pnfs_block_dev *dev)
if (test_and_clear_bit(PNFS_BDEV_REGISTERED, &dev->flags)) {
struct block_device *bdev = file_bdev(dev->bdev_file);
const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
- int error;
+ int status;
- error = ops->pr_register(file_bdev(dev->bdev_file),
- dev->pr_key, 0, false);
- if (error)
- pr_err("failed to unregister PR key.\n");
+ status = ops->pr_register(bdev, dev->pr_key, 0, false);
+ if (status)
+ trace_bl_pr_key_unreg_err(bdev, dev->pr_key,
+ status);
+ else
+ trace_bl_pr_key_unreg(bdev, dev->pr_key);
}
if (dev->bdev_file)
@@ -243,10 +246,10 @@ static bool bl_pr_register_scsi(struct pnfs_block_dev *dev)
status = ops->pr_register(bdev, 0, dev->pr_key, true);
if (status) {
- pr_err("pNFS: failed to register key for block device %s.",
- bdev->bd_disk->disk_name);
+ trace_bl_pr_key_reg_err(bdev, dev->pr_key, status);
return false;
}
+ trace_bl_pr_key_reg(bdev, dev->pr_key);
return true;
}
@@ -351,8 +354,9 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
struct pnfs_block_volume *volumes, int idx, gfp_t gfp_mask)
{
struct pnfs_block_volume *v = &volumes[idx];
- struct file *bdev_file;
+ struct block_device *bdev;
const struct pr_ops *ops;
+ struct file *bdev_file;
int error;
if (!bl_validate_designator(v))
@@ -373,8 +377,9 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
return PTR_ERR(bdev_file);
}
d->bdev_file = bdev_file;
+ bdev = file_bdev(bdev_file);
- d->len = bdev_nr_bytes(file_bdev(d->bdev_file));
+ d->len = bdev_nr_bytes(bdev);
d->map = bl_map_simple;
d->pr_key = v->scsi.pr_key;
@@ -383,13 +388,10 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
goto out_blkdev_put;
}
- pr_info("pNFS: using block device %s (reservation key 0x%llx)\n",
- file_bdev(d->bdev_file)->bd_disk->disk_name, d->pr_key);
-
- ops = file_bdev(d->bdev_file)->bd_disk->fops->pr_ops;
+ ops = bdev->bd_disk->fops->pr_ops;
if (!ops) {
pr_err("pNFS: block device %s does not support reservations.",
- file_bdev(d->bdev_file)->bd_disk->disk_name);
+ bdev->bd_disk->disk_name);
error = -EINVAL;
goto out_blkdev_put;
}
diff --git a/fs/nfs/nfs4trace.c b/fs/nfs/nfs4trace.c
index d22c6670f770..389941ccc9c9 100644
--- a/fs/nfs/nfs4trace.c
+++ b/fs/nfs/nfs4trace.c
@@ -2,6 +2,8 @@
/*
* Copyright (c) 2013 Trond Myklebust <Trond.Myklebust@netapp.com>
*/
+#include <uapi/linux/pr.h>
+#include <linux/blkdev.h>
#include <linux/nfs_fs.h>
#include "nfs4_fs.h"
#include "internal.h"
@@ -29,5 +31,10 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(ff_layout_read_error);
EXPORT_TRACEPOINT_SYMBOL_GPL(ff_layout_write_error);
EXPORT_TRACEPOINT_SYMBOL_GPL(ff_layout_commit_error);
+EXPORT_TRACEPOINT_SYMBOL_GPL(bl_pr_key_reg);
+EXPORT_TRACEPOINT_SYMBOL_GPL(bl_pr_key_reg_err);
+EXPORT_TRACEPOINT_SYMBOL_GPL(bl_pr_key_unreg);
+EXPORT_TRACEPOINT_SYMBOL_GPL(bl_pr_key_unreg_err);
+
EXPORT_TRACEPOINT_SYMBOL_GPL(fl_getdevinfo);
#endif
diff --git a/fs/nfs/nfs4trace.h b/fs/nfs/nfs4trace.h
index 4de8780a7c48..f2090a491fcb 100644
--- a/fs/nfs/nfs4trace.h
+++ b/fs/nfs/nfs4trace.h
@@ -2153,6 +2153,94 @@ TRACE_EVENT(ff_layout_commit_error,
)
);
+DECLARE_EVENT_CLASS(pnfs_bl_pr_key_class,
+ TP_PROTO(
+ const struct block_device *bdev,
+ u64 key
+ ),
+ TP_ARGS(bdev, key),
+ TP_STRUCT__entry(
+ __field(u64, key)
+ __field(dev_t, dev)
+ __string(device, bdev->bd_disk->disk_name)
+ ),
+ TP_fast_assign(
+ __entry->key = key;
+ __entry->dev = bdev->bd_dev;
+ __assign_str(device);
+ ),
+ TP_printk("dev=%d,%d (%s) key=0x%016llx",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __get_str(device), __entry->key
+ )
+);
+
+#define DEFINE_NFS4_BLOCK_PRKEY_EVENT(name) \
+ DEFINE_EVENT(pnfs_bl_pr_key_class, name, \
+ TP_PROTO( \
+ const struct block_device *bdev, \
+ u64 key \
+ ), \
+ TP_ARGS(bdev, key))
+DEFINE_NFS4_BLOCK_PRKEY_EVENT(bl_pr_key_reg);
+DEFINE_NFS4_BLOCK_PRKEY_EVENT(bl_pr_key_unreg);
+
+/*
+ * From uapi/linux/pr.h
+ */
+TRACE_DEFINE_ENUM(PR_STS_SUCCESS);
+TRACE_DEFINE_ENUM(PR_STS_IOERR);
+TRACE_DEFINE_ENUM(PR_STS_RESERVATION_CONFLICT);
+TRACE_DEFINE_ENUM(PR_STS_RETRY_PATH_FAILURE);
+TRACE_DEFINE_ENUM(PR_STS_PATH_FAST_FAILED);
+TRACE_DEFINE_ENUM(PR_STS_PATH_FAILED);
+
+#define show_pr_status(x) \
+ __print_symbolic(x, \
+ { PR_STS_SUCCESS, "SUCCESS" }, \
+ { PR_STS_IOERR, "IOERR" }, \
+ { PR_STS_RESERVATION_CONFLICT, "RESERVATION_CONFLICT" }, \
+ { PR_STS_RETRY_PATH_FAILURE, "RETRY_PATH_FAILURE" }, \
+ { PR_STS_PATH_FAST_FAILED, "PATH_FAST_FAILED" }, \
+ { PR_STS_PATH_FAILED, "PATH_FAILED" })
+
+DECLARE_EVENT_CLASS(pnfs_bl_pr_key_err_class,
+ TP_PROTO(
+ const struct block_device *bdev,
+ u64 key,
+ int status
+ ),
+ TP_ARGS(bdev, key, status),
+ TP_STRUCT__entry(
+ __field(u64, key)
+ __field(dev_t, dev)
+ __field(unsigned long, status)
+ __string(device, bdev->bd_disk->disk_name)
+ ),
+ TP_fast_assign(
+ __entry->key = key;
+ __entry->dev = bdev->bd_dev;
+ __entry->status = status;
+ __assign_str(device);
+ ),
+ TP_printk("dev=%d,%d (%s) key=0x%016llx status=%s",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __get_str(device), __entry->key,
+ show_pr_status(__entry->status)
+ )
+);
+
+#define DEFINE_NFS4_BLOCK_PRKEY_ERR_EVENT(name) \
+ DEFINE_EVENT(pnfs_bl_pr_key_err_class, name, \
+ TP_PROTO( \
+ const struct block_device *bdev, \
+ u64 key, \
+ int status \
+ ), \
+ TP_ARGS(bdev, key, status))
+DEFINE_NFS4_BLOCK_PRKEY_ERR_EVENT(bl_pr_key_reg_err);
+DEFINE_NFS4_BLOCK_PRKEY_ERR_EVENT(bl_pr_key_unreg_err);
+
#ifdef CONFIG_NFS_V4_2
TRACE_DEFINE_ENUM(NFS4_CONTENT_DATA);
TRACE_DEFINE_ENUM(NFS4_CONTENT_HOLE);
--
2.45.1
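The `show_pr_status()` macro in this patch maps a numeric PR status to a symbolic name when the trace event is printed. A rough userspace sketch of that lookup follows. The `demo_` names are hypothetical, and the numeric values are my understanding of `enum pr_status` in uapi/linux/pr.h, so verify them against your own headers. Unlike `__print_symbolic()`, which prints the raw value for an unmatched code, this sketch returns "UNKNOWN".

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror enum pr_status in uapi/linux/pr.h; verify locally. */
enum demo_pr_status {
	DEMO_PR_STS_SUCCESS			= 0x0,
	DEMO_PR_STS_IOERR			= 0x2,
	DEMO_PR_STS_RESERVATION_CONFLICT	= 0x18,
	DEMO_PR_STS_RETRY_PATH_FAILURE		= 0xe0000,
	DEMO_PR_STS_PATH_FAST_FAILED		= 0xf0000,
	DEMO_PR_STS_PATH_FAILED			= 0x10000,
};

/* 0x18 is decimal 24, the value shown as "error=24" in the example trace. */
static_assert(DEMO_PR_STS_RESERVATION_CONFLICT == 24,
	      "reservation conflict should be decimal 24");

struct demo_symbol {
	unsigned long	value;
	const char	*name;
};

/* Linear table lookup, similar in spirit to __print_symbolic(). */
static const char *demo_show_pr_status(unsigned long status)
{
	static const struct demo_symbol table[] = {
		{ DEMO_PR_STS_SUCCESS,			"SUCCESS" },
		{ DEMO_PR_STS_IOERR,			"IOERR" },
		{ DEMO_PR_STS_RESERVATION_CONFLICT,	"RESERVATION_CONFLICT" },
		{ DEMO_PR_STS_RETRY_PATH_FAILURE,	"RETRY_PATH_FAILURE" },
		{ DEMO_PR_STS_PATH_FAST_FAILED,		"PATH_FAST_FAILED" },
		{ DEMO_PR_STS_PATH_FAILED,		"PATH_FAILED" },
	};
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (table[i].value == status)
			return table[i].name;
	return "UNKNOWN";
}
```

Under these assumed values, the status 24 from the example trace line decodes to RESERVATION_CONFLICT.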
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v2 4/4] nfs/blocklayout: SCSI layout trace points for reservation key reg/unreg
2024-06-21 16:22 ` [PATCH v2 4/4] nfs/blocklayout: SCSI layout trace points for reservation key reg/unreg cel
@ 2024-06-21 17:21 ` Anna Schumaker
2024-06-21 17:46 ` Chuck Lever III
2024-06-22 5:09 ` Christoph Hellwig
1 sibling, 1 reply; 15+ messages in thread
From: Anna Schumaker @ 2024-06-21 17:21 UTC (permalink / raw)
To: cel; +Cc: linux-nfs, Christoph Hellwig, Chuck Lever
Hi Chuck,
On Fri, Jun 21, 2024 at 12:22 PM <cel@kernel.org> wrote:
>
> From: Chuck Lever <chuck.lever@oracle.com>
>
> An administrator cannot take action on these messages, but the
> reported errors might be helpful for troubleshooting. Transition
> them to trace points so these events appear in the trace log and
> can be easily lined up with other traced NFS client operations.
>
> Examples:
>
> append_writer-6147 [000] 80.247393: bl_pr_key_reg: device=sdb key=0x666dcdabf29514fe
> append_writer-6147 [000] 80.247842: bl_pr_key_unreg: device=sdb key=0x666dcdabf29514fe
>
> umount.nfs4-6172 [002] 84.950409: bl_pr_key_unreg_err: device=sdb key=0x666dcdabf29514fe error=24
>
> Christoph points out that:
> > ... Note that the disk_name isn't really what
> > we'd want to trace anyway, as it misses the partition information.
> > The normal way to print the device name is the %pg printk specifier,
> > but I'm not sure how to correctly use that for tracing which wants
> > a string in the entry for binary tracing.
>
> The trace points copy the pr_info() that they are replacing, and
> show only the parent device name and not the partition. I'm still
> looking into how to record both parts of the device name.
>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/nfs/blocklayout/dev.c | 30 +++++++-------
> fs/nfs/nfs4trace.c | 7 ++++
> fs/nfs/nfs4trace.h | 88 ++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 111 insertions(+), 14 deletions(-)
>
> diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c
> index 568f685dee4b..6c5d290ca81d 100644
> --- a/fs/nfs/blocklayout/dev.c
> +++ b/fs/nfs/blocklayout/dev.c
> @@ -10,6 +10,7 @@
> #include <linux/pr.h>
>
> #include "blocklayout.h"
> +#include "../nfs4trace.h"
>
> #define NFSDBG_FACILITY NFSDBG_PNFS_LD
>
> @@ -26,12 +27,14 @@ bl_free_device(struct pnfs_block_dev *dev)
> if (test_and_clear_bit(PNFS_BDEV_REGISTERED, &dev->flags)) {
> struct block_device *bdev = file_bdev(dev->bdev_file);
> const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
> - int error;
> + int status;
>
> - error = ops->pr_register(file_bdev(dev->bdev_file),
> - dev->pr_key, 0, false);
> - if (error)
> - pr_err("failed to unregister PR key.\n");
> + status = ops->pr_register(bdev, dev->pr_key, 0, false);
> + if (status)
> + trace_bl_pr_key_unreg_err(bdev, dev->pr_key,
> + status);
> + else
> + trace_bl_pr_key_unreg(bdev, dev->pr_key);
> }
>
> if (dev->bdev_file)
> @@ -243,10 +246,10 @@ static bool bl_pr_register_scsi(struct pnfs_block_dev *dev)
>
> status = ops->pr_register(bdev, 0, dev->pr_key, true);
> if (status) {
> - pr_err("pNFS: failed to register key for block device %s.",
> - bdev->bd_disk->disk_name);
> + trace_bl_pr_key_reg_err(bdev, dev->pr_key, status);
> return false;
> }
> + trace_bl_pr_key_reg(bdev, dev->pr_key);
> return true;
> }
>
> @@ -351,8 +354,9 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
> struct pnfs_block_volume *volumes, int idx, gfp_t gfp_mask)
> {
> struct pnfs_block_volume *v = &volumes[idx];
> - struct file *bdev_file;
> + struct block_device *bdev;
> const struct pr_ops *ops;
> + struct file *bdev_file;
> int error;
>
> if (!bl_validate_designator(v))
> @@ -373,8 +377,9 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
> return PTR_ERR(bdev_file);
> }
> d->bdev_file = bdev_file;
> + bdev = file_bdev(bdev_file);
>
> - d->len = bdev_nr_bytes(file_bdev(d->bdev_file));
> + d->len = bdev_nr_bytes(bdev);
> d->map = bl_map_simple;
> d->pr_key = v->scsi.pr_key;
>
> @@ -383,13 +388,10 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
> goto out_blkdev_put;
> }
>
> - pr_info("pNFS: using block device %s (reservation key 0x%llx)\n",
> - file_bdev(d->bdev_file)->bd_disk->disk_name, d->pr_key);
> -
> - ops = file_bdev(d->bdev_file)->bd_disk->fops->pr_ops;
> + ops = bdev->bd_disk->fops->pr_ops;
> if (!ops) {
> pr_err("pNFS: block device %s does not support reservations.",
> - file_bdev(d->bdev_file)->bd_disk->disk_name);
> + bdev->bd_disk->disk_name);
> error = -EINVAL;
> goto out_blkdev_put;
> }
> diff --git a/fs/nfs/nfs4trace.c b/fs/nfs/nfs4trace.c
> index d22c6670f770..389941ccc9c9 100644
> --- a/fs/nfs/nfs4trace.c
> +++ b/fs/nfs/nfs4trace.c
> @@ -2,6 +2,8 @@
> /*
> * Copyright (c) 2013 Trond Myklebust <Trond.Myklebust@netapp.com>
> */
> +#include <uapi/linux/pr.h>
> +#include <linux/blkdev.h>
> #include <linux/nfs_fs.h>
> #include "nfs4_fs.h"
> #include "internal.h"
> @@ -29,5 +31,10 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(ff_layout_read_error);
> EXPORT_TRACEPOINT_SYMBOL_GPL(ff_layout_write_error);
> EXPORT_TRACEPOINT_SYMBOL_GPL(ff_layout_commit_error);
>
> +EXPORT_TRACEPOINT_SYMBOL_GPL(bl_pr_key_reg);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(bl_pr_key_reg_err);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(bl_pr_key_unreg);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(bl_pr_key_unreg_err);
> +
> EXPORT_TRACEPOINT_SYMBOL_GPL(fl_getdevinfo);
> #endif
> diff --git a/fs/nfs/nfs4trace.h b/fs/nfs/nfs4trace.h
> index 4de8780a7c48..f2090a491fcb 100644
> --- a/fs/nfs/nfs4trace.h
> +++ b/fs/nfs/nfs4trace.h
> @@ -2153,6 +2153,94 @@ TRACE_EVENT(ff_layout_commit_error,
> )
> );
>
> +DECLARE_EVENT_CLASS(pnfs_bl_pr_key_class,
> + TP_PROTO(
> + const struct block_device *bdev,
> + u64 key
> + ),
> + TP_ARGS(bdev, key),
> + TP_STRUCT__entry(
> + __field(u64, key)
> + __field(dev_t, dev)
> + __string(device, bdev->bd_disk->disk_name)
> + ),
> + TP_fast_assign(
> + __entry->key = key;
> + __entry->dev = bdev->bd_dev;
^^^^^^^
b4 tells me this patch adds trailing whitespace at the line above:
Patch failed at 0004 nfs/blocklayout: SCSI layout trace points for
reservation key reg/unreg
/home/anna/Projects/linux-nfs.git/.git/rebase-apply/patch:135:
trailing whitespace.
__entry->dev = bdev->bd_dev;
/home/anna/Projects/linux-nfs.git/.git/rebase-apply/patch:188:
trailing whitespace.
__entry->dev = bdev->bd_dev;
error: patch failed: fs/nfs/blocklayout/dev.c:383
error: fs/nfs/blocklayout/dev.c: patch does not apply
> + __assign_str(device);
> + ),
> + TP_printk("dev=%d,%d (%s) key=0x%016llx",
> + MAJOR(__entry->dev), MINOR(__entry->dev),
> + __get_str(device), __entry->key
> + )
> +);
> +
> +#define DEFINE_NFS4_BLOCK_PRKEY_EVENT(name) \
> + DEFINE_EVENT(pnfs_bl_pr_key_class, name, \
> + TP_PROTO( \
> + const struct block_device *bdev, \
> + u64 key \
> + ), \
> + TP_ARGS(bdev, key))
> +DEFINE_NFS4_BLOCK_PRKEY_EVENT(bl_pr_key_reg);
> +DEFINE_NFS4_BLOCK_PRKEY_EVENT(bl_pr_key_unreg);
> +
> +/*
> + * From uapi/linux/pr.h
> + */
> +TRACE_DEFINE_ENUM(PR_STS_SUCCESS);
> +TRACE_DEFINE_ENUM(PR_STS_IOERR);
> +TRACE_DEFINE_ENUM(PR_STS_RESERVATION_CONFLICT);
> +TRACE_DEFINE_ENUM(PR_STS_RETRY_PATH_FAILURE);
> +TRACE_DEFINE_ENUM(PR_STS_PATH_FAST_FAILED);
> +TRACE_DEFINE_ENUM(PR_STS_PATH_FAILED);
> +
> +#define show_pr_status(x) \
> + __print_symbolic(x, \
> + { PR_STS_SUCCESS, "SUCCESS" }, \
> + { PR_STS_IOERR, "IOERR" }, \
> + { PR_STS_RESERVATION_CONFLICT, "RESERVATION_CONFLICT" }, \
> + { PR_STS_RETRY_PATH_FAILURE, "RETRY_PATH_FAILURE" }, \
> + { PR_STS_PATH_FAST_FAILED, "PATH_FAST_FAILED" }, \
> + { PR_STS_PATH_FAILED, "PATH_FAILED" })
> +
> +DECLARE_EVENT_CLASS(pnfs_bl_pr_key_err_class,
> + TP_PROTO(
> + const struct block_device *bdev,
> + u64 key,
> + int status
> + ),
> + TP_ARGS(bdev, key, status),
> + TP_STRUCT__entry(
> + __field(u64, key)
> + __field(dev_t, dev)
> + __field(unsigned long, status)
> + __string(device, bdev->bd_disk->disk_name)
> + ),
> + TP_fast_assign(
> + __entry->key = key;
> + __entry->dev = bdev->bd_dev;
^^^^^^
And for this line, too.
Thanks,
Anna
> + __entry->status = status;
> + __assign_str(device);
> + ),
> + TP_printk("dev=%d,%d (%s) key=0x%016llx status=%s",
> + MAJOR(__entry->dev), MINOR(__entry->dev),
> + __get_str(device), __entry->key,
> + show_pr_status(__entry->status)
> + )
> +);
> +
> +#define DEFINE_NFS4_BLOCK_PRKEY_ERR_EVENT(name) \
> + DEFINE_EVENT(pnfs_bl_pr_key_err_class, name, \
> + TP_PROTO( \
> + const struct block_device *bdev, \
> + u64 key, \
> + int status \
> + ), \
> + TP_ARGS(bdev, key, status))
> +DEFINE_NFS4_BLOCK_PRKEY_ERR_EVENT(bl_pr_key_reg_err);
> +DEFINE_NFS4_BLOCK_PRKEY_ERR_EVENT(bl_pr_key_unreg_err);
> +
> #ifdef CONFIG_NFS_V4_2
> TRACE_DEFINE_ENUM(NFS4_CONTENT_DATA);
> TRACE_DEFINE_ENUM(NFS4_CONTENT_HOLE);
> --
> 2.45.1
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 4/4] nfs/blocklayout: SCSI layout trace points for reservation key reg/unreg
2024-06-21 17:21 ` Anna Schumaker
@ 2024-06-21 17:46 ` Chuck Lever III
0 siblings, 0 replies; 15+ messages in thread
From: Chuck Lever III @ 2024-06-21 17:46 UTC (permalink / raw)
To: Anna Schumaker; +Cc: Chuck Lever, Linux NFS Mailing List, Christoph Hellwig
> On Jun 21, 2024, at 1:21 PM, Anna Schumaker <schumaker.anna@gmail.com> wrote:
>
> Hi Chuck,
>
> On Fri, Jun 21, 2024 at 12:22 PM <cel@kernel.org> wrote:
>>
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> An administrator cannot take action on these messages, but the
>> reported errors might be helpful for troubleshooting. Transition
>> them to trace points so these events appear in the trace log and
>> can be easily lined up with other traced NFS client operations.
>>
>> Examples:
>>
>> append_writer-6147 [000] 80.247393: bl_pr_key_reg: device=sdb key=0x666dcdabf29514fe
>> append_writer-6147 [000] 80.247842: bl_pr_key_unreg: device=sdb key=0x666dcdabf29514fe
>>
>> umount.nfs4-6172 [002] 84.950409: bl_pr_key_unreg_err: device=sdb key=0x666dcdabf29514fe error=24
>>
>> Christoph points out that:
>>> ... Note that the disk_name isn't really what
>>> we'd want to trace anyway, as it misses the partition information.
>>> The normal way to print the device name is the %pg printk specifier,
>>> but I'm not sure how to correctly use that for tracing which wants
>>> a string in the entry for binary tracing.
>>
>> The trace points copy the pr_info() that they are replacing, and
>> show only the parent device name and not the partition. I'm still
>> looking into how to record both parts of the device name.
>>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>> fs/nfs/blocklayout/dev.c | 30 +++++++-------
>> fs/nfs/nfs4trace.c | 7 ++++
>> fs/nfs/nfs4trace.h | 88 ++++++++++++++++++++++++++++++++++++++++
>> 3 files changed, 111 insertions(+), 14 deletions(-)
>>
>> diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c
>> index 568f685dee4b..6c5d290ca81d 100644
>> --- a/fs/nfs/blocklayout/dev.c
>> +++ b/fs/nfs/blocklayout/dev.c
>> @@ -10,6 +10,7 @@
>> #include <linux/pr.h>
>>
>> #include "blocklayout.h"
>> +#include "../nfs4trace.h"
>>
>> #define NFSDBG_FACILITY NFSDBG_PNFS_LD
>>
>> @@ -26,12 +27,14 @@ bl_free_device(struct pnfs_block_dev *dev)
>> if (test_and_clear_bit(PNFS_BDEV_REGISTERED, &dev->flags)) {
>> struct block_device *bdev = file_bdev(dev->bdev_file);
>> const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
>> - int error;
>> + int status;
>>
>> - error = ops->pr_register(file_bdev(dev->bdev_file),
>> - dev->pr_key, 0, false);
>> - if (error)
>> - pr_err("failed to unregister PR key.\n");
>> + status = ops->pr_register(bdev, dev->pr_key, 0, false);
>> + if (status)
>> + trace_bl_pr_key_unreg_err(bdev, dev->pr_key,
>> + status);
>> + else
>> + trace_bl_pr_key_unreg(bdev, dev->pr_key);
>> }
>>
>> if (dev->bdev_file)
>> @@ -243,10 +246,10 @@ static bool bl_pr_register_scsi(struct pnfs_block_dev *dev)
>>
>> status = ops->pr_register(bdev, 0, dev->pr_key, true);
>> if (status) {
>> - pr_err("pNFS: failed to register key for block device %s.",
>> - bdev->bd_disk->disk_name);
>> + trace_bl_pr_key_reg_err(bdev, dev->pr_key, status);
>> return false;
>> }
>> + trace_bl_pr_key_reg(bdev, dev->pr_key);
>> return true;
>> }
>>
>> @@ -351,8 +354,9 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
>> struct pnfs_block_volume *volumes, int idx, gfp_t gfp_mask)
>> {
>> struct pnfs_block_volume *v = &volumes[idx];
>> - struct file *bdev_file;
>> + struct block_device *bdev;
>> const struct pr_ops *ops;
>> + struct file *bdev_file;
>> int error;
>>
>> if (!bl_validate_designator(v))
>> @@ -373,8 +377,9 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
>> return PTR_ERR(bdev_file);
>> }
>> d->bdev_file = bdev_file;
>> + bdev = file_bdev(bdev_file);
>>
>> - d->len = bdev_nr_bytes(file_bdev(d->bdev_file));
>> + d->len = bdev_nr_bytes(bdev);
>> d->map = bl_map_simple;
>> d->pr_key = v->scsi.pr_key;
>>
>> @@ -383,13 +388,10 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
>> goto out_blkdev_put;
>> }
>>
>> - pr_info("pNFS: using block device %s (reservation key 0x%llx)\n",
>> - file_bdev(d->bdev_file)->bd_disk->disk_name, d->pr_key);
>> -
>> - ops = file_bdev(d->bdev_file)->bd_disk->fops->pr_ops;
>> + ops = bdev->bd_disk->fops->pr_ops;
>> if (!ops) {
>> pr_err("pNFS: block device %s does not support reservations.",
>> - file_bdev(d->bdev_file)->bd_disk->disk_name);
>> + bdev->bd_disk->disk_name);
>> error = -EINVAL;
>> goto out_blkdev_put;
>> }
>> diff --git a/fs/nfs/nfs4trace.c b/fs/nfs/nfs4trace.c
>> index d22c6670f770..389941ccc9c9 100644
>> --- a/fs/nfs/nfs4trace.c
>> +++ b/fs/nfs/nfs4trace.c
>> @@ -2,6 +2,8 @@
>> /*
>> * Copyright (c) 2013 Trond Myklebust <Trond.Myklebust@netapp.com>
>> */
>> +#include <uapi/linux/pr.h>
>> +#include <linux/blkdev.h>
>> #include <linux/nfs_fs.h>
>> #include "nfs4_fs.h"
>> #include "internal.h"
>> @@ -29,5 +31,10 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(ff_layout_read_error);
>> EXPORT_TRACEPOINT_SYMBOL_GPL(ff_layout_write_error);
>> EXPORT_TRACEPOINT_SYMBOL_GPL(ff_layout_commit_error);
>>
>> +EXPORT_TRACEPOINT_SYMBOL_GPL(bl_pr_key_reg);
>> +EXPORT_TRACEPOINT_SYMBOL_GPL(bl_pr_key_reg_err);
>> +EXPORT_TRACEPOINT_SYMBOL_GPL(bl_pr_key_unreg);
>> +EXPORT_TRACEPOINT_SYMBOL_GPL(bl_pr_key_unreg_err);
>> +
>> EXPORT_TRACEPOINT_SYMBOL_GPL(fl_getdevinfo);
>> #endif
>> diff --git a/fs/nfs/nfs4trace.h b/fs/nfs/nfs4trace.h
>> index 4de8780a7c48..f2090a491fcb 100644
>> --- a/fs/nfs/nfs4trace.h
>> +++ b/fs/nfs/nfs4trace.h
>> @@ -2153,6 +2153,94 @@ TRACE_EVENT(ff_layout_commit_error,
>> )
>> );
>>
>> +DECLARE_EVENT_CLASS(pnfs_bl_pr_key_class,
>> + TP_PROTO(
>> + const struct block_device *bdev,
>> + u64 key
>> + ),
>> + TP_ARGS(bdev, key),
>> + TP_STRUCT__entry(
>> + __field(u64, key)
>> + __field(dev_t, dev)
>> + __string(device, bdev->bd_disk->disk_name)
>> + ),
>> + TP_fast_assign(
>> + __entry->key = key;
>> + __entry->dev = bdev->bd_dev;
> ^^^^^^^
> b4 tells me this patch adds trailing whitespace at the line above:
>
> Patch failed at 0004 nfs/blocklayout: SCSI layout trace points for
> reservation key reg/unreg
> /home/anna/Projects/linux-nfs.git/.git/rebase-apply/patch:135:
> trailing whitespace.
> __entry->dev = bdev->bd_dev;
> /home/anna/Projects/linux-nfs.git/.git/rebase-apply/patch:188:
> trailing whitespace.
> __entry->dev = bdev->bd_dev;
> error: patch failed: fs/nfs/blocklayout/dev.c:383
> error: fs/nfs/blocklayout/dev.c: patch does not apply
Fixed in my tree. Will appear in v3 of this series.
>> + __assign_str(device);
>> + ),
>> + TP_printk("dev=%d,%d (%s) key=0x%016llx",
>> + MAJOR(__entry->dev), MINOR(__entry->dev),
>> + __get_str(device), __entry->key
>> + )
>> +);
>> +
>> +#define DEFINE_NFS4_BLOCK_PRKEY_EVENT(name) \
>> + DEFINE_EVENT(pnfs_bl_pr_key_class, name, \
>> + TP_PROTO( \
>> + const struct block_device *bdev, \
>> + u64 key \
>> + ), \
>> + TP_ARGS(bdev, key))
>> +DEFINE_NFS4_BLOCK_PRKEY_EVENT(bl_pr_key_reg);
>> +DEFINE_NFS4_BLOCK_PRKEY_EVENT(bl_pr_key_unreg);
>> +
>> +/*
>> + * From uapi/linux/pr.h
>> + */
>> +TRACE_DEFINE_ENUM(PR_STS_SUCCESS);
>> +TRACE_DEFINE_ENUM(PR_STS_IOERR);
>> +TRACE_DEFINE_ENUM(PR_STS_RESERVATION_CONFLICT);
>> +TRACE_DEFINE_ENUM(PR_STS_RETRY_PATH_FAILURE);
>> +TRACE_DEFINE_ENUM(PR_STS_PATH_FAST_FAILED);
>> +TRACE_DEFINE_ENUM(PR_STS_PATH_FAILED);
>> +
>> +#define show_pr_status(x) \
>> + __print_symbolic(x, \
>> + { PR_STS_SUCCESS, "SUCCESS" }, \
>> + { PR_STS_IOERR, "IOERR" }, \
>> + { PR_STS_RESERVATION_CONFLICT, "RESERVATION_CONFLICT" }, \
>> + { PR_STS_RETRY_PATH_FAILURE, "RETRY_PATH_FAILURE" }, \
>> + { PR_STS_PATH_FAST_FAILED, "PATH_FAST_FAILED" }, \
>> + { PR_STS_PATH_FAILED, "PATH_FAILED" })
>> +
>> +DECLARE_EVENT_CLASS(pnfs_bl_pr_key_err_class,
>> + TP_PROTO(
>> + const struct block_device *bdev,
>> + u64 key,
>> + int status
>> + ),
>> + TP_ARGS(bdev, key, status),
>> + TP_STRUCT__entry(
>> + __field(u64, key)
>> + __field(dev_t, dev)
>> + __field(unsigned long, status)
>> + __string(device, bdev->bd_disk->disk_name)
>> + ),
>> + TP_fast_assign(
>> + __entry->key = key;
>> + __entry->dev = bdev->bd_dev;
> ^^^^^^
> And for this line, too.
>
> Thanks,
> Anna
>
>> + __entry->status = status;
>> + __assign_str(device);
>> + ),
>> + TP_printk("dev=%d,%d (%s) key=0x%016llx status=%s",
>> + MAJOR(__entry->dev), MINOR(__entry->dev),
>> + __get_str(device), __entry->key,
>> + show_pr_status(__entry->status)
>> + )
>> +);
>> +
>> +#define DEFINE_NFS4_BLOCK_PRKEY_ERR_EVENT(name) \
>> + DEFINE_EVENT(pnfs_bl_pr_key_err_class, name, \
>> + TP_PROTO( \
>> + const struct block_device *bdev, \
>> + u64 key, \
>> + int status \
>> + ), \
>> + TP_ARGS(bdev, key, status))
>> +DEFINE_NFS4_BLOCK_PRKEY_ERR_EVENT(bl_pr_key_reg_err);
>> +DEFINE_NFS4_BLOCK_PRKEY_ERR_EVENT(bl_pr_key_unreg_err);
>> +
>> #ifdef CONFIG_NFS_V4_2
>> TRACE_DEFINE_ENUM(NFS4_CONTENT_DATA);
>> TRACE_DEFINE_ENUM(NFS4_CONTENT_HOLE);
>> --
>> 2.45.1
--
Chuck Lever
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 4/4] nfs/blocklayout: SCSI layout trace points for reservation key reg/unreg
2024-06-21 16:22 ` [PATCH v2 4/4] nfs/blocklayout: SCSI layout trace points for reservation key reg/unreg cel
2024-06-21 17:21 ` Anna Schumaker
@ 2024-06-22 5:09 ` Christoph Hellwig
1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2024-06-22 5:09 UTC (permalink / raw)
To: cel; +Cc: linux-nfs, Christoph Hellwig, Chuck Lever
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 0/4] Fixes for pNFS SCSI layout PR key registration
2024-06-21 16:22 [PATCH v2 0/4] Fixes for pNFS SCSI layout PR key registration cel
` (3 preceding siblings ...)
2024-06-21 16:22 ` [PATCH v2 4/4] nfs/blocklayout: SCSI layout trace points for reservation key reg/unreg cel
@ 2024-06-21 18:03 ` Benjamin Coddington
4 siblings, 0 replies; 15+ messages in thread
From: Benjamin Coddington @ 2024-06-21 18:03 UTC (permalink / raw)
To: cel; +Cc: linux-nfs, Christoph Hellwig, Chuck Lever
On 21 Jun 2024, at 12:22, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> The double registration/unregistration I observed was actually the
> registration and unregistration of two separate block devices: one
> for /media/test and one for /media/scratch. So, that was a false
> alarm.
>
> The complete fstests run shows:
>
> Failures: generic/126 generic/355 generic/450 generic/740
>
> unknown: run fstests generic/108 at 2024-06-21 10:13:58
> systemd[1]: Started fstests-generic-108.scope - /usr/bin/bash -c test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/generic/108.
> kernel: sd 6:0:0:1: reservation conflict
> kernel: sd 6:0:0:1: [sdb] tag#30 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
> kernel: sd 6:0:0:1: [sdb] tag#30 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
> kernel: reservation conflict error, dev sdb, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 2
> systemd[1]: fstests-generic-108.scope: Deactivated successfully.
>
> These errors appear in the system journal only when the whole
> fstests series is run. I can see the "block_rq_complete [-52]" in
> the trace log. But the test output shows:
>
> generic/108 [not run] require cel-nfsd:/export/nfs-pnfs-fs-s to be valid block disk
>
> generic/450 is also failing:
>
> generic/450 - output mismatch (see /data/fstests-install/xfstests/results/cel-nfs-pnfs/6.10.0-rc4-gd24c98202dbe/nfs_pnfs/generic/450.out.bad)
> --- tests/generic/450.out 2024-06-20 16:50:06.548035014 -0400
> +++ /data/fstests-install/xfstests/results/cel-nfs-pnfs/6.10.0-rc4-gd24c98202dbe/nfs_pnfs/generic/450.out.bad 2024-06-21 10:44:02.600634341 -0400
> @@ -8,4 +8,6 @@
> direct read the second block contains EOF
> direct read a sector at (after) EOF
> direct read the last sector past EOF
> +expect [2093056,4096,0], got [2093056,4096,4096]
> direct read at far away from EOF
> +expect [104857600,4096,0], got [104857600,4096,4096]
> ...
>
> However this might be a bug that existed before this series.
>
> The other three explicit test failures are usual for NFSv4.1.
>
> ---
> Changes since RFC:
> - series re-ordered to place fixes first
> - address review comments as best I can
Looks good, I like the bitops over the bool for pr_registered.
For the series:
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Ben
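The preference for bitops over a bool comes down to the test and the clear happening as one atomic step, so the unregistration path runs at most once per registration. Below is a minimal userspace sketch of that idea using C11 atomics rather than the kernel's `test_and_clear_bit()`. All `demo_` names are hypothetical, and the counter stands in for the call to `ops->pr_register()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BDEV_REGISTERED	(1UL << 0)	/* hypothetical flag bit */

struct demo_dev {
	atomic_ulong	flags;
	int		unregister_calls;	/* stands in for the PR unregister */
};

/* Atomically clear the masked bit and report whether it was set before. */
static bool demo_test_and_clear(atomic_ulong *flags, unsigned long mask)
{
	return (atomic_fetch_and(flags, ~mask) & mask) != 0;
}

/* Unregister only if this caller is the one that cleared the flag. */
static void demo_free_device(struct demo_dev *dev)
{
	assert(dev != NULL);
	if (demo_test_and_clear(&dev->flags, DEMO_BDEV_REGISTERED))
		dev->unregister_calls++;
}
```

Calling `demo_free_device()` twice on a registered device bumps the counter only once, which is the property a separate "check the bool, then clear it" sequence cannot guarantee when two callers race.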
^ permalink raw reply [flat|nested] 15+ messages in thread