public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] two fixes for pNFS SCSI device handling
@ 2024-11-18 22:40 Benjamin Coddington
  2024-11-18 22:40 ` [PATCH 1/2] nfs/blocklayout: Don't attempt unregister for invalid block device Benjamin Coddington
  2024-11-18 22:40 ` [PATCH 2/2] nfs/blocklayout: Limit repeat device registration on failure Benjamin Coddington
  0 siblings, 2 replies; 6+ messages in thread
From: Benjamin Coddington @ 2024-11-18 22:40 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever; +Cc: linux-nfs, linux-kernel

A bit late for v6.13 perhaps, but here are two fresh corrections for pNFS
SCSI device handling, and some comments as requeted by Christoph.

Benjamin Coddington (2):
  nfs/blocklayout: Don't attempt unregister for invalid block device
  nfs/blocklayout: Limit repeat device registration on failure

 fs/nfs/blocklayout/blocklayout.c | 12 +++++++++++-
 fs/nfs/blocklayout/dev.c         |  7 +++++--
 2 files changed, 16 insertions(+), 3 deletions(-)


base-commit: adc218676eef25575469234709c2d87185ca223a
-- 
2.47.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] nfs/blocklayout: Don't attempt unregister for invalid block device
  2024-11-18 22:40 [PATCH 0/2] two fixes for pNFS SCSI device handling Benjamin Coddington
@ 2024-11-18 22:40 ` Benjamin Coddington
  2024-11-19 12:42   ` Christoph Hellwig
  2024-11-18 22:40 ` [PATCH 2/2] nfs/blocklayout: Limit repeat device registration on failure Benjamin Coddington
  1 sibling, 1 reply; 6+ messages in thread
From: Benjamin Coddington @ 2024-11-18 22:40 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever; +Cc: linux-nfs, linux-kernel

Since commit d869da91cccb, an unmount of a pNFS SCSI layout-enabled NFS
will dereference a NULL block_device in:

  bl_unregister_scsi+0x16/0xe0 [blocklayoutdriver]
  bl_free_device+0x70/0x80 [blocklayoutdriver]
  bl_free_deviceid_node+0x12/0x30 [blocklayoutdriver]
  nfs4_put_deviceid_node+0x60/0xc0 [nfsv4]
  nfs4_deviceid_purge_client+0x132/0x190 [nfsv4]
  unset_pnfs_layoutdriver+0x59/0x60 [nfsv4]
  nfs4_destroy_server+0x36/0x70 [nfsv4]
  nfs_free_server+0x23/0xe0 [nfs]
  deactivate_locked_super+0x30/0xb0
  cleanup_mnt+0xba/0x150
  task_work_run+0x59/0x90
  syscall_exit_to_user_mode+0x217/0x220
  do_syscall_64+0x8e/0x160

This happens because even though we were able to create the
nfs4_deviceid_node, the lookup for the device was unable to attach the
block device to the pnfs_block_dev.

If we never found a block device to register, we can avoid this case with
the PNFS_BDEV_REGISTERED flag.  Move the deref behind the test for the
flag.

Fixes: d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
---
 fs/nfs/blocklayout/dev.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c
index 6252f4447945..7ae79814f4ff 100644
--- a/fs/nfs/blocklayout/dev.c
+++ b/fs/nfs/blocklayout/dev.c
@@ -16,13 +16,16 @@
 
 static void bl_unregister_scsi(struct pnfs_block_dev *dev)
 {
-	struct block_device *bdev = file_bdev(dev->bdev_file);
-	const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
+	struct block_device *bdev;
+	const struct pr_ops *ops;
 	int status;
 
 	if (!test_and_clear_bit(PNFS_BDEV_REGISTERED, &dev->flags))
 		return;
 
+	bdev = file_bdev(dev->bdev_file);
+	ops = bdev->bd_disk->fops->pr_ops;
+
 	status = ops->pr_register(bdev, dev->pr_key, 0, false);
 	if (status)
 		trace_bl_pr_key_unreg_err(bdev, dev->pr_key, status);
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] nfs/blocklayout: Limit repeat device registration on failure
  2024-11-18 22:40 [PATCH 0/2] two fixes for pNFS SCSI device handling Benjamin Coddington
  2024-11-18 22:40 ` [PATCH 1/2] nfs/blocklayout: Don't attempt unregister for invalid block device Benjamin Coddington
@ 2024-11-18 22:40 ` Benjamin Coddington
  2024-11-19 12:42   ` Christoph Hellwig
  1 sibling, 1 reply; 6+ messages in thread
From: Benjamin Coddington @ 2024-11-18 22:40 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever; +Cc: linux-nfs, linux-kernel

If we're unable to register a SCSI device, ensure we mark the device as
unavailable so that it will timeout and be re-added via GETDEVINFO.  This
avoids repeated doomed attempts to register a device in the IO path.

Add some clarifying comments as well.

Fixes: d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
---
 fs/nfs/blocklayout/blocklayout.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 0becdec12970..b36bc2f4f7e2 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -571,19 +571,29 @@ bl_find_get_deviceid(struct nfs_server *server,
 	if (!node)
 		return ERR_PTR(-ENODEV);
 
+	/*
+	 * Devices that are marked unavailable are left in the cache with a
+	 * timeout to avoid sending GETDEVINFO after every LAYOUTGET, or
+	 * constantly attempting to register the device.  Once marked as
+	 * unavailable they must be deleted and never reused.
+	 */
 	if (test_bit(NFS_DEVICEID_UNAVAILABLE, &node->flags)) {
 		unsigned long end = jiffies;
 		unsigned long start = end - PNFS_DEVICE_RETRY_TIMEOUT;
 
 		if (!time_in_range(node->timestamp_unavailable, start, end)) {
+			/* Force a new GETDEVINFO for this LAYOUT */
 			nfs4_delete_deviceid(node->ld, node->nfs_client, id);
 			goto retry;
 		}
 		goto out_put;
 	}
 
-	if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node)))
+	/* If we cannot register, treat this device as transient */
+	if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node))) {
+		nfs4_mark_deviceid_unavailable(node);
 		goto out_put;
+	}
 
 	return node;
 
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] nfs/blocklayout: Don't attempt unregister for invalid block device
  2024-11-18 22:40 ` [PATCH 1/2] nfs/blocklayout: Don't attempt unregister for invalid block device Benjamin Coddington
@ 2024-11-19 12:42   ` Christoph Hellwig
  2024-11-19 13:46     ` Benjamin Coddington
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2024-11-19 12:42 UTC (permalink / raw)
  To: Benjamin Coddington
  Cc: Trond Myklebust, Anna Schumaker, Chuck Lever, linux-nfs,
	linux-kernel

On Mon, Nov 18, 2024 at 05:40:40PM -0500, Benjamin Coddington wrote:
> Since commit d869da91cccb, an unmount of a pNFS SCSI layout-enabled NFS

Please also spell out the commit subject in the commit log body, similar
to to the Fixes tag.

> index 6252f4447945..7ae79814f4ff 100644
> --- a/fs/nfs/blocklayout/dev.c
> +++ b/fs/nfs/blocklayout/dev.c
> @@ -16,13 +16,16 @@
>  
>  static void bl_unregister_scsi(struct pnfs_block_dev *dev)
>  {
> -	struct block_device *bdev = file_bdev(dev->bdev_file);
> -	const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
> +	struct block_device *bdev;
> +	const struct pr_ops *ops;
>  	int status;
>  
>  	if (!test_and_clear_bit(PNFS_BDEV_REGISTERED, &dev->flags))
>  		return;
>  
> +	bdev = file_bdev(dev->bdev_file);
> +	ops = bdev->bd_disk->fops->pr_ops;
> +

Hmm.  Just moving the test_and_clear_bit to the caller would
feel cleaner than this to me.

But either way the change looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] nfs/blocklayout: Limit repeat device registration on failure
  2024-11-18 22:40 ` [PATCH 2/2] nfs/blocklayout: Limit repeat device registration on failure Benjamin Coddington
@ 2024-11-19 12:42   ` Christoph Hellwig
  0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2024-11-19 12:42 UTC (permalink / raw)
  To: Benjamin Coddington
  Cc: Trond Myklebust, Anna Schumaker, Chuck Lever, linux-nfs,
	linux-kernel

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] nfs/blocklayout: Don't attempt unregister for invalid block device
  2024-11-19 12:42   ` Christoph Hellwig
@ 2024-11-19 13:46     ` Benjamin Coddington
  0 siblings, 0 replies; 6+ messages in thread
From: Benjamin Coddington @ 2024-11-19 13:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Trond Myklebust, Anna Schumaker, Chuck Lever, linux-nfs,
	linux-kernel

On 19 Nov 2024, at 7:42, Christoph Hellwig wrote:

> On Mon, Nov 18, 2024 at 05:40:40PM -0500, Benjamin Coddington wrote:
>> Since commit d869da91cccb, an unmount of a pNFS SCSI layout-enabled NFS
>
> Please also spell out the commit subject in the commit log body, similar
> to to the Fixes tag.

Will do.

>> index 6252f4447945..7ae79814f4ff 100644
>> --- a/fs/nfs/blocklayout/dev.c
>> +++ b/fs/nfs/blocklayout/dev.c
>> @@ -16,13 +16,16 @@
>>
>>  static void bl_unregister_scsi(struct pnfs_block_dev *dev)
>>  {
>> -	struct block_device *bdev = file_bdev(dev->bdev_file);
>> -	const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
>> +	struct block_device *bdev;
>> +	const struct pr_ops *ops;
>>  	int status;
>>
>>  	if (!test_and_clear_bit(PNFS_BDEV_REGISTERED, &dev->flags))
>>  		return;
>>
>> +	bdev = file_bdev(dev->bdev_file);
>> +	ops = bdev->bd_disk->fops->pr_ops;
>> +
>
> Hmm.  Just moving the test_and_clear_bit to the caller would
> feel cleaner than this to me.

We can do this too - I'll send another version.  I didn't go this way
because I felt the bit test and clear was an important part of the function
that could be lost if we ever had another caller.

> But either way the change looks good:
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks for the review.

Ben


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-11-19 13:47 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-18 22:40 [PATCH 0/2] two fixes for pNFS SCSI device handling Benjamin Coddington
2024-11-18 22:40 ` [PATCH 1/2] nfs/blocklayout: Don't attempt unregister for invalid block device Benjamin Coddington
2024-11-19 12:42   ` Christoph Hellwig
2024-11-19 13:46     ` Benjamin Coddington
2024-11-18 22:40 ` [PATCH 2/2] nfs/blocklayout: Limit repeat device registration on failure Benjamin Coddington
2024-11-19 12:42   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox