* [PATCH v2 0/2] two fixes for pNFS SCSI device handling @ 2024-11-20 14:09 Benjamin Coddington 2024-11-20 14:09 ` [PATCH 1/2] nfs/blocklayout: Don't attempt unregister for invalid block device Benjamin Coddington 2024-11-20 14:09 ` [PATCH 2/2] nfs/blocklayout: Limit repeat device registration on failure Benjamin Coddington 0 siblings, 2 replies; 7+ messages in thread From: Benjamin Coddington @ 2024-11-20 14:09 UTC (permalink / raw) To: Trond Myklebust, Anna Schumaker, Chuck Lever Cc: linux-nfs, linux-kernel, Christoph Hellwig A bit late for v6.13 perhaps, but here are two fresh corrections for pNFS SCSI device handling, and some comments as requested by Christoph. On v2: add full commit subject in 1/2, change the caller in 2/2. Benjamin Coddington (2): nfs/blocklayout: Don't attempt unregister for invalid block device nfs/blocklayout: Limit repeat device registration on failure fs/nfs/blocklayout/blocklayout.c | 12 +++++++++++- fs/nfs/blocklayout/dev.c | 6 ++---- 2 files changed, 13 insertions(+), 5 deletions(-) base-commit: adc218676eef25575469234709c2d87185ca223a -- 2.47.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] nfs/blocklayout: Don't attempt unregister for invalid block device 2024-11-20 14:09 [PATCH v2 0/2] two fixes for pNFS SCSI device handling Benjamin Coddington @ 2024-11-20 14:09 ` Benjamin Coddington 2024-11-20 15:07 ` Chuck Lever 2024-11-20 14:09 ` [PATCH 2/2] nfs/blocklayout: Limit repeat device registration on failure Benjamin Coddington 1 sibling, 1 reply; 7+ messages in thread From: Benjamin Coddington @ 2024-11-20 14:09 UTC (permalink / raw) To: Trond Myklebust, Anna Schumaker, Chuck Lever Cc: linux-nfs, linux-kernel, Christoph Hellwig Since commit d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration") an unmount of a pNFS SCSI layout-enabled NFS may dereference a NULL block_device in: bl_unregister_scsi+0x16/0xe0 [blocklayoutdriver] bl_free_device+0x70/0x80 [blocklayoutdriver] bl_free_deviceid_node+0x12/0x30 [blocklayoutdriver] nfs4_put_deviceid_node+0x60/0xc0 [nfsv4] nfs4_deviceid_purge_client+0x132/0x190 [nfsv4] unset_pnfs_layoutdriver+0x59/0x60 [nfsv4] nfs4_destroy_server+0x36/0x70 [nfsv4] nfs_free_server+0x23/0xe0 [nfs] deactivate_locked_super+0x30/0xb0 cleanup_mnt+0xba/0x150 task_work_run+0x59/0x90 syscall_exit_to_user_mode+0x217/0x220 do_syscall_64+0x8e/0x160 This happens because even though we were able to create the nfs4_deviceid_node, the lookup for the device was unable to attach the block device to the pnfs_block_dev. If we never found a block device to register, we can avoid this case with the PNFS_BDEV_REGISTERED flag. Move the deref behind the test for the flag. Fixes: d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> --- fs/nfs/blocklayout/dev.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c index 6252f4447945..cab8809f0e0f 100644 --- a/fs/nfs/blocklayout/dev.c +++ b/fs/nfs/blocklayout/dev.c @@ -20,9 +20,6 @@ static void bl_unregister_scsi(struct pnfs_block_dev *dev) const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops; int status; - if (!test_and_clear_bit(PNFS_BDEV_REGISTERED, &dev->flags)) - return; - status = ops->pr_register(bdev, dev->pr_key, 0, false); if (status) trace_bl_pr_key_unreg_err(bdev, dev->pr_key, status); @@ -58,7 +55,8 @@ static void bl_unregister_dev(struct pnfs_block_dev *dev) return; } - if (dev->type == PNFS_BLOCK_VOLUME_SCSI) + if (dev->type == PNFS_BLOCK_VOLUME_SCSI && + test_and_clear_bit(PNFS_BDEV_REGISTERED, &dev->flags)) bl_unregister_scsi(dev); } -- 2.47.0 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] nfs/blocklayout: Don't attempt unregister for invalid block device 2024-11-20 14:09 ` [PATCH 1/2] nfs/blocklayout: Don't attempt unregister for invalid block device Benjamin Coddington @ 2024-11-20 15:07 ` Chuck Lever 0 siblings, 0 replies; 7+ messages in thread From: Chuck Lever @ 2024-11-20 15:07 UTC (permalink / raw) To: Benjamin Coddington Cc: Trond Myklebust, Anna Schumaker, linux-nfs, linux-kernel, Christoph Hellwig On Wed, Nov 20, 2024 at 09:09:34AM -0500, Benjamin Coddington wrote: > Since commit d869da91cccb ("nfs/blocklayout: Fix premature PR key > unregistration") an unmount of a pNFS SCSI layout-enabled NFS may > dereference a NULL block_device in: > > bl_unregister_scsi+0x16/0xe0 [blocklayoutdriver] > bl_free_device+0x70/0x80 [blocklayoutdriver] > bl_free_deviceid_node+0x12/0x30 [blocklayoutdriver] > nfs4_put_deviceid_node+0x60/0xc0 [nfsv4] > nfs4_deviceid_purge_client+0x132/0x190 [nfsv4] > unset_pnfs_layoutdriver+0x59/0x60 [nfsv4] > nfs4_destroy_server+0x36/0x70 [nfsv4] > nfs_free_server+0x23/0xe0 [nfs] > deactivate_locked_super+0x30/0xb0 > cleanup_mnt+0xba/0x150 > task_work_run+0x59/0x90 > syscall_exit_to_user_mode+0x217/0x220 > do_syscall_64+0x8e/0x160 > > This happens because even though we were able to create the > nfs4_deviceid_node, the lookup for the device was unable to attach the > block device to the pnfs_block_dev. > > If we never found a block device to register, we can avoid this case with > the PNFS_BDEV_REGISTERED flag. Move the deref behind the test for the > flag. > > Fixes: d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration") > Signed-off-by: Benjamin Coddington <bcodding@redhat.com> > Reviewed-by: Christoph Hellwig <hch@lst.de> > --- > fs/nfs/blocklayout/dev.c | 6 ++---- > 1 file changed, 2 insertions(+), 4 deletions(-) > > diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c > index 6252f4447945..cab8809f0e0f 100644 > --- a/fs/nfs/blocklayout/dev.c > +++ b/fs/nfs/blocklayout/dev.c > @@ -20,9 +20,6 @@ static void bl_unregister_scsi(struct pnfs_block_dev *dev) > const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops; > int status; > > - if (!test_and_clear_bit(PNFS_BDEV_REGISTERED, &dev->flags)) > - return; > - > status = ops->pr_register(bdev, dev->pr_key, 0, false); > if (status) > trace_bl_pr_key_unreg_err(bdev, dev->pr_key, status); > @@ -58,7 +55,8 @@ static void bl_unregister_dev(struct pnfs_block_dev *dev) > return; > } > > - if (dev->type == PNFS_BLOCK_VOLUME_SCSI) > + if (dev->type == PNFS_BLOCK_VOLUME_SCSI && > + test_and_clear_bit(PNFS_BDEV_REGISTERED, &dev->flags)) > bl_unregister_scsi(dev); > } > > -- > 2.47.0 > Reviewed-by: Chuck Lever <chuck.lever@oracle.com> -- Chuck Lever ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/2] nfs/blocklayout: Limit repeat device registration on failure 2024-11-20 14:09 [PATCH v2 0/2] two fixes for pNFS SCSI device handling Benjamin Coddington 2024-11-20 14:09 ` [PATCH 1/2] nfs/blocklayout: Don't attempt unregister for invalid block device Benjamin Coddington @ 2024-11-20 14:09 ` Benjamin Coddington 2024-11-20 15:22 ` Chuck Lever 1 sibling, 1 reply; 7+ messages in thread From: Benjamin Coddington @ 2024-11-20 14:09 UTC (permalink / raw) To: Trond Myklebust, Anna Schumaker, Chuck Lever Cc: linux-nfs, linux-kernel, Christoph Hellwig If we're unable to register a SCSI device, ensure we mark the device as unavailable so that it will timeout and be re-added via GETDEVINFO. This avoids repeated doomed attempts to register a device in the IO path. Add some clarifying comments as well. Fixes: d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> --- fs/nfs/blocklayout/blocklayout.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c index 0becdec12970..b36bc2f4f7e2 100644 --- a/fs/nfs/blocklayout/blocklayout.c +++ b/fs/nfs/blocklayout/blocklayout.c @@ -571,19 +571,29 @@ bl_find_get_deviceid(struct nfs_server *server, if (!node) return ERR_PTR(-ENODEV); + /* + * Devices that are marked unavailable are left in the cache with a + * timeout to avoid sending GETDEVINFO after every LAYOUTGET, or + * constantly attempting to register the device. Once marked as + * unavailable they must be deleted and never reused. + */ if (test_bit(NFS_DEVICEID_UNAVAILABLE, &node->flags)) { unsigned long end = jiffies; unsigned long start = end - PNFS_DEVICE_RETRY_TIMEOUT; if (!time_in_range(node->timestamp_unavailable, start, end)) { + /* Force a new GETDEVINFO for this LAYOUT */ nfs4_delete_deviceid(node->ld, node->nfs_client, id); goto retry; } goto out_put; } - if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node))) + /* If we cannot register, treat this device as transient */ + if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node))) { + nfs4_mark_deviceid_unavailable(node); goto out_put; + } return node; -- 2.47.0 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] nfs/blocklayout: Limit repeat device registration on failure 2024-11-20 14:09 ` [PATCH 2/2] nfs/blocklayout: Limit repeat device registration on failure Benjamin Coddington @ 2024-11-20 15:22 ` Chuck Lever 2024-11-21 15:11 ` Benjamin Coddington 0 siblings, 1 reply; 7+ messages in thread From: Chuck Lever @ 2024-11-20 15:22 UTC (permalink / raw) To: Benjamin Coddington Cc: Trond Myklebust, Anna Schumaker, linux-nfs, linux-kernel, Christoph Hellwig On Wed, Nov 20, 2024 at 09:09:35AM -0500, Benjamin Coddington wrote: > If we're unable to register a SCSI device, ensure we mark the device as > unavailable so that it will timeout and be re-added via GETDEVINFO. This > avoids repeated doomed attempts to register a device in the IO path. > > Add some clarifying comments as well. > > Fixes: d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration") > Signed-off-by: Benjamin Coddington <bcodding@redhat.com> > Reviewed-by: Christoph Hellwig <hch@lst.de> > --- > fs/nfs/blocklayout/blocklayout.c | 12 +++++++++++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c > index 0becdec12970..b36bc2f4f7e2 100644 > --- a/fs/nfs/blocklayout/blocklayout.c > +++ b/fs/nfs/blocklayout/blocklayout.c > @@ -571,19 +571,29 @@ bl_find_get_deviceid(struct nfs_server *server, > if (!node) > return ERR_PTR(-ENODEV); > > + /* > + * Devices that are marked unavailable are left in the cache with a > + * timeout to avoid sending GETDEVINFO after every LAYOUTGET, or > + * constantly attempting to register the device. Once marked as > + * unavailable they must be deleted and never reused. > + */ > if (test_bit(NFS_DEVICEID_UNAVAILABLE, &node->flags)) { > unsigned long end = jiffies; > unsigned long start = end - PNFS_DEVICE_RETRY_TIMEOUT; > > if (!time_in_range(node->timestamp_unavailable, start, end)) { > + /* Force a new GETDEVINFO for this LAYOUT */ Or perhaps: "Uncork subsequent GETDEVINFO operations for this device" <shrug> > nfs4_delete_deviceid(node->ld, node->nfs_client, id); > goto retry; > } > goto out_put; > } > > - if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node))) > + /* If we cannot register, treat this device as transient */ How about "Make a negative cache entry for this device" > + if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node))) { > + nfs4_mark_deviceid_unavailable(node); > goto out_put; > + } > > return node; > > -- > 2.47.0 > It took me a bit to understand what this patch does. It is like setting up a negative dentry so the local device cache absorbs bursts of checks for the device. OK. Just an observation: Negative caching has some consequences too. For instance, there will now be a period where, if the device happens to become available, the layout is still unusable. I wonder if that's going to have some undesirable operational effects. -- Chuck Lever ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] nfs/blocklayout: Limit repeat device registration on failure 2024-11-20 15:22 ` Chuck Lever @ 2024-11-21 15:11 ` Benjamin Coddington 2024-11-21 15:32 ` Chuck Lever 0 siblings, 1 reply; 7+ messages in thread From: Benjamin Coddington @ 2024-11-21 15:11 UTC (permalink / raw) To: Chuck Lever Cc: Trond Myklebust, Anna Schumaker, linux-nfs, linux-kernel, Christoph Hellwig On 20 Nov 2024, at 10:22, Chuck Lever wrote: > On Wed, Nov 20, 2024 at 09:09:35AM -0500, Benjamin Coddington wrote: >> If we're unable to register a SCSI device, ensure we mark the device as >> unavailable so that it will timeout and be re-added via GETDEVINFO. This >> avoids repeated doomed attempts to register a device in the IO path. >> >> Add some clarifying comments as well. >> >> Fixes: d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration") >> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> >> Reviewed-by: Christoph Hellwig <hch@lst.de> >> --- >> fs/nfs/blocklayout/blocklayout.c | 12 +++++++++++- >> 1 file changed, 11 insertions(+), 1 deletion(-) >> >> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c >> index 0becdec12970..b36bc2f4f7e2 100644 >> --- a/fs/nfs/blocklayout/blocklayout.c >> +++ b/fs/nfs/blocklayout/blocklayout.c >> @@ -571,19 +571,29 @@ bl_find_get_deviceid(struct nfs_server *server, >> if (!node) >> return ERR_PTR(-ENODEV); >> >> + /* >> + * Devices that are marked unavailable are left in the cache with a >> + * timeout to avoid sending GETDEVINFO after every LAYOUTGET, or >> + * constantly attempting to register the device. Once marked as >> + * unavailable they must be deleted and never reused. >> + */ >> if (test_bit(NFS_DEVICEID_UNAVAILABLE, &node->flags)) { >> unsigned long end = jiffies; >> unsigned long start = end - PNFS_DEVICE_RETRY_TIMEOUT; >> >> if (!time_in_range(node->timestamp_unavailable, start, end)) { >> + /* Force a new GETDEVINFO for this LAYOUT */ > > Or perhaps: "Uncork subsequent GETDEVINFO operations for this device" > <shrug> Sure, ok! >> nfs4_delete_deviceid(node->ld, node->nfs_client, id); >> goto retry; >> } >> goto out_put; >> } >> >> - if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node))) >> + /* If we cannot register, treat this device as transient */ > > How about "Make a negative cache entry for this device" Hmm - that's closer to the dentry language rather than how we refer to temporary error cases in device land. For me the "transient" has some hopeful meaning as in we expect this might work in the future - but I'm ok changing this comment. There will be some NFS clients that might try to do pNFS SCSI but will never actually have the devices locally, and so that's not a "transient" situation. This can only fixed today with export policy. > >> + if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node))) { >> + nfs4_mark_deviceid_unavailable(node); >> goto out_put; >> + } >> >> return node; >> >> -- >> 2.47.0 >> > > It took me a bit to understand what this patch does. It is like > setting up a negative dentry so the local device cache absorbs > bursts of checks for the device. OK. Yes, its like the layout error handling, but for devices. Its not obvious at this layer, but every IO wants to do LAYOUTGET, then figure out which device GETDEVINFO, then here we need to prep the device with a reservation. Its a lot of slow work that makes a mess of IO latencies if one of the later steps is going to fail for awhile. > Just an observation: Negative caching has some consequences too. > For instance, there will now be a period where, if the device > happens to become available, the layout is still unusable. I wonder > if that's going to have some undesirable operational effects. It sure does, but I don't think there's a simple way to get notified that a SCSI device has re-appeared or has started supporting persistent reservations. Ben ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] nfs/blocklayout: Limit repeat device registration on failure 2024-11-21 15:11 ` Benjamin Coddington @ 2024-11-21 15:32 ` Chuck Lever 0 siblings, 0 replies; 7+ messages in thread From: Chuck Lever @ 2024-11-21 15:32 UTC (permalink / raw) To: Benjamin Coddington Cc: Trond Myklebust, Anna Schumaker, linux-nfs, linux-kernel, Christoph Hellwig On Thu, Nov 21, 2024 at 10:11:33AM -0500, Benjamin Coddington wrote: > On 20 Nov 2024, at 10:22, Chuck Lever wrote: > > > On Wed, Nov 20, 2024 at 09:09:35AM -0500, Benjamin Coddington wrote: > >> If we're unable to register a SCSI device, ensure we mark the device as > >> unavailable so that it will timeout and be re-added via GETDEVINFO. This > >> avoids repeated doomed attempts to register a device in the IO path. > >> > >> Add some clarifying comments as well. > >> > >> Fixes: d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration") > >> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> > >> Reviewed-by: Christoph Hellwig <hch@lst.de> > >> --- > >> fs/nfs/blocklayout/blocklayout.c | 12 +++++++++++- > >> 1 file changed, 11 insertions(+), 1 deletion(-) > >> > >> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c > >> index 0becdec12970..b36bc2f4f7e2 100644 > >> --- a/fs/nfs/blocklayout/blocklayout.c > >> +++ b/fs/nfs/blocklayout/blocklayout.c > >> @@ -571,19 +571,29 @@ bl_find_get_deviceid(struct nfs_server *server, > >> if (!node) > >> return ERR_PTR(-ENODEV); > >> > >> + /* > >> + * Devices that are marked unavailable are left in the cache with a > >> + * timeout to avoid sending GETDEVINFO after every LAYOUTGET, or > >> + * constantly attempting to register the device. Once marked as > >> + * unavailable they must be deleted and never reused. > >> + */ > >> if (test_bit(NFS_DEVICEID_UNAVAILABLE, &node->flags)) { > >> unsigned long end = jiffies; > >> unsigned long start = end - PNFS_DEVICE_RETRY_TIMEOUT; > >> > >> if (!time_in_range(node->timestamp_unavailable, start, end)) { > >> + /* Force a new GETDEVINFO for this LAYOUT */ > > > > Or perhaps: "Uncork subsequent GETDEVINFO operations for this device" > > <shrug> > > Sure, ok! > > >> nfs4_delete_deviceid(node->ld, node->nfs_client, id); > >> goto retry; > >> } > >> goto out_put; > >> } > >> > >> - if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node))) > >> + /* If we cannot register, treat this device as transient */ > > > > How about "Make a negative cache entry for this device" > > Hmm - that's closer to the dentry language rather than how we refer to > temporary error cases in device land. For me the "transient" has some > hopeful meaning as in we expect this might work in the future - but I'm ok > changing this comment. There will be some NFS clients that might try to do > pNFS SCSI but will never actually have the devices locally, and so that's > not a "transient" situation. This can only fixed today with export policy. No big deal! > >> + if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node))) { > >> + nfs4_mark_deviceid_unavailable(node); > >> goto out_put; > >> + } > >> > >> return node; > >> > >> -- > >> 2.47.0 > >> > > > > It took me a bit to understand what this patch does. It is like > > setting up a negative dentry so the local device cache absorbs > > bursts of checks for the device. OK. > > Yes, its like the layout error handling, but for devices. > > Its not obvious at this layer, but every IO wants to do LAYOUTGET, then > figure out which device GETDEVINFO, then here we need to prep the device > with a reservation. Its a lot of slow work that makes a mess of IO > latencies if one of the later steps is going to fail for awhile. Thanks! It would be nice to add this bit of context to the patch description. > > Just an observation: Negative caching has some consequences too. > > For instance, there will now be a period where, if the device > > happens to become available, the layout is still unusable. I wonder > > if that's going to have some undesirable operational effects. > > It sure does, but I don't think there's a simple way to get notified that a > SCSI device has re-appeared or has started supporting persistent > reservations. Reviewed-by: Chuck Lever <chuck.lever@oracle.com> -- Chuck Lever ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-11-21 15:32 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-11-20 14:09 [PATCH v2 0/2] two fixes for pNFS SCSI device handling Benjamin Coddington 2024-11-20 14:09 ` [PATCH 1/2] nfs/blocklayout: Don't attempt unregister for invalid block device Benjamin Coddington 2024-11-20 15:07 ` Chuck Lever 2024-11-20 14:09 ` [PATCH 2/2] nfs/blocklayout: Limit repeat device registration on failure Benjamin Coddington 2024-11-20 15:22 ` Chuck Lever 2024-11-21 15:11 ` Benjamin Coddington 2024-11-21 15:32 ` Chuck Lever
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox