public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Chuck Lever <chuck.lever@oracle.com>
To: Benjamin Coddington <bcodding@redhat.com>
Cc: cel@kernel.org, linux-nfs@vger.kernel.org,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [RFC PATCH 3/4] nfs/blocklayout: Fix premature PR key unregistration
Date: Thu, 20 Jun 2024 13:08:38 -0400	[thread overview]
Message-ID: <ZnRiFp7sSc1S2wp/@tissot.1015granger.net> (raw)
In-Reply-To: <3590D3CB-E9C3-4C78-8077-7458F9E6C966@redhat.com>

On Thu, Jun 20, 2024 at 12:45:56PM -0400, Benjamin Coddington wrote:
> On 20 Jun 2024, at 11:56, Benjamin Coddington wrote:
> 
> > On 20 Jun 2024, at 11:46, Chuck Lever wrote:
> >
> >> On Thu, Jun 20, 2024 at 11:30:54AM -0400, Benjamin Coddington wrote:
> >>> On 20 Jun 2024, at 10:34, Chuck Lever wrote:
> >>>
> >>>> On Thu, Jun 20, 2024 at 09:51:46AM -0400, Benjamin Coddington wrote:
> >>>>> On 19 Jun 2024, at 13:39, cel@kernel.org wrote:
> >>>>>
> >>>>>> From: Chuck Lever <chuck.lever@oracle.com>
> >>>>>>
> >>>>>> During generic/069 runs with pNFS SCSI layouts, the NFS client emits
> >>>>>> the following in the system journal:
> >>>>>>
> >>>>>> kernel: pNFS: failed to open device /dev/disk/by-id/dm-uuid-mpath-0x6001405e3366f045b7949eb8e4540b51 (-2)
> >>>>>> kernel: pNFS: using block device sdb (reservation key 0x666b60901e7b26b3)
> >>>>>> kernel: pNFS: failed to open device /dev/disk/by-id/dm-uuid-mpath-0x6001405e3366f045b7949eb8e4540b51 (-2)
> >>>>>> kernel: pNFS: using block device sdb (reservation key 0x666b60901e7b26b3)
> >>>>>> kernel: sd 6:0:0:1: reservation conflict
> >>>>>> kernel: sd 6:0:0:1: [sdb] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
> >>>>>> kernel: sd 6:0:0:1: [sdb] tag#16 CDB: Write(10) 2a 00 00 00 00 50 00 00 08 00
> >>>>>> kernel: reservation conflict error, dev sdb, sector 80 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 2
> >>>>>> kernel: sd 6:0:0:1: reservation conflict
> >>>>>> kernel: sd 6:0:0:1: reservation conflict
> >>>>>> kernel: sd 6:0:0:1: [sdb] tag#18 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
> >>>>>> kernel: sd 6:0:0:1: [sdb] tag#17 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
> >>>>>> kernel: sd 6:0:0:1: [sdb] tag#18 CDB: Write(10) 2a 00 00 00 00 60 00 00 08 00
> >>>>>> kernel: sd 6:0:0:1: [sdb] tag#17 CDB: Write(10) 2a 00 00 00 00 58 00 00 08 00
> >>>>>> kernel: reservation conflict error, dev sdb, sector 96 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
> >>>>>> kernel: reservation conflict error, dev sdb, sector 88 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
> >>>>>> systemd[1]: fstests-generic-069.scope: Deactivated successfully.
> >>>>>> systemd[1]: fstests-generic-069.scope: Consumed 5.092s CPU time.
> >>>>>> systemd[1]: media-test.mount: Deactivated successfully.
> >>>>>> systemd[1]: media-scratch.mount: Deactivated successfully.
> >>>>>> kernel: sd 6:0:0:1: reservation conflict
> >>>>>> kernel: failed to unregister PR key.
> >>>>>>
> >>>>>> This appears to be due to a race. bl_alloc_lseg() calls this:
> >>>>>>
> >>>>>> 561 static struct nfs4_deviceid_node *
> >>>>>> 562 bl_find_get_deviceid(struct nfs_server *server,
> >>>>>> 563                 const struct nfs4_deviceid *id, const struct cred *cred,
> >>>>>> 564                 gfp_t gfp_mask)
> >>>>>> 565 {
> >>>>>> 566         struct nfs4_deviceid_node *node;
> >>>>>> 567         unsigned long start, end;
> >>>>>> 568
> >>>>>> 569 retry:
> >>>>>> 570         node = nfs4_find_get_deviceid(server, id, cred, gfp_mask);
> >>>>>> 571         if (!node)
> >>>>>> 572                 return ERR_PTR(-ENODEV);
> >>>>>>
> >>>>>> nfs4_find_get_deviceid() does a lookup without the spin lock first.
> >>>>>> If it can't find a matching deviceid, it creates a new device_info
> >>>>>> (which calls bl_alloc_deviceid_node, and that registers the device's
> >>>>>> PR key).
> >>>>>>
> >>>>>> Then it takes the nfs4_deviceid_lock and looks up the deviceid again.
> >>>>>> If it finds it this time, bl_find_get_deviceid() frees the spare
> >>>>>> (new) device_info, which unregisters the PR key for the same device.
> >>>>>>
> >>>>>> Any subsequent I/O from this client on that device gets EBADE.
> >>>>>>
> >>>>>> The umount later unregisters the device's PR key again.
> >>>>>>
> >>>>>> To prevent this problem, register the PR key after the deviceid_node
> >>>>>> lookup.
> >>>>>
> >>>>> Hi Chuck - nice catch, but I'm not seeing how we don't have the same problem
> >>>>> after this patch, instead it just seems like it moves the race.  What
> >>>>> prevents another process waiting to take the nfs4_deviceid_lock from
> >>>>> unregistering the same device?  I think we need another way to signal
> >>>>> bl_free_device that we don't want to unregister for the case where the new
> >>>>> device isn't added to nfs4_deviceid_cache.
> >>>>
> >>>> That's a (related but) somewhat different issue. I haven't seen
> >>>> that in practice so far.
> >>>>
> >>>>
> >>>>> No good ideas yet - maybe we can use a flag set within the
> >>>>> nfs4_deviceid_lock?
> >>>>
> >>>> Well this smells like a use for a reference count on the block
> >>>> device, but fs/nfs doesn't control the definition of that data
> >>>> structure.
> >>>
> >>> I think we need two things to fix this race:
> >>>  - a way to determine if the current thread is the one
> >>>    that added the device to the to the cache, if so do the register
> >>>  - a way to determine if we're freeing because we lost the race to the
> >>>    cache, if so don't un-register.
> >>
> >> My patch is supposed to deal with all of that already. Can you show
> >> me specifically what is not getting handled by my proposed change?
> >
> > Well - I may be missing something, but it looks like with this patch you can
> > still have:
> >
> > Thread
> > A                           B
> >
> > nfs4_find_get_deviceid
> > new{a} = nfs4_get_device_info
> > locks nfs4_deviceid_lock
> >                             nfs4_find_get_deviceid
> >                             new{b} = nfs4_get_device_info
> >                             spins on nfs4_deviceid_lock
> > adds new{a} to the cache
> > unlocks nfs4_deviceid_lock
> > pr_register
> >                             locks nfs4_deviceid_lock
> >                             finds new{a}
> >                             pr_UNregisters new{b}
> >
> > In this case, you end up with an unregistered device.
> 
> Oh jeez Chuck, nevermind - I am missing something, that we check the the
> new{b} pnfs_block_dev->pr_registred before unregistering it.
> 
> So, actually - I think this patch does solve the problem.  I apologize for
> the noise here.

Thanks! Was wondering, because I thought that was exactly the race
I was trying to fix!

-- 
Chuck Lever

  reply	other threads:[~2024-06-20 17:09 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-19 17:39 [RFC PATCH 0/4] Snapshot of fixes for SCSI PR key registration cel
2024-06-19 17:39 ` [RFC PATCH 1/4] nfs/blocklayout: SCSI layout trace points for reservation key reg/unreg cel
2024-06-20  4:50   ` Christoph Hellwig
2024-06-20  4:52     ` Christoph Hellwig
2024-06-20 14:30     ` Chuck Lever
2024-06-19 17:39 ` [RFC PATCH 2/4] nfs/blocklayout: Report only when /no/ device is found cel
2024-06-20  4:36   ` Christoph Hellwig
2024-06-20 14:59     ` Chuck Lever
2024-06-20 12:17   ` Benjamin Coddington
2024-06-20 14:10     ` Christoph Hellwig
2024-06-19 17:39 ` [RFC PATCH 3/4] nfs/blocklayout: Fix premature PR key unregistration cel
2024-06-20  5:06   ` Christoph Hellwig
2024-06-20 13:52     ` Benjamin Coddington
2024-06-20 13:58       ` Chuck Lever
2024-06-20 14:15       ` Christoph Hellwig
2024-06-20 14:18         ` Chuck Lever III
2024-06-20 15:45         ` Benjamin Coddington
2024-06-20 15:48           ` Chuck Lever
2024-06-20 15:58             ` Benjamin Coddington
2024-06-20 15:39     ` Chuck Lever
2024-06-20 13:51   ` Benjamin Coddington
2024-06-20 14:34     ` Chuck Lever
2024-06-20 14:37       ` Christoph Hellwig
2024-06-20 15:30       ` Benjamin Coddington
2024-06-20 15:46         ` Chuck Lever
2024-06-20 15:56           ` Benjamin Coddington
2024-06-20 16:45             ` Benjamin Coddington
2024-06-20 17:08               ` Chuck Lever [this message]
2024-06-19 17:39 ` [RFC PATCH 4/4] nfs/blocklayout: Use bulk page allocation APIs cel
2024-06-20  4:44   ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZnRiFp7sSc1S2wp/@tissot.1015granger.net \
    --to=chuck.lever@oracle.com \
    --cc=bcodding@redhat.com \
    --cc=cel@kernel.org \
    --cc=hch@lst.de \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox