From: Trond Myklebust <Trond.Myklebust@netapp.com>
To: Andy Adamson <andros@netapp.com>
Cc: Peng Tao <bergwolf@gmail.com>, Jim Rees <rees@umich.edu>,
Christoph Hellwig <hch@infradead.org>,
linux-nfs@vger.kernel.org, peter honeyman <honey@citi.umich.edu>
Subject: Re: [PATCH v4 00/27] add block layout driver to pnfs client
Date: Mon, 01 Aug 2011 19:11:18 -0400 [thread overview]
Message-ID: <1312240278.4616.7.camel@lade.trondhjem.org> (raw)
In-Reply-To: <0A94625F-4E4F-4A88-A0DF-63D429924F66@netapp.com>
On Mon, 2011-08-01 at 18:57 -0400, Andy Adamson wrote:
> On Aug 1, 2011, at 6:35 PM, Trond Myklebust wrote:
>
> > On Mon, 2011-08-01 at 17:10 -0400, Trond Myklebust wrote:
> >> Looking at the callback code, I see that if tbl->highest_used_slotid !=
> >> 0, then we BUG() while holding the backchannel's tbl->slot_tbl_lock
> >> spinlock. That seems a likely candidate for the above hang.
> >>
> >> Andy, how we are guaranteed that tbl->highest_used_slotid won't take
> >> values other than 0, and why do we commit suicide when it does? As far
> >> as I can see, there is no guarantee that we call nfs4_cb_take_slot() in
> >> nfs4_callback_compound(), however we appear to unconditionally call
> >> nfs4_cb_free_slot() provided there is a session.
> >>
> >> The other strangeness would be the fact that there is nothing enforcing
> >> the NFS4_SESSION_DRAINING flag. If the session is draining, then the
> >> back-channel simply ignores that and goes ahead with processing the
> >> callback. Is this to avoid deadlocks with the server returning
> >> NFS4ERR_BACK_CHAN_BUSY when the client does a DESTROY_SESSION?
>
>
> When NFS4_SESSION_DRAINING is set, the backchannel is drained first - and waits for current processing to complete signaled by the highest_used_slotid == -1. Any other backchannel requests that occur under the NFS4_SESSION_DRAINING flag are processed - but just the CB_SEQUENCE operation which returns NFS4ERR_DELAY. It does indeed prevent the BACK_CHAN_BUSY deadlock.
>
> >
> > How about something like the following?
>
> Looks good. Nice catch. One change below and a comment inline
>
> -->Andy
>
> >
> > 8<-------------------------------------------------------------------------------
> > From c0c499b0ca9d0af8cbdc29c40effba38475461d9 Mon Sep 17 00:00:00 2001
> > From: Trond Myklebust <Trond.Myklebust@netapp.com>
> > Date: Mon, 1 Aug 2011 18:31:12 -0400
> > Subject: [PATCH] NFSv4.1: Fix the callback 'highest_used_slotid' behaviour
> >
> > Currently, there is no guarantee that we will call nfs4_cb_take_slot() even
> > though nfs4_callback_compound() will consistently call
> > nfs4_cb_free_slot() provided the cb_process_state has set the 'clp' field.
> > The result is that we can trigger the BUG_ON() upon the next call to
> > nfs4_cb_take_slot().
> > This patch fixes the above problem by using the slot id that was taken in
> > the CB_SEQUENCE operation as a flag for whether or not we need to call
> > nfs4_cb_free_slot().
> > It also fixes an atomicity problem: we need to set tbl->highest_used_slotid
> > atomically with the check for NFS4_SESSION_DRAINING, otherwise we end up
> > racing with the various tests in nfs4_begin_drain_session().
>
> But the code below doesn't do this - it locks the backchannel slot table to check the flag, but does not set the highest_used_slot atomically with this check.
It should. The patch ensures that we hold the backchannel slot table
lock across both the NFS4_SESSION_DRAINING test and the
validate_seqid(), and also ensures that the latter function sets
tbl->highest_used_slotid on success.
> >
> > Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> > ---
> > fs/nfs/callback.h | 2 +-
> > fs/nfs/callback_proc.c | 18 +++++++++++++-----
> > fs/nfs/callback_xdr.c | 24 +++++++-----------------
> > 3 files changed, 21 insertions(+), 23 deletions(-)
> >
> > diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
> > index b257383..07df5f1 100644
> > --- a/fs/nfs/callback.h
> > +++ b/fs/nfs/callback.h
> > @@ -38,6 +38,7 @@ enum nfs4_callback_opnum {
> > struct cb_process_state {
> > __be32 drc_status;
> > struct nfs_client *clp;
> > + int slotid;
> > };
> >
> > struct cb_compound_hdr_arg {
> > @@ -166,7 +167,6 @@ extern unsigned nfs4_callback_layoutrecall(
> > void *dummy, struct cb_process_state *cps);
> >
> > extern void nfs4_check_drain_bc_complete(struct nfs4_session *ses);
> > -extern void nfs4_cb_take_slot(struct nfs_client *clp);
> >
> > struct cb_devicenotifyitem {
> > uint32_t cbd_notify_type;
> > diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
> > index 74780f9..1bd2c81 100644
> > --- a/fs/nfs/callback_proc.c
> > +++ b/fs/nfs/callback_proc.c
> > @@ -348,7 +348,7 @@ validate_seqid(struct nfs4_slot_table *tbl, struct cb_sequenceargs * args)
> > /* Normal */
> > if (likely(args->csa_sequenceid == slot->seq_nr + 1)) {
> > slot->seq_nr++;
> > - return htonl(NFS4_OK);
> > + goto out_ok;
> > }
> >
> > /* Replay */
> > @@ -367,11 +367,14 @@ validate_seqid(struct nfs4_slot_table *tbl, struct cb_sequenceargs * args)
> > /* Wraparound */
> > if (args->csa_sequenceid == 1 && (slot->seq_nr + 1) == 0) {
> > slot->seq_nr = 1;
> > - return htonl(NFS4_OK);
> > + goto out_ok;
> > }
> >
> > /* Misordered request */
> > return htonl(NFS4ERR_SEQ_MISORDERED);
> > +out_ok:
> > + tbl->highest_used_slotid = args->csa_slotid;
> > + return htonl(NFS4_OK);
> > }
> >
> > /*
> > @@ -433,26 +436,32 @@ __be32 nfs4_callback_sequence(struct cb_sequenceargs *args,
> > struct cb_sequenceres *res,
> > struct cb_process_state *cps)
> > {
> > + struct nfs4_slot_table *tbl;
> > struct nfs_client *clp;
> > int i;
> > __be32 status = htonl(NFS4ERR_BADSESSION);
> >
> > - cps->clp = NULL;
> > -
> > clp = nfs4_find_client_sessionid(args->csa_addr, &args->csa_sessionid);
> > if (clp == NULL)
> > goto out;
> >
> > + tbl = &clp->cl_session->bc_slot_table;
> > +
> > + spin_lock(&tbl->slot_tbl_lock);
> > /* state manager is resetting the session */
> > if (test_bit(NFS4_SESSION_DRAINING, &clp->cl_session->session_state)) {
> > status = NFS4ERR_DELAY;
>
> status = htonl(NFS4ERR_DELAY);
Yep. I'll fix that too...
> > + spin_unlock(&tbl->slot_tbl_lock);
> > goto out;
> > }
> >
> > status = validate_seqid(&clp->cl_session->bc_slot_table, args);
> > + spin_unlock(&tbl->slot_tbl_lock);
This is what guarantees the atomicity. Even if
nfs4_begin_drain_session() sets NFS4_SESSION_DRAINING, we know that it
can't test the value of tbl->highest_used_slotid without first taking
the tbl->slot_tbl_lock, and we're holding that...
> > if (status)
> > goto out;
> >
> > + cps->slotid = args->csa_slotid;
>
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
next prev parent reply other threads:[~2011-08-01 23:11 UTC|newest]
Thread overview: 63+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-07-28 17:30 [PATCH v4 00/27] add block layout driver to pnfs client Jim Rees
2011-07-28 17:30 ` [PATCH v4 01/27] pnfs: GETDEVICELIST Jim Rees
2011-07-28 17:30 ` [PATCH v4 02/27] pnfs: add set-clear layoutdriver interface Jim Rees
2011-07-28 17:30 ` [PATCH v4 03/27] pnfs: save layoutcommit lwb at layout header Jim Rees
2011-07-28 17:30 ` [PATCH v4 04/27] pnfs: save layoutcommit cred " Jim Rees
2011-07-28 17:30 ` [PATCH v4 05/27] pnfs: let layoutcommit handle a list of lseg Jim Rees
2011-07-28 18:52 ` Boaz Harrosh
2011-07-28 17:30 ` [PATCH v4 06/27] pnfs: use lwb as layoutcommit length Jim Rees
2011-07-28 17:30 ` [PATCH v4 07/27] NFS41: save layoutcommit cred in layout header init Jim Rees
2011-07-28 17:30 ` [PATCH v4 08/27] pnfs: ask for layout_blksize and save it in nfs_server Jim Rees
2011-07-28 17:30 ` [PATCH v4 09/27] pnfs: cleanup_layoutcommit Jim Rees
2011-07-28 18:26 ` Boaz Harrosh
2011-07-29 3:16 ` Jim Rees
2011-07-28 17:30 ` [PATCH v4 10/27] pnfsblock: add blocklayout Kconfig option, Makefile, and stubs Jim Rees
2011-07-28 17:31 ` [PATCH v4 11/27] pnfsblock: use pageio_ops api Jim Rees
2011-07-28 17:31 ` [PATCH v4 12/27] pnfsblock: basic extent code Jim Rees
2011-07-28 17:31 ` [PATCH v4 13/27] pnfsblock: add device operations Jim Rees
2011-07-28 17:31 ` [PATCH v4 14/27] pnfsblock: remove " Jim Rees
2011-07-28 17:31 ` [PATCH v4 15/27] pnfsblock: lseg alloc and free Jim Rees
2011-07-28 17:31 ` [PATCH v4 16/27] pnfsblock: merge extents Jim Rees
2011-07-28 17:31 ` [PATCH v4 17/27] pnfsblock: call and parse getdevicelist Jim Rees
2011-07-28 17:31 ` [PATCH v4 18/27] pnfsblock: xdr decode pnfs_block_layout4 Jim Rees
2011-07-28 17:31 ` [PATCH v4 19/27] pnfsblock: bl_find_get_extent Jim Rees
2011-07-28 17:31 ` [PATCH v4 20/27] pnfsblock: add extent manipulation functions Jim Rees
2011-07-28 17:31 ` [PATCH v4 21/27] pnfsblock: merge rw extents Jim Rees
2011-07-28 17:31 ` [PATCH v4 22/27] pnfsblock: encode_layoutcommit Jim Rees
2011-07-28 17:31 ` [PATCH v4 23/27] pnfsblock: cleanup_layoutcommit Jim Rees
2011-07-28 17:31 ` [PATCH v4 24/27] pnfsblock: bl_read_pagelist Jim Rees
2011-07-28 17:31 ` [PATCH v4 25/27] pnfsblock: bl_write_pagelist Jim Rees
2011-07-28 17:31 ` [PATCH v4 26/27] pnfsblock: note written INVAL areas for layoutcommit Jim Rees
2011-07-28 17:31 ` [PATCH v4 27/27] pnfsblock: write_pagelist handle zero invalid extents Jim Rees
2011-07-29 15:51 ` [PATCH v4 00/27] add block layout driver to pnfs client Christoph Hellwig
2011-07-29 17:45 ` Peng Tao
2011-07-29 18:44 ` Christoph Hellwig
2011-07-29 18:54 ` Jim Rees
2011-07-29 19:01 ` Christoph Hellwig
2011-07-29 19:13 ` Jim Rees
2011-07-30 1:09 ` Trond Myklebust
2011-07-30 3:26 ` Jim Rees
2011-07-30 14:25 ` Peng Tao
2011-08-01 21:10 ` Trond Myklebust
2011-08-01 22:35 ` Trond Myklebust
2011-08-01 22:57 ` Andy Adamson
2011-08-01 23:11 ` Trond Myklebust [this message]
2011-08-02 17:30 ` Trond Myklebust
2011-08-02 18:50 ` [PATCH v2 1/2] NFSv4.1: Fix the callback 'highest_used_slotid' behaviour Trond Myklebust
2011-08-02 18:50 ` [PATCH v2 2/2] NFSv4.1: Return NFS4ERR_BADSESSION to callbacks during session resets Trond Myklebust
2011-08-03 8:52 ` [PATCH v2 1/2] NFSv4.1: Fix the callback 'highest_used_slotid' behaviour Peng Tao
2011-08-02 2:21 ` [PATCH v4 00/27] add block layout driver to pnfs client Jim Rees
2011-08-02 2:29 ` Myklebust, Trond
2011-08-02 3:23 ` Jim Rees
2011-08-02 12:28 ` Trond Myklebust
2011-08-02 12:56 ` Jim Rees
2011-08-03 1:48 ` Jim Rees
2011-08-03 2:07 ` Myklebust, Trond
[not found] ` <2E1EB2CF9ED1CB4AA966F0EB76EAB4430A778AE2-hX7t0kiaRRrlMGe9HJ1VYQK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
2011-08-03 2:11 ` Jim Rees
2011-08-03 2:38 ` Jim Rees
2011-08-03 8:43 ` Peng Tao
2011-08-03 11:49 ` Jim Rees
2011-08-03 11:53 ` Jim Rees
2011-08-03 13:59 ` Peng Tao
2011-08-03 14:11 ` Jim Rees
2011-07-30 14:18 ` Jim Rees
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1312240278.4616.7.camel@lade.trondhjem.org \
--to=trond.myklebust@netapp.com \
--cc=andros@netapp.com \
--cc=bergwolf@gmail.com \
--cc=hch@infradead.org \
--cc=honey@citi.umich.edu \
--cc=linux-nfs@vger.kernel.org \
--cc=rees@umich.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).