From: Benny Halevy <bhalevy@panasas.com>
To: Fred Isaman <iisaman@netapp.com>
Cc: linux-nfs@vger.kernel.org, Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: Re: [PATCH 1/3] pnfs: avoid incorrect use of layout stateid
Date: Wed, 02 Feb 2011 06:19:37 +0200 [thread overview]
Message-ID: <4D48DB59.9010102@panasas.com> (raw)
In-Reply-To: <AF70A3A4-2A06-4BD1-AC69-270658CF6365@netapp.com>
On 2011-02-01 18:31, Fred Isaman wrote:
>
> On Feb 1, 2011, at 10:38 AM, Benny Halevy wrote:
>
>> On 2011-01-31 17:27, Fred Isaman wrote:
>>> The code could violate the following from RFC5661, section 12.5.3:
>>> "Once a client has no more layouts on a file, the layout stateid is no
>>> longer valid and MUST NOT be used."
>>
>> NACK.
>>
>> Fred, this is your interpretation of the forgetful model and it is taken
>> out of context.
>>
>> Until the spec is changed only the server invalidates the stateid by returning
>> lrs_present=false on LAYOUTRETURN. Merely forgetting the layout state without
>> LAYOUTRETURN does not determine that.
>>
>>
>> Also from 12.5.3:
>> Once a layout stateid is established, the "other"
>> field will stay constant unless the stateid is revoked or the client
>> returns all layouts on the file and the server disposes of the stateid.
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>
>
> OK, so the question is, does sending a LAYOUTGET with openstateid "return all layouts".
> Making the answer to this "yes" is perfectly consistent with the spec, given its complete absence
> of direction here.
I disagree, and so are other server implementations, including linux-pnfs!
(It will return BAD_STATEID if the client forgets the layout
in any case but LAYOUTRETURN or CB_LAYOUTRECALL/NOMATCHING_LAYOUT)
> The alternative is a forgetful model where the client can forget layouts, but
> not layout stateid.
Right, and this is the direction we should go until the errata is in place.
>
> The question is, is there any objection to the above interpretation of LAYOUTGET with openstateid?
> Because if not, then we will shortly get an appropriate errata,
> and there is no good reason to delay the patch until the errata is finalized.
> I know previously you had objected on the basis that this prevents parallel use of openstateid with LAYOUTGET.
> But given that you had indicated that such parallel use can not be done for other reasons,
> I have been working on the assumption that the interpretation above will be accepted without controversy at the upcoming IETF.
>
> On the other hand, if there is objection to the above interpretation, that should be made known.
I simulated this with the layout-sim and given the LAYOUTGET sent with the initial stateid
is always fully serialized with other layout stateid-changing operations, this model works.
I object the timing, before the IETF discussion is over and before the upcoming Connectathon
where other server vendors have no chance to implement this erratum.
Benny
>
> Fred
>
>
>> and
>>
>> Once the client receives a layout stateid, it MUST use the correct
>> "seqid" for subsequent LAYOUTGET or LAYOUTRETURN operations. The
>> correct "seqid" is defined as the highest "seqid" value from
>> responses of fully processed LAYOUTGET or LAYOUTRETURN operations or
>> arguments of a fully processed CB_LAYOUTRECALL operation.
>>
>> and 18.43.3
>>
>> The loga_stateid field specifies a valid stateid. If a layout is not
>> currently held by the client, the loga_stateid field represents a
>> stateid reflecting the correspondingly valid open, byte-range lock,
>> or delegation stateid. Once a layout is held on the file by the
>> client, the loga_stateid field MUST be a stateid as returned from a
>> previous LAYOUTGET or LAYOUTRETURN operation or provided by a
>> CB_LAYOUTRECALL operation (see Section 12.5.3).
>>
>> Only when we agree that the client can throw away the layout state and
>> send a singular future LAYOUTGET with a non-layout stateid - something it MUST not
>> do at the moment - only then we can do what this patch suggests.
>>
>> Benny
>>
>>>
>>> This can occur when a layout already has a lseg, starts another
>>> non-everlapping LAYOUTGET, and a CB_LAYOUTRECALL for the existing lseg
>>> is processed before we hit pnfs_layout_process().
>>>
>>> Solve by setting, each time the client has no more lsegs for a file, a
>>> flag which blocks further use of the layout and triggers its removal.
>>>
>>> This also fixes a second bug which occurs in the same instance as
>>> above. If we actually use pnfs_layout_process, we add the new lseg to
>>> the layout, but the layout has been removed from the nfs_client list
>>> by the intervening CB_LAYOUTRECALL and will not be added back. Thus
>>> the newly acquired lseg will not be properly returned in the event of
>>> a subsequent CB_LAYOUTRECALL.
>>>
>>> Signed-off-by: Fred Isaman <iisaman@netapp.com>
>>> ---
>>> fs/nfs/pnfs.c | 12 +++++++++---
>>> 1 files changed, 9 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>> index 1b1bc1a..dcd5d54 100644
>>> --- a/fs/nfs/pnfs.c
>>> +++ b/fs/nfs/pnfs.c
>>> @@ -255,6 +255,9 @@ put_lseg_locked(struct pnfs_layout_segment *lseg,
>>> list_del_init(&lseg->pls_layout->plh_layouts);
>>> spin_unlock(&clp->cl_lock);
>>> clear_bit(NFS_LAYOUT_BULK_RECALL, &lseg->pls_layout->plh_flags);
>>> + set_bit(NFS_LAYOUT_DESTROYED, &lseg->pls_layout->plh_flags);
>>> + /* Matched by initial refcount set in alloc_init_layout_hdr */
>>> + put_layout_hdr_locked(lseg->pls_layout);
>>> }
>>> rpc_wake_up(&NFS_SERVER(ino)->roc_rpcwaitq);
>>> list_add(&lseg->pls_list, tmp_list);
>>> @@ -299,6 +302,11 @@ mark_matching_lsegs_invalid(struct pnfs_layout_hdr *lo,
>>>
>>> dprintk("%s:Begin lo %p\n", __func__, lo);
>>>
>>> + if (list_empty(&lo->plh_segs)) {
>>> + if (!test_and_set_bit(NFS_LAYOUT_DESTROYED, &lo->plh_flags))
>>> + put_layout_hdr_locked(lo);
>>> + return 0;
>>> + }
>>> list_for_each_entry_safe(lseg, next, &lo->plh_segs, pls_list)
>>> if (should_free_lseg(lseg->pls_range.iomode, iomode)) {
>>> dprintk("%s: freeing lseg %p iomode %d "
>>> @@ -332,10 +340,7 @@ pnfs_destroy_layout(struct nfs_inode *nfsi)
>>> spin_lock(&nfsi->vfs_inode.i_lock);
>>> lo = nfsi->layout;
>>> if (lo) {
>>> - set_bit(NFS_LAYOUT_DESTROYED, &nfsi->layout->plh_flags);
>>> mark_matching_lsegs_invalid(lo, &tmp_list, IOMODE_ANY);
>>> - /* Matched by refcount set to 1 in alloc_init_layout_hdr */
>>> - put_layout_hdr_locked(lo);
>>> }
>>> spin_unlock(&nfsi->vfs_inode.i_lock);
>>> pnfs_free_lseg_list(&tmp_list);
>>> @@ -403,6 +408,7 @@ pnfs_layoutgets_blocked(struct pnfs_layout_hdr *lo, nfs4_stateid *stateid,
>>> (int)(lo->plh_barrier - be32_to_cpu(stateid->stateid.seqid)) >= 0)
>>> return true;
>>> return lo->plh_block_lgets ||
>>> + test_bit(NFS_LAYOUT_DESTROYED, &lo->plh_flags) ||
>>> test_bit(NFS_LAYOUT_BULK_RECALL, &lo->plh_flags) ||
>>> (list_empty(&lo->plh_segs) &&
>>> (atomic_read(&lo->plh_outstanding) > lget));
>
next prev parent reply other threads:[~2011-02-02 4:19 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-01-31 15:27 [PATCH 0/3]: pnfs: fix pnfs lock inversion Fred Isaman
2011-01-31 15:27 ` [PATCH 1/3] pnfs: avoid incorrect use of layout stateid Fred Isaman
2011-02-01 15:38 ` Benny Halevy
2011-02-01 16:31 ` Fred Isaman
2011-02-02 4:19 ` Benny Halevy [this message]
2011-02-02 5:56 ` Trond Myklebust
2011-02-03 8:15 ` Benny Halevy
2011-02-02 15:45 ` Fred Isaman
2011-02-03 8:17 ` Benny Halevy
2011-01-31 15:27 ` [PATCH 2/3] pnfs: do not need to clear NFS_LAYOUT_BULK_RECALL flag Fred Isaman
2011-01-31 15:27 ` [PATCH 3/3] pnfs: fix pnfs lock inversion of i_lock and cl_lock Fred Isaman
2011-02-01 15:41 ` Benny Halevy
2011-02-01 15:54 ` Fred Isaman
2011-02-01 16:03 ` Benny Halevy
2011-02-03 8:58 ` Benny Halevy
-- strict thread matches above, loose matches on Subject: below --
2011-02-03 18:28 [PATCH 0/3] pnfs: fix pnfs lock inversion, try 2 Fred Isaman
2011-02-03 18:28 ` [PATCH 1/3] pnfs: avoid incorrect use of layout stateid Fred Isaman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D48DB59.9010102@panasas.com \
--to=bhalevy@panasas.com \
--cc=Trond.Myklebust@netapp.com \
--cc=iisaman@netapp.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).