linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.com>
To: Olga Kornievskaia <aglo@umich.edu>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>,
	Anna Schumaker <Anna.Schumaker@netapp.com>,
	linux-nfs <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH] NFSv4: always set NFS_LOCK_LOST when a lock is lost.
Date: Fri, 02 Feb 2018 09:04:21 +1100	[thread overview]
Message-ID: <87tvv02wpm.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <CAN-5tyHB50kKNfcXyTVgPO+6p6p1iZVCZpROMN-jxP7Be53_UQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3413 bytes --]

On Thu, Feb 01 2018, Olga Kornievskaia wrote:

> On Thu, Feb 1, 2018 at 11:23 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>> On Tue, Dec 19, 2017 at 4:15 PM, NeilBrown <neilb@suse.com> wrote:
>>> On Wed, Dec 13 2017, NeilBrown wrote:
>>>
>>>> There are 2 comments in the NFSv4 code which suggest that
>>>> SIGLOST should possibly be sent to a process.  In these
>>>> cases a lock has been lost.
>>>> The current practice is to set NFS_LOCK_LOST so that
>>>> read/write returns EIO when a lock is lost.
>>>> So change these comments to code when sets NFS_LOCK_LOST.
>>>>
>>>> One case is when lock recovery after apparent server restart
>>>> fails with NFS4ERR_DENIED, NFS4ERR_RECLAIM_BAD, or
>>>> NFS4ERRO_RECLAIM_CONFLICT.  The other case is when a lock
>>>> attempt as part of lease recovery fails with NFS4ERR_DENIED.
>>>>
>>>> In an ideal world, these should not happen.  However I have
>>>> a packet trace showing an NFSv4.1 session getting
>>>> NFS4ERR_BADSESSION after an extended network parition.  The
>>>> NFSv4.1 client treats this like server reboot until/unless
>>>> it get NFS4ERR_NO_GRACE, in which case it switches over to
>>>> "nograce" recovery mode.  In this network trace, the client
>>>> attempts to recover a lock and the server (incorrectly)
>>>> reports NFS4ERR_DENIED rather than NFS4ERR_NO_GRACE.  This
>>>> leads to the ineffective comment and the client then
>>>> continues to write using the OPEN stateid.
>>>>
>>>> Signed-off-by: NeilBrown <neilb@suse.com>
>>>> ---
>>>>
>>>> Note that I have not tested this as I do not have direct access to a
>>>> faulty NFS server.  Once I get confirmation I will provide an update.
>>>
>>> Hi,
>>>  I have now received confirmation that this change does fix the locking
>>>  behavior in this case where the server is returning the wrong error
>>>  code.
>>>
>>
>> Hi Neil,
>>
>> I'm testing your patch and in my testing it does not address the
>> issue. But I'd like to double check if my testing scenario is the
>> problem the same as what the patch suppose to address.
>>
>> My testing,
>> 1. client 1 opens a file, grabs a lock (goes to sleep so that I could
>> trigger network partition).
>> 2. wait sufficient amount of time for the client 1's state goes stale
>> on the server.
>> 3. client 2 opens the same file, and grabs the lock and starting doing
>> IO (say writing).
>> 4. plug network back into client 1. it recovers its client id and
>> session and initiates state recovery. server didn't reboot so it
>> doesn't reply with ERR_NO_GRACE to client's open and it succeeds. the
>> client sends a LOCK and gets ERR_DENIED from the server (because
>> client 2 is holding the lock now).
>> 5. client 1's app now wakes up and does IO.
>>
>> What should happen is that IO should fail. What I see is client 1
>> continues to do IO (say writing).
>>
>> Is my scenario same as yours?
>
> Sorry never mind. The OS doesn't fail any writes but internally the
> kernel fails the NFS writes with EIO and nothing is sent on the wire.
> If an fsync() is called explicitly after the write, then that fails.
> So either mount needed "sync" and then writes fail. Or application
> needs to call fsync() and then that would fail.

Yes, or O_DIRECT.  The important thing to test is: was client 1 able to
change the file on the server?  You say your tests show that it wasn't,
so no data corruption happened.  Excellent.

Thanks for testing!!

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

      reply	other threads:[~2018-02-02  9:32 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-12 22:57 [PATCH] NFSv4: always set NFS_LOCK_LOST when a lock is lost NeilBrown
2017-12-19 21:15 ` NeilBrown
2018-02-01 16:23   ` Olga Kornievskaia
2018-02-01 17:51     ` Olga Kornievskaia
2018-02-01 22:04       ` NeilBrown [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87tvv02wpm.fsf@notabene.neil.brown.name \
    --to=neilb@suse.com \
    --cc=Anna.Schumaker@netapp.com \
    --cc=aglo@umich.edu \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trond.myklebust@primarydata.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).