linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount
@ 2015-09-14 23:54 Olga Kornievskaia
  2015-09-15 13:39 ` Trond Myklebust
  0 siblings, 1 reply; 7+ messages in thread
From: Olga Kornievskaia @ 2015-09-14 23:54 UTC (permalink / raw)
  To: linux-nfs

A test case is as the description says:
open(foobar, O_WRONLY);
sleep()  --> reboot the server
close(foobar)

The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
line before going to restart, there is
clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).

NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
out state and when we go to close it, “call_close” doesn’t get set as
state flag is not set and CLOSE doesn’t go on the wire.

That line was introduced to fix an infinite loop for OPEN recovery
upon receiving a BAD_STATEID error: commit e8d975e73. I have tested
injecting BAD_STATEID error using the patch below and the code
recovers without problems. However, I'm not sure the clearing of the
bit is needed any more. I have tested for infinite loop by reverting
the patch and didn't hit the infinite loop.

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index da73bc4..5db3246 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1481,7 +1481,7 @@ restart:
                                        spin_unlock(&state->state_lock);
                                }
                                nfs4_put_open_state(state);
-                               clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
+                               clear_bit(NFS_STATE_RECLAIM_NOGRACE,
                                        &state->flags);
                                spin_lock(&sp->so_lock);
                                goto restart;

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount
  2015-09-14 23:54 Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount Olga Kornievskaia
@ 2015-09-15 13:39 ` Trond Myklebust
  2015-09-15 14:27   ` Olga Kornievskaia
  0 siblings, 1 reply; 7+ messages in thread
From: Trond Myklebust @ 2015-09-15 13:39 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: linux-nfs

On Mon, Sep 14, 2015 at 7:54 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> A test case is as the description says:
> open(foobar, O_WRONLY);
> sleep()  --> reboot the server
> close(foobar)
>
> The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
> line before going to restart, there is
> clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
>
> NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
> owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
> value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
> out state and when we go to close it, “call_close” doesn’t get set as
> state flag is not set and CLOSE doesn’t go on the wire.
>
> That line was introduced to fix an infinite loop for OPEN recovery
> upon receiving a BAD_STATEID error: commit e8d975e73. I have tested
> injecting BAD_STATEID error using the patch below and the code
> recovers without problems. However, I'm not sure the clearing of the
> bit is needed any more. I have tested for infinite loop by reverting
> the patch and didn't hit the infinite loop.
>
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index da73bc4..5db3246 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -1481,7 +1481,7 @@ restart:
>                                         spin_unlock(&state->state_lock);
>                                 }
>                                 nfs4_put_open_state(state);
> -                               clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
> +                               clear_bit(NFS_STATE_RECLAIM_NOGRACE,
>                                         &state->flags);
>                                 spin_lock(&sp->so_lock);
>                                 goto restart;

That's an obvious typo. Thanks for spotting it!

As for whether or not the bit clear is needed at all, I think it is
for NFSv4 on older kernels. On newer kernels, we do have the NFSv4
state recovery drain the slot table (just like we've always done for
NFSv4.1) and so I agree that those kernels probably won't be
afflicted.

Cheers
  Trond

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount
  2015-09-15 13:39 ` Trond Myklebust
@ 2015-09-15 14:27   ` Olga Kornievskaia
  2015-09-15 15:49     ` Trond Myklebust
  0 siblings, 1 reply; 7+ messages in thread
From: Olga Kornievskaia @ 2015-09-15 14:27 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Tue, Sep 15, 2015 at 9:39 AM, Trond Myklebust
<trond.myklebust@primarydata.com> wrote:
> On Mon, Sep 14, 2015 at 7:54 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>> A test case is as the description says:
>> open(foobar, O_WRONLY);
>> sleep()  --> reboot the server
>> close(foobar)
>>
>> The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
>> line before going to restart, there is
>> clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
>>
>> NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
>> owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
>> value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
>> out state and when we go to close it, “call_close” doesn’t get set as
>> state flag is not set and CLOSE doesn’t go on the wire.
>>
>> That line was introduced to fix an infinite loop for OPEN recovery
>> upon receiving a BAD_STATEID error: commit e8d975e73. I have tested
>> injecting BAD_STATEID error using the patch below and the code
>> recovers without problems. However, I'm not sure the clearing of the
>> bit is needed any more. I have tested for infinite loop by reverting
>> the patch and didn't hit the infinite loop.
>>
>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>> index da73bc4..5db3246 100644
>> --- a/fs/nfs/nfs4state.c
>> +++ b/fs/nfs/nfs4state.c
>> @@ -1481,7 +1481,7 @@ restart:
>>                                         spin_unlock(&state->state_lock);
>>                                 }
>>                                 nfs4_put_open_state(state);
>> -                               clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
>> +                               clear_bit(NFS_STATE_RECLAIM_NOGRACE,
>>                                         &state->flags);
>>                                 spin_lock(&sp->so_lock);
>>                                 goto restart;
>
> That's an obvious typo. Thanks for spotting it!
>
> As for whether or not the bit clear is needed at all, I think it is
> for NFSv4 on older kernels. On newer kernels, we do have the NFSv4
> state recovery drain the slot table (just like we've always done for
> NFSv4.1) and so I agree that those kernels probably won't be
> afflicted.
>

Thanks Trond. Do you need me to resubmit it without the last paragraph
or is the patch ok as is?

> Cheers
>   Trond

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount
  2015-09-15 14:27   ` Olga Kornievskaia
@ 2015-09-15 15:49     ` Trond Myklebust
  2015-09-15 16:52       ` Olga Kornievskaia
  0 siblings, 1 reply; 7+ messages in thread
From: Trond Myklebust @ 2015-09-15 15:49 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: linux-nfs

On Tue, Sep 15, 2015 at 10:27 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> On Tue, Sep 15, 2015 at 9:39 AM, Trond Myklebust
> <trond.myklebust@primarydata.com> wrote:
>> On Mon, Sep 14, 2015 at 7:54 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>> A test case is as the description says:
>>> open(foobar, O_WRONLY);
>>> sleep()  --> reboot the server
>>> close(foobar)
>>>
>>> The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
>>> line before going to restart, there is
>>> clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
>>>
>>> NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
>>> owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
>>> value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
>>> out state and when we go to close it, “call_close” doesn’t get set as
>>> state flag is not set and CLOSE doesn’t go on the wire.
>>>
>>> That line was introduced to fix an infinite loop for OPEN recovery
>>> upon receiving a BAD_STATEID error: commit e8d975e73. I have tested
>>> injecting BAD_STATEID error using the patch below and the code
>>> recovers without problems. However, I'm not sure the clearing of the
>>> bit is needed any more. I have tested for infinite loop by reverting
>>> the patch and didn't hit the infinite loop.
>>>
>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>> index da73bc4..5db3246 100644
>>> --- a/fs/nfs/nfs4state.c
>>> +++ b/fs/nfs/nfs4state.c
>>> @@ -1481,7 +1481,7 @@ restart:
>>>                                         spin_unlock(&state->state_lock);
>>>                                 }
>>>                                 nfs4_put_open_state(state);
>>> -                               clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
>>> +                               clear_bit(NFS_STATE_RECLAIM_NOGRACE,
>>>                                         &state->flags);
>>>                                 spin_lock(&sp->so_lock);
>>>                                 goto restart;
>>
>> That's an obvious typo. Thanks for spotting it!
>>
>> As for whether or not the bit clear is needed at all, I think it is
>> for NFSv4 on older kernels. On newer kernels, we do have the NFSv4
>> state recovery drain the slot table (just like we've always done for
>> NFSv4.1) and so I agree that those kernels probably won't be
>> afflicted.
>>
>
> Thanks Trond. Do you need me to resubmit it without the last paragraph
> or is the patch ok as is?
>

I can easily remove that paragraph when applying the patch, if you
agree that it is superfluous.

Cheers
  Trond

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount
  2015-09-15 15:49     ` Trond Myklebust
@ 2015-09-15 16:52       ` Olga Kornievskaia
  2015-09-17 13:34         ` Trond Myklebust
  0 siblings, 1 reply; 7+ messages in thread
From: Olga Kornievskaia @ 2015-09-15 16:52 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Tue, Sep 15, 2015 at 11:49 AM, Trond Myklebust
<trond.myklebust@primarydata.com> wrote:
> On Tue, Sep 15, 2015 at 10:27 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>> On Tue, Sep 15, 2015 at 9:39 AM, Trond Myklebust
>> <trond.myklebust@primarydata.com> wrote:
>>> On Mon, Sep 14, 2015 at 7:54 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>> A test case is as the description says:
>>>> open(foobar, O_WRONLY);
>>>> sleep()  --> reboot the server
>>>> close(foobar)
>>>>
>>>> The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
>>>> line before going to restart, there is
>>>> clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
>>>>
>>>> NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
>>>> owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
>>>> value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
>>>> out state and when we go to close it, “call_close” doesn’t get set as
>>>> state flag is not set and CLOSE doesn’t go on the wire.
>>>>
>>>> That line was introduced to fix an infinite loop for OPEN recovery
>>>> upon receiving a BAD_STATEID error: commit e8d975e73. I have tested
>>>> injecting BAD_STATEID error using the patch below and the code
>>>> recovers without problems. However, I'm not sure the clearing of the
>>>> bit is needed any more. I have tested for infinite loop by reverting
>>>> the patch and didn't hit the infinite loop.
>>>>
>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>>> index da73bc4..5db3246 100644
>>>> --- a/fs/nfs/nfs4state.c
>>>> +++ b/fs/nfs/nfs4state.c
>>>> @@ -1481,7 +1481,7 @@ restart:
>>>>                                         spin_unlock(&state->state_lock);
>>>>                                 }
>>>>                                 nfs4_put_open_state(state);
>>>> -                               clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
>>>> +                               clear_bit(NFS_STATE_RECLAIM_NOGRACE,
>>>>                                         &state->flags);
>>>>                                 spin_lock(&sp->so_lock);
>>>>                                 goto restart;
>>>
>>> That's an obvious typo. Thanks for spotting it!
>>>
>>> As for whether or not the bit clear is needed at all, I think it is
>>> for NFSv4 on older kernels. On newer kernels, we do have the NFSv4
>>> state recovery drain the slot table (just like we've always done for
>>> NFSv4.1) and so I agree that those kernels probably won't be
>>> afflicted.
>>>
>>
>> Thanks Trond. Do you need me to resubmit it without the last paragraph
>> or is the patch ok as is?
>>
>
> I can easily remove that paragraph when applying the patch, if you
> agree that it is superfluous.

Thanks. Works for me.

>
> Cheers
>   Trond

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount
  2015-09-15 16:52       ` Olga Kornievskaia
@ 2015-09-17 13:34         ` Trond Myklebust
  2015-09-17 13:36           ` Olga Kornievskaia
  0 siblings, 1 reply; 7+ messages in thread
From: Trond Myklebust @ 2015-09-17 13:34 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: linux-nfs

On Tue, Sep 15, 2015 at 12:52 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> On Tue, Sep 15, 2015 at 11:49 AM, Trond Myklebust
> <trond.myklebust@primarydata.com> wrote:
>> On Tue, Sep 15, 2015 at 10:27 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>> On Tue, Sep 15, 2015 at 9:39 AM, Trond Myklebust
>>> <trond.myklebust@primarydata.com> wrote:
>>>> On Mon, Sep 14, 2015 at 7:54 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>>> A test case is as the description says:
>>>>> open(foobar, O_WRONLY);
>>>>> sleep()  --> reboot the server
>>>>> close(foobar)
>>>>>
>>>>> The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
>>>>> line before going to restart, there is
>>>>> clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
>>>>>
>>>>> NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
>>>>> owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
>>>>> value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
>>>>> out state and when we go to close it, “call_close” doesn’t get set as
>>>>> state flag is not set and CLOSE doesn’t go on the wire.
>>>>>
>>>>> That line was introduced to fix an infinite loop for OPEN recovery
>>>>> upon receiving a BAD_STATEID error: commit e8d975e73. I have tested
>>>>> injecting BAD_STATEID error using the patch below and the code
>>>>> recovers without problems. However, I'm not sure the clearing of the
>>>>> bit is needed any more. I have tested for infinite loop by reverting
>>>>> the patch and didn't hit the infinite loop.
>>>>>
>>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>>>> index da73bc4..5db3246 100644
>>>>> --- a/fs/nfs/nfs4state.c
>>>>> +++ b/fs/nfs/nfs4state.c
>>>>> @@ -1481,7 +1481,7 @@ restart:
>>>>>                                         spin_unlock(&state->state_lock);
>>>>>                                 }
>>>>>                                 nfs4_put_open_state(state);
>>>>> -                               clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
>>>>> +                               clear_bit(NFS_STATE_RECLAIM_NOGRACE,
>>>>>                                         &state->flags);
>>>>>                                 spin_lock(&sp->so_lock);
>>>>>                                 goto restart;
>>>>
>>>> That's an obvious typo. Thanks for spotting it!
>>>>
>>>> As for whether or not the bit clear is needed at all, I think it is
>>>> for NFSv4 on older kernels. On newer kernels, we do have the NFSv4
>>>> state recovery drain the slot table (just like we've always done for
>>>> NFSv4.1) and so I agree that those kernels probably won't be
>>>> afflicted.
>>>>
>>>
>>> Thanks Trond. Do you need me to resubmit it without the last paragraph
>>> or is the patch ok as is?
>>>
>>
>> I can easily remove that paragraph when applying the patch, if you
>> agree that it is superfluous.
>
> Thanks. Works for me.
>

May I also add a signed-off-by line from you? I can't really apply
this (or any other patches) without it.

Cheers
  Trond

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount
  2015-09-17 13:34         ` Trond Myklebust
@ 2015-09-17 13:36           ` Olga Kornievskaia
  0 siblings, 0 replies; 7+ messages in thread
From: Olga Kornievskaia @ 2015-09-17 13:36 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Thu, Sep 17, 2015 at 9:34 AM, Trond Myklebust
<trond.myklebust@primarydata.com> wrote:
> On Tue, Sep 15, 2015 at 12:52 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>> On Tue, Sep 15, 2015 at 11:49 AM, Trond Myklebust
>> <trond.myklebust@primarydata.com> wrote:
>>> On Tue, Sep 15, 2015 at 10:27 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>> On Tue, Sep 15, 2015 at 9:39 AM, Trond Myklebust
>>>> <trond.myklebust@primarydata.com> wrote:
>>>>> On Mon, Sep 14, 2015 at 7:54 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>>>> A test case is as the description says:
>>>>>> open(foobar, O_WRONLY);
>>>>>> sleep()  --> reboot the server
>>>>>> close(foobar)
>>>>>>
>>>>>> The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
>>>>>> line before going to restart, there is
>>>>>> clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
>>>>>>
>>>>>> NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
>>>>>> owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
>>>>>> value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
>>>>>> out state and when we go to close it, “call_close” doesn’t get set as
>>>>>> state flag is not set and CLOSE doesn’t go on the wire.
>>>>>>
>>>>>> That line was introduced to fix an infinite loop for OPEN recovery
>>>>>> upon receiving a BAD_STATEID error: commit e8d975e73. I have tested
>>>>>> injecting BAD_STATEID error using the patch below and the code
>>>>>> recovers without problems. However, I'm not sure the clearing of the
>>>>>> bit is needed any more. I have tested for infinite loop by reverting
>>>>>> the patch and didn't hit the infinite loop.
>>>>>>
>>>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>>>>> index da73bc4..5db3246 100644
>>>>>> --- a/fs/nfs/nfs4state.c
>>>>>> +++ b/fs/nfs/nfs4state.c
>>>>>> @@ -1481,7 +1481,7 @@ restart:
>>>>>>                                         spin_unlock(&state->state_lock);
>>>>>>                                 }
>>>>>>                                 nfs4_put_open_state(state);
>>>>>> -                               clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
>>>>>> +                               clear_bit(NFS_STATE_RECLAIM_NOGRACE,
>>>>>>                                         &state->flags);
>>>>>>                                 spin_lock(&sp->so_lock);
>>>>>>                                 goto restart;
>>>>>
>>>>> That's an obvious typo. Thanks for spotting it!
>>>>>
>>>>> As for whether or not the bit clear is needed at all, I think it is
>>>>> for NFSv4 on older kernels. On newer kernels, we do have the NFSv4
>>>>> state recovery drain the slot table (just like we've always done for
>>>>> NFSv4.1) and so I agree that those kernels probably won't be
>>>>> afflicted.
>>>>>
>>>>
>>>> Thanks Trond. Do you need me to resubmit it without the last paragraph
>>>> or is the patch ok as is?
>>>>
>>>
>>> I can easily remove that paragraph when applying the patch, if you
>>> agree that it is superfluous.
>>
>> Thanks. Works for me.
>>
>
> May I also add a signed-off-by line from you? I can't really apply
> this (or any other patches) without it.

Of course.

>
> Cheers
>   Trond
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-09-17 13:36 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-14 23:54 Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount Olga Kornievskaia
2015-09-15 13:39 ` Trond Myklebust
2015-09-15 14:27   ` Olga Kornievskaia
2015-09-15 15:49     ` Trond Myklebust
2015-09-15 16:52       ` Olga Kornievskaia
2015-09-17 13:34         ` Trond Myklebust
2015-09-17 13:36           ` Olga Kornievskaia

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).