* [PATCH 1/2] nfsd: Reset cb_status in nfsd4_cb_prepare() at retrying
From: Kinglong Mee @ 2015-06-02 10:59 UTC
To: J. Bruce Fields, linux-nfs@vger.kernel.org
Cc: Christoph Hellwig, Trond Myklebust, kinglongmee
nfsd enters an infinite loop and prints this message about every 10 seconds:

May 31 18:33:52 test-server kernel: Error sending entire callback!
May 31 18:34:01 test-server kernel: Error sending entire callback!

It is caused by a cb_layoutreturn getting error -10008 (NFS4ERR_DELAY),
after which the client crashes and nfsd enters the infinite loop:

bc_sendto --> call_timeout --> nfsd4_cb_done --> nfsd4_cb_layout_done
with error -10008 --> rpc_delay(task, HZ/100) --> bc_sendto ...

cb_status is not cleared between attempts, so a retry that times out is
still handled as NFS4ERR_DELAY and is queued again. Reset cb_status in
nfsd4_cb_prepare() so that each attempt starts clean.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
---
fs/nfsd/nfs4callback.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 5694cfb..8b1ac8d 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -875,6 +875,7 @@ static void nfsd4_cb_prepare(struct rpc_task *task, void *calldata)
 	u32 minorversion = clp->cl_minorversion;
 
 	cb->cb_minorversion = minorversion;
+	cb->cb_status = 0;
 	if (minorversion) {
 		if (!nfsd41_cb_get_slot(clp, task))
 			return;
--
2.4.2
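
The loop described in the changelog can be modelled in plain user-space C.
This is only a sketch under stated assumptions, not kernel code: struct
cb_model, prepare(), done() and run_callback() are hypothetical names made up
for illustration, and the model assumes that the completion step prefers a
non-zero cb_status over the transport status, which is how a stale
NFS4ERR_DELAY would survive a timed-out retry. In this model a plain timeout
simply ends the task.

```c
#include <assert.h>
#include <errno.h>

#define NFS4ERR_DELAY 10008	/* error code cited in the changelog */

/* Hypothetical stand-in for the callback state; only cb_status matters here. */
struct cb_model {
	int cb_status;
};

/* Models the prepare step; reset_status mirrors the one-line patch. */
static void prepare(struct cb_model *cb, int reset_status)
{
	if (reset_status)
		cb->cb_status = 0;
}

/*
 * Models the completion step. Assumption: a non-zero cb_status replaces
 * the transport status. Returns 1 if the task would be queued again.
 */
static int done(const struct cb_model *cb, int tk_status)
{
	if (cb->cb_status)
		tk_status = cb->cb_status;
	return tk_status == -NFS4ERR_DELAY;
}

/*
 * Runs the modelled callback and returns how many attempts were made.
 * max_attempts caps the loop so the unpatched case terminates.
 */
int run_callback(int reset_status, int max_attempts)
{
	struct cb_model cb = { 0 };
	int attempts = 0;

	assert(max_attempts > 0);

	while (attempts < max_attempts) {
		int tk_status;

		prepare(&cb, reset_status);
		attempts++;

		if (attempts == 1) {
			/* Client is still alive and answers NFS4ERR_DELAY. */
			cb.cb_status = -NFS4ERR_DELAY;
			tk_status = 0;
		} else {
			/* Client has crashed: the send times out, nothing is decoded. */
			tk_status = -ETIMEDOUT;
		}

		if (!done(&cb, tk_status))
			break;
	}
	return attempts;
}
```

Under these assumptions, run_callback(0, 10) uses all 10 attempts because the
stale status keeps requesting a retry, while run_callback(1, 10) stops after 2:
the first attempt gets NFS4ERR_DELAY and the second sees the plain timeout.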
* Re: [PATCH 1/2] nfsd: Reset cb_status in nfsd4_cb_prepare() at retrying
From: J. Bruce Fields @ 2015-06-03 15:03 UTC
To: Kinglong Mee
Cc: linux-nfs@vger.kernel.org, Christoph Hellwig, Trond Myklebust
On Tue, Jun 02, 2015 at 06:59:19PM +0800, Kinglong Mee wrote:
> nfsd enters an infinite loop and prints this message about every 10 seconds:
>
> May 31 18:33:52 test-server kernel: Error sending entire callback!
> May 31 18:34:01 test-server kernel: Error sending entire callback!
>
> It is caused by a cb_layoutreturn getting error -10008 (NFS4ERR_DELAY),
> after which the client crashes and nfsd enters the infinite loop:
>
> bc_sendto --> call_timeout --> nfsd4_cb_done --> nfsd4_cb_layout_done
> with error -10008 --> rpc_delay(task, HZ/100) --> bc_sendto ...

How are you reproducing this?
--b.
* Re: [PATCH 1/2] nfsd: Reset cb_status in nfsd4_cb_prepare() at retrying
From: Kinglong Mee @ 2015-06-04 0:06 UTC
To: J. Bruce Fields
Cc: linux-nfs@vger.kernel.org, Christoph Hellwig, Trond Myklebust
On 6/3/2015 11:03 PM, J. Bruce Fields wrote:
> How are you reproducing this?

I reproduce it with xfstests 074, with kdump enabled on the NFS client,
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT set, and the client's blkmapd stopped.

1. The NFS client's write operation gets the layout of the file,
   and then issues GETDEVICEINFO.
2. The client does not record the layout segment because blkmapd is down.
3. The client writes the data by sending WRITE to the server.
4. The NFS server recalls the layout of the file before handling the WRITE.
5. A network error causes the client to reset the session and return
   NFS4ERR_DELAY.
6. The client's WRITE operation keeps waiting for the reply;
   if the task hangs for 120s, the client crashes.
7. As a result, the next bc_sendto fails with a timeout
   while cb_status is still NFS4ERR_DELAY.

thanks,
Kinglong Mee
* Re: [PATCH 1/2] nfsd: Reset cb_status in nfsd4_cb_prepare() at retrying
From: J. Bruce Fields @ 2015-06-04 20:41 UTC
To: Kinglong Mee
Cc: linux-nfs@vger.kernel.org, Christoph Hellwig, Trond Myklebust
On Thu, Jun 04, 2015 at 08:06:57AM +0800, Kinglong Mee wrote:
> On 6/3/2015 11:03 PM, J. Bruce Fields wrote:
> > How are you reproducing this?
>
> I reproduce it with xfstests 074, with kdump enabled on the NFS client,
> CONFIG_DEFAULT_HUNG_TASK_TIMEOUT set, and the client's blkmapd stopped.
>
> 1. The NFS client's write operation gets the layout of the file,
>    and then issues GETDEVICEINFO.
> 2. The client does not record the layout segment because blkmapd is down.
> 3. The client writes the data by sending WRITE to the server.
> 4. The NFS server recalls the layout of the file before handling the WRITE.
> 5. A network error causes the client to reset the session and return
>    NFS4ERR_DELAY.
> 6. The client's WRITE operation keeps waiting for the reply;
>    if the task hangs for 120s, the client crashes.
> 7. As a result, the next bc_sendto fails with a timeout
>    while cb_status is still NFS4ERR_DELAY.

OK, that's complicated. Sounds like you're giving this code a
workout--thanks. I'll add the reproducer to the changelog....
--b.