All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/1] nfs41: Do not free slot if retried while operation was in progress
@ 2009-09-25  4:29 Benny Halevy
  2009-09-25 15:01 ` Trond Myklebust
  0 siblings, 1 reply; 5+ messages in thread
From: Benny Halevy @ 2009-09-25  4:29 UTC (permalink / raw)
  To: trond.myklebust; +Cc: pnfs, linux-nfs, Benny Halevy

The client cannot reuse a slot while the server's
still working on the previous RPC using it.

The symptom Bruce saw without this patch was occasional
ESERVERFAULT during Connectathon special tests, 30 MB write/read test.

Reported-by: J. Bruce Fields <bfields@fieldses.org>
Tested-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/nfs4proc.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index be6544a..70895d6 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -386,6 +386,12 @@ static void nfs41_sequence_done(struct nfs_client *clp,
 		spin_unlock(&clp->cl_lock);
 		return;
 	}
+
+	/* Do not free slot if retried while operation was in progress */
+	if (res->sr_status == -NFS4ERR_DELAY) {
+		dprintk("%s: Operation in progress\n", __func__);
+		return;
+	}
 out:
 	/* The session may be reset by one of the error handlers. */
 	dprintk("%s: Error %d free the slot \n", __func__, res->sr_status);
-- 
1.6.4


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] nfs41: Do not free slot if retried while operation was in progress
  2009-09-25  4:29 [PATCH 1/1] nfs41: Do not free slot if retried while operation was in progress Benny Halevy
@ 2009-09-25 15:01 ` Trond Myklebust
       [not found]   ` <1253890918.31072.35.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Trond Myklebust @ 2009-09-25 15:01 UTC (permalink / raw)
  To: Benny Halevy; +Cc: pnfs, linux-nfs

On Fri, 2009-09-25 at 07:29 +0300, Benny Halevy wrote:
> The client cannot reuse a slot while the server's
> still working on the previous RPC using it.
> 
> The symptom Bruce saw without this patch was occasional
> ESERVERFAULT during Connectathon special tests, 30 MB write/read test.
> 
> Reported-by: J. Bruce Fields <bfields@fieldses.org>
> Tested-by: J. Bruce Fields <bfields@fieldses.org>
> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
> ---
>  fs/nfs/nfs4proc.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index be6544a..70895d6 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -386,6 +386,12 @@ static void nfs41_sequence_done(struct nfs_client *clp,
>  		spin_unlock(&clp->cl_lock);
>  		return;
>  	}
> +
> +	/* Do not free slot if retried while operation was in progress */
> +	if (res->sr_status == -NFS4ERR_DELAY) {
> +		dprintk("%s: Operation in progress\n", __func__);
> +		return;
> +	}
>  out:
>  	/* The session may be reset by one of the error handlers. */
>  	dprintk("%s: Error %d free the slot \n", __func__, res->sr_status);

OK... Can somebody please tell me how this is supposed to work? As far
as I can see, most callers of nfs41_sequence_done() are calling
nfs41_sequence_free_slot() themselves.

IOW: Who is responsible for calling nfs41_sequence_free_slot()? Is it
nfs41_sequence_done(), or is it the callers, or is it some combination
of the two (and if so, what determines who calls it)?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] nfs41: Do not free slot if retried while operation was in progress
       [not found]   ` <1253890918.31072.35.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2009-09-25 18:07     ` Andy Adamson
  2009-09-25 23:04       ` Trond Myklebust
  0 siblings, 1 reply; 5+ messages in thread
From: Andy Adamson @ 2009-09-25 18:07 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Benny Halevy, pnfs, linux-nfs


On Sep 25, 2009, at 11:01 AM, Trond Myklebust wrote:

> On Fri, 2009-09-25 at 07:29 +0300, Benny Halevy wrote:
>> The client cannot reuse a slot while the server's
>> still working on the previous RPC using it.
>>
>> The symptom Bruce saw without this patch was occasional
>> ESERVERFAULT during Connectathon special tests, 30 MB write/read  
>> test.
>>
>> Reported-by: J. Bruce Fields <bfields@fieldses.org>
>> Tested-by: J. Bruce Fields <bfields@fieldses.org>
>> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
>> ---
>> fs/nfs/nfs4proc.c |    6 ++++++
>> 1 files changed, 6 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index be6544a..70895d6 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -386,6 +386,12 @@ static void nfs41_sequence_done(struct  
>> nfs_client *clp,
>> 		spin_unlock(&clp->cl_lock);
>> 		return;
>> 	}
>> +
>> +	/* Do not free slot if retried while operation was in progress */
>> +	if (res->sr_status == -NFS4ERR_DELAY) {
>> +		dprintk("%s: Operation in progress\n", __func__);
>> +		return;
>> +	}
>> out:
>> 	/* The session may be reset by one of the error handlers. */
>> 	dprintk("%s: Error %d free the slot \n", __func__, res->sr_status);
>
> OK... Can somebody please tell me how this is supposed to work? As far
> as I can see, most callers of nfs41_sequence_done() are calling
> nfs41_sequence_free_slot() themselves.
>
> IOW: Who is responsible for calling nfs41_sequence_free_slot()? Is it
> nfs41_sequence_done(), or is it the callers, or is it some combination
> of the two (and if so, what determines who calls it)?

It's a combination.

nfs41_sequence_free_slot was separated from nfs41_sequence_done so  
that the error handling code such as nfs4_async_handle_error or  
calling rpc_restart could retry a call using the same session slot -  
we didn't want to give up the slot just to wait for a new slot.

So  nfs41_sequence_free_slot  is called by the callers of  
nfs41_sequence_done when:
1)  the caller either does not try to handle and error or restart.  
nfs4_sequence_done_free_slot is used for this,  
pnfs_layoutcommit_rpc_done is also an example.

2) The restart attempt is also done by the caller.  nfs4_close_done is  
an example where nfs4_async_handle_error is called in between the  
calls to nfs4_sequence_done and nfs4_sequence free_slot.

nfs41_sequence_free_slot is not called by the caller of  
nfs41_sequence_done when:

1) the caller does not contain the restart code.  The read and write  
paths are an example of this. For read, nfs4_sequence_done is called  
in nfs4_read_done  but the retry code is called in nfs_readpage_retry  
which is where the corresponding nfs4_sequence_free_slot is called.  
For write, nfs4_sequence_done is called in nfs4_write_done  but the  
retry code is in nfs_writeback_done where nfs4_sequence_free_slot is  
called.

-->Andy

>
> -- 
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> Trond.Myklebust@netapp.com
> www.netapp.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs"  
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] nfs41: Do not free slot if retried while operation was in progress
  2009-09-25 18:07     ` Andy Adamson
@ 2009-09-25 23:04       ` Trond Myklebust
       [not found]         ` <1253919848.3549.22.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Trond Myklebust @ 2009-09-25 23:04 UTC (permalink / raw)
  To: Andy Adamson; +Cc: Benny Halevy, pnfs, linux-nfs

On Fri, 2009-09-25 at 14:07 -0400, Andy Adamson wrote:
> On Sep 25, 2009, at 11:01 AM, Trond Myklebust wrote:
> 
> > On Fri, 2009-09-25 at 07:29 +0300, Benny Halevy wrote:
> >> The client cannot reuse a slot while the server's
> >> still working on the previous RPC using it.
> >>
> >> The symptom Bruce saw without this patch was occasional
> >> ESERVERFAULT during Connectathon special tests, 30 MB write/read  
> >> test.
> >>
> >> Reported-by: J. Bruce Fields <bfields@fieldses.org>
> >> Tested-by: J. Bruce Fields <bfields@fieldses.org>
> >> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
> >> ---
> >> fs/nfs/nfs4proc.c |    6 ++++++
> >> 1 files changed, 6 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> >> index be6544a..70895d6 100644
> >> --- a/fs/nfs/nfs4proc.c
> >> +++ b/fs/nfs/nfs4proc.c
> >> @@ -386,6 +386,12 @@ static void nfs41_sequence_done(struct  
> >> nfs_client *clp,
> >> 		spin_unlock(&clp->cl_lock);
> >> 		return;
> >> 	}
> >> +
> >> +	/* Do not free slot if retried while operation was in progress */
> >> +	if (res->sr_status == -NFS4ERR_DELAY) {
> >> +		dprintk("%s: Operation in progress\n", __func__);
> >> +		return;
> >> +	}
> >> out:
> >> 	/* The session may be reset by one of the error handlers. */
> >> 	dprintk("%s: Error %d free the slot \n", __func__, res->sr_status);
> >
> > OK... Can somebody please tell me how this is supposed to work? As far
> > as I can see, most callers of nfs41_sequence_done() are calling
> > nfs41_sequence_free_slot() themselves.
> >
> > IOW: Who is responsible for calling nfs41_sequence_free_slot()? Is it
> > nfs41_sequence_done(), or is it the callers, or is it some combination
> > of the two (and if so, what determines who calls it)?
> 
> It's a combination.
> 
> nfs41_sequence_free_slot was separated from nfs41_sequence_done so  
> that the error handling code such as nfs4_async_handle_error or  
> calling rpc_restart could retry a call using the same session slot -  
> we didn't want to give up the slot just to wait for a new slot.
> 
> So  nfs41_sequence_free_slot  is called by the callers of  
> nfs41_sequence_done when:
> 1)  the caller either does not try to handle and error or restart.  
> nfs4_sequence_done_free_slot is used for this,  
> pnfs_layoutcommit_rpc_done is also an example.
> 
> 2) The restart attempt is also done by the caller.  nfs4_close_done is  
> an example where nfs4_async_handle_error is called in between the  
> calls to nfs4_sequence_done and nfs4_sequence free_slot.
> 
> nfs41_sequence_free_slot is not called by the caller of  
> nfs41_sequence_done when:
> 
> 1) the caller does not contain the restart code.  The read and write  
> paths are an example of this. For read, nfs4_sequence_done is called  
> in nfs4_read_done  but the retry code is called in nfs_readpage_retry  
> which is where the corresponding nfs4_sequence_free_slot is called.  
> For write, nfs4_sequence_done is called in nfs4_write_done  but the  
> retry code is in nfs_writeback_done where nfs4_sequence_free_slot is  
> called.

So...

Look for instance at nfs41_call_sync_done(), which will call
nfs41_sequence_free_slot() _twice_ if res->sr_status is not 0 or 1, or
if res->sr_slotid == NFS4_MAX_SLOT_TABLE.

Ditto in nfs4_get_lease_time_done(), unless task->tk_status happens to
be NFS4ERR_DELAY.

IOW: Afaics, all errors that happen to the SEQUENCE operation (e.g.
NFS4ERR_BADSESSION, NFS4ERR_CONN_NOT_BOUND_TO_SESSION,
NFS4ERR_DEADSESSION, ...) will apparently result in a double call to
nfs41_sequence_free_slot().

Please could we make a decision here. Either all callers are responsible
for freeing the slot, or you have to ensure that they can safely call
nfs41_sequence_free_slot twice.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] nfs41: Do not free slot if retried while operation was in progress
       [not found]         ` <1253919848.3549.22.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2009-09-27  9:13           ` Benny Halevy
  0 siblings, 0 replies; 5+ messages in thread
From: Benny Halevy @ 2009-09-27  9:13 UTC (permalink / raw)
  To: Trond Myklebust, Andy Adamson; +Cc: pnfs, linux-nfs

On Sep. 26, 2009, 2:04 +0300, Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> On Fri, 2009-09-25 at 14:07 -0400, Andy Adamson wrote:
>> On Sep 25, 2009, at 11:01 AM, Trond Myklebust wrote:
>>
>>> On Fri, 2009-09-25 at 07:29 +0300, Benny Halevy wrote:
>>>> The client cannot reuse a slot while the server's
>>>> still working on the previous RPC using it.
>>>>
>>>> The symptom Bruce saw without this patch was occasional
>>>> ESERVERFAULT during Connectathon special tests, 30 MB write/read  
>>>> test.
>>>>
>>>> Reported-by: J. Bruce Fields <bfields@fieldses.org>
>>>> Tested-by: J. Bruce Fields <bfields@fieldses.org>
>>>> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
>>>> ---
>>>> fs/nfs/nfs4proc.c |    6 ++++++
>>>> 1 files changed, 6 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>> index be6544a..70895d6 100644
>>>> --- a/fs/nfs/nfs4proc.c
>>>> +++ b/fs/nfs/nfs4proc.c
>>>> @@ -386,6 +386,12 @@ static void nfs41_sequence_done(struct  
>>>> nfs_client *clp,
>>>> 		spin_unlock(&clp->cl_lock);
>>>> 		return;
>>>> 	}
>>>> +
>>>> +	/* Do not free slot if retried while operation was in progress */
>>>> +	if (res->sr_status == -NFS4ERR_DELAY) {
>>>> +		dprintk("%s: Operation in progress\n", __func__);
>>>> +		return;
>>>> +	}
>>>> out:
>>>> 	/* The session may be reset by one of the error handlers. */
>>>> 	dprintk("%s: Error %d free the slot \n", __func__, res->sr_status);
>>> OK... Can somebody please tell me how this is supposed to work? As far
>>> as I can see, most callers of nfs41_sequence_done() are calling
>>> nfs41_sequence_free_slot() themselves.
>>>
>>> IOW: Who is responsible for calling nfs41_sequence_free_slot()? Is it
>>> nfs41_sequence_done(), or is it the callers, or is it some combination
>>> of the two (and if so, what determines who calls it)?
>> It's a combination.
>>
>> nfs41_sequence_free_slot was separated from nfs41_sequence_done so  
>> that the error handling code such as nfs4_async_handle_error or  
>> calling rpc_restart could retry a call using the same session slot -  
>> we didn't want to give up the slot just to wait for a new slot.
>>
>> So  nfs41_sequence_free_slot  is called by the callers of  
>> nfs41_sequence_done when:
>> 1)  the caller either does not try to handle and error or restart.  
>> nfs4_sequence_done_free_slot is used for this,  
>> pnfs_layoutcommit_rpc_done is also an example.
>>
>> 2) The restart attempt is also done by the caller.  nfs4_close_done is  
>> an example where nfs4_async_handle_error is called in between the  
>> calls to nfs4_sequence_done and nfs4_sequence free_slot.
>>
>> nfs41_sequence_free_slot is not called by the caller of  
>> nfs41_sequence_done when:
>>
>> 1) the caller does not contain the restart code.  The read and write  
>> paths are an example of this. For read, nfs4_sequence_done is called  
>> in nfs4_read_done  but the retry code is called in nfs_readpage_retry  
>> which is where the corresponding nfs4_sequence_free_slot is called.  
>> For write, nfs4_sequence_done is called in nfs4_write_done  but the  
>> retry code is in nfs_writeback_done where nfs4_sequence_free_slot is  
>> called.
> 
> So...
> 
> Look for instance at nfs41_call_sync_done(), which will call
> nfs41_sequence_free_slot() _twice_ if res->sr_status is not 0 or 1, or
> if res->sr_slotid == NFS4_MAX_SLOT_TABLE.
> 
> Ditto in nfs4_get_lease_time_done(), unless task->tk_status happens to
> be NFS4ERR_DELAY.
> 
> IOW: Afaics, all errors that happen to the SEQUENCE operation (e.g.
> NFS4ERR_BADSESSION, NFS4ERR_CONN_NOT_BOUND_TO_SESSION,
> NFS4ERR_DEADSESSION, ...) will apparently result in a double call to
> nfs41_sequence_free_slot().
> 
> Please could we make a decision here. Either all callers are responsible
> for freeing the slot, or you have to ensure that they can safely call
> nfs41_sequence_free_slot twice.
> 

Agreed.

Benny

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2009-09-27  9:13 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-09-25  4:29 [PATCH 1/1] nfs41: Do not free slot if retried while operation was in progress Benny Halevy
2009-09-25 15:01 ` Trond Myklebust
     [not found]   ` <1253890918.31072.35.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2009-09-25 18:07     ` Andy Adamson
2009-09-25 23:04       ` Trond Myklebust
     [not found]         ` <1253919848.3549.22.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2009-09-27  9:13           ` Benny Halevy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.