* Re: [pnfs] nfs41_sequence_done
From: Benny Halevy @ 2010-07-12 18:58 UTC
  To: Jim Rees; +Cc: NFS list, Trond Myklebust

[pnfs@linux-nfs.org -> linux-nfs@vger.kernel.org]

On Jul. 12, 2010, 21:29 +0300, Jim Rees <rees@umich.edu> wrote:
> Does anyone still care about this?
> 
>  WARNING: nfs41_sequence_done: Operation in progress slot=1 seq=7 highest_used_slotid=1: please report to pnfs@linux-nfs.org if you saw this message

Heh, need to update hard-coded instructions to point to the new list...

> 
> I'm getting this on the client side of a pnfs block layout mount against the
> spnfs server.  Kernel is benny's pnfs-all-2.6.35-rc3-2010-07-01 plus EMC
> complex block layout patches.  It's possible the complex layout code is to
> blame, but I doubt it because this isn't a complex layout mount.  I can
> provide more details.

I agree.  This is a generic issue.
The patch that adds this check is
d6ce9ad DEVONLY: nfs41: Do not free slot if retried while operation was in progress

It was originally rejected (http://www.spinics.net/lists/linux-nfs/msg09562.html)
due to noise regarding where nfs41_sequence_free_slot is called,
but that masked the real issue.

Can you readily reproduce this?
Can you also debug the server side to see whether the client indeed retries the RPC
while it is still in progress on the server?

Benny


* Re: [pnfs] nfs41_sequence_done
From: Trond Myklebust @ 2010-07-12 19:14 UTC
  To: Benny Halevy; +Cc: Jim Rees, NFS list

On Mon, 2010-07-12 at 21:58 +0300, Benny Halevy wrote:
> I agree.  This is a generic issue.
> The patch that  adds this check is
> d6ce9ad DEVONLY: nfs41: Do not free slot if retried while operation was in progress
> 
> It was originally rejected (http://www.spinics.net/lists/linux-nfs/msg09562.html)
> due to noise regarding where nfs41_sequence_free_slot is called
> but that masked the real issue.
> 
> Can you readily reproduce this?
> Can you debug also the server side to see if indeed the client retries the RPC
> while it is in progress on the server?

So what is the root cause here? Is it the known issue that we don't deal
correctly with an NFS4ERR_DELAY on the SEQUENCE operation?

Trond


* Re: [pnfs] nfs41_sequence_done
From: Benny Halevy @ 2010-07-12 19:16 UTC
  To: Trond Myklebust; +Cc: Jim Rees, NFS list

On Jul. 12, 2010, 22:14 +0300, Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> So what is the root cause here? Is it the known issue that we don't deal
> correctly with an NFS4ERR_DELAY on the SEQUENCE operation?

Yes.


* Re: [pnfs] nfs41_sequence_done
From: Trond Myklebust @ 2010-07-12 19:26 UTC
  To: Benny Halevy; +Cc: Jim Rees, NFS list

On Mon, 2010-07-12 at 22:16 +0300, Benny Halevy wrote:
> On Jul. 12, 2010, 22:14 +0300, Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> > So what is the root cause here? Is it the known issue that we don't deal
> > correctly with an NFS4ERR_DELAY on the SEQUENCE operation?
> 
> Yes.

I'm happy to take patches to fix that.

Trond


* [PATCH] nfs41: Do not free slot if retried while operation was in progress
From: Benny Halevy @ 2010-07-12 19:49 UTC
  To: Trond Myklebust; +Cc: Jim Rees, NFS list

On Jul. 12, 2010, 22:26 +0300, Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> I'm happy to take patches to fix that.
> 
> Trond

From 7dc3c468463a337dabff7f714a3475e3f51380f6 Mon Sep 17 00:00:00 2001
From: Benny Halevy <bhalevy@panasas.com>
Date: Mon, 12 Jul 2010 22:42:15 +0300
Subject: [PATCH] nfs41: Do not free slot if retried while operation was in progress

Getting NFS4ERR_DELAY on OP_SEQUENCE means that the compound was retried
while it's still in progress on the server.  Therefore its respective
slot must not be freed and reused for other compounds until it either
succeeds or fails with another error status.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---

That fixed, do we ensure that the client either closes or loses the connection
before retrying?

 fs/nfs/nfs4proc.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 70015dd..baf86b9 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -425,6 +425,12 @@ static void nfs41_sequence_done(struct nfs_client *clp,
 		/* Check sequence flags */
 		if (atomic_read(&clp->cl_count) > 1)
 			nfs41_handle_sequence_flag_errors(clp, res->sr_status_flags);
+	} else if (unlikely(res->sr_status == -NFS4ERR_DELAY)) {
+		/* Do not free slot if retried while operation was in progress */
+		tbl = &res->sr_session->fc_slot_table;
+		dprintk("%s: slot=%d seq=%d: Operation in progress\n", __func__,
+			res->sr_slotid, tbl->slots[res->sr_slotid].seq_nr);
+		return;
 	}
 out:
 	/* The session may be reset by one of the error handlers. */
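
The rule stated in the commit message can be sketched as a small user-space
model.  This is only an illustration under stated assumptions, not the kernel
code: struct slot, struct slot_table, MAX_SLOTS and sequence_done() are
hypothetical names, and bumping the sequence number only on success is a
simplification made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define NFS4_OK       0
#define NFS4ERR_DELAY 10008  /* numeric value defined by the NFSv4 protocol */
#define MAX_SLOTS     4      /* arbitrary table size for this sketch */

/* Hypothetical, simplified stand-ins for the client's slot table. */
struct slot {
	unsigned int seq_nr;  /* sequence number used on this slot */
	bool in_use;          /* true while a compound owns the slot */
};

struct slot_table {
	struct slot slots[MAX_SLOTS];
};

/*
 * Model of the completion path for SEQUENCE.
 * Returns true if the slot was released for reuse, false if it is kept
 * because the server reported the operation as still in progress.
 */
static bool sequence_done(struct slot_table *tbl, unsigned int slotid,
			  int status)
{
	struct slot *slot;

	assert(slotid < MAX_SLOTS);
	slot = &tbl->slots[slotid];

	if (status == -NFS4ERR_DELAY) {
		/* Retried while in progress: keep the slot and its seq_nr. */
		return false;
	}

	if (status == NFS4_OK)
		slot->seq_nr++;  /* assumption: advance only on success */

	slot->in_use = false;    /* any other outcome releases the slot */
	return true;
}
```

With a slot at seq_nr 7, a completion with -NFS4ERR_DELAY leaves the slot held
and the sequence number unchanged, so no other compound can reuse it; a later
NFS4_OK completion advances the sequence number and releases the slot.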


* Re: [PATCH] nfs41: Do not free slot if retried while operation was in progress
From: Trond Myklebust @ 2010-07-12 19:59 UTC
  To: Benny Halevy; +Cc: Jim Rees, NFS list

On Mon, 2010-07-12 at 22:49 +0300, Benny Halevy wrote:
> From 7dc3c468463a337dabff7f714a3475e3f51380f6 Mon Sep 17 00:00:00 2001
> From: Benny Halevy <bhalevy@panasas.com>
> Date: Mon, 12 Jul 2010 22:42:15 +0300
> Subject: [PATCH] nfs41: Do not free slot if retried while operation was in progress
> 
> Getting NFS4ERR_DELAY on OP_SEQUENCE means that the compound was retried
> while it's still in progress on the server.  Therefore its respective
> slot must not be freed and reused for other compounds until it either
> succeeds or fails with another error status.
> 
> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
> ---
> 
> That fixed, do we ensure that the client either closes or loses the connection
> before retrying?
> 
>  fs/nfs/nfs4proc.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 70015dd..baf86b9 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -425,6 +425,12 @@ static void nfs41_sequence_done(struct nfs_client *clp,
>  		/* Check sequence flags */
>  		if (atomic_read(&clp->cl_count) > 1)
>  			nfs41_handle_sequence_flag_errors(clp, res->sr_status_flags);
> +	} else if (unlikely(res->sr_status == -NFS4ERR_DELAY)) {
> +		/* Do not free slot if retried while operation was in progress */
> +		tbl = &res->sr_session->fc_slot_table;
> +		dprintk("%s: slot=%d seq=%d: Operation in progress\n", __func__,
> +			res->sr_slotid, tbl->slots[res->sr_slotid].seq_nr);
> +		return;
>  	}
>  out:
>  	/* The session may be reset by one of the error handlers. */

No. That is very clearly insufficient...


Never mind. I'll do it...
