From: Boaz Harrosh <bharrosh@panasas.com>
To: Jeff Layton <jlayton@redhat.com>
Cc: bfields <bfields@fieldses.org>, Steve Dickson <steved@redhat.com>,
	"Myklebust, Trond" <Trond.Myklebust@netapp.com>,
	Joerg Platte <jplatte@naasa.net>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	Hans de Bruin <jmdebruin@xmsnet.nl>
Subject: Re: Kernel 3.4.X NFS server regression
Date: Mon, 11 Jun 2012 18:04:12 +0300
Message-ID: <4FD608EC.2050307@panasas.com>
In-Reply-To: <20120611105515.3b99942c@tlielax.poochiereds.net>

On 06/11/2012 05:55 PM, Jeff Layton wrote:

> On Mon, 11 Jun 2012 17:45:06 +0300
> Boaz Harrosh <bharrosh@panasas.com> wrote:
> 
>> On 06/11/2012 05:11 PM, Jeff Layton wrote:
>>
>>> On Mon, 11 Jun 2012 17:05:28 +0300
>>> Boaz Harrosh <bharrosh@panasas.com> wrote:
>>>
>>>> On 06/11/2012 04:51 PM, Jeff Layton wrote:
>>>>
>>>>>
>>>>> That was considered here, but the problem with the usermode helper is
>>>>> that you can't pass anything back to the kernel but a simple status
>>>>> code (and that's assuming that you wait for it to exit). In the near
>>>>> future, we'll need to pass back more info to the kernel for this, so
>>>>> the usermode helper callout wasn't suitable.
>>>>>
>>>>
>>>>
>>>> I answered that in my mail; repeating it here. Well, you made
>>>> a simple mistake, because it is *easy* to pass back any amount
>>>> of information from user mode.
>>>>
>>>> You just set up sysfs entry points where the answers are written
>>>> back. It's an easy trick to set up a thread-safe scheme with a
>>>> cookie, but 90% of the time you don't have to. Say you set up
>>>> a per-client structure (identified uniquely); then user mode
>>>> answers back per client, and concurrency will do no harm, since
>>>> the same question always gets the same answer. And so on; each
>>>> problem has its own solution.
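>>>>
>>>> To make it concrete, here is a bare-bones sketch of the kernel
>>>> side. The names ("recovery_answer", "recovery_demo") are made up,
>>>> not actual nfsd code; the point is only that a sysfs ->store()
>>>> hook can receive arbitrary data back from user mode:
>>>>
>>>> #include <linux/kobject.h>
>>>> #include <linux/module.h>
>>>> #include <linux/string.h>
>>>> #include <linux/sysfs.h>
>>>>
>>>> static char answer_buf[128];	/* answer written back by user mode */
>>>>
>>>> static ssize_t recovery_answer_store(struct kobject *kobj,
>>>> 				     struct kobj_attribute *attr,
>>>> 				     const char *buf, size_t count)
>>>> {
>>>> 	/* Concurrent writers are harmless: the same question always
>>>> 	 * gets the same answer, so last-writer-wins is fine. */
>>>> 	strlcpy(answer_buf, buf, sizeof(answer_buf));
>>>> 	return count;
>>>> }
>>>>
>>>> static struct kobj_attribute recovery_answer_attr =
>>>> 	__ATTR(recovery_answer, 0200, NULL, recovery_answer_store);
>>>>
>>>> static struct kobject *demo_kobj;
>>>>
>>>> static int __init recovery_demo_init(void)
>>>> {
>>>> 	demo_kobj = kobject_create_and_add("recovery_demo", kernel_kobj);
>>>> 	if (!demo_kobj)
>>>> 		return -ENOMEM;
>>>> 	return sysfs_create_file(demo_kobj, &recovery_answer_attr.attr);
>>>> }
>>>> module_init(recovery_demo_init);
>>>> MODULE_LICENSE("GPL");
>>>>
>>>> The user-mode helper then answers with nothing more exotic than
>>>> echo foo > /sys/kernel/recovery_demo/recovery_answer.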
>>>>
>>>> If you want, we can talk about this; it would be easy for me to set up
>>>> a toll-free conference number we can all use.
>>>
>>> That helpful advice would have been welcome about 3-4 months ago,
>>> when I first proposed this in detail. At that point you'd be working
>>> with multiple upcall/downcall mechanisms, which was something I was
>>> keen to avoid.
>>>
>>> I'm not opposed to moving in that direction, but it basically means
>>> you're going to rip out everything I've got here so far and replace it.
>>>
>>> If you're willing to do that work, I'll be happy to work with you on
>>> it, but I don't have the time or inclination to do that on my own right
>>> now.
>>>
>>
>>
>> No such luck, sorry. I wish I could, but coming from a competing server
>> company, you can imagine the priority of that ever happening.
>> (Even though I use the Linux server every day for my development and
>>  am still putting lots of effort into it, mainly in pNFS.)
>>
>> Hopefully, on re-examining the code, it could all be salvaged just the
>> same, only with lots of code thrown away.
>>
>> But meanwhile, please address my concern below:
>> Boaz Harrosh wrote:
>>
>>> One more thing, the most important one. We already fixed this in the
>>> past, and I was hoping the lesson was learned. Apparently it was not,
>>> and we are doomed to repeat this mistake forever!!
>>>
>>> Whatever fails, times out, or crashes in the recovery code, we don't
>>> give a damn. It should never affect any server-client communication.
>>>
>>> When the grace period ends, the client gates open, period. *Any* error
>>> returned from the state recovery code must be carefully ignored and
>>> normal operations resumed. At most, on error, we move into a mode where
>>> any recovery request from a client is accepted, since we don't have any
>>> better data to verify it against.
>>>
>>> Please comb the recovery code to make sure any catastrophe is safely
>>> ignored. We already did that before and it used to work.
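>>>
>>> Something along these lines is all I ask -- a sketch with made-up
>>> names (demo_client, recovery_upcall), not the real nfsd code:
>>>
>>> #include <linux/printk.h>
>>> #include <linux/types.h>
>>>
>>> struct demo_client;			/* stand-in per-client struct */
>>> int recovery_upcall(struct demo_client *clp);	/* may fail or hang */
>>>
>>> static bool accept_unverified_reclaims;
>>>
>>> static void track_client_state(struct demo_client *clp)
>>> {
>>> 	int error = recovery_upcall(clp);
>>>
>>> 	if (error) {
>>> 		pr_warn("state recovery upcall failed (%d), ignoring\n",
>>> 			error);
>>> 		/* Fall back: accept reclaims we can no longer verify. */
>>> 		accept_unverified_reclaims = true;
>>> 	}
>>> 	/* The error stops here -- normal client I/O proceeds. */
>>> }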
>>
>>
>> We should make sure that any state recovery code does not interfere with
>> regular operations, and that it fails gracefully / shuts up.
>>
>> We used to have that; apparently it re-broke. Clients should always be
>> granted access after the grace period, and the server should be made
>> sure not to fail in any situation.
>>
>> I would look into it, but I'm not up to date anymore; I wish you or
>> Bruce could.
>>
>> Thanks for your work so far; sorry to be the bearer of bad news.
>> Boaz
> 
> This problem turned out to be a fairly straightforward bug in the
> rpc_pipefs queue timeout mechanism that was causing the laundromat job
> to hang and hence to keep the state lock locked. I just sent a patch
> that should fix it.
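> 
> For reference, the shape of that fix -- reconstructed here from the
> patch subject ("rpc_pipefs: allow rpc_purge_list to take a NULL waitq
> pointer"); the actual patch may differ in detail -- is to make the
> purge path in net/sunrpc/rpc_pipe.c tolerate a queue with no waiters:
> 
> static void
> rpc_purge_list(wait_queue_head_t *waitq, struct list_head *head,
> 	       void (*destroy_msg)(struct rpc_pipe_msg *), int err)
> {
> 	struct rpc_pipe_msg *msg;
> 
> 	if (list_empty(head))
> 		return;
> 	do {
> 		msg = list_entry(head->next, struct rpc_pipe_msg, list);
> 		list_del_init(&msg->list);
> 		msg->errno = err;
> 		destroy_msg(msg);
> 	} while (!list_empty(head));
> 
> 	if (waitq)		/* NULL means nobody to wake */
> 		wake_up(waitq);
> }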
> 
> I guess I'm not clear on what you're saying is broken. Modulo the
> original bug here, clients are allowed access after the grace period
> whether the upcalls are working or not.
> 
> What we cannot allow is reclaim requests outside of the grace period,
> since we can't verify whether there was conflicting state in the
> interim period. That's true whether the server has a functioning client
> tracking mechanism or not.
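> 
> Put as code (check_reclaim_allowed is a made-up wrapper, but
> locks_in_grace() and the nfsd error codes are real), the rule is
> simply:
> 
> static __be32 check_reclaim_allowed(void)
> {
> 	if (!locks_in_grace())
> 		return nfserr_no_grace;
> 	return nfs_ok;
> }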
> 


I agree. Sorry we keep communicating on two different threads; disregard
my other last mail.

Sounds good then. My point is that we should be very defensive about the
state recovery code not getting in our way.

Thanks
Boaz


Thread overview: 24+ messages
     [not found] <4FD47D4E.9070307@naasa.net>
2012-06-10 11:43 ` Kernel 3.4.X NFS regression Boaz Harrosh
2012-06-10 15:00 ` Kernel 3.4.X NFS server regression Myklebust, Trond
2012-06-11 12:16   ` bfields
2012-06-11 12:39     ` Jeff Layton
2012-06-11 13:13       ` Jeff Layton
2012-06-11 13:25         ` Jörg Platte
2012-06-11 14:20           ` Jeff Layton
2012-06-11 15:55             ` Joerg Platte
2012-06-11 13:32       ` Boaz Harrosh
2012-06-11 13:44         ` Boaz Harrosh
2012-06-11 14:29           ` Jeff Layton
2012-06-11 15:01             ` Boaz Harrosh
2012-06-11 15:15               ` bfields
2012-06-11 16:25                 ` Boaz Harrosh
2012-06-11 13:51         ` Jeff Layton
2012-06-11 14:05           ` Boaz Harrosh
2012-06-11 14:11             ` Jeff Layton
2012-06-11 14:45               ` Boaz Harrosh
2012-06-11 14:55                 ` Jeff Layton
2012-06-11 15:04                   ` Boaz Harrosh [this message]
2012-06-11 14:03   ` [PATCH] rpc_pipefs: allow rpc_purge_list to take a NULL waitq pointer Jeff Layton
2012-06-15 15:24   ` Kernel 3.4.X NFS server regression Joerg Platte
2012-06-15 16:28     ` bfields
2012-06-15 17:19       ` Joerg Platte
