public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Tom Tucker <tom@opengridcomputing.com>
To: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Ian Campbell <ijc@hellion.org.uk>,
	linux-nfs@vger.kernel.org, Max Kellermann <mk@cm4all.com>,
	linux-kernel@vger.kernel.org, gcosta@redhat.com,
	Grant Coady <grant_lkml@dodo.com.au>,
	"J. Bruce Fields" <bfields@fieldses.org>
Subject: Re: [PATCH 3/3] SUNRPC: svc_xprt_enqueue should not refuse to enqueue 'XPT_DEAD' transports
Date: Tue, 23 Dec 2008 08:49:04 -0600	[thread overview]
Message-ID: <4950FA60.3060405@opengridcomputing.com> (raw)
In-Reply-To: <1229540877.7257.97.camel@heimdal.trondhjem.org>

Trond Myklebust wrote:
> On Wed, 2008-12-17 at 09:35 -0600, Tom Tucker wrote:
>   
>> Trond Myklebust wrote:
>>     
>>> Aside from being racy (there is nothing preventing someone setting XPT_DEAD
>>> after the test in svc_xprt_enqueue, and before XPT_BUSY is set), it is
>>> wrong to assume that transports which have called svc_delete_xprt() might
>>> not need to be re-enqueued.
>>>       
>> This is only true because now you allow transports with XPT_DEAD set to 
>> be enqueued -- yes?
>>
>>     
>>> See the list of deferred requests, which is currently never going to
>>> be cleared if the revisit call happens after svc_delete_xprt(). In this
>>> case, the deferred request will currently keep a reference to the transport
>>> forever.
>>>
>>>       
>> I agree this is a possibility and it needs to be fixed. I'm concerned 
>> that the root cause is still there though. I thought the test case was 
>> the client side timing out the connection. Why are there deferred 
>> requests sitting on what is presumably an idle connection?
>>     
>
> I haven't said that they are the cause of this test case. I've said that
> deferred requests hold references to the socket that can obviously
> deadlock. That needs to be fixed regardless of whether or not it is the
> cause here.
>
> There are plenty of situations in which the client may choose to close
> the TCP socket even if there are outstanding requests. One of the most
> common is when the user signals the process, so that an RPC call that
> was partially transmitted (ran out of buffer space) gets cancelled
> before it can finish transmitting. In that case the client has no choice
> but to disconnect and immediately reconnect.
>
>   
>>> The fix should be to allow dead transports to be enqueued in order to clear
>>> the deferred requests, then change the order of processing in svc_recv() so
>>> that we pick up deferred requests before we do the XPT_CLOSE processing.
>>>
>>>       
>> Wouldn't it be simpler to clean up any deferred requests in the close 
>> path instead of changing the meaning of XPT_DEAD and dispatching 
>> N-threads to do the same?
>>     
>
> AFAICS, deferred requests are the property of the cache until they
> expire or a downcall occurs. I'm not aware of any way to cancel only
> those deferred requests that hold a reference to this particular
> transport.
>
>   
Ok, I think you're right, and I think that this fix is correct and makes 
the symptom go away.

I may be completely confused here, but:

- The deferred requests should be getting cleaned up by timing out, and 
that does not not seem to be happening, (Is this true?)

- By releasing the underlying connection prior to releasing the 
transport that manages it, we've converted the visible resource leek to 
an invisible one.

- This has been around forever and changing the client side close 
behavior graceful exposed this bug,

So I'm wondering if what we want to do here is to provide a mechanism 
for canceling deferred requests for a particular transport. This would 
provide a mechanism for the generic transport driver to force 
cancellation of deferred requests when closing.

Tom



  reply	other threads:[~2008-12-23 14:49 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-17 12:32 [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds" Max Kellermann
2008-10-17 14:33 ` Glauber Costa
2008-10-20  6:51   ` Max Kellermann
2008-10-20  7:43     ` Ian Campbell
2008-10-20 13:15     ` Glauber Costa
2008-10-20 14:12       ` Max Kellermann
2008-10-20 14:34         ` Cyrill Gorcunov
2008-10-20 14:21       ` Cyrill Gorcunov
2009-05-22 20:59     ` H. Peter Anvin
2009-05-25 13:12       ` Max Kellermann
2008-10-20  6:27 ` Ian Campbell
2008-11-01 11:45   ` Ian Campbell
2008-11-01 13:41     ` Trond Myklebust
2008-11-02 14:40       ` Ian Campbell
2008-11-07  2:12         ` kenneth johansson
2008-11-04 19:10       ` Ian Campbell
2008-11-25  7:09       ` Ian Campbell
2008-11-25 13:28         ` Trond Myklebust
2008-11-25 13:38           ` Ian Campbell
2008-11-25 13:57             ` Trond Myklebust
2008-11-25 14:04               ` Ian Campbell
2008-11-26 22:12                 ` Ian Campbell
2008-12-01  0:17                   ` [PATCH 0/3] " Trond Myklebust
2008-12-01  0:18                     ` [PATCH 1/3] SUNRPC: Ensure the server closes sockets in a timely fashion Trond Myklebust
2008-12-17 15:27                       ` Tom Tucker
2008-12-17 18:08                         ` Trond Myklebust
2008-12-17 18:59                           ` Tom Tucker
2008-12-01  0:19                     ` [PATCH 2/3] SUNRPC: We only need to call svc_delete_xprt() once Trond Myklebust
2008-12-01  0:20                     ` [PATCH 3/3] SUNRPC: svc_xprt_enqueue should not refuse to enqueue 'XPT_DEAD' transports Trond Myklebust
2008-12-17 15:35                       ` Tom Tucker
2008-12-17 19:07                         ` Trond Myklebust
2008-12-23 14:49                           ` Tom Tucker [this message]
2008-12-23 23:39                             ` Tom Tucker
2008-12-01  0:29                     ` [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds" Trond Myklebust
2008-12-02 15:22                       ` Kasparek Tomas
2008-12-02 15:37                         ` Trond Myklebust
2008-12-02 16:26                           ` Kasparek Tomas
2008-12-02 18:10                             ` Trond Myklebust
2008-12-01 22:09                     ` Ian Campbell
2008-12-06 12:16                       ` Ian Campbell
2008-12-14 18:24                         ` Ian Campbell
2008-12-16 17:55                           ` J. Bruce Fields
2008-12-16 18:39                             ` Ian Campbell
2009-01-07 22:21                               ` J. Bruce Fields
2009-01-08 18:20                                 ` J. Bruce Fields
2009-01-08 21:22                                   ` Ian Campbell
2009-01-08 21:26                                     ` J. Bruce Fields
2009-01-12  9:46                                       ` Ian Campbell
2009-01-22  8:27                                       ` Ian Campbell
2009-01-22 16:44                                         ` J. Bruce Fields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4950FA60.3060405@opengridcomputing.com \
    --to=tom@opengridcomputing.com \
    --cc=Trond.Myklebust@netapp.com \
    --cc=bfields@fieldses.org \
    --cc=gcosta@redhat.com \
    --cc=grant_lkml@dodo.com.au \
    --cc=ijc@hellion.org.uk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=mk@cm4all.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox