From: Andrew Cooper <andrew.cooper3-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
To: Trond Myklebust
<Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>
Cc: "linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: NFS TCP race condition with SOCK_ASYNC_NOSPACE
Date: Fri, 18 Nov 2011 19:04:39 +0000
Message-ID: <4EC6AC47.60404@citrix.com>
In-Reply-To: <1321642368.2653.35.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>
On 18/11/11 18:52, Trond Myklebust wrote:
> On Fri, 2011-11-18 at 18:40 +0000, Andrew Cooper wrote:
>> Hello,
>>
>> As described originally in
>> http://www.spinics.net/lists/linux-nfs/msg25314.html, we were
>> encountering a bug whereby the NFS session was unexpectedly timing out.
>>
>> I believe I have found the source of the race condition causing the timeout.
>>
>> Brief overview of setup:
>> 10GiB network, NFS mounted using TCP. Problem reproduces with
>> multiple different NICs, with synchronous or asynchronous mounts, and
>> with soft and hard mounts. Reproduces on 2.6.32 and I am currently
>> trying to reproduce with mainline. (I don't have physical access to the
>> servers so installing stuff is not fantastically easy)
>>
>>
>>
>> In net/sunrpc/xprtsock.c:xs_tcp_send_request(), we try to write data to
>> the sock buffer using xs_sendpages()
>>
>> When the sock buffer is nearly full, we get an EAGAIN from
>> xs_sendpages() which causes a break out of the loop. Lower down the
>> function, we switch on status which causes us to call xs_nospace() with
>> the task.
>>
>> In xs_nospace(), we test the SOCK_ASYNC_NOSPACE bit from the socket, and
>> in the rare case where that bit is clear, we return 0 instead of
>> EAGAIN. This promptly overwrites status in xs_tcp_send_request().
>>
>> The result is that xs_tcp_release_xprt() finds a request which has no
>> error, but has not sent all of the bytes in its send buffer. It cleans
>> up by setting XPRT_CLOSE_WAIT which causes xprt_clear_locked() to queue
>> xprt->task_cleanup, which closes the TCP connection.
>>
>>
>> Under normal operation, the TCP connection goes down and back up without
>> interruption to the NFS layer. However, when the NFS server hangs in a
>> half closed state, the client forces a RST of the TCP connection,
>> leading to the timeout.
>>
>> I have tried a few naive fixes such as changing the default return value
>> in xs_nospace() from 0 to -EAGAIN (meaning that 0 will never be
>> returned) but this causes a kernel memory leak. Can someone who has a
>> better understanding of these interactions than me have a look? It
>> seems that the if (test_bit()) test in xs_nospace() should have an else
>> clause.
> I fully agree with your analysis. The correct thing to do here is to
> always return either EAGAIN or ENOTCONN. Thank you very much for working
> this one out!
>
> Trond

Returning EAGAIN seems to cause a kernel memory leak, as the OOM killer
starts going after processes holding large amounts of LowMem. Returning
ENOTCONN causes the NFS session to complain about a timeout in the logs,
and in the case of a soft mount, gives an EIO to the calling process.

From the looks of the TCP stream, and from the looks of some
targeted debugging, nothing is actually wrong, so the client should not
be trying to FIN the TCP connection. Is it possible that there is a
more sinister reason for SOCK_ASYNC_NOSPACE being clear?

I can attempt to find which of the many calls to clear that bit is
actually causing the problem, but I have a feeling that is going to be a
little more tricky to narrow down.

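
To make the status overwrite described above concrete, here is a small
user-space model of the control flow. It is only a sketch built from the
description in this thread, not the kernel code: struct model_sock,
model_nospace() and model_send_request() are hypothetical names, the
async_nospace field merely stands in for the SOCK_ASYNC_NOSPACE bit, and
the "not connected" branch is an assumption based on ENOTCONN being
mentioned as the other possible return value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the transport/socket state (not a kernel type). */
struct model_sock {
    bool async_nospace; /* models the SOCK_ASYNC_NOSPACE bit */
    bool connected;     /* models whether the transport is connected */
};

/*
 * Models the behaviour reported for xs_nospace(): the default return
 * value is 0, and it only becomes -EAGAIN when the bit is set.
 */
static int model_nospace(const struct model_sock *sk)
{
    int ret = 0;

    assert(sk != NULL);
    if (!sk->connected)
        return -ENOTCONN;   /* assumed disconnect branch */
    if (sk->async_nospace)
        ret = -EAGAIN;      /* caller should wait for buffer space */
    return ret;             /* 0 when the bit is clear: no else clause */
}

/*
 * Models the tail of xs_tcp_send_request(): sendpages_status is what the
 * send loop ended with (for example -EAGAIN when the buffer was full).
 */
static int model_send_request(const struct model_sock *sk, int sendpages_status)
{
    int status = sendpages_status;

    switch (status) {
    case -EAGAIN:
        /* The original -EAGAIN is replaced by whatever nospace returns. */
        status = model_nospace(sk);
        break;
    default:
        break;
    }
    return status;
}
```

With connected set and async_nospace clear, model_send_request(&sk, -EAGAIN)
returns 0, so the caller sees "no error" for a request that has not been
fully sent. That is the state in which the release path marks the
connection for closing. With async_nospace set, the same call returns
-EAGAIN as expected.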
--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com