From: Chuck Lever <chuck.lever@oracle.com>
To: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Florin Iucha <florin@iucha.net>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Adrian Bunk <bunk@stusta.de>,
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
Date: Thu, 19 Apr 2007 11:12:09 -0400
Message-ID: <462786C9.7090405@oracle.com>
In-Reply-To: <1176950713.6422.95.camel@heimdal.trondhjem.org>
Trond Myklebust wrote:
> On Wed, 2007-04-18 at 20:52 -0500, Florin Iucha wrote:
>> On Wed, Apr 18, 2007 at 10:11:46AM -0400, Trond Myklebust wrote:
>>> Do you have a copy of wireshark or ethereal on hand? If so, could you
>>> take a look at whether or not any NFS traffic is going between the
>>> client and server once the hang happens?
>> I used the following command
>>
>> tcpdump -w nfs-traffic -i eth0 -vv -tt dst port nfs
>>
>> to capture
>>
>> http://iucha.net/nfs/21-rc7-nfs4/nfs-traffic.bz2
>>
>> I started the capture before starting the copy and left it to run for
>> a few minutes after the traffic slowed to a crawl.
>>
>> The iostat and vmstat are at:
>>
>> http://iucha.net/nfs/21-rc7-nfs4/iostat
>> http://iucha.net/nfs/21-rc7-nfs4/vmstat
>>
>> It seems that my original problem report had a big mistake! There is
>> no hang, but at some point the write slows down to a trickle (from
>> 40,000 blocks/s to 22 blocks/s) as can be seen from the iostat log.
>
> Yeah. You only captured the outgoing traffic to the server, but already
> it looks as if there were 'interesting' things going on. In frames 29346
> to 29350, the traffic stops altogether for 5 seconds (I only see
> keepalives) then it starts up again. Ditto for frames 40477-40482
> (another 5 seconds). ...
> Then at around frame 92072, the client starts to send a bunch of RSTs.
> Aha.... I'll bet that reverting the appended patch fixes the problem.
>
> The assumption Chuck makes, that if _no_ request bytes have been sent
> yet the request is on the 'receive list' then it must be a resend, is
> patently false in the case where the send queue just happens to be full.

There are other places in the RPC client where "zero bytes sent" implies
that the request has been sent. The real problem here is that zeroing
the "bytes sent" field is overloaded.

Perhaps instead of looking at the number of bytes sent, the logic in the
last hunk of this patch should check which queue the request is sitting on.
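
To make the ambiguity concrete, here is a toy user-space model. It is a
sketch only, not kernel code: struct toy_req, enum toy_queue and both
helper functions are hypothetical names invented for illustration, and
they do not correspond to the real struct rpc_rqst or the transport's
actual queues. It assumes a "bytes sent" counter that is reset to zero
after a complete send, so a request that never got onto the wire and a
request that was fully sent and then timed out look identical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the kernel's data structures. */
enum toy_queue {
	TOY_Q_NONE,	/* not queued anywhere */
	TOY_Q_SENDING,	/* blocked on a full send queue; nothing on the wire yet */
	TOY_Q_PENDING,	/* fully transmitted, waiting for the server's reply */
};

struct toy_req {
	size_t bytes_sent;	/* assumed to be reset to 0 after a complete send */
	enum toy_queue queue;	/* where the request is currently waiting */
};

/* Heuristic in the style of the patch's last hunk:
 * "zero bytes sent" is taken to mean "this is a retransmit". */
static bool toy_is_retransmit_by_bytes(const struct toy_req *req)
{
	return req->bytes_sent == 0;
}

/* Alternative: decide from the queue the request is sitting on. */
static bool toy_is_retransmit_by_queue(const struct toy_req *req)
{
	return req->queue == TOY_Q_PENDING;
}
```

With a request that is blocked before its first byte goes out
(bytes_sent == 0, queue == TOY_Q_SENDING), the byte-count heuristic
reports a retransmit and would trigger a disconnect, while the
queue-based check does not. For a request that timed out waiting for a
reply (bytes_sent == 0, queue == TOY_Q_PENDING), both agree.
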
> -------------------------------------------
> commit 43d78ef2ba5bec26d0315859e8324bfc0be23766
> Author: Chuck Lever <chuck.lever@oracle.com>
> Date: Tue Feb 6 18:26:11 2007 -0500
>
> NFS: disconnect before retrying NFSv4 requests over TCP
>
> RFC3530 section 3.1.1 states an NFSv4 client MUST NOT send a request
> twice on the same connection unless it is the NULL procedure. Section
> 3.1.1 suggests that the client should disconnect and reconnect if it
> wants to retry a request.
>
> Implement this by adding an rpc_clnt flag that an ULP can use to
> specify that the underlying transport should be disconnected on a
> major timeout. The NFSv4 client asserts this new flag, and requests
> no retries after a minor retransmit timeout.
>
> Note that disconnecting on a retransmit is in general not safe to do
> if the RPC client does not reuse the TCP port number when reconnecting.
>
> See http://bugzilla.linux-nfs.org/show_bug.cgi?id=6
>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
>
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index a3191f0..c46e94f 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -394,7 +394,8 @@ static void nfs_init_timeout_values(struct rpc_timeout *to, int proto,
> static int nfs_create_rpc_client(struct nfs_client *clp, int proto,
> unsigned int timeo,
> unsigned int retrans,
> - rpc_authflavor_t flavor)
> + rpc_authflavor_t flavor,
> + int flags)
> {
> struct rpc_timeout timeparms;
> struct rpc_clnt *clnt = NULL;
> @@ -407,6 +408,7 @@ static int nfs_create_rpc_client(struct nfs_client *clp, int proto,
> .program = &nfs_program,
> .version = clp->rpc_ops->version,
> .authflavor = flavor,
> + .flags = flags,
> };
>
> if (!IS_ERR(clp->cl_rpcclient))
> @@ -548,7 +550,7 @@ static int nfs_init_client(struct nfs_client *clp, const struct nfs_mount_data *
> * - RFC 2623, sec 2.3.2
> */
> error = nfs_create_rpc_client(clp, proto, data->timeo, data->retrans,
> - RPC_AUTH_UNIX);
> + RPC_AUTH_UNIX, 0);
> if (error < 0)
> goto error;
> nfs_mark_client_ready(clp, NFS_CS_READY);
> @@ -868,7 +870,8 @@ static int nfs4_init_client(struct nfs_client *clp,
> /* Check NFS protocol revision and initialize RPC op vector */
> clp->rpc_ops = &nfs_v4_clientops;
>
> - error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour);
> + error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour,
> + RPC_CLNT_CREATE_DISCRTRY);
> if (error < 0)
> goto error;
> memcpy(clp->cl_ipaddr, ip_addr, sizeof(clp->cl_ipaddr));
> diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
> index a1be89d..c7a78ee 100644
> --- a/include/linux/sunrpc/clnt.h
> +++ b/include/linux/sunrpc/clnt.h
> @@ -40,6 +40,7 @@ struct rpc_clnt {
>
> unsigned int cl_softrtry : 1,/* soft timeouts */
> cl_intr : 1,/* interruptible */
> + cl_discrtry : 1,/* disconnect before retry */
> cl_autobind : 1,/* use getport() */
> cl_oneshot : 1,/* dispose after use */
> cl_dead : 1;/* abandoned */
> @@ -111,6 +112,7 @@ struct rpc_create_args {
> #define RPC_CLNT_CREATE_ONESHOT (1UL << 3)
> #define RPC_CLNT_CREATE_NONPRIVPORT (1UL << 4)
> #define RPC_CLNT_CREATE_NOPING (1UL << 5)
> +#define RPC_CLNT_CREATE_DISCRTRY (1UL << 6)
>
> struct rpc_clnt *rpc_create(struct rpc_create_args *args);
> struct rpc_clnt *rpc_bind_new_program(struct rpc_clnt *,
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index 393e70a..c21aa0a 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -249,6 +249,8 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
> clnt->cl_autobind = 1;
> if (args->flags & RPC_CLNT_CREATE_ONESHOT)
> clnt->cl_oneshot = 1;
> + if (args->flags & RPC_CLNT_CREATE_DISCRTRY)
> + clnt->cl_discrtry = 1;
>
> return clnt;
> }
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index cf59f7d..1975139 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -735,6 +735,16 @@ void xprt_transmit(struct rpc_task *task)
> xprt_reset_majortimeo(req);
> /* Turn off autodisconnect */
> del_singleshot_timer_sync(&xprt->timer);
> + } else {
> + /* If all request bytes have been sent,
> + * then we must be retransmitting this one */
> + if (!req->rq_bytes_sent) {
> + if (task->tk_client->cl_discrtry) {
> + xprt_disconnect(xprt);
> + task->tk_status = -ENOTCONN;
> + return;
> + }
> + }
> }
> } else if (!req->rq_bytes_sent)
> return;