Re: Large file copy to NFS mounted directory causes delay in other application packets

* Re: Large file copy to NFS mounted directory causes delay in other application packets
       [not found] <1320816503.73813.YahooMailNeo@web160718.mail.bf1.yahoo.com>
@ 2011-11-09  6:24 ` Eric Dumazet
  2011-11-09  8:13   ` Manavalan Krishnan
       [not found]   ` <CAA93jw4gDuHyG508zxRzyn+MJ4gL5m958OaCWJxoDsaLgjdnqg@mail.gmail.com>
  0 siblings, 2 replies; 4+ messages in thread
From: Eric Dumazet @ 2011-11-09  6:24 UTC (permalink / raw)
  To: Manavalan Krishnan; +Cc: linux-kernel@vger.kernel.org, netdev

On Tuesday, 08 November 2011 at 21:28 -0800, Manavalan Krishnan wrote:
> Hi All
> 
> I have two systems with two network interfaces each (eth0 and eth1). I
> am running Linux-HA (heartbeat daemon) on both systems, and they
> use eth0 for exchanging heartbeats. I have an NFS-mounted directory on
> one system, and the NFS client uses the interface eth1.
> 
> I try to copy a large file to the NFS-mounted directory, but the heartbeat
> daemon misses the heartbeat packets from its peer while the copy is in
> progress. I ran tcpdump and found that the heartbeat packets are
> delayed for a few seconds before being sent out on eth0. When I stop the
> file copy, the heartbeats are delivered properly. It seems the Linux kernel
> is somehow giving priority to NFS packets (generated by the file copy)
> over other application packets.
> 
> Any thoughts on this behavior? Is there any way to avoid this so
> that application packets get an equal chance while a large file copy to
> an NFS-mounted directory is in progress?
> 
CC netdev

1) Is your NFS using UDP or TCP?
2) Is your eth0 dedicated to heartbeats and eth1 to NFS traffic?
3) How do you know heartbeats are delayed?
4) Is your server CPU bound?

Thanks

* Re: Large file copy to NFS mounted directory causes delay in other application packets
  2011-11-09  6:24 ` Large file copy to NFS mounted directory causes delay in other application packets Eric Dumazet
@ 2011-11-09  8:13   ` Manavalan Krishnan
  2011-11-09  8:59     ` Eric Dumazet
       [not found]   ` <CAA93jw4gDuHyG508zxRzyn+MJ4gL5m958OaCWJxoDsaLgjdnqg@mail.gmail.com>
  1 sibling, 1 reply; 4+ messages in thread
From: Manavalan Krishnan @ 2011-11-09  8:13 UTC (permalink / raw)
  To: Eric Dumazet, linux-kernel@vger.kernel.org; +Cc: linux-kernel, netdev

(1) NFS is using TCP.
(2) Yes, eth0 is dedicated to heartbeat and eth1 is dedicated to NFS.
(3) I notice the following on the system where the file copy is occurring:

The kernel Recv-Q of the heartbeat application's socket grows, but the data is not delivered to the socket recv call.
Here is the netstat output:

Proto  Recv-Q  Send-Q  Local Address  Foreign Address
udp     11522       0  *:23435        *:*

As soon as I stop the file transfer, the socket recv call receives the packets and Recv-Q goes to 0.
(4) The server has 4 CPU cores and 25G RAM.
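
As an aside, the Recv-Q figure that netstat reports for a UDP socket can also be read from /proc/net/udp, which makes it easy to sample the queue depth repeatedly while the copy runs. The following is only a minimal sketch: the function name udp_rx_queue and the SAMPLE text are hypothetical illustrations (the sample row is made up to match the port and queue size above), and it assumes the usual Linux column layout in which the fifth field is tx_queue:rx_queue in hexadecimal.

```python
# Hypothetical sample of /proc/net/udp content, shaped to match the
# netstat output above (port 23435, 11522 bytes queued). Not real data.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops
  45: 00000000:5B8B 00000000:0000 07 00000000:00002D02 00:00000000 00000000     0        0 12345 2 0000000000000000 0
  46: 00000000:0035 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 12346 2 0000000000000000 0
"""


def udp_rx_queue(proc_net_udp_text, port):
    """Return the rx_queue size (bytes) of every UDP socket bound to `port`."""
    results = []
    for line in proc_net_udp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue
        # local_address is HEXIP:HEXPORT
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port != port:
            continue
        # fifth field is tx_queue:rx_queue, both hexadecimal
        _tx_hex, rx_hex = fields[4].split(":")
        results.append(int(rx_hex, 16))
    return results


# On a live Linux system the text would come from:
#   open("/proc/net/udp").read()
print(udp_rx_queue(SAMPLE, 23435))
```

Sampling this once per second during the copy would show whether the queue grows steadily or in bursts, which is a different question from whether the packets arrive late on the wire.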

* Re: Large file copy to NFS mounted directory causes delay in other application packets
  2011-11-09  8:13   ` Manavalan Krishnan
@ 2011-11-09  8:59     ` Eric Dumazet
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Dumazet @ 2011-11-09  8:59 UTC (permalink / raw)
  To: Manavalan Krishnan; +Cc: linux-kernel@vger.kernel.org, netdev

Please don't top-post on these lists, thanks.

On Wednesday, 09 November 2011 at 00:13 -0800, Manavalan Krishnan wrote:
> (1) NFS is using TCP.
> (2) Yes, eth0 is dedicated to heartbeat and eth1 is dedicated to NFS.
> (3) I notice the following on the system where the file copy is occurring:
> 
> The kernel Recv-Q of the heartbeat application's socket grows, but the data is not delivered to the socket recv call.
> Here is the netstat output:
> 
> Proto  Recv-Q  Send-Q  Local Address  Foreign Address
> udp     11522       0  *:23435        *:*
> 

OK, so the sending side is fine: the delay is on the receiver side.

Note that since netstat shows the receive queue holds some skbs, that
data should be available to the heartbeat daemon immediately.

> As soon as I stop the file transfer, the socket recv call receives the packets and Recv-Q goes to 0.
> (4) The server has 4 CPU cores and 25G RAM.
> 

1) How many nfsd threads are running?
   grep th /proc/net/rpc/nfsd

2) What kind of NIC do you use?
   lsmod, lspci

3) Hmm, are the IRQs for eth0/eth1 handled by the same CPU?
   grep eth /proc/interrupts

4) You could try pinning all nfsd threads to cpu0, cpu1, cpu2 and the
heartbeat daemon to cpu3.
   man taskset

5) You could 'strace -ttt' the heartbeat daemon to check whether it is
blocked on some local disk access (it competes with all the nfsd threads).
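
For the IRQ question in item 3, the grep output can be summarized per interface with a few lines of Python. This is a rough sketch, not a tool referenced in the thread: irq_cpu_counts, busiest_cpu and the SAMPLE text are hypothetical, the counter values are invented, and it assumes the common /proc/interrupts layout where the device name is the last column (shared IRQ lines that list several devices are not handled).

```python
# Hypothetical /proc/interrupts excerpt for a 4-CPU machine. Invented numbers.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  0:        127          0          0          0   IO-APIC-edge      timer
 16:        900          0          0          0   PCI-MSI-edge      eth0
 17:       5000         10          0          0   PCI-MSI-edge      eth1
NMI:          0          0          0          0   Non-maskable interrupts
ERR:          0
"""


def irq_cpu_counts(interrupts_text, name_prefix="eth"):
    """Map each device whose name starts with `name_prefix` to per-CPU counts."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        fields = line.split()
        # need: "IRQ:", one counter per CPU, and at least a device name
        if len(fields) < ncpus + 2 or not fields[0].endswith(":"):
            continue
        device = fields[-1]
        if not device.startswith(name_prefix):
            continue
        per_cpu = [int(x) for x in fields[1:1 + ncpus]]
        previous = counts.get(device)
        if previous is None:
            counts[device] = per_cpu
        else:  # same device on several IRQ lines: add the counters up
            counts[device] = [a + b for a, b in zip(previous, per_cpu)]
    return counts


def busiest_cpu(per_cpu):
    """Index of the CPU that handled the most interrupts."""
    return max(range(len(per_cpu)), key=per_cpu.__getitem__)


counts = irq_cpu_counts(SAMPLE)
for device, per_cpu in sorted(counts.items()):
    print(device, "-> mostly handled by CPU", busiest_cpu(per_cpu))
```

If both interfaces report the same busiest CPU, that CPU is a candidate for the contention item 3 asks about. For item 4, Python on Linux exposes the same affinity mechanism taskset uses through os.sched_setaffinity(pid, cpus).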

[parent not found: <CAA93jw4gDuHyG508zxRzyn+MJ4gL5m958OaCWJxoDsaLgjdnqg@mail.gmail.com>]

* Re: Large file copy to NFS mounted directory causes delay in other application packets
       [not found]   ` <CAA93jw4gDuHyG508zxRzyn+MJ4gL5m958OaCWJxoDsaLgjdnqg@mail.gmail.com>
@ 2011-11-09  8:42     ` Manavalan Krishnan
  0 siblings, 0 replies; 4+ messages in thread
From: Manavalan Krishnan @ 2011-11-09  8:42 UTC (permalink / raw)
  To: Dave Taht, Eric Dumazet, linux-kernel@vger.kernel.org; +Cc: netdev

> When I see behavior like this I keep thinking of interactions between
> overlarge txqueuelens, somewhat busted TCP offloads on NICs, and that
> pfifo_fast must die in favor of fair queuing and/or diffserv
> classification. But seeing it on two different NICs implies that your
> switch (which I assume is shared) is possibly to blame...
> (I see bufferbloat everywhere, but mostly because it's what I work on)
>
> Is this NFS over TCP? Do the HA daemons prioritize packets at all? Does
> your switch? Do your qdiscs? How deep are the buffers on your network
> cards, your txqueuelens and your switch?
> (Eric's other questions below are probably more valid)


The HA daemons do not prioritize packets. Could you please provide info on how to prioritize application packets?
We tried different switches and the same problem occurs, so the switch may not be the issue here.
The switch and qdiscs do not prioritize packets.
Here are the network buffer settings used on the servers:

txqueuelen is 1000 

net.core.netdev_max_backlog = 1000
net.core.rmem_default = 262144
net.core.rmem_max = 262144
net.core.wmem_default = 129024
net.core.wmem_max = 131071
net.ipv4.tcp_rmem = 4096    87380    4194304
net.ipv4.tcp_wmem = 4096    16384    4194304
net.ipv4.tcp_mem = 196608    262144    393216

Thanks
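
On the question of how an application can prioritize its own packets: one common approach is for the sender to set the IP TOS byte on its socket, which a priority-aware qdisc or switch may then act on. The sketch below is an illustration only. The helper name mark_low_delay is hypothetical, the constant is copied from <netinet/ip.h> because Python's socket module is not assumed to export it, and nothing in this thread shows that the heartbeat daemon offers such an option. Since the replies elsewhere in the thread place the delay on the receive side, marking outgoing packets may well not change the symptom described here.

```python
import socket

# Value of IPTOS_LOWDELAY from <netinet/ip.h>; defined locally as an
# assumption rather than relying on the socket module to provide it.
IPTOS_LOWDELAY = 0x10


def mark_low_delay(sock):
    """Ask the kernel to send this socket's packets with TOS = low delay.

    Returns the TOS value the kernel reports back for the socket.
    """
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, IPTOS_LOWDELAY)
    return sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)


# Example: a UDP socket such as a heartbeat sender might use.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
print(hex(mark_low_delay(sock)))
sock.close()
```

Whether the marking has any effect depends on the qdisc attached to the interface and on the switch configuration along the path.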
