* Re: Large file copy to NFS mounted directory causes delay in other application packets
From: Eric Dumazet @ 2011-11-09 6:24 UTC
To: Manavalan Krishnan; +Cc: linux-kernel@vger.kernel.org, netdev

On Tuesday, 08 November 2011 at 21:28 -0800, Manavalan Krishnan wrote:
> Hi All
>
> I have two systems with two network interfaces each (eth0 and eth1). I
> am running linux-HA (heartbeat daemon) on both systems, and they
> use eth0 for exchanging heartbeats. I have an NFS-mounted directory on
> one system, and the NFS client uses interface eth1.
>
> When I copy a large file to the NFS-mounted directory, the heartbeat
> daemons miss the heartbeat packets from their peers while the copy is
> in progress. I ran tcpdump and found that the heartbeat packets are
> delayed for a few seconds before being sent out on eth0. When I stop
> the file copy, the heartbeats are delivered properly. It seems the
> Linux kernel is somehow giving priority to NFS packets (generated by
> the file copy) over other application packets.
>
> Any thoughts on this behavior? Is there any way to avoid this, so
> that application packets get an equal chance while a large file copy
> to an NFS-mounted directory is in progress?
>

CC netdev

1) Is your NFS using UDP or TCP?
2) Is your eth0 dedicated to heartbeats and eth1 to NFS traffic?
3) How do you know heartbeats are delayed?
4) Is your server CPU bound?

Thanks
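Eric's first question (UDP or TCP transport) can be answered on the client from `/proc/mounts`, where the kernel lists each NFS mount's options, normally including a `proto=` entry. The following is a minimal Python sketch under that assumption; `nfs_transports` is a hypothetical helper, not part of nfs-utils. It matches the option key exactly, because the same line usually also carries `mountproto=`, which is the transport used for the MOUNT protocol rather than for the NFS traffic itself.

```python
import os


def nfs_transports(mounts_text):
    """Return {mount_point: proto} for every nfs/nfs4 entry in mounts_text.

    mounts_text is the content of /proc/mounts, one mount per line:
    "device mount_point fstype options dump pass".
    Mount points containing spaces appear octal-escaped (\\040) and are
    returned as-is.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        mount_point, options = fields[1], fields[3]
        for opt in options.split(","):
            key, _, value = opt.partition("=")
            # Exact key match: "mountproto=" is a different option.
            if key == "proto":
                result[mount_point] = value
    return result


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        for mount_point, proto in nfs_transports(f.read()).items():
            print(f"{mount_point}: {proto}")
```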
* Re: Large file copy to NFS mounted directory causes delay in other application packets
From: Manavalan Krishnan @ 2011-11-09 8:13 UTC
To: Eric Dumazet, linux-kernel@vger.kernel.org; +Cc: linux-kernel, netdev

(1) NFS is using TCP
(2) Yes, eth0 is dedicated to heartbeat and eth1 is dedicated to NFS
(3) I notice the following on the system where the file copy is occurring:

The kernel Recv-Q of the heartbeat application socket grows, but the
data is not delivered to the socket recv call.
Here is the netstat output:

Proto Recv-Q Send-Q Local Address Foreign Address
udp   11522  0      *:23435       *:*

As soon as I stop the file transfer, the socket recv call receives the
packets and Recv-Q goes to 0.
(4) The server has 4 CPU cores and 25G RAM
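netstat (from net-tools) takes the Recv-Q column for UDP sockets from `/proc/net/udp`, where each socket's `tx_queue:rx_queue` pair is printed in hex and, on reasonably recent kernels, a per-socket `drops` counter is the last column. A minimal Python sketch (hypothetical helper `udp_queue`, IPv4 only) that pulls both values for the heartbeat port 23435 (0x5B8B):

```python
import os


def udp_queue(proc_net_udp_text, port):
    """Return a list of (rx_queue_bytes, drops) for IPv4 UDP sockets bound to port.

    A data line of /proc/net/udp looks roughly like
      "7: 00000000:5B8B 00000000:0000 07 00000000:00002D02 00:00000000 00000000 0 0 12345 2 ffff... 0"
    local_address is HEX_IP:HEX_PORT, the fifth field is tx_queue:rx_queue
    in hex, and the 13th field (when present) is the drop counter.
    """
    results = []
    for line in proc_net_udp_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 5:
            continue
        local_port = int(fields[1].split(":")[1], 16)
        if local_port != port:
            continue
        rx_queue = int(fields[4].split(":")[1], 16)
        # Older kernels may not print the drops column at all.
        drops = int(fields[12]) if len(fields) > 12 else None
        results.append((rx_queue, drops))
    return results


if __name__ == "__main__" and os.path.exists("/proc/net/udp"):
    with open("/proc/net/udp") as f:
        for rx_queue, drops in udp_queue(f.read(), 23435):
            print(f"rx_queue={rx_queue} bytes drops={drops}")
```

The rx_queue value is the memory charged to the socket (including skb overhead), so it overstates the payload actually queued. If it grows while `drops` stays at 0, the datagrams were accepted into the socket and are simply not being read. Assuming the daemon does not set SO_RCVBUF, 11522 bytes is about 4.4% of the `rmem_default` of 262144 reported later in the thread, far from overflowing.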
* Re: Large file copy to NFS mounted directory causes delay in other application packets
From: Eric Dumazet @ 2011-11-09 8:59 UTC
To: Manavalan Krishnan; +Cc: linux-kernel@vger.kernel.org, netdev

Please don't top post on these lists, thanks.

On Wednesday, 09 November 2011 at 00:13 -0800, Manavalan Krishnan wrote:
> (1) NFS is using TCP
> (2) Yes, eth0 is dedicated to heartbeat and eth1 is dedicated to NFS
> (3) I notice the following on the system where the file copy is occurring:
>
> The kernel Recv-Q of the heartbeat application socket grows, but the
> data is not delivered to the socket recv call.
> Here is the netstat output:
>
> Proto Recv-Q Send-Q Local Address Foreign Address
> udp   11522  0      *:23435       *:*
>

OK, so the sending side is fine: the delay is on the receiver side.

Note that since netstat shows the receive queue holds some skbs, they
should be available to the heartbeat daemon immediately.

> As soon as I stop the file transfer, the socket recv call receives the
> packets and Recv-Q goes to 0.
> (4) The server has 4 CPU cores and 25G RAM
>

1) How many nfsd threads are running?
   grep th /proc/net/rpc/nfsd

2) What kind of NIC do you use?
   lsmod, lspci

3) Hmm, are the IRQs for eth0/eth1 handled by the same CPU?
   grep eth /proc/interrupts

4) You could try to pin all nfsd threads to cpu0, cpu1, cpu2 and the
   heartbeat daemon to cpu3 (man taskset).

5) You could 'strace -ttt' the heartbeat daemon to check that it is not
   blocked on some local disk access (it competes with all the nfsd
   threads).
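Two of these checks lend themselves to a quick script over saved output: whether eth0 and eth1 interrupts land on the same CPU (question 3), and where the heartbeat daemon stalls in an `strace -ttt` log (question 5). The sketch below assumes the usual `/proc/interrupts` layout (a `CPUn` header row, then one row per IRQ with per-CPU counts and the device name last) and an strace run without `-f`, so every line starts with a seconds.microseconds timestamp. The function names are hypothetical.

```python
def irq_cpu_counts(interrupts_text, device):
    """Sum per-CPU counts for IRQ rows named `device` or `device-*` (e.g. eth0-rx-0).

    Only the last name on a row is checked, so on a shared IRQ listed as
    "eth0, usb" the eth0 entry would be missed.
    """
    lines = interrupts_text.splitlines()
    ncpu = sum(1 for tok in lines[0].split() if tok.startswith("CPU"))
    totals = [0] * ncpu
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < ncpu + 2:  # IRQ label, ncpu counts, at least a name
            continue
        name = fields[-1]
        if name != device and not name.startswith(device + "-"):
            continue
        for cpu, count in enumerate(fields[1:1 + ncpu]):
            totals[cpu] += int(count)
    return totals


def busiest_cpu(totals):
    """Index of the CPU that took the most interrupts, or None if there were none."""
    if not totals or max(totals) == 0:
        return None
    return totals.index(max(totals))


def strace_gaps(trace_text, threshold):
    """Return (gap_seconds, line) for each line followed by a gap >= threshold.

    The -ttt timestamp is taken when a syscall is entered, so a gap after a
    line covers that call's duration plus any time before the next call.
    """
    gaps = []
    prev_time, prev_line = None, None
    for line in trace_text.splitlines():
        first, _, _ = line.partition(" ")
        try:
            t = float(first)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if prev_time is not None and t - prev_time >= threshold:
            gaps.append((t - prev_time, prev_line))
        prev_time, prev_line = t, line
    return gaps
```

For example, save `cat /proc/interrupts > irq.txt` twice during the copy (the counts are cumulative since boot, so compare the difference) and trace the daemon with `strace -ttt -p <pid> -o hb.trace`. A long gap after a `read` or `fsync` line might point at the disk contention Eric suspects; strace's `-T` option records the time spent inside each call directly. The pinning in step 4 can be done with `taskset -pc 3 <pid>` or, from Python, `os.sched_setaffinity(pid, {3})`.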
* Re: Large file copy to NFS mounted directory causes delay in other application packets
From: Manavalan Krishnan @ 2011-11-09 8:42 UTC
To: Dave Taht, Eric Dumazet, linux-kernel@vger.kernel.org; +Cc: netdev

> When I see behavior like this I keep thinking of interactions between
> overlarge txqueuelens, somewhat busted TCP offloads on NICs, and that
> pfifo_fast must die in favor of fair queuing and/or diffserv
> classification. But seeing it on two different NICs implies that your
> switch (which I assume is shared) is possibly to blame...
>
> (I see bufferbloat everywhere, but mostly because it's what I work on)
>
> Is this NFS over TCP? Do the HA daemons prioritize packets at all? Does
> your switch? Do your qdiscs? How deep are your buffers on the network
> cards, txqueuelens and switch?
>
> (Eric's other questions below are probably more valid)

The HA daemons do not prioritize. Could you please provide info on how
to prioritize application packets?

We tried different switches and the same problem occurs, so the switch
may not be the issue here. The switch and qdiscs do not prioritize the
packets.

Here are the network buffer settings used on the servers:

txqueuelen is 1000
net.core.netdev_max_backlog = 1000
net.core.rmem_default = 262144
net.core.rmem_max = 262144
net.core.wmem_default = 129024
net.core.wmem_max = 131071
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_mem = 196608 262144 393216

Thanks
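On the question of how to prioritize application packets: one common approach on Linux is to set the IP TOS byte on the sending socket. As far as I know, `setsockopt(IP_TOS)` on Linux also updates the socket's priority, and the default `pfifo_fast` qdisc maps the `IPTOS_LOWDELAY` (0x10) value to its highest-priority band. The sketch below shows this on a UDP socket; `make_heartbeat_socket` is a hypothetical helper, since the linux-HA daemon's own socket code is not shown in the thread.

```python
import socket

IPTOS_LOWDELAY = 0x10  # same value as IPTOS_LOWDELAY in <netinet/ip.h>


def make_heartbeat_socket(port=0):
    """Create a UDP socket whose outgoing packets carry TOS 0x10 (low delay).

    Hypothetical helper: on Linux, setting IP_TOS also sets the socket
    priority, which pfifo_fast uses to choose a band on egress.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, IPTOS_LOWDELAY)
    sock.bind(("0.0.0.0", port))  # port 0 lets the kernel pick one
    return sock


if __name__ == "__main__":
    s = make_heartbeat_socket()
    print("IP_TOS =", hex(s.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)))
    s.close()
```

This only affects egress queuing on the sending host and on any switch that honors the TOS/DSCP field. In this setup eth0 carries only heartbeats, so there is little for them to be prioritized against there, and Eric's reading of the netstat output places the delay on the receive side; treat it as a complement to the CPU and IRQ checks rather than a fix.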