netdev.vger.kernel.org archive mirror
* UDP sockets oddities
@ 2017-08-23 20:02 Florian Fainelli
  2017-08-23 22:26 ` Eric Dumazet
  0 siblings, 1 reply; 24+ messages in thread
From: Florian Fainelli @ 2017-08-23 20:02 UTC (permalink / raw)
  To: netdev, edumazet, pabeni, willemb; +Cc: davem

Hi,

On Broadcom STB chips using bcmsysport.c and bcm_sf2.c we have an
out-of-band HW mechanism (not using per-flow pause frames) where the
integrated network switch can backpressure the CPU Ethernet controller,
which translates into completing TX packet interrupts at the appropriate
pace and therefore gets flow control applied end-to-end from the host CPU
port towards any downstream port. At least that is the premise, and it
works reasonably well.

This has a few drawbacks in that each of the bcmsysport TX queues needs
to semi-statically map to its switch port output queue so that the
switch can calculate buffer occupancy and report congestion status,
which prompted this email [1], but that is tangential and is a policy
issue, not a mechanism issue.

[1]: https://www.spinics.net/lists/netdev/msg448153.html

This is useful when your CPU / integrated switch links up at 1Gbits/sec
internally, and tries to push 1Gbits/sec worth of UDP traffic to e.g: a
downstream port linking at 100Mbits/sec, which could happen depending on
what you have connected to this device.

Now the problem that I am facing, is the following:

- net.core.wmem_default = 160KB (default value)
- using iperf -b 800M -u towards an iperf UDP server with the physical
link to that server established at 100Mbits/sec
- iperf does synchronous write(2) AFAICT so this gives it flow control
- using the default duration of 10s, you can barely see any packet loss
from one run to another
- the longer the run, the more packet loss you are going to see,
usually in the range of ~0.15% tops
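For scale, a back-of-the-envelope sketch of what ~0.15% loss means in
absolute datagrams, assuming iperf's default 1470-byte UDP payload and
ignoring IP/UDP/Ethernet header overhead (both assumptions, so real
goodput is somewhat lower):

```python
# Rough arithmetic only; 1470 bytes is iperf's default UDP payload size
# (an assumption here), and header overhead is ignored.
LINK_BPS = 100e6          # 100 Mbit/s physical link
PAYLOAD = 1470            # bytes per datagram

pkts_per_sec = LINK_BPS / 8 / PAYLOAD

def lost_packets(duration_s, loss_rate=0.0015):
    """Absolute datagrams lost for a run of duration_s at ~0.15% loss."""
    return duration_s * pkts_per_sec * loss_rate

print(round(pkts_per_sec))      # ~8503 datagrams/s at line rate
print(round(lost_packets(10)))  # ~128 datagrams over a 10 s run
```

So over a 10 s run the loss amounts to only on the order of a hundred
datagrams, which is consistent with it being barely visible from one run
to another.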

The transmit flow looks like this:

gphy (net/dsa/slave.c::dsa_slave_xmit, IFF_NO_QUEUE device)
 -> eth0 (drivers/net/ethernet/broadcom/bcmsysport.c, "regular" network
device)

I can clearly see that the network stack pushed N UDP packets (the Udp
and Ip counters in /proc/net/snmp concur), however what the driver
transmitted and what the switch transmitted is N - M, and M matches the
packet loss reported by the UDP server. I don't measure any
SndbufErrors, which does not make sense yet.
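One way to cross-check those counters is to snapshot /proc/net/snmp
before and after a run and diff the Udp fields. A minimal parser sketch
(the field names track the kernel's Udp line, but the sample values below
are purely illustrative, not taken from the report above):

```python
def parse_snmp(text, proto):
    """Parse /proc/net/snmp-style text into {field: value} for one protocol.

    Lines come in header/value pairs sharing a "Proto:" prefix:
        Udp: InDatagrams NoPorts ... OutDatagrams ... SndbufErrors
        Udp: 2398 2 ... 85034 ... 0
    """
    rows = [l.split() for l in text.splitlines() if l.startswith(proto + ":")]
    header, values = rows[0][1:], rows[1][1:]
    return dict(zip(header, (int(v) for v in values)))

# Illustrative sample, not real measurements:
sample = """\
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 2398 2 0 85034 0 0
"""
udp = parse_snmp(sample, "Udp")
print(udp["OutDatagrams"], udp["SndbufErrors"])  # 85034 0
```

Diffing OutDatagrams against the driver's TX completion count would then
localize where the M missing packets are being dropped.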

If I reduce the default socket send buffer size to, say, 10x less than
160KB, i.e. 16KB, then I either see no packet loss at all at
100Mbits/sec for 5 minutes or more, or just very, very little, down to
0.001%. Now if I repeat the experiment with the physical link at
10Mbits/sec, same thing: the 16KB wmem_default setting no longer works
and the socket write buffer size needs to be lowered again.
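One way to read that scaling: the time to drain a full socket write
buffer grows as the link slows, so a buffer size that is "small enough"
at one rate is not at another. A rough model (drain_ms is a hypothetical
helper, not anything in the kernel):

```python
# Time to drain a full socket write buffer at a given link rate -- a
# back-of-the-envelope model for why a fixed wmem_default stops working
# as the physical link slows down.
def drain_ms(bufsize_bytes, link_bps):
    return bufsize_bytes * 8 / link_bps * 1000

for buf_kb in (160, 16):
    for mbps in (100, 10):
        ms = drain_ms(buf_kb * 1024, mbps * 1e6)
        print(f"{buf_kb}KB @ {mbps}Mb/s: {ms:.1f} ms")
```

Notably, 16KB at 10Mbits/sec represents the same ~13 ms of drain time as
160KB at 100Mbits/sec, which lines up with the observation that the
buffer has to shrink again when the link drops to 10Mbits/sec.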

So what I am wondering is:

- do I have an obvious flow control problem in my network driver that
usually does not lead to packet loss, but sometimes does?

- why would lowering the socket write buffer size appear to mask or
solve this problem?

I can consistently reproduce this across several kernel versions, 4.1,
4.9 and latest net-next and therefore can also test patches.

Thanks for reading thus far!
-- 
Florian


Thread overview: 24+ messages
2017-08-23 20:02 UDP sockets oddities Florian Fainelli
2017-08-23 22:26 ` Eric Dumazet
2017-08-23 22:49   ` Florian Fainelli
2017-08-23 23:04     ` Eric Dumazet
2017-08-24  0:03       ` Florian Fainelli
2017-08-24  0:43         ` Eric Dumazet
2017-08-24  2:23           ` Florian Fainelli
2017-08-25 23:18             ` Florian Fainelli
2017-08-25 23:57               ` Eric Dumazet
2017-08-26  1:17                 ` Florian Fainelli
2017-08-26  1:52                   ` Eric Dumazet
2017-08-26  3:25                     ` Florian Fainelli
2017-08-26  3:40                       ` Eric Dumazet
2017-08-26  4:19                         ` David Miller
2017-08-26 12:47                           ` Eric Dumazet
2017-08-26 18:56                             ` Florian Fainelli
2017-08-29 17:53                               ` Florian Fainelli
2017-08-29 18:01                                 ` Eric Dumazet
2017-08-29 22:16                                   ` [PATCH net-next] neigh: increase queue_len_bytes to match wmem_default Eric Dumazet
2017-08-29 23:11                                     ` David Miller
2017-08-29 23:15                                     ` Eric Dumazet
2017-08-29 23:17                                       ` David Miller
2017-08-26  4:17                       ` UDP sockets oddities David Miller
2017-08-26  5:20                     ` Eric Dumazet
