stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Maggie Mae Roxas <maggie.mae.roxas@gmail.com>,
	Willy Tarreau <w@1wt.eu>, "David S. Miller" <davem@davemloft.net>
Subject: [PATCH 3.17 38/47] net: mvneta: fix Tx interrupt delay
Date: Sun, 14 Dec 2014 12:21:13 -0800	[thread overview]
Message-ID: <20141214201820.014808157@linuxfoundation.org> (raw)
In-Reply-To: <20141214201818.552715149@linuxfoundation.org>

3.17-stable review patch.  If anyone has any objections, please let me know.

------------------

From: willy tarreau <w@1wt.eu>

[ Upstream commit aebea2ba0f7495e1a1c9ea5e753d146cb2f6b845 ]

The mvneta driver sets the amount of Tx coalesce packets to 16 by
default. Normally that does not cause any trouble since the driver
uses a much larger Tx ring size (532 packets). But some sockets
might run with very small buffers, much smaller than the equivalent
of 16 packets. This is what ping is doing for example, by setting
SNDBUF to 324 bytes rounded up to 2kB by the kernel.

The problem is that there is no documented method to force a specific
packet to emit an interrupt (eg: the last of the ring) nor is it
possible to make the NIC emit an interrupt after a given delay.

In this case, it causes trouble, because when ping sends packets over
its raw socket, the few first packets leave the system, and the first
15 packets will be emitted without an IRQ being generated, so without
the skbs being freed. And since the socket's buffer is small, there's
no way to reach that amount of packets, and the ping ends up with
"send: no buffer available" after sending 6 packets. Running with 3
instances of ping in parallel is enough to hide the problem, because
with 6 packets per instance, that's 18 packets total, which is enough
to grant a Tx interrupt before all are sent.

The original driver in the LSP kernel worked around this design flaw
by using a software timer to clean up the Tx descriptors. This timer
was slow and caused terrible network performance on some Tx-bound
workloads (such as routing) but was enough to make tools like ping
work correctly.

Instead here, we simply set the packet counts before interrupt to 1.
This ensures that each packet sent will produce an interrupt. NAPI
takes care of coalescing interrupts since the interrupt is disabled
once generated.

No measurable performance impact nor CPU usage were observed on small
nor large packets, including when saturating the link on Tx, and this
fixes tools like ping which rely on too small a send buffer. If one
wants to increase this value for certain workloads where it is safe
to do so, "ethtool -C $dev tx-frames" will override this default
setting.

This fix needs to be applied to stable kernels starting with 3.10.

Tested-By: Maggie Mae Roxas <maggie.mae.roxas@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/marvell/mvneta.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -216,7 +216,7 @@
 /* Various constants */
 
 /* Coalescing */
-#define MVNETA_TXDONE_COAL_PKTS		16
+#define MVNETA_TXDONE_COAL_PKTS		1
 #define MVNETA_RX_COAL_PKTS		32
 #define MVNETA_RX_COAL_USEC		100
 



  parent reply	other threads:[~2014-12-14 20:21 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-14 20:20 [PATCH 3.17 00/47] 3.17.7-stable review Greg Kroah-Hartman
2014-12-14 20:20 ` [PATCH 3.17 01/47] mm: frontswap: invalidate expired data on a dup-store failure Greg Kroah-Hartman
2014-12-14 20:20 ` [PATCH 3.17 02/47] mm/vmpressure.c: fix race in vmpressure_work_fn() Greg Kroah-Hartman
2014-12-14 20:20 ` [PATCH 3.17 03/47] drivers/input/evdev.c: dont kfree() a vmalloc address Greg Kroah-Hartman
2014-12-14 20:20 ` [PATCH 3.17 04/47] fat: fix oops on corrupted vfat fs Greg Kroah-Hartman
2014-12-14 20:20 ` [PATCH 3.17 05/47] mm: fix swapoff hang after page migration and fork Greg Kroah-Hartman
2014-12-14 20:20 ` [PATCH 3.17 06/47] mm: fix anon_vma_clone() error treatment Greg Kroah-Hartman
2014-12-14 20:20 ` [PATCH 3.17 07/47] slab: fix nodeid bounds check for non-contiguous node IDs Greg Kroah-Hartman
2014-12-14 20:20 ` [PATCH 3.17 08/47] xen-netfront: Remove BUGs on paged skb data which crosses a page boundary Greg Kroah-Hartman
2014-12-14 20:20 ` [PATCH 3.17 09/47] drm/nouveau/gf116: remove copy1 engine Greg Kroah-Hartman
2014-12-14 20:20 ` [PATCH 3.17 10/47] nouveau: move the hotplug ignore to correct place Greg Kroah-Hartman
2014-12-14 20:20 ` [PATCH 3.17 11/47] i2c: omap: fix NACK and Arbitration Lost irq handling Greg Kroah-Hartman
2014-12-14 20:20 ` [PATCH 3.17 12/47] i2c: omap: fix i207 errata handling Greg Kroah-Hartman
2014-12-14 20:20 ` [PATCH 3.17 14/47] i2c: cadence: Set the hardware time-out register to maximum value Greg Kroah-Hartman
2014-12-14 20:20 ` [PATCH 3.17 16/47] drm/radeon: kernel panic in drm_calc_vbltimestamp_from_scanoutpos with 3.18.0-rc6 Greg Kroah-Hartman
2014-12-14 20:20 ` [PATCH 3.17 17/47] of/fdt: memblock_reserve /memreserve/ regions in the case of partial overlap Greg Kroah-Hartman
2014-12-14 20:20 ` [PATCH 3.17 18/47] drm/i915: More cautious with pch fifo underruns Greg Kroah-Hartman
2014-12-14 20:20 ` [PATCH 3.17 19/47] drm/i915: Unlock panel even when LVDS is disabled Greg Kroah-Hartman
2014-12-14 20:20 ` [PATCH 3.17 20/47] x86: Use $(OBJDUMP) instead of plain objdump Greg Kroah-Hartman
2014-12-14 20:20 ` [PATCH 3.17 22/47] media: s2255drv: fix payload size for JPG, MJPEG Greg Kroah-Hartman
2014-12-14 20:20 ` [PATCH 3.17 23/47] media: smiapp: Only some selection targets are settable Greg Kroah-Hartman
2014-12-14 20:20 ` [PATCH 3.17 24/47] AHCI: Add DeviceIDs for Sunrise Point-LP SATA controller Greg Kroah-Hartman
2014-12-14 20:21 ` [PATCH 3.17 25/47] ahci: disable MSI on SAMSUNG 0xa800 SSD Greg Kroah-Hartman
2014-12-14 20:21 ` [PATCH 3.17 26/47] sata_fsl: fix error handling of irq_of_parse_and_map Greg Kroah-Hartman
2014-12-14 20:21 ` [PATCH 3.17 27/47] ip_tunnel: the lack of vti_link_ops dellink() cause kernel panic Greg Kroah-Hartman
2014-12-14 20:21 ` [PATCH 3.17 28/47] ipv6: gre: fix wrong skb->protocol in WCCP Greg Kroah-Hartman
2014-12-14 20:21 ` [PATCH 3.17 29/47] vxlan: Fix boolean flip in VXLAN_F_UDP_ZERO_CSUM6_[TX|RX] Greg Kroah-Hartman
2014-12-14 20:21 ` [PATCH 3.17 30/47] Fix race condition between vxlan_sock_add and vxlan_sock_release Greg Kroah-Hartman
2014-12-14 20:21 ` [PATCH 3.17 31/47] tg3: fix ring init when there are more TX than RX channels Greg Kroah-Hartman
2014-12-14 20:21 ` [PATCH 3.17 32/47] net/mlx4_core: Limit count field to 24 bits in qp_alloc_res Greg Kroah-Hartman
2014-12-14 20:21 ` [PATCH 3.17 33/47] net-timestamp: make tcp_recvmsg call ipv6_recv_error for AF_INET6 socks Greg Kroah-Hartman
2014-12-14 20:21 ` [PATCH 3.17 34/47] bond: Check length of IFLA_BOND_ARP_IP_TARGET attributes Greg Kroah-Hartman
2014-12-14 20:21 ` [PATCH 3.17 35/47] rtnetlink: release net refcnt on error in do_setlink() Greg Kroah-Hartman
2014-12-14 20:21 ` [PATCH 3.17 36/47] gre: Set inner mac header in gro complete Greg Kroah-Hartman
2014-12-14 20:21 ` [PATCH 3.17 37/47] mips: bpf: Fix broken BPF_MOD Greg Kroah-Hartman
2014-12-14 20:21 ` Greg Kroah-Hartman [this message]
2014-12-14 20:21 ` [PATCH 3.17 39/47] net: mvneta: fix race condition in mvneta_tx() Greg Kroah-Hartman
2014-12-14 20:21 ` [PATCH 3.17 41/47] xen-netfront: use correct linear area after linearizing an skb Greg Kroah-Hartman
2014-12-14 20:21 ` [PATCH 3.17 42/47] netlink: use jhash as hashfn for rhashtable Greg Kroah-Hartman
2014-12-14 20:21 ` [PATCH 3.17 43/47] Revert: ACPI / EC: Add support to disallow QR_EC to be issued before completing previous QR_EC Greg Kroah-Hartman
2014-12-14 20:21 ` [PATCH 3.17 45/47] ALSA: hda - Add EAPD fixup for ASUS Z99He laptop Greg Kroah-Hartman
2014-12-14 20:21 ` [PATCH 3.17 46/47] ALSA: hda - Fix built-in mic at resume on Lenovo Ideapad S210 Greg Kroah-Hartman
2014-12-14 20:21 ` [PATCH 3.17 47/47] ALSA: usb-audio: Dont resubmit pending URBs at MIDI error recovery Greg Kroah-Hartman
2014-12-15  3:31 ` [PATCH 3.17 00/47] 3.17.7-stable review Guenter Roeck
2014-12-16  3:07 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141214201820.014808157@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=davem@davemloft.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maggie.mae.roxas@gmail.com \
    --cc=stable@vger.kernel.org \
    --cc=w@1wt.eu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).