linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Thomas Petazzoni <thomas.petazzoni@free-electrons.com>,
	Gregory CLEMENT <gregory.clement@free-electrons.com>,
	Eric Dumazet <edumazet@google.com>,
	Arnaud Ebalard <arno@natisbad.org>, Willy Tarreau <w@1wt.eu>,
	"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 3.10 21/27] net: mvneta: use per_cpu stats to fix an SMP lock up
Date: Tue,  5 Aug 2014 11:14:13 -0700	[thread overview]
Message-ID: <20140805181344.893666223@linuxfoundation.org> (raw)
In-Reply-To: <20140805181344.268039690@linuxfoundation.org>

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: willy tarreau <w@1wt.eu>

commit 74c41b048db1073a04827d7f39e95ac1935524cc upstream.

Stats writers are mvneta_rx() and mvneta_tx(). They don't lock anything
when they update the stats, and as a result, it randomly happens that
the stats freeze on SMP if two updates happen during stats retrieval.
This is very easily reproducible by starting two HTTP servers and binding
each of them to a different CPU, then consulting /proc/net/dev in loops
during transfers, the interface should immediately lock up. This issue
also randomly happens upon link state changes during transfers, because
the stats are collected in this situation, but it takes more attempts to
reproduce it.

The comments in netdevice.h suggest using per_cpu stats instead to get
rid of this issue.

This patch implements this. It merges both rx_stats and tx_stats into
a single "stats" member with a single syncp. Both mvneta_rx() and
mvneta_rx() now only update the a single CPU's counters.

In turn, mvneta_get_stats64() does the summing by iterating over all CPUs
to get their respective stats.

With this change, stats are still correct and no more lockup is encountered.

Note that this bug was present since the first import of the mvneta
driver.  It might make sense to backport it to some stable trees. If
so, it depends on "d33dc73 net: mvneta: increase the 64-bit rx/tx stats
out of the hot path".

Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Tested-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
[wt: port to 3.10 : u64_stats_init() does not exist in 3.10 and is not needed]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/marvell/mvneta.c |   78 +++++++++++++++++++++-------------
 1 file changed, 50 insertions(+), 28 deletions(-)

--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -219,10 +219,12 @@
 
 #define MVNETA_RX_BUF_SIZE(pkt_size)   ((pkt_size) + NET_SKB_PAD)
 
-struct mvneta_stats {
+struct mvneta_pcpu_stats {
 	struct	u64_stats_sync syncp;
-	u64	packets;
-	u64	bytes;
+	u64	rx_packets;
+	u64	rx_bytes;
+	u64	tx_packets;
+	u64	tx_bytes;
 };
 
 struct mvneta_port {
@@ -248,8 +250,7 @@ struct mvneta_port {
 	u8 mcast_count[256];
 	u16 tx_ring_size;
 	u16 rx_ring_size;
-	struct mvneta_stats tx_stats;
-	struct mvneta_stats rx_stats;
+	struct mvneta_pcpu_stats *stats;
 
 	struct mii_bus *mii_bus;
 	struct phy_device *phy_dev;
@@ -428,21 +429,29 @@ struct rtnl_link_stats64 *mvneta_get_sta
 {
 	struct mvneta_port *pp = netdev_priv(dev);
 	unsigned int start;
+	int cpu;
 
-	memset(stats, 0, sizeof(struct rtnl_link_stats64));
-
-	do {
-		start = u64_stats_fetch_begin_bh(&pp->rx_stats.syncp);
-		stats->rx_packets = pp->rx_stats.packets;
-		stats->rx_bytes	= pp->rx_stats.bytes;
-	} while (u64_stats_fetch_retry_bh(&pp->rx_stats.syncp, start));
-
-
-	do {
-		start = u64_stats_fetch_begin_bh(&pp->tx_stats.syncp);
-		stats->tx_packets = pp->tx_stats.packets;
-		stats->tx_bytes	= pp->tx_stats.bytes;
-	} while (u64_stats_fetch_retry_bh(&pp->tx_stats.syncp, start));
+	for_each_possible_cpu(cpu) {
+		struct mvneta_pcpu_stats *cpu_stats;
+		u64 rx_packets;
+		u64 rx_bytes;
+		u64 tx_packets;
+		u64 tx_bytes;
+
+		cpu_stats = per_cpu_ptr(pp->stats, cpu);
+		do {
+			start = u64_stats_fetch_begin_bh(&cpu_stats->syncp);
+			rx_packets = cpu_stats->rx_packets;
+			rx_bytes   = cpu_stats->rx_bytes;
+			tx_packets = cpu_stats->tx_packets;
+			tx_bytes   = cpu_stats->tx_bytes;
+		} while (u64_stats_fetch_retry_bh(&cpu_stats->syncp, start));
+
+		stats->rx_packets += rx_packets;
+		stats->rx_bytes   += rx_bytes;
+		stats->tx_packets += tx_packets;
+		stats->tx_bytes   += tx_bytes;
+	}
 
 	stats->rx_errors	= dev->stats.rx_errors;
 	stats->rx_dropped	= dev->stats.rx_dropped;
@@ -1416,10 +1425,12 @@ static int mvneta_rx(struct mvneta_port
 	}
 
 	if (rcvd_pkts) {
-		u64_stats_update_begin(&pp->rx_stats.syncp);
-		pp->rx_stats.packets += rcvd_pkts;
-		pp->rx_stats.bytes   += rcvd_bytes;
-		u64_stats_update_end(&pp->rx_stats.syncp);
+		struct mvneta_pcpu_stats *stats = this_cpu_ptr(pp->stats);
+
+		u64_stats_update_begin(&stats->syncp);
+		stats->rx_packets += rcvd_pkts;
+		stats->rx_bytes   += rcvd_bytes;
+		u64_stats_update_end(&stats->syncp);
 	}
 
 	/* Update rxq management counters */
@@ -1552,11 +1563,12 @@ static int mvneta_tx(struct sk_buff *skb
 
 out:
 	if (frags > 0) {
-		u64_stats_update_begin(&pp->tx_stats.syncp);
-		pp->tx_stats.packets++;
-		pp->tx_stats.bytes += skb->len;
-		u64_stats_update_end(&pp->tx_stats.syncp);
+		struct mvneta_pcpu_stats *stats = this_cpu_ptr(pp->stats);
 
+		u64_stats_update_begin(&stats->syncp);
+		stats->tx_packets++;
+		stats->tx_bytes  += skb->len;
+		u64_stats_update_end(&stats->syncp);
 	} else {
 		dev->stats.tx_dropped++;
 		dev_kfree_skb_any(skb);
@@ -2758,6 +2770,13 @@ static int mvneta_probe(struct platform_
 
 	clk_prepare_enable(pp->clk);
 
+	/* Alloc per-cpu stats */
+	pp->stats = alloc_percpu(struct mvneta_pcpu_stats);
+	if (!pp->stats) {
+		err = -ENOMEM;
+		goto err_clk;
+	}
+
 	pp->tx_done_timer.data = (unsigned long)dev;
 
 	pp->tx_ring_size = MVNETA_MAX_TXD;
@@ -2769,7 +2788,7 @@ static int mvneta_probe(struct platform_
 	err = mvneta_init(pp, phy_addr);
 	if (err < 0) {
 		dev_err(&pdev->dev, "can't init eth hal\n");
-		goto err_clk;
+		goto err_free_stats;
 	}
 	mvneta_port_power_up(pp, phy_mode);
 
@@ -2798,6 +2817,8 @@ static int mvneta_probe(struct platform_
 
 err_deinit:
 	mvneta_deinit(pp);
+err_free_stats:
+	free_percpu(pp->stats);
 err_clk:
 	clk_disable_unprepare(pp->clk);
 err_unmap:
@@ -2818,6 +2839,7 @@ static int mvneta_remove(struct platform
 	unregister_netdev(dev);
 	mvneta_deinit(pp);
 	clk_disable_unprepare(pp->clk);
+	free_percpu(pp->stats);
 	iounmap(pp->base);
 	irq_dispose_mapping(dev->irq);
 	free_netdev(dev);



  parent reply	other threads:[~2014-08-05 18:16 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-05 18:13 [PATCH 3.10 00/27] 3.10.52-stable review Greg Kroah-Hartman
2014-08-05 18:13 ` [PATCH 3.10 01/27] crypto: af_alg - properly label AF_ALG socket Greg Kroah-Hartman
2014-08-05 18:13 ` [PATCH 3.10 02/27] ARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layout Greg Kroah-Hartman
2014-08-05 18:13 ` [PATCH 3.10 03/27] cfg80211: fix mic_failure tracing Greg Kroah-Hartman
2014-08-05 18:13 ` [PATCH 3.10 04/27] rapidio/tsi721_dma: fix failure to obtain transaction descriptor Greg Kroah-Hartman
2014-08-05 18:13 ` [PATCH 3.10 05/27] scsi: handle flush errors properly Greg Kroah-Hartman
2014-08-05 18:13 ` [PATCH 3.10 06/27] mm, thp: do not allow thp faults to avoid cpuset restrictions Greg Kroah-Hartman
2014-08-05 18:13 ` [PATCH 3.10 07/27] staging: vt6655: Fix disassociated messages every 10 seconds Greg Kroah-Hartman
2014-08-05 18:14 ` [PATCH 3.10 08/27] iio: buffer: Fix demux table creation Greg Kroah-Hartman
2014-08-05 18:14 ` [PATCH 3.10 09/27] printk: rename printk_sched to printk_deferred Greg Kroah-Hartman
2014-08-05 18:14 ` [PATCH 3.10 10/27] timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks Greg Kroah-Hartman
2014-08-05 18:14 ` [PATCH 3.10 11/27] Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option" Greg Kroah-Hartman
2014-08-05 18:14 ` [PATCH 3.10 12/27] x86-64, espfix: Dont leak bits 31:16 of %esp returning to 16-bit stack Greg Kroah-Hartman
2014-08-06 15:16   ` Luis Henriques
2014-08-06 15:24     ` Greg Kroah-Hartman
2014-08-06 15:55       ` Luis Henriques
2014-08-07 17:13       ` Greg Kroah-Hartman
2014-08-07 17:29         ` Greg Kroah-Hartman
2014-08-05 18:14 ` [PATCH 3.10 13/27] x86, espfix: Move espfix definitions into a separate header file Greg Kroah-Hartman
2014-08-05 18:14 ` [PATCH 3.10 14/27] x86, espfix: Fix broken header guard Greg Kroah-Hartman
2014-08-05 18:14 ` [PATCH 3.10 15/27] x86, espfix: Make espfix64 a Kconfig option, fix UML Greg Kroah-Hartman
2014-08-05 18:14 ` [PATCH 3.10 16/27] x86, espfix: Make it possible to disable 16-bit support Greg Kroah-Hartman
2014-08-05 18:14 ` [PATCH 3.10 17/27] x86_64/entry/xen: Do not invoke espfix64 on Xen Greg Kroah-Hartman
2014-08-05 18:14 ` [PATCH 3.10 18/27] staging: vt6655: Fix Warning on boot handle_irq_event_percpu Greg Kroah-Hartman
2014-08-05 18:14 ` [PATCH 3.10 19/27] Revert "mac80211: move "bufferable MMPDU" check to fix AP mode scan" Greg Kroah-Hartman
2014-08-05 18:14 ` [PATCH 3.10 20/27] net: mvneta: increase the 64-bit rx/tx stats out of the hot path Greg Kroah-Hartman
2014-08-05 18:14 ` Greg Kroah-Hartman [this message]
2014-08-05 18:14 ` [PATCH 3.10 22/27] net: mvneta: do not schedule in mvneta_tx_timeout Greg Kroah-Hartman
2014-08-05 18:14 ` [PATCH 3.10 23/27] net: mvneta: add missing bit descriptions for interrupt masks and causes Greg Kroah-Hartman
2014-08-05 18:14 ` [PATCH 3.10 24/27] net: mvneta: replace Tx timer with a real interrupt Greg Kroah-Hartman
2014-08-05 18:14 ` [PATCH 3.10 25/27] net/l2tp: dont fall back on UDP [get|set]sockopt Greg Kroah-Hartman
2014-08-05 18:14 ` [PATCH 3.10 26/27] lib/btree.c: fix leak of whole btree nodes Greg Kroah-Hartman
2014-08-05 18:14 ` [PATCH 3.10 27/27] x86/espfix/xen: Fix allocation of pages for paravirt page tables Greg Kroah-Hartman
2014-08-05 23:08 ` [PATCH 3.10 00/27] 3.10.52-stable review Shuah Khan
2014-08-06  2:34 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140805181344.893666223@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=arno@natisbad.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=gregory.clement@free-electrons.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=thomas.petazzoni@free-electrons.com \
    --cc=w@1wt.eu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).