Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 1/5] net: mvneta: increase the 64-bit rx/tx stats out of the hot path
From: Willy Tarreau @ 2014-01-13  3:06 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev, Thomas Petazzoni, Gregory CLEMENT
In-Reply-To: <1389574192.31367.194.camel@edumazet-glaptop2.roam.corp.google.com>

On Sun, Jan 12, 2014 at 04:49:52PM -0800, Eric Dumazet wrote:
> On Sun, 2014-01-12 at 10:31 +0100, Willy Tarreau wrote:
> > Better count packets and bytes in the stack and on 32 bit then
> > accumulate them at the end for once. This saves two memory writes
> > and two memory barriers per packet. The incoming packet rate was
> > increased by 4.7% on the Openblocks AX3 thanks to this.
> > 
> > Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> > Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
> > Signed-off-by: Willy Tarreau <w@1wt.eu>
> > ---
> >  drivers/net/ethernet/marvell/mvneta.c | 15 +++++++++++----
> >  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> 
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> 
> Note that with such a cost, one has to wonder why we keep 64bit stats
> for this NIC on 32bit hosts...

At least this avoids wrapping if stats are not retrieved often enough.
As someone who had to support 32-bit stats in production on a firewall
running on kernel 2.4, I can say it really becomes a problem to graph
activity if stats are not collected as often as every 30 seconds, which
is short in certain environments.

Thanks,
Willy

^ permalink raw reply

* [PATCH net-next] bgmac: propagate error codes in bgmac_probe()
From: Florian Fainelli @ 2014-01-13  3:05 UTC (permalink / raw)
  To: netdev; +Cc: davem, zajec5, hauke, nlhintz, sd, bhutchings, Florian Fainelli

bgmac_mii_register() and register_netdev() both return appropriate error
codes for the failures they would encounter, propagate this error code
instead of overriding the value with -ENOTSUPP which is not the correct
error code to return.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
Sabrina, Ben, this relates to the thread on -ENOTSUPP, other places in
the driver need fixing too, but this has to be done as a separate patch
anyway.

 drivers/net/ethernet/broadcom/bgmac.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
index 7f968a9..0297a79 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -1518,14 +1518,12 @@ static int bgmac_probe(struct bcma_device *core)
 	err = bgmac_mii_register(bgmac);
 	if (err) {
 		bgmac_err(bgmac, "Cannot register MDIO\n");
-		err = -ENOTSUPP;
 		goto err_dma_free;
 	}
 
 	err = register_netdev(bgmac->net_dev);
 	if (err) {
 		bgmac_err(bgmac, "Cannot register net device\n");
-		err = -ENOTSUPP;
 		goto err_mii_unregister;
 	}
 
-- 
1.8.3.2

^ permalink raw reply related

* Re: [PATCH 2/5] net: mvneta: use per_cpu stats to fix an SMP lock up
From: Willy Tarreau @ 2014-01-13  3:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev, Thomas Petazzoni, Gregory CLEMENT
In-Reply-To: <1389573903.31367.190.camel@edumazet-glaptop2.roam.corp.google.com>

On Sun, Jan 12, 2014 at 04:45:03PM -0800, Eric Dumazet wrote:
> On Sun, 2014-01-12 at 23:09 +0100, Willy Tarreau wrote:
> 
> > But we can have multiple tx in parallel, one per queue. And it's only
> > when I explicitly bind two servers to two distinct CPU cores that I
> > can trigger the issue, which seems to confirm that this is the cause
> > of the issue.
> 
> So this driver has multiqueue ?

Yes, it defaults to 8 queues in each direction.

> Definitely it should have one syncp per queue.
> 
> Or per cpu stats, as your patch did.

OK thank you for your review and explanation then, I'm reassured :-)

Thanks,
Willy

^ permalink raw reply

* [PATCH net] net: avoid reference counter overflows on fib_rules in multicast forwarding
From: Hannes Frederic Sowa @ 2014-01-13  1:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Bob Falken, Julian Anastasov, netdev, kaber, tgraf
In-Reply-To: <1389574560.31367.197.camel@edumazet-glaptop2.roam.corp.google.com>

Bob Falken reported that after 4G packets, multicast forwarding stopped
working. This was because of a rule reference counter overflow which
freed the rule as soon as the overflow happend.

This patch solves this by adding the FIB_LOOKUP_NOREF flag to
fib_rules_lookup calls. This is safe even from non-rcu locked sections
as in this case the flag only implies not taking a reference to the rule,
which we don't need at all.

Rules only hold references to the namespace, which are guaranteed to be
available during the call of the non-rcu protected function reg_vif_xmit
because of the interface reference which itself holds a reference to
the net namespace.

Fixes: f0ad0860d01e47 ("ipv4: ipmr: support multiple tables")
Fixes: d1db275dd3f6e4 ("ipv6: ip6mr: support multiple tables")
Reported-by: Bob Falken <NetFestivalHaveFun@gmx.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
Bob Falken already tested this patch, as it is similar to my first
attempt but the additional and similar fix for ipv6.

We need an additional fix for kernels without FIB_LOOKUP_NOREF, but I'll
move that to tomorrow, as it is already late here.

 net/ipv4/ipmr.c  | 7 +++++--
 net/ipv6/ip6mr.c | 7 +++++--
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 421a249..b9b3472 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -157,9 +157,12 @@ static struct mr_table *ipmr_get_table(struct net *net, u32 id)
 static int ipmr_fib_lookup(struct net *net, struct flowi4 *flp4,
 			   struct mr_table **mrt)
 {
-	struct ipmr_result res;
-	struct fib_lookup_arg arg = { .result = &res, };
 	int err;
+	struct ipmr_result res;
+	struct fib_lookup_arg arg = {
+		.result = &res,
+		.flags = FIB_LOOKUP_NOREF,
+	};
 
 	err = fib_rules_lookup(net->ipv4.mr_rules_ops,
 			       flowi4_to_flowi(flp4), 0, &arg);
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index f365310..0eb4038 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -141,9 +141,12 @@ static struct mr6_table *ip6mr_get_table(struct net *net, u32 id)
 static int ip6mr_fib_lookup(struct net *net, struct flowi6 *flp6,
 			    struct mr6_table **mrt)
 {
-	struct ip6mr_result res;
-	struct fib_lookup_arg arg = { .result = &res, };
 	int err;
+	struct ip6mr_result res;
+	struct fib_lookup_arg arg = {
+		.result = &res,
+		.flags = FIB_LOOKUP_NOREF,
+	};
 
 	err = fib_rules_lookup(net->ipv6.mr6_rules_ops,
 			       flowi6_to_flowi(flp6), 0, &arg);
-- 
1.8.4.2

^ permalink raw reply related

* Re: [PATCH] sh_eth: fix garbled TX error message
From: Joe Perches @ 2014-01-13  1:24 UTC (permalink / raw)
  To: Simon Horman; +Cc: Sergei Shtylyov, netdev, linux-sh
In-Reply-To: <20140113004532.GJ15296@verge.net.au>

On Mon, 2014-01-13 at 09:45 +0900, Simon Horman wrote:
> On Sat, Jan 11, 2014 at 02:41:49AM +0300, Sergei Shtylyov wrote:
> > sh_eth_error() in case of a TX error tries to print a message using 2 dev_err()
> > calls with the first string not finished by '\n', so that the resulting message
> > would inevitably come out garbled, with something like "3net eth0: " inserted
> > in the middle.  Avoid that by merging 2 calls into one.

I believe this interleaving should not happen since
commit e28d713704117bca0820c732210df6075b09f13b
(2.6.31 days)

Perhaps it'd be better to use netdev_<level> and
netif_<level> instead of dev_<level> and pr_<level>.

uncompiled/untested...
---

 drivers/net/ethernet/renesas/sh_eth.c | 63 ++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index 8884107..6baad48 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -334,7 +334,7 @@ static void sh_eth_select_mii(struct net_device *ndev)
 		value = 0x0;
 		break;
 	default:
-		pr_warn("PHY interface mode was not setup. Set to MII.\n");
+		netdev_warn(ndev, "PHY interface mode was not setup. Set to MII.\n");
 		value = 0x1;
 		break;
 	}
@@ -756,7 +756,7 @@ static int sh_eth_check_reset(struct net_device *ndev)
 		cnt--;
 	}
 	if (cnt <= 0) {
-		pr_err("Device reset failed\n");
+		netdev_err(ndev, "Device reset failed\n");
 		ret = -ETIMEDOUT;
 	}
 	return ret;
@@ -1456,8 +1456,7 @@ ignore_link:
 		/* Unused write back interrupt */
 		if (intr_status & EESR_TABT) {	/* Transmit Abort int */
 			ndev->stats.tx_aborted_errors++;
-			if (netif_msg_tx_err(mdp))
-				dev_err(&ndev->dev, "Transmit Abort\n");
+			netif_err(mdp, tx_err, ndev, "Transmit Abort\n");
 		}
 	}
 
@@ -1466,45 +1465,38 @@ ignore_link:
 		if (intr_status & EESR_RFRMER) {
 			/* Receive Frame Overflow int */
 			ndev->stats.rx_frame_errors++;
-			if (netif_msg_rx_err(mdp))
-				dev_err(&ndev->dev, "Receive Abort\n");
+			netif_err(mdp, rx_err, ndev, "Receive Abort\n");
 		}
 	}
 
 	if (intr_status & EESR_TDE) {
 		/* Transmit Descriptor Empty int */
 		ndev->stats.tx_fifo_errors++;
-		if (netif_msg_tx_err(mdp))
-			dev_err(&ndev->dev, "Transmit Descriptor Empty\n");
+		netif_err(mdp, tx_err, ndev, "Transmit Descriptor Empty\n");
 	}
 
 	if (intr_status & EESR_TFE) {
 		/* FIFO under flow */
 		ndev->stats.tx_fifo_errors++;
-		if (netif_msg_tx_err(mdp))
-			dev_err(&ndev->dev, "Transmit FIFO Under flow\n");
+		netif_err(mdp, tx_err, ndev, "Transmit FIFO Under flow\n");
 	}
 
 	if (intr_status & EESR_RDE) {
 		/* Receive Descriptor Empty int */
 		ndev->stats.rx_over_errors++;
-
-		if (netif_msg_rx_err(mdp))
-			dev_err(&ndev->dev, "Receive Descriptor Empty\n");
+		netif_err(mdp, rx_err, ndev, "Receive Descriptor Empty\n");
 	}
 
 	if (intr_status & EESR_RFE) {
 		/* Receive FIFO Overflow int */
 		ndev->stats.rx_fifo_errors++;
-		if (netif_msg_rx_err(mdp))
-			dev_err(&ndev->dev, "Receive FIFO Overflow\n");
+		netif_err(mdp, rx_err, ndev, "Receive FIFO Overflow\n");
 	}
 
 	if (!mdp->cd->no_ade && (intr_status & EESR_ADE)) {
 		/* Address Error */
 		ndev->stats.tx_fifo_errors++;
-		if (netif_msg_tx_err(mdp))
-			dev_err(&ndev->dev, "Address Error\n");
+		netif_err(mdp, tx_err, ndev, "Address Error\n");
 	}
 
 	mask = EESR_TWB | EESR_TABT | EESR_ADE | EESR_TDE | EESR_TFE;
@@ -1514,10 +1506,9 @@ ignore_link:
 		/* Tx error */
 		u32 edtrr = sh_eth_read(ndev, EDTRR);
 		/* dmesg */
-		dev_err(&ndev->dev, "TX error. status=%8.8x cur_tx=%8.8x ",
-			intr_status, mdp->cur_tx);
-		dev_err(&ndev->dev, "dirty_tx=%8.8x state=%8.8x EDTRR=%8.8x.\n",
-			mdp->dirty_tx, (u32) ndev->state, edtrr);
+		netdev_err(ndev, "TX error. status=%08x cur_tx=%08x dirty_tx=%08x state=%08x EDTRR=%08x\n",
+			   intr_status, mdp->cur_tx, mdp->dirty_tx,
+			   (u32)ndev->state, edtrr);
 		/* dirty buffer free */
 		sh_eth_txfree(ndev);
 
@@ -1678,7 +1669,7 @@ static int sh_eth_phy_init(struct net_device *ndev)
 	phydev = phy_connect(ndev, phy_id, sh_eth_adjust_link,
 			     mdp->phy_interface);
 	if (IS_ERR(phydev)) {
-		dev_err(&ndev->dev, "phy_connect failed\n");
+		netdev_err(ndev, "phy_connect failed\n");
 		return PTR_ERR(phydev);
 	}
 
@@ -1864,12 +1855,12 @@ static int sh_eth_set_ringparam(struct net_device *ndev,
 
 	ret = sh_eth_ring_init(ndev);
 	if (ret < 0) {
-		dev_err(&ndev->dev, "%s: sh_eth_ring_init failed.\n", __func__);
+		netdev_err(ndev, "%s: sh_eth_ring_init failed\n", __func__);
 		return ret;
 	}
 	ret = sh_eth_dev_init(ndev, false);
 	if (ret < 0) {
-		dev_err(&ndev->dev, "%s: sh_eth_dev_init failed.\n", __func__);
+		netdev_err(ndev, "%s: sh_eth_dev_init failed\n", __func__);
 		return ret;
 	}
 
@@ -1910,7 +1901,7 @@ static int sh_eth_open(struct net_device *ndev)
 	ret = request_irq(ndev->irq, sh_eth_interrupt,
 			  mdp->cd->irq_flags, ndev->name, ndev);
 	if (ret) {
-		dev_err(&ndev->dev, "Can not assign IRQ number\n");
+		netdev_err(ndev, "Can not assign IRQ number\n");
 		goto out_napi_off;
 	}
 
@@ -1948,10 +1939,8 @@ static void sh_eth_tx_timeout(struct net_device *ndev)
 
 	netif_stop_queue(ndev);
 
-	if (netif_msg_timer(mdp)) {
-		dev_err(&ndev->dev, "%s: transmit timed out, status %8.8x, resetting...\n",
-			ndev->name, (int)sh_eth_read(ndev, EESR));
-	}
+	netif_err(mdp, timer, ndev, "transmit timed out, status %08x, resetting...\n",
+		  (int)sh_eth_read(ndev, EESR));
 
 	/* tx_errors count up */
 	ndev->stats.tx_errors++;
@@ -2154,7 +2143,7 @@ static int sh_eth_tsu_busy(struct net_device *ndev)
 		udelay(10);
 		timeout--;
 		if (timeout <= 0) {
-			dev_err(&ndev->dev, "%s: timeout\n", __func__);
+			netdev_err(ndev, "%s: timeout\n", __func__);
 			return -ETIMEDOUT;
 		}
 	}
@@ -2571,7 +2560,6 @@ static const u16 *sh_eth_get_register_offset(int register_type)
 		reg_offset = sh_eth_offset_fast_sh3_sh2;
 		break;
 	default:
-		pr_err("Unknown register type (%d)\n", register_type);
 		break;
 	}
 
@@ -2675,6 +2663,10 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
 	/* set cpu data */
 	mdp->cd = (struct sh_eth_cpu_data *)id->driver_data;
 	mdp->reg_offset = sh_eth_get_register_offset(mdp->cd->register_type);
+	if (!mdp->reg_offset)
+		dev_err(&pdev->dev, "Unknown register type (%d)\n",
+			mdp->cd->register_type);
+
 	sh_eth_set_default_cpu_data(mdp->cd);
 
 	/* set function */
@@ -2691,9 +2683,10 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
 	/* read and set MAC address */
 	read_mac_address(ndev, pd->mac_addr);
 	if (!is_valid_ether_addr(ndev->dev_addr)) {
-		dev_warn(&pdev->dev,
-			 "no valid MAC address supplied, using a random one.\n");
 		eth_hw_addr_random(ndev);
+		dev_warn(&pdev->dev,
+			 "no valid MAC address supplied, using random %pM\n",
+			 ndev->dev_addr);
 	}
 
 	/* ioremap the TSU registers */
@@ -2733,8 +2726,8 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
 		goto out_unregister;
 
 	/* print device information */
-	pr_info("Base address at 0x%x, %pM, IRQ %d.\n",
-		(u32)ndev->base_addr, ndev->dev_addr, ndev->irq);
+	netdev_info(ndev, "Base address at 0x%x, %pM, IRQ %d\n",
+		    (u32)ndev->base_addr, ndev->dev_addr, ndev->irq);
 
 	platform_set_drvdata(pdev, ndev);
 

^ permalink raw reply related

* Re: [PATCH net-next] IPv6: enable TCP to use an anycast address
From: Hannes Frederic Sowa @ 2014-01-13  1:11 UTC (permalink / raw)
  To: François-Xavier Le Bail
  Cc: Alexey Kuznetsov, netdev, David S. Miller, James Morris,
	Hideaki Yoshifuji, Patrick McHardy
In-Reply-To: <1389538427.32032.YahooMailBasic@web125505.mail.ne1.yahoo.com>

Hi!

On Sun, Jan 12, 2014 at 06:53:47AM -0800, François-Xavier Le Bail wrote:
> On Sat, 1/11/14, Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
> 
> > On Sat, Jan 11, 2014 at 05:38:27PM +0400, Alexey Kuznetsov wrote:
> > > On Sat, Jan 11, 2014 at 5:06 PM, François-Xavier Le Bail
> > > <fx.lebail@yahoo.com> wrote:
> > > > Many DNS root-servers use TCP with anycast (IPv4 and IPV6).
> > > 
> > > Actually, I was alerted by reset processing in your patch, it cannot be right.
> > > 
> > > Do not you think this must not be enabled for common use? At least
> > > some separate sysctl disabled by default.
> 
> > The idea I had, was, that if a socket does knowingly bind to an anycast
> > address, it is allowed to do so and process queries on it with both TCP and
> > UDP. I don't think we need a sysctl for that? Anycast addresses are either
> > pre-defined (e.g. the subnet router anycast address) or specified by a flag
> > when the administrator adds one. Currently one can only add anycast addresses
> > either by forwarding and gets the per-subnet anycast address or with a
> > setsockopt IPV6_JOIN_ANYCAST.
> 
> > So the problem is what should be allowed when the socket listens on an any
> > address? Maybe this should be protected by a sysctl?
> 
> TCP case:
> With my two patches (the one for bind and this one for tcp), when a
> SOCK_STREAM socket listen to in6addr_any, the server is able to 
> send TCP reply with unicast or anycast source address, according
> to the destination address used by the client.
> 
> dest request unicast => src reply unicast (current behavior)
> dest resquet anycast => src reply anycast (new)
> 
> So, I don't think there is a need for a sysctl.

I am still thinking about the RST-case and am a bit unsure here. But I
currently don't see any problems.

> UDP case:
> By default (no socket option), the server program don't know the
> destination address of the request. The ipv6_dev_get_saddr() is 
> used for choosing the unicast source address of the reply.
> 
> I am not sure a change is needed here.

If the socket did bind to anycast address, we may well send packets from
anycast source without switching a sysctl IMHO.

If not bound we need to keep current behaviour, yes. As DNS-Servers
normally do accept a list of addresses where to bind to, I find this
acceptable. RFC6724 additionally now allowes the selection of anycast
as source address, so we may think about this and maybe integrate this
with ip addrlabel and such).

We also need to check the implications to icmp here (error reporting if
an UDP packets is sent to an anycast address). Also we have to check for
icmps generated because of forwarding errors and their source selection.

> When using IPV6_RECVPKTINFO, a server is able to know
> the destination address of the request and can use it as source
> address for the reply.
> To enable anycast for this (don't get EINVAL), there is need for
> a patch like the one I posted ("IPv6: add option to use anycast
> addresses as source addresses for datagrams").
> I am working on a v2.

Ack.

> With the appropriate change:
> dest request unicast => src reply unicast (current behavior)
> dest resquet anycast => src reply anycast (new)
> 
> I don't think, there either, there is a need for a sysctl.

I agree here.

Greetings,

  Hannes

^ permalink raw reply

* Re: Multicast routing stops functioning after 4G multicast packets recived.
From: Eric Dumazet @ 2014-01-13  0:56 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Bob Falken, Julian Anastasov, netdev, kaber, tgraf
In-Reply-To: <20140112074245.GF6586@order.stressinduktion.org>

On Sun, 2014-01-12 at 08:42 +0100, Hannes Frederic Sowa wrote:

> Hm, Eric. If we do that we can just specifiy FIB_LOOKUP_NOREF
> unconditionally. FIB_LOOKUP_NOREF has no other side effects on a ipmr
> lookup as taking the reference on the rule, which we would drop after
> that.
> 
> So we would actually be going back to the first patch in this thread. I
> guess it is just a matter of style?

Hi Hannes, please submit a formal patch, so that we can have a proper
ground for discussion (I guess I'll only add my Acked-by)

Thanks !

^ permalink raw reply

* Re: [PATCH 1/5] net: mvneta: increase the 64-bit rx/tx stats out of the hot path
From: Eric Dumazet @ 2014-01-13  0:49 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: davem, netdev, Thomas Petazzoni, Gregory CLEMENT
In-Reply-To: <1389519069-1619-2-git-send-email-w@1wt.eu>

On Sun, 2014-01-12 at 10:31 +0100, Willy Tarreau wrote:
> Better count packets and bytes in the stack and on 32 bit then
> accumulate them at the end for once. This saves two memory writes
> and two memory barriers per packet. The incoming packet rate was
> increased by 4.7% on the Openblocks AX3 thanks to this.
> 
> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  drivers/net/ethernet/marvell/mvneta.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)


Reviewed-by: Eric Dumazet <edumazet@google.com>

Note that with such a cost, one has to wonder why we keep 64bit stats
for this NIC on 32bit hosts...

^ permalink raw reply

* Re: [PATCH 2/5] net: mvneta: use per_cpu stats to fix an SMP lock up
From: Eric Dumazet @ 2014-01-13  0:48 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: davem, netdev, Thomas Petazzoni, Gregory CLEMENT
In-Reply-To: <1389519069-1619-3-git-send-email-w@1wt.eu>

On Sun, 2014-01-12 at 10:31 +0100, Willy Tarreau wrote:

> This patch implements this. It merges both rx_stats and tx_stats into
> a single "stats" member with a single syncp. Both mvneta_rx() and
> mvneta_rx() now only update the a single CPU's counters.


Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH] sh_eth: fix garbled TX error message
From: Simon Horman @ 2014-01-13  0:45 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: netdev, linux-sh
In-Reply-To: <201401110241.49471.sergei.shtylyov@cogentembedded.com>

On Sat, Jan 11, 2014 at 02:41:49AM +0300, Sergei Shtylyov wrote:
> sh_eth_error() in case of a TX error tries to print a message using 2 dev_err()
> calls with the first string not finished by '\n', so that the resulting message
> would inevitably come out garbled, with something like "3net eth0: " inserted
> in the middle.  Avoid that by merging 2 calls into one.
> 
> While at it, insert an empty line after the nearby declaration.
> 
> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

Reviewed-by: Simon Horman <horms+renesas@verge.net.au>

> 
> ---
> Although being a fix, this patch is against the 'net-next.git' repo as it's does
> not seem important enough at this time in the release cycle. Please consider it
> for the stable kernel though...
> 
>  drivers/net/ethernet/renesas/sh_eth.c |    8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> Index: net-next/drivers/net/ethernet/renesas/sh_eth.c
> ===================================================================
> --- net-next.orig/drivers/net/ethernet/renesas/sh_eth.c
> +++ net-next/drivers/net/ethernet/renesas/sh_eth.c
> @@ -1513,11 +1513,11 @@ ignore_link:
>  	if (intr_status & mask) {
>  		/* Tx error */
>  		u32 edtrr = sh_eth_read(ndev, EDTRR);
> +
>  		/* dmesg */
> -		dev_err(&ndev->dev, "TX error. status=%8.8x cur_tx=%8.8x ",
> -			intr_status, mdp->cur_tx);
> -		dev_err(&ndev->dev, "dirty_tx=%8.8x state=%8.8x EDTRR=%8.8x.\n",
> -			mdp->dirty_tx, (u32) ndev->state, edtrr);
> +		dev_err(&ndev->dev, "TX error. status=%8.8x cur_tx=%8.8x dirty_tx=%8.8x state=%8.8x EDTRR=%8.8x.\n",
> +			intr_status, mdp->cur_tx, mdp->dirty_tx,
> +			(u32)ndev->state, edtrr);
>  		/* dirty buffer free */
>  		sh_eth_txfree(ndev);
>  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sh" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: [PATCH 2/5] net: mvneta: use per_cpu stats to fix an SMP lock up
From: Eric Dumazet @ 2014-01-13  0:45 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: davem, netdev, Thomas Petazzoni, Gregory CLEMENT
In-Reply-To: <20140112220921.GE16576@1wt.eu>

On Sun, 2014-01-12 at 23:09 +0100, Willy Tarreau wrote:

> But we can have multiple tx in parallel, one per queue. And it's only
> when I explicitly bind two servers to two distinct CPU cores that I
> can trigger the issue, which seems to confirm that this is the cause
> of the issue.

So this driver has multiqueue ?

Definitely it should have one syncp per queue.

Or per cpu stats, as your patch did.

Thanks !

^ permalink raw reply

* Re: [PATCH net-next v3 8/9] xen-netback: Timeout packets in RX path
From: Zoltan Kiss @ 2014-01-13  0:20 UTC (permalink / raw)
  To: Paul Durrant, Ian Campbell, Wei Liu,
	xen-devel@lists.xenproject.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Jonathan Davies
In-Reply-To: <9AAE0902D5BC7E449B7C8E4E778ABCD0200C2A@AMSPEX01CL01.citrite.net>

On 09/01/14 09:20, Paul Durrant wrote:
>> We are adding the skb to vif->rx_queue even when
>> xenvif_rx_ring_slots_available(vif, min_slots_needed) said there is no
>> space for that. Or am I missing something? Paul?
>>
> That's correct. Part of the flow control improvement was to get rid of needless packet drops. For your purposes, you basically need to avoid using the queuing discipline and take packets into netback's vif->rx_queue regardless of the state of the shared ring so that you can drop them if they get beyond a certain age. So, perhaps you should never stop the netif queue, place an upper limit on vif->rx_queue (either packet or byte count) and drop when that is exceeded (i.e. mimicking pfifo or bfifo internally).
>
How about this:
- when the timer fires first we wake up the thread an tell it to drop 
all the packets in rx_queue
- start_xmit then can drain the qdisc queue into the device queue
- additionally, the RX thread should stop that timer when it was able to 
do some work

Regards,

Zoli

^ permalink raw reply

* Re: [PATCH net-next v2 4/4] virtio-net: initial debugfs support, export mergeable rx buffer size
From: Michael Dalton @ 2014-01-12 23:32 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, lf-virt, Eric Dumazet, David S. Miller
In-Reply-To: <20140112170939.GA17202@redhat.com>

Hi Michael,

On Sun, Jan 12, 2014 at 9:09 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> Can't we add struct attribute * to netdevice, and pass that in when
> creating the kobj?

I like that idea, I think that will work and should be better than
the alternatives. The actual kobjs for RX queues (struct netdev_rx_queue)
are allocated and deallocated by calls to net_rx_queue_update_kobjects,
which resizes RX queue kobjects when the netdev RX queues are resized.

Is this what you had in mind:
(1) Add a pointer to an attribute group to struct net_device, used for
    per-netdev rx queue attributes and initialized before the call to
    register_netdevice().
(2) Declare an attribute group containing the mergeable_rx_buffer_size
    attribute in virtio-net, and initialize the per-netdevice group pointer
    to the address of this group in virtnet_probe before register_netdevice
(3) In net-sysfs, modify net_rx_queue_update_kobjects
    (or rx_queue_add_kobject) to call sysfs_create_group on the
    per-netdev attribute group (if non-NULL), adding the attributes in
    the group to the RX queue kobject.

That should allow us to have per-RX queue attributes that are
device-specific. I'm not a sysfs expert, but it seems that rx_queue_ktype
and rx_queue_sysfs_ops presume that all rx queue sysfs operations are
performed on attributes of type rx_queue_attribute. That type will need
to be moved from net-sysfs.c to a header file like netdevice.h so that
the type can be used in virtio-net when we declare the
mergeable_rx_buffer_size attribute.

The last issue is how the rx_queue_attribute 'show' function
implementation for mergeable_rx_buffer_size will access the appropriate
per-receive queue EWMA data. The arguments to the show function will be
the netdev_rx_queue and the attribute itself. We can get to the
struct net_device from the netdev_rx_queue.  If we extended
netdev_rx_queue to indicate the queue_index or to store a void *priv_data
pointer, that would be sufficient to allow us to resolve this issue.

Please let me know if the above sounds good or if you see a better way
to accomplish this goal. Thanks!

Best,

Mike

^ permalink raw reply

* Re: [Xen-devel] [PATCH net-next v3 2/9] xen-netback: Change TX path from grant copy to mapping
From: Zoltan Kiss @ 2014-01-12 23:19 UTC (permalink / raw)
  To: David Vrabel
  Cc: Wei Liu, xen-devel, jonathan.davies, ian.campbell, linux-kernel,
	netdev
In-Reply-To: <52D0198C.6000600@citrix.com>

On 10/01/14 16:02, David Vrabel wrote:
> On 10/01/14 15:24, Zoltan Kiss wrote:
> If the grant table code doesn't provide the API calls you need you can
> either:
>
> a) add the new API as a prerequisite patch.
> b) use the existing API calls and live with the performance problem,
> until you can refactor the API later on.
>
> Adding a netback-specific hack isn't a valid option.
>
> David

Ok, I've sent in the patch which does a)

Zoli

^ permalink raw reply

* Re: [PATCH net-next 0/3] bonding: cleanup bond_3ad.c
From: David Miller @ 2014-01-12 22:44 UTC (permalink / raw)
  To: vfalico; +Cc: netdev, fubar, andy
In-Reply-To: <20140112.143621.1551326776001236706.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Sun, 12 Jan 2014 14:36:21 -0800 (PST)

> I'll apply them again, thanks for noticing.

They should really be there now, let me know if there are any
problems.

^ permalink raw reply

* Re: [PATCH net-next 0/3] bonding: cleanup bond_3ad.c
From: David Miller @ 2014-01-12 22:36 UTC (permalink / raw)
  To: vfalico; +Cc: netdev, fubar, andy
In-Reply-To: <20140112132719.GA26242@redhat.com>

From: Veaceslav Falico <vfalico@redhat.com>
Date: Sun, 12 Jan 2014 14:27:19 +0100

> On Fri, Jan 10, 2014 at 06:06:14PM -0500, David Miller wrote:
>>From: Veaceslav Falico <vfalico@redhat.com>
>>Date: Wed,  8 Jan 2014 16:46:45 +0100
>>
>>> It's a huge mess there currently - and, thus, really hard to read and
>>> debug.
>>>
>>> This is the first series, and doesn't change the logic at all, only
>>> makes
>>> it a bit more readable.
>>>
>>> CC: Jay Vosburgh <fubar@us.ibm.com>
>>> CC: Andy Gospodarek <andy@greyhouse.net>
>>> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
>>
>>Series applied, thanks.
> 
> Hrm, still can't see them. Did I miss something?..

Ho hum, if they aren't in the tree (I can't find them either) then they
are sitting locally on a machine on the opposite coast from where I am
right now and to which I do not have access for the next month :-)

I'll apply them again, thanks for noticing.

^ permalink raw reply

* Re: [PATCH 0/5] Assorted mvneta fixes
From: Willy Tarreau @ 2014-01-12 22:22 UTC (permalink / raw)
  To: Arnaud Ebalard
  Cc: davem, netdev, Thomas Petazzoni, Gregory CLEMENT, Eric Dumazet
In-Reply-To: <87ob3hovkf.fsf@natisbad.org>

Hi Arnaud,

On Sun, Jan 12, 2014 at 08:21:20PM +0100, Arnaud Ebalard wrote:
> Hi,
> 
> Willy Tarreau <w@1wt.eu> writes:
> 
> > this series provides some fixes for a number of issues met with the
> > mvneta driver :
> >
> >   - driver lockup when reading stats while sending traffic from multiple
> >     CPUs : this obviously only happens on SMP and is the result of missing
> >     locking on the driver. The problem was present since the introduction
> >     of the driver in 3.8. The first patch performs some changes that are
> >     needed for the second one which actually fixes the issue by using
> >     per-cpu counters. It could make sense to backport this to the relevant
> >     stable versions.
> >
> >   - mvneta_tx_timeout calls various functions to reset the NIC, and these
> >     functions sleep, which is not allowed here, resulting in a panic.
> >     Better completely disable this Tx timeout handler for now since it is
> >     never called. The problem was encountered while developing some new
> >     features, it's uncertain whether it's possible to reproduce it with
> >     regular usage, so maybe a backport to stable is not needed.
> >
> >   - replace the Tx timer with a real Tx IRQ. As first reported by Arnaud
> >     Ebalard and explained by Eric Dumazet, there is no way this driver
> >     can work correctly if it uses a driver to recycle the Tx descriptors.
> >     If too many packets are sent at once, the driver quickly ends up with
> >     no descriptors (which happens twice as easily in GSO) and has to wait
> >     10ms for recycling its descriptors and being able to send again. Eric
> >     has worked around this in the core GSO code. But still when routing
> >     traffic or sending UDP packets, the limitation is very visible. Using
> >     Tx IRQs allows Tx descriptors to be recycled when sent. The coalesce
> >     value is still configurable using ethtool. This fix turns the UDP
> >     send bitrate from 134 Mbps to 987 Mbps (ie: line rate). It's made of
> >     two patches, one to add the relevant bits from the original Marvell's
> >     driver, and another one to implement the change. I don't know if it
> >     should be backported to stable, as the bug only causes poor performance.
> 
> First, thanks a lot for that work!
> 
> Funny enough, I spent some time this week-end trying to find the root
> cause of some kernel freezes and panics appearing randomly after some GB
> read on a ReadyNAS 102 configured as a NFS server. 
> 
> I tested your fixes and performance series together on top of current
> 3.13.0-rc7 and I am now unable to reproduce the freeze and panics after
> having read more than the 300GB of traffic from the NAS: following
> bandwith with a bwm-ng shows the rate is also far more stable than w/
> previous driver logic (55MB/sec). So, FWIW:
> 
> Tested-by: Arnaud Ebalard <arno@natisbad.org>

Thanks for this.

BTW, the "performance" series is not supposed to fix anything, and still
it seems difficult to me to find what patch might have fixed your problem.
Maybe the timer used in place of an IRQ has an even worse effect than what
we could imagine ?

> Willy, I can extend the test to RN2120 if you think it is useful to also
> do additional tests on a dual-core armada XP.

It's up to you. These patches have run extensively on my Mirabox (Armada370),
OpenBlocks AX3 (ArmadaXP dual core) and the XP-GP board (ArmadaXP quad core),
and fixed the stability issues and performance issues I was facing there. But
you may be interested in testing them with your workloads (none of my boxes
is used as an NFS server, NAS or whatever, they mainly see HTTP and very small
packets used in stress tests).

> Now, just in case someone on netdev can find something useful in the
> panics I gathered before your set, I have added those below. The fact
> that the bugs have disappeared with your set would tend to confirm that
> it was in the driver but at some point during the tests, I suspected the
> TCP stack when the device is stressed (NFS seems to do that very well on
> a 1.2GHz/512MB device). I did the following tests:
> 
> - on a 3.12.5, 3.13.0-rc4 and rc7 on a ReadyNAS 102
> - 3.13.0-rc7 on a RN2120 (dual-core Armada XP w/ mvneta and 2GB of RAM):
>   no issue seen after 200GB of traffic transfered
> - 3.13.0-rc7 on a Duo v2 (kirkwood 88F6282 @ 1.6GHz w/ mv643xx_eth): no
>   issue 

To be completely transparent, I've already faced some panics on the mirabox
during high speed testing (when trying to send 1.488 Mpps on the two gig
ports in parallel). But I've always suspected a power supply issue and never
dug deeper. I've also read some instability reports on some mirabox, so it's
pretty possible that some design rules for the armada370 are not perfectly
respected or too hard to apply and that we seldom meet hardware issues.

Cheers,
Willy

^ permalink raw reply

* [PATCH] dm9601: add USB IDs for new dm96xx variants
From: Peter Korsgaard @ 2014-01-12 22:15 UTC (permalink / raw)
  To: netdev, davem; +Cc: joseph_chang, Peter Korsgaard

A number of new dm96xx variants now exist.

Reported-by: Joseph Chang <joseph_chang@davicom.com.tw>
Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
---
 drivers/net/usb/dm9601.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/usb/dm9601.c b/drivers/net/usb/dm9601.c
index 14aa48f..e802198 100644
--- a/drivers/net/usb/dm9601.c
+++ b/drivers/net/usb/dm9601.c
@@ -614,6 +614,18 @@ static const struct usb_device_id products[] = {
 	 USB_DEVICE(0x0a46, 0x9621),	/* DM9621A USB to Fast Ethernet Adapter */
 	 .driver_info = (unsigned long)&dm9601_info,
 	},
+	{
+	 USB_DEVICE(0x0a46, 0x9622),	/* DM9622 USB to Fast Ethernet Adapter */
+	 .driver_info = (unsigned long)&dm9601_info,
+	},
+	{
+	 USB_DEVICE(0x0a46, 0x0269),	/* DM962OA USB to Fast Ethernet Adapter */
+	 .driver_info = (unsigned long)&dm9601_info,
+	},
+	{
+	 USB_DEVICE(0x0a46, 0x1269),	/* DM9621A USB to Fast Ethernet Adapter */
+	 .driver_info = (unsigned long)&dm9601_info,
+	},
 	{},			// END
 };
 
-- 
1.8.5.2

^ permalink raw reply related

* Re: [PATCH 3/5] net: mvneta: do not schedule in mvneta_tx_timeout
From: Willy Tarreau @ 2014-01-12 22:14 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: davem, netdev, Thomas Petazzoni, Gregory CLEMENT
In-Reply-To: <1389548333.3720.73.camel@deadeye.wl.decadent.org.uk>

Hi Ben,

On Sun, Jan 12, 2014 at 05:38:53PM +0000, Ben Hutchings wrote:
> [Putting another hat on]
> 
> On Sun, 2014-01-12 at 17:55 +0100, Willy Tarreau wrote:
> > Hi Ben,
> > 
> > On Sun, Jan 12, 2014 at 04:49:51PM +0000, Ben Hutchings wrote:
> > (...)
> > > > So for now, let's simply ignore these timeouts generally caused by bugs
> > > > only.
> > > 
> > > No, don't ignore them.  Schedule a work item to reset the device.  (And
> > > remember to cancel it when stopping the device.)
> > 
> > OK I can try to do that. Could you recommend me one driver which does this
> > successfully so that I can see exactly what needs to be taken care of ?
> 
> sfc does it, though the reset logic there is more complicated than you
> would need.

OK.

> I think this will DTRT, but it's compile-tested only.

OK, I'll test it ASAP. I think I can force the tx timeout by disabling
the link state detection and unplugging the cable during a transfer.

> I have been given an OpenBlocks AX3 but haven't set it up yet.

Ah you're another lucky owner of this really great device :-)

I've sent Eric Leblond a complete howto in french, so it won't be of
a big use to you but if I can find some time and you don't find other
info, I can try to redo it in english. However you may be interested
in this article I put online with a few patches to make your life
easier :

   http://1wt.eu/articles/openblocks-http-server/

It's a line-rate (1.488 Mpps) HTTP server I've done on it with a few
patches that may be useful for other network tests.

Cheers,
Willy

^ permalink raw reply

* Re: [PATCH 2/5] net: mvneta: use per_cpu stats to fix an SMP lock up
From: Willy Tarreau @ 2014-01-12 22:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev, Thomas Petazzoni, Gregory CLEMENT
In-Reply-To: <1389550056.31367.186.camel@edumazet-glaptop2.roam.corp.google.com>

Hi Eric!

On Sun, Jan 12, 2014 at 10:07:36AM -0800, Eric Dumazet wrote:
> On Sun, 2014-01-12 at 10:31 +0100, Willy Tarreau wrote:
> > Stats writers are mvneta_rx() and mvneta_tx(). They don't lock anything
> > when they update the stats, and as a result, it randomly happens that
> > the stats freeze on SMP if two updates happen during stats retrieval.
> 
> Your patch is OK, but I dont understand how this freeze can happen.
> 
> TX and RX uses a separate syncp, and TX is protected by a lock, RX
> is protected by NAPI bit.

But we can have multiple tx in parallel, one per queue. And it's only
when I explicitly bind two servers to two distinct CPU cores that I
can trigger the issue, which seems to confirm that this is the cause
of the issue.

> Stats retrieval uses the appropriate BH disable before the fetches...

>From the numerous printks I have added inside the syncp blocks, it
appears that the stats themselves are not responsible for the issue,
but the concurrent Tx are. I ended up several times stuck if I had
two Tx on different CPUs right before a stats retrieval. From the
info I found on the syncp docs, the caller is responsible for locking
and I don't see where there's any lock here since the syncp are global
and not even per tx queue.

But this stuff is very new to me, I can have missed something. That
said, I'm quite certain that the lock happened within the syncp blocks
and only in this case! At least my reading of the relevant includes
seemed to confirm to me that this hypothesis was valid :-/

Thanks,
Willy

^ permalink raw reply

* Re: Use of ENOTSUPP in drivers?
From: Ben Hutchings @ 2014-01-12 21:39 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: Sabrina Dubroca, netdev
In-Reply-To: <1408352.Qio0cxKTUV@lenovo>

On Sun, 2014-01-12 at 13:19 -0800, Florian Fainelli wrote:
> Le dimanche 12 janvier 2014, 20:47:02 Ben Hutchings a écrit :
> > On Sun, 2014-01-12 at 19:57 +0100, Sabrina Dubroca wrote:
> > > Thu, 2 Jan 2014 12:01:31 +0000, Ben Hutchings wrote:
> > > > Never return error code ENOTSUPP; it's *not* the same thing as ENOTSUP
> > > > in userland and is not part of the userland ABI.  I would use EINVAL
> > > > here.
> > > 
> > > I've found a few ethernet drivers that return -ENOTSUPP in various
> > > functions. In particular, some ethtool functions or ioctl's.
> > > Ben's message makes me think that the ethtool functions and ioctl's
> > > should be modified.
> > > 
> > > There are other occurences, mostly in functions related to device
> > > initialization. I didn't manage to track down exactly from where some
> > > of them are called, and I don't know if ENOTSUPP is okay in these.
> > > 
> > > I've included the complete list of occurences (based on net-next) from
> > > drivers/net/ethernet in patch form at the end, if that's more
> > > convenient than the file/function list. This is not meant to be
> > > applied.
> > > 
> > > 
> > > Do these (or part of them) need to be patched? Or is there something
> > > I'm missing?
> > 
> > [...]
> > 
> > I believe they should all be patched.  According to
> > include/linux/errno.h, ENOTSUPP is meant for use in the NFSv3 code only.
> > (But it's apparently erroneously used *all over* the tree, not just in
> > net drivers!)
> 
> Most other drivers use -EOPNOTSUPP, which is arguably as bad as -ENOTSUPP, 
> since the comment about it says:
> 
> * Operation not supported on transport endpoint *
> 
> But at least changing -ENOTSUPP to -EOPNOTSUPP until something better which is 
> not protocol/endpoint specific is agreed on might be better for consistency?

It is not as bad, because EOPNOTSUPP is actually named in userland (in
fact it has two names, the other being ENOTSUP spelt with *one* P).

glibc's strerror() returns these strings for ENOTSUPP and EOPNOTSUPP
respectively:

"Unknown error 524"
"Operation not supported"

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: Use of ENOTSUPP in drivers?
From: Florian Fainelli @ 2014-01-12 21:19 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Sabrina Dubroca, netdev
In-Reply-To: <1389559622.3720.115.camel@deadeye.wl.decadent.org.uk>

Le dimanche 12 janvier 2014, 20:47:02 Ben Hutchings a écrit :
> On Sun, 2014-01-12 at 19:57 +0100, Sabrina Dubroca wrote:
> > Thu, 2 Jan 2014 12:01:31 +0000, Ben Hutchings wrote:
> > > Never return error code ENOTSUPP; it's *not* the same thing as ENOTSUP
> > > in userland and is not part of the userland ABI.  I would use EINVAL
> > > here.
> > 
> > I've found a few ethernet drivers that return -ENOTSUPP in various
> > functions. In particular, some ethtool functions or ioctl's.
> > Ben's message makes me think that the ethtool functions and ioctl's
> > should be modified.
> > 
> > There are other occurences, mostly in functions related to device
> > initialization. I didn't manage to track down exactly from where some
> > of them are called, and I don't know if ENOTSUPP is okay in these.
> > 
> > I've included the complete list of occurences (based on net-next) from
> > drivers/net/ethernet in patch form at the end, if that's more
> > convenient than the file/function list. This is not meant to be
> > applied.
> > 
> > 
> > Do these (or part of them) need to be patched? Or is there something
> > I'm missing?
> 
> [...]
> 
> I believe they should all be patched.  According to
> include/linux/errno.h, ENOTSUPP is meant for use in the NFSv3 code only.
> (But it's apparently erroneously used *all over* the tree, not just in
> net drivers!)

Most other drivers use -EOPNOTSUPP, which is arguably as bad as -ENOTSUPP, 
since the comment about it says:

* Operation not supported on transport endpoint *

But at least changing -ENOTSUPP to -EOPNOTSUPP until something better which is 
not protocol/endpoint specific is agreed on might be better for consistency?
-- 
Florian

^ permalink raw reply

* Re: [PATCH V2 0/4] misc: xgene: Add support for APM X-Gene SoC Queue Manager/Traffic Manager
From: Arnd Bergmann @ 2014-01-12 21:19 UTC (permalink / raw)
  To: Ravi Patel
  Cc: devicetree@vger.kernel.org, Jon Masters, Greg KH, patches@apm.com,
	linux-kernel, Loc Ho, netdev, Keyur Chudgar, davem,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <CAN1v_PuySRzi6r9aT_SJrQRMBh9gj-5yXOOY1QpyZ6V4wXyZCA@mail.gmail.com>

On Friday 10 January 2014, Ravi Patel wrote:

> Do you want any further clarification or document related to QMTM.
> We want to make sure everyone is on same page, understand and
> conclude upon that QMTM is a device and and not a bus or a dma
> engine.

I have a much better understanding now, but there are still a few open
questions from my side. Let me try to explain in my own words what I
think is the relevant information (part of this is still guessing).
It took me a while to figure out what it does from your description,
and then some more time to see what it's actually good for (as
opposed to adding complexity).

Please confirm or correct the individual statements in this
description:

The QMTM serves as a relay for short (a few bytes) messages between
the OS software and various slave hardware blocks on the SoC.
The messages are typically but not always DMA descriptors used by the
slave device used for starting bus master transactions by the slave,
or for notifying sofware about the competion of a DMA transaction.

The message format is specific to the slave device and the QMTM
only understands the common header of the message.

OS software sees the messages in cache-coherent memory and does
not require any cache flushes or MMIO access for inbound messages
and only a single posted MMIO write for outbound messages.

The queues are likely designed to be per-thread and don't require
software-side locking.

For outbound messages, the QMTM is the bus master of a device-to-device
DMA transaction that gets started once a message is queued and the
device has signaled that it is ready for receiving it. The QMTM needs
to know the bus address of the device as well as a slave ID for
the signal pin.
For inbound messages, the QMTM slave initiates a busmaster transaction
and needs to know the bus address of its QMTM port, while the QMTM
needs to know only the slave ID that is associated with the queue.

In addition to those hardware properties, the QMTM driver needs to
set up a memory buffer for the message queue as seen by the CPU,
and needs tell the QMTM the location as well as some other
properties such as the message length.

For inbound messages, the QMTM serves a similar purpose as an MSI
controller, ensuring that inbound DMA data has arrived in RAM
before an interrupt is delivered to the CPU and thereby avoiding
the need for an expensive MMIO read to serialize the DMA.

The resources managed by the QMTM are both SoC-global (e.g. bus
bandwidth) and slave specific (e.g. ethernet bandwith or buffer space).
Global resource management is performed to prevent one slave
device from monopolizing the system or preventing other slaves
from making forward progress.
Examples for local resource management (I had to think about this 
a long time, but probably some of these are wrong) would be
* balancing between multiple non-busmaster devices connected to
  a dma-engine
* distributing incoming ethernet data to the available CPUs based on
  a flow classifier in the MAC, e.g. by IOV MAC address, VLAN tag
  or even individual TCP connection depending on the NIC's capabilities.
* 802.1p flow control for incoming ethernet data based on the amount
  of data queued up between the MAC and the driver
* interrupt mitigation for both inbound data and outbound completion,
  by delaying the IRQ to the OS until multiple messages have arrived
  or a queue specific amount of time has passed.
* controlling the amount of outbound buffer space per flow to minimize
  buffer-bloat between an ethernet driver and the NIC hardware.
* reordering data from outbound flows based on priority.

This is basically my current interpretation, I hope I got at least
some of it right this time ;-)

	Arnd

^ permalink raw reply

* [PATCH 3/3 V2] b43legacy: Fix unload oops if firmware is not available
From: Larry Finger @ 2014-01-12 21:11 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, Larry Finger, netdev, Stable
In-Reply-To: <1389561099-17181-1-git-send-email-Larry.Finger@lwfinger.net>

The asyncronous firmware load uses a completion struct to hold firmware
processing until the user-space routines are up and running. There is.
however, a problem in that the waiter is nevered canceled during teardown.
As a result, unloading the driver when firmware is not available causes an oops.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@vger.kernel.org>
---
V2 - Addresses the comments of Ben Hutchings and removes the race condition.

 drivers/net/wireless/b43legacy/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/b43legacy/main.c b/drivers/net/wireless/b43legacy/main.c
index 5726688..349c776 100644
--- a/drivers/net/wireless/b43legacy/main.c
+++ b/drivers/net/wireless/b43legacy/main.c
@@ -3919,6 +3919,7 @@ static void b43legacy_remove(struct ssb_device *dev)
 	 * as the ieee80211 unreg will destroy the workqueue. */
 	cancel_work_sync(&wldev->restart_work);
 	cancel_work_sync(&wl->firmware_load);
+	complete(&wldev->fw_load_complete);
 
 	B43legacy_WARN_ON(!wl);
 	if (!wldev->fw.ucode)
-- 
1.8.4

^ permalink raw reply related

* [PATCH 2/3 V2] b43: Fix unload oops if firmware is not available
From: Larry Finger @ 2014-01-12 21:11 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, Larry Finger, netdev, Stable
In-Reply-To: <1389561099-17181-1-git-send-email-Larry.Finger@lwfinger.net>

The asyncronous firmware load uses a completion struct to hold firmware
processing until the user-space routines are up and running. There is.
however, a problem in that the waiter is nevered canceled during teardown.
As a result, unloading the driver when firmware is not available causes an oops.

To be able to access the completion structure at teardown, it had to be moved
into the b43_wldev structure.

This patch also fixes a typo in a comment.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@vger.kernel.org>
---

V2 - Addresses the comments of Ben Hutchings and removes the race condition.

 drivers/net/wireless/b43/b43.h  |  4 ++--
 drivers/net/wireless/b43/main.c | 10 +++++-----
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/b43/b43.h b/drivers/net/wireless/b43/b43.h
index 7f3d461..54376fd 100644
--- a/drivers/net/wireless/b43/b43.h
+++ b/drivers/net/wireless/b43/b43.h
@@ -731,8 +731,6 @@ enum b43_firmware_file_type {
 struct b43_request_fw_context {
 	/* The device we are requesting the fw for. */
 	struct b43_wldev *dev;
-	/* a completion event structure needed if this call is asynchronous */
-	struct completion fw_load_complete;
 	/* a pointer to the firmware object */
 	const struct firmware *blob;
 	/* The type of firmware to request. */
@@ -809,6 +807,8 @@ enum {
 struct b43_wldev {
 	struct b43_bus_dev *dev;
 	struct b43_wl *wl;
+	/* a completion event structure needed if this call is asynchronous */
+	struct completion fw_load_complete;
 
 	/* The device initialization status.
 	 * Use b43_status() to query. */
diff --git a/drivers/net/wireless/b43/main.c b/drivers/net/wireless/b43/main.c
index 86b2030..c75237e 100644
--- a/drivers/net/wireless/b43/main.c
+++ b/drivers/net/wireless/b43/main.c
@@ -2070,6 +2070,7 @@ void b43_do_release_fw(struct b43_firmware_file *fw)
 
 static void b43_release_firmware(struct b43_wldev *dev)
 {
+	complete(&dev->fw_load_complete);
 	b43_do_release_fw(&dev->fw.ucode);
 	b43_do_release_fw(&dev->fw.pcm);
 	b43_do_release_fw(&dev->fw.initvals);
@@ -2095,7 +2096,7 @@ static void b43_fw_cb(const struct firmware *firmware, void *context)
 	struct b43_request_fw_context *ctx = context;
 
 	ctx->blob = firmware;
-	complete(&ctx->fw_load_complete);
+	complete(&ctx->dev->fw_load_complete);
 }
 
 int b43_do_request_fw(struct b43_request_fw_context *ctx,
@@ -2142,7 +2143,7 @@ int b43_do_request_fw(struct b43_request_fw_context *ctx,
 	}
 	if (async) {
 		/* do this part asynchronously */
-		init_completion(&ctx->fw_load_complete);
+		init_completion(&ctx->dev->fw_load_complete);
 		err = request_firmware_nowait(THIS_MODULE, 1, ctx->fwname,
 					      ctx->dev->dev->dev, GFP_KERNEL,
 					      ctx, b43_fw_cb);
@@ -2150,12 +2151,11 @@ int b43_do_request_fw(struct b43_request_fw_context *ctx,
 			pr_err("Unable to load firmware\n");
 			return err;
 		}
-		/* stall here until fw ready */
-		wait_for_completion(&ctx->fw_load_complete);
+		wait_for_completion(&ctx->dev->fw_load_complete);
 		if (ctx->blob)
 			goto fw_ready;
 	/* On some ARM systems, the async request will fail, but the next sync
-	 * request works. For this reason, we dall through here
+	 * request works. For this reason, we fall through here
 	 */
 	}
 	err = request_firmware(&ctx->blob, ctx->fwname,
-- 
1.8.4

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox