Netdev List


* [PATCH net-next v2 3/6] ethernet: apm: xgene: remove unnecessary check
From: Varka Bhadram @ 2014-10-23  1:58 UTC (permalink / raw)
  To: netdev; +Cc: davem, Varka Bhadram
In-Reply-To: <1414029531-5067-1-git-send-email-varkab@cdac.in>

devm_ioremap_resource checks platform_get_resource() return value.
We can remove the duplicate check here.

Signed-off-by: Varka Bhadram <varkab@cdac.in>
---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c |   12 ------------
 1 file changed, 12 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 3c208cc..f226594 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -761,10 +761,6 @@ static int xgene_enet_get_resources(struct xgene_enet_pdata *pdata)
 	ndev = pdata->ndev;
 
 	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "enet_csr");
-	if (!res) {
-		dev_err(dev, "Resource enet_csr not defined\n");
-		return -ENODEV;
-	}
 	pdata->base_addr = devm_ioremap_resource(dev, res);
 	if (IS_ERR(pdata->base_addr)) {
 		dev_err(dev, "Unable to retrieve ENET Port CSR region\n");
@@ -772,10 +768,6 @@ static int xgene_enet_get_resources(struct xgene_enet_pdata *pdata)
 	}
 
 	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "ring_csr");
-	if (!res) {
-		dev_err(dev, "Resource ring_csr not defined\n");
-		return -ENODEV;
-	}
 	pdata->ring_csr_addr = devm_ioremap_resource(dev, res);
 	if (IS_ERR(pdata->ring_csr_addr)) {
 		dev_err(dev, "Unable to retrieve ENET Ring CSR region\n");
@@ -783,10 +775,6 @@ static int xgene_enet_get_resources(struct xgene_enet_pdata *pdata)
 	}
 
 	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "ring_cmd");
-	if (!res) {
-		dev_err(dev, "Resource ring_cmd not defined\n");
-		return -ENODEV;
-	}
 	pdata->ring_cmd_addr = devm_ioremap_resource(dev, res);
 	if (IS_ERR(pdata->ring_cmd_addr)) {
 		dev_err(dev, "Unable to retrieve ENET Ring command region\n");
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH net-next v2 2/6] ethernet: wiznet: remove unnecessary check
From: Varka Bhadram @ 2014-10-23  1:58 UTC (permalink / raw)
  To: netdev; +Cc: davem, Varka Bhadram
In-Reply-To: <1414029531-5067-1-git-send-email-varkab@cdac.in>

devm_ioremap_resource checks platform_get_resource() return value.
We can remove the duplicate check here.

Signed-off-by: Varka Bhadram <varkab@cdac.in>
---
 drivers/net/ethernet/wiznet/w5300.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/wiznet/w5300.c b/drivers/net/ethernet/wiznet/w5300.c
index f961f14..7974b7d 100644
--- a/drivers/net/ethernet/wiznet/w5300.c
+++ b/drivers/net/ethernet/wiznet/w5300.c
@@ -558,14 +558,12 @@ static int w5300_hw_probe(struct platform_device *pdev)
 	}
 
 	mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	if (!mem)
-		return -ENXIO;
-	mem_size = resource_size(mem);
-
 	priv->base = devm_ioremap_resource(&pdev->dev, mem);
 	if (IS_ERR(priv->base))
 		return PTR_ERR(priv->base);
 
+	mem_size = resource_size(mem);
+
 	spin_lock_init(&priv->reg_lock);
 	priv->indirect = mem_size < W5300_BUS_DIRECT_SIZE;
 	if (priv->indirect) {
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH net-next v2 1/6] ethernet: wiznet: remove unnecessary check
From: Varka Bhadram @ 2014-10-23  1:58 UTC (permalink / raw)
  To: netdev; +Cc: davem, Varka Bhadram
In-Reply-To: <1414029531-5067-1-git-send-email-varkab@cdac.in>

devm_ioremap_resource checks platform_get_resource() return value.
We can remove the duplicate check here.

Signed-off-by: Varka Bhadram <varkab@cdac.in>
---
 drivers/net/ethernet/wiznet/w5100.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/wiznet/w5100.c b/drivers/net/ethernet/wiznet/w5100.c
index 0f56b1c..70a930a 100644
--- a/drivers/net/ethernet/wiznet/w5100.c
+++ b/drivers/net/ethernet/wiznet/w5100.c
@@ -638,14 +638,12 @@ static int w5100_hw_probe(struct platform_device *pdev)
 	}
 
 	mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	if (!mem)
-		return -ENXIO;
-	mem_size = resource_size(mem);
-
 	priv->base = devm_ioremap_resource(&pdev->dev, mem);
 	if (IS_ERR(priv->base))
 		return PTR_ERR(priv->base);
 
+	mem_size = resource_size(mem);
+
 	spin_lock_init(&priv->reg_lock);
 	priv->indirect = mem_size < W5100_BUS_DIRECT_SIZE;
 	if (priv->indirect) {
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH net-next v2 0/6] cleanup on resource check
From: Varka Bhadram @ 2014-10-23  1:58 UTC (permalink / raw)
  To: netdev; +Cc: davem, Varka Bhadram

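This series removes the duplicated sanity check on the resource
returned by platform_get_resource(). devm_ioremap_resource() already
rejects a NULL resource, so the callers do not need to check it first.

changes since v1:
	- avoid a NULL dereference in resource_size() by calling it only
	  after devm_ioremap_resource() has succeeded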

Varka Bhadram (6):
  ethernet: wiznet: remove unnecessary check
  ethernet: wiznet: remove unnecessary check
  ethernet: apm: xgene: remove unnecessary check
  ethernet: marvell: remove unnecessary check
  ethernet: renesas: remove unnecessary check
  ethernet: samsung: sxgbe: remove unnecessary check

 drivers/net/ethernet/apm/xgene/xgene_enet_main.c   |   12 ------------
 drivers/net/ethernet/marvell/pxa168_eth.c          |    6 ++----
 drivers/net/ethernet/renesas/sh_eth.c              |    9 +++------
 .../net/ethernet/samsung/sxgbe/sxgbe_platform.c    |    3 ---
 drivers/net/ethernet/wiznet/w5100.c                |    6 ++----
 drivers/net/ethernet/wiznet/w5300.c                |    6 ++----
 6 files changed, 9 insertions(+), 33 deletions(-)

-- 
1.7.9.5

^ permalink raw reply
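
The v2 change to the wiznet patches comes down to ordering: resource_size() dereferences its argument, so it may only run once the NULL case has been ruled out. A minimal userspace sketch of that ordering follows. Every mock_* name is a hypothetical stand-in, not kernel API, and the -EINVAL result for a NULL resource is an assumption about how devm_ioremap_resource() reports the error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct resource. */
struct mock_resource {
	uint64_t start;
	uint64_t end;
};

/* Like resource_size(): dereferences res, so res must not be NULL. */
static uint64_t mock_resource_size(const struct mock_resource *res)
{
	return res->end - res->start + 1;
}

/*
 * Models the one property the series relies on: a NULL resource is
 * rejected here, so callers need no separate check. Returns 0 or a
 * negative errno (assumed -EINVAL) instead of an ERR_PTR.
 */
static int mock_ioremap_resource(const struct mock_resource *res, void **base)
{
	static char fake_mmio[16];

	if (!res)
		return -EINVAL;
	*base = fake_mmio;
	return 0;
}

/* Probe in the v2 ordering: map first, read the size afterwards. */
static int mock_hw_probe(const struct mock_resource *mem, uint64_t *mem_size)
{
	void *base;
	int err;

	err = mock_ioremap_resource(mem, &base);
	if (err)
		return err;

	/* Safe: mem is known to be non-NULL at this point. */
	*mem_size = mock_resource_size(mem);
	return 0;
}
```

With the v1 ordering (size computed before the mapping call) a missing resource would have been dereferenced before anything could reject it.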

* Re: irq disable in __netdev_alloc_frag() ?
From: Eric Dumazet @ 2014-10-23  1:52 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Eric Dumazet, Network Development
In-Reply-To: <CAMEtUuwsUqd-U8ZSEXCB+a7cvLpbuRjPU2m0Ux84q6cxoWSx+g@mail.gmail.com>

On Wed, 2014-10-22 at 17:15 -0700, Alexei Starovoitov wrote:
> Hi Eric,
> 
> in the commit 6f532612cc24 ("net: introduce netdev_alloc_frag()")
> you mentioned that the reason to disable interrupts
> in __netdev_alloc_frag() is:
> "- Must be IRQ safe (non NAPI drivers can use it)"
> 
> Is there a way to do this conditionally?
> 
> Without it I see 10% performance gain for my RX tests
> (from 6.9Mpps to 7.7Mpps) and __netdev_alloc_frag()
> itself goes from 6.6% to 2.1%
> (popf seems to be quite costly)

Well, your driver is probably a NAPI one, so you need to
mask irqs, or to remove all non NAPI drivers from linux.

__netdev_alloc_frag() (__netdev_alloc_skb()) is used by all.

Problem is __netdev_alloc_frag() is generally deep inside caller
chain, so using a private pool might have quite an overhead.

Same could be said for skb_queue_head() /skb_queue_tail() /
sock_queue_rcv_skb() :
Many callers don't need to block irq.

^ permalink raw reply

* Re: [RFC] tcp md5 use of alloc_percpu
From: Eric Dumazet @ 2014-10-23  1:47 UTC (permalink / raw)
  To: Crestez Dan Leonard; +Cc: Jonathan Toppins, netdev
In-Reply-To: <54485337.5040108@gmail.com>

On Thu, 2014-10-23 at 04:00 +0300, Crestez Dan Leonard wrote:
> On 10/23/2014 02:38 AM, Jonathan Toppins wrote:
> > On 10/22/14, 2:55 PM, Crestez Dan Leonard wrote:
> > >> sg_init_one does virt_addr on the pointer which assumes it is directly accessible. But the tcp_md5sig_pool pointer comes from alloc_percpu which can return memory from the vmalloc area after the pcpu_first_chunk is exhausted. This looks wrong to me. I am getting crashes on mips and I believe this to be the cause.
> >
> > Thinking about this more if the issue really is sg_init_one assumes a
> > directly accessible memory region, can we just modify the zone
> > allocation to GFP_DMA using alloc_percpu_gfp()? Does this satisfy the
> > assumptions made by sg_init_one?
> I don't think that alloc_percpu_gfp can be used that way. Looking at the 
> code it only checks for GFP_KERNEL and behaves "atomically" if it is not 
> present. This means that it fails rather than vmalloc a new percpu_chunk.
> 
> The problem is not that the memory is not allocated with GFP_DMA but 
> rather that the memory is allocated with vmalloc.

Could you try the following patch ?

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1bec4e76d88c..d253ad8ced64 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2868,30 +2868,29 @@ EXPORT_SYMBOL(compat_tcp_getsockopt);
 #endif
 
 #ifdef CONFIG_TCP_MD5SIG
-static struct tcp_md5sig_pool __percpu *tcp_md5sig_pool __read_mostly;
+static DEFINE_PER_CPU(struct tcp_md5sig_pool, tcp_md5sig_pool);
 static DEFINE_MUTEX(tcp_md5sig_mutex);
+static bool tcp_md5sig_pool_populated = false;
 
-static void __tcp_free_md5sig_pool(struct tcp_md5sig_pool __percpu *pool)
+static void tcp_free_md5sig_pool(void)
 {
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
-		struct tcp_md5sig_pool *p = per_cpu_ptr(pool, cpu);
+		struct crypto_hash *hash;
+
+		hash = per_cpu(tcp_md5sig_pool, cpu).md5_desc.tfm;
 
-		if (p->md5_desc.tfm)
-			crypto_free_hash(p->md5_desc.tfm);
+		if (hash) {
+			crypto_free_hash(hash);
+			per_cpu(tcp_md5sig_pool, cpu).md5_desc.tfm = NULL;
+		}
 	}
-	free_percpu(pool);
 }
 
 static void __tcp_alloc_md5sig_pool(void)
 {
 	int cpu;
-	struct tcp_md5sig_pool __percpu *pool;
-
-	pool = alloc_percpu(struct tcp_md5sig_pool);
-	if (!pool)
-		return;
 
 	for_each_possible_cpu(cpu) {
 		struct crypto_hash *hash;
@@ -2900,29 +2899,29 @@ static void __tcp_alloc_md5sig_pool(void)
 		if (IS_ERR_OR_NULL(hash))
 			goto out_free;
 
-		per_cpu_ptr(pool, cpu)->md5_desc.tfm = hash;
+		per_cpu(tcp_md5sig_pool, cpu).md5_desc.tfm = hash;
 	}
-	/* before setting tcp_md5sig_pool, we must commit all writes
-	 * to memory. See ACCESS_ONCE() in tcp_get_md5sig_pool()
+	/* before setting tcp_md5sig_pool_populated, we must commit all writes
+	 * to memory. See smp_rmb() in tcp_get_md5sig_pool()
 	 */
 	smp_wmb();
-	tcp_md5sig_pool = pool;
+	tcp_md5sig_pool_populated = true;
 	return;
 out_free:
-	__tcp_free_md5sig_pool(pool);
+	tcp_free_md5sig_pool();
 }
 
 bool tcp_alloc_md5sig_pool(void)
 {
-	if (unlikely(!tcp_md5sig_pool)) {
+	if (unlikely(!tcp_md5sig_pool_populated)) {
 		mutex_lock(&tcp_md5sig_mutex);
 
-		if (!tcp_md5sig_pool)
+		if (!tcp_md5sig_pool_populated)
 			__tcp_alloc_md5sig_pool();
 
 		mutex_unlock(&tcp_md5sig_mutex);
 	}
-	return tcp_md5sig_pool != NULL;
+	return tcp_md5sig_pool_populated;
 }
 EXPORT_SYMBOL(tcp_alloc_md5sig_pool);
 
@@ -2936,13 +2935,13 @@ EXPORT_SYMBOL(tcp_alloc_md5sig_pool);
  */
 struct tcp_md5sig_pool *tcp_get_md5sig_pool(void)
 {
-	struct tcp_md5sig_pool __percpu *p;
-
 	local_bh_disable();
-	p = ACCESS_ONCE(tcp_md5sig_pool);
-	if (p)
-		return raw_cpu_ptr(p);
 
+	if (tcp_md5sig_pool_populated) {
+		/* coupled with smp_wmb() in __tcp_alloc_md5sig_pool */
+		smp_rmb();
+		return this_cpu_ptr(&tcp_md5sig_pool);
+	}
 	local_bh_enable();
 	return NULL;
 }

^ permalink raw reply related
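
The patch above replaces "publish a pointer" with "fill static storage, then publish a flag", pairing smp_wmb() on the writer side with smp_rmb() on the reader side. A rough userspace analogue using C11 release/acquire atomics is sketched below. It uses a single slot rather than one per CPU, and all names (pool_slot, pool_populate, pool_get) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct tcp_md5sig_pool; tfm plays the crypto handle. */
struct pool_slot {
	int tfm;
};

/* Statically allocated, as DEFINE_PER_CPU storage would be. */
static struct pool_slot pool;
static atomic_bool pool_populated;

/* Writer: fill the slot, then publish the flag. */
static void pool_populate(int handle)
{
	pool.tfm = handle;
	/*
	 * Release store: the write to pool.tfm cannot be reordered
	 * after the flag store (the role smp_wmb() plays in the patch).
	 */
	atomic_store_explicit(&pool_populated, true, memory_order_release);
}

/* Reader: only touch the slot after observing the flag. */
static struct pool_slot *pool_get(void)
{
	/*
	 * Acquire load: reads of the slot cannot be reordered before
	 * the flag load (the role smp_rmb() plays in the patch).
	 */
	if (atomic_load_explicit(&pool_populated, memory_order_acquire))
		return &pool;
	return NULL;
}
```

A reader that sees the flag set is therefore guaranteed to see the populated slot; a reader that does not see it gets NULL, as tcp_get_md5sig_pool() does.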

* Re: [RFC] tcp md5 use of alloc_percpu
From: Crestez Dan Leonard @ 2014-10-23  1:00 UTC (permalink / raw)
  To: Jonathan Toppins; +Cc: netdev
In-Reply-To: <54483FF7.4090208@cumulusnetworks.com>

On 10/23/2014 02:38 AM, Jonathan Toppins wrote:
> On 10/22/14, 2:55 PM, Crestez Dan Leonard wrote:
> >> sg_init_one does virt_addr on the pointer which assumes it is directly accessible. But the tcp_md5sig_pool pointer comes from alloc_percpu which can return memory from the vmalloc area after the pcpu_first_chunk is exhausted. This looks wrong to me. I am getting crashes on mips and I believe this to be the cause.
>
> Thinking about this more if the issue really is sg_init_one assumes a
> directly accessible memory region, can we just modify the zone
> allocation to GFP_DMA using alloc_percpu_gfp()? Does this satisfy the
> assumptions made by sg_init_one?
I don't think that alloc_percpu_gfp can be used that way. Looking at the 
code it only checks for GFP_KERNEL and behaves "atomically" if it is not 
present. This means that it fails rather than vmalloc a new percpu_chunk.

The problem is not that the memory is not allocated with GFP_DMA but 
rather that the memory is allocated with vmalloc.

Regards,
Leonard

^ permalink raw reply

* irq disable in __netdev_alloc_frag() ?
From: Alexei Starovoitov @ 2014-10-23  0:15 UTC (permalink / raw)
  To: Eric Dumazet, Network Development

Hi Eric,

in the commit 6f532612cc24 ("net: introduce netdev_alloc_frag()")
you mentioned that the reason to disable interrupts
in __netdev_alloc_frag() is:
"- Must be IRQ safe (non NAPI drivers can use it)"

Is there a way to do this conditionally?

Without it I see 10% performance gain for my RX tests
(from 6.9Mpps to 7.7Mpps) and __netdev_alloc_frag()
itself goes from 6.6% to 2.1%
(popf seems to be quite costly)

Thanks
Alexei

^ permalink raw reply

* Re: [PATCH 1/2] net: dsa: Error out on tagging protocol mismatches
From: Florian Fainelli @ 2014-10-22 23:46 UTC (permalink / raw)
  To: Andrew Lunn, davem; +Cc: netdev, alexander.h.duyck
In-Reply-To: <1414020918-20903-2-git-send-email-andrew@lunn.ch>

On 10/22/2014 04:35 PM, Andrew Lunn wrote:
> If there is a mismatch between enabled tagging protocols and the
> protocol the switch supports, error out, rather than continue with a
> situation which is unlikely to work.
> 
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> cc: alexander.h.duyck@intel.com
> ---
>  net/dsa/dsa.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> index 22f34cf4cb27..8a31bd81a315 100644
> --- a/net/dsa/dsa.c
> +++ b/net/dsa/dsa.c
> @@ -175,7 +175,8 @@ dsa_switch_setup(struct dsa_switch_tree *dst, int index,
>  			break;
>  #endif
>  		default:
> -			break;
> +			ret = -ENOPROTOOPT;
> +			goto out;
>  		}

This prevents using a switch driver without tagging, which is something
that you might want to do (link setup, ethtool stats, EEE etc...).
--
Florian

^ permalink raw reply
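
To make the trade-off in this exchange concrete, here is a small userspace sketch of a protocol switch whose default case errors out while an untagged mode is still accepted explicitly. This is only one possible way to reconcile the patch with the objection above, not something proposed in the thread; the enum, the MOCK_HAVE_* macros (standing in for Kconfig options) and mock_select_tagger() are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical protocol ids; MOCK_TAG_NONE models "no tagging". */
enum mock_tag_proto {
	MOCK_TAG_NONE = 0,
	MOCK_TAG_DSA,
	MOCK_TAG_EDSA,
};

/* Stand-ins for Kconfig: only EDSA tagging code is "built". */
#define MOCK_HAVE_TAG_EDSA 1
/* MOCK_HAVE_TAG_DSA is deliberately left undefined. */

/* Returns 0 and sets *name, or -ENOPROTOOPT on a mismatch. */
static int mock_select_tagger(enum mock_tag_proto proto, const char **name)
{
	switch (proto) {
	case MOCK_TAG_NONE:
		/* Untagged operation stays usable (link setup, stats). */
		*name = "none";
		return 0;
#ifdef MOCK_HAVE_TAG_DSA
	case MOCK_TAG_DSA:
		*name = "dsa";
		return 0;
#endif
#ifdef MOCK_HAVE_TAG_EDSA
	case MOCK_TAG_EDSA:
		*name = "edsa";
		return 0;
#endif
	default:
		/* Driver wants a protocol that was not compiled in. */
		return -ENOPROTOOPT;
	}
}
```

In this sketch a driver asking for DSA fails loudly because that tagger is not built, while a driver that asks for no tagging at all still probes.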

* Re: [RFC] tcp md5 use of alloc_percpu
From: Jonathan Toppins @ 2014-10-22 23:38 UTC (permalink / raw)
  To: Crestez Dan Leonard, netdev
In-Reply-To: <5447FDB2.2010906@gmail.com>

On 10/22/14, 2:55 PM, Crestez Dan Leonard wrote:
> sg_init_one does virt_addr on the pointer which assumes it is directly accessible. But the tcp_md5sig_pool pointer comes from alloc_percpu which can return memory from the vmalloc area after the pcpu_first_chunk is exhausted. This looks wrong to me. I am getting crashes on mips and I believe this to be the cause.

Thinking about this more if the issue really is sg_init_one assumes a
directly accessible memory region, can we just modify the zone
allocation to GFP_DMA using alloc_percpu_gfp()? Does this satisfy the
assumptions made by sg_init_one?

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1bec4e7..6924320 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2889,7 +2889,7 @@ static void __tcp_alloc_md5sig_pool(void)
        int cpu;
        struct tcp_md5sig_pool __percpu *pool;

-       pool = alloc_percpu(struct tcp_md5sig_pool);
+       pool = alloc_percpu_gfp(struct tcp_md5sig_pool, GFP_DMA);
        if (!pool)
                return;

^ permalink raw reply related

* [PATCH 1/2] net: dsa: Error out on tagging protocol mismatches
From: Andrew Lunn @ 2014-10-22 23:35 UTC (permalink / raw)
  To: davem; +Cc: netdev, Andrew Lunn, alexander.h.duyck
In-Reply-To: <1414020918-20903-1-git-send-email-andrew@lunn.ch>

If there is a mismatch between enabled tagging protocols and the
protocol the switch supports, error out, rather than continue with a
situation which is unlikely to work.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
cc: alexander.h.duyck@intel.com
---
 net/dsa/dsa.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 22f34cf4cb27..8a31bd81a315 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -175,7 +175,8 @@ dsa_switch_setup(struct dsa_switch_tree *dst, int index,
 			break;
 #endif
 		default:
-			break;
+			ret = -ENOPROTOOPT;
+			goto out;
 		}
 
 		dst->tag_protocol = drv->tag_protocol;
-- 
2.1.1

^ permalink raw reply related

* [PATCH 2/2] dsa: mv88e6171: Fix tagging protocol/Kconfig
From: Andrew Lunn @ 2014-10-22 23:35 UTC (permalink / raw)
  To: davem; +Cc: netdev, Andrew Lunn
In-Reply-To: <1414020918-20903-1-git-send-email-andrew@lunn.ch>

The mv88e6171 can support two different tagging protocols, DSA and
EDSA. The switch driver structure only allows one protocol to be
enumerated, and DSA was chosen. However the Kconfig entry ensures the
EDSA tagging code is built. With a minimal configuration, we then end
up with a mismatch. The probe is successful, EDSA tagging is used, but
the switch is configured for DSA, resulting in mangled packets.

Change the switch driver structure to enumerate EDSA, fixing the
mismatch.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 42f272539487 ("net: DSA: Marvell mv88e6171 switch driver")
---
 drivers/net/dsa/mv88e6171.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6171.c b/drivers/net/dsa/mv88e6171.c
index 1020a7af67cf..78d8e876f3aa 100644
--- a/drivers/net/dsa/mv88e6171.c
+++ b/drivers/net/dsa/mv88e6171.c
@@ -395,7 +395,7 @@ static int mv88e6171_get_sset_count(struct dsa_switch *ds)
 }

 struct dsa_switch_driver mv88e6171_switch_driver = {
-	.tag_protocol		= DSA_TAG_PROTO_DSA,
+	.tag_protocol		= DSA_TAG_PROTO_EDSA,
 	.priv_size		= sizeof(struct mv88e6xxx_priv_state),
 	.probe			= mv88e6171_probe,
 	.setup			= mv88e6171_setup,
-- 
2.1.1

^ permalink raw reply related

* [PATCH 0/2] DSA tagging mismatches
From: Andrew Lunn @ 2014-10-22 23:35 UTC (permalink / raw)
  To: davem; +Cc: netdev, Andrew Lunn

The second patch is a fix, which should be applied to -rc. It is
possible to get a DSA configuration which does not work. The patch
stops this happening.

The first patch detects this situation, and errors out the probe of
DSA, making it more obvious something is wrong. It does not need to
be applied to -rc.

Andrew Lunn (2):
  net: dsa: Error out on tagging protocol mismatches
  dsa: mv88e6171: Fix tagging protocol/Kconfig

 drivers/net/dsa/mv88e6171.c | 2 +-
 net/dsa/dsa.c               | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

-- 
2.1.1

^ permalink raw reply

* Re: localed stuck in recent 3.18 git in copy_net_ns?
From: Paul E. McKenney @ 2014-10-22 23:24 UTC (permalink / raw)
  To: Yanko Kaneti
  Cc: Josh Boyer, Eric W. Biederman, Cong Wang, Kevin Fenzi, netdev,
	Linux-Kernel@Vger. Kernel. Org
In-Reply-To: <20141022224032.GA1240@declera.com>

On Thu, Oct 23, 2014 at 01:40:32AM +0300, Yanko Kaneti wrote:
> On Wed-10/22/14-2014 15:33, Josh Boyer wrote:
> > On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:

[ . . . ]

> > > Don't get me wrong -- the fact that this kthread appears to have
> > > blocked within rcu_barrier() for 120 seconds means that something is
> > > most definitely wrong here.  I am surprised that there are no RCU CPU
> > > stall warnings, but perhaps the blockage is in the callback execution
> > > rather than grace-period completion.  Or something is preventing this
> > > kthread from starting up after the wake-up callback executes.  Or...
> > >
> > > Is this thing reproducible?
> > 
> > I've added Yanko on CC, who reported the backtrace above and can
> > recreate it reliably.  Apparently reverting the RCU merge commit
> > (d6dd50e) and rebuilding the latest after that does not show the
> > issue.  I'll let Yanko explain more and answer any questions you have.
> 
> - It is reproducible
> - I've done another build here to double check and it's definitely the rcu merge
>   that's causing it. 
> 
> Don't think I'll be able to dig deeper, but I can do testing if needed.

Please!  Does the following patch help?

							Thanx, Paul

------------------------------------------------------------------------

rcu: More on deadlock between CPU hotplug and expedited grace periods

Commit dd56af42bd82 (rcu: Eliminate deadlock between CPU hotplug and
expedited grace periods) was incomplete.  Although it did eliminate
deadlocks involving synchronize_sched_expedited()'s acquisition of
cpu_hotplug.lock via get_online_cpus(), it did nothing about the similar
deadlock involving acquisition of this same lock via put_online_cpus().
This deadlock became apparent with testing involving hibernation.

This commit therefore changes put_online_cpus() acquisition of this lock
to be conditional, and increments a new cpu_hotplug.puts_pending field
in case of acquisition failure.  Then cpu_hotplug_begin() checks for this
new field being non-zero, and applies any changes to cpu_hotplug.refcount.

Reported-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 356450f09c1f..90a3d017b90c 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -64,6 +64,8 @@ static struct {
 	 * an ongoing cpu hotplug operation.
 	 */
 	int refcount;
+	/* And allows lockless put_online_cpus(). */
+	atomic_t puts_pending;
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	struct lockdep_map dep_map;
@@ -113,7 +115,11 @@ void put_online_cpus(void)
 {
 	if (cpu_hotplug.active_writer == current)
 		return;
-	mutex_lock(&cpu_hotplug.lock);
+	if (!mutex_trylock(&cpu_hotplug.lock)) {
+		atomic_inc(&cpu_hotplug.puts_pending);
+		cpuhp_lock_release();
+		return;
+	}
 
 	if (WARN_ON(!cpu_hotplug.refcount))
 		cpu_hotplug.refcount++; /* try to fix things up */
@@ -155,6 +161,12 @@ void cpu_hotplug_begin(void)
 	cpuhp_lock_acquire();
 	for (;;) {
 		mutex_lock(&cpu_hotplug.lock);
+		if (atomic_read(&cpu_hotplug.puts_pending)) {
+			int delta;
+
+			delta = atomic_xchg(&cpu_hotplug.puts_pending, 0);
+			cpu_hotplug.refcount -= delta;
+		}
 		if (likely(!cpu_hotplug.refcount))
 			break;
 		__set_current_state(TASK_UNINTERRUPTIBLE);

^ permalink raw reply related
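
The idea in the patch above is that the release path never blocks on the lock: if the lock is contended, the release is recorded in an atomic counter and the writer folds it into the refcount later. A userspace sketch of that pattern with a pthread mutex is shown below. The struct hp and the functions get_ref(), put_ref() and writer_fold_pending_locked() are hypothetical names, and lockdep annotations such as cpuhp_lock_release() are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical userspace analogue of the cpu_hotplug state. */
static struct {
	pthread_mutex_t lock;
	int refcount;            /* protected by lock */
	atomic_int puts_pending; /* releases deferred under contention */
} hp = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Reader acquire: take a reference under the lock. */
static void get_ref(void)
{
	pthread_mutex_lock(&hp.lock);
	hp.refcount++;
	pthread_mutex_unlock(&hp.lock);
}

/* Reader release: never blocks on the lock. */
static void put_ref(void)
{
	if (pthread_mutex_trylock(&hp.lock) != 0) {
		/* Lock is busy: record the put for the writer. */
		atomic_fetch_add(&hp.puts_pending, 1);
		return;
	}
	hp.refcount--;
	pthread_mutex_unlock(&hp.lock);
}

/*
 * Writer side, called with hp.lock held: apply any deferred puts
 * and return the resulting reference count.
 */
static int writer_fold_pending_locked(void)
{
	int delta = atomic_exchange(&hp.puts_pending, 0);

	hp.refcount -= delta;
	return hp.refcount;
}
```

The writer must fold the pending count every time it takes the lock, before testing the refcount, otherwise deferred releases would make it wait forever.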

* Re: [RFC] tcp md5 use of alloc_percpu
From: Crestez Dan Leonard @ 2014-10-22 23:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-crypto
In-Reply-To: <1414005158.9031.22.camel@edumazet-glaptop2.roam.corp.google.com>

On 10/22/2014 10:12 PM, Eric Dumazet wrote:
> On Wed, 2014-10-22 at 21:55 +0300, Crestez Dan Leonard wrote:
>> Hello,
>>
>> It seems that the TCP MD5 feature allocates a percpu struct
>> tcp_md5sig_pool and uses part of that memory for a scratch buffer to
>> do crypto on. Here is the relevant code:
>>
>> static int tcp_v4_md5_hash_pseudoheader(struct tcp_md5sig_pool *hp,
>>                                          __be32 daddr, __be32 saddr,
>> int nbytes)
>> {
>>          struct tcp4_pseudohdr *bp;
>>          struct scatterlist sg;
>>
>>          bp = &hp->md5_blk.ip4;
>>
>>          /*
>>           * 1. the TCP pseudo-header (in the order: source IP address,
>>           * destination IP address, zero-padded protocol number, and
>>           * segment length)
>>           */
>>          bp->saddr = saddr;
>>          bp->daddr = daddr;
>>          bp->pad = 0;
>>          bp->protocol = IPPROTO_TCP;
>>          bp->len = cpu_to_be16(nbytes);
>>
>>          sg_init_one(&sg, bp, sizeof(*bp));
>>          return crypto_hash_update(&hp->md5_desc, &sg, sizeof(*bp));
>> }
>>
>> sg_init_one does virt_addr on the pointer which assumes it is directly
>> accessible. But the tcp_md5sig_pool pointer comes from alloc_percpu
>> which can return memory from the vmalloc area after the
>> pcpu_first_chunk is exhausted. This looks wrong to me. I am getting
>> crashes on mips and I believe this to be the cause.
>>
>> Allocating a scratch buffer this way is very peculiar. The
>> tcp4_pseudohdr struct is only 12 bytes in length. Similar code in
>> tcp_v6_md5_hash_pseudoheader uses a 40 byte tcp6_pseudohdr. I think it
>> is perfectly reasonable to allocate this kind of stuff on the stack,
>> right? These pseudohdr structs are not used at all outside these two
>> static functions and it would simplify the code.
>>
> Yep, but the sg stuff does not allow for stack variables. Because of
> possible offloading and DMA, I don't know...
A stack buffer is used in tcp_md5_hash_header to add a tcphdr to the
hash. A quick grep for sg_init_one finds a couple of additional instances
of what looks like doing crypto on small stack buffers:

net/bluetooth/smp.c:110
net/sunrpc/auth_gss/gss_krb5_crypto.c:194
net/rxrpc/rxkad.c:multiple

But those might also be bugs.

If the buffers passed to the crypto API need to be DMA-able, then wouldn't
this also exclude DEFINE_PER_CPU? The DMA-API-HOWTO mentions that items
in data/text/bss might not be DMA-able, presumably depending on the
architecture.

>> I'm not familiar with the linux crypto API. Isn't there an easier way
>> to get a temporary md5 hasher?
>
> You should CC crypto guys maybe ...
Added linux-crypto in CC. To summarize the question: What kind of memory 
can be passed to crypto api functions like crypto_hash_update?

>> The whole notion of struct tcp_md5sig_pool seems dubious. This is a
>> very tiny struct already and after removing the pseudohdr it shrinks
>> to a percpu hash_desc for md5 (8 or 16 bytes). Wouldn't DEFINE_PER_CPU
>> be more appropriate?
>
> Sure, this would be the more appropriate fix IMO.
I'll post this as a patch if somebody can confirm that it is correct and 
portable.

Doing a temporary kmalloc/kfree would also work, but it would hurt
performance. It would be nice to have a generic way to ask for a small
temporary DMA-able buffer.

If DEFINE_PER_CPU is not suitable then the tcp_md5sig_pool structs should
be allocated via individual kmallocs for each cpu.

Regards,
Leonard

^ permalink raw reply

* Re: localed stuck in recent 3.18 git in copy_net_ns?
From: Yanko Kaneti @ 2014-10-22 22:40 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Paul McKenney, Eric W. Biederman, Cong Wang, Kevin Fenzi, netdev,
	Linux-Kernel@Vger. Kernel. Org
In-Reply-To: <CA+5PVA56ajrBQ-C9orSb9-_qhMKe994QL2x0FcKbe6BYmaWFBw@mail.gmail.com>

On Wed-10/22/14-2014 15:33, Josh Boyer wrote:
> On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Wed, Oct 22, 2014 at 01:25:37PM -0500, Eric W. Biederman wrote:
> >> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
> >>
> >> > On Wed, Oct 22, 2014 at 12:53:24PM -0500, Eric W. Biederman wrote:
> >> >> Cong Wang <cwang@twopensource.com> writes:
> >> >>
> >> >> > (Adding Paul and Eric in Cc)
> >> >> >
> >> >> >
> >> >> > On Wed, Oct 22, 2014 at 10:12 AM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
> >> >> >>
> >> >> >> Someone else is seeing this when they try and modprobe ppp_generic:
> >> >> >>
> >> >> >> [  240.599195] INFO: task kworker/u16:5:100 blocked for more than 120 seconds.
> >> >> >> [  240.599338]       Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
> >> >> >> [  240.599446] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >> >> >> disables this message.
> >> >> >> [  240.599583] kworker/u16:5   D ffff8802202db480 12400   100      2 0x00000000
> >> >> >> [  240.599744] Workqueue: netns cleanup_net
> >> >> >> [  240.599823]  ffff8802202eb9e8 0000000000000096 ffff8802202db480
> >> >> >> 00000000001d5f00
> >> >> >> [  240.600066]  ffff8802202ebfd8 00000000001d5f00 ffff8800368c3480
> >> >> >> ffff8802202db480
> >> >> >> [  240.600228]  ffffffff81ee2690 7fffffffffffffff ffffffff81ee2698
> >> >> >> ffffffff81ee2690
> >> >> >> [  240.600386] Call Trace:
> >> >> >> [  240.600445]  [<ffffffff8185e239>] schedule+0x29/0x70
> >> >> >> [  240.600541]  [<ffffffff8186345c>] schedule_timeout+0x26c/0x410
> >> >> >> [  240.600651]  [<ffffffff81865ef7>] ? retint_restore_args+0x13/0x13
> >> >> >> [  240.600765]  [<ffffffff818644e4>] ? _raw_spin_unlock_irq+0x34/0x50
> >> >> >> [  240.600879]  [<ffffffff8185fc6c>] wait_for_completion+0x10c/0x150
> >> >> >> [  240.601025]  [<ffffffff810e53e0>] ? wake_up_state+0x20/0x20
> >> >> >> [  240.601133]  [<ffffffff8112a749>] _rcu_barrier+0x159/0x200
> >> >> >> [  240.601237]  [<ffffffff8112a845>] rcu_barrier+0x15/0x20
> >> >> >> [  240.601335]  [<ffffffff81718ebf>] netdev_run_todo+0x6f/0x310
> >> >> >> [  240.601442]  [<ffffffff8170da85>] ? rollback_registered_many+0x265/0x2e0
> >> >> >> [  240.601564]  [<ffffffff81725f2e>] rtnl_unlock+0xe/0x10
> >> >> >> [  240.601660]  [<ffffffff8170f8e6>] default_device_exit_batch+0x156/0x180
> >> >> >> [  240.601781]  [<ffffffff810fd8a0>] ? abort_exclusive_wait+0xb0/0xb0
> >> >> >> [  240.601895]  [<ffffffff81707993>] ops_exit_list.isra.1+0x53/0x60
> >> >> >> [  240.602028]  [<ffffffff81708540>] cleanup_net+0x100/0x1f0
> >> >> >> [  240.602131]  [<ffffffff810ccfa8>] process_one_work+0x218/0x850
> >> >> >> [  240.602241]  [<ffffffff810ccf0f>] ? process_one_work+0x17f/0x850
> >> >> >> [  240.602350]  [<ffffffff810cd6c7>] ? worker_thread+0xe7/0x4a0
> >> >> >> [  240.602454]  [<ffffffff810cd64b>] worker_thread+0x6b/0x4a0
> >> >> >> [  240.602555]  [<ffffffff810cd5e0>] ? process_one_work+0x850/0x850
> >> >> >> [  240.602665]  [<ffffffff810d399b>] kthread+0x10b/0x130
> >> >> >> [  240.602762]  [<ffffffff81028cc9>] ? sched_clock+0x9/0x10
> >> >> >> [  240.602862]  [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
> >> >> >> [  240.603004]  [<ffffffff818651fc>] ret_from_fork+0x7c/0xb0
> >> >> >> [  240.603106]  [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
> >> >> >> [  240.603224] 4 locks held by kworker/u16:5/100:
> >> >> >> [  240.603304]  #0:  ("%s""netns"){.+.+.+}, at: [<ffffffff810ccf0f>]
> >> >> >> process_one_work+0x17f/0x850
> >> >> >> [  240.603495]  #1:  (net_cleanup_work){+.+.+.}, at:
> >> >> >> [<ffffffff810ccf0f>] process_one_work+0x17f/0x850
> >> >> >> [  240.603691]  #2:  (net_mutex){+.+.+.}, at: [<ffffffff817084cc>]
> >> >> >> cleanup_net+0x8c/0x1f0
> >> >> >> [  240.603869]  #3:  (rcu_sched_state.barrier_mutex){+.+...}, at:
> >> >> >> [<ffffffff8112a625>] _rcu_barrier+0x35/0x200
> >> >> >> [  240.604211] INFO: task modprobe:1387 blocked for more than 120 seconds.
> >> >> >> [  240.604329]       Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
> >> >> >> [  240.604434] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >> >> >> disables this message.
> >> >> >> [  240.604570] modprobe        D ffff8800cb4f1a40 13112  1387   1386 0x00000080
> >> >> >> [  240.604719]  ffff8800cafbbbe8 0000000000000096 ffff8800cb4f1a40
> >> >> >> 00000000001d5f00
> >> >> >> [  240.604878]  ffff8800cafbbfd8 00000000001d5f00 ffff880223280000
> >> >> >> ffff8800cb4f1a40
> >> >> >> [  240.605068]  ffff8800cb4f1a40 ffffffff81f8fb48 0000000000000246
> >> >> >> ffff8800cb4f1a40
> >> >> >> [  240.605228] Call Trace:
> >> >> >> [  240.605283]  [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
> >> >> >> [  240.605400]  [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
> >> >> >> [  240.605510]  [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
> >> >> >> [  240.605626]  [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
> >> >> >> [  240.605757]  [<ffffffffa0701000>] ? 0xffffffffa0701000
> >> >> >> [  240.605854]  [<ffffffff8170835f>] register_pernet_subsys+0x1f/0x50
> >> >> >> [  240.606005]  [<ffffffffa0701048>] br_init+0x48/0xd3 [bridge]
> >> >> >> [  240.606112]  [<ffffffff81002148>] do_one_initcall+0xd8/0x210
> >> >> >> [  240.606224]  [<ffffffff81153c02>] load_module+0x20c2/0x2870
> >> >> >> [  240.606327]  [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
> >> >> >> [  240.606433]  [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
> >> >> >> [  240.606557]  [<ffffffff81154497>] SyS_init_module+0xe7/0x140
> >> >> >> [  240.606664]  [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
> >> >> >> [  240.606773] 1 lock held by modprobe/1387:
> >> >> >> [  240.606845]  #0:  (net_mutex){+.+.+.}, at: [<ffffffff8170835f>]
> >> >> >> register_pernet_subsys+0x1f/0x50
> >> >> >> [  240.607114] INFO: task modprobe:1466 blocked for more than 120 seconds.
> >> >> >> [  240.607231]       Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
> >> >> >> [  240.607337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >> >> >> disables this message.
> >> >> >> [  240.607473] modprobe        D ffff88020fbab480 13096  1466   1399 0x00000084
> >> >> >> [  240.607622]  ffff88020d1bbbe8 0000000000000096 ffff88020fbab480
> >> >> >> 00000000001d5f00
> >> >> >> [  240.607791]  ffff88020d1bbfd8 00000000001d5f00 ffffffff81e1b580
> >> >> >> ffff88020fbab480
> >> >> >> [  240.607949]  ffff88020fbab480 ffffffff81f8fb48 0000000000000246
> >> >> >> ffff88020fbab480
> >> >> >> [  240.608138] Call Trace:
> >> >> >> [  240.608193]  [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
> >> >> >> [  240.608316]  [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
> >> >> >> [  240.608425]  [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
> >> >> >> [  240.608542]  [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
> >> >> >> [  240.608662]  [<ffffffffa071d000>] ? 0xffffffffa071d000
> >> >> >> [  240.608759]  [<ffffffff817083ad>] register_pernet_device+0x1d/0x70
> >> >> >> [  240.608881]  [<ffffffffa071d020>] ppp_init+0x20/0x1000 [ppp_generic]
> >> >> >> [  240.609021]  [<ffffffff81002148>] do_one_initcall+0xd8/0x210
> >> >> >> [  240.609131]  [<ffffffff81153c02>] load_module+0x20c2/0x2870
> >> >> >> [  240.609235]  [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
> >> >> >> [  240.609339]  [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
> >> >> >> [  240.609462]  [<ffffffff81154497>] SyS_init_module+0xe7/0x140
> >> >> >> [  240.609568]  [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
> >> >> >> [  240.609677] 1 lock held by modprobe/1466:
> >> >> >> [  240.609749]  #0:  (net_mutex){+.+.+.}, at: [<ffffffff817083ad>]
> >> >> >> register_pernet_device+0x1d/0x70
> >> >> >>
> >> >> >> Looks like contention on net_mutex or something, but I honestly have
> >> >> >> no idea yet.  I can't recreate it myself at the moment or I would
> >> >> >> bisect.
> >> >> >>
> >> >> >> Has nobody else run into this with the pre-3.18 kernels?  Fedora isn't
> >> >> >> carrying any patches in this area.
> >> >>
> >> >> > I am not aware of any change in net/core/dev.c related here,
> >> >> > so I guess it's a bug in rcu_barrier().
> >> >>
> >> >> From the limited trace data I see in this email I have to agree.
> >> >>
> >> >> It looks like for some reason rcu_barrier is taking forever
> >> >> while the rtnl_lock is held in cleanup_net.  Because the
> >> >> rtnl_lock is held modprobe of the ppp driver is getting stuck.
> >> >>
> >> >> Is it possible we have an AB-BA deadlock between the rtnl_lock
> >> >> and rcu, with something the module loading code assumes?
> >> >
> >> > I am not aware of RCU ever acquiring rtnl_lock, not directly, anyway.
> >>
> >> Does the module loading code do something strange with rcu?  Perhaps
> >> blocking an rcu grace period until the module loading completes?
> >>
> >> If the module loading somehow blocks an rcu grace period that would
> >> create an AB deadlock because loading the ppp module grabs the
> >> rtnl_lock.  And elsewhere we have the rtnl_lock waiting for an rcu grace
> >> period.
> >>
> >> I would think trying and failing to get the rtnl_lock would sleep and
> >> thus let any rcu grace period happen but shrug.
> >>
> >> It looks like something is holding up the rcu grace period, and causing
> >> this.  Although it is possible that something is causing cleanup_net
> >> to run slowly and we are just seeing that slowness show up in
> >> rcu_barrier as that is one of the slower bits.  With a single trace I
> >> can't definitely say that the rcu barrier is getting stuck but it
> >> certainly looks that way.
> >
> > Don't get me wrong -- the fact that this kthread appears to have
> > blocked within rcu_barrier() for 120 seconds means that something is
> > most definitely wrong here.  I am surprised that there are no RCU CPU
> > stall warnings, but perhaps the blockage is in the callback execution
> > rather than grace-period completion.  Or something is preventing this
> > kthread from starting up after the wake-up callback executes.  Or...
> >
> > Is this thing reproducible?
> 
> I've added Yanko on CC, who reported the backtrace above and can
> recreate it reliably.  Apparently reverting the RCU merge commit
> (d6dd50e) and rebuilding the latest after that does not show the
> issue.  I'll let Yanko explain more and answer any questions you have.

- It is reproducible
- I've done another build here to double check and it's definitely the rcu merge
  that's causing it.

Don't think I'll be able to dig deeper, but I can do testing if needed.

--Yanko

^ permalink raw reply

* [PATCHv5 net-next 4/4] sunvnet: Remove irqsave/irqrestore on vio.lock
From: Sowmini Varadhan @ 2014-10-22 22:12 UTC (permalink / raw)
  To: davem, sowmini.varadhan; +Cc: netdev


After the NAPIfication of sunvnet, we no longer need to
synchronize by doing irqsave/restore on vio.lock in the
I/O fastpath.

NAPI ->poll() is non-reentrant, so all RX processing occurs
strictly in a serialized environment. TX reclaim is done in NAPI
context, so the netif_tx_lock can be used to serialize
critical sections between Tx and Rx paths.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 drivers/net/ethernet/sun/sunvnet.c | 30 +++++-------------------------
 1 file changed, 5 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
index 097ea5f..ac5cfc0 100644
--- a/drivers/net/ethernet/sun/sunvnet.c
+++ b/drivers/net/ethernet/sun/sunvnet.c
@@ -844,18 +844,6 @@ struct vnet_port *__tx_port_find(struct vnet *vp, struct sk_buff *skb)
 	return NULL;
 }
 
-struct vnet_port *tx_port_find(struct vnet *vp, struct sk_buff *skb)
-{
-	struct vnet_port *ret;
-	unsigned long flags;
-
-	spin_lock_irqsave(&vp->lock, flags);
-	ret = __tx_port_find(vp, skb);
-	spin_unlock_irqrestore(&vp->lock, flags);
-
-	return ret;
-}
-
 static struct sk_buff *vnet_clean_tx_ring(struct vnet_port *port,
 					  unsigned *pending)
 {
@@ -916,11 +904,10 @@ static void vnet_clean_timer_expire(unsigned long port0)
 	struct vnet_port *port = (struct vnet_port *)port0;
 	struct sk_buff *freeskbs;
 	unsigned pending;
-	unsigned long flags;
 
-	spin_lock_irqsave(&port->vio.lock, flags);
+	netif_tx_lock(port->vp->dev);
 	freeskbs = vnet_clean_tx_ring(port, &pending);
-	spin_unlock_irqrestore(&port->vio.lock, flags);
+	netif_tx_unlock(port->vp->dev);
 
 	vnet_free_skbs(freeskbs);
 
@@ -973,7 +960,6 @@ static int vnet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct vnet_port *port = NULL;
 	struct vio_dring_state *dr;
 	struct vio_net_desc *d;
-	unsigned long flags;
 	unsigned int len;
 	struct sk_buff *freeskbs = NULL;
 	int i, err, txi;
@@ -986,7 +972,7 @@ static int vnet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto out_dropped;
 
 	rcu_read_lock();
-	port = tx_port_find(vp, skb);
+	port = __tx_port_find(vp, skb);
 	if (unlikely(!port))
 		goto out_dropped;
 
@@ -1022,8 +1008,6 @@ static int vnet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto out_dropped;
 	}
 
-	spin_lock_irqsave(&port->vio.lock, flags);
-
 	dr = &port->vio.drings[VIO_DRIVER_TX_RING];
 	if (unlikely(vnet_tx_dring_avail(dr) < 1)) {
 		if (!netif_queue_stopped(dev)) {
@@ -1057,7 +1041,7 @@ static int vnet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			     (LDC_MAP_SHADOW | LDC_MAP_DIRECT | LDC_MAP_RW));
 	if (err < 0) {
 		netdev_info(dev, "tx buffer map error %d\n", err);
-		goto out_dropped_unlock;
+		goto out_dropped;
 	}
 	port->tx_bufs[txi].ncookies = err;
 
@@ -1110,7 +1094,7 @@ static int vnet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		netdev_info(dev, "TX trigger error %d\n", err);
 		d->hdr.state = VIO_DESC_FREE;
 		dev->stats.tx_carrier_errors++;
-		goto out_dropped_unlock;
+		goto out_dropped;
 	}
 
 ldc_start_done:
@@ -1126,7 +1110,6 @@ ldc_start_done:
 			netif_wake_queue(dev);
 	}
 
-	spin_unlock_irqrestore(&port->vio.lock, flags);
 	(void)mod_timer(&port->clean_timer, jiffies + VNET_CLEAN_TIMEOUT);
 	rcu_read_unlock();
 
@@ -1134,9 +1117,6 @@ ldc_start_done:
 
 	return NETDEV_TX_OK;
 
-out_dropped_unlock:
-	spin_unlock_irqrestore(&port->vio.lock, flags);
-
 out_dropped:
 	if (pending)
 		(void)mod_timer(&port->clean_timer,
-- 
1.8.4.2

^ permalink raw reply related

* [PATCHv5 3/4] sparc64: Avoid irqsave/restore on vio.lock if in_softirq()
From: Sowmini Varadhan @ 2014-10-22 22:12 UTC (permalink / raw)
  To: davem, sowmini.varadhan; +Cc: netdev, sparclinux


For NAPIfied drivers, there is no need to
synchronize by doing irqsave/restore on vio.lock in the I/O
path.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 arch/sparc/kernel/viohs.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/sparc/kernel/viohs.c b/arch/sparc/kernel/viohs.c
index 526fcb5..92e2a32 100644
--- a/arch/sparc/kernel/viohs.c
+++ b/arch/sparc/kernel/viohs.c
@@ -747,10 +747,11 @@ EXPORT_SYMBOL(vio_ldc_free);
 
 void vio_port_up(struct vio_driver_state *vio)
 {
-	unsigned long flags;
+	unsigned long flags = 0;
 	int err, state;
 
-	spin_lock_irqsave(&vio->lock, flags);
+	if (!in_softirq())
+		spin_lock_irqsave(&vio->lock, flags);
 
 	state = ldc_state(vio->lp);
 
@@ -777,7 +778,8 @@ void vio_port_up(struct vio_driver_state *vio)
 		mod_timer(&vio->timer, expires);
 	}
 
-	spin_unlock_irqrestore(&vio->lock, flags);
+	if (!in_softirq())
+		spin_unlock_irqrestore(&vio->lock, flags);
 }
 EXPORT_SYMBOL(vio_port_up);
 
-- 
1.8.4.2


^ permalink raw reply related

* [PATCHv5 net-next 2/4] sunvnet: Use RCU to synchronize port usage with vnet_port_remove()
From: Sowmini Varadhan @ 2014-10-22 22:12 UTC (permalink / raw)
  To: davem, bob.picco, sowmini.varadhan, dwight.engen, david.stevens; +Cc: netdev


A vnet_port_remove could be triggered as a result of an ldm-unbind
operation by the peer, module unload, or other changes to the
inter-vnet-link configuration.  When this is concurrent with
vnet_start_xmit(), there are several race sequences possible,
such as

thread 1                                    thread 2
vnet_start_xmit
-> tx_port_find
   spin_lock_irqsave(&vp->lock..)
   ret = __tx_port_find(..)
   spin_unlock_irqrestore(&vp->lock..)
                                           vio_remove -> ..
                                               ->vnet_port_remove
                                           spin_lock_irqsave(&vp->lock..)
                                           cleanup
                                           spin_unlock_irqrestore(&vp->lock..)
                                           kfree(port)
/* attempt to use ret will bomb */

This patch adds RCU locking for port access so that vnet_port_remove
will correctly clean up port-related state.
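
The ordering this relies on (unlink first, free only after readers are
done) can be illustrated with a toy, single-threaded userspace model.
This is only a sketch: every name in it is hypothetical, it is not RCU
and not the sunvnet code, and the free is deferred to the last reader's
exit, whereas synchronize_rcu() makes the remover wait instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of deferred reclamation. All names are hypothetical;
 * this is neither RCU nor the sunvnet driver.
 */
struct toy_port {
	bool linked;	/* still reachable by lookups */
	bool freed;	/* stands in for kfree(port) having run */
};

struct toy_table {
	struct toy_port *slot;		/* the one published port, or NULL */
	struct toy_port *doomed;	/* unlinked, waiting for readers */
	int readers;			/* readers inside a read section */
};

/* Free the unlinked port once no reader can still hold a pointer. */
static void reclaim(struct toy_table *t)
{
	if (t->doomed && t->readers == 0) {
		t->doomed->freed = true;
		t->doomed = NULL;
	}
}

static void reader_enter(struct toy_table *t)
{
	t->readers++;
}

static void reader_exit(struct toy_table *t)
{
	t->readers--;
	reclaim(t);
}

static struct toy_port *lookup(struct toy_table *t)
{
	return t->slot;
}

/* Unlink immediately so new lookups miss the port; defer the free. */
static void remove_port(struct toy_table *t)
{
	struct toy_port *p = t->slot;

	if (!p)
		return;
	t->slot = NULL;
	p->linked = false;
	t->doomed = p;
	reclaim(t);
}
```

If a reader calls lookup() between reader_enter() and reader_exit() and
remove_port() runs in between, the port stays unfreed until the reader
exits, which is the window the race diagram above leaves open when the
lookup lock is dropped before the port is used.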

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Bob Picco <bob.picco@oracle.com>
---
changes since v2: use RCU.
changes since v3: incorporate David Stevens feedback

 drivers/net/ethernet/sun/sunvnet.c | 62 ++++++++++++++++++++------------------
 1 file changed, 33 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
index 8827fef..097ea5f 100644
--- a/drivers/net/ethernet/sun/sunvnet.c
+++ b/drivers/net/ethernet/sun/sunvnet.c
@@ -619,7 +619,8 @@ static void maybe_tx_wakeup(struct vnet *vp)
 		struct vnet_port *port;
 		int wake = 1;
 
-		list_for_each_entry(port, &vp->port_list, list) {
+		rcu_read_lock();
+		list_for_each_entry_rcu(port, &vp->port_list, list) {
 			struct vio_dring_state *dr;
 
 			dr = &port->vio.drings[VIO_DRIVER_TX_RING];
@@ -629,6 +630,7 @@ static void maybe_tx_wakeup(struct vnet *vp)
 				break;
 			}
 		}
+		rcu_read_unlock();
 		if (wake)
 			netif_wake_queue(dev);
 	}
@@ -826,13 +828,13 @@ struct vnet_port *__tx_port_find(struct vnet *vp, struct sk_buff *skb)
 	struct hlist_head *hp = &vp->port_hash[hash];
 	struct vnet_port *port;
 
-	hlist_for_each_entry(port, hp, hash) {
+	hlist_for_each_entry_rcu(port, hp, hash) {
 		if (!port_is_up(port))
 			continue;
 		if (ether_addr_equal(port->raddr, skb->data))
 			return port;
 	}
-	list_for_each_entry(port, &vp->port_list, list) {
+	list_for_each_entry_rcu(port, &vp->port_list, list) {
 		if (!port->switch_port)
 			continue;
 		if (!port_is_up(port))
@@ -968,7 +970,7 @@ static inline struct sk_buff *vnet_skb_shape(struct sk_buff *skb, void **pstart,
 static int vnet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct vnet *vp = netdev_priv(dev);
-	struct vnet_port *port = tx_port_find(vp, skb);
+	struct vnet_port *port = NULL;
 	struct vio_dring_state *dr;
 	struct vio_net_desc *d;
 	unsigned long flags;
@@ -979,14 +981,15 @@ static int vnet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	int nlen = 0;
 	unsigned pending = 0;
 
-	if (unlikely(!port))
-		goto out_dropped;
-
 	skb = vnet_skb_shape(skb, &start, &nlen);
-
 	if (unlikely(!skb))
 		goto out_dropped;
 
+	rcu_read_lock();
+	port = tx_port_find(vp, skb);
+	if (unlikely(!port))
+		goto out_dropped;
+
 	if (skb->len > port->rmtu) {
 		unsigned long localmtu = port->rmtu - ETH_HLEN;
 
@@ -1004,6 +1007,7 @@ static int vnet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			fl4.saddr = ip_hdr(skb)->saddr;
 
 			rt = ip_route_output_key(dev_net(dev), &fl4);
+			rcu_read_unlock();
 			if (!IS_ERR(rt)) {
 				skb_dst_set(skb, &rt->dst);
 				icmp_send(skb, ICMP_DEST_UNREACH,
@@ -1029,7 +1033,7 @@ static int vnet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			netdev_err(dev, "BUG! Tx Ring full when queue awake!\n");
 			dev->stats.tx_errors++;
 		}
-		spin_unlock_irqrestore(&port->vio.lock, flags);
+		rcu_read_unlock();
 		return NETDEV_TX_BUSY;
 	}
 
@@ -1123,25 +1127,27 @@ ldc_start_done:
 	}
 
 	spin_unlock_irqrestore(&port->vio.lock, flags);
+	(void)mod_timer(&port->clean_timer, jiffies + VNET_CLEAN_TIMEOUT);
+	rcu_read_unlock();
 
 	vnet_free_skbs(freeskbs);
 
-	(void)mod_timer(&port->clean_timer, jiffies + VNET_CLEAN_TIMEOUT);
-
 	return NETDEV_TX_OK;
 
 out_dropped_unlock:
 	spin_unlock_irqrestore(&port->vio.lock, flags);
 
 out_dropped:
-	if (skb)
-		dev_kfree_skb(skb);
-	vnet_free_skbs(freeskbs);
 	if (pending)
 		(void)mod_timer(&port->clean_timer,
 				jiffies + VNET_CLEAN_TIMEOUT);
 	else if (port)
 		del_timer(&port->clean_timer);
+	if (port)
+		rcu_read_unlock();
+	if (skb)
+		dev_kfree_skb(skb);
+	vnet_free_skbs(freeskbs);
 	dev->stats.tx_dropped++;
 	return NETDEV_TX_OK;
 }
@@ -1271,18 +1277,17 @@ static void vnet_set_rx_mode(struct net_device *dev)
 {
 	struct vnet *vp = netdev_priv(dev);
 	struct vnet_port *port;
-	unsigned long flags;
 
-	spin_lock_irqsave(&vp->lock, flags);
-	if (!list_empty(&vp->port_list)) {
-		port = list_entry(vp->port_list.next, struct vnet_port, list);
+	rcu_read_lock();
+	list_for_each_entry_rcu(port, &vp->port_list, list) {
 
 		if (port->switch_port) {
 			__update_mc_list(vp, dev);
 			__send_mc_list(vp, port);
+			break;
 		}
 	}
-	spin_unlock_irqrestore(&vp->lock, flags);
+	rcu_read_unlock();
 }
 
 static int vnet_change_mtu(struct net_device *dev, int new_mtu)
@@ -1635,10 +1640,11 @@ static int vnet_port_probe(struct vio_dev *vdev, const struct vio_device_id *id)
 
 	spin_lock_irqsave(&vp->lock, flags);
 	if (switch_port)
-		list_add(&port->list, &vp->port_list);
+		list_add_rcu(&port->list, &vp->port_list);
 	else
-		list_add_tail(&port->list, &vp->port_list);
-	hlist_add_head(&port->hash, &vp->port_hash[vnet_hashfn(port->raddr)]);
+		list_add_tail_rcu(&port->list, &vp->port_list);
+	hlist_add_head_rcu(&port->hash,
+			   &vp->port_hash[vnet_hashfn(port->raddr)]);
 	spin_unlock_irqrestore(&vp->lock, flags);
 
 	dev_set_drvdata(&vdev->dev, port);
@@ -1673,18 +1679,16 @@ static int vnet_port_remove(struct vio_dev *vdev)
 	struct vnet_port *port = dev_get_drvdata(&vdev->dev);
 
 	if (port) {
-		struct vnet *vp = port->vp;
-		unsigned long flags;
 
 		del_timer_sync(&port->vio.timer);
-		del_timer_sync(&port->clean_timer);
 
 		napi_disable(&port->napi);
-		spin_lock_irqsave(&vp->lock, flags);
-		list_del(&port->list);
-		hlist_del(&port->hash);
-		spin_unlock_irqrestore(&vp->lock, flags);
 
+		list_del_rcu(&port->list);
+		hlist_del_rcu(&port->hash);
+
+		synchronize_rcu();
+		del_timer_sync(&port->clean_timer);
 		netif_napi_del(&port->napi);
 		vnet_port_free_tx_bufs(port);
 		vio_ldc_free(&port->vio);
-- 
1.8.4.2

^ permalink raw reply related

* [PATCHv5 net-next 1/4] sunvnet: NAPIfy sunvnet
From: Sowmini Varadhan @ 2014-10-22 22:12 UTC (permalink / raw)
  To: davem, bob.picco, sowmini.varadhan, dwight.engen,
	raghuram.kothakota, david.stevens
  Cc: netdev

Move Rx packet processing to the NAPI poll callback.
Disable VIO interrupt and unconditionally go into NAPI
context from vnet_event.

Note that we want to minimize the number of LDC
STOP/START messages sent. Specifically, do not send a STOP
message if vnet_walk_rx does not read all the available descriptors
because of the NAPI budget limitation. Instead, note the end index
as part of port state, and resume from this index when the
next poll callback is triggered.
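
The budget/resume behaviour described above can be sketched in plain
userspace C. This is a simplified model under stated assumptions: the
struct and function names are hypothetical stand-ins for the
napi_resume/napi_stop_idx state, and the ring is a flat array with no
wrap-around, unlike the real descriptor ring.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-port resume state; not the
 * driver's real structure.
 */
struct rx_state {
	bool resume;		/* last walk stopped on the budget */
	size_t stop_idx;	/* last descriptor index that walk consumed */
};

/* Walk ready[start .. n-1], consuming at most `budget` descriptors.
 * Returns the number consumed. *send_stop tells the caller whether
 * the STOPPED ack should go to the peer now.
 */
static int walk_rx(struct rx_state *st, const bool *ready, size_t n,
		   size_t start, int budget, bool *send_stop)
{
	int npkts = 0;
	size_t i;

	*send_stop = true;
	for (i = start; i < n; i++) {
		if (!ready[i])
			break;			/* nothing more to read */
		npkts++;
		st->stop_idx = i;
		if (npkts >= budget) {
			*send_stop = false;	/* out of budget: no STOP yet */
			break;
		}
	}
	st->resume = !*send_stop;
	return npkts;
}

/* Index the next poll should start from. */
static size_t next_start(const struct rx_state *st, size_t fresh_start)
{
	return st->resume ? st->stop_idx + 1 : fresh_start;
}
```

With five ready descriptors and a budget of 2, successive calls consume
2, 2 and 1 descriptors, and only the last call asks for the STOP
message to be sent.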

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Raghuram Kothakota <raghuram.kothakota@oracle.com>
Acked-by: Dwight Engen <dwight.engen@oracle.com>
---
changes since v2: use NAPI.
changes since v3: David Stevens comments.
changes since v4: vnet_event() must accumulate LDC_EVENT_* bits into rx_event
                  and all these bits should be processed in vnet_event_napi()
                  in the same order as send_events()

 drivers/net/ethernet/sun/sunvnet.c | 175 ++++++++++++++++++++++++++++---------
 drivers/net/ethernet/sun/sunvnet.h |   6 +-
 2 files changed, 137 insertions(+), 44 deletions(-)

diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
index 3652afd..8827fef 100644
--- a/drivers/net/ethernet/sun/sunvnet.c
+++ b/drivers/net/ethernet/sun/sunvnet.c
@@ -33,6 +33,8 @@
 #define DRV_MODULE_VERSION	"1.0"
 #define DRV_MODULE_RELDATE	"June 25, 2007"
 
+#define	NAPI_POLL_WEIGHT	64
+
 static char version[] =
 	DRV_MODULE_NAME ".c:v" DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")\n";
 MODULE_AUTHOR("David S. Miller (davem@davemloft.net)");
@@ -311,9 +313,7 @@ static int vnet_rx_one(struct vnet_port *port, unsigned int len,
 
 	dev->stats.rx_packets++;
 	dev->stats.rx_bytes += len;
-
-	netif_rx(skb);
-
+	napi_gro_receive(&port->napi, skb);
 	return 0;
 
 out_free_skb:
@@ -430,6 +430,7 @@ static int vnet_walk_rx_one(struct vnet_port *port,
 	struct vio_driver_state *vio = &port->vio;
 	int err;
 
+	BUG_ON(desc == NULL);
 	if (IS_ERR(desc))
 		return PTR_ERR(desc);
 
@@ -456,10 +457,11 @@ static int vnet_walk_rx_one(struct vnet_port *port,
 }
 
 static int vnet_walk_rx(struct vnet_port *port, struct vio_dring_state *dr,
-			u32 start, u32 end)
+			u32 start, u32 end, int *npkts, int budget)
 {
 	struct vio_driver_state *vio = &port->vio;
 	int ack_start = -1, ack_end = -1;
+	bool send_ack = true;
 
 	end = (end == (u32) -1) ? prev_idx(start, dr) : next_idx(end, dr);
 
@@ -471,6 +473,7 @@ static int vnet_walk_rx(struct vnet_port *port, struct vio_dring_state *dr,
 			return err;
 		if (err != 0)
 			break;
+		(*npkts)++;
 		if (ack_start == -1)
 			ack_start = start;
 		ack_end = start;
@@ -482,13 +485,26 @@ static int vnet_walk_rx(struct vnet_port *port, struct vio_dring_state *dr,
 				return err;
 			ack_start = -1;
 		}
+		if ((*npkts) >= budget) {
+			send_ack = false;
+			break;
+		}
 	}
 	if (unlikely(ack_start == -1))
 		ack_start = ack_end = prev_idx(start, dr);
-	return vnet_send_ack(port, dr, ack_start, ack_end, VIO_DRING_STOPPED);
+	if (send_ack) {
+		port->napi_resume = false;
+		return vnet_send_ack(port, dr, ack_start, ack_end,
+				     VIO_DRING_STOPPED);
+	} else  {
+		port->napi_resume = true;
+		port->napi_stop_idx = ack_end;
+		return 1;
+	}
 }
 
-static int vnet_rx(struct vnet_port *port, void *msgbuf)
+static int vnet_rx(struct vnet_port *port, void *msgbuf, int *npkts,
+		   int budget)
 {
 	struct vio_dring_data *pkt = msgbuf;
 	struct vio_dring_state *dr = &port->vio.drings[VIO_DRIVER_RX_RING];
@@ -505,11 +521,13 @@ static int vnet_rx(struct vnet_port *port, void *msgbuf)
 		return 0;
 	}
 
-	dr->rcv_nxt++;
+	if (!port->napi_resume)
+		dr->rcv_nxt++;
 
 	/* XXX Validate pkt->start_idx and pkt->end_idx XXX */
 
-	return vnet_walk_rx(port, dr, pkt->start_idx, pkt->end_idx);
+	return vnet_walk_rx(port, dr, pkt->start_idx, pkt->end_idx,
+			    npkts, budget);
 }
 
 static int idx_is_pending(struct vio_dring_state *dr, u32 end)
@@ -542,9 +560,12 @@ static int vnet_ack(struct vnet_port *port, void *msgbuf)
 	if (unlikely(!idx_is_pending(dr, end)))
 		return 0;
 
+	vp = port->vp;
+	dev = vp->dev;
 	/* sync for race conditions with vnet_start_xmit() and tell xmit it
 	 * is time to send a trigger.
 	 */
+	netif_tx_lock(dev);
 	dr->cons = next_idx(end, dr);
 	desc = vio_dring_entry(dr, dr->cons);
 	if (desc->hdr.state == VIO_DESC_READY && port->start_cons) {
@@ -559,10 +580,8 @@ static int vnet_ack(struct vnet_port *port, void *msgbuf)
 	} else {
 		port->start_cons = true;
 	}
+	netif_tx_unlock(dev);
 
-
-	vp = port->vp;
-	dev = vp->dev;
 	if (unlikely(netif_queue_stopped(dev) &&
 		     vnet_tx_dring_avail(dr) >= VNET_TX_WAKEUP_THRESH(dr)))
 		return 1;
@@ -591,9 +610,8 @@ static int handle_mcast(struct vnet_port *port, void *msgbuf)
 	return 0;
 }
 
-static void maybe_tx_wakeup(unsigned long param)
+static void maybe_tx_wakeup(struct vnet *vp)
 {
-	struct vnet *vp = (struct vnet *)param;
 	struct net_device *dev = vp->dev;
 
 	netif_tx_lock(dev);
@@ -617,32 +635,43 @@ static void maybe_tx_wakeup(unsigned long param)
 	netif_tx_unlock(dev);
 }
 
-static void vnet_event(void *arg, int event)
+static inline bool port_is_up(struct vnet_port *vnet)
+{
+	struct vio_driver_state *vio = &vnet->vio;
+
+	return !!(vio->hs_state & VIO_HS_COMPLETE);
+}
+
+static int vnet_event_napi(struct vnet_port *port, int budget)
 {
-	struct vnet_port *port = arg;
 	struct vio_driver_state *vio = &port->vio;
-	unsigned long flags;
 	int tx_wakeup, err;
+	int npkts = 0;
+	int event = (port->rx_event & LDC_EVENT_RESET);
 
-	spin_lock_irqsave(&vio->lock, flags);
-
+ldc_ctrl:
 	if (unlikely(event == LDC_EVENT_RESET ||
 		     event == LDC_EVENT_UP)) {
 		vio_link_state_change(vio, event);
-		spin_unlock_irqrestore(&vio->lock, flags);
 
 		if (event == LDC_EVENT_RESET) {
 			port->rmtu = 0;
 			vio_port_up(vio);
 		}
-		return;
+		port->rx_event = 0;
+		return 0;
 	}
+	/* We may have multiple LDC events in rx_event. Unroll send_events() */
+	event = (port->rx_event & LDC_EVENT_UP);
+	port->rx_event &= ~(LDC_EVENT_RESET|LDC_EVENT_UP);
+	if (event == LDC_EVENT_UP)
+		goto ldc_ctrl;
+	event = port->rx_event;
+	if (!(event & LDC_EVENT_DATA_READY))
+		return 0;
 
-	if (unlikely(event != LDC_EVENT_DATA_READY)) {
-		pr_warn("Unexpected LDC event %d\n", event);
-		spin_unlock_irqrestore(&vio->lock, flags);
-		return;
-	}
+	/* we dont expect any other bits than RESET, UP, DATA_READY */
+	BUG_ON(event != LDC_EVENT_DATA_READY);
 
 	tx_wakeup = err = 0;
 	while (1) {
@@ -651,6 +680,21 @@ static void vnet_event(void *arg, int event)
 			u64 raw[8];
 		} msgbuf;
 
+		if (port->napi_resume) {
+			struct vio_dring_data *pkt =
+				(struct vio_dring_data *)&msgbuf;
+			struct vio_dring_state *dr =
+				&port->vio.drings[VIO_DRIVER_RX_RING];
+
+			pkt->tag.type = VIO_TYPE_DATA;
+			pkt->tag.stype = VIO_SUBTYPE_INFO;
+			pkt->tag.stype_env = VIO_DRING_DATA;
+			pkt->seq = dr->rcv_nxt;
+			pkt->start_idx = next_idx(port->napi_stop_idx, dr);
+			pkt->end_idx = -1;
+			goto napi_resume;
+		}
+ldc_read:
 		err = ldc_read(vio->lp, &msgbuf, sizeof(msgbuf));
 		if (unlikely(err < 0)) {
 			if (err == -ECONNRESET)
@@ -667,10 +711,22 @@ static void vnet_event(void *arg, int event)
 		err = vio_validate_sid(vio, &msgbuf.tag);
 		if (err < 0)
 			break;
-
+napi_resume:
 		if (likely(msgbuf.tag.type == VIO_TYPE_DATA)) {
 			if (msgbuf.tag.stype == VIO_SUBTYPE_INFO) {
-				err = vnet_rx(port, &msgbuf);
+				if (!port_is_up(port)) {
+					/* failures like handshake_failure()
+					 * may have cleaned up dring, but
+					 * NAPI polling may bring us here.
+					 */
+					err = -ECONNRESET;
+					break;
+				}
+				err = vnet_rx(port, &msgbuf, &npkts, budget);
+				if (npkts >= budget)
+					break;
+				if (npkts == 0 && err != -ECONNRESET)
+					goto ldc_read;
 			} else if (msgbuf.tag.stype == VIO_SUBTYPE_ACK) {
 				err = vnet_ack(port, &msgbuf);
 				if (err > 0)
@@ -691,15 +747,33 @@ static void vnet_event(void *arg, int event)
 		if (err == -ECONNRESET)
 			break;
 	}
-	spin_unlock(&vio->lock);
-	/* Kick off a tasklet to wake the queue.  We cannot call
-	 * maybe_tx_wakeup directly here because we could deadlock on
-	 * netif_tx_lock() with dev_watchdog()
-	 */
 	if (unlikely(tx_wakeup && err != -ECONNRESET))
-		tasklet_schedule(&port->vp->vnet_tx_wakeup);
+		maybe_tx_wakeup(port->vp);
+	return npkts;
+}
+
+static int vnet_poll(struct napi_struct *napi, int budget)
+{
+	struct vnet_port *port = container_of(napi, struct vnet_port, napi);
+	struct vio_driver_state *vio = &port->vio;
+	int processed = vnet_event_napi(port, budget);
+
+	if (processed < budget) {
+		napi_complete(napi);
+		vio_set_intr(vio->vdev->rx_ino, HV_INTR_ENABLED);
+	}
+	return processed;
+}
+
+static void vnet_event(void *arg, int event)
+{
+	struct vnet_port *port = arg;
+	struct vio_driver_state *vio = &port->vio;
+
+	port->rx_event |= event;
+	vio_set_intr(vio->vdev->rx_ino, HV_INTR_DISABLED);
+	napi_schedule(&port->napi);
 
-	local_irq_restore(flags);
 }
 
 static int __vnet_tx_trigger(struct vnet_port *port, u32 start)
@@ -746,13 +820,6 @@ static int __vnet_tx_trigger(struct vnet_port *port, u32 start)
 	return err;
 }
 
-static inline bool port_is_up(struct vnet_port *vnet)
-{
-	struct vio_driver_state *vio = &vnet->vio;
-
-	return !!(vio->hs_state & VIO_HS_COMPLETE);
-}
-
 struct vnet_port *__tx_port_find(struct vnet *vp, struct sk_buff *skb)
 {
 	unsigned int hash = vnet_hashfn(skb->data);
@@ -1342,6 +1409,21 @@ err_out:
 	return err;
 }
 
+#ifdef CONFIG_NET_POLL_CONTROLLER
+static void vnet_poll_controller(struct net_device *dev)
+{
+	struct vnet *vp = netdev_priv(dev);
+	struct vnet_port *port;
+	unsigned long flags;
+
+	spin_lock_irqsave(&vp->lock, flags);
+	if (!list_empty(&vp->port_list)) {
+		port = list_entry(vp->port_list.next, struct vnet_port, list);
+		napi_schedule(&port->napi);
+	}
+	spin_unlock_irqrestore(&vp->lock, flags);
+}
+#endif
 static LIST_HEAD(vnet_list);
 static DEFINE_MUTEX(vnet_list_mutex);
 
@@ -1354,6 +1436,9 @@ static const struct net_device_ops vnet_ops = {
 	.ndo_tx_timeout		= vnet_tx_timeout,
 	.ndo_change_mtu		= vnet_change_mtu,
 	.ndo_start_xmit		= vnet_start_xmit,
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	.ndo_poll_controller	= vnet_poll_controller,
+#endif
 };
 
 static struct vnet *vnet_new(const u64 *local_mac)
@@ -1374,7 +1459,6 @@ static struct vnet *vnet_new(const u64 *local_mac)
 	vp = netdev_priv(dev);
 
 	spin_lock_init(&vp->lock);
-	tasklet_init(&vp->vnet_tx_wakeup, maybe_tx_wakeup, (unsigned long)vp);
 	vp->dev = dev;
 
 	INIT_LIST_HEAD(&vp->port_list);
@@ -1434,7 +1518,6 @@ static void vnet_cleanup(void)
 		vp = list_first_entry(&vnet_list, struct vnet, list);
 		list_del(&vp->list);
 		dev = vp->dev;
-		tasklet_kill(&vp->vnet_tx_wakeup);
 		/* vio_unregister_driver() should have cleaned up port_list */
 		BUG_ON(!list_empty(&vp->port_list));
 		unregister_netdev(dev);
@@ -1536,6 +1619,8 @@ static int vnet_port_probe(struct vio_dev *vdev, const struct vio_device_id *id)
 	if (err)
 		goto err_out_free_port;
 
+	netif_napi_add(port->vp->dev, &port->napi, vnet_poll, NAPI_POLL_WEIGHT);
+
 	err = vnet_port_alloc_tx_bufs(port);
 	if (err)
 		goto err_out_free_ldc;
@@ -1564,6 +1649,7 @@ static int vnet_port_probe(struct vio_dev *vdev, const struct vio_device_id *id)
 	setup_timer(&port->clean_timer, vnet_clean_timer_expire,
 		    (unsigned long)port);
 
+	napi_enable(&port->napi);
 	vio_port_up(&port->vio);
 
 	mdesc_release(hp);
@@ -1571,6 +1657,7 @@ static int vnet_port_probe(struct vio_dev *vdev, const struct vio_device_id *id)
 	return 0;
 
 err_out_free_ldc:
+	netif_napi_del(&port->napi);
 	vio_ldc_free(&port->vio);
 
 err_out_free_port:
@@ -1592,11 +1679,13 @@ static int vnet_port_remove(struct vio_dev *vdev)
 		del_timer_sync(&port->vio.timer);
 		del_timer_sync(&port->clean_timer);
 
+		napi_disable(&port->napi);
 		spin_lock_irqsave(&vp->lock, flags);
 		list_del(&port->list);
 		hlist_del(&port->hash);
 		spin_unlock_irqrestore(&vp->lock, flags);
 
+		netif_napi_del(&port->napi);
 		vnet_port_free_tx_bufs(port);
 		vio_ldc_free(&port->vio);
 
diff --git a/drivers/net/ethernet/sun/sunvnet.h b/drivers/net/ethernet/sun/sunvnet.h
index c911045..c8a862e 100644
--- a/drivers/net/ethernet/sun/sunvnet.h
+++ b/drivers/net/ethernet/sun/sunvnet.h
@@ -56,6 +56,11 @@ struct vnet_port {
 	struct timer_list	clean_timer;
 
 	u64			rmtu;
+
+	struct napi_struct	napi;
+	u32			napi_stop_idx;
+	bool			napi_resume;
+	int			rx_event;
 };
 
 static inline struct vnet_port *to_vnet_port(struct vio_driver_state *vio)
@@ -97,7 +102,6 @@ struct vnet {
 	struct list_head	list;
 	u64			local_mac;
 
-	struct tasklet_struct	vnet_tx_wakeup;
 };
 
 #endif /* _SUNVNET_H */
-- 
1.8.4.2

^ permalink raw reply related

* [PATCHv5 net-next 0/4] sunvnet: NAPI enhancements.
From: Sowmini Varadhan @ 2014-10-22 22:12 UTC (permalink / raw)
  To: davem, bob.picco, sowmini.varadhan, dwight.engen,
	raghuram.kothakota, david.stevens
  Cc: netdev, sparclinux

This patchset converts the sunvnet driver to use the NAPI framework.
Changes since v4 (Patch 1):
  vnet_event accumulates LDC_EVENT_* bits into rx_event.
  vnet_event_napi() unrolls send_events() logic to process all rx_event bits.
(Mail subject of the cover letter changed to keep it distinct from the
subject of patch 1.)
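
As a rough userspace model of the accumulate-then-drain scheme, the
sketch below follows my reading of the vnet_event()/vnet_event_napi()
hunks in patch 1: the interrupt side only ORs bits in, and the poll side
handles RESET before UP before DATA_READY, with a control event clearing
everything pending. The EV_* names and values are hypothetical
placeholders, not the real LDC_EVENT_* definitions.

```c
#include <stddef.h>

/* Hypothetical bit values for illustration only. */
enum {
	EV_RESET	= 0x01,
	EV_UP		= 0x02,
	EV_DATA_READY	= 0x04,
};

/* Interrupt side: only record what happened. */
static void event_irq(int *pending, int event)
{
	*pending |= event;
}

/* Poll side: pick the event this poll acts on.
 * RESET and UP are control events; handling one discards every other
 * pending bit. DATA_READY is left set so a budget-limited poll can
 * come back to it. Returns 0 when there is nothing to do.
 */
static int event_poll_pick(int *pending)
{
	if (*pending & EV_RESET) {
		*pending = 0;
		return EV_RESET;
	}
	if (*pending & EV_UP) {
		*pending = 0;
		return EV_UP;
	}
	if (*pending & EV_DATA_READY)
		return EV_DATA_READY;
	return 0;
}
```

Posting DATA_READY and then RESET before the poll runs yields RESET from
event_poll_pick(), with nothing left pending afterwards.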

Patch 1 in the series addresses the packet-receive path: all
the vnet_event() processing is moved into NAPI context.
This patch is dependent on the sparc-next commit:
  "sparc64: Add vio_set_intr() to enable/disable Rx interrupts"
  (sparc commit id ca605b7dd740c8909408d67911d8ddd272c2b320)

Patch 2 uses RCU to fix race conditions between vnet_port_remove and
paths that access/modify port-related state, such as vnet_start_xmit.

Patch 3 and Patch 4 in the series build on the NAPIfied Rx path,
dropping superfluous usage of the irqsave/irqrestores on the vio.lock
where possible.

Note: Patch 3 contains changes that target sparc-next, Patch 4 targets
net-next.

Sowmini Varadhan (4):
  NAPIfy sunvnet
  Use RCU to synchronize port usage with vnet_port_remove()
  Avoid irqsave/restore on vio.lock if in_softirq()
  Remove irqsave/irqrestore on vio.lock

 arch/sparc/kernel/viohs.c          |   8 +-
 drivers/net/ethernet/sun/sunvnet.c | 265 +++++++++++++++++++++++--------------
 drivers/net/ethernet/sun/sunvnet.h |   6 +-
 3 files changed, 179 insertions(+), 100 deletions(-)

-- 
1.8.4.2

^ permalink raw reply

* Re: [PATCH net] hyperv: Fix the total_data_buflen in send path
From: David Miller @ 2014-10-22 21:59 UTC (permalink / raw)
  To: haiyangz; +Cc: olaf, netdev, jasowang, driverdev-devel, linux-kernel
In-Reply-To: <1414010838-20656-1-git-send-email-haiyangz@microsoft.com>

From: Haiyang Zhang <haiyangz@microsoft.com>
Date: Wed, 22 Oct 2014 13:47:18 -0700

> total_data_buflen is used by netvsc_send() to decide if a packet can be put
> into send buffer. It should also include the size of RNDIS message before the
> Ethernet frame. Otherwise, a message with total size bigger than send_section_size
> may be copied into the send buffer, and cause data corruption.
> 
> [Request to include this patch to the Stable branches]
> 
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>

Applied and queued up for -stable, thanks.
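
The accounting problem described in the quoted commit message can be
illustrated with a small userspace sketch. The sizes and function names below
are hypothetical, chosen only to show why the RNDIS message header has to be
counted when deciding whether a packet fits in one send-buffer section; this
is not the netvsc code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes, for illustration only. */
#define SECTION_SIZE  6144u /* stands in for send_section_size */
#define RNDIS_HDR_LEN 44u   /* stands in for the RNDIS message header */

/* True if a packet of total_len bytes may be copied into one section. */
static bool fits_in_section(size_t total_len)
{
	return total_len <= SECTION_SIZE;
}

/* Incomplete accounting: only the Ethernet frame is counted. */
static size_t total_len_frame_only(size_t frame_len)
{
	return frame_len;
}

/* Complete accounting: RNDIS header plus Ethernet frame. */
static size_t total_len_with_rndis(size_t frame_len)
{
	return RNDIS_HDR_LEN + frame_len;
}
```

With these assumed sizes, a 6120-byte frame passes the check under the
frame-only accounting even though header plus frame is 6164 bytes, which is
larger than the section.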

^ permalink raw reply

* Re: [RFC] tcp md5 use of alloc_percpu
From: David Miller @ 2014-10-22 21:53 UTC (permalink / raw)
  To: cdleonard; +Cc: netdev
In-Reply-To: <5447FDB2.2010906@gmail.com>

From: Crestez Dan Leonard <cdleonard@gmail.com>
Date: Wed, 22 Oct 2014 21:55:46 +0300

>  static int tcp_v4_md5_hash_pseudoheader(struct tcp_md5sig_pool *hp,
>  					__be32 daddr, __be32 saddr, int nbytes)
>  {
> -	struct tcp4_pseudohdr *bp;
> +	struct tcp4_pseudohdr bp;
...
> +	sg_init_one(&sg, &bp, sizeof(bp));
> +	return crypto_hash_update(&hp->md5_desc, &sg, sizeof(bp));

As others have mentioned, you cannot do this.

On some architectures the kernel stack comes from vmalloc()
memory too.

^ permalink raw reply

* Re: [PATCH net v1 0/2] amd-xgbe: AMD XGBE driver fixes 2014-10-22
From: David Miller @ 2014-10-22 21:51 UTC (permalink / raw)
  To: thomas.lendacky; +Cc: netdev
In-Reply-To: <20141022162605.31495.98889.stgit@tlendack-t1.amdoffice.net>

From: Tom Lendacky <thomas.lendacky@amd.com>
Date: Wed, 22 Oct 2014 11:26:05 -0500

> The following series of patches includes fixes to the driver.
> 
> - Properly handle feature changes via ethtool by using correctly sized
>   variables
> - Perform proper napi packet counting and budget checking
> 
> This patch series is based on net.

Series applied, thanks Tom.
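
One way an incorrectly sized variable can mishandle a feature mask is
sketched below in userspace C. It assumes a 64-bit feature mask and a
hypothetical flag at bit 35; the type, the flag and the helper names are
illustrative and are not taken from the amd-xgbe patches.

```c
#include <stdint.h>

typedef uint64_t features_t;               /* stands in for a 64-bit feature mask */
#define FEATURE_HIGH ((features_t)1 << 35) /* hypothetical flag above bit 31 */

/* Narrow variable: the masked value is truncated to 32 bits and reads as 0. */
static int flag_set_narrow(features_t features)
{
	uint32_t v = (uint32_t)(features & FEATURE_HIGH);

	return v != 0;
}

/* Correctly sized variable: the high bit survives. */
static int flag_set_wide(features_t features)
{
	features_t v = features & FEATURE_HIGH;

	return v != 0;
}
```

Any flag that lives above bit 31 silently disappears in the narrow version,
so a requested feature change would never be noticed.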

^ permalink raw reply
