Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 0/3] virtio-net: inline header support
From: Michael S. Tsirkin @ 2012-10-08 21:31 UTC (permalink / raw)
  To: Rusty Russell
  Cc: kvm, netdev, linux-kernel, Sasha Levin, virtualization, avi,
	Anthony Liguori, Thomas Lendacky
In-Reply-To: <87391u3o67.fsf@rustcorp.com.au>

On Thu, Oct 04, 2012 at 01:04:56PM +0930, Rusty Russell wrote:
> Anthony Liguori <anthony@codemonkey.ws> writes:
> > Rusty Russell <rusty@rustcorp.com.au> writes:
> >
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >>
> >>> Thinking about Sasha's patches, we can reduce ring usage
> >>> for virtio net small packets dramatically if we put
> >>> virtio net header inline with the data.
> >>> This can be done for free in case guest net stack allocated
> >>> extra head room for the packet, and I don't see
> >>> why would this have any downsides.
> >>
> >> I've been wanting to do this for the longest time... but...
> >>
> >>> Even though with my recent patches qemu
> >>> no longer requires header to be the first s/g element,
> >>> we need a new feature bit to detect this.
> >>> A trivial qemu patch will be sent separately.
> >>
> >> There's a reason I haven't done this.  I really, really dislike "my
> >> implemention isn't broken" feature bits.  We could have an infinite
> >> number of them, for each bug in each device.
> >
> > This is a bug in the specification.
> >
> > The QEMU implementation pre-dates the specification.  All of the actual
> > implementations of virtio relied on the semantics of s/g elements and
> > still do.
> 
> lguest fix is pending in my queue.  lkvm and qemu are broken; lkvm isn't
> ever going to be merged, so I'm not sure what its status is?  But I'm
> determined to fix qemu, and hence my torture patch to make sure this
> doesn't creep in again.

If you look at my patch you'll notice there's also a
comment in virtio_net.h that seems to be broken in this respect:

/* This is the first element of the scatter-gather list.  If you don't
 * specify GSO or CSUM features, you can simply ignore the header. */

There is a similar comment in virtio-blk.

^ permalink raw reply

* [PATCH resend] net, bluetooth: don't attempt to free a channel that wasn't created
From: Sasha Levin @ 2012-10-08 20:48 UTC (permalink / raw)
  To: marcel, gustavo, johan.hedberg, davem
  Cc: linux-bluetooth, davej, netdev, linux-kernel, levinsasha928,
	Sasha Levin

We may currently attempt to free a channel which wasn't created due to
an error in the initialization path, this would cause a NULL ptr deref.

This would cause the following oops:

[   12.919073] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[   12.919131] IP: [<ffffffff836645c4>] l2cap_chan_put+0x34/0x50
[   12.919135] PGD 0
[   12.919138] Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[   12.919193] Dumping ftrace buffer:
[   12.919242]    (ftrace buffer empty)
[   12.919314] Modules linked in:
[   12.919318] CPU 1
[   12.919319] Pid: 6210, comm: krfcommd Tainted: G        W    3.6.0-next-20121004-sasha-00005-gb010653-dirty #30
[   12.919374] RIP: 0010:[<ffffffff836645c4>]  [<ffffffff836645c4>] l2cap_chan_put+0x34/0x50
[   12.919377] RSP: 0000:ffff880066933c38  EFLAGS: 00010246
[   12.919378] RAX: ffffffff8366c780 RBX: 0000000000000000 RCX: 6666666666666667
[   12.919379] RDX: 0000000000000fa0 RSI: ffffffff84d3f79e RDI: 0000000000000010
[   12.919381] RBP: ffff880066933c48 R08: ffffffff859989f8 R09: 0000000000000001
[   12.919382] R10: 0000000000000000 R11: 7fffffffffffffff R12: 0000000000000000
[   12.919383] R13: ffff88009b00a200 R14: ffff88009b00a200 R15: 0000000000000001
[   12.919385] FS:  0000000000000000(0000) GS:ffff880033600000(0000) knlGS:0000000000000000
[   12.919437] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   12.919440] CR2: 0000000000000010 CR3: 0000000005026000 CR4: 00000000000406e0
[   12.919446] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   12.919451] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   12.919504] Process krfcommd (pid: 6210, threadinfo ffff880066932000, task ffff880065c4b000)
[   12.919506] Stack:
[   12.919510]  ffff88009b00a200 ffff880032084000 ffff880066933c68 ffffffff8366c7bc
[   12.919513]  7fffffffffffffff ffff880032084000 ffff880066933c98 ffffffff833ae0ae
[   12.919516]  ffff880066933ca8 0000000000000000 0000000000000000 ffff88009b00a200
[   12.919517] Call Trace:
[   12.919522]  [<ffffffff8366c7bc>] l2cap_sock_destruct+0x3c/0x80
[   12.919527]  [<ffffffff833ae0ae>] __sk_free+0x1e/0x1f0
[   12.919530]  [<ffffffff833ae2f7>] sk_free+0x17/0x20
[   12.919585]  [<ffffffff8366ca4e>] l2cap_sock_alloc.constprop.5+0x9e/0xd0
[   12.919591]  [<ffffffff8366cb9e>] l2cap_sock_create+0x7e/0x100
[   12.919652]  [<ffffffff83a4f32a>] ? _raw_read_lock+0x6a/0x80
[   12.919658]  [<ffffffff836402c4>] ? bt_sock_create+0x74/0x110
[   12.919660]  [<ffffffff83640308>] bt_sock_create+0xb8/0x110
[   12.919664]  [<ffffffff833aa232>] __sock_create+0x282/0x3b0
[   12.919720]  [<ffffffff833aa0b0>] ? __sock_create+0x100/0x3b0
[   12.919725]  [<ffffffff836785b0>] ? rfcomm_process_sessions+0x17e0/0x17e0
[   12.919779]  [<ffffffff833aa37f>] sock_create_kern+0x1f/0x30
[   12.919784]  [<ffffffff83675714>] rfcomm_l2sock_create+0x44/0x70
[   12.919787]  [<ffffffff836785b0>] ? rfcomm_process_sessions+0x17e0/0x17e0
[   12.919790]  [<ffffffff836785fe>] rfcomm_run+0x4e/0x1f0
[   12.919846]  [<ffffffff836785b0>] ? rfcomm_process_sessions+0x17e0/0x17e0
[   12.919852]  [<ffffffff81138ee3>] kthread+0xe3/0xf0
[   12.919908]  [<ffffffff8117b12e>] ? put_lock_stats.isra.14+0xe/0x40
[   12.919914]  [<ffffffff81138e00>] ? flush_kthread_work+0x1f0/0x1f0
[   12.919968]  [<ffffffff83a5077c>] ret_from_fork+0x7c/0x90
[   12.919973]  [<ffffffff81138e00>] ? flush_kthread_work+0x1f0/0x1f0
[   12.920161] Code: 83 ec 08 f6 05 ff 58 44 02 04 74 1b 8b 4f 10 48 89 fa 48 c7 c6 d9 d7 d4 84 48 c7 c7 80 9e aa 85 31 c0 e8 80
ac 3a fe 48 8d 7b 10 <f0> 83 6b 10 01 0f 94 c0 84 c0 74 05 e8 8b e0 ff ff 48 83 c4 08
[   12.920165] RIP  [<ffffffff836645c4>] l2cap_chan_put+0x34/0x50
[   12.920166]  RSP <ffff880066933c38>
[   12.920167] CR2: 0000000000000010
[   12.920417] ---[ end trace 5a9114e8a158ab84 ]---

Introduced in commit 61d6ef3e ("Bluetooth: Make better use of l2cap_chan
reference counting").

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/bluetooth/l2cap_sock.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index 083f2bf..66c295a 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -1083,7 +1083,8 @@ static void l2cap_sock_destruct(struct sock *sk)
 {
 	BT_DBG("sk %p", sk);
 
-	l2cap_chan_put(l2cap_pi(sk)->chan);
+	if (l2cap_pi(sk)->chan)
+		l2cap_chan_put(l2cap_pi(sk)->chan);
 	if (l2cap_pi(sk)->rx_busy_skb) {
 		kfree_skb(l2cap_pi(sk)->rx_busy_skb);
 		l2cap_pi(sk)->rx_busy_skb = NULL;
-- 
1.7.12

^ permalink raw reply related

* Re: [PATCH net 1/6] ipv4: fix sending of redirects
From: David Miller @ 2012-10-08 20:41 UTC (permalink / raw)
  To: ja; +Cc: netdev
In-Reply-To: <alpine.LFD.2.00.1210082259180.12172@ja.ssi.bg>

From: Julian Anastasov <ja@ssi.bg>
Date: Mon, 8 Oct 2012 23:43:45 +0300 (EEST)

> On Mon, 8 Oct 2012, David Miller wrote:
> 
>> From: Julian Anastasov <ja@ssi.bg>
>> Date: Sun,  7 Oct 2012 14:26:03 +0300
>> 
>> > @@ -322,7 +322,8 @@ int fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
>> >  {
>> >  	int r = secpath_exists(skb) ? 0 : IN_DEV_RPFILTER(idev);
>> >  
>> > -	if (!r && !fib_num_tclassid_users(dev_net(dev))) {
>> > +	if (!r && !fib_num_tclassid_users(dev_net(dev)) &&
>> > +	    dev->ifindex != oif) {
>> >  		*itag = 0;
>> >  		return 0;
>> >  	}
>> 
>> Hmmm, won't this cause the slow path to be taken for locally
>> destined traffic?
> 
> 	In this case idev=eth0 and oif=lo. The only case
> where we can see same input and output device is for
> forwarding and loopback (but only output routes where
> there is no such validation).

Ok, now I understand.  This added condition is fine.

Thanks.

^ permalink raw reply

* Re: [PATCH 0/3] virtio-net: inline header support
From: Michael S. Tsirkin @ 2012-10-08 20:41 UTC (permalink / raw)
  To: Rusty Russell
  Cc: kvm, netdev, linux-kernel, Sasha Levin, virtualization, avi,
	Thomas Lendacky
In-Reply-To: <87vces2gxq.fsf@rustcorp.com.au>

On Wed, Oct 03, 2012 at 04:14:17PM +0930, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > Thinking about Sasha's patches, we can reduce ring usage
> > for virtio net small packets dramatically if we put
> > virtio net header inline with the data.
> > This can be done for free in case guest net stack allocated
> > extra head room for the packet, and I don't see
> > why would this have any downsides.
> 
> I've been wanting to do this for the longest time... but...
> 
> > Even though with my recent patches qemu
> > no longer requires header to be the first s/g element,
> > we need a new feature bit to detect this.
> > A trivial qemu patch will be sent separately.
> 
> There's a reason I haven't done this.  I really, really dislike "my
> implemention isn't broken" feature bits.  We could have an infinite
> number of them, for each bug in each device.
> 
> So my plan was to tie this assumption to the new PCI layout.

I don't object but old qemu has this limitation for s390 as well,
and that's not using PCI, right? So how do we detect
new hypervisor there?

-- 
MST

^ permalink raw reply

* Re: [PATCH net 1/6] ipv4: fix sending of redirects
From: Julian Anastasov @ 2012-10-08 20:43 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20121008.151633.753185005887436197.davem@davemloft.net>


	Hello,

On Mon, 8 Oct 2012, David Miller wrote:

> From: Julian Anastasov <ja@ssi.bg>
> Date: Sun,  7 Oct 2012 14:26:03 +0300
> 
> > @@ -322,7 +322,8 @@ int fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
> >  {
> >  	int r = secpath_exists(skb) ? 0 : IN_DEV_RPFILTER(idev);
> >  
> > -	if (!r && !fib_num_tclassid_users(dev_net(dev))) {
> > +	if (!r && !fib_num_tclassid_users(dev_net(dev)) &&
> > +	    dev->ifindex != oif) {
> >  		*itag = 0;
> >  		return 0;
> >  	}
> 
> Hmmm, won't this cause the slow path to be taken for locally
> destined traffic?

	In this case idev=eth0 and oif=lo. The only case
where we can see same input and output device is for
forwarding and loopback (but only output routes where
there is no such validation).

	Of course, it slows down on traffic that ignores
our redirects. We can save some cycles if IN_DEV_TX_REDIRECTS
is checked before setting RTCF_DOREDIRECT:

	(dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev)) {

	RTCF_DOREDIRECT is used just to call ip_rt_send_redirect
where all depends on IN_DEV_TX_REDIRECTS.

	The only difference can be that user space will
not see RTCF_DOREDIRECT flag if IN_DEV_TX_REDIRECTS is false.
But may be it is better to show "redirect" in ip route
only when IN_DEV_TX_REDIRECTS are enabled. The old way
is preferred only when there is routing cache and changes
in IN_DEV_TX_REDIRECTS do not flush cache in devinet_conf_proc().

> > +			  rt->rt_gateway ? : ip_hdr(skb)->daddr);
> 
> Please use the rt_nexthop() helper.
> 
> > +		__be32 gw = rt->rt_gateway ? : ip_hdr(skb)->daddr;

	Good idea.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [RFC] napi: limit GRO latency
From: Eric Dumazet @ 2012-10-08 19:46 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, rick.jones2, herbert, netdev, jesse
In-Reply-To: <20121008124026.4057aba8@nehalam.linuxnetplumber.net>

On Mon, 2012-10-08 at 12:40 -0700, Stephen Hemminger wrote:
> On Mon, 08 Oct 2012 21:30:12 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > System is under stress, you dont want to increase load at this point,
> > but increase bandwidth.
> > 
> > Really try this on a 40Gbe link and see how bad it is.
> 
> Not everybody has 40Gbe hardware just lying around available for testing :-)

;)

In fact, the current strategy to flush napi_gro when napi completes
might be not good for moderate traffic (thats 99% of the linux machines
I believe). We could have a deferred flush instead...

I dont know what strategy is used by hardware. But surely hardware uses
timers.

^ permalink raw reply

* Re: [PATCH] ipv6: gro: fix PV6_GRO_CB(skb)->proto problem
From: David Miller @ 2012-10-08 19:41 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, herbert
In-Reply-To: <1349725130.21172.3663.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 08 Oct 2012 21:38:50 +0200

> From: Eric Dumazet <edumazet@google.com>
> 
> It seems IPV6_GRO_CB(skb)->proto can be destroyed in skb_gro_receive()
> if a new skb is allocated (to serve as an anchor for frag_list)
> 
> We copy NAPI_GRO_CB() only (not the IPV6 specific part) in :
> 
> *NAPI_GRO_CB(nskb) = *NAPI_GRO_CB(p);
> 
> So we leave IPV6_GRO_CB(nskb)->proto to 0 (fresh skb allocation) instead
> of IPPROTO_TCP (6)
> 
> ipv6_gro_complete() isnt able to call ops->gro_complete()
> [ tcp6_gro_complete() ]
> 
> Fix this by moving proto in NAPI_GRO_CB() and getting rid of
> IPV6_GRO_CB
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>

Applied and queued up for -stable, thanks Eric.

^ permalink raw reply

* Re: [RFC] napi: limit GRO latency
From: Stephen Hemminger @ 2012-10-08 19:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, rick.jones2, herbert, netdev, jesse
In-Reply-To: <1349724612.21172.3643.camel@edumazet-glaptop>

On Mon, 08 Oct 2012 21:30:12 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> System is under stress, you dont want to increase load at this point,
> but increase bandwidth.
> 
> Really try this on a 40Gbe link and see how bad it is.

Not everybody has 40Gbe hardware just lying around available for testing :-)

^ permalink raw reply

* Re: [PATCH] net/ethernet/jme: disable ASPM
From: David Miller @ 2012-10-08 19:39 UTC (permalink / raw)
  To: mjg59; +Cc: kevin.baradon, cooldavid, netdev
In-Reply-To: <20121008193316.GA13684@srcf.ucam.org>

From: Matthew Garrett <mjg59@srcf.ucam.org>
Date: Mon, 8 Oct 2012 20:33:16 +0100

> On Mon, Oct 08, 2012 at 03:24:26PM -0400, David Miller wrote:
> 
>> This should be a PCI quirk shouldn't it?
> 
> No, it's a driver-level policy decision.

Then at a bare minimum this change should be using one of the ASPM
helpers provided by drivers/pci/pcie/aspm.c such as
pci_disable_link_state().

^ permalink raw reply

* [PATCH] ipv6: gro: fix PV6_GRO_CB(skb)->proto problem
From: Eric Dumazet @ 2012-10-08 19:38 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Herbert Xu

From: Eric Dumazet <edumazet@google.com>

It seems IPV6_GRO_CB(skb)->proto can be destroyed in skb_gro_receive()
if a new skb is allocated (to serve as an anchor for frag_list)

We copy NAPI_GRO_CB() only (not the IPV6 specific part) in :

*NAPI_GRO_CB(nskb) = *NAPI_GRO_CB(p);

So we leave IPV6_GRO_CB(nskb)->proto to 0 (fresh skb allocation) instead
of IPPROTO_TCP (6)

ipv6_gro_complete() isnt able to call ops->gro_complete()
[ tcp6_gro_complete() ]

Fix this by moving proto in NAPI_GRO_CB() and getting rid of
IPV6_GRO_CB

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
---
 include/linux/netdevice.h |    3 +++
 net/ipv6/af_inet6.c       |   11 ++---------
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0a36fff..4b9035c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1513,6 +1513,9 @@ struct napi_gro_cb {
 
 	/* jiffies when first packet was created/queued */
 	unsigned long age;
+
+	/* Used in ipv6_gro_receive() */
+	int	proto;
 };
 
 #define NAPI_GRO_CB(skb) ((struct napi_gro_cb *)(skb)->cb)
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index f757e3b..a974247 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -822,13 +822,6 @@ out:
 	return segs;
 }
 
-struct ipv6_gro_cb {
-	struct napi_gro_cb napi;
-	int proto;
-};
-
-#define IPV6_GRO_CB(skb) ((struct ipv6_gro_cb *)(skb)->cb)
-
 static struct sk_buff **ipv6_gro_receive(struct sk_buff **head,
 					 struct sk_buff *skb)
 {
@@ -874,7 +867,7 @@ static struct sk_buff **ipv6_gro_receive(struct sk_buff **head,
 		iph = ipv6_hdr(skb);
 	}
 
-	IPV6_GRO_CB(skb)->proto = proto;
+	NAPI_GRO_CB(skb)->proto = proto;
 
 	flush--;
 	nlen = skb_network_header_len(skb);
@@ -930,7 +923,7 @@ static int ipv6_gro_complete(struct sk_buff *skb)
 				 sizeof(*iph));
 
 	rcu_read_lock();
-	ops = rcu_dereference(inet6_protos[IPV6_GRO_CB(skb)->proto]);
+	ops = rcu_dereference(inet6_protos[NAPI_GRO_CB(skb)->proto]);
 	if (WARN_ON(!ops || !ops->gro_complete))
 		goto out_unlock;
 

^ permalink raw reply related

* Re: [PATCH] net/ethernet/jme: disable ASPM
From: Matthew Garrett @ 2012-10-08 19:33 UTC (permalink / raw)
  To: David Miller; +Cc: kevin.baradon, cooldavid, netdev
In-Reply-To: <20121008.152426.1524908667081347879.davem@davemloft.net>

On Mon, Oct 08, 2012 at 03:24:26PM -0400, David Miller wrote:

> This should be a PCI quirk shouldn't it?

No, it's a driver-level policy decision.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply

* Re: [RFC] napi: limit GRO latency
From: Eric Dumazet @ 2012-10-08 19:30 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, rick.jones2, herbert, netdev, jesse
In-Reply-To: <20121008121248.783e6dc4@nehalam.linuxnetplumber.net>

On Mon, 2012-10-08 at 12:12 -0700, Stephen Hemminger wrote:
> On Mon, 08 Oct 2012 15:10:35 -0400 (EDT)
> David Miller <davem@davemloft.net> wrote:
> 
> > From: Stephen Hemminger <shemminger@vyatta.com>
> > Date: Mon, 8 Oct 2012 11:58:35 -0700
> > 
> > > Limit the latency of pending GRO in NAPI processing to 2*HZ.
> > > When the system is under heavy network load, NAPI will go into
> > > poll mode via soft irq, and only stay in the loop for
> > > two jiffies. If this occurs, process the GRO pending list
> > > to make sure and not delay outstanding TCP frames for too long.
> > > 
> > > Rearrange the exit path to get rid of unnecessary goto logic.
> > > 
> > > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> > 
> > Did you see Eric's patch I just applied which limits it to 1ms?
> 
> Think this is a different problem.
> After leaving softirq, it may be a long time until
> ksoftirqd is run which is a different problem.

System is under stress, you dont want to increase load at this point,
but increase bandwidth.

Really try this on a 40Gbe link and see how bad it is.

On a server, we probably dedicate one or several cpus only to run NAPI,
so yes we hit net_rx_action() break, but we'll reenter it a few us
later.

^ permalink raw reply

* Re: [PATCH] net/ethernet/jme: disable ASPM
From: David Miller @ 2012-10-08 19:24 UTC (permalink / raw)
  To: kevin.baradon; +Cc: cooldavid, mjg, netdev
In-Reply-To: <1349722956-14159-1-git-send-email-kevin.baradon@gmail.com>

From: Kevin Baradon <kevin.baradon@gmail.com>
Date: Mon,  8 Oct 2012 21:02:36 +0200

> Based on patch from Matthew Garrett <mjg@redhat.com> (https://lkml.org/lkml/2011/11/11/168).
> 
> http://driveragent.com/archive/30421/7-0-14 indicates that ASPM is
> disabled on the 250 and 260. Duplicate for sanity.
> 
> Fixes random RX engine hangs I experienced with JMC250 on Clevo W270HU.
> 
> Signed-off-by: Kevin Baradon <kevin.baradon@gmail.com>

This should be a PCI quirk shouldn't it?

Also:

> +	pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1 |
> +		PCIE_LINK_STATE_CLKPM);

This is terrible formatting, you must line up the "PCIE_LINK_STATE_CLKPM" on
the second line with the very first column after the openning parenthesis
on the first line.

^ permalink raw reply

* Re: [PATCH v2] vlan: don't deliver frames for unknown vlans to protocols
From: David Miller @ 2012-10-08 19:22 UTC (permalink / raw)
  To: florz; +Cc: kaber, eric.dumazet, netdev, linux-kernel, jpirko
In-Reply-To: <20121008191944.GD25288@florz.florz.dyndns.org>

From: Florian Zumbiehl <florz@florz.de>
Date: Mon, 8 Oct 2012 21:19:44 +0200

> The only way to reach the new check without another_round and with a
> non-zero tag is the first return false, which happens if there is no device
> for the tag, in which case setting PACKET_OTHERHOST should be the right
> thing to do (in particular, a non-existent vlan device won't have the
> frame's MAC address). I am assuming that rx_handlers don't modify the
> frame unless they return RX_HANDLER_ANOTHER.

Indeed, you're right.

Patch applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [RFC] napi: limit GRO latency
From: Eric Dumazet @ 2012-10-08 19:21 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, Rick Jones, Herbert Xu, netdev, Jesse Gross
In-Reply-To: <20121008115835.3a3bfed6@nehalam.linuxnetplumber.net>

On Mon, 2012-10-08 at 11:58 -0700, Stephen Hemminger wrote:
> Limit the latency of pending GRO in NAPI processing to 2*HZ.
> When the system is under heavy network load, NAPI will go into
> poll mode via soft irq, and only stay in the loop for
> two jiffies. If this occurs, process the GRO pending list
> to make sure and not delay outstanding TCP frames for too long.
> 
> Rearrange the exit path to get rid of unnecessary goto logic.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> 
> --- a/net/core/dev.c	2012-10-08 09:21:27.466049785 -0700
> +++ b/net/core/dev.c	2012-10-08 11:56:41.714618745 -0700
> @@ -3937,8 +3937,16 @@ static void net_rx_action(struct softirq
>  		 * Allow this to run for 2 jiffies since which will allow
>  		 * an average latency of 1.5/HZ.
>  		 */
> -		if (unlikely(budget <= 0 || time_after(jiffies, time_limit)))
> -			goto softnet_break;
> +		if (unlikely(budget <= 0 || time_after(jiffies, time_limit))) {
> +			/* Cleanup all pending GRO */
> +
> +			list_for_each_entry(n, &sd->poll_list, poll_list)
> +				napi_gro_flush(n);
> +
> +			sd->time_squeeze++;
> +			__raise_softirq_irqoff(NET_RX_SOFTIRQ);
> +			break;
> +		}
>  

Hmm, I really dont think its a good idea.

I am working in making GRO storing more packets, so any kind of flushes
like that will completely defeat GRO intent.

^ permalink raw reply

* [PATCH] net/ethernet/jme: disable ASPM
From: Kevin Baradon @ 2012-10-08 19:02 UTC (permalink / raw)
  Cc: Kevin Baradon, Guo-Fu Tseng, Matthew Garrett, netdev

Based on patch from Matthew Garrett <mjg@redhat.com> (https://lkml.org/lkml/2011/11/11/168).

http://driveragent.com/archive/30421/7-0-14 indicates that ASPM is
disabled on the 250 and 260. Duplicate for sanity.

Fixes random RX engine hangs I experienced with JMC250 on Clevo W270HU.

Signed-off-by: Kevin Baradon <kevin.baradon@gmail.com>
Cc: Guo-Fu Tseng <cooldavid@cooldavid.org>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: netdev@vger.kernel.org
---
 drivers/net/ethernet/jme.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/jme.c b/drivers/net/ethernet/jme.c
index c911d88..373df5a 100644
--- a/drivers/net/ethernet/jme.c
+++ b/drivers/net/ethernet/jme.c
@@ -27,6 +27,7 @@
 #include <linux/module.h>
 #include <linux/kernel.h>
 #include <linux/pci.h>
+#include <linux/pci-aspm.h>
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
 #include <linux/ethtool.h>
@@ -2973,6 +2974,9 @@ jme_init_one(struct pci_dev *pdev,
 	/*
 	 * set up PCI device basics
 	 */
+	pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1 |
+		PCIE_LINK_STATE_CLKPM);
+
 	rc = pci_enable_device(pdev);
 	if (rc) {
 		pr_err("Cannot enable PCI device\n");
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH v2] vlan: don't deliver frames for unknown vlans to protocols
From: Florian Zumbiehl @ 2012-10-08 19:19 UTC (permalink / raw)
  To: David Miller; +Cc: kaber, eric.dumazet, netdev, linux-kernel, jpirko
In-Reply-To: <20121008.144230.1404596032615788891.davem@davemloft.net>

Hi,

> But I wonder if it breaks things, since you do the assignment so late
> we no longer handle the case where the VLAN device's MAC address
> matches the packet MAC address and the top-level device's does not.
> 
> That's handled by logic in vlan_do_receive() which checks for
> PACKET_OTHERHOST.
> 
> But you're going to unconditionally set PACKET_OTHERHOST, overriding
> any decision that code makes.

I don't think that that's actually the case. If vlan_do_receive() reaches
the MAC address check (that is, there is a vlan device for the tag), it
will either clear skb->vlan_tci and return true (which also causes goto
another_round), or return false with a NULL skb, which causes goto out.

The only way to reach the new check without another_round and with a
non-zero tag is the first return false, which happens if there is no device
for the tag, in which case setting PACKET_OTHERHOST should be the right
thing to do (in particular, a non-existent vlan device won't have the
frame's MAC address). I am assuming that rx_handlers don't modify the
frame unless they return RX_HANDLER_ANOTHER.

> This turns out to be a really non-trivial area and it's going to take
> some time to get this right and audit the change appropriately.

I wouldn't want to disagree with that ;-)

Florian

^ permalink raw reply

* Re: [PATCH net 3/6] ipv4: add check if nh_pcpu_rth_output is allocated
From: David Miller @ 2012-10-08 19:17 UTC (permalink / raw)
  To: ja; +Cc: netdev
In-Reply-To: <1349609168-9848-4-git-send-email-ja@ssi.bg>

From: Julian Anastasov <ja@ssi.bg>
Date: Sun,  7 Oct 2012 14:26:05 +0300

> 	Avoid NULL ptr dereference and caching if
> nh_pcpu_rth_output is not allocated.
> 
> Signed-off-by: Julian Anastasov <ja@ssi.bg>

Ugh.

Checking for alloc_percpu() failures in fib_create_info() is 1,000
times better than doing stuff like this in the fast path.

^ permalink raw reply

* Re: [PATCH net 1/6] ipv4: fix sending of redirects
From: David Miller @ 2012-10-08 19:16 UTC (permalink / raw)
  To: ja; +Cc: netdev
In-Reply-To: <1349609168-9848-2-git-send-email-ja@ssi.bg>

From: Julian Anastasov <ja@ssi.bg>
Date: Sun,  7 Oct 2012 14:26:03 +0300

> @@ -322,7 +322,8 @@ int fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
>  {
>  	int r = secpath_exists(skb) ? 0 : IN_DEV_RPFILTER(idev);
>  
> -	if (!r && !fib_num_tclassid_users(dev_net(dev))) {
> +	if (!r && !fib_num_tclassid_users(dev_net(dev)) &&
> +	    dev->ifindex != oif) {
>  		*itag = 0;
>  		return 0;
>  	}

Hmmm, won't this cause the slow path to be taken for locally
destined traffic?

> +			  rt->rt_gateway ? : ip_hdr(skb)->daddr);

Please use the rt_nexthop() helper.

> +		__be32 gw = rt->rt_gateway ? : ip_hdr(skb)->daddr;

Likewise.

^ permalink raw reply

* Re: [RFC] napi: limit GRO latency
From: Stephen Hemminger @ 2012-10-08 19:12 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, rick.jones2, herbert, netdev, jesse
In-Reply-To: <20121008.151035.1665672334154033755.davem@davemloft.net>

On Mon, 08 Oct 2012 15:10:35 -0400 (EDT)
David Miller <davem@davemloft.net> wrote:

> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Mon, 8 Oct 2012 11:58:35 -0700
> 
> > Limit the latency of pending GRO in NAPI processing to 2*HZ.
> > When the system is under heavy network load, NAPI will go into
> > poll mode via soft irq, and only stay in the loop for
> > two jiffies. If this occurs, process the GRO pending list
> > to make sure and not delay outstanding TCP frames for too long.
> > 
> > Rearrange the exit path to get rid of unnecessary goto logic.
> > 
> > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> Did you see Eric's patch I just applied which limits it to 1ms?

Think this is a different problem.
After leaving softirq, it may be a long time until
ksoftirqd is run which is a different problem.

^ permalink raw reply

* Re: [RFC] napi: limit GRO latency
From: David Miller @ 2012-10-08 19:10 UTC (permalink / raw)
  To: shemminger; +Cc: eric.dumazet, rick.jones2, herbert, netdev, jesse
In-Reply-To: <20121008115835.3a3bfed6@nehalam.linuxnetplumber.net>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Mon, 8 Oct 2012 11:58:35 -0700

> Limit the latency of pending GRO in NAPI processing to 2*HZ.
> When the system is under heavy network load, NAPI will go into
> poll mode via soft irq, and only stay in the loop for
> two jiffies. If this occurs, process the GRO pending list
> to make sure and not delay outstanding TCP frames for too long.
> 
> Rearrange the exit path to get rid of unnecessary goto logic.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Did you see Eric's patch I just applied which limits it to 1ms?

^ permalink raw reply

* [RFC] napi: limit GRO latency
From: Stephen Hemminger @ 2012-10-08 18:58 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: Rick Jones, Herbert Xu, netdev, Jesse Gross
In-Reply-To: <1349719006.21172.3537.camel@edumazet-glaptop>

Limit the latency of pending GRO in NAPI processing to 2*HZ.
When the system is under heavy network load, NAPI will go into
poll mode via soft irq, and only stay in the loop for
two jiffies. If this occurs, process the GRO pending list
to make sure and not delay outstanding TCP frames for too long.

Rearrange the exit path to get rid of unnecessary goto logic.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>


--- a/net/core/dev.c	2012-10-08 09:21:27.466049785 -0700
+++ b/net/core/dev.c	2012-10-08 11:56:41.714618745 -0700
@@ -3937,8 +3937,16 @@ static void net_rx_action(struct softirq
 		 * Allow this to run for 2 jiffies since which will allow
 		 * an average latency of 1.5/HZ.
 		 */
-		if (unlikely(budget <= 0 || time_after(jiffies, time_limit)))
-			goto softnet_break;
+		if (unlikely(budget <= 0 || time_after(jiffies, time_limit))) {
+			/* Cleanup all pending GRO */
+
+			list_for_each_entry(n, &sd->poll_list, poll_list)
+				napi_gro_flush(n);
+
+			sd->time_squeeze++;
+			__raise_softirq_irqoff(NET_RX_SOFTIRQ);
+			break;
+		}
 
 		local_irq_enable();
 
@@ -3987,7 +3995,6 @@ static void net_rx_action(struct softirq
 
 		netpoll_poll_unlock(have);
 	}
-out:
 	net_rps_action_and_irq_enable(sd);
 
 #ifdef CONFIG_NET_DMA
@@ -3997,13 +4004,6 @@ out:
 	 */
 	dma_issue_pending_all();
 #endif
-
-	return;
-
-softnet_break:
-	sd->time_squeeze++;
-	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
-	goto out;
 }
 
 static gifconf_func_t *gifconf_list[NPROTO];

^ permalink raw reply

* Re: [PATCH] net: gro: selective flush of packets
From: David Miller @ 2012-10-08 18:52 UTC (permalink / raw)
  To: eric.dumazet; +Cc: herbert, netdev, jesse, therbert, ycheng
In-Reply-To: <1349546929.21172.1598.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 06 Oct 2012 20:08:49 +0200

> From: Eric Dumazet <edumazet@google.com>
> 
> Current GRO can hold packets in gro_list for almost unlimited
> time, in case napi->poll() handler consumes its budget over and over.
> 
> In this case, napi_complete()/napi_gro_flush() are not called.
> 
> Another problem is that gro_list is flushed in non friendly way :
> We scan the list and complete packets in the reverse order.
> (youngest packets first, oldest packets last)
> This defeats priorities that sender could have cooked.
> 
> Since GRO currently only store TCP packets, we dont really notice the
> bug because of retransmits, but this behavior can add unexpected
> latencies, particularly on mice flows clamped by elephant flows.
> 
> This patch makes sure no packet can stay more than 1 ms in queue, and
> only in stress situations.
> 
> It also complete packets in the right order to minimize latencies.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

^ permalink raw reply

* Re: [PATCH v2] skge: Add DMA mask quirk for Marvell 88E8001 on ASUS P5NSLI motherboard
From: David Miller @ 2012-10-08 18:49 UTC (permalink / raw)
  To: jan.ceuleers; +Cc: netdev, graham.gower, shemminger
In-Reply-To: <1349721290-16374-1-git-send-email-jan.ceuleers@computer.org>

From: Jan Ceuleers <jan.ceuleers@computer.org>
Date: Mon,  8 Oct 2012 20:34:50 +0200

> From: Graham Gower <graham.gower@gmail.com>
> 
> Marvell 88E8001 on an ASUS P5NSLI motherboard is unable to send/receive
> packets on a system with >4gb ram unless a 32bit DMA mask is used.
> 
> This issue has been around for years and a fix was sent 3.5 years ago, but
> there was some debate as to whether it should instead be fixed as a PCI quirk.
> http://www.spinics.net/lists/netdev/msg88670.html
> 
> However, 18 months later a similar workaround was introduced for another
> chipset exhibiting the same problem.
> http://www.spinics.net/lists/netdev/msg142287.html
> 
> Signed-off-by: Graham Gower <graham.gower@gmail.com>
> Signed-off-by: Jan Ceuleers <jan.ceuleers@computer.org>
> Acked-by: Stephen Hemminger <shemminger@vyatta.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH] cxgb4: Address various sparse warnings
From: David Miller @ 2012-10-08 18:48 UTC (permalink / raw)
  To: vipul; +Cc: netdev, divy, dm, leedom, felix, jay
In-Reply-To: <1349701183-25093-1-git-send-email-vipul@chelsio.com>

From: Vipul Pandya <vipul@chelsio.com>
Date: Mon,  8 Oct 2012 18:29:43 +0530

> This patch fixes type assignment issues, function definition and symbol
> shadowing which triggered sparse warnings.
> 
> Signed-off-by: Jay Hernandez <jay@chelsio.com>
> Signed-off-by: Vipul Pandya <vipul@chelsio.com>

Applied.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox