Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
From: David Miller @ 2010-04-22  7:16 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1271920233.7895.4723.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 22 Apr 2010 09:10:33 +0200

> Le mercredi 21 avril 2010 à 22:56 -0700, David Miller a écrit :
> 
>> Right, I've applied this, thanks.
>> 
>> What we should probably do instead is call and NULL out the
>> DEV_GSO_CB() destructor.  Right?
> 
> Yes, probably, I'll take a look at this if you want.

It might look something like this:

diff --git a/net/core/dev.c b/net/core/dev.c
index 9bf1ccc..13241da 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1892,6 +1892,20 @@ static inline void skb_orphan_try(struct sk_buff *skb)
 		skb_orphan(skb);
 }
 
+/*
+ * GSO packets need to be handled specially because such packets
+ * hold the normal SKB destructor in a backup pointer.
+ */
+static inline void skb_orphan_try_gso(struct sk_buff *skb)
+{
+	if (!skb_tx(skb)->flags) {
+		if (DEV_GSO_CB(skb)->destructor)
+			DEV_GSO_CB(skb)->destructor(skb);
+		DEV_GSO_CB(skb)->destructor = NULL;
+		skb->sk = NULL;
+	}
+}
+
 int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 			struct netdev_queue *txq)
 {
@@ -1937,6 +1951,7 @@ gso:
 		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
 			skb_dst_drop(nskb);
 
+		skb_orphan_try_gso(skb);
 		rc = ops->ndo_start_xmit(nskb, dev);
 		if (unlikely(rc != NETDEV_TX_OK)) {
 			if (rc & ~NETDEV_TX_MASK)

^ permalink raw reply related

* Re: [PATCH v3] net: batch skb dequeueing from softnet input_pkt_queue
From: Eric Dumazet @ 2010-04-22  7:13 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netdev, Tom Herbert, jamal
In-Reply-To: <j2s412e6f7f1004212333se60a9083s59185ee3466313f@mail.gmail.com>

Le jeudi 22 avril 2010 à 14:33 +0800, Changli Gao a écrit :
> On Thu, Apr 22, 2010 at 7:05 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Le mercredi 14 avril 2010 à 17:52 +0800, Changli Gao a écrit :
> >> batch skb dequeueing from softnet input_pkt_queue
> >>
> >> batch skb dequeueing from softnet input_pkt_queue to reduce potential lock
> >> contention and irq disabling/enabling.
> >>
> >> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> >> ----
> >
> > lock contention _is_ a problem, Jamal tests can show it.
> >
> > irq disabling/enabling is not, and force to use stop_machine() killer.
> >
> 
> Although irq_disabling/enabling is not, we should do our best to make
> fast path as quickly as possible, and because stop_machine() is used
> in slow patch, I think we can afford its weight.
> 
> 

No thanks, this is out of the question.

Talk to ixiacom guys, some people settle/dismantle dozens of network
device per second, on production machines.





^ permalink raw reply

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
From: Eric Dumazet @ 2010-04-22  7:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100421.225625.177238009.davem@davemloft.net>

Le mercredi 21 avril 2010 à 22:56 -0700, David Miller a écrit :

> Right, I've applied this, thanks.
> 
> What we should probably do instead is call and NULL out the
> DEV_GSO_CB() destructor.  Right?

Yes, probably, I'll take a look at this if you want.

Thanks



^ permalink raw reply

* Re: [net-next,1/2] add iovnl netlink support
From: David Miller @ 2010-04-22  7:09 UTC (permalink / raw)
  To: arnd; +Cc: scofeldm, chrisw, netdev
In-Reply-To: <201004212313.05060.arnd@arndb.de>

From: Arnd Bergmann <arnd@arndb.de>
Date: Wed, 21 Apr 2010 23:13:04 +0200

> My preference would probably be make these a subcategory of the
> if_link, and use the existing RTM_NEWLINK/RTM_DELLINK commands.

I was going to suggest this as well.

^ permalink raw reply

* Re: [net-next PATCH 1/2] add iovnl netlink support
From: David Miller @ 2010-04-22  6:52 UTC (permalink / raw)
  To: scofeldm; +Cc: netdev, chrisw
In-Reply-To: <20100419191807.10423.84600.stgit@savbu-pc100.cisco.com>

From: Scott Feldman <scofeldm@cisco.com>
Date: Mon, 19 Apr 2010 12:18:07 -0700

> +	if (tb[IOV_ATTR_VF_IFNAME])
> +		vf_dev = dev_get_by_name(&init_net,
> +			nla_data(tb[IOV_ATTR_VF_IFNAME]));

It's probably best to check this for NULL and notify
the user with an error in that case (don't forget to
put 'dev' in that error path :-)

As things stand it looks like if we can't find vf_dev, we'll just send
NULL down to the vf_dev arg of the various operations and possibly
silently succeed.

That's not desirable, semantically.

^ permalink raw reply

* Re: [net-next,1/2] add iovnl netlink support
From: Arnd Bergmann @ 2010-04-22  6:51 UTC (permalink / raw)
  To: Chris Wright; +Cc: Scott Feldman, davem, netdev
In-Reply-To: <20100421224802.GF28829@x200.localdomain>

On Thursday 22 April 2010, Chris Wright wrote:
> > 
> > ip link add link eth0 type macvlan    # for a container
> > ip link add link eth0 type macvtap    # for qemu/vhost
> > ip link add link eth0 type vf         # for device assignment
> 
> BTW, what do you mean by device assignment?

I mean giving an SR-IOV VF to the guest as a native PCI device
rather than having qemu or vhost present a virtio-net to the
guest.

> > There are obviously significant differences between these three, but
> > they also share enough of their properties to let us treat them
> > in similar ways.
> > 
> > If we integrate the iovnl client into iproute2, the sequence for setting
> > up an enic VF and associating it to the port profile could be
> > 
> > # create vf0, pass mac and vlan id to HW, no association yet
> > ip link add link eth0 name vf0 type vf mac fe:dc:ba:12:34:56 vlan 78
> 
> Just to clarify...right now, the normal SR-IOV VF is already there.
> And, or course, can have its mac addr/vlan set already.

I don't have an SR-IOV card available for testing yet. How is this
configured now?

> > # associate vf with port profile, mac address must match the one assigned
> > #  to the interface before.
> > ip iov assoc eth0 port-profile "general" host-uuid "dcf2a873-f5ee-41dd-a7ad-802a544e48c2" \
> >        mac fe:dc:ba:12:34:56
> 
> At that point you could just do s/mac fe:.*/link vf0/

My point was that this information should be irrelevant to the code doing the
association with the switch. It sort of makes sense when the receiver is enic,
but when we send the same data to lldpad, it doesn't care about the slave device
name but only about the mac address. Especially since the slave device might not
be in the root name space any more, meaning we have no way to find it.

	Arnd

^ permalink raw reply

* Re: [net-next PATCH 1/2] add iovnl netlink support
From: David Miller @ 2010-04-22  6:48 UTC (permalink / raw)
  To: scofeldm; +Cc: netdev, chrisw
In-Reply-To: <20100419191807.10423.84600.stgit@savbu-pc100.cisco.com>

From: Scott Feldman <scofeldm@cisco.com>
Date: Mon, 19 Apr 2010 12:18:07 -0700

> +#define IOVNL_PROTO_VERSION 1
> +

Please delete this in the final version, the macro isn't even used by
the code.

We don't do protocol versioning in netlink.  Instead we get the base
stuff solid from the beginning, and then if something needs fixing up
we handle this using new attributes in a way which is both backward
and forward compatible.

Thanks.

^ permalink raw reply

* Re: [PATCH] Socket filter access to hatype
From: David Miller @ 2010-04-22  6:42 UTC (permalink / raw)
  To: leonerd; +Cc: netdev
In-Reply-To: <20100421172546.GO19334@cel.leo>

From: Paul LeoNerd Evans <leonerd@leonerd.org.uk>
Date: Wed, 21 Apr 2010 18:25:46 +0100

> When capturing packets on a PF_PACKET/SOCK_RAW socket bound to all
> interfaces, there doesn't appear to be a way for the filter program to
> actually find out the underlying hardware type the packet was captured
> on, such as is reported by the sll_hatype field of the struct sockaddr_ll
> when the packet is sent up to userland.
> 
> Unless I've managed to miss a trick somewhere, this would seem to put a
> fairly fundamental blocker on actually being able to filter in such
> packets. Granted there's the SKF_OFF_NET area to inspect at the e.g. IPv4
> level, but this makes it impossible to do anything on e.g. the Ethernet
> level.
> 
> See below for a patch to add an SKF_AD_HATYPE field, up among the other
> special access fields around SKF_AD_OFF.

This looks fine but you need to submit your patch properly,
including proper "Signed-off-by: " tags etc.  see
Documentation/SubmittingPatches for details.

Please make a complete fresh new submission, and don't try to shortcut
this by just replying and adding the Signed-off-by: or anything like
that.

Thanks.

^ permalink raw reply

* Re: [PATCH v3] net: batch skb dequeueing from softnet input_pkt_queue
From: Changli Gao @ 2010-04-22  6:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev, Tom Herbert, jamal
In-Reply-To: <1271891149.7895.3751.camel@edumazet-laptop>

On Thu, Apr 22, 2010 at 7:05 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mercredi 14 avril 2010 à 17:52 +0800, Changli Gao a écrit :
>> batch skb dequeueing from softnet input_pkt_queue
>>
>> batch skb dequeueing from softnet input_pkt_queue to reduce potential lock
>> contention and irq disabling/enabling.
>>
>> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
>> ----
>
> lock contention _is_ a problem, Jamal tests can show it.
>
> irq disabling/enabling is not, and force to use stop_machine() killer.
>

Although irq_disabling/enabling is not, we should do our best to make
fast path as quickly as possible, and because stop_machine() is used
in slow patch, I think we can afford its weight.


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH v4] net: batch skb dequeueing from softnet input_pkt_queue
From: Changli Gao @ 2010-04-22  6:30 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David S. Miller, jamal, Tom Herbert, Eric Dumazet, netdev
In-Reply-To: <20100421231843.4c284991@nehalam>

On Thu, Apr 22, 2010 at 2:18 PM, Stephen Hemminger
<shemminger@vyatta.com> wrote:
> On Thu, 22 Apr 2010 13:51:53 +0800
> Changli Gao <xiaosuo@gmail.com> wrote:
>
>> +     struct sk_buff          *input_pkt_queue_head;
>> +     struct sk_buff          **input_pkt_queue_tailp;
>> +     unsigned int            input_pkt_queue_len;
>> +     unsigned int            process_queue_len;
>
> Why is opencoding a skb queue a step forward?
> Just keep using sk_queue routines, just not the locked variants.
>

I want to keep the critical section of rps_lock() as small as possible
to reduce the potential lock contention, when RPS is used.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH v4] net: batch skb dequeueing from softnet input_pkt_queue
From: Stephen Hemminger @ 2010-04-22  6:18 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, jamal, Tom Herbert, Eric Dumazet, netdev
In-Reply-To: <1271915513-2966-1-git-send-email-xiaosuo@gmail.com>

On Thu, 22 Apr 2010 13:51:53 +0800
Changli Gao <xiaosuo@gmail.com> wrote:

> +	struct sk_buff		*input_pkt_queue_head;
> +	struct sk_buff		**input_pkt_queue_tailp;
> +	unsigned int		input_pkt_queue_len;
> +	unsigned int		process_queue_len;

Why is opencoding a skb queue a step forward?
Just keep using sk_queue routines, just not the locked variants.

-- 

^ permalink raw reply

* Re: [PATCH v2] rps: optimize rps_get_cpu()
From: Changli Gao @ 2010-04-22  6:10 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, eric.dumazet, netdev
In-Reply-To: <20100421.224014.77255834.davem@davemloft.net>

On Thu, Apr 22, 2010 at 1:40 PM, David Miller <davem@davemloft.net> wrote:
>
> I'll buy you a cookie if you can find a multiply generated by the
> compiler for "x * 4".  It's going to use shifts and those are
> basically free.

On amd64:

                if (pskb_may_pull(skb, (ihl * 4) + 4)) {
    2794:       8d 34 9d 04 00 00 00    lea    0x4(,%rbx,4),%esi
    279b:       4c 89 ef                mov    %r13,%rdi
    279e:       e8 a5 fd ff ff          callq  2548 <pskb_may_pull>
    27a3:       85 c0                   test   %eax,%eax
    27a5:       74 28                   je     27cf <get_rps_cpu+0x169>
                        __be16 *hports = (__be16 *) (skb->data + (ihl * 4));
    27a7:       8d 04 9d 00 00 00 00    lea    0x0(,%rbx,4),%eax

the compiler uses lea instead of multiply, and it should be more
efficient, but i'm not sure. Is there a equivalent of lea on the other
architectures?

>
> Please just change one thing at a time.  It would have helped you
> here.  I was willing to apply the port dereference part of your
> change, but not necessarily the 'ihl' changes.  But because you've
> combined them, I have no choice but to reject everything.
>

Ok. I think the first version is ready to apply, it only has the port
dereference part.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* setsockopt with cmsghdr needs COMPAT support?
From: Andrew May @ 2010-04-22  4:45 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal

[-- Attachment #1: Type: text/plain, Size: 895 bytes --]

I have a userspace app that is doing an IPv6 IPV6_2292PKTOPTIONS
setsockopt to add an Extension header in a mixed 64 bit/32 bit setup.

It is failing with an EINVAL because it seems the cmsghdr doesn't get
the proper fixup.
This isn't stuff I really look at much but I came up with this hack to
at least get past the error. All my userspace is 32 bits so I just
put in the "#if 1" rather than attempting a runtime check on the socket.

I am not sure if the userspace app is doing something wrong, but it
seems like this is a real problem. The "on the stack" assumption by
the fixup helper seems like it really should be reworked, but I have no
idea how it should be done. And I didn't bother to handle the the
getsockopt function.
Doing a grep I didn't find any other offenders, but I can't say for
sure.

Does anyone have any ideas on the "right" way to fix this, or point out
my flaw?

Thanks.

[-- Attachment #2: compat.patch --]
[-- Type: text/x-patch, Size: 1821 bytes --]

diff --git a/net/compat.c b/net/compat.c
index ec24d9e..1a6fcb1 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -213,6 +213,7 @@ Efault:
 		sock_kfree_s(sk, kcmsg_base, kcmlen);
 	return err;
 }
+EXPORT_SYMBOL(cmsghdr_from_user_compat_to_kern);
 
 int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *data)
 {
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 33f60fc..3907ce4 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -422,6 +422,11 @@ sticky_done:
 		struct msghdr msg;
 		struct flowi fl;
 		int junk;
+#if 1
+		int compat_alloc = optlen;
+#else
+		int compat_alloc = 0;
+#endif
 
 		fl.fl6_flowlabel = 0;
 		fl.oif = sk->sk_bound_dev_if;
@@ -436,13 +441,29 @@ sticky_done:
 		retv = -EINVAL;
 		if (optlen > 64*1024)
 			break;
-
-		opt = sock_kmalloc(sk, sizeof(*opt) + optlen, GFP_KERNEL);
+		opt = sock_kmalloc(sk, sizeof(*opt) + optlen + compat_alloc,
+				   GFP_KERNEL);
 		retv = -ENOBUFS;
 		if (opt == NULL)
 			break;
 
 		memset(opt, 0, sizeof(*opt));
+#if 1
+		msg.msg_controllen = optlen;
+		msg.msg_control = optval;
+		retv = cmsghdr_from_user_compat_to_kern(&msg, sk, (void*)(opt+1),
+							optlen + compat_alloc );
+
+		printk( KERN_ERR "Cmsghdr conver ret %d len from %d to %d\n",
+			retv, (int)optlen, (int)msg.msg_controllen );
+		if (retv)
+			goto done;
+		if ( msg.msg_control != (opt+1) ){
+			printk( KERN_ERR "cmsg realloc issue???" );
+			/*Screwed*/
+		}
+		opt->tot_len = sizeof(*opt) + msg.msg_controllen;
+#else
 		opt->tot_len = sizeof(*opt) + optlen;
 		retv = -EFAULT;
 		if (copy_from_user(opt+1, optval, optlen))
@@ -450,6 +471,7 @@ sticky_done:
 
 		msg.msg_controllen = optlen;
 		msg.msg_control = (void*)(opt+1);
+#endif
 
 		retv = datagram_send_ctl(net, &msg, &fl, opt, &junk, &junk);
 		if (retv)

^ permalink raw reply related

* Re: cxgb4: Make unnecessarily global functions static
From: David Miller @ 2010-04-22  6:01 UTC (permalink / raw)
  To: rdreier; +Cc: dm, netdev
In-Reply-To: <adapr1spr7e.fsf@roland-alpha.cisco.com>

From: Roland Dreier <rdreier@cisco.com>
Date: Wed, 21 Apr 2010 11:59:17 -0700

> Also put t4_write_indirect() inside "#if 0" to avoid a "defined but not
> used" compile warning.
> 
> Signed-off-by: Roland Dreier <rolandd@cisco.com>

Also applied to net-next-2.6, thanks Roland.

^ permalink raw reply

* Re: cxgb4: Use ntohs() on __be16 value instead of htons()
From: David Miller @ 2010-04-22  6:00 UTC (permalink / raw)
  To: dm; +Cc: rdreier, netdev
In-Reply-To: <4BCF4F57.4050802@chelsio.com>

From: Dimitris Michailidis <dm@chelsio.com>
Date: Wed, 21 Apr 2010 12:17:43 -0700

> On 04/21/2010 11:09 AM, Roland Dreier wrote:
>> Use the correct direction of byte-swapping function to fix a mistake
>> shown by sparse endianness checking -- c.fl0id is __be16.
>>
>> Signed-off-by: Roland Dreier<rolandd@cisco.com>
> 
> Yes, thanks.
> 
> Acked-by: Dimitris Michailidis <dm@chelsio.com>

Applied to net-next-2.6

^ permalink raw reply

* Re: [PATCH] net: ipv6 bind to device issue
From: David Miller @ 2010-04-22  5:58 UTC (permalink / raw)
  To: brian.haley
  Cc: jolsa, kuznet, pekkas, jmorris, yoshfuji, kaber, eric.dumazet,
	netdev
In-Reply-To: <20100421.225015.137831360.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Wed, 21 Apr 2010 22:50:15 -0700 (PDT)

> Jiri please respin your patch with the argument order
> reversed so that we can make the inexpensive check before
> the expensive one.

Nevermind, I see you posted an updated version already,
which I've applied, thanks!

^ permalink raw reply

* Re: [PATCH] ethernet: print protocol in host byte order
From: David Miller @ 2010-04-22  5:57 UTC (permalink / raw)
  To: johannes; +Cc: netdev, eric.dumazet
In-Reply-To: <1271833567.3627.12.camel@jlt3.sipsolutions.net>

From: Johannes Berg <johannes@sipsolutions.net>
Date: Wed, 21 Apr 2010 09:06:07 +0200

> Eric's recent patch added __force, but this
> place would seem to require actually doing
> a byte order conversion so the printk is
> consistent across architectures.
> 
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Johannes Berg <johannes@sipsolutions.net>

Applied to net-next-2.6, thanks a lot Johannes.

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
From: David Miller @ 2010-04-22  5:56 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1271830116.7895.1316.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 21 Apr 2010 08:08:36 +0200

> Le dimanche 18 avril 2010 à 02:46 -0700, David Miller a écrit :
> 
>> Looks good, applied, thanks Eric.
> 
> Hmm, looking at the GSO stuff, I believe we should not call
> skb_orphan_try() on gso skbs ?

Right, I've applied this, thanks.

What we should probably do instead is call and NULL out the
DEV_GSO_CB() destructor.  Right?

^ permalink raw reply

* [PATCH v4] net: batch skb dequeueing from softnet input_pkt_queue
From: Changli Gao @ 2010-04-22  5:51 UTC (permalink / raw)
  To: David S. Miller; +Cc: jamal, Tom Herbert, Eric Dumazet, netdev, Changli Gao

batch skb dequeueing from softnet input_pkt_queue

batch skb dequeueing from softnet input_pkt_queue to reduce potential lock
contention when RPS is enabled. input_pkt_queue is reimplemented as a single
linked list(FIFO), and input_pkt_queue_lock is moved into RPS section, so
softnet becomes smaller when RPS is disabled than before.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/linux/netdevice.h |   14 +++++--
 net/core/dev.c            |   92 +++++++++++++++++++++++++++++++++++-----------
 2 files changed, 82 insertions(+), 24 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3c5ed5f..5ccb92b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1387,6 +1387,7 @@ struct softnet_data {
 	struct Qdisc		*output_queue;
 	struct list_head	poll_list;
 	struct sk_buff		*completion_queue;
+	struct sk_buff		*process_queue;
 
 #ifdef CONFIG_RPS
 	struct softnet_data	*rps_ipi_list;
@@ -1396,15 +1397,22 @@ struct softnet_data {
 	struct softnet_data	*rps_ipi_next;
 	unsigned int		cpu;
 	unsigned int		input_queue_head;
+	spinlock_t		input_pkt_queue_lock;
+	/* 4 bytes hole on 64bits machine */
 #endif
-	struct sk_buff_head	input_pkt_queue;
+	struct sk_buff		*input_pkt_queue_head;
+	struct sk_buff		**input_pkt_queue_tailp;
+	unsigned int		input_pkt_queue_len;
+	unsigned int		process_queue_len;
+
 	struct napi_struct	backlog;
 };
 
-static inline void input_queue_head_incr(struct softnet_data *sd)
+static inline void input_queue_head_add(struct softnet_data *sd,
+					unsigned int len)
 {
 #ifdef CONFIG_RPS
-	sd->input_queue_head++;
+	sd->input_queue_head += len;
 #endif
 }
 
diff --git a/net/core/dev.c b/net/core/dev.c
index e904c47..81c7877 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -211,14 +211,14 @@ static inline struct hlist_head *dev_index_hash(struct net *net, int ifindex)
 static inline void rps_lock(struct softnet_data *sd)
 {
 #ifdef CONFIG_RPS
-	spin_lock(&sd->input_pkt_queue.lock);
+	spin_lock(&sd->input_pkt_queue_lock);
 #endif
 }
 
 static inline void rps_unlock(struct softnet_data *sd)
 {
 #ifdef CONFIG_RPS
-	spin_unlock(&sd->input_pkt_queue.lock);
+	spin_unlock(&sd->input_pkt_queue_lock);
 #endif
 }
 
@@ -2402,6 +2402,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 {
 	struct softnet_data *sd;
 	unsigned long flags;
+	unsigned int qlen;
 
 	sd = &per_cpu(softnet_data, cpu);
 
@@ -2409,12 +2410,16 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 	__get_cpu_var(netdev_rx_stat).total++;
 
 	rps_lock(sd);
-	if (sd->input_pkt_queue.qlen <= netdev_max_backlog) {
-		if (sd->input_pkt_queue.qlen) {
+	qlen = sd->input_pkt_queue_len + sd->process_queue_len;
+	if (qlen <= netdev_max_backlog) {
+		if (qlen) {
 enqueue:
-			__skb_queue_tail(&sd->input_pkt_queue, skb);
+			skb->next = NULL;
+			*sd->input_pkt_queue_tailp = skb;
+			sd->input_pkt_queue_tailp = &skb->next;
+			sd->input_pkt_queue_len++;
 #ifdef CONFIG_RPS
-			*qtail = sd->input_queue_head + sd->input_pkt_queue.qlen;
+			*qtail = sd->input_queue_head + sd->input_pkt_queue_len;
 #endif
 			rps_unlock(sd);
 			local_irq_restore(flags);
@@ -2931,16 +2936,33 @@ static void flush_backlog(void *arg)
 {
 	struct net_device *dev = arg;
 	struct softnet_data *sd = &__get_cpu_var(softnet_data);
-	struct sk_buff *skb, *tmp;
+	struct sk_buff **pskb, *skb;
 
 	rps_lock(sd);
-	skb_queue_walk_safe(&sd->input_pkt_queue, skb, tmp)
+	for (pskb = &sd->input_pkt_queue_head; *pskb; ) {
+		skb = *pskb;
 		if (skb->dev == dev) {
-			__skb_unlink(skb, &sd->input_pkt_queue);
+			*pskb = skb->next;
 			kfree_skb(skb);
-			input_queue_head_incr(sd);
+			input_queue_head_add(sd, 1);
+			sd->input_pkt_queue_len--;
+		} else {
+			pskb = &skb->next;
 		}
+	}
+	sd->input_pkt_queue_tailp = pskb;
 	rps_unlock(sd);
+
+	for (pskb = &sd->process_queue; *pskb; ) {
+		skb = *pskb;
+		if (skb->dev == dev) {
+			*pskb = skb->next;
+			kfree_skb(skb);
+			sd->process_queue_len--;
+		} else {
+			pskb = &skb->next;
+		}
+	}
 }
 
 static int napi_gro_complete(struct sk_buff *skb)
@@ -3249,25 +3271,39 @@ static int process_backlog(struct napi_struct *napi, int quota)
 	struct softnet_data *sd = &__get_cpu_var(softnet_data);
 
 	napi->weight = weight_p;
+	local_irq_disable();
 	do {
 		struct sk_buff *skb;
 
-		local_irq_disable();
+		while (sd->process_queue) {
+			skb = sd->process_queue;
+			sd->process_queue = skb->next;
+			sd->process_queue_len--;
+			local_irq_enable();
+			__netif_receive_skb(skb);
+			if (++work >= quota)
+				goto out;
+			local_irq_disable();
+		}
+
 		rps_lock(sd);
-		skb = __skb_dequeue(&sd->input_pkt_queue);
-		if (!skb) {
+		if (sd->input_pkt_queue_head == NULL) {
 			__napi_complete(napi);
 			rps_unlock(sd);
 			local_irq_enable();
 			break;
 		}
-		input_queue_head_incr(sd);
-		rps_unlock(sd);
-		local_irq_enable();
 
-		__netif_receive_skb(skb);
-	} while (++work < quota);
+		sd->process_queue = sd->input_pkt_queue_head;
+		sd->process_queue_len = sd->input_pkt_queue_len;
+		input_queue_head_add(sd, sd->input_pkt_queue_len);
+		sd->input_pkt_queue_head = NULL;
+		sd->input_pkt_queue_tailp = &sd->input_pkt_queue_head;
+		sd->input_pkt_queue_len = 0;
+		rps_unlock(sd);
+	} while (1);
 
+out:
 	return work;
 }
 
@@ -5621,10 +5657,19 @@ static int dev_cpu_callback(struct notifier_block *nfb,
 	local_irq_enable();
 
 	/* Process offline CPU's input_pkt_queue */
-	while ((skb = __skb_dequeue(&oldsd->input_pkt_queue))) {
+	while ((skb = oldsd->input_pkt_queue_head)) {
+		oldsd->input_pkt_queue_head = skb->next;
+		netif_rx(skb);
+	}
+	oldsd->input_pkt_queue_tailp = &oldsd->input_pkt_queue_head;
+	input_queue_head_add(oldsd, oldsd->input_pkt_queue_len);
+	oldsd->input_pkt_queue_len = 0;
+
+	while ((skb = oldsd->process_queue)) {
+		oldsd->process_queue = skb->next;
 		netif_rx(skb);
-		input_queue_head_incr(oldsd);
 	}
+	oldsd->process_queue_len = 0;
 
 	return NOTIFY_OK;
 }
@@ -5842,11 +5887,16 @@ static int __init net_dev_init(void)
 	for_each_possible_cpu(i) {
 		struct softnet_data *sd = &per_cpu(softnet_data, i);
 
-		skb_queue_head_init(&sd->input_pkt_queue);
+		sd->input_pkt_queue_head = NULL;
+		sd->input_pkt_queue_tailp = &sd->input_pkt_queue_head;
+		sd->input_pkt_queue_len = 0;
+		sd->process_queue = NULL;
+		sd->process_queue_len = 0;
 		sd->completion_queue = NULL;
 		INIT_LIST_HEAD(&sd->poll_list);
 
 #ifdef CONFIG_RPS
+		spin_lock_init(&sd->input_pkt_queue_lock);
 		sd->csd.func = rps_trigger_softirq;
 		sd->csd.info = sd;
 		sd->csd.flags = 0;

^ permalink raw reply related

* Re: [PATCH] net: ipv6 bind to device issue
From: David Miller @ 2010-04-22  5:50 UTC (permalink / raw)
  To: brian.haley
  Cc: jolsa, kuznet, pekkas, jmorris, yoshfuji, kaber, eric.dumazet,
	netdev
In-Reply-To: <4BCDEED3.7040901@hp.com>

From: Brian Haley <brian.haley@hp.com>
Date: Tue, 20 Apr 2010 14:13:39 -0400

> Actually, looking at this again, we might want to swap the order
> here since fl->oif should be filled-in for most link-local and
> multicast requests calling this:
> 
> 	if (fl->oif || rt6_need_strict(&fl->fl6_dst))
> 
> Just a thought, but it potentially saves a call to determine
> the scope of the address.

Yes I think we should make this change.

Jiri please respin your patch with the argument order
reversed so that we can make the inexpensive check before
the expensive one.

Thanks.

^ permalink raw reply

* Re: [PATCH BUG-FIX] ipv6: allow to send packet after receiving ICMPv6 Too Big message with MTU field less than IPV6_MIN_MTU
From: David Miller @ 2010-04-22  5:48 UTC (permalink / raw)
  To: herbert
  Cc: shanwei, yoshfuji, yjwei, vladislav.yasevich, kuznet, pekkas,
	jmorris, kaber, eric.dumazet, sri, netdev, linux-sctp
In-Reply-To: <20100419035535.GA7011@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon, 19 Apr 2010 11:55:35 +0800

> On Mon, Apr 19, 2010 at 10:58:22AM +0800, Shan Wei wrote:
>> 
>> According to RFC2460, PMTU is set to the IPv6 Minimum Link
>> MTU (1280) and a fragment header should always be included
>> after a node receiving Too Big message reporting PMTU is
>> less than the IPv6 Minimum Link MTU.
>> 
>> After receiving a ICMPv6 Too Big message reporting PMTU is
>> less than the IPv6 Minimum Link MTU, sctp *can't* send any
>> data/control chunk that total length including IPv6 head 
>> and IPv6 extend head is less than IPV6_MIN_MTU(1280 bytes).
>> 
>> The failure occured in p6_fragment(), about reason 
>> see following(take SHUTDOWN chunk for example):
>> sctp_packet_transmit (SHUTDOWN chunk, len=16 byte)
>> |------sctp_v6_xmit (local_df=0)
>>    |------ip6_xmit
>>        |------ip6_output (dst_allfrag is ture)
>>            |------ip6_fragment
>> 
>> In ip6_fragment(), for local_df=0, drops the the packet
>> and returns EMSGSIZE.
>> 
>> The patch fixes it with adding check length of skb->len.
>> In this case, Ipv6 not to fragment upper protocol data,
>> just only add a fragment header before it. 
>> 
>> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
> 
> The patch looks good to me.
> 
> If we wanted to optimise the allfrags case it may be better
> to reserve the space beforehand and generate the fragment header
> at the same time as we're doing the IPv6 header.
> 
> But it can't be all that important as it's been broken for so
> many years.

Right, I've applied Shan's patch, thanks.

^ permalink raw reply

* Re: [PATCH] drivers/net/usb: Add new driver ipheth
From: David Miller @ 2010-04-22  5:44 UTC (permalink / raw)
  To: diego-KR0zwsgql1HQT0dZR+AlfA
  Cc: agimenez-lqZFv/KUvpAxAGwisGp4zA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dborca-/E1597aS9LQAvxtiuMwx3w,
	James.Bottomley-JuX6DAaQMKPCXq6kfMZ53/egYHeGw8Jk,
	ralf-6z/3iImG2C8G8FEW9MqTrA, gregkh-l3A5Bk7waGM,
	jonas.sjoquist-IzeFyvvaP7pWk0Htik3J/w,
	torgny.johansson-Re5JQEeQqe8AvxtiuMwx3w,
	steve.glendinning-sdUf+H5yV5I,
	dbrownell-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f,
	omar.oberthur-Re5JQEeQqe8AvxtiuMwx3w,
	remi.denis-courmont-xNZwKgViW5gAvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <x2g1b0798831004210715h37253bc5y74baf86556aea7c5-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

From: Diego Giagio <diego-KR0zwsgql1HQT0dZR+AlfA@public.gmane.org>
Date: Wed, 21 Apr 2010 11:15:15 -0300

> On Sun, Apr 18, 2010 at 3:35 PM, L. Alberto Giménez
> <agimenez-lqZFv/KUvpAxAGwisGp4zA@public.gmane.org> wrote:
>> From: Diego Giagio <diego-KR0zwsgql1HQT0dZR+AlfA@public.gmane.org>
>>
>> Add new driver to use tethering with an iPhone device. After initial submission,
>> apply fixes to fit the new driver into the kernel standards.
>>
>> There are still a couple of minor (almost cosmetic-level) issues, but the driver
>> is fully functional right now.
>>
> 
> Signed-off-by: Diego Giagio <diego-KR0zwsgql1HQT0dZR+AlfA@public.gmane.org>
> Cc: Daniel Borca <dborca-/E1597aS9LQAvxtiuMwx3w@public.gmane.org>

Applied, thanks everyone.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2] TCP: avoid to send keepalive probes if it is receiving data
From: David Miller @ 2010-04-22  5:42 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: eric.dumazet, fleitner, netdev
In-Reply-To: <alpine.DEB.2.00.1004182328370.19304@melkinpaasi.cs.helsinki.fi>

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Sun, 18 Apr 2010 23:34:15 +0300 (EEST)

> I fail to see why the addition of this new variable is necessary at all, 
> could either of you enlight me why exactly it's necessary and rcv_tstamp 
> will not suffice?

I agree, the existing rcv_tstamp should serve this purpose just
fine.

^ permalink raw reply

* Re: [PATCH v2] rps: optimize rps_get_cpu()
From: David Miller @ 2010-04-22  5:40 UTC (permalink / raw)
  To: xiaosuo; +Cc: therbert, eric.dumazet, netdev
In-Reply-To: <1271772160-28177-1-git-send-email-xiaosuo@gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Tue, 20 Apr 2010 22:02:40 +0800

> use ihl in bytes to eliminate later multiplyings.

I'll buy you a cookie if you can find a multiply generated by the
compiler for "x * 4".  It's going to use shifts and those are
basically free.

Please just change one thing at a time.  It would have helped you
here.  I was willing to apply the port dereference part of your
change, but not necessarily the 'ihl' changes.  But because you've
combined them, I have no choice but to reject everything.

^ permalink raw reply

* Re: [PATCH 1/2][RESEND] ehea: error handling improvement
From: David Miller @ 2010-04-22  5:36 UTC (permalink / raw)
  To: tklein; +Cc: netdev, linuxppc-dev, linux-kernel, themann
In-Reply-To: <201004211110.55986.tklein@de.ibm.com>

From: Thomas Klein <tklein@de.ibm.com>
Date: Wed, 21 Apr 2010 11:10:55 +0200

> Reset a port's resources only if they're actually in an error state
> 
> Signed-off-by: Thomas Klein <tklein@de.ibm.com>
> ---
> 
> Patch created against net-2.6

I thought you were sorry for wasting my time and that you were going
to follow the directions I gave you last time, and I quote:

--------------------
3) These are not appropriate for net-2.6 as we are deep in
   the -rcX series at this point and only the most diabolical
   bug fixes are appropriate.  Therefore, please generate these
   against net-next-2.6, thanks.
--------------------

And here you are generating your patches against net-2.6.  Heck, you
even feel it's worth mentioning explicitly.

Lucky for you the patches happen to apply cleanly to net-next-2.6 so
I've put them there.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox