Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: cxgb4: Make unnecessarily global functions static
From: David Miller @ 2010-04-22  6:01 UTC (permalink / raw)
  To: rdreier; +Cc: dm, netdev
In-Reply-To: <adapr1spr7e.fsf@roland-alpha.cisco.com>

From: Roland Dreier <rdreier@cisco.com>
Date: Wed, 21 Apr 2010 11:59:17 -0700

> Also put t4_write_indirect() inside "#if 0" to avoid a "defined but not
> used" compile warning.
> 
> Signed-off-by: Roland Dreier <rolandd@cisco.com>

Also applied to net-next-2.6, thanks Roland.

^ permalink raw reply

* setsockopt with cmsghdr needs COMPAT support?
From: Andrew May @ 2010-04-22  4:45 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal

[-- Attachment #1: Type: text/plain, Size: 895 bytes --]

I have a userspace app that is doing an IPv6 IPV6_2292PKTOPTIONS
setsockopt to add an Extension header in a mixed 64 bit/32 bit setup.

It is failing with an EINVAL because it seems the cmsghdr doesn't get
the proper fixup.
This isn't stuff I really look at much but I came up with this hack to
at least get past the error. All my userspace is 32 bits so I just
put in the "#if 1" rather than attempting a runtime check on the socket.

I am not sure if the userspace app is doing something wrong, but it
seems like this is a real problem. The "on the stack" assumption by
the fixup helper seems like it really should be reworked, but I have no
idea how it should be done. And I didn't bother to handle the the
getsockopt function.
Doing a grep I didn't find any other offenders, but I can't say for
sure.

Does anyone have any ideas on the "right" way to fix this, or point out
my flaw?

Thanks.

[-- Attachment #2: compat.patch --]
[-- Type: text/x-patch, Size: 1821 bytes --]

diff --git a/net/compat.c b/net/compat.c
index ec24d9e..1a6fcb1 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -213,6 +213,7 @@ Efault:
 		sock_kfree_s(sk, kcmsg_base, kcmlen);
 	return err;
 }
+EXPORT_SYMBOL(cmsghdr_from_user_compat_to_kern);

 int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *data)
 {
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 33f60fc..3907ce4 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -422,6 +422,11 @@ sticky_done:
 		struct msghdr msg;
 		struct flowi fl;
 		int junk;
+#if 1
+		int compat_alloc = optlen;
+#else
+		int compat_alloc = 0;
+#endif

 		fl.fl6_flowlabel = 0;
 		fl.oif = sk->sk_bound_dev_if;
@@ -436,13 +441,29 @@ sticky_done:
 		retv = -EINVAL;
 		if (optlen > 64*1024)
 			break;
-
-		opt = sock_kmalloc(sk, sizeof(*opt) + optlen, GFP_KERNEL);
+		opt = sock_kmalloc(sk, sizeof(*opt) + optlen + compat_alloc,
+				   GFP_KERNEL);
 		retv = -ENOBUFS;
 		if (opt == NULL)
 			break;

 		memset(opt, 0, sizeof(*opt));
+#if 1
+		msg.msg_controllen = optlen;
+		msg.msg_control = optval;
+		retv = cmsghdr_from_user_compat_to_kern(&msg, sk, (void*)(opt+1),
+							optlen + compat_alloc );
+
+		printk( KERN_ERR "Cmsghdr conver ret %d len from %d to %d\n",
+			retv, (int)optlen, (int)msg.msg_controllen );
+		if (retv)
+			goto done;
+		if ( msg.msg_control != (opt+1) ){
+			printk( KERN_ERR "cmsg realloc issue???" );
+			/*Screwed*/
+		}
+		opt->tot_len = sizeof(*opt) + msg.msg_controllen;
+#else
 		opt->tot_len = sizeof(*opt) + optlen;
 		retv = -EFAULT;
 		if (copy_from_user(opt+1, optval, optlen))
@@ -450,6 +471,7 @@ sticky_done:

 		msg.msg_controllen = optlen;
 		msg.msg_control = (void*)(opt+1);
+#endif

 		retv = datagram_send_ctl(net, &msg, &fl, opt, &junk, &junk);
 		if (retv)

^ permalink raw reply related

* Re: [PATCH v2] rps: optimize rps_get_cpu()
From: Changli Gao @ 2010-04-22  6:10 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, eric.dumazet, netdev
In-Reply-To: <20100421.224014.77255834.davem@davemloft.net>

On Thu, Apr 22, 2010 at 1:40 PM, David Miller <davem@davemloft.net> wrote:
>
> I'll buy you a cookie if you can find a multiply generated by the
> compiler for "x * 4".  It's going to use shifts and those are
> basically free.

On amd64:

                if (pskb_may_pull(skb, (ihl * 4) + 4)) {
    2794:       8d 34 9d 04 00 00 00    lea    0x4(,%rbx,4),%esi
    279b:       4c 89 ef                mov    %r13,%rdi
    279e:       e8 a5 fd ff ff          callq  2548 <pskb_may_pull>
    27a3:       85 c0                   test   %eax,%eax
    27a5:       74 28                   je     27cf <get_rps_cpu+0x169>
                        __be16 *hports = (__be16 *) (skb->data + (ihl * 4));
    27a7:       8d 04 9d 00 00 00 00    lea    0x0(,%rbx,4),%eax

the compiler uses lea instead of multiply, and it should be more
efficient, but i'm not sure. Is there a equivalent of lea on the other
architectures?

>
> Please just change one thing at a time.  It would have helped you
> here.  I was willing to apply the port dereference part of your
> change, but not necessarily the 'ihl' changes.  But because you've
> combined them, I have no choice but to reject everything.
>

Ok. I think the first version is ready to apply, it only has the port
dereference part.

-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH v4] net: batch skb dequeueing from softnet input_pkt_queue
From: Stephen Hemminger @ 2010-04-22  6:18 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, jamal, Tom Herbert, Eric Dumazet, netdev
In-Reply-To: <1271915513-2966-1-git-send-email-xiaosuo@gmail.com>

On Thu, 22 Apr 2010 13:51:53 +0800
Changli Gao <xiaosuo@gmail.com> wrote:

> +	struct sk_buff		*input_pkt_queue_head;
> +	struct sk_buff		**input_pkt_queue_tailp;
> +	unsigned int		input_pkt_queue_len;
> +	unsigned int		process_queue_len;

Why is opencoding a skb queue a step forward?
Just keep using sk_queue routines, just not the locked variants.

-- 

^ permalink raw reply

* Re: [PATCH v4] net: batch skb dequeueing from softnet input_pkt_queue
From: Changli Gao @ 2010-04-22  6:30 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David S. Miller, jamal, Tom Herbert, Eric Dumazet, netdev
In-Reply-To: <20100421231843.4c284991@nehalam>

On Thu, Apr 22, 2010 at 2:18 PM, Stephen Hemminger
<shemminger@vyatta.com> wrote:
> On Thu, 22 Apr 2010 13:51:53 +0800
> Changli Gao <xiaosuo@gmail.com> wrote:
>
>> +     struct sk_buff          *input_pkt_queue_head;
>> +     struct sk_buff          **input_pkt_queue_tailp;
>> +     unsigned int            input_pkt_queue_len;
>> +     unsigned int            process_queue_len;
>
> Why is opencoding a skb queue a step forward?
> Just keep using sk_queue routines, just not the locked variants.
>

I want to keep the critical section of rps_lock() as small as possible
to reduce the potential lock contention, when RPS is used.

-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH v3] net: batch skb dequeueing from softnet input_pkt_queue
From: Changli Gao @ 2010-04-22  6:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev, Tom Herbert, jamal
In-Reply-To: <1271891149.7895.3751.camel@edumazet-laptop>

On Thu, Apr 22, 2010 at 7:05 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mercredi 14 avril 2010 à 17:52 +0800, Changli Gao a écrit :
>> batch skb dequeueing from softnet input_pkt_queue
>>
>> batch skb dequeueing from softnet input_pkt_queue to reduce potential lock
>> contention and irq disabling/enabling.
>>
>> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
>> ----
>
> lock contention _is_ a problem, Jamal tests can show it.
>
> irq disabling/enabling is not, and force to use stop_machine() killer.
>

Although irq_disabling/enabling is not, we should do our best to make
fast path as quickly as possible, and because stop_machine() is used
in slow patch, I think we can afford its weight.

-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH] Socket filter access to hatype
From: David Miller @ 2010-04-22  6:42 UTC (permalink / raw)
  To: leonerd; +Cc: netdev
In-Reply-To: <20100421172546.GO19334@cel.leo>

From: Paul LeoNerd Evans <leonerd@leonerd.org.uk>
Date: Wed, 21 Apr 2010 18:25:46 +0100

> When capturing packets on a PF_PACKET/SOCK_RAW socket bound to all
> interfaces, there doesn't appear to be a way for the filter program to
> actually find out the underlying hardware type the packet was captured
> on, such as is reported by the sll_hatype field of the struct sockaddr_ll
> when the packet is sent up to userland.
> 
> Unless I've managed to miss a trick somewhere, this would seem to put a
> fairly fundamental blocker on actually being able to filter in such
> packets. Granted there's the SKF_OFF_NET area to inspect at the e.g. IPv4
> level, but this makes it impossible to do anything on e.g. the Ethernet
> level.
> 
> See below for a patch to add an SKF_AD_HATYPE field, up among the other
> special access fields around SKF_AD_OFF.

This looks fine but you need to submit your patch properly,
including proper "Signed-off-by: " tags etc.  see
Documentation/SubmittingPatches for details.

Please make a complete fresh new submission, and don't try to shortcut
this by just replying and adding the Signed-off-by: or anything like
that.

Thanks.

^ permalink raw reply

* Re: [net-next PATCH 1/2] add iovnl netlink support
From: David Miller @ 2010-04-22  6:48 UTC (permalink / raw)
  To: scofeldm; +Cc: netdev, chrisw
In-Reply-To: <20100419191807.10423.84600.stgit@savbu-pc100.cisco.com>

From: Scott Feldman <scofeldm@cisco.com>
Date: Mon, 19 Apr 2010 12:18:07 -0700

> +#define IOVNL_PROTO_VERSION 1
> +

Please delete this in the final version, the macro isn't even used by
the code.

We don't do protocol versioning in netlink.  Instead we get the base
stuff solid from the beginning, and then if something needs fixing up
we handle this using new attributes in a way which is both backward
and forward compatible.

Thanks.

^ permalink raw reply

* Re: [net-next,1/2] add iovnl netlink support
From: Arnd Bergmann @ 2010-04-22  6:51 UTC (permalink / raw)
  To: Chris Wright; +Cc: Scott Feldman, davem, netdev
In-Reply-To: <20100421224802.GF28829@x200.localdomain>

On Thursday 22 April 2010, Chris Wright wrote:
> > 
> > ip link add link eth0 type macvlan    # for a container
> > ip link add link eth0 type macvtap    # for qemu/vhost
> > ip link add link eth0 type vf         # for device assignment
> 
> BTW, what do you mean by device assignment?

I mean giving an SR-IOV VF to the guest as a native PCI device
rather than having qemu or vhost present a virtio-net to the
guest.

> > There are obviously significant differences between these three, but
> > they also share enough of their properties to let us treat them
> > in similar ways.
> > 
> > If we integrate the iovnl client into iproute2, the sequence for setting
> > up an enic VF and associating it to the port profile could be
> > 
> > # create vf0, pass mac and vlan id to HW, no association yet
> > ip link add link eth0 name vf0 type vf mac fe:dc:ba:12:34:56 vlan 78
> 
> Just to clarify...right now, the normal SR-IOV VF is already there.
> And, or course, can have its mac addr/vlan set already.

I don't have an SR-IOV card available for testing yet. How is this
configured now?

> > # associate vf with port profile, mac address must match the one assigned
> > #  to the interface before.
> > ip iov assoc eth0 port-profile "general" host-uuid "dcf2a873-f5ee-41dd-a7ad-802a544e48c2" \
> >        mac fe:dc:ba:12:34:56
> 
> At that point you could just do s/mac fe:.*/link vf0/

My point was that this information should be irrelevant to the code doing the
association with the switch. It sort of makes sense when the receiver is enic,
but when we send the same data to lldpad, it doesn't care about the slave device
name but only about the mac address. Especially since the slave device might not
be in the root name space any more, meaning we have no way to find it.

	Arnd

^ permalink raw reply

* Re: [net-next PATCH 1/2] add iovnl netlink support
From: David Miller @ 2010-04-22  6:52 UTC (permalink / raw)
  To: scofeldm; +Cc: netdev, chrisw
In-Reply-To: <20100419191807.10423.84600.stgit@savbu-pc100.cisco.com>

From: Scott Feldman <scofeldm@cisco.com>
Date: Mon, 19 Apr 2010 12:18:07 -0700

> +	if (tb[IOV_ATTR_VF_IFNAME])
> +		vf_dev = dev_get_by_name(&init_net,
> +			nla_data(tb[IOV_ATTR_VF_IFNAME]));

It's probably best to check this for NULL and notify
the user with an error in that case (don't forget to
put 'dev' in that error path :-)

As things stand it looks like if we can't find vf_dev, we'll just send
NULL down to the vf_dev arg of the various operations and possibly
silently succeed.

That's not desirable, semantically.

^ permalink raw reply

* Re: [net-next,1/2] add iovnl netlink support
From: David Miller @ 2010-04-22  7:09 UTC (permalink / raw)
  To: arnd; +Cc: scofeldm, chrisw, netdev
In-Reply-To: <201004212313.05060.arnd@arndb.de>

From: Arnd Bergmann <arnd@arndb.de>
Date: Wed, 21 Apr 2010 23:13:04 +0200

> My preference would probably be make these a subcategory of the
> if_link, and use the existing RTM_NEWLINK/RTM_DELLINK commands.

I was going to suggest this as well.

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
From: Eric Dumazet @ 2010-04-22  7:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100421.225625.177238009.davem@davemloft.net>

Le mercredi 21 avril 2010 à 22:56 -0700, David Miller a écrit :

> Right, I've applied this, thanks.
> 
> What we should probably do instead is call and NULL out the
> DEV_GSO_CB() destructor.  Right?

Yes, probably, I'll take a look at this if you want.

Thanks



^ permalink raw reply

* Re: [PATCH v3] net: batch skb dequeueing from softnet input_pkt_queue
From: Eric Dumazet @ 2010-04-22  7:13 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netdev, Tom Herbert, jamal
In-Reply-To: <j2s412e6f7f1004212333se60a9083s59185ee3466313f@mail.gmail.com>

Le jeudi 22 avril 2010 à 14:33 +0800, Changli Gao a écrit :
> On Thu, Apr 22, 2010 at 7:05 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Le mercredi 14 avril 2010 à 17:52 +0800, Changli Gao a écrit :
> >> batch skb dequeueing from softnet input_pkt_queue
> >>
> >> batch skb dequeueing from softnet input_pkt_queue to reduce potential lock
> >> contention and irq disabling/enabling.
> >>
> >> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> >> ----
> >
> > lock contention _is_ a problem, Jamal tests can show it.
> >
> > irq disabling/enabling is not, and force to use stop_machine() killer.
> >
> 
> Although irq_disabling/enabling is not, we should do our best to make
> fast path as quickly as possible, and because stop_machine() is used
> in slow patch, I think we can afford its weight.
> 
> 

No thanks, this is out of the question.

Talk to ixiacom guys, some people settle/dismantle dozens of network
device per second, on production machines.

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
From: David Miller @ 2010-04-22  7:16 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1271920233.7895.4723.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 22 Apr 2010 09:10:33 +0200

> Le mercredi 21 avril 2010 à 22:56 -0700, David Miller a écrit :
> 
>> Right, I've applied this, thanks.
>> 
>> What we should probably do instead is call and NULL out the
>> DEV_GSO_CB() destructor.  Right?
> 
> Yes, probably, I'll take a look at this if you want.

It might look something like this:

diff --git a/net/core/dev.c b/net/core/dev.c
index 9bf1ccc..13241da 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1892,6 +1892,20 @@ static inline void skb_orphan_try(struct sk_buff *skb)
 		skb_orphan(skb);
 }
 
+/*
+ * GSO packets need to be handled specially because such packets
+ * hold the normal SKB destructor in a backup pointer.
+ */
+static inline void skb_orphan_try_gso(struct sk_buff *skb)
+{
+	if (!skb_tx(skb)->flags) {
+		if (DEV_GSO_CB(skb)->destructor)
+			DEV_GSO_CB(skb)->destructor(skb);
+		DEV_GSO_CB(skb)->destructor = NULL;
+		skb->sk = NULL;
+	}
+}
+
 int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 			struct netdev_queue *txq)
 {
@@ -1937,6 +1951,7 @@ gso:
 		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
 			skb_dst_drop(nskb);
 
+		skb_orphan_try_gso(skb);
 		rc = ops->ndo_start_xmit(nskb, dev);
 		if (unlikely(rc != NETDEV_TX_OK)) {
 			if (rc & ~NETDEV_TX_MASK)

^ permalink raw reply related

* Re: [PATCH v3] net: batch skb dequeueing from softnet input_pkt_queue
From: David Miller @ 2010-04-22  7:17 UTC (permalink / raw)
  To: eric.dumazet; +Cc: xiaosuo, netdev, therbert, hadi
In-Reply-To: <1271920402.7895.4732.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 22 Apr 2010 09:13:22 +0200

> No thanks, this is out of the question.
> 
> Talk to ixiacom guys, some people settle/dismantle dozens of network
> device per second, on production machines.

Yes, ifup/ifdown performance is very important these days.

Recently we've had to do a lot of work to increase scalability and
latency in this area, let's not undo that.

^ permalink raw reply

* Re: [PATCH net-next-2.6] rps: immediate send IPI in process_backlog()
From: David Miller @ 2010-04-22  7:21 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, xiaosuo, netdev
In-Reply-To: <1271883898.7895.3379.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 21 Apr 2010 23:04:58 +0200

> If some skb are queued to our backlog, we are delaying IPI sending at
> the end of net_rx_action(), increasing latencies. This defeats the
> queueing, since we want to quickly dispatch packets to the pool of
> worker cpus, then eventually deeply process our packets.
> 
> It's better to send IPI before processing our packets in upper layers,
> from process_backlog().
> 
> Change the _and_disable_irq suffix to _and_enable_irq(), since we enable
> local irq in net_rps_action(), sorry for the confusion.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Eric, irqs are enabled in process_backlog(), so I don't know how legal
it is to invoke net_rps_action_and_irq_enable() from there.

At least, if you are depending upon a later action to pick up the
pieces if the rps_ipi_list test races, you need to update the comment
above net_rps_action_and_irq_enable() since it states that it is
always invoked with IRQs disabled :-)

^ permalink raw reply

* Re: [PATCH v4] net: batch skb dequeueing from softnet input_pkt_queue
From: Eric Dumazet @ 2010-04-22  7:21 UTC (permalink / raw)
  To: Changli Gao
  Cc: Stephen Hemminger, David S. Miller, jamal, Tom Herbert, netdev
In-Reply-To: <p2i412e6f7f1004212330y6f3e2a92oa295e528afde3cd4@mail.gmail.com>

Le jeudi 22 avril 2010 à 14:30 +0800, Changli Gao a écrit :
> On Thu, Apr 22, 2010 at 2:18 PM, Stephen Hemminger
> <shemminger@vyatta.com> wrote:
> > On Thu, 22 Apr 2010 13:51:53 +0800
> > Changli Gao <xiaosuo@gmail.com> wrote:
> >
> >> +     struct sk_buff          *input_pkt_queue_head;
> >> +     struct sk_buff          **input_pkt_queue_tailp;
> >> +     unsigned int            input_pkt_queue_len;
> >> +     unsigned int            process_queue_len;
> >
> > Why is opencoding a skb queue a step forward?
> > Just keep using sk_queue routines, just not the locked variants.
> >
> 
> I want to keep the critical section of rps_lock() as small as possible
> to reduce the potential lock contention, when RPS is used.
> 

Jamal perf reports show lock contention but also cache line ping pongs.

Yet, you keep a process_queue_len shared by producers and consumer.

Producers want to read it, while consumer decrement it (dirtying its
cache line) every packet, slowing down the things.

The idea of batching is to let the consumer process its local queue with
no impact to producers.

Please remove it completely, or make the consumer zero it only at the
end of batch processing.

A cache line miss cost is about 120 cycles. Multiply it by 1 million
packet per second...

^ permalink raw reply

* Re: [PATCH net-next-2.6] rps: immediate send IPI in process_backlog()
From: David Miller @ 2010-04-22  7:22 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, xiaosuo, netdev
In-Reply-To: <20100422.002118.107274505.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Thu, 22 Apr 2010 00:21:18 -0700 (PDT)

> Eric, irqs are enabled in process_backlog(), so I don't know how legal
> it is to invoke net_rps_action_and_irq_enable() from there.

Nevermind I mis-read your patch.

Ignore me, I'll apply this.

Thanks!

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
From: Eric Dumazet @ 2010-04-22  7:24 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100422.001625.200862474.davem@davemloft.net>

Le jeudi 22 avril 2010 à 00:16 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 22 Apr 2010 09:10:33 +0200
> 
> > Le mercredi 21 avril 2010 à 22:56 -0700, David Miller a écrit :
> > 
> >> Right, I've applied this, thanks.
> >> 
> >> What we should probably do instead is call and NULL out the
> >> DEV_GSO_CB() destructor.  Right?
> > 
> > Yes, probably, I'll take a look at this if you want.
> 
> It might look something like this:
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 9bf1ccc..13241da 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1892,6 +1892,20 @@ static inline void skb_orphan_try(struct sk_buff *skb)
>  		skb_orphan(skb);
>  }
>  
> +/*
> + * GSO packets need to be handled specially because such packets
> + * hold the normal SKB destructor in a backup pointer.
> + */
> +static inline void skb_orphan_try_gso(struct sk_buff *skb)
> +{
> +	if (!skb_tx(skb)->flags) {
> +		if (DEV_GSO_CB(skb)->destructor)
> +			DEV_GSO_CB(skb)->destructor(skb);
> +		DEV_GSO_CB(skb)->destructor = NULL;
> +		skb->sk = NULL;
> +	}
> +}
> +
>  int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>  			struct netdev_queue *txq)
>  {
> @@ -1937,6 +1951,7 @@ gso:
>  		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
>  			skb_dst_drop(nskb);
>  
> +		skb_orphan_try_gso(skb);
>  		rc = ops->ndo_start_xmit(nskb, dev);
>  		if (unlikely(rc != NETDEV_TX_OK)) {
>  			if (rc & ~NETDEV_TX_MASK)

Hmm... are you sure we want to call destructor for each skb ?

Should'nt we do it before initial skb is split ?




^ permalink raw reply

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
From: David Miller @ 2010-04-22  7:26 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1271921045.7895.4763.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 22 Apr 2010 09:24:05 +0200

> Hmm... are you sure we want to call destructor for each skb ?
> 
> Should'nt we do it before initial skb is split ?

Good idea, therefore you mean something like this?

diff --git a/net/core/dev.c b/net/core/dev.c
index 3ba774b..f3c3885 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1865,6 +1865,7 @@ static int dev_gso_segment(struct sk_buff *skb)
 	int features = dev->features & ~(illegal_highdma(dev, skb) ?
 					 NETIF_F_SG : 0);
 
+	skb_orphan_try(skb);
 	segs = skb_gso_segment(skb, features);
 
 	/* Verifying header integrity only. */

^ permalink raw reply related

* Re: [PATCH net-next-2.6] rps: immediate send IPI in process_backlog()
From: Eric Dumazet @ 2010-04-22  7:28 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, xiaosuo, netdev
In-Reply-To: <20100422.002118.107274505.davem@davemloft.net>

Le jeudi 22 avril 2010 à 00:21 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 21 Apr 2010 23:04:58 +0200
> 
> > If some skb are queued to our backlog, we are delaying IPI sending at
> > the end of net_rx_action(), increasing latencies. This defeats the
> > queueing, since we want to quickly dispatch packets to the pool of
> > worker cpus, then eventually deeply process our packets.
> > 
> > It's better to send IPI before processing our packets in upper layers,
> > from process_backlog().
> > 
> > Change the _and_disable_irq suffix to _and_enable_irq(), since we enable
> > local irq in net_rps_action(), sorry for the confusion.
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Eric, irqs are enabled in process_backlog(), so I don't know how legal
> it is to invoke net_rps_action_and_irq_enable() from there.
> 
> At least, if you are depending upon a later action to pick up the
> pieces if the rps_ipi_list test races, you need to update the comment
> above net_rps_action_and_irq_enable() since it states that it is
> always invoked with IRQs disabled :-)
> --

But I do disable irqs berfore calling this function from
process_backlog, only if current pointer is non null.

Pointer is then re-fetched inside net_rps_action_and_irq_enable()

I thought using xchg(), but this adds an atomic op, so I think its
better to use local_irq_disable()/enable() pairs.


About the comment, it says :

/*
 * net_rps_action sends any pending IPI's for rps.
 * Note: called with local irq disabled, but exits with local irq
enabled.
 */


So it documents this function is called with irq disabled, and re-enable
them before return ?



^ permalink raw reply

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
From: Eric Dumazet @ 2010-04-22  7:33 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100422.002623.00784210.davem@davemloft.net>

Le jeudi 22 avril 2010 à 00:26 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 22 Apr 2010 09:24:05 +0200
> 
> > Hmm... are you sure we want to call destructor for each skb ?
> > 
> > Should'nt we do it before initial skb is split ?
> 
> Good idea, therefore you mean something like this?
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 3ba774b..f3c3885 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1865,6 +1865,7 @@ static int dev_gso_segment(struct sk_buff *skb)
>  	int features = dev->features & ~(illegal_highdma(dev, skb) ?
>  					 NETIF_F_SG : 0);
>  
> +	skb_orphan_try(skb);
>  	segs = skb_gso_segment(skb, features);
>  
>  	/* Verifying header integrity only. */

Yes, it seems better.

What about the 

if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
	skb_dst_drop(skb);

This thing might also be moved before the split, since split probably
clone all dst ?



^ permalink raw reply

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
From: David Miller @ 2010-04-22  7:41 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1271921637.7895.4791.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 22 Apr 2010 09:33:57 +0200

> Le jeudi 22 avril 2010 à 00:26 -0700, David Miller a écrit :
>> @@ -1865,6 +1865,7 @@ static int dev_gso_segment(struct sk_buff *skb)
>>  	int features = dev->features & ~(illegal_highdma(dev, skb) ?
>>  					 NETIF_F_SG : 0);
>>  
>> +	skb_orphan_try(skb);
>>  	segs = skb_gso_segment(skb, features);
>>  
>>  	/* Verifying header integrity only. */
> 
> Yes, it seems better.
> 
> What about the 
> 
> if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
> 	skb_dst_drop(skb);
> 
> This thing might also be moved before the split, since split probably
> clone all dst ?

Good catch, agreed.

diff --git a/net/core/dev.c b/net/core/dev.c
index 3ba774b..4f897e2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1851,6 +1851,17 @@ static void dev_gso_skb_destructor(struct sk_buff *skb)
 		cb->destructor(skb);
 }
 
+/*
+ * Try to orphan skb early, right before transmission by the device.
+ * We cannot orphan skb if tx timestamp is requested, since
+ * drivers need to call skb_tstamp_tx() to send the timestamp.
+ */
+static inline void skb_orphan_try(struct sk_buff *skb)
+{
+	if (!skb_tx(skb)->flags)
+		skb_orphan(skb);
+}
+
 /**
  *	dev_gso_segment - Perform emulated hardware segmentation on skb.
  *	@skb: buffer to segment
@@ -1865,6 +1876,7 @@ static int dev_gso_segment(struct sk_buff *skb)
 	int features = dev->features & ~(illegal_highdma(dev, skb) ?
 					 NETIF_F_SG : 0);
 
+	skb_orphan_try(skb);
 	segs = skb_gso_segment(skb, features);
 
 	/* Verifying header integrity only. */
@@ -1881,17 +1893,6 @@ static int dev_gso_segment(struct sk_buff *skb)
 	return 0;
 }
 
-/*
- * Try to orphan skb early, right before transmission by the device.
- * We cannot orphan skb if tx timestamp is requested, since
- * drivers need to call skb_tstamp_tx() to send the timestamp.
- */
-static inline void skb_orphan_try(struct sk_buff *skb)
-{
-	if (!skb_tx(skb)->flags)
-		skb_orphan(skb);
-}
-
 int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 			struct netdev_queue *txq)
 {
@@ -1902,13 +1903,6 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 		if (!list_empty(&ptype_all))
 			dev_queue_xmit_nit(skb, dev);
 
-		if (netif_needs_gso(dev, skb)) {
-			if (unlikely(dev_gso_segment(skb)))
-				goto out_kfree_skb;
-			if (skb->next)
-				goto gso;
-		}
-
 		/*
 		 * If device doesnt need skb->dst, release it right now while
 		 * its hot in this cpu cache
@@ -1916,6 +1910,13 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
 			skb_dst_drop(skb);
 
+		if (netif_needs_gso(dev, skb)) {
+			if (unlikely(dev_gso_segment(skb)))
+				goto out_kfree_skb;
+			if (skb->next)
+				goto gso;
+		}
+
 		skb_orphan_try(skb);
 		rc = ops->ndo_start_xmit(skb, dev);
 		if (rc == NETDEV_TX_OK)

^ permalink raw reply related

* Re: IPv6: race condition in __ipv6_ifa_notify() and dst_free() ?
From: David Miller @ 2010-04-22  7:43 UTC (permalink / raw)
  To: herbert; +Cc: jbohac, yoshfuji, netdev, shemminger
In-Reply-To: <20100422023211.GA7109@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 22 Apr 2010 10:32:11 +0800

> Anyway, I think the root of the issue is the fact that NDISC is
> calling addrconf_dad_failure with no locking whatsoever.  The
> latter is not idempotent so some form of locking is needed.
> 
> This bug appears to have been around since the very start.
> 
> I'll dig deeper to see where we might be able to add some locks.

Thanks Herbert.

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
From: Eric Dumazet @ 2010-04-22  7:47 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100422.004136.151480121.davem@davemloft.net>

Le jeudi 22 avril 2010 à 00:41 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 22 Apr 2010 09:33:57 +0200
> 
> > Le jeudi 22 avril 2010 à 00:26 -0700, David Miller a écrit :
> >> @@ -1865,6 +1865,7 @@ static int dev_gso_segment(struct sk_buff *skb)
> >>  	int features = dev->features & ~(illegal_highdma(dev, skb) ?
> >>  					 NETIF_F_SG : 0);
> >>  
> >> +	skb_orphan_try(skb);
> >>  	segs = skb_gso_segment(skb, features);
> >>  
> >>  	/* Verifying header integrity only. */
> > 
> > Yes, it seems better.
> > 
> > What about the 
> > 
> > if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
> > 	skb_dst_drop(skb);
> > 
> > This thing might also be moved before the split, since split probably
> > clone all dst ?
> 
> Good catch, agreed.
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 3ba774b..4f897e2 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1851,6 +1851,17 @@ static void dev_gso_skb_destructor(struct sk_buff *skb)
>  		cb->destructor(skb);
>  }
>  
> +/*
> + * Try to orphan skb early, right before transmission by the device.
> + * We cannot orphan skb if tx timestamp is requested, since
> + * drivers need to call skb_tstamp_tx() to send the timestamp.
> + */
> +static inline void skb_orphan_try(struct sk_buff *skb)
> +{
> +	if (!skb_tx(skb)->flags)
> +		skb_orphan(skb);
> +}
> +
>  /**
>   *	dev_gso_segment - Perform emulated hardware segmentation on skb.
>   *	@skb: buffer to segment
> @@ -1865,6 +1876,7 @@ static int dev_gso_segment(struct sk_buff *skb)
>  	int features = dev->features & ~(illegal_highdma(dev, skb) ?
>  					 NETIF_F_SG : 0);
>  
> +	skb_orphan_try(skb);
>  	segs = skb_gso_segment(skb, features);
>  
>  	/* Verifying header integrity only. */
> @@ -1881,17 +1893,6 @@ static int dev_gso_segment(struct sk_buff *skb)
>  	return 0;
>  }
>  
> -/*
> - * Try to orphan skb early, right before transmission by the device.
> - * We cannot orphan skb if tx timestamp is requested, since
> - * drivers need to call skb_tstamp_tx() to send the timestamp.
> - */
> -static inline void skb_orphan_try(struct sk_buff *skb)
> -{
> -	if (!skb_tx(skb)->flags)
> -		skb_orphan(skb);
> -}
> -
>  int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>  			struct netdev_queue *txq)
>  {
> @@ -1902,13 +1903,6 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>  		if (!list_empty(&ptype_all))
>  			dev_queue_xmit_nit(skb, dev);
>  
> -		if (netif_needs_gso(dev, skb)) {
> -			if (unlikely(dev_gso_segment(skb)))
> -				goto out_kfree_skb;
> -			if (skb->next)
> -				goto gso;
> -		}
> -
>  		/*
>  		 * If device doesnt need skb->dst, release it right now while
>  		 * its hot in this cpu cache
> @@ -1916,6 +1910,13 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>  		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
>  			skb_dst_drop(skb);
>  
> +		if (netif_needs_gso(dev, skb)) {
> +			if (unlikely(dev_gso_segment(skb)))
> +				goto out_kfree_skb;
> +			if (skb->next)
> +				goto gso;
> +		}
> +
>  		skb_orphan_try(skb);
>  		rc = ops->ndo_start_xmit(skb, dev);
>  		if (rc == NETDEV_TX_OK)

You could have one skb_orphan_try() call before the

if (netif_needs_gso(dev, skb)) {

and remove it from dev_gso_segment() ?



^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox