Netdev List


* [PATCH] tun: orphan an skb on tx
From: Michael S. Tsirkin @ 2010-04-13 14:59 UTC (permalink / raw)
  Cc: David S. Miller, Herbert Xu, Michael S. Tsirkin, Paul Moore,
	David Woodhouse, netdev, linux-kernel, Jan Kiszka, qemu-devel

The following situation was observed in the field:
tap1 sends packets, tap2 does not consume them, and as a result
tap1 cannot be closed. This happens because
tun/tap devices can hang on to skbs indefinitely.

As noted by Herbert, possible solutions include a timeout followed by a
copy/change of ownership of the skb, or always copying/changing
ownership if we're going into a hostile device.

This patch implements the second approach.

Note: one issue still remains. Since skbs
keep a reference to the tun socket, and the tun socket has a
reference to the tun device, we won't flush the backlog;
instead we simply wait for all skbs to get transmitted.
At least this is not user-triggerable, and
it was not reported in practice. My assumption is that
devices other than tap complete an skb
within finite time after it has been queued.

A possible solution for this second issue
would be not to have the socket reference the device
and instead implement dev->destructor for tun and
wait for all skbs to complete there. But this
needs some thought and is probably too risky for 2.6.34.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Yan Vugenfirer <yvugenfi@redhat.com>

---

Please review the below, and consider for 2.6.34,
and stable trees.

 drivers/net/tun.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 96c39bd..4326520 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -387,6 +387,10 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 	}

+	/* Orphan the skb - required as we might hang on to it
+	 * for indefinite time. */
+	skb_orphan(skb);
+
 	/* Enqueue packet */
 	skb_queue_tail(&tun->socket.sk->sk_receive_queue, skb);
 	dev->trans_start = jiffies;
-- 
1.7.0.2.280.gc6f05
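
For readers who want to see why orphaning unblocks the sender, here is a
minimal userspace sketch, not kernel code. All names (toy_sock, toy_skb,
toy_skb_orphan and so on) are hypothetical stand-ins. It assumes the usual
model in which a transmitted buffer is charged to the sending socket and a
destructor returns that charge; orphaning runs the destructor immediately and
drops the owner link, so a receiver that never consumes the buffer no longer
pins the sender.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a sending socket: bytes still charged to it. */
struct toy_sock {
	int wmem_alloc;
};

struct toy_skb {
	struct toy_sock *sk;			/* owner, NULL once orphaned */
	void (*destructor)(struct toy_skb *);	/* returns the charge */
	int truesize;
};

/* Return the buffer's charge to its owner. */
static void toy_wfree(struct toy_skb *skb)
{
	skb->sk->wmem_alloc -= skb->truesize;
}

/* Charge the buffer to the sending socket. */
static void toy_set_owner_w(struct toy_skb *skb, struct toy_sock *sk)
{
	assert(skb->sk == NULL);	/* a buffer has at most one owner */
	skb->sk = sk;
	skb->destructor = toy_wfree;
	sk->wmem_alloc += skb->truesize;
}

/* Models orphaning: run the destructor now, then drop the owner link. */
static void toy_skb_orphan(struct toy_skb *skb)
{
	if (skb->destructor)
		skb->destructor(skb);
	skb->destructor = NULL;
	skb->sk = NULL;
}

/*
 * Queue one buffer on a receiver that never consumes it and report how
 * many bytes are still charged to the sender afterwards.
 */
int toy_inflight_after_xmit(int truesize, int orphan)
{
	struct toy_sock sender = { 0 };
	struct toy_skb skb = { NULL, NULL, truesize };

	toy_set_owner_w(&skb, &sender);
	if (orphan)
		toy_skb_orphan(&skb);
	return sender.wmem_alloc;
}
```

Without the orphan step the sender stays charged for as long as the buffer
sits in the receive queue; with it, the charge drops to zero at transmit time.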


* Re: forcedeth driver hangs under heavy load
From: stephen mulcahy @ 2010-04-13 14:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Ben Hutchings, netdev, Ben Hutchings, Ayaz Abdulla, 572201
In-Reply-To: <1271169741.16881.437.camel@edumazet-laptop>

Eric Dumazet wrote:
> On Tuesday 13 April 2010 at 15:27 +0100, stephen mulcahy wrote:
>> Ok, I've tried both of the following with my reproducer
>>
>> 1. ethtool -K eth0 tso off
>>
>> RESULT: reproducer causes multiple hosts to become unresponsive on
>> first run.
>>
>> 2. ethtool -K eth0 tx off
>>
>> RESULT: reproducer runs three times without any hosts becoming unresponsive.
>>
>> -stephen
> 
> Thanks Stephen !
> 
> Now, any brave fools to check the 6410 lines of this driver? ;)
>
> Question of the day: why is TSO broken in forcedeth?
> Is it generically broken, or is it broken for specific NICs?
> 

Actually, it is only when tx-checksumming is turned off that the problem
doesn't occur (so I'm not sure TSO is the problem).

A Google search also turns up this existing Debian bug,
http://bugs.debian.org/506419, which seems to be related.

-stephen



* RE: [PATCH 2/2] [V5] Add non-Virtex5 support for LL TEMAC driver
From: John Linn @ 2010-04-13 14:43 UTC (permalink / raw)
  To: David Miller, grant.likely
  Cc: netdev, linuxppc-dev, jwboyer, eric.dumazet, john.williams,
	michal.simek, jtyner
In-Reply-To: <20100413.013403.184052247.davem@davemloft.net>

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Tuesday, April 13, 2010 2:34 AM
> To: grant.likely@secretlab.ca
> Cc: John Linn; netdev@vger.kernel.org; linuxppc-dev@ozlabs.org;
>   jwboyer@linux.vnet.ibm.com; eric.dumazet@gmail.com;
>   john.williams@petalogix.com; michal.simek@petalogix.com;
>   jtyner@cs.ucr.edu
> Subject: Re: [PATCH 2/2] [V5] Add non-Virtex5 support for LL TEMAC driver
> 
> From: Grant Likely <grant.likely@secretlab.ca>
> Date: Fri, 9 Apr 2010 12:10:21 -0
> 
> > On Thu, Apr 8, 2010 at 11:08 AM, John Linn <john.linn@xilinx.com> wrote:
> >> This patch adds support for using the LL TEMAC Ethernet driver on
> >> non-Virtex 5 platforms by adding support for accessing the Soft DMA
> >> registers as if they were memory mapped instead of solely through the
> >> DCR's (available on the Virtex 5).
> >>
> >> The patch also updates the driver so that it runs on the MicroBlaze.
> >> The changes were tested on the PowerPC 440, PowerPC 405, and the
> >> MicroBlaze platforms.
> >>
> >> Signed-off-by: John Tyner <jtyner@cs.ucr.edu>
> >> Signed-off-by: John Linn <john.linn@xilinx.com>
> >
> > Picked up and build tested both patches on 405, 440, 60x and ppc64.
> > No build problems found either built-in or as a module.
> >
> > for both:
> > Acked-by: Grant Likely <grant.likely@secretlab.ca>
> 
> Ok, both applied to net-next-2.6, thanks everyone for sorting this
> out.

Great! Thanks David, appreciate the help.


* Re: forcedeth driver hangs under heavy load
From: Eric Dumazet @ 2010-04-13 14:42 UTC (permalink / raw)
  To: stephen mulcahy
  Cc: Ben Hutchings, netdev, Ben Hutchings, Ayaz Abdulla, 572201
In-Reply-To: <4BC47F38.5040509@gmail.com>

On Tuesday 13 April 2010 at 15:27 +0100, stephen mulcahy wrote:
> Ok, I've tried both of the following with my reproducer
> 
> 1. ethtool -K eth0 tso off
> 
> RESULT: reproducer causes multiple hosts to become unresponsive on
> first run.
> 
> 2. ethtool -K eth0 tx off
> 
> RESULT: reproducer runs three times without any hosts becoming unresponsive.
> 
> -stephen

Thanks Stephen !

Now, any brave fools to check the 6410 lines of this driver? ;)

Question of the day: why is TSO broken in forcedeth?
Is it generically broken, or is it broken for specific NICs?




* Re: [PATCH v2] net: batch skb dequeueing from softnet input_pkt_queue
From: Eric Dumazet @ 2010-04-13 14:37 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netdev
In-Reply-To: <x2z412e6f7f1004130638hb72b121cne442d722427de5a4@mail.gmail.com>

On Tuesday 13 April 2010 at 21:38 +0800, Changli Gao wrote:
> On Tue, Apr 13, 2010 at 9:21 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > This is a problem of cooperation between flush_backlog() and
> > process_backlog(). Don't allow flush_backlog() to return if
> > process_backlog() is running. Exactly as before, but lock acquisition
> > done in flush_backlog() should be a bit smarter.
> >
> 
> flush_backlog() is called in IRQ context. Unless you disable irq in
> process_backlog(), you can't block flush_backlog().
> 

Nothing prevents flush_backlog() from being done differently, you
know. It was done that way because it was the simplest thing to do
given the (basic) constraints. If the constraints change, the
implementation might change too. It is a slow path (in most setups), and
some extra work there to keep the fast path really fast is OK.

Netdevices are dismantled, and we respect an RCU grace period before
freeing them. process_backlog() runs under an RCU lock, so everything is
possible.


* Re: forcedeth driver hangs under heavy load
From: stephen mulcahy @ 2010-04-13 14:27 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Eric Dumazet, netdev, Ben Hutchings, Ayaz Abdulla, 572201
In-Reply-To: <1271160298.2098.0.camel@achroite.uk.solarflarecom.com>

Ok, I've tried both of the following with my reproducer

1. ethtool -K eth0 tso off

RESULT: reproducer causes multiple hosts to become unresponsive on
first run.

2. ethtool -K eth0 tx off

RESULT: reproducer runs three times without any hosts becoming unresponsive.

-stephen


* Re: Strange packet drops with heavy firewalling
From: Paweł Staszewski @ 2010-04-13 13:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Changli Gao, Benny Amorsen, zhigang gong, netdev
In-Reply-To: <1271163184.16881.307.camel@edumazet-laptop>

On 2010-04-13 14:53, Eric Dumazet wrote:
> On Tuesday 13 April 2010 at 14:33 +0200, Paweł Staszewski wrote:
>
>> On 2010-04-13 01:18, Changli Gao wrote:
>>      
>>> On Tue, Apr 13, 2010 at 1:06 AM, Benny Amorsen<benny+usenet@amorsen.dk>   wrote:
>>>
>>>        
>>>>    99:         24    1306226          3          2   PCI-MSI-edge      eth1-tx-0
>>>>    100:      15735    1648774          3          7   PCI-MSI-edge      eth1-tx-1
>>>>    101:          8         11          9    1083022   PCI-MSI-edge      eth1-tx-2
>>>>    102:          0          0          0          0   PCI-MSI-edge      eth1-tx-3
>>>>    103:         18         15       6131    1095383   PCI-MSI-edge      eth1-rx-0
>>>>    104:        217         32      46544    1335325   PCI-MSI-edge      eth1-rx-1
>>>>    105:        154    1305595        218         16   PCI-MSI-edge      eth1-rx-2
>>>>    106:         17         16       8229    1467509   PCI-MSI-edge      eth1-rx-3
>>>>    107:          0          0          1          0   PCI-MSI-edge      eth1
>>>>    108:          2         14         15    1003053   PCI-MSI-edge      eth0-tx-0
>>>>    109:       8226    1668924        478        487   PCI-MSI-edge      eth0-tx-1
>>>>    110:          3    1188874         17         12   PCI-MSI-edge      eth0-tx-2
>>>>    111:          0          0          0          0   PCI-MSI-edge      eth0-tx-3
>>>>    112:        203        185       5324    1015263   PCI-MSI-edge      eth0-rx-0
>>>>    113:       4141    1600793        153        159   PCI-MSI-edge      eth0-rx-1
>>>>    114:      16242    1210108        436       3124   PCI-MSI-edge      eth0-rx-2
>>>>    115:        267       4173      19471    1321252   PCI-MSI-edge      eth0-rx-3
>>>>    116:          0          1          0          0   PCI-MSI-edge      eth0
>>>>
>>>>
>>>> irqbalanced seems to have picked CPU1 and CPU3 for all the interrupts,
>>>> which to my mind should cause the same problem as before (where CPU1 and
>>>> CPU3 was handling all packets). Yet the box clearly works much better
>>>> than before.
>>>>
>>>>          
>>> irqbalanced? I don't think it can work properly. Try RPS in netdev and
>>> linux-next tree, and if cpu load isn't even, try this patch:
>>> http://patchwork.ozlabs.org/patch/49915/ .
>>>
>>>
>>>
>>>        
>> Yes, without irqbalance and with IRQ affinity set by hand, the router
>> will work much better.
>>
>> But I don't think that RPS will help him. I ran some tests with RPS
>> and affinity; the results are in the attached file.
>> The test router does traffic management (HFSC) for almost 9k users.
>>      
> Thanks for sharing Pawel.
>
> But obviously you are mixing apples and oranges.
>
> Are you aware that HFSC and other traffic shapers serialize access to
> their data structures? If many CPUs try to access these structures in
> parallel, you get a lot of cache line misses. HFSC is a real memory hog :(
>
>    
Thanks, Eric, for explaining why RPS is useless for traffic-management
routers.

> Benny does have firewalling (highly parallelized these days; iptables was
> much improved in this area), but no traffic control.
>
>    
Hmm, so maybe a better choice for traffic management is to use iptables for
"filter classification" instead of "u32 filters", with something like the
iptables CLASSIFY target.

> Anyway, Benny now has multiqueue devices, and therefore RPS will not
> help him. I suggested RPS before his move to multiqueue, and multiqueue
> is the most sensible way to improve things when no central lock is
> used. Every CPU can really work in parallel.
>



* Re: [PATCH v2] net: batch skb dequeueing from softnet input_pkt_queue
From: Changli Gao @ 2010-04-13 13:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev
In-Reply-To: <1271164894.16881.342.camel@edumazet-laptop>

On Tue, Apr 13, 2010 at 9:21 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> This is a problem of cooperation between flush_backlog() and
> process_backlog(). Don't allow flush_backlog() to return if
> process_backlog() is running. Exactly as before, but lock acquisition
> done in flush_backlog() should be a bit smarter.
>

flush_backlog() is called in IRQ context. Unless you disable irq in
process_backlog(), you can't block flush_backlog().

>
>> Oh, my GOD. When RPS is enabled, if flush_backlog(eth0) is called on
>> CPU1 when a skb0(eth0) is dequeued from CPU0's softnet and isn't
>> queued to CPU1's softnet, what will happen?
>>
>
> I am a bit lost here. flush_backlog() drops skbs; it does not requeue them.
>

I mean flush_backlog() doesn't drop all the packets whose dev points to
a particular net_device and which can't be processed before the net_device
disappears.

-- 
Regards，
Changli Gao(xiaosuo@gmail.com)


* Re: [PATCH v2] net: batch skb dequeueing from softnet input_pkt_queue
From: Eric Dumazet @ 2010-04-13 13:21 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netdev
In-Reply-To: <n2t412e6f7f1004130553x85452fc0u22e512cad412abd3@mail.gmail.com>

On Tuesday 13 April 2010 at 20:53 +0800, Changli Gao wrote:
> OK. Suppose we make processing_queue a stack variable. When the quota or
> jiffies limit is reached, we have to splice processing_queue back into
> input_pkt_queue. If flush_backlog() is called before
> processing_queue is spliced back, there will still be packets that refer to
> the NIC that is going away. These packets are then queued to input_pkt_queue.
> When process_backlog() is called again, the dev fields of these skbs are
> wild...
> 

This is a problem of cooperation between flush_backlog() and
process_backlog(). Don't allow flush_backlog() to return if
process_backlog() is running. Exactly as before, but lock acquisition
done in flush_backlog() should be a bit smarter.

> Oh, my GOD. When RPS is enabled, if flush_backlog(eth0) is called on
> CPU1 when a skb0(eth0) is dequeued from CPU0's softnet and isn't
> queued to CPU1's softnet, what will happen?
> 

I am a bit lost here. flush_backlog() drops skbs; it does not requeue them.

> >
> > Absolutely not. You missed something apparently.
> >
> > You pay the price at each packet enqueue, because you have to compute
> > the sum of two lengths, and guess what, if you do this you have a cache
> > line miss on one of the operands. Your patch as it stands is suboptimal.
> >
> > Remember : this batch mode should not change packet queueing at all,
> > only speed it because of less cache line misses.
> >
> 
> Wow, is it really so expensive?
> 

Yes. The whole point of your idea is to remove cache line misses.

They cost much more than a spin lock/unlock pair.


* Re: [Patch 3/3] net: reserve ports for applications using fixed port numbers
From: Tetsuo Handa @ 2010-04-13 13:07 UTC (permalink / raw)
  To: amwang, sean.hefty, rolandd
  Cc: opurdila, eric.dumazet, netdev, nhorman, davem, ebiederm,
	linux-kernel
In-Reply-To: <4BC42FE0.4040601@redhat.com>

Hello.

Adding Sean Hefty and Roland Dreier as the drivers/infiniband/core/cma.c maintainers.

Cong Wang wrote:
> Cong Wang wrote:
> > Tetsuo Handa wrote:
> >> Hello.
> >>
> >>> --- linux-2.6.orig/drivers/infiniband/core/cma.c
> >>> +++ linux-2.6/drivers/infiniband/core/cma.c
> >>> @@ -1980,6 +1980,8 @@ retry:
> >>>  	/* FIXME: add proper port randomization per like inet_csk_get_port */
> >>>  	do {
> >>>  		ret = idr_get_new_above(ps, bind_list, next_port, &port);
> >>> +		if (!ret && inet_is_reserved_local_port(port))
> >>> +			ret = -EAGAIN;
> >>>  	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
> >>>  
> >>>  	if (ret)
> >>>
> >> I think above part is wrong. Below program
> > ...
> >> This result suggests that above loop will continue until idr_pre_get() fails
> >> due to out of memory if all ports were reserved.
> >>
> >> Also, if idr_get_new_above() returned 0, bind_list (which is a kmalloc()ed
> >> pointer) is already installed into a free slot (see comment on
> >> idr_get_new_above_int()). Thus, simply calling idr_get_new_above() again will
> >> install the same pointer into multiple slots. I guess it will malfunction later.
> > 
> > Thanks for testing!
> > 
> > How about:
> > 
> > +		if (!ret && inet_is_reserved_local_port(port))
> > +			ret = -EBUSY;
> > 
> > ? So that it will break the loop and return error.
> > 
> 
> Or use the similar trick:
> 
>  int tries = 10;
> ...
> 
>  if(!ret && inet_is_reserved_local_port(port)) {
>    if (tries--)
>      ret = -EAGAIN;
>    else
>      ret = -EBUSY;
>  }
> 
> Any comments?
> 
I don't like the above change. It turns local port assignment from
"likely to succeed" (succeeds if any one of thousands of ports is available) into
"unlikely to succeed" (fails if the randomly chosen port is already in use).
We should retry across the whole range specified in /proc/sys/net/ipv4/ip_local_port_range .

cma_alloc_any_port() and cma_alloc_port() are almost identical.
Thus, I think we can call cma_alloc_port() from cma_alloc_any_port().

Sean and Roland, is the patch below correct?
inet_is_reserved_local_port() is the new function proposed in this patchset.

---
 drivers/infiniband/core/cma.c |   68 ++++++++++++++----------------------------
 1 file changed, 23 insertions(+), 45 deletions(-)

--- linux-2.6.34-rc4.orig/drivers/infiniband/core/cma.c
+++ linux-2.6.34-rc4/drivers/infiniband/core/cma.c
@@ -79,7 +79,6 @@ static DEFINE_IDR(sdp_ps);
 static DEFINE_IDR(tcp_ps);
 static DEFINE_IDR(udp_ps);
 static DEFINE_IDR(ipoib_ps);
-static int next_port;
 
 struct cma_device {
 	struct list_head	list;
@@ -1970,47 +1969,31 @@ err1:
 
 static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
 {
-	struct rdma_bind_list *bind_list;
-	int port, ret, low, high;
-
-	bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
-	if (!bind_list)
-		return -ENOMEM;
-
-retry:
-	/* FIXME: add proper port randomization per like inet_csk_get_port */
-	do {
-		ret = idr_get_new_above(ps, bind_list, next_port, &port);
-	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
-
-	if (ret)
-		goto err1;
+	static unsigned int last_used_port;
+	int low, high, remaining;
+	unsigned int rover;
 
 	inet_get_local_port_range(&low, &high);
-	if (port > high) {
-		if (next_port != low) {
-			idr_remove(ps, port);
-			next_port = low;
-			goto retry;
+	remaining = (high - low) + 1;
+	rover = net_random() % remaining + low;
+	do {
+		rover++;
+		if ((rover < low) || (rover > high))
+			rover = low;
+		if (last_used_port != rover &&
+		    !inet_is_reserved_local_port(rover) &&
+		    !idr_find(ps, (unsigned short) rover) &&
+		    !cma_alloc_port(ps, id_priv, rover)) {
+			/*
+			 * Remember previously used port number in order to
+			 * avoid re-using same port immediately after it is
+			 * closed.
+			 */
+			last_used_port = rover;
+			return 0;
 		}
-		ret = -EADDRNOTAVAIL;
-		goto err2;
-	}
-
-	if (port == high)
-		next_port = low;
-	else
-		next_port = port + 1;
-
-	bind_list->ps = ps;
-	bind_list->port = (unsigned short) port;
-	cma_bind_port(bind_list, id_priv);
-	return 0;
-err2:
-	idr_remove(ps, port);
-err1:
-	kfree(bind_list);
-	return ret;
+	} while (--remaining > 0);
+	return -EADDRNOTAVAIL;
 }
 
 static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
@@ -2995,12 +2978,7 @@ static void cma_remove_one(struct ib_dev
 
 static int __init cma_init(void)
 {
-	int ret, low, high, remaining;
-
-	get_random_bytes(&next_port, sizeof next_port);
-	inet_get_local_port_range(&low, &high);
-	remaining = (high - low) + 1;
-	next_port = ((unsigned int) next_port % remaining) + low;
+	int ret;
 
 	cma_wq = create_singlethread_workqueue("rdma_cm");
 	if (!cma_wq)
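
The search strategy in the patch above can be tried outside the kernel. The
following is only a userspace sketch under stated assumptions: toy_pick_port()
and the toy_* predicates are hypothetical names, the two predicates stand in
for inet_is_reserved_local_port() and the idr lookup, and the
last_used_port check is left out. It shows the property the discussion cares
about: the loop visits each port in [low, high] at most once, so it
terminates even when every port is reserved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate type: returns true if the port must be skipped. */
typedef bool (*port_pred)(unsigned int port);

/*
 * Walk the range [low, high] once, starting just after 'start', and return
 * the first port that is neither reserved nor in use, or -1 if none is free.
 */
int toy_pick_port(unsigned int low, unsigned int high, unsigned int start,
		  port_pred is_reserved, port_pred is_in_use)
{
	unsigned int remaining, rover = start;

	assert(low <= high);
	remaining = high - low + 1;
	do {
		rover++;
		if (rover < low || rover > high)
			rover = low;	/* wrap to the bottom of the range */
		if (!is_reserved(rover) && !is_in_use(rover))
			return (int)rover;
	} while (--remaining > 0);
	return -1;
}

/* Example predicates, for illustration only. */
bool toy_reserved(unsigned int port) { return port == 5001 || port == 5002; }
bool toy_in_use(unsigned int port) { return port == 5003; }
bool toy_all_reserved(unsigned int port) { (void)port; return true; }
```

With the range 5000-5005 and a start of 5000, the sketch skips the two
reserved ports and the busy one and settles on 5004. With toy_all_reserved it
gives up after one pass instead of looping.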


* Re: [PATCH v2] net: batch skb dequeueing from softnet input_pkt_queue
From: Changli Gao @ 2010-04-13 12:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev
In-Reply-To: <1271153942.16881.233.camel@edumazet-laptop>

On Tue, Apr 13, 2010 at 6:19 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tuesday 13 April 2010 at 17:50 +0800, Changli Gao wrote:
>> On Tue, Apr 13, 2010 at 4:08 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> >
>> >        Probably not necessary.
>> >
>> >> +     volatile bool           flush_processing_queue;
>> >
>> > Use of 'volatile' is strongly discouraged, I would say, forbidden.
>> >
>>
>> volatile is used to avoid compiler optimization.
>
> volatile might be used on special macros only, not to guard a variable.
> volatile was pre SMP days. We need something better defined these days.
>

flush_processing_queue is only accessed on the same CPU, so no
volatile is needed. I'll remove it in the next version.

>> >> @@ -2803,6 +2808,7 @@ static void flush_backlog(void *arg)
>> >>                       __skb_unlink(skb, &queue->input_pkt_queue);
>> >>                       kfree_skb(skb);
>> >>               }
>> >> +     queue->flush_processing_queue = true;
>> >
>> >        Probably not necessary
>> >
>>
>> If flush_backlog() is called when there are still packets in
>> processing_queue, there may be some packets that refer to the netdev
>> that is gone if we remove this line.
>
> We don't need this "processing_queue". Once you remove it, there is no
> extra work to perform.

OK. Suppose we make processing_queue a stack variable. When the quota or
jiffies limit is reached, we have to splice processing_queue back into
input_pkt_queue. If flush_backlog() is called before
processing_queue is spliced back, there will still be packets that refer to
the NIC that is going away. These packets are then queued to input_pkt_queue.
When process_backlog() is called again, the dev fields of these skbs are
wild...

Oh, my GOD. When RPS is enabled, if flush_backlog(eth0) is called on
CPU1 when a skb0(eth0) is dequeued from CPU0's softnet and isn't
queued to CPU1's softnet, what will happen?

>
>> >
>> >>
>> >
>> > I advise to keep it simple.
>> >
>> > My suggestion would be to limit this patch only to process_backlog().
>> >
>> > Really if you touch other areas, there is too much risk.
>> >
>> > Perform sort of skb_queue_splice_tail_init() into a local (stack) queue,
>> > but the trick is to not touch input_pkt_queue.qlen, so that we don't slow
>> > down enqueue_to_backlog().
>> >
>> > Process at most 'quota' skbs (or jiffies limit).
>> >
>> > relock queue.
>> > input_pkt_queue.qlen -= number_of_handled_skbs;
>> >
>>
>> Oh no, in order to let later packets in as soon as possible, we have
>> to update qlen immediately.
>>
>
> Absolutely not. You missed something apparently.
>
> You pay the price at each packet enqueue, because you have to compute
> the sum of two lengths, and guess what, if you do this you have a cache
> line miss on one of the operands. Your patch as it stands is suboptimal.
>
> Remember : this batch mode should not change packet queueing at all,
> only speed it because of less cache line misses.
>

Wow, is it really so expensive?

-- 
Regards，
Changli Gao(xiaosuo@gmail.com)
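
The batching scheme Eric outlines in the quoted text (splice into a local
queue, leave qlen alone, handle at most 'quota' packets, then subtract the
handled count once) might look roughly like the userspace sketch below. This
is an illustration under assumptions, not the kernel implementation: the
toy_* names are hypothetical, there is no real locking (the comments only
mark where a lock would be held), and the flush_backlog() race discussed in
this thread is not addressed.

```c
#include <assert.h>
#include <stddef.h>

struct toy_pkt {
	struct toy_pkt *next;
};

/* Hypothetical stand-in for the per-CPU input queue. */
struct toy_queue {
	struct toy_pkt *head, *tail;
	int qlen;	/* what the enqueue side compares against its limit */
};

void toy_enqueue(struct toy_queue *q, struct toy_pkt *p)
{
	p->next = NULL;
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	q->qlen++;
}

/* Handle at most 'quota' packets and return how many were handled. */
int toy_process_batch(struct toy_queue *q, int quota)
{
	struct toy_pkt *local, *local_tail;
	int handled = 0;

	/* [lock held] Take the whole list, but leave qlen untouched. */
	local = q->head;
	local_tail = q->tail;
	q->head = q->tail = NULL;

	/* [lock dropped] Work on the private list only. */
	while (local && handled < quota) {
		struct toy_pkt *p = local;

		local = p->next;
		p->next = NULL;		/* stands in for delivering p */
		handled++;
	}

	/* [lock held] Put leftovers back in front, then fix qlen once. */
	if (local) {
		local_tail->next = q->head;
		if (!q->tail)
			q->tail = local_tail;
		q->head = local;
	}
	q->qlen -= handled;
	return handled;
}
```

Because qlen still counts the packets sitting on the private list, the
enqueue side keeps comparing a single counter against its limit and never
has to add two lengths together.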


* Re: Strange packet drops with heavy firewalling
From: Eric Dumazet @ 2010-04-13 12:53 UTC (permalink / raw)
  To: Paweł Staszewski; +Cc: Changli Gao, Benny Amorsen, zhigang gong, netdev
In-Reply-To: <4BC464A6.9000307@itcare.pl>

On Tuesday 13 April 2010 at 14:33 +0200, Paweł Staszewski wrote:
> On 2010-04-13 01:18, Changli Gao wrote:
> > On Tue, Apr 13, 2010 at 1:06 AM, Benny Amorsen<benny+usenet@amorsen.dk>  wrote:
> >    
> >>   99:         24    1306226          3          2   PCI-MSI-edge      eth1-tx-0
> >>   100:      15735    1648774          3          7   PCI-MSI-edge      eth1-tx-1
> >>   101:          8         11          9    1083022   PCI-MSI-edge      eth1-tx-2
> >>   102:          0          0          0          0   PCI-MSI-edge      eth1-tx-3
> >>   103:         18         15       6131    1095383   PCI-MSI-edge      eth1-rx-0
> >>   104:        217         32      46544    1335325   PCI-MSI-edge      eth1-rx-1
> >>   105:        154    1305595        218         16   PCI-MSI-edge      eth1-rx-2
> >>   106:         17         16       8229    1467509   PCI-MSI-edge      eth1-rx-3
> >>   107:          0          0          1          0   PCI-MSI-edge      eth1
> >>   108:          2         14         15    1003053   PCI-MSI-edge      eth0-tx-0
> >>   109:       8226    1668924        478        487   PCI-MSI-edge      eth0-tx-1
> >>   110:          3    1188874         17         12   PCI-MSI-edge      eth0-tx-2
> >>   111:          0          0          0          0   PCI-MSI-edge      eth0-tx-3
> >>   112:        203        185       5324    1015263   PCI-MSI-edge      eth0-rx-0
> >>   113:       4141    1600793        153        159   PCI-MSI-edge      eth0-rx-1
> >>   114:      16242    1210108        436       3124   PCI-MSI-edge      eth0-rx-2
> >>   115:        267       4173      19471    1321252   PCI-MSI-edge      eth0-rx-3
> >>   116:          0          1          0          0   PCI-MSI-edge      eth0
> >>
> >>
> >> irqbalanced seems to have picked CPU1 and CPU3 for all the interrupts,
> >> which to my mind should cause the same problem as before (where CPU1 and
> >> CPU3 was handling all packets). Yet the box clearly works much better
> >> than before.
> >>      
> > irqbalanced? I don't think it can work properly. Try RPS in netdev and
> > linux-next tree, and if cpu load isn't even, try this patch:
> > http://patchwork.ozlabs.org/patch/49915/ .
> >
> >
> >    
> Yes, without irqbalance and with IRQ affinity set by hand, the router
> will work much better.
> 
> But I don't think that RPS will help him. I ran some tests with RPS
> and affinity; the results are in the attached file.
> The test router does traffic management (HFSC) for almost 9k users.

Thanks for sharing Pawel.

But obviously you are mixing apples and oranges.

Are you aware that HFSC and other traffic shapers serialize access to
their data structures? If many CPUs try to access these structures in
parallel, you get a lot of cache line misses. HFSC is a real memory hog :(

Benny does have firewalling (highly parallelized these days; iptables was
much improved in this area), but no traffic control.

Anyway, Benny now has multiqueue devices, and therefore RPS will not
help him. I suggested RPS before his move to multiqueue, and multiqueue
is the most sensible way to improve things when no central lock is
used. Every CPU can really work in parallel.





* Re: Strange packet drops with heavy firewalling
From: Paweł Staszewski @ 2010-04-13 12:33 UTC (permalink / raw)
  To: Changli Gao; +Cc: Benny Amorsen, zhigang gong, netdev
In-Reply-To: <u2y412e6f7f1004121618p6d6eff30q8a45a03faa59a912@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2348 bytes --]

On 2010-04-13 01:18, Changli Gao wrote:
> On Tue, Apr 13, 2010 at 1:06 AM, Benny Amorsen<benny+usenet@amorsen.dk>  wrote:
>    
>>   99:         24    1306226          3          2   PCI-MSI-edge      eth1-tx-0
>>   100:      15735    1648774          3          7   PCI-MSI-edge      eth1-tx-1
>>   101:          8         11          9    1083022   PCI-MSI-edge      eth1-tx-2
>>   102:          0          0          0          0   PCI-MSI-edge      eth1-tx-3
>>   103:         18         15       6131    1095383   PCI-MSI-edge      eth1-rx-0
>>   104:        217         32      46544    1335325   PCI-MSI-edge      eth1-rx-1
>>   105:        154    1305595        218         16   PCI-MSI-edge      eth1-rx-2
>>   106:         17         16       8229    1467509   PCI-MSI-edge      eth1-rx-3
>>   107:          0          0          1          0   PCI-MSI-edge      eth1
>>   108:          2         14         15    1003053   PCI-MSI-edge      eth0-tx-0
>>   109:       8226    1668924        478        487   PCI-MSI-edge      eth0-tx-1
>>   110:          3    1188874         17         12   PCI-MSI-edge      eth0-tx-2
>>   111:          0          0          0          0   PCI-MSI-edge      eth0-tx-3
>>   112:        203        185       5324    1015263   PCI-MSI-edge      eth0-rx-0
>>   113:       4141    1600793        153        159   PCI-MSI-edge      eth0-rx-1
>>   114:      16242    1210108        436       3124   PCI-MSI-edge      eth0-rx-2
>>   115:        267       4173      19471    1321252   PCI-MSI-edge      eth0-rx-3
>>   116:          0          1          0          0   PCI-MSI-edge      eth0
>>
>>
>> irqbalanced seems to have picked CPU1 and CPU3 for all the interrupts,
>> which to my mind should cause the same problem as before (where CPU1 and
>> CPU3 was handling all packets). Yet the box clearly works much better
>> than before.
>>      
> irqbalanced? I don't think it can work properly. Try RPS in netdev and
> linux-next tree, and if cpu load isn't even, try this patch:
> http://patchwork.ozlabs.org/patch/49915/ .
>
>
>    
Yes, without irqbalance and with IRQ affinity set by hand, the router
will work much better.

But I don't think that RPS will help him. I ran some tests with RPS
and affinity; the results are in the attached file.
The test router does traffic management (HFSC) for almost 9k users.
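
For anyone reading the attached file: the hex values written to rps_cpus and
smp_affinity there are CPU bitmasks, where bit N stands for CPU N counted
from 0. The small sketch below shows how such a mask is built;
toy_cpu_mask() is a hypothetical helper written for this illustration, not
part of any tool.

```c
#include <assert.h>
#include <stddef.h>

/* Build a bitmask with one bit set per CPU number listed in 'cpus'. */
unsigned long toy_cpu_mask(const int *cpus, size_t n)
{
	unsigned long mask = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(cpus[i] >= 0 && cpus[i] < 32);	/* keep the shift defined */
		mask |= 1UL << cpus[i];
	}
	return mask;
}
```

Under that reading, 000e selects CPUs 1-3, 00e0 selects CPUs 5-7, 0f selects
CPUs 0-3 and f0 selects CPUs 4-7.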





[-- Attachment #2: RPS_AFFINITY_TEST.txt --]
[-- Type: text/plain, Size: 5028 bytes --]

##############################################################################
eth0 -> CPU0
eth1 -> CPU5
RPS:
echo 00e0 > /sys/class/net/eth1/queues/rx-0/rps_cpus
echo 000e > /sys/class/net/eth0/queues/rx-0/rps_cpus

------------------------------------------------------------------------------
   PerfTop:   85205 irqs/sec  kernel:97.1% [100000 cycles],  (all, 8 CPUs)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

           214930.00 - 24.5% : _raw_spin_lock
            63844.00 -  7.3% : u32_classify
            48381.00 -  5.5% : e1000_clean
            47754.00 -  5.5% : rb_next
            37222.00 -  4.2% : e1000_intr_msi
            26295.00 -  3.0% : hfsc_enqueue
            17371.00 -  2.0% : rb_erase
            15290.00 -  1.7% : _raw_spin_lock_irqsave
            14958.00 -  1.7% : rb_insert_color
            14439.00 -  1.6% : update_vf
            14384.00 -  1.6% : e1000_xmit_frame
            14356.00 -  1.6% : hfsc_dequeue
            13804.00 -  1.6% : e1000_clean_tx_irq
            13413.00 -  1.5% : ipt_do_table
             9654.00 -  1.1% : ip_route_input

##############################################################################
eth0 -> CPU0
eth1 -> CPU5
NO RPS

------------------------------------------------------------------------------
   PerfTop:   33800 irqs/sec  kernel:96.9% [100000 cycles],  (all, 8 CPUs)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

            19361.00 - 11.2% : e1000_clean
            16424.00 -  9.5% : rb_next
            13060.00 -  7.5% : e1000_intr_msi
             7293.00 -  4.2% : u32_classify
             6875.00 -  4.0% : ipt_do_table
             5811.00 -  3.4% : _raw_spin_lock
             5754.00 -  3.3% : e1000_xmit_frame
             5671.00 -  3.3% : hfsc_dequeue
             4503.00 -  2.6% : __alloc_skb
             4156.00 -  2.4% : hfsc_enqueue
             4090.00 -  2.4% : e1000_clean_tx_irq
             3809.00 -  2.2% : e1000_clean_rx_irq
             3424.00 -  2.0% : update_vf
             3028.00 -  1.7% : rb_erase
             2714.00 -  1.6% : ip_route_input

##############################################################################
eth0 -> CPU0,CPU1,CPU2,CPU4 -> affinity echo 0f > /proc/irq/30/smp_affinity
eth1 -> CPU5,CPU6,CPU7,CPU8 -> affinity echo f0 > /proc/irq/31/smp_affinity
NO RPS
------------------------------------------------------------------------------
   PerfTop:   42362 irqs/sec  kernel:96.0% [100000 cycles],  (all, 8 CPUs)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

            33815.00 - 10.6% : rb_next
            21357.00 -  6.7% : u32_classify
            14525.00 -  4.6% : _raw_spin_lock
            14346.00 -  4.5% : e1000_clean
            12798.00 -  4.0% : hfsc_enqueue
            10526.00 -  3.3% : ipt_do_table
             9999.00 -  3.1% : hfsc_dequeue
             9976.00 -  3.1% : e1000_intr_msi
             9787.00 -  3.1% : rb_erase
             8259.00 -  2.6% : e1000_xmit_frame
             8015.00 -  2.5% : rb_insert_color
             7948.00 -  2.5% : update_vf
             6868.00 -  2.2% : e1000_clean_tx_irq
             6822.00 -  2.1% : e1000_clean_rx_irq
             6368.00 -  2.0% : __alloc_skb

##############################################################################
eth0 -> CPU0,CPU1,CPU2,CPU4 -> affinity echo 0f > /proc/irq/30/smp_affinity
eth1 -> CPU5,CPU6,CPU7,CPU8 -> affinity echo f0 > /proc/irq/31/smp_affinity
RPS:
echo 0f > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo f0 > /sys/class/net/eth1/queues/rx-0/rps_cpus
------------------------------------------------------------------------------
   PerfTop:   81051 irqs/sec  kernel:96.9% [100000 cycles],  (all, 8 CPUs)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

           167110.00 - 22.3% : _raw_spin_lock
            58221.00 -  7.8% : u32_classify
            46379.00 -  6.2% : rb_next
            35189.00 -  4.7% : e1000_clean
            25614.00 -  3.4% : e1000_intr_msi
            24094.00 -  3.2% : hfsc_enqueue
            16231.00 -  2.2% : rb_erase
            14298.00 -  1.9% : rb_insert_color
            13751.00 -  1.8% : update_vf
            13712.00 -  1.8% : ipt_do_table
            13588.00 -  1.8% : hfsc_dequeue
            13335.00 -  1.8% : e1000_xmit_frame
            12449.00 -  1.7% : e1000_clean_tx_irq
            11510.00 -  1.5% : net_tx_action
            11428.00 -  1.5% : _raw_spin_lock_irqsave


^ permalink raw reply

* Re: SO_REUSEADDR with UDP (again)
From: Eric Dumazet @ 2010-04-13 12:21 UTC (permalink / raw)
  To: Michal Svoboda; +Cc: netdev
In-Reply-To: <20100413112726.GB16595@myhost.felk.cvut.cz>

On Tuesday, 13 April 2010 at 13:27 +0200, Michal Svoboda wrote:
> Eric Dumazet wrote:
> > Why do you use REUSEADDR ? This is doing what is documented.
> > 
> >        SO_REUSEADDR
> >               Indicates that the rules used in validating addresses  supplied
> >               in  a  bind(2) call should allow reuse of local addresses.  For
> >               AF_INET sockets this means that a socket may bind, except  when
> >               there is an active listening socket bound to the address.  When
> >               the listening socket is bound to  INADDR_ANY  with  a  specific
> >               port then it is not possible to bind to this port for any local
> >               address.  Argument is an integer boolean flag.
> 
> I read it 10 times but it doesn't say anything about stealing frames, or
> implementation-defined behavior in this case.

If it is not documented, it is implementation defined.

> 
> > A UDP application wanting a port for its exclusive use doesn't set
> > REUSEADDR; otherwise it basically allows anybody to bind a UDP socket
> > to the same port, and potentially steal incoming frames.
> 
> That's fair enough, I will talk to the developers of the "very buggy"
> applications that use this flag and ask them to reconsider.

;)

>  
> > REUSEADDR is usually used when an application has several sockets bound
> > to same port, but different IP addresses (or bound to different devices)
> 
> I just tried that and you can bind to different IPs without REUSEADDR.

Of course it is possible!

REUSEADDR allows the following:

(Note that both sockets MUST have requested REUSEADDR=1)


#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
	int sock1, sock2;
	struct sockaddr_in addr;
	int on = 1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(3444);

	/* First socket: 127.0.0.1:3444 with SO_REUSEADDR set */
	sock1 = socket(AF_INET, SOCK_DGRAM, 0);
	setsockopt(sock1, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
	addr.sin_addr.s_addr = htonl(0x7f000001);
	if (bind(sock1, (struct sockaddr *)&addr, sizeof(addr)))
		perror("bind1");

	/* Second socket: same port, also with SO_REUSEADDR set */
	sock2 = socket(AF_INET, SOCK_DGRAM, 0);
	setsockopt(sock2, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
	addr.sin_addr.s_addr = htonl(INADDR_ANY); /* or htonl(0x7f000001); */
	if (bind(sock2, (struct sockaddr *)&addr, sizeof(addr)))
		perror("bind2");

	return 0;
}



If an application didn't specify REUSEADDR=1, then its UDP port is
private; it cannot be stolen.
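
A minimal sketch of that rule, assuming the Linux UDP behavior described in this thread: a second bind() to the same port only succeeds when both sockets requested SO_REUSEADDR. The helper names (`udp_bind_any`, `bound_port`, `second_bind_errno`) are hypothetical, and port 0 is used so the kernel picks a free port for the first socket.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Create a UDP socket bound to INADDR_ANY:port (port in host byte order).
 * If reuse is nonzero, SO_REUSEADDR is set before bind().
 * Returns the fd, or -1 with errno set. */
int udp_bind_any(unsigned short port, int reuse)
{
	struct sockaddr_in addr;
	int on = 1, saved;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	if (reuse && setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
		goto fail;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(port);
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto fail;
	return fd;
fail:
	saved = errno;
	close(fd);
	errno = saved;
	return -1;
}

/* Return the local port a socket is bound to, or 0 on error. */
unsigned short bound_port(int fd)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);

	if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
		return 0;
	return ntohs(addr.sin_port);
}

/* Bind one socket, then try to bind a second one to the same port.
 * Returns 0 if the second bind succeeded, its errno if it failed,
 * or -1 if the first socket could not be set up. */
int second_bind_errno(int first_reuse, int second_reuse)
{
	int a, b, err;

	a = udp_bind_any(0, first_reuse);
	if (a < 0)
		return -1;
	b = udp_bind_any(bound_port(a), second_reuse);
	err = (b < 0) ? errno : 0;
	if (b >= 0)
		close(b);
	close(a);
	return err;
}
```

Calling `second_bind_errno(0, 1)` models an application that kept its port private while a later socket asks for REUSEADDR; `second_bind_errno(1, 1)` models the case where both opted in.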

Therefore, applications should not use REUSEADDR on unicast UDP, unless
it is not a security issue (for example, if the application is able to
react to any new IP addresses added by the administrator on the machine,
and complain loudly if another application could bind() before it).

REUSEADDR has a meaning for multicast, but for unicast... this is hardly
useful?


About the connect() thing, it's also a fact that connected sockets have a
higher priority (they'll receive incoming frames; their score is higher
than that of a non-connected socket, if the source of the packet matches
the connect destination, of course). Same thing if you play with BINDTODEVICE.




^ permalink raw reply

* Re: [PATCH v2] xtables: make XT_ALIGN() usable in exported headers by exporting __ALIGN_KERNEL()
From: Patrick McHardy @ 2010-04-13 12:10 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: linux-kernel, netdev, shemminger, bhutchings, andreas, hadi,
	hideaki
In-Reply-To: <20100414125007.GB25686@x200>

Alexey Dobriyan wrote:
> On Tue, Apr 13, 2010 at 01:08:20PM +0200, Patrick McHardy wrote:
>> Alexey Dobriyan wrote:
>>> XT_ALIGN() was rewritten through ALIGN() by commit 42107f5009da223daa800d6da6904d77297ae829
>>> "netfilter: xtables: symmetric COMPAT_XT_ALIGN definition".
>>> ALIGN() is not exported in userspace headers, which created compile problem for tc(8)
>>> and will create problem for iptables(8).
>>>
>>> We can't export generic looking name ALIGN() but we can export less generic
>>> __ALIGN_KERNEL() (suggested by Ben Hutchings).
>>> Google knows nothing about __ALIGN_KERNEL().
>>>
>>> COMPAT_XT_ALIGN() changed for symmetry.
>> I've already pushed your change out, could you send me an incremental
>> fix please?
>>
>> master.kernel.org:/pub/scm/linux/kernel/git/kaber/nf-next-2.6.git
> 
> [PATCH] Restore __ALIGN_MASK()
> 
> Fix lib/bitmap.c compile failure due to __ALIGN_KERNEL changes.

Applied.

^ permalink raw reply

* Re: forcedeth driver hangs under heavy load
From: Ben Hutchings @ 2010-04-13 12:04 UTC (permalink / raw)
  To: stephen mulcahy; +Cc: Eric Dumazet, netdev, Ben Hutchings, Ayaz Abdulla, 572201
In-Reply-To: <4BC44EC8.1010104@gmail.com>

On Tue, 2010-04-13 at 12:00 +0100, stephen mulcahy wrote:
> Eric Dumazet wrote:
> > On Tuesday, 13 April 2010 at 11:03 +0100, stephen mulcahy wrote:
> >> Eric Dumazet wrote:
> >>> OK it seems forcedeth has problem with checksums ?
> >>>
> >>> Try to change "ethtool -k eth0" settings ?
> >>>
> >>> ethtool -K eth0 tso off tx off
> >> Yes, that makes an unresponsive system responsive again immediately, nice!
> >>
> >> Should the driver default to disabling this until the problem is corrected?
> >>
> >> -stephen
> > 
> > Both flags need to be disabled, or only one is OK ?
> 
> ethtool -K eth0 tx off
> 
> fixes the problem (without tso)
> 
> but running
> 
> ethtool -k eth0
> Offload parameters for eth0:
> rx-checksumming: on
> tx-checksumming: off
> scatter-gather: off
> tcp-segmentation-offload: off
> udp-fragmentation-offload: off
> generic-segmentation-offload: on
> generic-receive-offload: off
> large-receive-offload: off
> 
> seems to indicate that tso is also disabled by this - does that sound 
> correct?

That's correct - TSO requires TX offload.  What happens if you only turn
off TSO?

Ben. (wearing another hat)

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH v2] xtables: make XT_ALIGN() usable in exported headers by exporting __ALIGN_KERNEL()
From: Alexey Dobriyan @ 2010-04-13 11:50 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: linux-kernel, netdev, shemminger, bhutchings, andreas, hadi,
	hideaki
In-Reply-To: <4BC450A4.1010200@trash.net>

On Tue, Apr 13, 2010 at 01:08:20PM +0200, Patrick McHardy wrote:
> Alexey Dobriyan wrote:
> > XT_ALIGN() was rewritten through ALIGN() by commit 42107f5009da223daa800d6da6904d77297ae829
> > "netfilter: xtables: symmetric COMPAT_XT_ALIGN definition".
> > ALIGN() is not exported in userspace headers, which created compile problem for tc(8)
> > and will create problem for iptables(8).
> > 
> > We can't export generic looking name ALIGN() but we can export less generic
> > __ALIGN_KERNEL() (suggested by Ben Hutchings).
> > Google knows nothing about __ALIGN_KERNEL().
> > 
> > COMPAT_XT_ALIGN() changed for symmetry.
> 
> I've already pushed your change out, could you send me an incremental
> fix please?
> 
> master.kernel.org:/pub/scm/linux/kernel/git/kaber/nf-next-2.6.git

[PATCH] Restore __ALIGN_MASK()

Fix lib/bitmap.c compile failure due to __ALIGN_KERNEL changes.

---
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -40,6 +40,7 @@ extern const char linux_proc_banner[];
 #define STACK_MAGIC	0xdeadbeef
 
 #define ALIGN(x, a)		__ALIGN_KERNEL((x), (a))
+#define __ALIGN_MASK(x, mask)	__ALIGN_KERNEL_MASK((x), (mask))
 #define PTR_ALIGN(p, a)		((typeof(p))ALIGN((unsigned long)(p), (a)))
 #define IS_ALIGNED(x, a)		(((x) & ((typeof(x))(a) - 1)) == 0)
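
For readers unfamiliar with these helpers, here is a user-space sketch of the round-up arithmetic they conventionally perform. The patch does not show the body of `__ALIGN_KERNEL_MASK`, so the definitions below are an assumption, and the `EX_` names are hypothetical stand-ins rather than the kernel macros. The alignment must be a power of two for the mask trick to work.

```c
/* Round x up to the next value whose low bits (given by mask) are clear.
 * mask is expected to be (alignment - 1) for a power-of-two alignment. */
#define EX_ALIGN_MASK(x, mask)	(((x) + (mask)) & ~(mask))

/* Round x up to a multiple of a, where a is a power of two. */
#define EX_ALIGN(x, a)		EX_ALIGN_MASK((unsigned long)(x), (unsigned long)(a) - 1)

/* Function wrapper so the arithmetic can be called like ordinary code. */
unsigned long ex_round_up(unsigned long x, unsigned long a)
{
	return EX_ALIGN(x, a);
}
```

A value that is already aligned is returned unchanged, since adding the mask does not carry into the bits above it.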
 

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2010-04-13 11:43 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


Of note is the fix for the wireless regression reported last week, a
fix for virtio net crashes with DEBUG_SG enabled, and some minor bug
cures in other drivers.

I tried to cure a loopback TCP checksumming anomaly with dataless
packets, but couldn't get it right the first time so reverted.

It's a harmless issue (we check checksums of ACKs over loopback) and
we've had it forever, but such things should be toyed with in our
-next trees not here...

I merged in your tree to resolve some conflicts mentioned by
Stephen Rothwell.

1) Restrict WoL in igb driver to devices which can support it.
   From Stefan Assmann.

2) Add missing sg_init_table to virtio_net, from Shirley Ma.

3) e1000e can walk past the end of TX ring links when we get
   lots of interrupts, fix from Terry Loftin.

4) iwlwifi bug fixes, including the fix for the crash reported
   last week.  From Shanyu Zhao, Wey-Yi Guy, and Zhu Yi.

5) Fix RCU checking warning in mac80211, from Johannes Berg.

6) Bridging code IGMP3 report parser accesses entry pointer
   incorrectly.  Fix from Herbert Xu.

7) stmmac miscalculates resource size by hand, use resource_size()
   and get it right, from Dan Carpenter.

8) X.25 protocol can access past end of SKB, fix from John Hughes.

Please pull, thanks a lot!

The following changes since commit 0eddb519b9127c73d53db4bf3ec1d45b13f844d1:
  Linus Torvalds (1):
        Merge branch 'for-linus' of git://git.kernel.org/.../roland/infiniband

are available in the git repository at:

  master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master

Amit Kumar Salecha (1):
      qlcnic: fix set mac addr

Brice Goglin (1):
      myri10ge: fix rx_pause in myri10ge_set_pauseparam

Dan Carpenter (1):
      stmmac: use resource_size()

David S. Miller (6):
      Merge branch 'master' of git://git.kernel.org/.../linville/wireless-2.6
      Merge branch 'vhost' of git://git.kernel.org/.../mst/vhost
      tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skb
      Merge branch 'master' of git://git.kernel.org/.../linville/wireless-2.6
      Revert "tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skb"
      Merge branch 'master' of /home/davem/src/GIT/linux-2.6/

Eric Dumazet (1):
      can: avoids a false warning

Felix Fietkau (1):
      ath9k: fix double calls to ath_radio_enable

Florian Fainelli (1):
      r6040: fix r6040_multicast_list

Herbert Xu (1):
      bridge: Fix IGMP3 report parsing

Javier Cardona (1):
      mac80211: Handle mesh action frames in ieee80211_rx_h_action

Jeff Dike (1):
      vhost-net: fix vq_memory_access_ok error checking

Johannes Berg (1):
      mac80211: annotate station rcu dereferences

John Hughes (2):
      x25: Patch to fix bug 15678 - x25 accesses fields beyond end of packet.
      x.25 attempts to negotiate invalid throughput

Jorge Boncompte [DTI2] (1):
      udp: fix for unicast RX path optimization

Ken Kawasaki (1):
      smc91c92_cs: define multicast_table as unsigned char

Michael Chan (1):
      cnic: Fix crash during bnx2x MTU change.

Patrick Loschmidt (1):
      net: corrected documentation for hardware time stamping

Shanyu Zhao (1):
      iwlwifi: use consistent table for tx data collect

Shirley Ma (1):
      virtio_net: missing sg_init_table

Stefan Assmann (1):
      igb: restrict WoL for 82576 ET2 Quad Port Server Adapter

Terry Loftin (1):
      e1000e: stop cleaning when we reach tx_ring->next_to_use

Wey-Yi Guy (1):
      iwlwifi: need check for valid qos packet before free

Zhu Yi (2):
      iwlwifi: fix DMA allocation warnings
      iwlwifi: avoid Tx queue memory allocation in interface down

 Documentation/networking/timestamping.txt |   76 ++++++++++++--------
 drivers/net/cnic.c                        |   10 ++--
 drivers/net/e1000e/netdev.c               |    2 +
 drivers/net/igb/igb_ethtool.c             |    1 +
 drivers/net/igb/igb_main.c                |    1 +
 drivers/net/myri10ge/myri10ge.c           |    2 +-
 drivers/net/pcmcia/smc91c92_cs.c          |   13 ++--
 drivers/net/qlcnic/qlcnic_hw.c            |    3 +
 drivers/net/r6040.c                       |   11 +--
 drivers/net/stmmac/stmmac_main.c          |   10 ++--
 drivers/net/virtio_net.c                  |    2 +
 drivers/net/wireless/ath/ath9k/main.c     |    3 +-
 drivers/net/wireless/iwlwifi/iwl-4965.c   |   13 +++-
 drivers/net/wireless/iwlwifi/iwl-agn-rs.c |   55 +++++++--------
 drivers/net/wireless/iwlwifi/iwl-core.c   |   11 ++-
 drivers/net/wireless/iwlwifi/iwl-core.h   |    5 +-
 drivers/net/wireless/iwlwifi/iwl-tx.c     |  107 +++++++++++++++++++++++++----
 drivers/vhost/vhost.c                     |    4 +
 include/net/x25.h                         |    4 +
 net/bridge/br_multicast.c                 |    2 +-
 net/can/raw.c                             |    2 +-
 net/ipv4/udp.c                            |    4 +-
 net/ipv6/udp.c                            |    4 +-
 net/mac80211/main.c                       |    4 +-
 net/mac80211/mesh.c                       |    3 -
 net/mac80211/rx.c                         |    5 ++
 net/mac80211/sta_info.c                   |   20 ++++-
 net/x25/af_x25.c                          |   67 +++++++++++++++++--
 net/x25/x25_facilities.c                  |   27 ++++++-
 net/x25/x25_in.c                          |   15 +++-
 30 files changed, 348 insertions(+), 138 deletions(-)

^ permalink raw reply

* Re: SO_REUSEADDR with UDP (again)
From: Michal Svoboda @ 2010-04-13 11:27 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1271155163.16881.244.camel@edumazet-laptop>

[-- Attachment #1: Type: text/plain, Size: 1359 bytes --]

Eric Dumazet wrote:
> Why do you use REUSEADDR ? This is doing what is documented.
> 
>        SO_REUSEADDR
>               Indicates that the rules used in validating addresses  supplied
>               in  a  bind(2) call should allow reuse of local addresses.  For
>               AF_INET sockets this means that a socket may bind, except  when
>               there is an active listening socket bound to the address.  When
>               the listening socket is bound to  INADDR_ANY  with  a  specific
>               port then it is not possible to bind to this port for any local
>               address.  Argument is an integer boolean flag.

I read it 10 times but it doesn't say anything about stealing frames, or
implementation-defined behavior in this case.

> A UDP application wanting a port for its exclusive use doesn't set
> REUSEADDR; otherwise it basically allows anybody to bind a UDP socket
> to the same port, and potentially steal incoming frames.

That's fair enough, I will talk to the developers of the "very buggy"
applications that use this flag and ask them to reconsider.
 
> REUSEADDR is usually used when an application has several sockets bound
> to same port, but different IP addresses (or bound to different devices)

I just tried that and you can bind to different IPs without REUSEADDR.


Michal Svoboda


[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: [RFC PATCH 9/9] net: ipmr: support multiple tables
From: Patrick McHardy @ 2010-04-13 11:18 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100413.041709.99820904.davem@davemloft.net>

David Miller wrote:
> From: kaber@trash.net
> Date: Sun, 11 Apr 2010 19:37:15 +0200
> 
>> This patch adds support for multiple independent multicast routing instances,
>> named "tables".
> 
> Ok, this looks fine to me too.
> 
> Let me know when you have a final version you'd like
> me to add to net-next-2.6
> 

I'll fix up the list thing and resubmit, thanks Dave.

^ permalink raw reply

* Re: [RFC PATCH 7/9] ipv4: ipmr: convert struct mfc_cache to struct list_head
From: Patrick McHardy @ 2010-04-13 11:18 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100413.041506.177518126.davem@davemloft.net>

David Miller wrote:
> From: kaber@trash.net
> Date: Sun, 11 Apr 2010 19:37:13 +0200
> 
>> From: Patrick McHardy <kaber@trash.net>
>>
>> Signed-off-by: Patrick McHardy <kaber@trash.net>
> 
> Great, it looks like you didn't fall into most of the
> traps that usually occur during a list_head conversion :-)
> 
> But:
> 
>> -		c->next = net->ipv4.mfc_unres_queue;
>> -		net->ipv4.mfc_unres_queue = c;
>> +		list_add_tail(&c->list, &net->ipv4.mfc_unres_queue);
>  ...
>>  	write_lock_bh(&mrt_lock);
>> -	c->next = net->ipv4.mfc_cache_array[line];
>> -	net->ipv4.mfc_cache_array[line] = c;
>> +	list_add_tail(&c->list, &net->ipv4.mfc_cache_array[line]);
>>  	write_unlock_bh(&mrt_lock);
> 
> Are you sure we mean to insert to the tail here?  It looks like a head
> insertion to me beforehand, and the fact that the previous list
> iterators start at the list head pointer seems to confirm this.

I don't think it matters since each entry only exists once and
there are no ordering requirements, but I'll change it just to
make sure.

Thanks for your review so far :)

^ permalink raw reply

* Re: [RFC PATCH 9/9] net: ipmr: support multiple tables
From: David Miller @ 2010-04-13 11:17 UTC (permalink / raw)
  To: kaber; +Cc: netdev
In-Reply-To: <1271007435-20035-10-git-send-email-kaber@trash.net>

From: kaber@trash.net
Date: Sun, 11 Apr 2010 19:37:15 +0200

> This patch adds support for multiple independent multicast routing instances,
> named "tables".

Ok, this looks fine to me too.

Let me know when you have a final version you'd like
me to add to net-next-2.6


^ permalink raw reply

* Re: [RFC PATCH 8/9] net: ipmr: move mroute data into seperate structure
From: David Miller @ 2010-04-13 11:15 UTC (permalink / raw)
  To: kaber; +Cc: netdev
In-Reply-To: <1271007435-20035-9-git-send-email-kaber@trash.net>

From: kaber@trash.net
Date: Sun, 11 Apr 2010 19:37:14 +0200

> From: Patrick McHardy <kaber@trash.net>
> 
> Signed-off-by: Patrick McHardy <kaber@trash.net>

Looks fine.

^ permalink raw reply

* Re: [RFC PATCH 7/9] ipv4: ipmr: convert struct mfc_cache to struct list_head
From: David Miller @ 2010-04-13 11:15 UTC (permalink / raw)
  To: kaber; +Cc: netdev
In-Reply-To: <1271007435-20035-8-git-send-email-kaber@trash.net>

From: kaber@trash.net
Date: Sun, 11 Apr 2010 19:37:13 +0200

> From: Patrick McHardy <kaber@trash.net>
> 
> Signed-off-by: Patrick McHardy <kaber@trash.net>

Great, it looks like you didn't fall into most of the
traps that usually occur during a list_head conversion :-)

But:

> -		c->next = net->ipv4.mfc_unres_queue;
> -		net->ipv4.mfc_unres_queue = c;
> +		list_add_tail(&c->list, &net->ipv4.mfc_unres_queue);
 ...
>  	write_lock_bh(&mrt_lock);
> -	c->next = net->ipv4.mfc_cache_array[line];
> -	net->ipv4.mfc_cache_array[line] = c;
> +	list_add_tail(&c->list, &net->ipv4.mfc_cache_array[line]);
>  	write_unlock_bh(&mrt_lock);

Are you sure we mean to insert to the tail here?  It looks like a head
insertion to me beforehand, and the fact that the previous list
iterators start at the list head pointer seems to confirm this.
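
To make the head-versus-tail distinction concrete, here is a small user-space sketch of a circular doubly linked list in the style of list_head. It is not the kernel implementation; `list_node`, `add_head`, `add_tail`, `entry` and `collect_ids` are hypothetical names chosen for illustration. Head insertion mirrors the old `c->next = head; head = c;` pattern, so the newest entry is visited first, while tail insertion visits entries in the order they were added.

```c
#include <stddef.h>

struct list_node {
	struct list_node *next, *prev;
};

/* An empty list is a head node that points at itself. */
void list_init(struct list_node *head)
{
	head->next = head->prev = head;
}

/* Link n between two adjacent nodes. */
static void insert_between(struct list_node *n,
			   struct list_node *prev, struct list_node *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Insert right after the head: newest entry is iterated first. */
void add_head(struct list_node *n, struct list_node *head)
{
	insert_between(n, head, head->next);
}

/* Insert right before the head: newest entry is iterated last. */
void add_tail(struct list_node *n, struct list_node *head)
{
	insert_between(n, head->prev, head);
}

/* Example payload embedding a list node. */
struct entry {
	int id;
	struct list_node list;
};

/* Walk the list from the head and copy ids in traversal order.
 * Returns the number of ids written (at most max). */
int collect_ids(struct list_node *head, int *out, int max)
{
	int n = 0;

	for (struct list_node *p = head->next; p != head && n < max; p = p->next) {
		struct entry *e = (struct entry *)((char *)p - offsetof(struct entry, list));
		out[n++] = e->id;
	}
	return n;
}
```

If lookups never depend on ordering, both variants find the same entries; only the traversal order differs.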

^ permalink raw reply
