Netdev List
* pull request: wireless-2.6 2010-07-19
From: John W. Linville @ 2010-07-19 19:17 UTC (permalink / raw)
  To: davem; +Cc: linux-wireless, netdev, linux-kernel

Dave,

In this round we have two more-or-less-one-liners intended for 2.6.35.

The hostap fix is the third (and hopefully final) bite at the apple
for correcting an initialization failure.  The first two attempts
created and then reinstated a regression caused by a discrepancy
between the PCI and PCMCIA support within hostap.  The regression
was caused by checking the value of dev->base_addr, which the PCI
code was not setting.  Testing by the regression reporter indicates
that his device is finally working again with this fix.

The rt2x00 fix merely reorders some initialization so that unwinding
that init in an error path works as expected.
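
As a rough userspace illustration of why the ordering matters (this is
not the rt2x00 code; every name below is hypothetical): if the error
path tears down a work item unconditionally, the work item has to be
initialized before the first step that can fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a work item; not the kernel's struct work_struct. */
struct fake_work {
	bool initialized;
};

struct fake_dev {
	struct fake_work intf_work;
	bool hw_ok;		/* whether the simulated hardware probe succeeds */
};

static void fake_init_work(struct fake_work *w)
{
	w->initialized = true;
}

/*
 * Teardown touches the work item no matter how far probing got.
 * Returns false if it was asked to tear down work that was never set up.
 */
static bool fake_remove(struct fake_dev *dev)
{
	return dev->intf_work.initialized;
}

/*
 * Probe with the work item initialized first, so that the error path
 * may call fake_remove() safely. Returns 0 on success, -1 on failure.
 */
static int fake_probe(struct fake_dev *dev, bool *teardown_was_safe)
{
	assert(dev != NULL && teardown_was_safe != NULL);

	fake_init_work(&dev->intf_work);	/* must precede any failing step */

	if (!dev->hw_ok) {
		*teardown_was_safe = fake_remove(dev);
		return -1;
	}
	*teardown_was_safe = true;
	return 0;
}
```

Moving fake_init_work() below the hw_ok check would make the failing
probe hand an uninitialized work item to fake_remove().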

Please let me know if there are problems!

Thanks,

John

---

The following changes since commit 91a72a70594e5212c97705ca6a694bd307f7a26b:

  net/core: neighbour update Oops (2010-07-14 18:02:16 -0700)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git master

John W. Linville (1):
      hostap_pci: set dev->base_addr during probe

Stephen Boyd (1):
      rt2x00: Fix lockdep warning in rt2x00lib_probe_dev()

 drivers/net/wireless/hostap/hostap_pci.c |    1 +
 drivers/net/wireless/rt2x00/rt2x00dev.c  |   10 +++++-----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/hostap/hostap_pci.c b/drivers/net/wireless/hostap/hostap_pci.c
index d24dc7d..972a9c3 100644
--- a/drivers/net/wireless/hostap/hostap_pci.c
+++ b/drivers/net/wireless/hostap/hostap_pci.c
@@ -330,6 +330,7 @@ static int prism2_pci_probe(struct pci_dev *pdev,
 
 	dev->irq = pdev->irq;
 	hw_priv->mem_start = mem;
+	dev->base_addr = (unsigned long) mem;
 
 	prism2_pci_cor_sreset(local);
 
diff --git a/drivers/net/wireless/rt2x00/rt2x00dev.c b/drivers/net/wireless/rt2x00/rt2x00dev.c
index 3ae468c..f20d3ee 100644
--- a/drivers/net/wireless/rt2x00/rt2x00dev.c
+++ b/drivers/net/wireless/rt2x00/rt2x00dev.c
@@ -854,6 +854,11 @@ int rt2x00lib_probe_dev(struct rt2x00_dev *rt2x00dev)
 		    BIT(NL80211_IFTYPE_WDS);
 
 	/*
+	 * Initialize configuration work.
+	 */
+	INIT_WORK(&rt2x00dev->intf_work, rt2x00lib_intf_scheduled);
+
+	/*
 	 * Let the driver probe the device to detect the capabilities.
 	 */
 	retval = rt2x00dev->ops->lib->probe_hw(rt2x00dev);
@@ -863,11 +868,6 @@ int rt2x00lib_probe_dev(struct rt2x00_dev *rt2x00dev)
 	}
 
 	/*
-	 * Initialize configuration work.
-	 */
-	INIT_WORK(&rt2x00dev->intf_work, rt2x00lib_intf_scheduled);
-
-	/*
 	 * Allocate queue array.
 	 */
 	retval = rt2x00queue_allocate(rt2x00dev);
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply related

* Re: [RFC 1/2] netfilter: xt_condition: export list management code
From: Luciano Coelho @ 2010-07-19 19:14 UTC (permalink / raw)
  To: ext Jan Engelhardt
  Cc: Netfilter Developer Mailing List, netdev@vger.kernel.org,
	Patrick McHardy, sameo@linux.intel.com
In-Reply-To: <alpine.LSU.2.01.1007191807020.19191@obet.zrqbmnf.qr>

On Mon, 2010-07-19 at 18:13 +0200, ext Jan Engelhardt wrote:
> On Monday 2010-07-19 16:15, Luciano Coelho wrote:
> 
> >From: Luciano Coelho <coelho@testbed>
> >
> >This patch isolates and exports the condition list management code, in
> >preparation for the CONDITION target to use it.  No functional change,
> >just reorganization of the code.
> 
> Well, I guess it would make more sense if the two extensions were in a 
> single file. That would alleviate the need for export reorganizations, 
> and also works because the module metadata overhead is large already.

Right.  I'll change the code so that the two extensions are in the same
file/module.  You're the second person to mention this already. ;)


> >@@ -3,12 +3,27 @@
> > 
> > #include <linux/types.h>
> > 
> >+#define XT_CONDITION_MAX_NAME_SIZE 30
> >+
> > struct xt_condition_mtinfo {
> >-	char name[31];
> >+	char name[XT_CONDITION_MAX_NAME_SIZE + 1];
> > 	__u8 invert;
> 
> Oh noes. Please please avoid any math operations inside []. It has 
> already driven XT_FUNCTION_MAXNAMELEN into nuts ("was it now +1 or -1, 
> or even -2 that we needed to pass for various functions?"). Just let MAX 
> be 31 and have name[MAX].

Yeah, I had already done as you suggested in my previous module
(IDLETIMER), I don't know what I had in my head today when I did it
differently.  Even the name of the macro is totally wrong (_SIZE), it
would make a tiny little bit more sense if it was _LEN.  I'll change it.
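
A minimal sketch of that convention, assuming a hypothetical macro name
(the thread only settles on "let MAX be 31" and a _LEN suffix): the
macro is the full buffer size, so nothing needs a +1 or -1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name; the value 31 is the buffer size including the NUL. */
#define EXAMPLE_NAME_LEN 31

struct example_mtinfo {
	char name[EXAMPLE_NAME_LEN];	/* no arithmetic inside [] */
	unsigned char invert;
};

/*
 * Copy src into info->name, always NUL-terminated.
 * Returns 0 on success, -1 if src does not fit.
 */
static int example_set_name(struct example_mtinfo *info, const char *src)
{
	assert(info != NULL && src != NULL);

	if (strlen(src) >= sizeof(info->name))
		return -1;
	strcpy(info->name, src);	/* safe: length was checked above */
	return 0;
}
```

Callers use sizeof(info->name) or the macro directly, and the longest
accepted name is simply one byte shorter than the buffer.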


> > MODULE_ALIAS("ip6t_condition");
> > 
> >-struct condition_variable {
> >-	struct list_head list;
> >-	struct proc_dir_entry *status_proc;
> >-	unsigned int refcount;
> >-	bool enabled;
> >-};
> 
> Given your excellent usage example of a CONDITION target, I think it 
> even makes sense to enlarge the "enabled" variable to a full-fledged 
> 32-bit value that can be &, | and ^'d, similar to nfmark.

Ok, that's a good idea, I'll do that.
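
Roughly what I have in mind, as a userspace sketch only (the names and
the mask/value match are my assumptions, nothing here is in the patch
yet):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operations; the suggestion was just "&, | and ^". */
enum cond_op { COND_AND, COND_OR, COND_XOR };

struct cond_var {
	uint32_t value;		/* would replace the current bool "enabled" */
};

/* What a CONDITION target could do to the variable. */
static void cond_apply(struct cond_var *var, enum cond_op op, uint32_t operand)
{
	assert(var != NULL);

	switch (op) {
	case COND_AND:
		var->value &= operand;
		break;
	case COND_OR:
		var->value |= operand;
		break;
	case COND_XOR:
		var->value ^= operand;
		break;
	}
}

/* What the match could test: selected bits rather than a single flag. */
static int cond_match(const struct cond_var *var, uint32_t mask, uint32_t want)
{
	assert(var != NULL);
	return (var->value & mask) == want;
}
```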

Thanks for your comments!


-- 
Cheers,
Luca.


^ permalink raw reply

* Re: bnx2/5709: Strange interrupts spread
From: Michael Chan @ 2010-07-19 18:47 UTC (permalink / raw)
  To: Christophe Ngo Van Duc; +Cc: netdev@vger.kernel.org
In-Reply-To: <AANLkTikfEED5Hvt80G6EnMwPBByMuw4r1Xg2ITjuZ0eV@mail.gmail.com>


On Mon, 2010-07-19 at 08:55 -0700, Christophe Ngo Van Duc wrote:
> So I've been able to do some tests today:
> If I put the 2 interfaces in a bridge with no IP address, the interrupts
> are on 1 CPU
> If I put the 2 interfaces in a bridge with an IP address, the interrupts
> are still on 1 CPU
> If I put the 2 interfaces outside the bridge with an IP address,
> everything works fine and the interrupts get spread across the CPUs
> 
> So the conclusion seems to be that when the bnx2 is put into
> promiscuous mode by the bridge, the RSS hash stops working even if
> traffic is IP in nature.

I did a quick test with bridging and saw no problem with RSS.  I did see
this though:

br0 received packet on queue 4, but number of RX queues is 1

Looks like it is a warning message from RPS.



^ permalink raw reply

* Re: Very low latency TCP for clusters
From: Tom Herbert @ 2010-07-19 18:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1279561319.2553.153.camel@edumazet-laptop>

On Mon, Jul 19, 2010 at 10:41 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le lundi 19 juillet 2010 à 10:05 -0700, Tom Herbert a écrit :
>> We have been looking at best case TCP latencies that might be achieved
>> within a cluster (low loss fabric).  The goal is to have latency
>> numbers roughly comparable to that which can be produced using RDMA/IB
>> in a low latency configuration  (<5 usecs round trip on netperf TCP_RR
>> test with one byte data for directly connected hosts as a starting
>> point).  This would be without changing sockets API, fabric, and
>> preferably not using TCP offload or a user space stack.
>>
>> I think there are at least two techniques that will drive down TCP
>> latency: per connection queues and polling queues.  Per connection
>> queues (supported by device) should eliminate costs of connection
>> look-up, hopefully some locking.  Polling becomes viable as core
>> counts on systems increase, and burning a few CPUs for networking
>> polling on behalf of very low-latency threads would be reasonable.
>>
>> Are there any efforts in progress to integrate per connection queues
>> in the stack or integrate polling of queues?
>
> aka "net channel" ;)
>
I don't think this is the same.  I am thinking of a device that
supports multi-queue where individual queues can be programmed to
accept an exact 4-tuple, from the device's point of view I don't think
there's much beyond that and it is otherwise treated as just another
packet queue.  However, kernel may be able to use it to shortcut some
processing.  I believe such functionality is already supported in
Intel's flow director and possibly by some other vendors.
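
To make the device-side model concrete, here is a toy sketch in plain C.
It is only an illustration: the table size, the names, and the linear
scan are my assumptions, and real hardware would use an exact-match
table rather than a loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_FILTERS	4	/* hypothetical filter table size */
#define DEFAULT_QUEUE	0	/* where unmatched packets land */

struct four_tuple {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

struct queue_filter {
	int in_use;
	struct four_tuple match;
	unsigned int queue;
};

struct fake_nic {
	struct queue_filter filters[NUM_FILTERS];
};

static int tuple_equal(const struct four_tuple *a, const struct four_tuple *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}

/* Program a free filter slot. Returns 0 on success, -1 if the table is full. */
static int nic_add_filter(struct fake_nic *nic, const struct four_tuple *t,
			  unsigned int queue)
{
	size_t i;

	assert(nic != NULL && t != NULL);
	for (i = 0; i < NUM_FILTERS; i++) {
		if (!nic->filters[i].in_use) {
			nic->filters[i].in_use = 1;
			nic->filters[i].match = *t;
			nic->filters[i].queue = queue;
			return 0;
		}
	}
	return -1;
}

/* Exact 4-tuple match picks the queue; everything else is just another packet. */
static unsigned int nic_select_queue(const struct fake_nic *nic,
				     const struct four_tuple *t)
{
	size_t i;

	assert(nic != NULL && t != NULL);
	for (i = 0; i < NUM_FILTERS; i++) {
		if (nic->filters[i].in_use &&
		    tuple_equal(&nic->filters[i].match, t))
			return nic->filters[i].queue;
	}
	return DEFAULT_QUEUE;
}
```

A packet arriving on a dedicated queue already identifies its
connection, which is where the kernel could skip the lookup.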

> What a nightmare...
>
I prefer to think of it as a challenge; needing to resort to stateful
offload to get low latency would be the nightmare ;-)

> Anyway, 5 us roundtrip TCP_RR (including user thread work), seems a bit
> utopian right now.
>
> Even on loopback
>

I see about 7 usecs as best number on loopback, so I believe this is
in the ballpark.  As I mentioned above, this is about "best case" latency
of a single thread, so we assume any amount of pinning or other
customized configuration to that purpose.

>
>
>

^ permalink raw reply

* Re: Very low latency TCP for clusters
From: Nivedita Singhvi @ 2010-07-19 18:28 UTC (permalink / raw)
  To: Tom Herbert; +Cc: netdev
In-Reply-To: <AANLkTilNmNZbFWS8LF-UHU65QYIC32HZlgVZ7lXJHxPh@mail.gmail.com>

Tom Herbert wrote:
> We have been looking at best case TCP latencies that might be achieved
> within a cluster (low loss fabric).  The goal is to have latency
> numbers roughly comparable to that which can be produced using RDMA/IB
> in a low latency configuration  (<5 usecs round trip on netperf TCP_RR
> test with one byte data for directly connected hosts as a starting
> point).  This would be without changing sockets API, fabric, and
> preferably not using TCP offload or a user space stack.

Over what media are you doing this? 10gbe? gbe? Whatever numbers
I've seen for latency have been superior on IB, and I'd be very
interested in any effort to get lower latencies over other transport.

> I think there are at least two techniques that will drive down TCP
> latency: per connection queues and polling queues.  Per connection
> queues (supported by device) should eliminate costs of connection
> look-up, hopefully some locking.  Polling becomes viable as core
> counts on systems increase, and burning a few CPUs for networking
> polling on behalf of very low-latency threads would be reasonable.

Have you got any profiling data that captures where your
particular latencies are? Also, have you tried a real-time
kernel?

thanks,
Nivedita

^ permalink raw reply

* Re: [BUG net-next-2.6] vlan, bonding, bnx2 problems
From: Michael Chan @ 2010-07-19 18:14 UTC (permalink / raw)
  To: Eric Dumazet, fubar
  Cc: David Miller, pedro.netdev@dondevamos.com, netdev@vger.kernel.org,
	kaber@trash.net, bhutchings@solarflare.com
In-Reply-To: <1279545854.2553.37.camel@edumazet-laptop>

Adding Jay to CC.

On Mon, 2010-07-19 at 06:24 -0700, Eric Dumazet wrote:
> [   32.046479] BUG: scheduling while atomic: ifenslave/4586/0x00000100
> [   32.046540] Modules linked in: ipmi_si ipmi_msghandler hpilo
> bonding ipv6
> [   32.046784] Pid: 4586, comm: ifenslave Tainted: G        W
> 2.6.35-rc1-01453-g3e12451-dirty #836
> [   32.046860] Call Trace:
> [   32.046910]  [<c13421c4>] ? printk+0x18/0x1c
> [   32.046965]  [<c10315c9>] __schedule_bug+0x59/0x60
> [   32.047019]  [<c1342a2c>] schedule+0x57c/0x850
> [   32.047074]  [<c104a106>] ? lock_timer_base+0x26/0x50
> [   32.047128]  [<c1342f78>] schedule_timeout+0x118/0x250
> [   32.047183]  [<c104a2c0>] ? process_timeout+0x0/0x10
> [   32.047238]  [<c13430c5>] schedule_timeout_uninterruptible
> +0x15/0x20
> [   32.047295]  [<c104a345>] msleep+0x15/0x20
> [   32.047350]  [<c1227082>] bnx2_napi_disable+0x52/0x80
> [   32.047405]  [<c122b56f>] bnx2_netif_stop+0x3f/0xa0
> [   32.047460]  [<c122b62a>] bnx2_vlan_rx_register+0x5a/0x80
> [   32.047516]  [<f8ced776>] bond_enslave+0x526/0xa90 [bonding]
> [   32.047576]  [<f8b8f0d0>] ? fib6_clean_node+0x0/0xb0 [ipv6]
> [   32.047634]  [<f8b8dda0>] ? fib6_age+0x0/0x90 [ipv6]
> [   32.047689]  [<c129d2d3>] ? netdev_set_master+0x3/0xc0
> [   32.047746]  [<f8cee4cb>] bond_do_ioctl+0x31b/0x430 [bonding]
> [   32.047804]  [<c105b19a>] ? raw_notifier_call_chain+0x1a/0x20
> [   32.047861]  [<c12abd5d>] ? __rtnl_unlock+0xd/0x10
> [   32.047915]  [<c129f8cd>] ? __dev_get_by_name+0x7d/0xa0
> [   32.047970]  [<c12a19b0>] dev_ifsioc+0xf0/0x290
> [   32.048025]  [<f8cee1b0>] ? bond_do_ioctl+0x0/0x430 [bonding]
> [   32.048081]  [<c12a1ce1>] dev_ioctl+0x191/0x610
> [   32.048136]  [<c12eeb20>] ? udp_ioctl+0x0/0x70
> [   32.048189]  [<c128f67c>] sock_ioctl+0x6c/0x240
> [   32.048243]  [<c10d3a44>] vfs_ioctl+0x34/0xa0
> [   32.048297]  [<c10c7cab>] ? alloc_file+0x1b/0xa0
> [   32.048351]  [<c128f610>] ? sock_ioctl+0x0/0x240
> [   32.048404]  [<c10d4186>] do_vfs_ioctl+0x66/0x550
> [   32.048459]  [<c1022ca0>] ? do_page_fault+0x0/0x350
> [   32.048513]  [<c1022e41>] ? do_page_fault+0x1a1/0x350
> [   32.048568]  [<c129098c>] ? sys_socket+0x5c/0x70
> [   32.048622]  [<c1291860>] ? sys_socketcall+0x60/0x270
> [   32.048677]  [<c10d46a9>] sys_ioctl+0x39/0x60
> [   32.048730]  [<c1002bd0>] sysenter_do_call+0x12/0x26
> [   32.052025] bonding: bond0: enslaving eth1 as a backup interface
> with a down link.
> [   32.100207] tg3 0000:14:04.0: PME# enabled
> [   32.100222]  pci0000:00: wake-up capability enabled by ACPI
> [   32.224488]  pci0000:00: wake-up capability disabled by ACPI
> [   32.224492] tg3 0000:14:04.0: PME# disabled
> [   32.348516] tg3 0000:14:04.0: BAR 0: set to [mem
> 0xfdff0000-0xfdffffff 64bit] (PCI address [0xfdff0000-0xfdffffff]
> [   32.348524] tg3 0000:14:04.0: BAR 2: set to [mem
> 0xfdfe0000-0xfdfeffff 64bit] (PCI address [0xfdfe0000-0xfdfeffff]
> [   32.363711] bonding: bond0: enslaving eth2 as a backup interface
> with a down link.
> 
> 
> 
> For bnx2, it seems commit 212f9934afccf9c9739921
> was not sufficient to correct the "scheduling while atomic" bug...
> enslaving a bnx2 on a bond device with one vlan already set :
>  bond_enslave -> bnx2_vlan_rx_register -> bnx2_netif_stop ->
> bnx2_napi_disable -> msleep()
> 

There are a number of drivers that call napi_disable() during
->ndo_vlan_rx_register().  bnx2 is lockless in the rx path and so we
need to disable NAPI rx processing and wait for it to be done before
modifying the vlgrp.

Jay, is there an alternative to holding the bond->lock when calling the
slave's ->ndo_vlan_rx_register()?



^ permalink raw reply

* Re: Very low latency TCP for clusters
From: Rick Jones @ 2010-07-19 18:13 UTC (permalink / raw)
  To: Tom Herbert; +Cc: netdev
In-Reply-To: <AANLkTilNmNZbFWS8LF-UHU65QYIC32HZlgVZ7lXJHxPh@mail.gmail.com>

Tom Herbert wrote:
> We have been looking at best case TCP latencies that might be achieved
> within a cluster (low loss fabric).  The goal is to have latency
> numbers roughly comparable to that which can be produced using RDMA/IB
> in a low latency configuration  (<5 usecs round trip on netperf TCP_RR
> test with one byte data for directly connected hosts as a starting
> point).  This would be without changing sockets API, fabric, and
> preferably not using TCP offload or a user space stack.
> 
> I think there are at least two techniques that will drive down TCP
> latency: per connection queues and polling queues.  Per connection
> queues (supported by device) should eliminate costs of connection
> look-up, hopefully some locking.  Polling becomes viable as core
> counts on systems increase, and burning a few CPUs for networking
> polling on behalf of very low-latency threads would be reasonable.

Likely preaching to the choir - but "just so long as it doesn't give the 
system's coherence fits."  Every now and again there are things stuck into the
idle loop of various OSes on the premise that it is only burning cycles on that
idle core, but it ends up trashing cache lines and/or the memory subsystem and so
drags down other cores.

Just how close to even 5 usecs/tran is the service demand on a TCP_RR test now? 
  The best I've seen for a 10GbE NIC under SLES11 SP1 (sorry, not latest 
upstream) has been 10-12.6 usec/tran, but the range went as high as 20 or more - 
depended on where netperf/netserver were running relative to the interrupt CPU:

ftp://ftp.netperf.org/netperf/misc/dl380g6_X5560_sles11sp1_ad386a_cxgb3_1.1.3-ko_b2b_to_same_1500mtu_20100602.csv
ftp://ftp.netperf.org/netperf/misc/dl380g6_X5560_sles11sp1_nc550_be2net_2.102.147s_b2b_to_same_1500mtu_20100520.csv

Getting rid of connection lookup and some locking will no doubt be necessary, 
but I suspect there will be a lot more to it as well.  Quite a few sacred 
path-length cows may have to be slaughtered along the way to get the service 
demand << 5 microseconds to allow the < 5 usec RTT.

happy benchmarking,

rick jones

^ permalink raw reply

* Re: Very low latency TCP for clusters
From: Eric Dumazet @ 2010-07-19 17:41 UTC (permalink / raw)
  To: Tom Herbert; +Cc: netdev
In-Reply-To: <AANLkTilNmNZbFWS8LF-UHU65QYIC32HZlgVZ7lXJHxPh@mail.gmail.com>

Le lundi 19 juillet 2010 à 10:05 -0700, Tom Herbert a écrit :
> We have been looking at best case TCP latencies that might be achieved
> within a cluster (low loss fabric).  The goal is to have latency
> numbers roughly comparable to that which can be produced using RDMA/IB
> in a low latency configuration  (<5 usecs round trip on netperf TCP_RR
> test with one byte data for directly connected hosts as a starting
> point).  This would be without changing sockets API, fabric, and
> preferably not using TCP offload or a user space stack.
> 
> I think there are at least two techniques that will drive down TCP
> latency: per connection queues and polling queues.  Per connection
> queues (supported by device) should eliminate costs of connection
> look-up, hopefully some locking.  Polling becomes viable as core
> counts on systems increase, and burning a few CPUs for networking
> polling on behalf of very low-latency threads would be reasonable.
> 
> Are there any efforts in progress to integrate per connection queues
> in the stack or integrate polling of queues?

aka "net channel" ;)

What a nightmare...

Anyway, 5 us roundtrip TCP_RR (including user thread work), seems a bit
utopian right now.

Even on loopback




^ permalink raw reply

* Re: [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue
From: Eric Dumazet @ 2010-07-19 17:39 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Lennart Schulte, David Miller, Tejun Heo, lkml,
	netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert
In-Reply-To: <alpine.DEB.2.00.1007192013010.30181@melkinpaasi.cs.helsinki.fi>

Le lundi 19 juillet 2010 à 20:25 +0300, Ilpo Järvinen a écrit :

> This difference is well thought and intentional, I didn't use different 
> one by accident. We want to make sure we won't use NULL from 
> tcp_write_queue_head() while the pre 08ebd1721ab8fd3 kernels were 
> interested mainly in whether the first loop should run or not (and of course 
> end up avoiding the null deref too but it's more of an optimization 
> there, i.e., if there are no lost packets there is no work to do). The deref 
> could have been fixed by moving TCP_SKB_CB(skb)->sacked a bit later but 
> that would again make us depend on the side-effect of the send_head check 
> (in the case of packets_out being zero and wq empty) which is something I 
> don't like too much.
> 

Thanks Ilpo.

Do you know in what exact circumstance the bug triggers ?

It's hard to believe thousands of machines on the Internet never hit
it :(

Maybe another problem in congestion control ?

^ permalink raw reply

* Re: Very low latency TCP for clusters
From: David Miller @ 2010-07-19 17:35 UTC (permalink / raw)
  To: therbert; +Cc: netdev
In-Reply-To: <AANLkTilNmNZbFWS8LF-UHU65QYIC32HZlgVZ7lXJHxPh@mail.gmail.com>

From: Tom Herbert <therbert@google.com>
Date: Mon, 19 Jul 2010 10:05:19 -0700

> Per connection queues (supported by device) should eliminate costs
> of connection look-up, hopefully some locking.

What are these per-connection queues exactly?

Is it like GRO and just accumulates in-order packets for a flow?

Or is it something more like Jacobson's net channels?

If it's the former, we have it already.  If it's the
latter we've found it to be utterly impractical due to all
of the facilities we have which live between the device
and the socket layer (netfilter, packet scheduler, IPSEC,
etc.)

^ permalink raw reply

* Re: [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue
From: Ilpo Järvinen @ 2010-07-19 17:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Lennart Schulte, David Miller, Tejun Heo, lkml,
	netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert
In-Reply-To: <1279548555.2553.51.camel@edumazet-laptop>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2153 bytes --]

On Mon, 19 Jul 2010, Eric Dumazet wrote:

> Le lundi 19 juillet 2010 à 14:16 +0300, Ilpo Järvinen a écrit :
> 
> > Thanks for testing.
> > 
> > DaveM, I think this oops was introduced for 2.6.28 (in 
> > 08ebd1721ab8fd362e90ae17b461c07b23fa2824 it seems, to be exact) so to 
> > stables it should go too please. I've only tweaked the message (so no need 
> > for Lennart to retest v2 :-)).
> > 
> > --
> > [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue
> > 
> > It can happen that there are no packets in queue while calling
> > tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns
> > NULL and that gets deref'ed to get sacked into a local var.
> > 
> > There is no work to do if no packets are outstanding so we just
> > exit early.
> > 
> > This oops was introduced by 08ebd1721ab8fd (tcp: remove tp->lost_out
> > guard to make joining diff nicer).
> > 
> 
> But prior to commit 08ebd1721ab8fd3, we were not testing
> tp->packets_out, but tp->lost_out

That's right, but back then we were not testing it for the same purpose.
 
> if it was 0, we were not doing the tcp_for_write_queue_from() loop.

This invariant _should_ be true all the time:
 lost_out <= packets_out

...and if it's not we would get Leak printouts every now and then. Thus if 
packets_out is zero there is no NULL deref with the lost_out check either. The other 
loop too (in pre 08eb kernels) will work because of earlier mentioned 
send_head check side-effects.

> Not sure it makes a difference ?

This difference is well thought and intentional, I didn't use different 
one by accident. We want to make sure we won't use NULL from 
tcp_write_queue_head() while the pre 08ebd1721ab8fd3 kernels were 
interested mainly in whether the first loop should run or not (and of course 
end up avoiding the null deref too but it's more of an optimization 
there, i.e., if there are no lost packets there is no work to do). The deref 
could have been fixed by moving TCP_SKB_CB(skb)->sacked a bit later but 
that would again make us depend on the side-effect of the send_head check 
(in the case of packets_out being zero and wq empty) which is something I 
don't like too much.
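
To spell the idea out away from the real code, here is a toy model in
plain C. The struct layouts, the flag value, and the function name are
made up for illustration; only the shape of the packets_out guard
mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_LOST 0x04u		/* hypothetical "segment is lost" flag */

struct fake_skb {
	unsigned int sacked;
	struct fake_skb *next;
};

struct fake_sock {
	struct fake_skb *write_queue_head;	/* NULL when the queue is empty */
	unsigned int packets_out;
	unsigned int lost_out;			/* invariant: lost_out <= packets_out */
};

/* Count the segments a retransmit pass would consider lost. */
static unsigned int fake_count_lost(const struct fake_sock *sk)
{
	const struct fake_skb *skb;
	unsigned int lost = 0;

	assert(sk != NULL);
	assert(sk->lost_out <= sk->packets_out);

	/* Nothing outstanding: exit before the queue head is ever touched. */
	if (!sk->packets_out)
		return 0;

	skb = sk->write_queue_head;
	do {
		/* First iteration dereferences the head unconditionally. */
		if (skb->sacked & FAKE_LOST)
			lost++;
		skb = skb->next;
	} while (skb != NULL);

	return lost;
}
```

Testing packets_out states directly that the head is non-NULL; testing
lost_out would reach the same result only through the invariant.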

-- 
 i.

^ permalink raw reply

* Re: Raise initial congestion window size / speedup slow start?
From: Rick Jones @ 2010-07-19 17:08 UTC (permalink / raw)
  To: H.K. Jerry Chu
  Cc: Patrick McManus, David Miller, davidsen, lists, linux-kernel,
	netdev
In-Reply-To: <AANLkTil_c-TH6k2BDW2r5c0HYXFxiu85aMda1bT0nJt3@mail.gmail.com>

H.K. Jerry Chu wrote:
> On Fri, Jul 16, 2010 at 10:01 AM, Patrick McManus <mcmanus@ducksong.com> wrote:
>>can you tell us more about the impl concerns of initcwnd stored on the
>>route?
> 
> 
> We have found two issues when altering initcwnd through the ip route cmd:
> 1. initcwnd is actually capped by sndbuf (i.e., tcp_wmem[1], which is
defaulted to a small value of 16KB). This problem has been obscured
> by the TSO code, which fudges the flow control limit (and could be a bug by
> itself).

I'll ask my Emily Litella question of the day and inquire as to why that would 
be unique to altering initcwnd via the route?

The slightly less Emily Litella-esque question is why an application with a 
desire to know it could send more than 16K at one time wouldn't have either 
asked via its install docs to have the minimum tweaked (certainly if one is 
already tweaking routes...), or "gone all the way" and made an explicit 
setsockopt(SO_SNDBUF) call?  We are in a realm of applications for which there 
was a proposal to allow them to pick their own initcwnd right?  Having them pick 
an SO_SNDBUF size would seem to be no more to ask.
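
For reference, the explicit request is only a few lines. This is a
sketch with a made-up helper name; the setsockopt()/getsockopt() calls
themselves are the standard sockets API.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Ask for a send buffer of `bytes` on an existing socket.
 * Returns the size the kernel reports afterwards, or -1 on error.
 * The reported size may differ from the request: Linux, for one,
 * doubles the value to leave room for bookkeeping overhead and
 * caps it at the wmem_max sysctl.
 */
static int request_sndbuf(int fd, int bytes)
{
	int actual = 0;
	socklen_t len = sizeof(actual);

	assert(bytes > 0);

	if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
		return -1;
	if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len) < 0)
		return -1;
	return actual;
}
```

An application wanting to send, say, 64KB in its first burst would call
request_sndbuf(fd, 65536) before connecting.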

rick jones

sendbuf_init = max(tcp_mem,initcwnd)?

^ permalink raw reply

* Very low latency TCP for clusters
From: Tom Herbert @ 2010-07-19 17:05 UTC (permalink / raw)
  To: netdev

We have been looking at best case TCP latencies that might be achieved
within a cluster (low loss fabric).  The goal is to have latency
numbers roughly comparable to that which can be produced using RDMA/IB
in a low latency configuration  (<5 usecs round trip on netperf TCP_RR
test with one byte data for directly connected hosts as a starting
point).  This would be without changing sockets API, fabric, and
preferably not using TCP offload or a user space stack.

I think there are at least two techniques that will drive down TCP
latency: per connection queues and polling queues.  Per connection
queues (supported by device) should eliminate costs of connection
look-up, hopefully some locking.  Polling becomes viable as core
counts on systems increase, and burning a few CPUs for networking
polling on behalf of very low-latency threads would be reasonable.

Are there any efforts in progress to integrate per connection queues
in the stack or integrate polling of queues?

Thanks,
Tom

^ permalink raw reply

* Re: [PATCH 2.6.35-rc1] net-next: vmxnet3 fixes [4/5] Do not reset when the device is not opened
From: Shreyas Bhatewara @ 2010-07-19 17:02 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	pv-drivers@vmware.com, Ronghua Zhang, Matthieu Bucchianeri
In-Reply-To: <20100717.163538.137865264.davem@davemloft.net>



On Sat, 17 Jul 2010, David Miller wrote:

> From: Shreyas Bhatewara <sbhatewara@vmware.com>
> Date: Fri, 16 Jul 2010 01:17:29 -0700 (PDT)
> 
> > 
> > 
> > On Thu, 15 Jul 2010, David Miller wrote:
> > 
> >> From: Shreyas Bhatewara <sbhatewara@vmware.com>
> >> Date: Thu, 15 Jul 2010 18:20:52 -0700 (PDT)
> >> 
> >> > Is this what you suggest :
> >> > 
> >> > ---
> >> > 
> >> > Hold rtnl_lock to get the right link state.
> >> 
> >> It ought to work, but make sure that it is legal to take the
> >> RTNL semaphore in all contexts in which this code block
> >> might be called.
> >> 
> > 
> > This code block is called only from the workqueue handler, which runs in
> > process context, so it is legal to take rtnl semaphore.
> > Tested this code by simulating event interrupts (which schedule this 
> > code) at considerable frequency while the interface was brought up and
> > down in a loop. Similar stress testing had revealed the bug originally. 
> 
> Awesome, please submit this formally.  The copy you sent lacked a commit
> message and signoff.
> 

Reposting the patch formally.

David,
Thanks for your cooperation.

->Shreyas

---
From: Shreyas Bhatewara <sbhatewara@vmware.com>

Hold rtnl_lock to get the right link state.

While asynchronously resetting the device, hold rtnl_lock to get the
right value from netif_running. If a reset is scheduled, and the device
goes thru close and open, it may happen that reset and open may run in
parallel. Holding rtnl_lock will avoid this.

Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>

---

 drivers/net/vmxnet3/vmxnet3_drv.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 1b0ce8c..c4d7e42 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -2420,6 +2420,7 @@ vmxnet3_reset_work(struct work_struct *data)
 		return;
 
 	/* if the device is closed, we must leave it alone */
+	rtnl_lock();
 	if (netif_running(adapter->netdev)) {
 		printk(KERN_INFO "%s: resetting\n", adapter->netdev->name);
 		vmxnet3_quiesce_dev(adapter);
@@ -2428,6 +2429,7 @@ vmxnet3_reset_work(struct work_struct *data)
 	} else {
 		printk(KERN_INFO "%s: already closed\n", adapter->netdev->name);
 	}
+	rtnl_unlock();
 
 	clear_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state);
 }
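
For readers following the thread, the locking pattern can be sketched in
userspace as below. This is an analogy only: a pthread mutex stands in
for the RTNL semaphore, a flag stands in for netif_running(), and all
names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_adapter {
	pthread_mutex_t cfg_lock;	/* stand-in for the RTNL semaphore */
	bool running;			/* stand-in for netif_running() */
	unsigned int resets;
};

static void fake_adapter_init(struct fake_adapter *a)
{
	assert(a != NULL);
	pthread_mutex_init(&a->cfg_lock, NULL);
	a->running = false;
	a->resets = 0;
}

/* Open and close change the state only while holding the lock. */
static void fake_set_running(struct fake_adapter *a, bool running)
{
	pthread_mutex_lock(&a->cfg_lock);
	a->running = running;
	pthread_mutex_unlock(&a->cfg_lock);
}

static void fake_open(struct fake_adapter *a)
{
	fake_set_running(a, true);
}

static void fake_close(struct fake_adapter *a)
{
	fake_set_running(a, false);
}

/*
 * Deferred reset: the state check and the reset happen under the same
 * lock, so an open or close cannot slip in between them.
 * Returns true if a reset was performed.
 */
static bool fake_reset_work(struct fake_adapter *a)
{
	bool did_reset = false;

	pthread_mutex_lock(&a->cfg_lock);
	if (a->running) {
		a->resets++;		/* quiesce, reset, reactivate */
		did_reset = true;
	}
	pthread_mutex_unlock(&a->cfg_lock);
	return did_reset;
}
```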

^ permalink raw reply related

* RE: [PATCH] Export SMBIOS provided firmware instance and label to sysfs
From: Narendra_K @ 2010-07-19 16:54 UTC (permalink / raw)
  To: Narendra_K, greg
  Cc: netdev, linux-hotplug, linux-pci, Matt_Domsch, Charles_Rose,
	Jordan_Hargrave, Vijay_Nijhawan
In-Reply-To: <20100714121345.GA20411@auslistsprd01.us.dell.com>

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Narendra K
> Sent: Wednesday, July 14, 2010 5:44 PM
> To: greg@kroah.com
> Cc: netdev@vger.kernel.org; linux-hotplug@vger.kernel.org; linux-
> pci@vger.kernel.org; Domsch, Matt; Rose, Charles; Hargrave, Jordan;
> Nijhawan, Vijay
> Subject: Re: [PATCH] Export SMBIOS provided firmware instance and label
> to sysfs
> 
> 
> V1 -> V2:
> 
> 1. The 'smbios_attr' buffer is not being used as mentioned above
> 
> 2. The function 'smbios_instance_string_exist' is split into two
> functions, the other being 'find_smbios_instance_string' which would
> print the result into the sysfs provided 'buf' of associated device.
> The function 'smbios_instance_string_exist' would let us know if the
> label exists or not.
> 
> Please find the patch with above changes here -
> 
> From: Narendra K <narendra_k@dell.com>
> Subject: [PATCH] Export SMBIOS provided firmware instance and label to
> sysfs
> 

Greg,

Thanks for the review comments. 

This version of the patch has all the suggestions incorporated. Please
let us know if there are any concerns. If the approach is acceptable,
please consider this patch for inclusion.

With regards,
Narendra K

^ permalink raw reply

* Re: [0/8] netpoll/bridge fixes
From: Eric Dumazet @ 2010-07-19 16:52 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, mst, shemminger, frzhang, netdev, amwang, mpm
In-Reply-To: <20100719.090503.73693858.davem@davemloft.net>

Le lundi 19 juillet 2010 à 09:05 -0700, David Miller a écrit :

> I thought we did that already.... oh I see, we did it for bonding:
> 
> commit c22d7ac844f1cb9c6a5fd20f89ebadc2feef891b
> Author: Andy Gospodarek <andy@greyhouse.net>
> Date:   Fri Jun 25 09:50:44 2010 +0000
> 

BTW, this added the following warning:


[PATCH] bonding: avoid a warning

drivers/net/bonding/bond_main.c:179:12: warning: ‘disable_netpoll’
defined but not used

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 8228088..20f45cb 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -176,7 +176,9 @@ static int arp_ip_count;
 static int bond_mode	= BOND_MODE_ROUNDROBIN;
 static int xmit_hashtype = BOND_XMIT_POLICY_LAYER2;
 static int lacp_fast;
+#ifdef CONFIG_NET_POLL_CONTROLLER
 static int disable_netpoll = 1;
+#endif
 
 const struct bond_parm_tbl bond_lacp_tbl[] = {
 {	"slow",		AD_LACP_SLOW},



^ permalink raw reply related

* Re: Are concurrent calls to tc action ipt safe?
From: Jan Engelhardt @ 2010-07-19 16:44 UTC (permalink / raw)
  To: Gerd v. Egidy; +Cc: netfilter-devel, netdev
In-Reply-To: <201007191623.40423.lists@egidy.de>

On Monday 2010-07-19 16:23, Gerd v. Egidy wrote:
>AFAIK, current iptables has a short race condition when two rules within the 
>same table are changed at once.
>
>E.g. when two users simultaneously call something like this
>iptables -t filter -A INPUT -s 192.168.1.1 -j MARK --set-mark 1
>and
>iptables -t filter -A INPUT -s 192.168.1.2 -j MARK --set-mark 2
>one of these entries can get lost.

There are many serialization techniques possible to serialize iptables 
execution.

>tc filter add dev eth0 parent ffff: protocol ip prio 1 u32  \
>match ip src 192.168.1.1 \
>action ipt -j MARK --set-mark 1
>
>Since this call uses the xtables targets I'm currently not sure if the same 
>problem regarding concurrent changes exists or not. Can anyone tell me if 
>concurrent calls like this are safe?

This target invocation is not in any table, thus there is no race 
condition.


^ permalink raw reply

* Re: [PATCH net-next-2.6] net: 64bit stats for netdev_queue
From: David Miller @ 2010-07-19 16:35 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1279546422.2553.45.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 19 Jul 2010 15:33:42 +0200

> Since struct netdev_queue tx_bytes/tx_packets/tx_dropped are already
> protected by _xmit_lock, it's easy to convert these fields to u64 instead
> of unsigned long.
> This completes 64bit stats for devices using them (vlan, macvlan, ...)
> 
> Strictly, we could avoid the locking in dev_txq_stats_fold() on 64bit
> arches, but it's a slow path and we prefer to keep it simple.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks a lot Eric.
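
As a rough illustration of why the wider type matters: on 32-bit arches
unsigned long is 32 bits, so a byte counter wraps after 4 GiB. The sketch
below uses uint32_t as a stand-in for a 32-bit unsigned long; the struct
and function names are made up for the example and are not kernel code.

```c
#include <stdint.h>

/* Stand-in for a counter stored as unsigned long on a 32-bit arch. */
struct demo_stats32 {
	uint32_t tx_bytes;
};

/* Same counter after conversion to a 64-bit field. */
struct demo_stats64 {
	uint64_t tx_bytes;
};

/* Account one transmitted packet of len bytes; wraps modulo 2^32. */
static void demo_account32(struct demo_stats32 *s, uint32_t len)
{
	s->tx_bytes += len;
}

/* Account one transmitted packet of len bytes; no wrap in practice. */
static void demo_account64(struct demo_stats64 *s, uint32_t len)
{
	s->tx_bytes += len;
}
```

Starting both counters just below 4 GiB and adding one 1500-byte packet
leaves the 32-bit counter at a small value while the 64-bit one keeps
counting.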

^ permalink raw reply

* Re: [BUG net-next-2.6] vlan, bonding, bnx2 problems
From: David Miller @ 2010-07-19 16:35 UTC (permalink / raw)
  To: eric.dumazet; +Cc: mchan, pedro.netdev, netdev, kaber, bhutchings
In-Reply-To: <1279545854.2553.37.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 19 Jul 2010 15:24:14 +0200

> [RFC net-next-2.6] bonding: fix bond_inet6addr_event() 
> 
> After commit ad1afb0039391 (vlan_dev: VLAN 0 should be treated
> as "no vlan tag" (802.1p packet)),
> bond_inet6addr_event() might be called with a NULL bond->vlgrp pointer, and
> a non empty bond->vlan_list. vlan_group_get_device() is dereferencing a NULL pointer.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

I'll apply this bandaid for now, but yes we need to think more
deeply about this.
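
For readers following along, the failure mode described in the quoted
changelog (a non-empty VLAN list paired with a NULL group pointer) can be
modelled in userspace as below. This is only a sketch of the guard
pattern; the demo_* types and function are hypothetical and this is not
the patch that was applied.

```c
#include <stddef.h>

#define DEMO_VLAN_SLOTS 8

/* Hypothetical stand-in for a VLAN group: one device id per slot. */
struct demo_vlan_group {
	int devices[DEMO_VLAN_SLOTS];
};

/* Hypothetical stand-in for the bond: the group pointer may be NULL
 * even though has_vlans says the VLAN list is not empty. */
struct demo_bond {
	struct demo_vlan_group *vlgrp;
	int has_vlans;
};

/*
 * Look up the device id for a slot. Returns -1 instead of
 * dereferencing a NULL group or indexing out of range.
 */
static int demo_vlan_lookup(const struct demo_bond *bond, unsigned int slot)
{
	if (!bond->vlgrp)
		return -1;
	if (slot >= DEMO_VLAN_SLOTS)
		return -1;
	return bond->vlgrp->devices[slot];
}
```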

^ permalink raw reply

* Re: [PATCH] s2io: Remove unnecessary memset of netdev private data
From: David Miller @ 2010-07-19 16:28 UTC (permalink / raw)
  To: tklauser; +Cc: netdev, kernel-janitors
In-Reply-To: <1279529758-7901-1-git-send-email-tklauser@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Mon, 19 Jul 2010 10:55:58 +0200

> The memory for the private data is allocated using kzalloc in
> alloc_etherdev (or alloc_netdev_mq respectively) so there is no need to
> set it to 0 again.
> 
> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

Applied, thanks.
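
The same reasoning can be shown with a userspace analogue, using calloc()
in place of kzalloc(): the allocator already hands back zero-filled
memory, so a following memset() to 0 changes nothing. The struct and
helper names below are invented for the example.

```c
#include <stdlib.h>

/* Hypothetical private-data layout, for illustration only. */
struct demo_priv {
	int flags;
	char name[16];
	unsigned long counters[4];
};

/* calloc() returns zero-filled memory, much like kzalloc() does. */
static struct demo_priv *demo_alloc_priv(void)
{
	return calloc(1, sizeof(struct demo_priv));
}

/* Returns 1 if every byte in [p, p + len) is zero, else 0. */
static int demo_is_zeroed(const void *p, size_t len)
{
	const unsigned char *b = p;
	size_t i;

	for (i = 0; i < len; i++)
		if (b[i])
			return 0;
	return 1;
}
```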

^ permalink raw reply

* Re: [RFC 1/2] netfilter: xt_condition: export list management code
From: Jan Engelhardt @ 2010-07-19 16:13 UTC (permalink / raw)
  To: Luciano Coelho
  Cc: Netfilter Developer Mailing List, netdev, Patrick McHardy, sameo
In-Reply-To: <1279548947-10470-2-git-send-email-luciano.coelho@nokia.com>

On Monday 2010-07-19 16:15, Luciano Coelho wrote:

>From: Luciano Coelho <coelho@testbed>
>
>This patch isolates and exports the condition list management code, in
>preparation for the CONDITION target to use it.  No functional change,
>just reorganization of the code.

Well, I guess it would make more sense if the two extensions were in a 
single file. That would alleviate the need for export reorganizations, 
and also works because the module metadata overhead is large already.

>@@ -3,12 +3,27 @@
> 
> #include <linux/types.h>
> 
>+#define XT_CONDITION_MAX_NAME_SIZE 30
>+
> struct xt_condition_mtinfo {
>-	char name[31];
>+	char name[XT_CONDITION_MAX_NAME_SIZE + 1];
> 	__u8 invert;

Oh noes. Please please avoid any math operations inside []. It has 
already driven XT_FUNCTION_MAXNAMELEN into nuts ("was it now +1 or -1, 
or even -2 that we needed to pass for various functions?"). Just let MAX 
be 31 and have name[MAX].
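
A small sketch of that convention, with the macro naming the full buffer
size (trailing NUL included) so that callers can bound copies with
sizeof() and never add or subtract 1. The DEMO_* and demo_* names are
placeholders for illustration, not the real xt_condition definitions.

```c
#include <string.h>

#define DEMO_NAME_MAX 31	/* buffer size, including the trailing NUL */

struct demo_condition_info {
	char name[DEMO_NAME_MAX];
	unsigned char invert;
};

/*
 * Copy src into info->name. Returns 0 on success, or -1 if src plus
 * its NUL terminator does not fit in the buffer.
 */
static int demo_set_name(struct demo_condition_info *info, const char *src)
{
	size_t len = strlen(src);

	if (len >= sizeof(info->name))
		return -1;
	memcpy(info->name, src, len + 1);
	return 0;
}
```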

> MODULE_ALIAS("ip6t_condition");
> 
>-struct condition_variable {
>-	struct list_head list;
>-	struct proc_dir_entry *status_proc;
>-	unsigned int refcount;
>-	bool enabled;
>-};

Given your excellent usage example of a CONDITION target, I think it 
even makes sense to enlarge the "enabled" variable to a full-fledged 
32-bit value that can be &, | and ^'d, similar to nfmark.
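
To make the nfmark comparison concrete: the MARK target's --set-xmark
value/mask zeroes the bits in mask and XORs value into the mark, and the
AND, OR and XOR forms are shorthands for it. The sketch below applies the
same arithmetic to a plain 32-bit variable; the demo_* function names are
assumptions for illustration, and nothing here is existing xt_condition
code.

```c
#include <stdint.h>

/* Zero the bits in mask, then XOR value in (the --set-xmark rule). */
static uint32_t demo_set_xmark(uint32_t old, uint32_t value, uint32_t mask)
{
	return (old & ~mask) ^ value;
}

/* old & bits, expressed as set-xmark 0/~bits. */
static uint32_t demo_and(uint32_t old, uint32_t bits)
{
	return demo_set_xmark(old, 0, ~bits);
}

/* old | bits, expressed as set-xmark bits/bits. */
static uint32_t demo_or(uint32_t old, uint32_t bits)
{
	return demo_set_xmark(old, bits, bits);
}

/* old ^ bits, expressed as set-xmark bits/0. */
static uint32_t demo_xor(uint32_t old, uint32_t bits)
{
	return demo_set_xmark(old, bits, 0);
}
```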

^ permalink raw reply

* Re: [0/8] netpoll/bridge fixes
From: David Miller @ 2010-07-19 16:05 UTC (permalink / raw)
  To: herbert; +Cc: mst, shemminger, frzhang, netdev, amwang, mpm
In-Reply-To: <20100719115411.GA22758@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon, 19 Jul 2010 19:54:11 +0800

> Still, it might be a good idea to disable bridge netpoll in
> 2.6.35.

I thought we did that already.... oh I see, we did it for bonding:

commit c22d7ac844f1cb9c6a5fd20f89ebadc2feef891b
Author: Andy Gospodarek <andy@greyhouse.net>
Date:   Fri Jun 25 09:50:44 2010 +0000

    bonding: prevent netpoll over bonded interfaces

I'm fine with disabling it for bridging too, just send me a patch
similar to the bonding one.

^ permalink raw reply

* Re: bnx2/5709: Strange interrupts spread
From: Christophe Ngo Van Duc @ 2010-07-19 15:55 UTC (permalink / raw)
  To: netdev
In-Reply-To: <AANLkTiniNNPV9ztxXHtX4np7PIZabkm0I4v5O29chf8i@mail.gmail.com>

Dear list,

So I've been able to do some tests today:
If I put the 2 interfaces in a bridge with no IP address, the interrupts
are on 1 CPU.
If I put the 2 interfaces in a bridge with an IP address, the interrupts
are still on 1 CPU.
If I put the 2 interfaces outside the bridge with an IP address,
everything works fine and the interrupts get spread across the CPUs.

So the conclusion seems to be that when the bnx2 is put into
promiscuous mode by the bridge, the RSS hash stops working even if
the traffic is IP in nature.
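
For context, the general idea of hash-based receive queue selection can
be sketched as follows. This is a simplified model only: real RSS
hardware typically uses a Toeplitz hash with a configurable key, whereas
the sketch substitutes FNV-1a, and the struct and function names are
invented for the example. It does not describe what the bnx2 firmware
actually does.

```c
#include <stdint.h>

/* Hypothetical flow key: IPv4 addresses and L4 ports. */
struct demo_flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* FNV-1a over the flow key, used here as a stand-in for the NIC hash. */
static uint32_t demo_flow_hash(const struct demo_flow *f)
{
	uint32_t words[3];
	uint32_t h = 2166136261u;	/* FNV-1a 32-bit offset basis */
	int i, shift;

	words[0] = f->saddr;
	words[1] = f->daddr;
	words[2] = ((uint32_t)f->sport << 16) | f->dport;

	for (i = 0; i < 3; i++) {
		for (shift = 24; shift >= 0; shift -= 8) {
			h ^= (words[i] >> shift) & 0xff;
			h *= 16777619u;	/* FNV-1a 32-bit prime */
		}
	}
	return h;
}

/* Map a flow to one of nqueues RX queues (nqueues must be non-zero). */
static unsigned int demo_pick_queue(const struct demo_flow *f,
				    unsigned int nqueues)
{
	return demo_flow_hash(f) % nqueues;
}
```

In this model, packets of one flow always land on the same queue, and
different flows spread out. If no hash is computed for a packet, there is
nothing to spread on, and everything ends up on a single queue.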

Best regards,
Christophe.

On Fri, Jul 2, 2010 at 5:33 PM, Christophe Ngo Van Duc
<cngovanduc@gmail.com> wrote:
> Dear list,
>
> I hope I am posting to the correct place...
>
> I am facing a strange issue on a HP DL 360.
>
> I have 2 internal ethernet cards (the ones that came by default with
> the server) and 2 additional ethernet cards, for a total of 4 ethernet
> cards.
>
> The 2 internal cards spread their interrupts fine (for example eth1):
>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
>  71:        604      11933         40       1537          0          0          0       6043   PCI-MSI-edge      eth1-0
>  72:      24805       9795       3606          0        128          0       3365          0   PCI-MSI-edge      eth1-1
>  73:          0        279          0        429         38      16540          0      30843   PCI-MSI-edge      eth1-2
>  74:          0          0      25365        267          0          0         89      15541   PCI-MSI-edge      eth1-3
>  75:       7244      24108          0          0      16488          0        240          0   PCI-MSI-edge      eth1-4
>  76:      21378       3628       7726          0         49        247       2871          0   PCI-MSI-edge      eth1-5
>  77:          0          0      47199        459         13         46      63064         18   PCI-MSI-edge      eth1-6
>  78:          0       6230         67        283        259         82       7846      27130   PCI-MSI-edge      eth1-7
>
> On eth2 (external card) all interrupts go to CPU0:
>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
>  80:   46973077          0          0          0          0          0          0   PCI-MSI-edge      eth2-0
>  81:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-1
>  82:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-2
>  83:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-3
>  84:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-4
>  85:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-5
>  86:          0          0       2445          0         37          0       8463         13   PCI-MSI-edge      eth2-6
>  87:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth2-7
>
> If I understand correctly, the RSS hash is used to dispatch the packets
> into the different queues running on the different CPUs.
>
> Why then are my internal cards running fine while the additional cards
> (eth2 and eth3) show this behavior where all interrupts go
> to one CPU?
>
> Thanks for your help in understanding this. (see below for config details)
>
> Christophe.
>
> All are detected correctly at boot:
> Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.0.8e (April 13, 2010)
> bnx2 0000:02:00.0: PCI INT A -> GSI 31 (level, low) -> IRQ 31
> bnx2 0000:02:00.0: setting latency timer to 64
> eth0: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found
> at mem f4000000, IRQ 31, node addr f4:ce:46:86:a1:00
> bnx2 0000:02:00.1: PCI INT B -> GSI 39 (level, low) -> IRQ 39
> bnx2 0000:02:00.1: setting latency timer to 64
> eth1: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found
> at mem f2000000, IRQ 39, node addr f4:ce:46:86:a1:02
> bnx2 0000:07:00.0: PCI INT A -> GSI 24 (level, low) -> IRQ 24
> bnx2 0000:07:00.0: setting latency timer to 64
> eth2: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found
> at mem fa000000, IRQ 24, node addr 00:26:55:87:17:98
> bnx2 0000:07:00.1: PCI INT B -> GSI 34 (level, low) -> IRQ 34
> bnx2 0000:07:00.1: setting latency timer to 64
> eth3: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found
> at mem f8000000, IRQ 34, node addr 00:26:55:87:17:9a
>
> Kernel is 2.6.31-13
> Broadcom driver bnx2 v2.0.8e
>
> eth0 is a normal interface with an IP address
> eth1 is a normal interface with an IP address
> eth2 belongs to a bridge interface without an IP address, running tc (htb)
> eth3 belongs to the same bridge interface without an IP address
>

^ permalink raw reply

* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15
From: Tejun Heo @ 2010-07-19 14:57 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Lennart Schulte, Eric Dumazet, David S. Miller, lkml,
	netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert
In-Reply-To: <alpine.DEB.2.00.1007161448330.13946@melkinpaasi.cs.helsinki.fi>

Hello,

On 07/16/2010 02:02 PM, Ilpo Järvinen wrote:
> Besides, Tejun has also found that it's hint->next ptr which is NULL in 
> his case so this won't solve his case anyway. Tejun, can you confirm 
> whether it was retransmit_skb_hint->next being NULL on _entry time_ to 
> tcp_xmit_retransmit_queue() or later on in the loop after the updates done 
> by the loop itself to the hint (or that your testing didn't conclude 
> either)?

Sorry about the delay.  I was traveling last week.  Unfortunately, I
don't know whether ->next was NULL on entry or not.  I hacked up the
following ugly patch for the next test run.  It should have everything
which has come up till now + list and hint sanity checking before
starting processing them.  I'm planning on deploying it w/ crashdump
enabled in several days.  If I've missed something, please let me
know.
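
For anyone who wants to see the idea of the list and hint sanity check in
isolation, here is a userspace sketch on a generic circular doubly linked
list with a sentinel head. It is not the kernel code: the demo_* names
and return codes are made up for the illustration, and it only mirrors
the checks described above (prev pointers consistent, no unexpected NULL,
hint actually on the list).

```c
#include <stddef.h>

struct demo_node {
	struct demo_node *next, *prev;
};

/* An empty list is a head whose next and prev point back at itself. */
static void demo_list_init(struct demo_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Append n just before the sentinel head. */
static void demo_list_add_tail(struct demo_node *n, struct demo_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Walk the list once. Returns 0 if every prev pointer matches the node
 * we came from and hint (if non-NULL) is on the list, -1 on a broken
 * link or unexpected NULL, and -2 if the hint is not on the list.
 */
static int demo_list_check(const struct demo_node *head,
			   const struct demo_node *hint)
{
	const struct demo_node *prev = head, *n;
	int saw_hint = 0;

	for (n = head->next; n != head; prev = n, n = n->next) {
		if (!n || n->prev != prev)
			return -1;
		if (n == hint)
			saw_hint = 1;
	}
	if (head->prev != prev)
		return -1;
	if (hint && !saw_hint)
		return -2;
	return 0;
}
```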

Thanks.

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b4ed957..1c8b1e0 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2190,6 +2190,53 @@ static int tcp_can_forward_retransmit(struct sock *sk)
 	return 1;
 }

+static void print_queue(struct sock *sk, struct sk_buff *old, struct sk_buff *hole)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct sk_buff *skb, *prev;
+	bool do_panic = false;
+
+	skb = tcp_write_queue_head(sk);
+	prev = (struct sk_buff *)(&sk->sk_write_queue);
+
+	if (skb == NULL) {
+		printk("XXX NULL head, pkts %u\n", tp->packets_out);
+		do_panic = true;
+	}
+
+	printk("XXX head %p tail %p sendhead %p oldhint %p now %p hole %p high %u\n",
+	       tcp_write_queue_head(sk), tcp_write_queue_tail(sk),
+	       tcp_send_head(sk), old, tp->retransmit_skb_hint, hole,
+	       tp->retransmit_high);
+
+	while (skb) {
+		printk("XXX skb %p (%u-%u) next %p prev %p sacked %u\n",
+		       skb, TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq,
+		       skb->next, skb->prev, TCP_SKB_CB(skb)->sacked);
+		if (prev != skb->prev) {
+			printk("XXX Inconsistent prev\n");
+			do_panic = true;
+		}
+
+		if (skb == tcp_write_queue_tail(sk)) {
+			if (skb->next != (struct sk_buff *)(&sk->sk_write_queue)) {
+				printk("XXX Improper next at tail\n");
+				do_panic = true;
+			}
+			break;
+		}
+
+		prev = skb;
+		skb = skb->next;
+	}
+	if (!skb) {
+		printk("XXX Encountered unexpected NULL\n");
+		do_panic = true;
+	}
+	if (do_panic)
+		panic("XXX panicking");
+}
+
 /* This gets called after a retransmit timeout, and the initially
  * retransmitted data is acknowledged.  It tries to continue
  * resending the rest of the retransmit queue, until either
@@ -2198,19 +2245,53 @@ static int tcp_can_forward_retransmit(struct sock *sk)
  * based retransmit packet might feed us FACK information again.
  * If so, we use it to avoid unnecessarily retransmissions.
  */
+static unsigned int caught_it;
+
 void tcp_xmit_retransmit_queue(struct sock *sk)
 {
 	const struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
-	struct sk_buff *skb;
+	struct sk_buff *skb, *prev;
 	struct sk_buff *hole = NULL;
+	struct sk_buff *old = tp->retransmit_skb_hint;
 	u32 last_lost;
 	int mib_idx;
 	int fwd_rexmitting = 0;
+	bool saw_hint = false;
+
+	if (!tp->packets_out) {
+		if (net_ratelimit())
+			printk("XXX !tp->packets_out, retransmit_skb_hint=%p, write_queue_head=%p\n",
+			       tp->retransmit_skb_hint, tcp_write_queue_head(sk));
+		return;
+	}

 	if (!tp->lost_out)
 		tp->retransmit_high = tp->snd_una;

+	for (skb = tcp_write_queue_head(sk),
+	     prev = (struct sk_buff *)&sk->sk_write_queue;
+	     skb != (struct sk_buff *)&sk->sk_write_queue;
+	     prev = skb, skb = skb->next) {
+		if (prev != skb->prev) {
+			printk("XXX sanity check: prev corrupt\n");
+			print_queue(sk, old, hole);
+		}
+		if (skb == tp->retransmit_skb_hint)
+			saw_hint = true;
+		if (skb == tcp_write_queue_tail(sk) &&
+		    skb->next != (struct sk_buff *)(&sk->sk_write_queue)) {
+			printk("XXX sanity check: end corrupt\n");
+			print_queue(sk, old, hole);
+		}
+	}
+	if (tp->retransmit_skb_hint && !saw_hint) {
+		printk("XXX sanity check: retransmit_skb_hint=%p is not on list, clearing hint\n",
+		       tp->retransmit_skb_hint);
+		print_queue(sk, old, hole);
+		tp->retransmit_skb_hint = NULL;
+	}
+
 	if (tp->retransmit_skb_hint) {
 		skb = tp->retransmit_skb_hint;
 		last_lost = TCP_SKB_CB(skb)->end_seq;
@@ -2218,7 +2299,17 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 			last_lost = tp->retransmit_high;
 	} else {
 		skb = tcp_write_queue_head(sk);
-		last_lost = tp->snd_una;
+		if (skb)
+			last_lost = tp->snd_una;
+	}
+
+checknull:
+	if (skb == NULL) {
+		print_queue(sk, old, hole);
+		caught_it++;
+		if (net_ratelimit())
+			printk("XXX Errors caught so far %u\n", caught_it);
+		return;
 	}

 	tcp_for_write_queue_from(skb, sk) {
@@ -2261,7 +2352,7 @@ begin_fwd:
 		} else if (!(sacked & TCPCB_LOST)) {
 			if (hole == NULL && !(sacked & (TCPCB_SACKED_RETRANS|TCPCB_SACKED_ACKED)))
 				hole = skb;
-			continue;
+			goto cont;

 		} else {
 			last_lost = TCP_SKB_CB(skb)->end_seq;
@@ -2272,7 +2363,7 @@ begin_fwd:
 		}

 		if (sacked & (TCPCB_SACKED_ACKED|TCPCB_SACKED_RETRANS))
-			continue;
+			goto cont;

 		if (tcp_retransmit_skb(sk, skb))
 			return;
@@ -2282,6 +2373,9 @@ begin_fwd:
 			inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
 						  inet_csk(sk)->icsk_rto,
 						  TCP_RTO_MAX);
+cont:
+		skb = skb->next;
+		goto checknull;
 	}
 }

-- 
tejun

^ permalink raw reply related

* Re: [PATCH 07/11] Removing dead ARCH_PNX010X
From: Christoph Egger @ 2010-07-19 14:37 UTC (permalink / raw)
  To: David Miller
  Cc: joe, shemminger, dongdong.deng, jkosina, netdev, linux-kernel,
	vamos-dev
In-Reply-To: <20100714.133916.71109591.davem@davemloft.net>

On Wed, Jul 14, 2010 at 01:39:16PM -0700, David Miller wrote:
> From: Christoph Egger <siccegge@cs.fau.de>
> Date: Wed, 14 Jul 2010 14:41:09 +0200
> 
> > ARCH_PNX010X doesn't exist in Kconfig, therefore removing all
> > references for it from the source code.
> > 
> > Signed-off-by: Christoph Egger <siccegge@cs.fau.de>
> 
> If you are going to kill this off, kill the references in
> drivers/net/Kconfig at the same time.
> 
> Please fix this up and resubmit your patch, thanks.

Done, patch below.

Thanks

    Christoph

---
>From ed6ffbfd77e14f17fa7d75ddf70b0d3b0126848c Mon Sep 17 00:00:00 2001
From: Christoph Egger <siccegge@cs.fau.de>
Date: Wed, 14 Jul 2010 14:19:15 +0200
Subject: [PATCH] Removing dead ARCH_PNX010X

ARCH_PNX010X doesn't exist in Kconfig, therefore removing all
references for it from the source code/Kconfig.

Signed-off-by: Christoph Egger <siccegge@cs.fau.de>
---
 drivers/net/Kconfig  |    4 ++--
 drivers/net/cs89x0.c |   45 ---------------------------------------------
 2 files changed, 2 insertions(+), 47 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index ce2fcdd..ba5b862 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -1463,7 +1463,7 @@ config FORCEDETH
 config CS89x0
 	tristate "CS89x0 support"
 	depends on NET_ETHERNET && (ISA || EISA || MACH_IXDP2351 \
-		|| ARCH_IXDP2X01 || ARCH_PNX010X || MACH_MX31ADS)
+		|| ARCH_IXDP2X01 || MACH_MX31ADS)
 	---help---
 	  Support for CS89x0 chipset based Ethernet cards. If you have a
 	  network (Ethernet) card of this type, say Y and read the
@@ -1477,7 +1477,7 @@ config CS89x0
 config CS89x0_NONISA_IRQ
 	def_bool y
 	depends on CS89x0 != n
-	depends on MACH_IXDP2351 || ARCH_IXDP2X01 || ARCH_PNX010X || MACH_MX31ADS
+	depends on MACH_IXDP2351 || ARCH_IXDP2X01 || MACH_MX31ADS
 
 config TC35815
 	tristate "TOSHIBA TC35815 Ethernet support"
diff --git a/drivers/net/cs89x0.c b/drivers/net/cs89x0.c
index 2ccb9f1..7a5d787 100644
--- a/drivers/net/cs89x0.c
+++ b/drivers/net/cs89x0.c
@@ -180,12 +180,6 @@ static unsigned int cs8900_irq_map[] = {IRQ_IXDP2351_CS8900, 0, 0, 0};
 #elif defined(CONFIG_ARCH_IXDP2X01)
 static unsigned int netcard_portlist[] __used __initdata = {IXDP2X01_CS8900_VIRT_BASE, 0};
 static unsigned int cs8900_irq_map[] = {IRQ_IXDP2X01_CS8900, 0, 0, 0};
-#elif defined(CONFIG_ARCH_PNX010X)
-#include <mach/gpio.h>
-#define CIRRUS_DEFAULT_BASE	IO_ADDRESS(EXT_STATIC2_s0_BASE + 0x200000)	/* = Physical address 0x48200000 */
-#define CIRRUS_DEFAULT_IRQ	VH_INTC_INT_NUM_CASCADED_INTERRUPT_1 /* Event inputs bank 1 - ID 35/bit 3 */
-static unsigned int netcard_portlist[] __used __initdata = {CIRRUS_DEFAULT_BASE, 0};
-static unsigned int cs8900_irq_map[] = {CIRRUS_DEFAULT_IRQ, 0, 0, 0};
 #elif defined(CONFIG_MACH_MX31ADS)
 #include <mach/board-mx31ads.h>
 static unsigned int netcard_portlist[] __used __initdata = {
@@ -372,18 +366,6 @@ writeword(unsigned long base_addr, int portno, u16 value)
 {
 	__raw_writel(value, base_addr + (portno << 1));
 }
-#elif defined(CONFIG_ARCH_PNX010X)
-static u16
-readword(unsigned long base_addr, int portno)
-{
-	return inw(base_addr + (portno << 1));
-}
-
-static void
-writeword(unsigned long base_addr, int portno, u16 value)
-{
-	outw(value, base_addr + (portno << 1));
-}
 #else
 static u16
 readword(unsigned long base_addr, int portno)
@@ -546,30 +528,6 @@ cs89x0_probe1(struct net_device *dev, int ioaddr, int modular)
 #endif
         }
 
-#ifdef CONFIG_ARCH_PNX010X
-	initialize_ebi();
-
-	/* Map GPIO registers for the pins connected to the CS8900a. */
-	if (map_cirrus_gpio() < 0)
-		return -ENODEV;
-
-	reset_cirrus();
-
-	/* Map event-router registers. */
-	if (map_event_router() < 0)
-		return -ENODEV;
-
-	enable_cirrus_irq();
-
-	unmap_cirrus_gpio();
-	unmap_event_router();
-
-	dev->base_addr = ioaddr;
-
-	for (i = 0 ; i < 3 ; i++)
-		readreg(dev, 0);
-#endif
-
 	/* Grab the region so we can find another board if autoIRQ fails. */
 	/* WTF is going on here? */
 	if (!request_region(ioaddr & ~3, NETCARD_IO_EXTENT, DRV_NAME)) {
@@ -1391,9 +1349,6 @@ net_open(struct net_device *dev)
 	case A_CNF_MEDIA_10B_2: result = lp->adapter_cnf & A_CNF_10B_2; break;
         default: result = lp->adapter_cnf & (A_CNF_10B_T | A_CNF_AUI | A_CNF_10B_2);
         }
-#ifdef CONFIG_ARCH_PNX010X
-	result = A_CNF_10B_T;
-#endif
         if (!result) {
                 printk(KERN_ERR "%s: EEPROM is configured for unavailable media\n", dev->name);
 release_dma:
-- 
1.7.0.4

^ permalink raw reply related

