Netdev List
 help / color / mirror / Atom feed
* Re: crash with bridge and inconsistent handling of NETDEV_TX_OK
From: David Miller @ 2010-04-21  1:22 UTC (permalink / raw)
  To: mpatocka; +Cc: netdev, kaber
In-Reply-To: <Pine.LNX.4.64.1004202116570.29631@hs20-bc2-1.build.redhat.com>

From: Mikulas Patocka <mpatocka@redhat.com>
Date: Tue, 20 Apr 2010 21:20:09 -0400 (EDT)

> 
> 
> On Tue, 20 Apr 2010, David Miller wrote:
> 
>> From: Mikulas Patocka <mpatocka@redhat.com>
>> Date: Tue, 20 Apr 2010 21:10:04 -0400 (EDT)
>> 
>> > I see, but GRO is turned off on my interfaces, according to ethtool.
>> 
>> GRO is just a flag bit, so it's possible that if your kernel is too
>> old ethtool will always show that it's off.
>> 
>> If you haven't turned off GRO explicitly, then it's a good bet that
>> this is why it looks like it's off.  And GRO is on by default.
> 
> I have kernel 2.6.34-rc4, ethtool 2.6.33 and GRO is off. I haven't turned 
> it off, I left it on default.

See my follow-up, what ethtool output makes you think GRO is
off?  "large-receive-offload" is not GRO

^ permalink raw reply

* Re: crash with bridge and inconsistent handling of NETDEV_TX_OK
From: Mikulas Patocka @ 2010-04-21  1:21 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, kaber
In-Reply-To: <20100420.181648.183008607.davem@davemloft.net>



On Tue, 20 Apr 2010, David Miller wrote:

> From: David Miller <davem@davemloft.net>
> Date: Tue, 20 Apr 2010 18:14:34 -0700 (PDT)
> 
> > From: Mikulas Patocka <mpatocka@redhat.com>
> > Date: Tue, 20 Apr 2010 21:10:04 -0400 (EDT)
> > 
> >> I see, but GRO is turned off on my interfaces, according to ethtool.
> > 
> > GRO is just a flag bit, so it's possible that if your kernel is too
> > old ethtool will always show that it's off.
> 
> Actually, looking back at your original report, are you confusing
> "large-receive-offload" as reported by ethtool with GRO?
> 
> They are completely seperate things.
> 
> "large-receive-offload" is LRO, whereas GRO is something done
> in software and something entirely different.

Posting once more. Both LRO and GRO are off. And it is default.

Mikulas

Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off

Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off


^ permalink raw reply

* Re: [PATCH] bridge: add a missing ntohs()
From: David Miller @ 2010-04-21  1:21 UTC (permalink / raw)
  To: herbert; +Cc: eric.dumazet, netdev
In-Reply-To: <20100420133016.GA21788@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 20 Apr 2010 21:30:16 +0800

> On Tue, Apr 20, 2010 at 03:20:05PM +0200, Eric Dumazet wrote:
>> Found by code review and sparse endianness warning, I am not sure this
>> patch is valid.
>> 
>> Thanks
>> 
>> [PATCH] bridge: add a missing ntohs()
>> 
>> grec_nsrcs is in network order, we should convert to host horder in
>> br_multicast_igmp3_report()
>> 
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Good catch Eric.

Note that this means that most igmpv3_grec entries get ignored
on little-endian because the greg_nsrcs will look "real big"
and thus the pskb_may_pull() will fail.

Anyways I'll apply this to net-2.6, thanks.

^ permalink raw reply

* Re: crash with bridge and inconsistent handling of NETDEV_TX_OK
From: Mikulas Patocka @ 2010-04-21  1:20 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, kaber
In-Reply-To: <20100420.181434.35828504.davem@davemloft.net>



On Tue, 20 Apr 2010, David Miller wrote:

> From: Mikulas Patocka <mpatocka@redhat.com>
> Date: Tue, 20 Apr 2010 21:10:04 -0400 (EDT)
> 
> > I see, but GRO is turned off on my interfaces, according to ethtool.
> 
> GRO is just a flag bit, so it's possible that if your kernel is too
> old ethtool will always show that it's off.
> 
> If you haven't turned off GRO explicitly, then it's a good bet that
> this is why it looks like it's off.  And GRO is on by default.

I have kernel 2.6.34-rc4, ethtool 2.6.33 and GRO is off. I haven't turned 
it off, I left it on default.

> I still contend, therefore, that it's completely normal to see GSO
> packet processing in the TX code path, even with bridging, and
> therefore seeing the GSO code path get taken is not indicative of some
> bug wrt. ->ndo_start_xmit() return value flag bit handling.

Mikulas

^ permalink raw reply

* Re: crash with bridge and inconsistent handling of NETDEV_TX_OK
From: David Miller @ 2010-04-21  1:16 UTC (permalink / raw)
  To: mpatocka; +Cc: netdev, kaber
In-Reply-To: <20100420.181434.35828504.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Tue, 20 Apr 2010 18:14:34 -0700 (PDT)

> From: Mikulas Patocka <mpatocka@redhat.com>
> Date: Tue, 20 Apr 2010 21:10:04 -0400 (EDT)
> 
>> I see, but GRO is turned off on my interfaces, according to ethtool.
> 
> GRO is just a flag bit, so it's possible that if your kernel is too
> old ethtool will always show that it's off.

Actually, looking back at your original report, are you confusing
"large-receive-offload" as reported by ethtool with GRO?

They are completely seperate things.

"large-receive-offload" is LRO, whereas GRO is something done
in software and something entirely different.

^ permalink raw reply

* Re: crash with bridge and inconsistent handling of NETDEV_TX_OK
From: David Miller @ 2010-04-21  1:14 UTC (permalink / raw)
  To: mpatocka; +Cc: netdev, kaber
In-Reply-To: <Pine.LNX.4.64.1004202108240.27476@hs20-bc2-1.build.redhat.com>

From: Mikulas Patocka <mpatocka@redhat.com>
Date: Tue, 20 Apr 2010 21:10:04 -0400 (EDT)

> I see, but GRO is turned off on my interfaces, according to ethtool.

GRO is just a flag bit, so it's possible that if your kernel is too
old ethtool will always show that it's off.

If you haven't turned off GRO explicitly, then it's a good bet that
this is why it looks like it's off.  And GRO is on by default.

I still contend, therefore, that it's completely normal to see GSO
packet processing in the TX code path, even with bridging, and
therefore seeing the GSO code path get taken is not indicative of some
bug wrt. ->ndo_start_xmit() return value flag bit handling.

^ permalink raw reply

* Re: crash with bridge and inconsistent handling of NETDEV_TX_OK
From: Mikulas Patocka @ 2010-04-21  1:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, kaber
In-Reply-To: <20100420.180253.159346294.davem@davemloft.net>

On Tue, 20 Apr 2010, David Miller wrote:

> From: Mikulas Patocka <mpatocka@redhat.com>
> Date: Tue, 20 Apr 2010 20:23:57 -0400 (EDT)
> 
> > I have two NICs, each with 1500 MTU. The stack trace indicates that a 
> > packet was received by one NIC and bridged to the other. The stack trace 
> > also indicates that it went through GSO code path. The question is why? 
> > How could a forwarded packet be so large to use GSO?
> 
> GRO automatically accumulates packets together, accumulating them into
> larger super-packets.  This is done regardless of input device feeding
> it.
> 
> Can you understand this now?  In software, we accumulate all incoming
> packets for a TCP stream into larger super-packets.  Just because it's
> a bridging scenerio doesn't mean we disable GRO.
> 
> These super-packets are being bridged, then forwarded out your
> destination device and GSO has to de-segment these GRO frames.
> 
> GRO is done unconditionally, all the time, for all packets.

I see, but GRO is turned off on my interfaces, according to ethtool.

Mikulas

^ permalink raw reply

* Re: [PATCH] gianfar: Wait for both RX and TX to stop
From: David Miller @ 2010-04-21  1:06 UTC (permalink / raw)
  To: timur.tabi; +Cc: galak, afleming, netdev
In-Reply-To: <o2oed82fe3e1004200801n98683cdalb2898c879b93f8fd@mail.gmail.com>

From: Timur Tabi <timur.tabi@gmail.com>
Date: Tue, 20 Apr 2010 10:01:48 -0500

> On Mon, Apr 19, 2010 at 11:43 PM, Kumar Gala <galak@kernel.crashing.org> wrote:
> 
>> spin_event_timeout doesn't make sense for this.  The patch is fine.
> 
> Can you please elaborate on that?  I don't understand why you think
> that.  spin_event_timeout() takes an expression and a timeout, and
> loops over the expression calling cpu_relax(), just like this loop
> does.

Indeed it does, Kumar this request seems reasonable.

^ permalink raw reply

* Re: [PATCH] can: Fix possible NULL pointer dereference in ems_usb.c
From: David Miller @ 2010-04-21  1:05 UTC (permalink / raw)
  To: wg; +Cc: hjk, netdev, socketcan-core
In-Reply-To: <4BCE04D9.2030601@grandegger.com>

From: Wolfgang Grandegger <wg@grandegger.com>
Date: Tue, 20 Apr 2010 21:47:37 +0200

> Hans J. Koch wrote:
>> In ems_usb_probe(), a pointer is dereferenced after making sure it is NULL...
>> 
>> This patch replaces dev_err() with printk() to avoid this.
>> 
>> Signed-off-by: "Hans J. Koch" <hjk@linutronix.de>
>> ---
>>  drivers/net/can/usb/ems_usb.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> Index: linux-2.6.34-rc/drivers/net/can/usb/ems_usb.c
>> ===================================================================
>> --- linux-2.6.34-rc.orig/drivers/net/can/usb/ems_usb.c	2010-04-20 15:32:25.000000000 +0200
>> +++ linux-2.6.34-rc/drivers/net/can/usb/ems_usb.c	2010-04-20 15:33:20.000000000 +0200
>> @@ -1006,7 +1006,7 @@
>>  
>>  	netdev = alloc_candev(sizeof(struct ems_usb), MAX_TX_URBS);
>>  	if (!netdev) {
>> -		dev_err(netdev->dev.parent, "Couldn't alloc candev\n");
>> +		printk(KERN_ERR "ems_usb: Couldn't alloc candev\n");
>>  		return -ENOMEM;
>>  	}
> 
> I think "dev_err(&intf->dev, ...)" should be used before
> SET_NETDEV_DEV(netdev, &intf->dev) is called. I see two "dev_err()"
> calls which need to be fixed.

Agreed.

^ permalink raw reply

* Re: pull request: wireless-2.6 2010-04-20
From: David Miller @ 2010-04-21  1:04 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20100420183013.GC6472@tuxdriver.com>

From: "John W. Linville" <linville@tuxdriver.com>
Date: Tue, 20 Apr 2010 14:30:14 -0400

> Here are a few more intended for 2.6.34, mostly from the iwlwifi team.
> They fix a couple of potential crashes, an incorrect EEPROM offset
> related to regulatory information, and a connectivity failure relating
> to 802.11n and the 4965.  The "iwlwifi: fix scan races" seems a little
> long, but much of it is the fallout from renaming a goto label.

Pulled, thanks John.

^ permalink raw reply

* Re: crash with bridge and inconsistent handling of NETDEV_TX_OK
From: David Miller @ 2010-04-21  1:02 UTC (permalink / raw)
  To: mpatocka; +Cc: netdev, kaber
In-Reply-To: <Pine.LNX.4.64.1004202018100.16302@hs20-bc2-1.build.redhat.com>

From: Mikulas Patocka <mpatocka@redhat.com>
Date: Tue, 20 Apr 2010 20:23:57 -0400 (EDT)

> I have two NICs, each with 1500 MTU. The stack trace indicates that a 
> packet was received by one NIC and bridged to the other. The stack trace 
> also indicates that it went through GSO code path. The question is why? 
> How could a forwarded packet be so large to use GSO?

GRO automatically accumulates packets together, accumulating them into
larger super-packets.  This is done regardless of input device feeding
it.

Can you understand this now?  In software, we accumulate all incoming
packets for a TCP stream into larger super-packets.  Just because it's
a bridging scenerio doesn't mean we disable GRO.

These super-packets are being bridged, then forwarded out your
destination device and GSO has to de-segment these GRO frames.

GRO is done unconditionally, all the time, for all packets.

^ permalink raw reply

* Re: crash with bridge and inconsistent handling of NETDEV_TX_OK
From: Mikulas Patocka @ 2010-04-21  0:23 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, kaber
In-Reply-To: <20100420.171525.200010343.davem@davemloft.net>



On Tue, 20 Apr 2010, David Miller wrote:

> From: Mikulas Patocka <mpatocka@redhat.com>
> Date: Tue, 20 Apr 2010 20:12:45 -0400 (EDT)
> 
> > Why is it using GSO on bridging?
> 
> Unlike LRO, GRO and GSO are completely valid in bridging and
> routing situations.
> 
> In fact, in virtualization environments it is essential for
> good performance.

I know it may be used for bridging. But it doesn't explain how it happened 
in my case.

I have two NICs, each with 1500 MTU. The stack trace indicates that a 
packet was received by one NIC and bridged to the other. The stack trace 
also indicates that it went through GSO code path. The question is why? 
How could a forwarded packet be so large to use GSO?

Mikulas

^ permalink raw reply

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Evgeniy Polyakov @ 2010-04-21  0:14 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Gaspar Chilingarov, netdev
In-Reply-To: <1271806945.7895.581.camel@edumazet-laptop>

Hi.

On Wed, Apr 21, 2010 at 01:42:25AM +0200, Eric Dumazet (eric.dumazet@gmail.com) wrote:
> If you want to check source, its in file net/ipv4/inet_connection_sock.c
> 
> function inet_csk_get_port()
> 
> you can remove the 
> 
> if (atomic_read(&hashinfo->bsockets) > (high - low) + 1) { 
> 	spin_unlock(&head->lock); 
> 	snum = smallest_rover; 
> 	goto have_snum;
> }
> 
> It will solve your problem (but bind() will probably be slow)

Netcat uses SO_REUSEADDR, so its the code path which generates an error,
but this part actually returns a port number not en error. I suppose
what happend is 'attempts' check faired and thus system failed to find
a small enough bucket which does not conflict. To test this we can
remove 'smallest_size = tb->num_owners;' assignment, but bind() can be
damn slow in this case indeed.

-- 
	Evgeniy Polyakov

^ permalink raw reply

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Evgeniy Polyakov @ 2010-04-21  0:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Ben Greear, David Miller, Gaspar Chilingarov, netdev
In-Reply-To: <1271808314.7895.614.camel@edumazet-laptop>

On Wed, Apr 21, 2010 at 02:05:14AM +0200, Eric Dumazet (eric.dumazet@gmail.com) wrote:
> I believe the bsockets 'optimization' is a bug, we should remove it.
> 
> This is a stable candidate (2.6.30+)
> 
> [PATCH net-next-2.6] tcp: remove bsockets count
> 
> Counting number of bound sockets to avoid a loop is buggy, since we cant
> know how many IP addresses are in use. When threshold is reached, we try
> 5 random slots and can fail while there are plenty available ports.

To return back to exponential bind() times you need to revert the whole
original patch including magic 5 number, not only bsockets.

But actual problem is not in this digit, but in a deeper logic.
Previously we scanned the whole table, now we have 5 attempts to
find out at least one bucket (without conflict) we will insert
new socket into. Apparently for large number of addresses it is possible
that all 5 times we will randomly select those buckets which conflicts.
As dumb solution we can increase 'attempt' number to infinite one, or
fallback to whole-table-search after several random attempts, which is a
bit more clever I think.

-- 
	Evgeniy Polyakov

^ permalink raw reply

* Re: crash with bridge and inconsistent handling of NETDEV_TX_OK
From: David Miller @ 2010-04-21  0:15 UTC (permalink / raw)
  To: mpatocka; +Cc: netdev, kaber
In-Reply-To: <Pine.LNX.4.64.1004202003010.13651@hs20-bc2-1.build.redhat.com>

From: Mikulas Patocka <mpatocka@redhat.com>
Date: Tue, 20 Apr 2010 20:12:45 -0400 (EDT)

> Why is it using GSO on bridging?

Unlike LRO, GRO and GSO are completely valid in bridging and
routing situations.

In fact, in virtualization environments it is essential for
good performance.

^ permalink raw reply

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Eric Dumazet @ 2010-04-21  0:14 UTC (permalink / raw)
  To: Gaspar Chilingarov; +Cc: Ben Greear, netdev
In-Reply-To: <n2g46c8cb3e1004201657t50599d7em5535f2f12af96b59@mail.gmail.com>

Le mercredi 21 avril 2010 à 04:57 +0500, Gaspar Chilingarov a écrit :
> Ohhh, see my bug report again, please :)
> 
> The problem arises when used port count is close to or more than the
> number of local ports available on the system (in your case 40000
> client connections is smaller than 51000 of available ports).
> 
> I think that if you choose to go with default settings and try to bind
> half of your ports to one local IP and half of your ports - to another
> one -- you will hit this bug as well.
> 
> If you explicitly bind to the local ports as well - it _may_ solve a
> problem - I will this today (it's just a workaround, but in fact not
> the solution :), but if you let kernel automatically find local ports
> - you will get the errors.

Yes, I posted a preliminar patch to solve this problem, please test
it ;)

But beware bind(port=0) time might be long. This is why high perf
programs prefer to give the exact port at bind(IP, port) time.

Algo will scan the whole range to find the port used by minimum number
of connection. Then it'll check if the selected IP was already in use.

If not -> ok, port found

If yes -> redo five time the same loop (I guess this redo should be
changed too, to use something else.




^ permalink raw reply

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: David Miller @ 2010-04-21  0:14 UTC (permalink / raw)
  To: eric.dumazet; +Cc: greearb, gasparch, netdev
In-Reply-To: <1271808314.7895.614.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 21 Apr 2010 02:05:14 +0200

> [PATCH net-next-2.6] tcp: remove bsockets count
> 
> Counting number of bound sockets to avoid a loop is buggy, since we cant
> know how many IP addresses are in use. When threshold is reached, we try
> 5 random slots and can fail while there are plenty available ports.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Hmmm, yes indeed it seems there are improper assumptions made by
this scheme.

This is a tricky area, so I'll wait for some test results from the
reporter and study the code over a few times myself to make sure
we get this right.

^ permalink raw reply

* Re: crash with bridge and inconsistent handling of NETDEV_TX_OK
From: Mikulas Patocka @ 2010-04-21  0:12 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, kaber
In-Reply-To: <20100420.170038.130590455.davem@davemloft.net>



On Tue, 20 Apr 2010, David Miller wrote:

> From: Mikulas Patocka <mpatocka@redhat.com>
> Date: Tue, 20 Apr 2010 19:57:45 -0400 (EDT)
> 
> >> We're not reverting the commit you mention, we're going to fix
> >> whatever bug exists instead.
> > 
> > So have a fun searching for them :-/
> 
> When you report a problem and your knee-jerk reaction is "revert this
> commit", expect some push back.

I tried to analyze skb handling and couldn't check correctness of this 
commit. I really think it should be reverted and applied later, when the 
author does it correctly (for example, one bit in the return value to 
indicate consumed/returned skb, other unimportant informational bits).

> We'll instead analyze this bug and take the time to figure out the
> most appropriate course of action.

And do you have any idea what caused it? Why is it using GSO on bridging?

Mikulas

^ permalink raw reply

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Gaspar Chilingarov @ 2010-04-21  0:12 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Ben Greear, David Miller, netdev
In-Reply-To: <1271808314.7895.614.camel@edumazet-laptop>

2010/4/21 Eric Dumazet <eric.dumazet@gmail.com>:
> Le mardi 20 avril 2010 à 16:49 -0700, Ben Greear a écrit :
>> On 04/20/2010 04:35 PM, Gaspar Chilingarov wrote:
>> > sysctl -a | grep local_port_range
>>
>> [root@ct503-10G-09 ~]# sysctl -a | grep local_port_range
>> net.ipv4.ip_local_port_range = 10000  61000
>>
>> I'm explicitly binding to local ports as well as local IPs, btw.
>>
>
> I believe the bsockets 'optimization' is a bug, we should remove it.
>
> This is a stable candidate (2.6.30+)
>
> [PATCH net-next-2.6] tcp: remove bsockets count
>
> Counting number of bound sockets to avoid a loop is buggy, since we cant
> know how many IP addresses are in use. When threshold is reached, we try
> 5 random slots and can fail while there are plenty available ports.
>

Thank you a lot for the patch - I will try it.

In FreeBSD I was able to add about 32 C classes (8192 ips) on the
single interface (never tried to do that in Linux yet :) - so you
really never know how much IP's are there available. Tens and even up
to hundred IPs on the single machine are not that usual in hosting
environment at all.

/Gaspar

^ permalink raw reply

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Eric Dumazet @ 2010-04-21  0:05 UTC (permalink / raw)
  To: Ben Greear, David Miller; +Cc: Gaspar Chilingarov, netdev
In-Reply-To: <4BCE3D8D.3030500@candelatech.com>

Le mardi 20 avril 2010 à 16:49 -0700, Ben Greear a écrit :
> On 04/20/2010 04:35 PM, Gaspar Chilingarov wrote:
> > sysctl -a | grep local_port_range
> 
> [root@ct503-10G-09 ~]# sysctl -a | grep local_port_range
> net.ipv4.ip_local_port_range = 10000	61000
> 
> I'm explicitly binding to local ports as well as local IPs, btw.
> 

I believe the bsockets 'optimization' is a bug, we should remove it.

This is a stable candidate (2.6.30+)

[PATCH net-next-2.6] tcp: remove bsockets count

Counting number of bound sockets to avoid a loop is buggy, since we cant
know how many IP addresses are in use. When threshold is reached, we try
5 random slots and can fail while there are plenty available ports.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/inet_hashtables.h   |    2 --
 net/ipv4/inet_connection_sock.c |    5 -----
 net/ipv4/inet_hashtables.c      |    5 -----
 3 files changed, 12 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 74358d1..e0f3a05 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -150,8 +150,6 @@ struct inet_hashinfo {
 	 */
 	struct inet_listen_hashbucket	listening_hash[INET_LHTABLE_SIZE]
 					____cacheline_aligned_in_smp;
-
-	atomic_t			bsockets;
 };
 
 static inline struct inet_ehash_bucket *inet_ehash_bucket(
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 8da6429..0bbfd00 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -119,11 +119,6 @@ again:
 					    (tb->num_owners < smallest_size || smallest_size == -1)) {
 						smallest_size = tb->num_owners;
 						smallest_rover = rover;
-						if (atomic_read(&hashinfo->bsockets) > (high - low) + 1) {
-							spin_unlock(&head->lock);
-							snum = smallest_rover;
-							goto have_snum;
-						}
 					}
 					goto next;
 				}
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 2b79377..4bc921f 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -62,8 +62,6 @@ void inet_bind_hash(struct sock *sk, struct inet_bind_bucket *tb,
 {
 	struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;
 
-	atomic_inc(&hashinfo->bsockets);
-
 	inet_sk(sk)->inet_num = snum;
 	sk_add_bind_node(sk, &tb->owners);
 	tb->num_owners++;
@@ -81,8 +79,6 @@ static void __inet_put_port(struct sock *sk)
 	struct inet_bind_hashbucket *head = &hashinfo->bhash[bhash];
 	struct inet_bind_bucket *tb;
 
-	atomic_dec(&hashinfo->bsockets);
-
 	spin_lock(&head->lock);
 	tb = inet_csk(sk)->icsk_bind_hash;
 	__sk_del_bind_node(sk);
@@ -551,7 +547,6 @@ void inet_hashinfo_init(struct inet_hashinfo *h)
 {
 	int i;
 
-	atomic_set(&h->bsockets, 0);
 	for (i = 0; i < INET_LHTABLE_SIZE; i++) {
 		spin_lock_init(&h->listening_hash[i].lock);
 		INIT_HLIST_NULLS_HEAD(&h->listening_hash[i].head,



^ permalink raw reply related

* Re: crash with bridge and inconsistent handling of NETDEV_TX_OK
From: David Miller @ 2010-04-21  0:00 UTC (permalink / raw)
  To: mpatocka; +Cc: netdev, kaber
In-Reply-To: <Pine.LNX.4.64.1004201947470.29017@hs20-bc2-1.build.redhat.com>

From: Mikulas Patocka <mpatocka@redhat.com>
Date: Tue, 20 Apr 2010 19:57:45 -0400 (EDT)

>> We're not reverting the commit you mention, we're going to fix
>> whatever bug exists instead.
> 
> So have a fun searching for them :-/

When you report a problem and your knee-jerk reaction is "revert this
commit", expect some push back.

We'll instead analyze this bug and take the time to figure out the
most appropriate course of action.

^ permalink raw reply

* Re: crash with bridge and inconsistent handling of NETDEV_TX_OK
From: Mikulas Patocka @ 2010-04-20 23:57 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, kaber
In-Reply-To: <20100420.164505.34362726.davem@davemloft.net>

On Tue, 20 Apr 2010, David Miller wrote:

> From: Mikulas Patocka <mpatocka@redhat.com>
> Date: Tue, 20 Apr 2010 19:40:56 -0400 (EDT)
> 
> > Reviewing the code further, I found one very weird commit
> > 572a9d7b6fc7f20f573664063324c086be310c42 committed to 2.6.33. What
> > it does, it changes the semantics of ndo_hard_start_xmit(). Prior to
> > the patch, the meaning was --- return zero (NETDEV_TX_OK) --- the
> > skb is consumed by the driver. Returns non-zero --- the skb is left
> > owned by the caller.  The patch changes it to return other flags in
> > bits 4-7 and changes the consumed/returned logic.
> > 
> > The problem is that there is still plenty of code that compares it
> > against NETDEV_TX_OK to find out if the skb was consumed.
> 
> Drivers are not supposed to return those new flag bits, the new flag
> bits as return values exist only in the packet scheduler path.

If they are not supposed to return it, why do you test it in 
dev_hard_start_xmit?

Define clearly what the return status is allowed (and maybe enforce it 
with BUG()s) and follow that definition.

Currently, there is no specification and the code is unintelligible crap: 
it first masks the value with ~0xf0 to test if the skb is consumed 
(dev_hard_start_xmit) and then compares it with 0x0f for the same test 
(dev_queue_xmit).

> We're not reverting the commit you mention, we're going to fix
> whatever bug exists instead.

So have a fun searching for them :-/

Mikulas

^ permalink raw reply

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Gaspar Chilingarov @ 2010-04-20 23:57 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev
In-Reply-To: <4BCE3D8D.3030500@candelatech.com>

Ohhh, see my bug report again, please :)

The problem arises when used port count is close to or more than the
number of local ports available on the system (in your case 40000
client connections is smaller than 51000 of available ports).

I think that if you choose to go with default settings and try to bind
half of your ports to one local IP and half of your ports - to another
one -- you will hit this bug as well.

If you explicitly bind to the local ports as well - it _may_ solve a
problem - I will this today (it's just a workaround, but in fact not
the solution :), but if you let kernel automatically find local ports
- you will get the errors.

/Gaspar


2010/4/21 Ben Greear <greearb@candelatech.com>:
> On 04/20/2010 04:35 PM, Gaspar Chilingarov wrote:
>>
>> sysctl -a | grep local_port_range
>
> [root@ct503-10G-09 ~]# sysctl -a | grep local_port_range
> net.ipv4.ip_local_port_range = 10000    61000
>
> I'm explicitly binding to local ports as well as local IPs, btw.
>
> Thanks,
> Ben
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
>



-- 
Gaspar Chilingarov

^ permalink raw reply

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Ben Greear @ 2010-04-20 23:49 UTC (permalink / raw)
  To: Gaspar Chilingarov; +Cc: netdev
In-Reply-To: <y2p46c8cb3e1004201635s8f84672dq90d3115dca04a1ed@mail.gmail.com>

On 04/20/2010 04:35 PM, Gaspar Chilingarov wrote:
> sysctl -a | grep local_port_range

[root@ct503-10G-09 ~]# sysctl -a | grep local_port_range
net.ipv4.ip_local_port_range = 10000	61000

I'm explicitly binding to local ports as well as local IPs, btw.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: crash with bridge and inconsistent handling of NETDEV_TX_OK
From: David Miller @ 2010-04-20 23:48 UTC (permalink / raw)
  To: mpatocka; +Cc: lists.linux-foundation.org, netdev, kaber
In-Reply-To: <20100420.164505.34362726.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Tue, 20 Apr 2010 16:45:05 -0700 (PDT)

> From: Mikulas Patocka <mpatocka@redhat.com>
> Date: Tue, 20 Apr 2010 19:40:56 -0400 (EDT)
> 
>> Reviewing the code further, I found one very weird commit
>> 572a9d7b6fc7f20f573664063324c086be310c42 committed to 2.6.33. What
>> it does, it changes the semantics of ndo_hard_start_xmit(). Prior to
>> the patch, the meaning was --- return zero (NETDEV_TX_OK) --- the
>> skb is consumed by the driver. Returns non-zero --- the skb is left
>> owned by the caller.  The patch changes it to return other flags in
>> bits 4-7 and changes the consumed/returned logic.
>> 
>> The problem is that there is still plenty of code that compares it
>> against NETDEV_TX_OK to find out if the skb was consumed.
> 
> Drivers are not supposed to return those new flag bits, the new flag
> bits as return values exist only in the packet scheduler path.

And BTW, NETDEV_TX_OK is only ever returned by itself, the
flag bits only get set when a non-NETDEV_TX_OK value is returned.

So we really haven't changed semantics at all, NETDEV_TX_OK (which is
zero) and non-zero are the two valid return value cases.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox