Netdev List


* Re: [PATCH net-next-2.6] bonding: move slave MTU handling from sysfs V2
From: Jay Vosburgh @ 2010-05-20 18:21 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, bonding-devel, monis
In-Reply-To: <20100520053403.GA2867@psychotron.redhat.com>

Jiri Pirko <jpirko@redhat.com> wrote:

>Thu, May 20, 2010 at 02:07:41AM CEST, fubar@us.ibm.com wrote:
[...]
>>	This chunk doesn't apply to net-next-2.6 because your context
>>doesn't match; it looks like you've removed the variable "found" in your
>>"before" source.  On closer inspection, "found" isn't actually used
>>meaningfully, so I'm guessing you removed it in a prior patch but didn't
>>submit that patch.
>>
>>	If that's the case, could you repost the whole series, with
>>sequence numbers?
>
>I don't think that's necessary for now. The patch removing found was posted as a
>first one:
>http://patchwork.ozlabs.org/patch/52795/
>
>I tried it several times. Patches are cleanly applicable in the order I posted them.

	Ok, I tracked down a copy (not sure where mine went).  Sequence
numbers do help in general, though, as a set of email messages aren't
always delivered in the same order they're sent.

	In any event, the patches all look ok to me (they do apply
cleanly and compile, now that I have the complete set), but none of them
are bug fixes, so they should probably wait until net-next
reopens.

	So, for whenever the tree is open:

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: bnx2x + SFP+ DA/2.6.33.3: Got bad status 0x0 when reading from SFP+ EEPROM -> SFP+ module is not initialized
From: Eilon Greenstein @ 2010-05-20 18:45 UTC (permalink / raw)
  To: Krzysztof Olędzki; +Cc: Michael Chan, netdev@vger.kernel.org
In-Reply-To: <4BF57ADF.1060203@ans.pl>

On Thu, 2010-05-20 at 11:09 -0700, Krzysztof Olędzki wrote:
> On 2010-05-20 19:49, Eilon Greenstein wrote:
> > On Thu, 2010-05-20 at 10:08 -0700, Krzysztof Olędzki wrote:
> >> Hello,
> >>
> >> I would like to connect a dual port SFP+ NetXtreme II BCM57711
> >> 10-Gigabit NIC to a HP J9309A ProCurve 4-Port 10GbE SFP+ zl Module
> >> using a HP ProCurve 10-GbE SFP+ 7m Direct Attach Cable (J9285B).
> >>
> >> Unfortunately, it does not work. :( After connecting the switch and
> >> the server together and loading the bnx2x module the switch logs:
> >>
> >> I 05/20/10 18:32:23 ports: port E4 is Blocked by STP
> >> I 05/20/10 18:32:23 ports: port E4 is now on-line
> >>
> >> Here is the dmesg output from the server:
> >>
> >> Broadcom NetXtreme II 5771x 10Gigabit Ethernet Driver bnx2x 1.52.1-5 (2009/11/09)
> >> bnx2x 0000:04:00.0: PCI INT A ->  GSI 38 (level, low) ->  IRQ 38
> >> bnx2x 0000:04:00.0: setting latency timer to 64
> >> bnx2x: part number 394D4342-31373735-31314131-473331
> >> bnx2x: Loading bnx2x-e1h-5.2.7.0.fw
> >> bnx2x 0000:04:00.0: firmware: requesting bnx2x-e1h-5.2.7.0.fw
> >> eth4: Broadcom NetXtreme II BCM57711 XGb (A0) PCI-E x8 5GHz (Gen2) found at mem dc000000, IRQ 38, node addr 00:10:18:5f:e4:b4
> >> bnx2x 0000:04:00.1: PCI INT B ->  GSI 45 (level, low) ->  IRQ 45
> >> bnx2x 0000:04:00.1: setting latency timer to 64
> >> bnx2x: part number 394D4342-31373735-31314131-473331
> >> bnx2x: Loading bnx2x-e1h-5.2.7.0.fw
> >> bnx2x 0000:04:00.1: firmware: requesting bnx2x-e1h-5.2.7.0.fw
> >> eth5: Broadcom NetXtreme II BCM57711 XGb (A0) PCI-E x8 5GHz (Gen2) found at mem dd000000, IRQ 45, node addr 00:10:18:5f:e4:b6
> >> bnx2x 0000:04:00.0: irq 97 for MSI/MSI-X
> >> bnx2x 0000:04:00.0: irq 98 for MSI/MSI-X
> >> bnx2x 0000:04:00.0: irq 99 for MSI/MSI-X
> >> bnx2x 0000:04:00.0: irq 100 for MSI/MSI-X
> >> bnx2x 0000:04:00.0: irq 101 for MSI/MSI-X
> >> bnx2x 0000:04:00.0: irq 102 for MSI/MSI-X
> >> bnx2x: eth4: using MSI-X  IRQs: sp 97  fp[0] 99 ... fp[3] 102
> >> ADDRCONF(NETDEV_UP): eth4: link is not ready
> >> bnx2x 0000:04:00.1: irq 103 for MSI/MSI-X
> >> bnx2x 0000:04:00.1: irq 104 for MSI/MSI-X
> >> bnx2x 0000:04:00.1: irq 105 for MSI/MSI-X
> >> bnx2x 0000:04:00.1: irq 106 for MSI/MSI-X
> >> bnx2x 0000:04:00.1: irq 107 for MSI/MSI-X
> >> bnx2x 0000:04:00.1: irq 108 for MSI/MSI-X
> >> bnx2x: eth5: using MSI-X  IRQs: sp 103  fp[0] 105 ... fp[3] 108
> >> ADDRCONF(NETDEV_UP): eth5: link is not ready
> >> bnx2x: eth5 NIC Link is Down
> >> bnx2x: eth5 NIC Link is Down
> >>
> >> Loading the driver with debug mode enabled (modprobe bnx2x debug=0x20004) I got:
> > Thank you for this debug information! You saved one email round trip :)
> 
> Hehe, thanks.
> 
> > However, I still need some more information about the FW version and
> > nvram settings. Can you please send me the output of ethtool -i
> 
> # ethtool -i eth5
> driver: bnx2x
> version: 1.52.1-5
> firmware-version: BC:5.0.13 PHY:0aa0:0406
> bus-info: 0000:04:00.1
> 
> > and ethtool -e? Since ethtool -e is quite long, it is best to send
> > it as an attached file.
> 
> Attached.

Almost everything seems to be in order. Almost - since you don't get
link... I don't think I have tried using this kind of Direct Attach
Cable - so maybe it just needs some more time. Let's see if the
following makes any difference (other than delaying the failure for another
2.7 seconds):

diff --git a/drivers/net/bnx2x_link.c b/drivers/net/bnx2x_link.c
index ff70be8..bcee38c 100644
--- a/drivers/net/bnx2x_link.c
+++ b/drivers/net/bnx2x_link.c
@@ -3113,7 +3113,7 @@ static u8 bnx2x_wait_for_sfp_module_initialized(struct link_params *params)
        u16 timeout;
        /* Initialization time after hot-plug may take up to 300ms for some
        phys type ( e.g. JDSU ) */
-       for (timeout = 0; timeout < 60; timeout++) {
+       for (timeout = 0; timeout < 600; timeout++) {
                if (bnx2x_read_sfp_module_eeprom(params, 1, 1, &val)
                    == 0) {
                        DP(NETIF_MSG_LINK, "SFP+ module initialization "

If it does help, be sure to let me know how much time it took (you
should have this debug print).
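
For reference, the arithmetic behind the "2.7 seconds" remark can be sketched as a small user-space model. This is only an illustration under stated assumptions: the hunk above does not show the sleep inside the loop, so the 5 ms step is inferred from "60 iterations" and the "up to 300ms" comment, and the names SFP_POLL_DELAY_MS, sfp_wait_model(), sfp_budget_ms() and probe_after_n() are hypothetical and do not exist in the driver.

```c
#include <assert.h>

/* Assumed per-iteration delay in milliseconds (not shown in the hunk). */
#define SFP_POLL_DELAY_MS 5

/* Probe callback: returns 0 once the module answers, non-zero otherwise. */
typedef int (*sfp_probe_fn)(void *ctx);

/* Poll up to max_iters times. Returns the modelled elapsed time in
 * milliseconds on success, or -1 if the module never answered. */
int sfp_wait_model(sfp_probe_fn probe, void *ctx, unsigned int max_iters)
{
	unsigned int i;

	assert(probe != 0);
	for (i = 0; i < max_iters; i++) {
		if (probe(ctx) == 0)
			return (int)(i * SFP_POLL_DELAY_MS);
		/* a real driver would sleep SFP_POLL_DELAY_MS here */
	}
	return -1;
}

/* Worst-case time spent polling before giving up. */
unsigned int sfp_budget_ms(unsigned int max_iters)
{
	return max_iters * SFP_POLL_DELAY_MS;
}

/* Example probe: fails until the counter behind ctx has reached zero. */
int probe_after_n(void *ctx)
{
	int *remaining = ctx;

	if (*remaining == 0)
		return 0;
	(*remaining)--;
	return 1;
}
```

Under that assumption, going from 60 to 600 iterations raises the worst-case wait from 300 ms to 3000 ms, which is where the extra 2.7 seconds would come from.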

Regards,
Eilon




* Re: bnx2x + SFP+ DA/2.6.33.3: Got bad status 0x0 when reading from SFP+ EEPROM -> SFP+ module is not initialized
From: Krzysztof Olędzki @ 2010-05-20 19:41 UTC (permalink / raw)
  To: eilong; +Cc: Michael Chan, netdev@vger.kernel.org
In-Reply-To: <1274381113.28702.6.camel@lb-tlvb-eilong.il.broadcom.com>

On 2010-05-20 20:45, Eilon Greenstein wrote:
> On Thu, 2010-05-20 at 11:09 -0700, Krzysztof Olędzki wrote:
>> On 2010-05-20 19:49, Eilon Greenstein wrote:
>>> On Thu, 2010-05-20 at 10:08 -0700, Krzysztof Olędzki wrote:
>>>> Hello,
>>>>
>>>> I would like to connect a dual port SFP+ NetXtreme II BCM57711
>>>> 10-Gigabit NIC to a HP J9309A ProCurve 4-Port 10GbE SFP+ zl Module
>>>> using a HP ProCurve 10-GbE SFP+ 7m Direct Attach Cable (J9285B).
>>>>
>>>> Unfortunately, it does not work. :( After connecting the switch and
>>>> the server together and loading the bnx2x module the switch logs:
>>>>
>>>> I 05/20/10 18:32:23 ports: port E4 is Blocked by STP
>>>> I 05/20/10 18:32:23 ports: port E4 is now on-line
>>>>
>>>> Here is the dmesg output from the server:
>>>>
>>>> Broadcom NetXtreme II 5771x 10Gigabit Ethernet Driver bnx2x 1.52.1-5 (2009/11/09)
>>>> bnx2x 0000:04:00.0: PCI INT A ->   GSI 38 (level, low) ->   IRQ 38
>>>> bnx2x 0000:04:00.0: setting latency timer to 64
>>>> bnx2x: part number 394D4342-31373735-31314131-473331
>>>> bnx2x: Loading bnx2x-e1h-5.2.7.0.fw
>>>> bnx2x 0000:04:00.0: firmware: requesting bnx2x-e1h-5.2.7.0.fw
>>>> eth4: Broadcom NetXtreme II BCM57711 XGb (A0) PCI-E x8 5GHz (Gen2) found at mem dc000000, IRQ 38, node addr 00:10:18:5f:e4:b4
>>>> bnx2x 0000:04:00.1: PCI INT B ->   GSI 45 (level, low) ->   IRQ 45
>>>> bnx2x 0000:04:00.1: setting latency timer to 64
>>>> bnx2x: part number 394D4342-31373735-31314131-473331
>>>> bnx2x: Loading bnx2x-e1h-5.2.7.0.fw
>>>> bnx2x 0000:04:00.1: firmware: requesting bnx2x-e1h-5.2.7.0.fw
>>>> eth5: Broadcom NetXtreme II BCM57711 XGb (A0) PCI-E x8 5GHz (Gen2) found at mem dd000000, IRQ 45, node addr 00:10:18:5f:e4:b6
>>>> bnx2x 0000:04:00.0: irq 97 for MSI/MSI-X
>>>> bnx2x 0000:04:00.0: irq 98 for MSI/MSI-X
>>>> bnx2x 0000:04:00.0: irq 99 for MSI/MSI-X
>>>> bnx2x 0000:04:00.0: irq 100 for MSI/MSI-X
>>>> bnx2x 0000:04:00.0: irq 101 for MSI/MSI-X
>>>> bnx2x 0000:04:00.0: irq 102 for MSI/MSI-X
>>>> bnx2x: eth4: using MSI-X  IRQs: sp 97  fp[0] 99 ... fp[3] 102
>>>> ADDRCONF(NETDEV_UP): eth4: link is not ready
>>>> bnx2x 0000:04:00.1: irq 103 for MSI/MSI-X
>>>> bnx2x 0000:04:00.1: irq 104 for MSI/MSI-X
>>>> bnx2x 0000:04:00.1: irq 105 for MSI/MSI-X
>>>> bnx2x 0000:04:00.1: irq 106 for MSI/MSI-X
>>>> bnx2x 0000:04:00.1: irq 107 for MSI/MSI-X
>>>> bnx2x 0000:04:00.1: irq 108 for MSI/MSI-X
>>>> bnx2x: eth5: using MSI-X  IRQs: sp 103  fp[0] 105 ... fp[3] 108
>>>> ADDRCONF(NETDEV_UP): eth5: link is not ready
>>>> bnx2x: eth5 NIC Link is Down
>>>> bnx2x: eth5 NIC Link is Down
>>>>
>>>> Loading the driver with debug mode enabled (modprobe bnx2x debug=0x20004) I got:
>>> Thank you for this debug information! You saved one email round trip :)
>>
>> Hehe, thanks.
>>
>>> However, I still need some more information about the FW version and
>>> nvram settings. Can you please send me the output of ethtool -i
>>
>> # ethtool -i eth5
>> driver: bnx2x
>> version: 1.52.1-5
>> firmware-version: BC:5.0.13 PHY:0aa0:0406
>> bus-info: 0000:04:00.1
>>
>>> and ethtool -e? Since ethtool -e is quite long, it is best to send
>>> it as an attached file.
>>
>> Attached.
>
> Almost everything seems to be in order. Almost - since you don't get
> link... I don't think I have tried using this kind of Direct Attach
> Cable - so maybe it just needs some more time. Let's see if the
> following makes any difference (other than delaying the failure for another
> 2.7 seconds):
>
> diff --git a/drivers/net/bnx2x_link.c b/drivers/net/bnx2x_link.c
> index ff70be8..bcee38c 100644
> --- a/drivers/net/bnx2x_link.c
> +++ b/drivers/net/bnx2x_link.c
> @@ -3113,7 +3113,7 @@ static u8 bnx2x_wait_for_sfp_module_initialized(struct link_params *params)
>         u16 timeout;
>         /* Initialization time after hot-plug may take up to 300ms for some
>         phys type ( e.g. JDSU ) */
> -       for (timeout = 0; timeout < 60; timeout++) {
> +       for (timeout = 0; timeout < 600; timeout++) {
>                 if (bnx2x_read_sfp_module_eeprom(params, 1, 1, &val)
>                     == 0) {
>                         DP(NETIF_MSG_LINK, "SFP+ module initialization "
>
> If it does help, be sure to let me know how much time it took (you
> should have this debug print).

Still no luck. :( The kernel printed many more "Got bad status 0x0 when 
reading from SFP+ EEPROM" messages. Finally I got:
  "SFP+ module is not initialized".

Best regards,

			Krzysztof Olędzki


* Re: bnx2x + SFP+ DA/2.6.33.3: Got bad status 0x0 when reading from SFP+ EEPROM -> SFP+ module is not initialized
From: Rick Jones @ 2010-05-20 20:25 UTC (permalink / raw)
  To: Krzysztof Olędzki; +Cc: eilong, Michael Chan, netdev@vger.kernel.org
In-Reply-To: <4BF59058.6050205@ans.pl>

Some simple/simplistic thoughts/questions...

Has the DAC been used successfully prior to this?

Do you have another HP ProCurve 10-GbE SFP+ 7m Direct Attach Cable (J9285B) to try?

There's a transceiver and presumably an EEPROM at both ends of a DAC right?  If 
the EEPROM at one end were "bad" might the 57711 be happier with the other end 
of the DAC?  Getting some sort of error message at the switch side, which may 
(or may not) have more detailed diagnostics might help.

rick jones


* [RFC] tcp: delack_timer expiration changes for every frame
From: Eric Dumazet @ 2010-05-20 20:47 UTC (permalink / raw)
  To: netdev, David Miller; +Cc: Ilpo Järvinen

While oprofiling net-next-2.6 during tcp workloads I found that
mod_timer(delack_timer) was called too often, even when we receive/send more
than one frame per jiffy.

Something seems wrong, since we should only need to change this timer
when jiffies changes. mod_timer() has a special optimization for this,
but is something broken in our tcp stack?

I added some logs in mod_timer() :

HZ = 250

results for one socket shown :

[  392.116735] timer->expires=22997, expires=23024(37) diff=-27 timer=e5ecb754
[  392.120627] timer->expires=23024, expires=22998(10) diff=26 timer=e5ecb754
[  392.123245] timer->expires=22998, expires=23025(37) diff=-27 timer=e5ecb754
[  392.133688] timer->expires=23025, expires=23001(10) diff=24 timer=e5ecb754
[  392.136502] timer->expires=23001, expires=23029(37) diff=-28 timer=e5ecb754
[  392.140392] timer->expires=23029, expires=23003(10) diff=26 timer=e5ecb754
[  392.143142] timer->expires=23003, expires=23030(37) diff=-27 timer=e5ecb754
[  392.153812] timer->expires=23030, expires=23006(10) diff=24 timer=e5ecb754
[  392.156658] timer->expires=23006, expires=23034(37) diff=-28 timer=e5ecb754
[  392.160474] timer->expires=23034, expires=23008(10) diff=26 timer=e5ecb754
[  392.163317] timer->expires=23008, expires=23035(37) diff=-27 timer=e5ecb754
[  392.167176] timer->expires=23035, expires=23009(10) diff=26 timer=e5ecb754
[  392.176963] timer->expires=23009, expires=23039(37) diff=-30 timer=e5ecb754
[  392.180863] timer->expires=23039, expires=23013(10) diff=26 timer=e5ecb754
[  392.183577] timer->expires=23013, expires=23040(37) diff=-27 timer=e5ecb754
[  392.187537] timer->expires=23040, expires=23014(10) diff=26 timer=e5ecb754
[  392.197286] timer->expires=23014, expires=23044(37) diff=-30 timer=e5ecb754
[  392.201047] timer->expires=23044, expires=23018(10) diff=26 timer=e5ecb754
[  392.203761] timer->expires=23018, expires=23045(37) diff=-27 timer=e5ecb754
[  392.207721] timer->expires=23045, expires=23019(10) diff=26 timer=e5ecb754
[  392.217454] timer->expires=23019, expires=23049(37) diff=-30 timer=e5ecb754

So we change the delack_timer by a positive delta (~HZ/10) and a
negative delta (~HZ/10), on the typical netperf TCP_RR workload.
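
To illustrate why the alternating values matter, here is a minimal user-space model. It assumes the mod_timer() optimization mentioned above is the "timer already pending with the same expiry" check; struct model_timer, model_mod_timer() and count_requeues() are hypothetical names for this sketch and are not kernel code.

```c
#include <assert.h>

/* User-space model of a timer; not the kernel's struct timer_list. */
struct model_timer {
	int pending;             /* non-zero while the timer is queued */
	unsigned long expires;   /* absolute expiry, in jiffies */
	unsigned long requeues;  /* how often the slow path ran */
};

/* Returns 1 when the fast path was taken (nothing to do), 0 when the
 * timer had to be (re)queued. */
int model_mod_timer(struct model_timer *t, unsigned long expires)
{
	assert(t != 0);
	if (t->pending && t->expires == expires)
		return 1;
	t->expires = expires;
	t->pending = 1;
	t->requeues++;
	return 0;
}

/* Re-arm the timer `frames` times within one jiffy, alternating between two
 * relative timeouts, and report how many times the slow path ran. */
unsigned long count_requeues(unsigned long now, unsigned int frames,
			     unsigned long timeout_a, unsigned long timeout_b)
{
	struct model_timer t = { 0, 0, 0 };
	unsigned int i;

	for (i = 0; i < frames; i++)
		model_mod_timer(&t, now + ((i & 1) ? timeout_b : timeout_a));
	return t.requeues;
}
```

In this model, eight frames in one jiffy that all ask for the same timeout cost a single requeue, while eight frames alternating between 10 and 37 ticks (the two values in the log) requeue the timer every time.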



Here, the incoming frame is handled by netperf, doing a recvmsg().
tcp_send_delayed_ack() sets the delack_timer to jiffies + HZ/25

[  392.207721] timer->expires=23045, new expires=23019(10) diff=26 timer=e5ecb754
[  392.207785] ------------[ cut here ]------------
[  392.207846] WARNING: at kernel/timer.c:753 mod_timer+0x55/0x18e()
[  392.207908] Hardware name: ProLiant BL460c G6
[  392.207965] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler ipv6 dm_mod button battery ac ehci_hcd uhci_hcd tg3 libphy bnx2x crc32c libcrc32c mdio [last unloaded: x_tables]
[  392.208900] Pid: 5320, comm: netperf Tainted: G        W  2.6.34-06175-g801cae3-dirty #33
[  392.208979] Call Trace:
[  392.209036]  [<c102df55>] ? warn_slowpath_common+0x5d/0x70
[  392.209098]  [<c102df73>] ? warn_slowpath_null+0xb/0xd
[  392.209159]  [<c10388de>] ? mod_timer+0x55/0x18e
[  392.209221]  [<c1279ce7>] ? tcp_send_delayed_ack+0xb5/0xc1
[  392.209282]  [<c1276d26>] ? tcp_rcv_established+0x39f/0x4f7
[  392.209345]  [<c127bae5>] ? tcp_v4_do_rcv+0x22/0x161
[  392.209406]  [<c126d6b4>] ? tcp_prequeue_process+0x47/0x5b
[  392.209468]  [<c12701d2>] ? tcp_recvmsg+0x371/0x691
[  392.209529]  [<c12b8c91>] ? _raw_spin_lock_bh+0x8/0x1e
[  392.209590]  [<c1240bcd>] ? release_sock+0x10/0xc9
[  392.216514]  [<c1285bad>] ? inet_recvmsg+0x5d/0x72
[  392.216575]  [<c123e725>] ? sock_recvmsg+0xb4/0xd1
[  392.216636]  [<c1032ace>] ? irq_exit+0x39/0x5b
[  392.216696]  [<c123fab5>] ? sys_recvfrom+0xb4/0x117
[  392.216757]  [<c10483be>] ? ktime_get+0x61/0xe8
[  392.216817]  [<c1016431>] ? lapic_next_event+0x13/0x16
[  392.216878]  [<c104ba9d>] ? clockevents_program_event+0xac/0xbc
[  392.216940]  [<c104c6cc>] ? tick_dev_program_event+0x34/0x138
[  392.217002]  [<c104c7ed>] ? tick_program_event+0x1d/0x21
[  392.217064]  [<c1044da0>] ? hrtimer_interrupt+0x10b/0x1c1
[  392.217126]  [<c123fb31>] ? sys_recv+0x19/0x1d
[  392.217186]  [<c12401dc>] ? sys_socketcall+0x120/0x1c6
[  392.217303]  [<c100268c>] ? sysenter_do_call+0x12/0x22
[  392.217364] ---[ end trace e9475c06f1d49408 ]---

Here, the incoming frame is handled by the other side (netserver),
but still for the netperf socket (softirq handling).
tcp_v4_rcv() sets the delack timer to 37 ticks, so the mod_timer() optimization is not
working at all.

[  392.217454] timer->expires=23019, new expires=23049(37) diff=-30 timer=e5ecb754
[  392.217518] ------------[ cut here ]------------
[  392.217578] WARNING: at kernel/timer.c:753 mod_timer+0x55/0x18e()
[  392.217639] Hardware name: ProLiant BL460c G6
[  392.217697] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler ipv6 dm_mod button battery ac ehci_hcd uhci_hcd tg3 libphy bnx2x crc32c libcrc32c mdio [last unloaded: x_tables]
[  392.218439] Pid: 5321, comm: netserver Tainted: G        W  2.6.34-06175-g801cae3-dirty #33
[  392.218526] Call Trace:
[  392.218582]  [<c102df55>] ? warn_slowpath_common+0x5d/0x70
[  392.218644]  [<c102df73>] ? warn_slowpath_null+0xb/0xd
[  392.218705]  [<c10388de>] ? mod_timer+0x55/0x18e
[  392.218765]  [<c127cf56>] ? tcp_v4_rcv+0x41c/0x6b7
[  392.218826]  [<c1265832>] ? ip_local_deliver_finish+0xe9/0x178
[  392.218888]  [<c126572a>] ? ip_rcv_finish+0x262/0x281
[  392.218949]  [<c1249986>] ? __netif_receive_skb+0x267/0x282
[  392.219011]  [<c1249a0d>] ? process_backlog+0x6c/0x113
[  392.219072]  [<c124a2e6>] ? net_rx_action+0x8a/0x15a
[  392.219133]  [<c106234e>] ? __rcu_process_callbacks+0xb9/0x1d1
[  392.219195]  [<c1032910>] ? __do_softirq+0x0/0x13a
[  392.219255]  [<c10329b5>] ? __do_softirq+0xa5/0x13a
[  392.219316]  [<c1032910>] ? __do_softirq+0x0/0x13a
[  392.219375]  <IRQ>  [<c1032449>] ? local_bh_enable+0x5f/0x6a
[  392.219474]  [<c124c0be>] ? dev_queue_xmit+0x34d/0x37a
[  392.219536]  [<c1268115>] ? ip_finish_output+0x1c7/0x1ff
[  392.219610]  [<c1268255>] ? ip_local_out+0x18/0x1a
[  392.219670]  [<c12684f4>] ? ip_queue_xmit+0x29d/0x2d5
[  392.219731]  [<c1293e89>] ? bictcp_acked+0x4f/0x139
[  392.219791]  [<c1275ee8>] ? tcp_ack+0x155b/0x16e9
[  392.219851]  [<c127851a>] ? tcp_transmit_skb+0x62a/0x65f
[  392.219912]  [<c1038a03>] ? mod_timer+0x17a/0x18e
[  392.219972]  [<c127900a>] ? tcp_write_xmit+0x73a/0x81c
[  392.220033]  [<c109e487>] ? __kmalloc_node+0x30/0x76
[  392.220095]  [<c1279127>] ? __tcp_push_pending_frames+0x15/0x6c
[  392.220159]  [<c126f214>] ? tcp_sendmsg+0x7ee/0x8c5
[  392.220223]  [<c123e823>] ? sock_sendmsg+0xa7/0xc1
[  392.220284]  [<c1044da0>] ? hrtimer_interrupt+0x10b/0x1c1
[  392.220346]  [<c1032ace>] ? irq_exit+0x39/0x5b
[  392.220406]  [<c1016811>] ? smp_apic_timer_interrupt+0x6b/0x75
[  392.220468]  [<c102007b>] ? pud_huge+0x1/0x9
[  392.220536]  [<c123f9b9>] ? sys_sendto+0xfc/0x127
[  392.220599]  [<c1044bd4>] ? hrtimer_start_range_ns+0xf/0x13
[  392.220661]  [<c1023848>] ? update_curr+0x60/0xdf
[  392.220722]  [<c1044871>] ? hrtimer_forward+0x10f/0x123
[  392.220784]  [<c10483be>] ? ktime_get+0x61/0xe8
[  392.220844]  [<c123f9fd>] ? sys_send+0x19/0x1d
[  392.220903]  [<c12401af>] ? sys_socketcall+0xf3/0x1c6
[  392.220964]  [<c1032ace>] ? irq_exit+0x39/0x5b
[  392.221024]  [<c1016811>] ? smp_apic_timer_interrupt+0x6b/0x75
[  392.221086]  [<c100268c>] ? sysenter_do_call+0x12/0x22
[  392.221147] ---[ end trace e9475c06f1d49409 ]---

It's a bit late here to investigate, maybe one of you guys has an idea about this...





* Re: bnx2x + SFP+ DA/2.6.33.3: Got bad status 0x0 when reading from SFP+ EEPROM -> SFP+ module is not initialized
From: Krzysztof Olędzki @ 2010-05-20 20:54 UTC (permalink / raw)
  To: Rick Jones; +Cc: eilong, Michael Chan, netdev@vger.kernel.org
In-Reply-To: <4BF59ABB.9070600@hp.com>

On 2010-05-20 22:25, Rick Jones wrote:
> Some simple/simplistic thoughts/questions...
>
> Has the DAC been used successfully prior to this?

Yes. It was successfully used to connect two HP switches before I
received SFP+ SR modules, which allowed me to put the switches in
separate rooms.

> Do you have another HP ProCurve 10-GbE SFP+ 7m Direct Attach Cable (J9285B) to try?

Yes. The same situation.

> There's a transceiver and presumably an EEPROM at both ends of a DAC right?

Yes, I think there should be one. ;)

> If
> the EEPROM at one end were "bad" might the 57711 be happier with the other end
> of the DAC?

Tested both ends. The same situation. :|

> Getting some sort of error message at the switch side, which may
> (or may not) have more detailed diagnostics might help.

There is no error message at the switch side. The switch shows that 
everything is correct.

Best regards,

				Krzysztof Olędzki


* [PATCH] ipvs: Add missing locking during connection table hashing and unhashing
From: Sven Wegener @ 2010-05-20 20:55 UTC (permalink / raw)
  To: Simon Horman, Julian Anastasov, Wensong Zhang; +Cc: netdev, lvs-devel

The code that hashes and unhashes connections from the connection table
is missing locking of the connection being modified, which opens up a
race condition and results in memory corruption when this race condition
is hit.

Here is what happens in pretty verbose form:

CPU 0					CPU 1
------------				------------
An active connection is terminated and
we schedule ip_vs_conn_expire() on this
CPU to expire this connection.

					IRQ assignment is changed to this CPU,
					but the expire timer stays scheduled on
					the other CPU.

					New connection from same ip:port comes
					in right before the timer expires, we
					find the inactive connection in our
					connection table and get a reference to
					it. We properly lock the connection in
					tcp_state_transition() and read the
					connection flags in set_tcp_state().

ip_vs_conn_expire() gets called, we
unhash the connection from our
connection table and remove the hashed
flag in ip_vs_conn_unhash(), without
proper locking!

					While still holding proper locks we
					write the connection flags in
					set_tcp_state() and this sets the hashed
					flag again.

ip_vs_conn_expire() fails to expire the
connection, because the other CPU has
incremented the reference count. We try
to re-insert the connection into our
connection table, but this fails in
ip_vs_conn_hash(), because the hashed
flag has been set by the other CPU. We
re-schedule execution of
ip_vs_conn_expire(). Now this connection
has the hashed flag set, but isn't
actually hashed in our connection table
and has a dangling list_head.

					We drop the reference we held on the
					connection and schedule the expire timer
					for timing out the connection on this
					CPU. Further packets won't be able to
					find this connection in our connection
					table.

					ip_vs_conn_expire() gets called again,
					we think it's already hashed, but the
					list_head is dangling and while removing
					the connection from our connection table
					we write to the memory location where
					this list_head points to.

The result will probably be a kernel oops at some other point in time.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Cc: stable@kernel.org
---
 net/netfilter/ipvs/ip_vs_conn.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

This race condition is pretty subtle, but it can be triggered remotely.
It needs the IRQ assignment change or another circumstance where packets
coming from the same ip:port for the same service are being processed on
different CPUs. And it involves hitting the exact time at which
ip_vs_conn_expire() gets called. It can be avoided by making sure that
all packets from one connection are always processed on the same CPU and
can be made harder to exploit by changing the connection timeouts to
some custom values.
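
The interleaving described in the changelog can be replayed deterministically in user space. The following is a single-threaded sketch, not the kernel code: struct model_conn, replay_unhash_race() and the two flag values are illustrative assumptions, and the "lock" is modelled simply by the order in which the steps run.

```c
#include <assert.h>

/* Illustrative bit values for this model only. */
#define CONN_F_HASHED   0x0040u
#define CONN_F_INACTIVE 0x0100u

struct model_conn {
	unsigned int flags;
	int on_list;   /* 1 while linked into the modelled connection table */
};

/* Replays the two-CPU sequence on one thread.
 * unhash_takes_conn_lock == 0: the unhash step slips in between the other
 * CPU's read and write of the flags.
 * unhash_takes_conn_lock != 0: the unhash step waits until the other CPU
 * has finished its read-modify-write.
 * Returns 1 if the connection ends up flagged as hashed while off the list. */
int replay_unhash_race(int unhash_takes_conn_lock)
{
	struct model_conn cp = { CONN_F_HASHED, 1 };
	unsigned int snapshot;

	/* CPU 1: read the flags while holding the connection lock */
	snapshot = cp.flags;

	if (!unhash_takes_conn_lock) {
		/* CPU 0: unhash without the connection lock */
		cp.on_list = 0;
		cp.flags &= ~CONN_F_HASHED;
		/* CPU 1: write back the stale snapshot plus its own change */
		cp.flags = snapshot | CONN_F_INACTIVE;
	} else {
		/* CPU 1 completes its update first */
		cp.flags = snapshot | CONN_F_INACTIVE;
		/* CPU 0: unhash only after acquiring the connection lock */
		cp.on_list = 0;
		cp.flags &= ~CONN_F_HASHED;
	}

	assert(cp.flags & CONN_F_INACTIVE);
	return (cp.flags & CONN_F_HASHED) && !cp.on_list;
}
```

Without the connection lock the model ends with the hashed flag set on an entry that is no longer linked, which is the state that later leads to the write through a dangling list_head; with the lock the two updates are serialized and the state stays consistent.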

diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index d8f7e8e..ff04e9e 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -162,6 +162,7 @@ static inline int ip_vs_conn_hash(struct ip_vs_conn *cp)
 	hash = ip_vs_conn_hashkey(cp->af, cp->protocol, &cp->caddr, cp->cport);

 	ct_write_lock(hash);
+	spin_lock(&cp->lock);

 	if (!(cp->flags & IP_VS_CONN_F_HASHED)) {
 		list_add(&cp->c_list, &ip_vs_conn_tab[hash]);
@@ -174,6 +175,7 @@ static inline int ip_vs_conn_hash(struct ip_vs_conn *cp)
 		ret = 0;
 	}

+	spin_unlock(&cp->lock);
 	ct_write_unlock(hash);

 	return ret;
@@ -193,6 +195,7 @@ static inline int ip_vs_conn_unhash(struct ip_vs_conn *cp)
 	hash = ip_vs_conn_hashkey(cp->af, cp->protocol, &cp->caddr, cp->cport);

 	ct_write_lock(hash);
+	spin_lock(&cp->lock);

 	if (cp->flags & IP_VS_CONN_F_HASHED) {
 		list_del(&cp->c_list);
@@ -202,6 +205,7 @@ static inline int ip_vs_conn_unhash(struct ip_vs_conn *cp)
 	} else
 		ret = 0;

+	spin_unlock(&cp->lock);
 	ct_write_unlock(hash);

 	return ret;


* Re: [PATCH net-next-2.6 0/8] CAIF: Bugfixes and updates
From: Sjur Brændeland @ 2010-05-20 21:08 UTC (permalink / raw)
  To: David Miller
  Cc: sjur.brandeland, netdev, marcel, daniel.martensson, linus.walleji
In-Reply-To: <20100520.005658.11949785.davem@davemloft.net>

Hi Dave,

David Miller wrote:
> Send me bug fixes only.

Currently in caif_socket.c caif_seqpkt_recvmsg returns -EMSGSIZE if
the skb doesn't fit in the user buffer.
Would you consider my patch where I fix MSG_TRUNC to work properly a bugfix?
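
For comparison, this is how truncation is reported to user space on an existing socket family. The sketch uses an AF_UNIX datagram socket pair on Linux as a stand-in, since it runs without CAIF hardware; whether the CAIF patch behaves exactly the same way is an assumption, and truncation_reported() is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send a datagram of `sent` bytes over a local socket pair and receive it
 * into a buffer of `room` bytes. Returns 1 if the kernel flagged the message
 * as truncated, 0 if not, -1 on a socket error. */
int truncation_reported(size_t sent, size_t room)
{
	char out[256], in[256];
	struct msghdr msg;
	struct iovec iov;
	int sv[2], ret = -1;

	assert(sent > 0 && sent <= sizeof(out));
	assert(room > 0 && room <= sizeof(in));
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	memset(out, 'x', sizeof(out));
	if (send(sv[0], out, sent, 0) == (ssize_t)sent) {
		memset(&msg, 0, sizeof(msg));
		iov.iov_base = in;
		iov.iov_len = room;
		msg.msg_iov = &iov;
		msg.msg_iovlen = 1;
		/* The call succeeds; truncation shows up in msg_flags. */
		if (recvmsg(sv[1], &msg, 0) >= 0)
			ret = (msg.msg_flags & MSG_TRUNC) ? 1 : 0;
	}
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

The point of the comparison is that a short user buffer does not make the receive call fail here: the data that fits is copied and MSG_TRUNC is set in msg_flags, rather than an error such as -EMSGSIZE being returned.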

Regards Sjur


* Re: [PATCH] sh_eth: Fix memleak in sh_mdio_release
From: Nobuhiro Iwamatsu @ 2010-05-20 22:12 UTC (permalink / raw)
  To: Denis Kirjanov; +Cc: davem, shimoda.yoshihiro, morimoto.kuninori, netdev
In-Reply-To: <20100520140059.GA8968@hera.kernel.org>

Hi, Denis.

2010/5/20 Denis Kirjanov <dkirjanov@hera.kernel.org>:
> Allocated memory for IRQs should be freed when releasing the mii_bus
>
> Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
> ---
>
> drivers/net/sh_eth.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/sh_eth.c b/drivers/net/sh_eth.c
> index 586ed09..501a55f 100644
> --- a/drivers/net/sh_eth.c
> +++ b/drivers/net/sh_eth.c
> @@ -1294,6 +1294,9 @@ static int sh_mdio_release(struct net_device *ndev)
>        /* remove mdio bus info from net_device */
>        dev_set_drvdata(&ndev->dev, NULL);
>
> +       /* free interrupts memory */
> +       kfree(bus->irq);
> +
>        /* free bitbang info */
>        free_mdio_bitbang(bus);
>
>
Acked-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>

Thanks!

Best regards,
  Nobuhiro


* Re: [PATCH 1/3] cgroups: Add an API to attach a task to current task's cgroup
From: Paul Menage @ 2010-05-20 22:22 UTC (permalink / raw)
  To: Sridhar Samudrala; +Cc: Michael S. Tsirkin, netdev, kvm@vger.kernel.org, lkml
In-Reply-To: <1274227488.2370.107.camel@w-sridhar.beaverton.ibm.com>

On Tue, May 18, 2010 at 5:04 PM, Sridhar Samudrala
<samudrala.sridhar@gmail.com> wrote:
> Add a new kernel API to attach a task to current task's cgroup
> in all the active hierarchies.
>
> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>

Reviewed-by: Paul Menage <menage@google.com>

It would be more efficient to just attach directly to current->cgroups
rather than potentially creating/destroying one css_set for each
hierarchy until we've completely converged on current->cgroups - but
that would require a bunch of refactoring of the guts of
cgroup_attach_task() to ensure that the right can_attach()/attach()
callbacks are made. That doesn't really seem worthwhile right now for
the initial use, which I imagine isn't going to be
performance-sensitive.

Paul


* Re: [PATCH 1/3] cgroups: Add an API to attach a task to current task's cgroup
From: Paul Menage @ 2010-05-20 22:26 UTC (permalink / raw)
  To: Sridhar Samudrala; +Cc: Michael S. Tsirkin, netdev, kvm@vger.kernel.org, lkml
In-Reply-To: <AANLkTinsrFoLVKDFM5pcKcL_6MvAzhR6IzbNmWKh3BDh@mail.gmail.com>

On Thu, May 20, 2010 at 3:22 PM, Paul Menage <menage@google.com> wrote:
> On Tue, May 18, 2010 at 5:04 PM, Sridhar Samudrala
> <samudrala.sridhar@gmail.com> wrote:
>> Add a new kernel API to attach a task to current task's cgroup
>> in all the active hierarchies.
>>
>> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
>
> Reviewed-by: Paul Menage <menage@google.com>
>

One other thought on this - this would be the first piece of code
that's attaching a task to a cgroup without holding the cgroup
directory inode i_mutex. I believe that this is probably OK.

Paul


* Re: [PATCH net-next-2.6 0/8] CAIF: Bugfixes and updates
From: David Miller @ 2010-05-20 22:27 UTC (permalink / raw)
  To: sjurbren
  Cc: sjur.brandeland, netdev, marcel, daniel.martensson, linus.walleji
In-Reply-To: <AANLkTilvQ8W5X-qvt6GvEj-1ZmmfwZJz079Rdi8K6Tll@mail.gmail.com>

From: Sjur Brændeland <sjurbren@gmail.com>
Date: Thu, 20 May 2010 23:08:27 +0200

> Currently in caif_socket.c caif_seqpkt_recvmsg returns -EMSGSIZE if
> the skb doesn't fit in the user buffer.
> Would you consider my patch where I fix MSG_TRUNC to work properly a bugfix?

You're really pushing it, but fine...

This is the part I hate most about the merge window: people just
want to slip in as much as they possibly can and justify it by
any means necessary to suit their own personal needs, instead of
being amicable and abiding by the merge window rules, which are
for the good of everyone.


* Re: [patch] IPVS: one-packet scheduling
From: Simon Horman @ 2010-05-20 22:31 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: netdev, lvs-devel, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Nick Chalk
In-Reply-To: <4BF55294.9030908@trash.net>

On Thu, May 20, 2010 at 05:17:40PM +0200, Patrick McHardy wrote:
> Simon Horman wrote:
> > From: Nick Chalk <nick@loadbalancer.org>
> > 
> > IPVS: one-packet scheduling
> > 
> > Allow one-packet scheduling for UDP connections. When the fwmark-based or
> > normal virtual service is marked with '-o' or '--ops' options all
> > connections are created only to schedule one packet. Useful to schedule UDP
> > packets from same client port to different real servers. Recommended with
> > RR or WRR schedulers (the connections are not visible with ipvsadm -L).
> 
> I'm afraid it's too late in this merge window for new features
> since Dave has already sent his merge request to Linus.
> 
> Please resend once the net-next (and nf-next) tree opens up.

Sure, will do.



* Re: tun: Use netif_receive_skb instead of netif_rx
From: Herbert Xu @ 2010-05-20 23:16 UTC (permalink / raw)
  To: Neil Horman; +Cc: Eric Dumazet, David Miller, bmb, tgraf, nhorman, netdev
In-Reply-To: <20100520172918.GA17613@shamino.rdu.redhat.com>

On Thu, May 20, 2010 at 01:29:18PM -0400, Neil Horman wrote:
>
> So, I'm testing this patch out now, and unfortunately it doesn't seem to be
> working.  Every frame seems to be holding a classid of 0.  Trying to figure out
> why now.

Not very surprising since tun.c doesn't go through the normal
socket interface.  I'll send an additional patch for that.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: linux-next: build failure after merge of the suspend tree
From: Stephen Rothwell @ 2010-05-21  0:29 UTC (permalink / raw)
  To: John W. Linville, David Miller, Linus
  Cc: Rafael J. Wysocki, linux-next, linux-kernel, Helmut Schaa, netdev
In-Reply-To: <201005080413.24465.rjw@sisk.pl>

Hi John, Dave,

On Sat, 8 May 2010 04:13:24 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
>
> On Friday 07 May 2010, Stephen Rothwell wrote:
> > 
> > After merging the suspend tree, today's linux-next build (x86_64
> > allmodconfig) failed like this:
> > 
> > net/mac80211/scan.c: In function 'ieee80211_scan_state_decision':
> > net/mac80211/scan.c:510: error: implicit declaration of function 'pm_qos_requirement'
> > 
> > Caused by commit 62bad14fc6e0911a99882c261390968977d43283 ("PM QOS
> > update") from the suspend tree interacting with commit
> > df13cce53a7b28a81460e6bfc4857e9df4956141 ("mac80211: Improve software
> > scan timing") from the net tree.
> > 
> > I have added the following merge fixup patch and can carry it as
> > necessary:
> 
> Thanks a lot, please do so if that's not a problem.
> 
> Both trees are based on Linus' current and I don't see a good way of fixing
> this issue in any of them individually.

The suspend tree has been merged into Linus' tree, so this patch is
needed in the net tree before it is merged (or as part of the merge).

Here is the patch again:

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Fri, 7 May 2010 13:02:54 +1000
Subject: [PATCH] wireless: update for pm_qos_requirement to pm_qos_request rename

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
 net/mac80211/scan.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/mac80211/scan.c b/net/mac80211/scan.c
index e14c441..e1b0be7 100644
--- a/net/mac80211/scan.c
+++ b/net/mac80211/scan.c
@@ -510,7 +510,7 @@ static int ieee80211_scan_state_decision(struct ieee80211_local *local,
 		bad_latency = time_after(jiffies +
 				ieee80211_scan_get_channel_time(next_chan),
 				local->leave_oper_channel_time +
-				usecs_to_jiffies(pm_qos_requirement(PM_QOS_NETWORK_LATENCY)));
+				usecs_to_jiffies(pm_qos_request(PM_QOS_NETWORK_LATENCY)));
 
 		listen_int_exceeded = time_after(jiffies +
 				ieee80211_scan_get_channel_time(next_chan),
-- 
1.7.1

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

^ permalink raw reply related

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Neil Horman @ 2010-05-21  0:39 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Eric Dumazet, David Miller, bmb, tgraf, nhorman, netdev
In-Reply-To: <20100520231630.GA22593@gondor.apana.org.au>

On Fri, May 21, 2010 at 09:16:30AM +1000, Herbert Xu wrote:
> On Thu, May 20, 2010 at 01:29:18PM -0400, Neil Horman wrote:
> >
> > So, I'm testing this patch out now, and unfortunately it doesn't seem to be
> > working.  Every frame seems to be holding a classid of 0.  Trying to figure out
> > why now.
> 
> Not very surprising since tun.c doesn't go through the normal
> socket interface.  I'll send an additional patch for that.
> 
I don't think that's it.  I think it's a chicken-and-egg situation.  I think the
problem is that tasks can't be assigned to cgroups until they're created, and in
that time a sock can be created.  It's a natural race.  If you create a socket
before you assign it to a cgroup, that socket retains a classid of zero.  I'm
going to try to modify the patch to update sockets owned by tasks when the cgroup
is assigned.

Best
Neil

> Cheers,
> -- 
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> 

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Herbert Xu @ 2010-05-21  1:02 UTC (permalink / raw)
  To: Neil Horman; +Cc: Eric Dumazet, David Miller, bmb, tgraf, nhorman, netdev
In-Reply-To: <20100521003939.GA2223@localhost.localdomain>

On Thu, May 20, 2010 at 08:39:39PM -0400, Neil Horman wrote:
>
> > Not very surprising since tun.c doesn't go through the normal
> > socket interface.  I'll send an additional patch for that.
> > 
> I don't think that's it.  I think it's a chicken-and-egg situation.  I think the
> problem is that tasks can't be assigned to cgroups until they're created, and in
> that time a sock can be created.  It's a natural race.  If you create a socket
> before you assign it to a cgroup, that socket retains a classid of zero.  I'm
> going to try to modify the patch to update sockets owned by tasks when the cgroup
> is assigned.

That's what I meant above.  My patch will make tun.c do the
classid update every time it sends out a packet.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Herbert Xu @ 2010-05-21  1:16 UTC (permalink / raw)
  To: Neil Horman; +Cc: Eric Dumazet, David Miller, bmb, tgraf, nhorman, netdev
In-Reply-To: <20100521010211.GA23671@gondor.apana.org.au>

On Fri, May 21, 2010 at 11:02:11AM +1000, Herbert Xu wrote:
> 
> That's what I meant above.  My patch will make tun.c do the
> classid update every time it sends out a packet.

Here it is:

tun: Update classid on packet injection

This patch makes tun update its socket classid every time we
inject a packet into the network stack.  This is so that any
updates made by the admin to the process writing packets to
tun take effect.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 4326520..a8a9aa8 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -525,6 +525,8 @@ static inline struct sk_buff *tun_alloc_skb(struct tun_struct *tun,
 	struct sk_buff *skb;
 	int err;
 
+	sock_update_classid(sk);
+
 	/* Under a page?  Don't bother with paged skb. */
 	if (prepad + len < PAGE_SIZE || !linear)
 		linear = len;
diff --git a/net/core/sock.c b/net/core/sock.c
index 8f7fdf8..4969bd1 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1055,6 +1055,7 @@ void sock_update_classid(struct sock *sk)
 	if (classid && classid != sk->sk_classid)
 		sk->sk_classid = classid;
 }
+EXPORT_SYMBOL(sock_update_classid);
 #endif
 
 /**

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related

* gro: Fix bogus gso_size on the first fraglist entry
From: Herbert Xu @ 2010-05-21  2:46 UTC (permalink / raw)
  To: David S. Miller, netdev; +Cc: Igor Zhang

Hi:

gro: Fix bogus gso_size on the first fraglist entry

When GRO produces fraglist entries, and the resulting skb hits
an interface that is incapable of TSO but capable of FRAGLIST,
we end up producing a bogus packet with gso_size non-zero.

This was reported in the field with older versions of KVM that
did not set the TSO bits on tuntap.

This patch fixes that.
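
As an illustration only, the hand-off can be modelled in user space as
below.  This is a minimal sketch, not the kernel code: struct model_skb
and model_promote_to_fraglist() are hypothetical names, and only the
two fields relevant here are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of an skb; not the kernel's struct sk_buff. */
struct model_skb {
	unsigned short gso_size;     /* non-zero marks the skb as GSO */
	struct model_skb *frag_list; /* packets chained behind this head */
};

/*
 * Make nskb the new head with p as its first fraglist entry.
 * The head inherits gso_size, and p is cleared so that it no longer
 * looks like a GSO packet once it is only a fraglist entry.
 */
static void model_promote_to_fraglist(struct model_skb *nskb,
				      struct model_skb *p)
{
	assert(nskb && p);
	nskb->frag_list = p;
	nskb->gso_size = p->gso_size;
	p->gso_size = 0;	/* corresponds to the one-line fix */
}
```

Without the final assignment, p would keep a non-zero gso_size after
being demoted to a fraglist entry, which is the bogus state described
above.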

Reported-by: Igor Zhang <yugzhang@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 93c4e06..cad8e97 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2729,6 +2729,7 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 	*NAPI_GRO_CB(nskb) = *NAPI_GRO_CB(p);
 	skb_shinfo(nskb)->frag_list = p;
 	skb_shinfo(nskb)->gso_size = pinfo->gso_size;
+	pinfo->gso_size = 0;
 	skb_header_release(p);
 	nskb->prev = p;

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related

* [PATCH v2] net: fix problem in dequeuing from input_pkt_queue
From: Tom Herbert @ 2010-05-21  4:37 UTC (permalink / raw)
  To: davem; +Cc: eric.dumazet, xiaosuo, netdev

Fix some issues introduced in batch skb dequeuing for input_pkt_queue.
The primary issue is that the queue head must be incremented only
after a packet has been processed, that is only after
__netif_receive_skb has been called.  This is needed for the mechanism
to prevent OOO packets in RFS.  Also when flushing the input_pkt_queue
and process_queue, the process queue should be done first to prevent
OOO packets.

Because the input_pkt_queue has been effectively split into two queues,
the calculation of the tail ptr is no longer correct.  The correct value
would be head+input_pkt_queue->len+process_queue->len.  To avoid
this calculation we added an explicit input_queue_tail in softnet_data.
The tail value is simply incremented when queuing to input_pkt_queue.

Signed-off-by: Tom Herbert <therbert@google.com>
---
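
The counter accounting described in the changelog can be sketched as a
small user-space model.  This is an assumption-laden illustration, not
the kernel code: struct model_sd and the model_* helpers are
hypothetical, and the two queues are reduced to their lengths.

```c
#include <assert.h>

/* Hypothetical model of the counters; not the kernel's softnet_data. */
struct model_sd {
	unsigned int head;        /* packets fully processed */
	unsigned int tail;        /* packets ever enqueued */
	unsigned int input_len;   /* length of input_pkt_queue */
	unsigned int process_len; /* length of process_queue */
};

/* Enqueue: bump the explicit tail and hand it back to the caller. */
static unsigned int model_enqueue(struct model_sd *sd)
{
	assert(sd);
	sd->input_len++;
	return ++sd->tail;
}

/* Splice input_pkt_queue onto process_queue; head is not touched here. */
static void model_splice(struct model_sd *sd)
{
	sd->process_len += sd->input_len;
	sd->input_len = 0;
}

/* Handle one packet; head advances only after the packet is processed. */
static int model_process_one(struct model_sd *sd)
{
	if (!sd->process_len)
		return 0;
	sd->process_len--;
	sd->head++;
	return 1;
}

/* The earlier expression, which ignores packets in process_queue. */
static unsigned int model_old_tail(const struct model_sd *sd)
{
	return sd->head + sd->input_len;
}

/* Invariant from the changelog: tail == head + both queue lengths. */
static int model_invariant_holds(const struct model_sd *sd)
{
	return sd->tail == sd->head + sd->input_len + sd->process_len;
}
```

In this model, once packets have been spliced to the process queue,
model_old_tail() falls behind the explicit tail, while the invariant
keeps holding because head moves only in model_process_one().
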
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c3487a6..726b3cb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1403,17 +1403,25 @@ struct softnet_data {
 	struct softnet_data	*rps_ipi_next;
 	unsigned int		cpu;
 	unsigned int		input_queue_head;
+	unsigned int		input_queue_tail;
 #endif
 	unsigned		dropped;
 	struct sk_buff_head	input_pkt_queue;
 	struct napi_struct	backlog;
 };
 
-static inline void input_queue_head_add(struct softnet_data *sd,
-					unsigned int len)
+static inline void input_queue_head_incr(struct softnet_data *sd)
 {
 #ifdef CONFIG_RPS
-	sd->input_queue_head += len;
+	sd->input_queue_head++;
+#endif
+}
+
+static inline void input_queue_tail_incr_save(struct softnet_data *sd,
+					      unsigned int *qtail)
+{
+#ifdef CONFIG_RPS
+	*qtail = ++sd->input_queue_tail;
 #endif
 }
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 6c82065..0aab66d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2426,10 +2426,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 		if (skb_queue_len(&sd->input_pkt_queue)) {
 enqueue:
 			__skb_queue_tail(&sd->input_pkt_queue, skb);
-#ifdef CONFIG_RPS
-			*qtail = sd->input_queue_head +
-					skb_queue_len(&sd->input_pkt_queue);
-#endif
+			input_queue_tail_incr_save(sd, qtail);
 			rps_unlock(sd);
 			local_irq_restore(flags);
 			return NET_RX_SUCCESS;
@@ -2964,7 +2961,7 @@ static void flush_backlog(void *arg)
 		if (skb->dev == dev) {
 			__skb_unlink(skb, &sd->input_pkt_queue);
 			kfree_skb(skb);
-			input_queue_head_add(sd, 1);
+			input_queue_head_incr(sd);
 		}
 	}
 	rps_unlock(sd);
@@ -2973,6 +2970,7 @@ static void flush_backlog(void *arg)
 		if (skb->dev == dev) {
 			__skb_unlink(skb, &sd->process_queue);
 			kfree_skb(skb);
+			input_queue_head_incr(sd);
 		}
 	}
 }
@@ -3328,18 +3326,20 @@ static int process_backlog(struct napi_struct *napi, int quota)
 		while ((skb = __skb_dequeue(&sd->process_queue))) {
 			local_irq_enable();
 			__netif_receive_skb(skb);
-			if (++work >= quota)
-				return work;
 			local_irq_disable();
+			input_queue_head_incr(sd);
+			if (++work >= quota) {
+				local_irq_enable();
+				return work;
+			}
 		}
 
 		rps_lock(sd);
 		qlen = skb_queue_len(&sd->input_pkt_queue);
-		if (qlen) {
-			input_queue_head_add(sd, qlen);
+		if (qlen)
 			skb_queue_splice_tail_init(&sd->input_pkt_queue,
 						   &sd->process_queue);
-		}
+
 		if (qlen < quota - work) {
 			/*
 			 * Inline a custom version of __napi_complete().
@@ -5679,12 +5679,14 @@ static int dev_cpu_callback(struct notifier_block *nfb,
 	local_irq_enable();
 
 	/* Process offline CPU's input_pkt_queue */
-	while ((skb = __skb_dequeue(&oldsd->input_pkt_queue))) {
+	while ((skb = __skb_dequeue(&oldsd->process_queue))) {
 		netif_rx(skb);
-		input_queue_head_add(oldsd, 1);
+		input_queue_head_incr(oldsd);
 	}
-	while ((skb = __skb_dequeue(&oldsd->process_queue)))
+	while ((skb = __skb_dequeue(&oldsd->input_pkt_queue))) {
 		netif_rx(skb);
+		input_queue_head_incr(oldsd);
+	}
 
 	return NOTIFY_OK;
 }

^ permalink raw reply related

* Re: linux-next: build failure after merge of the suspend tree
From: David Miller @ 2010-05-21  5:46 UTC (permalink / raw)
  To: sfr; +Cc: linville, torvalds, rjw, linux-next, linux-kernel, Helmut.Schaa,
	netdev
In-Reply-To: <20100521102913.ae4e8cd2.sfr@canb.auug.org.au>

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Fri, 21 May 2010 10:29:13 +1000

> On Sat, 8 May 2010 04:13:24 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
>>
>> On Friday 07 May 2010, Stephen Rothwell wrote:
>> Both trees are based on Linus' current and I don't see a good way of fixing
>> this issue in any of them individually.
> 
> The suspend tree has been merged into Linus' tree, so this patch is
> needed in the net tree before it is merged (or as part of the merge).

Since the net tree is still on its way to Linus, we'll just have to
wait for him to do that merge.

Then we can sort this out.  I don't want to touch a tree that is
already on its way.

Thanks Stephen.

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: David Miller @ 2010-05-21  5:49 UTC (permalink / raw)
  To: nhorman; +Cc: herbert, eric.dumazet, bmb, tgraf, nhorman, netdev
In-Reply-To: <20100521003939.GA2223@localhost.localdomain>

From: Neil Horman <nhorman@tuxdriver.com>
Date: Thu, 20 May 2010 20:39:39 -0400

> On Fri, May 21, 2010 at 09:16:30AM +1000, Herbert Xu wrote:
>> On Thu, May 20, 2010 at 01:29:18PM -0400, Neil Horman wrote:
>> >
>> > So, I'm testing this patch out now, and unfortunately it doesn't seem to be
>> > working.  Every frame seems to be holding a classid of 0.  Trying to figure out
>> > why now.
>> 
>> Not very surprising since tun.c doesn't go through the normal
>> socket interface.  I'll send an additional patch for that.
>> 
> I don't think that's it.  I think it's a chicken-and-egg situation.  I think the
> problem is that tasks can't be assigned to cgroups until they're created, and in
> that time a sock can be created.  It's a natural race.  If you create a socket
> before you assign it to a cgroup, that socket retains a classid of zero.  I'm
> going to try to modify the patch to update sockets owned by tasks when the cgroup
> is assigned.

Neil, you must not be using Herbert's most recent patch.

Either that or you haven't even read it.

Herbert's most recent patch doesn't create this chicken and egg
problem you mention because it explicitly watches for cgroupid changes
at all socket I/O operations including sendmsg() and recvmsg().  And
if it sees a different cgroupid at a socket I/O call, it updates the
cgroupid value in the socket.

So you very much can change the cgroup of the process mid-socket
ownership and it will work.

The only problem is, as Herbert stated, tun, because it does its
networking I/O directly by calling netif_receive_skb() and so won't
hit any of Herbert's cgroup check points.
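
To make the behaviour concrete, here is a minimal user-space sketch of
the check-on-every-I/O idea.  The types and the model_* functions are
hypothetical stand-ins rather than kernel APIs; only the comparison in
model_update_classid() follows the sock_update_classid() hunk posted in
this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the kernel's task or struct sock. */
struct model_task { uint32_t classid; }; /* classid of the task's cgroup */
struct model_sock { uint32_t classid; }; /* classid cached in the socket */

/* Copy the task's classid when it is non-zero and differs from the cache. */
static void model_update_classid(struct model_sock *sk,
				 const struct model_task *t)
{
	assert(sk && t);
	if (t->classid && t->classid != sk->classid)
		sk->classid = t->classid;
}

/* A socket I/O entry point: refresh first, then tag the packet. */
static uint32_t model_sendmsg(struct model_sock *sk,
			      const struct model_task *t)
{
	model_update_classid(sk, t);
	return sk->classid;	/* classid the packet would carry */
}

/* A path that skips the check point, as tun does in this discussion. */
static uint32_t model_inject_no_hook(const struct model_sock *sk)
{
	return sk->classid;
}
```

In this model a socket created before the task joins a cgroup starts
with classid 0, picks up the new value on its next model_sendmsg(), and
keeps reporting the stale value through model_inject_no_hook() until
some hooked call has run.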

^ permalink raw reply

* Re: linux-next: build failure after merge of the suspend tree
From: Eric Dumazet @ 2010-05-21  5:51 UTC (permalink / raw)
  To: David Miller
  Cc: sfr, linville, torvalds, rjw, linux-next, linux-kernel,
	Helmut.Schaa, netdev
In-Reply-To: <20100520.224644.21290162.davem@davemloft.net>

Le jeudi 20 mai 2010 à 22:46 -0700, David Miller a écrit :
> From: Stephen Rothwell <sfr@canb.auug.org.au>
> Date: Fri, 21 May 2010 10:29:13 +1000
> 
> > On Sat, 8 May 2010 04:13:24 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> >>
> >> On Friday 07 May 2010, Stephen Rothwell wrote:
> >> Both trees are based on Linus' current and I don't see a good way of fixing
> >> this issue in any of them individually.
> > 
> > The suspend tree has been merged into Linus' tree, so this patch is
> > needed in the net tree before it is merged (or as part of the merge).
> 
> Since the net tree is still on its way to Linus, we'll just have to
> wait for him to do that merge.
> 
> Then we can sort this out.  I don't want to touch a tree that is
> already on its way.

Linus merged your tree David.

^ permalink raw reply

* Re: linux-next: build failure after merge of the suspend tree
From: David Miller @ 2010-05-21  5:56 UTC (permalink / raw)
  To: eric.dumazet
  Cc: sfr, linville, torvalds, rjw, linux-next, linux-kernel,
	Helmut.Schaa, netdev
In-Reply-To: <1274421117.4977.9.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 21 May 2010 07:51:57 +0200

> Linus merged your tree David.

This must have happened in the past hour :-)

Great, I can sort this now.

^ permalink raw reply
