Netdev List
* Re: [Bugme-new] [Bug 40572] New: Intel Gigabit Ethernet 82576 50% packet loss after reboot
From: Alexander Duyck @ 2011-08-24 20:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: netdev, e1000-devel, bugme-daemon, vojcik
In-Reply-To: <20110823143053.832c1aaa.akpm@linux-foundation.org>

On 08/23/2011 02:30 PM, Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Fri, 5 Aug 2011 07:07:05 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
>
>> https://bugzilla.kernel.org/show_bug.cgi?id=40572
>>
>>             Summary: Intel Gigabit Ethernet 82576 50% packet loss after
>>                      reboot
>>             Product: Drivers
>>             Version: 2.5
>>      Kernel Version: 3.0
>>            Platform: All
>>          OS/Version: Linux
>>                Tree: Mainline
>>              Status: NEW
>>            Severity: blocking
>>            Priority: P1
>>           Component: Network
>>          AssignedTo: drivers_network@kernel-bugs.osdl.org
>>          ReportedBy: vojcik@gmail.com
>>          Regression: No
> I'll change this to "yes".
>
>> Hi,
>>
>> I have a strange problem with an Intel dual-port Gigabit Ethernet card.
>> The problem appears after the 3rd to 5th reboot.
>>
>> If you ping or generate any network traffic, you get 50% packet loss. There
>> are no error messages in the logs.
>> After another reboot, everything is OK for the next few reboots.
>>
>> We have eliminated network problems like switches, cables etc. It's software
>> related.
>>
>> It looks like in kernel 2.6.37 we have the same problem but in 2.6.28.6
>> everything looks fine.
>>
>> I attach some files for additional information
This type of issue is typically a sign of a hardware problem.  I would 
recommend doing an lspci -vvv for the device in both the working and the 
non-working cases to see if there is any difference between the two.

One thing we have seen in the past is an issue where the PCIe will not 
link at x4 in all cases and will sometimes link at only x1.  When this 
occurs the device does not have enough PCIe bandwidth to handle heavy 
workloads.  You might want to try either reseating the network adapter 
into the slot or moving it from one PCIe slot to another in the system 
as it is possible the PCIe slot it is in may have an issue with one or
more of the PCIe lanes.
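
To make the comparison suggested above concrete, here is a minimal sketch that pulls the lane count out of the "LnkCap:" and "LnkSta:" lines of lspci -vvv output and flags a link that trained narrower than the device supports. The function names are hypothetical, and the "Width xN" line layout is an assumption based on typical lspci output; check it against what your lspci version actually prints.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return the lane count found in an lspci "LnkCap:" or "LnkSta:" line,
 * or -1 if the line carries no "Width xN" field.
 * Assumption: the field is printed as "Width x<number>".
 */
int parse_link_width(const char *line)
{
	const char *p = strstr(line, "Width x");
	char *end;
	long width;

	if (!p)
		return -1;
	p += strlen("Width x");
	width = strtol(p, &end, 10);
	if (end == p || width <= 0)
		return -1;
	return (int)width;
}

/* Non-zero when the negotiated width (LnkSta) is below the capability (LnkCap). */
int link_is_downgraded(const char *lnkcap_line, const char *lnksta_line)
{
	int cap = parse_link_width(lnkcap_line);
	int sta = parse_link_width(lnksta_line);

	return cap > 0 && sta > 0 && sta < cap;
}
```

Feeding it the LnkCap and LnkSta lines from the working and the non-working boot shows whether the adapter dropped from x4 to x1 in the failing case.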

Thanks,

Alex


* Re: [BUG] tcp : how many times a frame can possibly be retransmitted ?
From: Jerry Chu @ 2011-08-24 19:39 UTC (permalink / raw)
  To: Alexander Zimmermann
  Cc: Eric Dumazet, netdev, Lukowski Damian, Hannemann Arnd
In-Reply-To: <3482698A-C35B-4BED-AEEF-EBA135991705@comsys.rwth-aachen.de>

Hi Alexander,

On Wed, Aug 24, 2011 at 12:03 PM, Alexander Zimmermann
<alexander.zimmermann@comsys.rwth-aachen.de> wrote:
> Hi Eric,
>
> On 24.08.2011 at 18:21, Eric Dumazet wrote:
>
>> On one dev machine running net-next, I just found strange tcp sessions
>> that retransmit a frame forever (The other peer disappeared)
>
> not forever...
> If I remember correctly you will stop after 120s.

Yup. It looks like this "feature" was introduced in the patch
"Revert Backoff [v3]: Calculate TCP's connection close threshold as a
time value", also by Damian, to bound the abort timeout by elapsed time
rather than by the number of retries (icsk_retransmits). But as pointed
out, if the RTO is small this can mean a lot of retransmissions before
one gives up.
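
As a rough illustration of that last point (not kernel code): if the abort is bounded by a time value and the RTO never grows because every backoff step is undone, the number of retransmissions is roughly the bound divided by the RTO. The function name is hypothetical, and the sample values are only the ones mentioned in this thread (the 120 s recollection and the rto:1680 from the ss output).

```c
/*
 * Approximate number of retransmissions sent before a time-based abort,
 * assuming the RTO stays constant (i.e. the backoff is always reverted).
 * Both arguments are in milliseconds. Illustrative helper only.
 */
unsigned int retransmits_before_abort(unsigned int timeout_ms,
				      unsigned int rto_ms)
{
	if (rto_ms == 0)
		return 0;	/* avoid dividing by zero on bad input */
	return timeout_ms / rto_ms;
}
```

With a 120000 ms bound and a 1680 ms RTO this gives 71 retransmissions; with a 200 ms RTO it gives 600.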

Jerry

>
>>
>> # ss -emoi dst 10.2.1.1
>> State      Recv-Q Send-Q      Local Address:Port          Peer Address:Port
>> ESTAB      0      816              10.2.1.2:37930             10.2.1.1:ssh      timer:(on,630ms,246) ino:60786 sk:ffff8801189aa400
>>        mem:(r0,w3776,f320,t0) ts sack ecn cubic wscale:8,6 rto:1680 rtt:16.25/7.5 ato:40 ssthresh:7 send 1.4Mbps rcv_rtt:10 rcv_space:16632
>>
>>
>> You can see the retransmit count : 246
>>
>> What possibly can be going on ?
>>
>> What happened to backoff ?
>>
>> # grep . /proc/sys/net/ipv4/tcp_retries*
>> /proc/sys/net/ipv4/tcp_retries1:3
>> /proc/sys/net/ipv4/tcp_retries2:15
>>
>>
>>
>> extract of tcpdump :
>>
>> 12:01:02.074244 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16128024 59389>
>> 12:01:03.754243 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16128192 59389>
>> 12:01:05.434245 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16128360 59389>
>> 12:01:07.114243 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16128528 59389>
>> 12:01:08.794248 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16128696 59389>
>> 12:01:10.474242 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16128864 59389>
>> 12:01:12.154243 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16129032 59389>
>> 12:01:13.834241 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16129200 59389>
>> 12:01:15.514246 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16129368 59389>
>> 12:01:17.194244 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16129536 59389>
>> 12:01:18.874248 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16129704 59389>
>> 12:01:20.554243 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16129872 59389>
>> 12:01:22.234244 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16130040 59389>
>> 12:01:23.914244 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16130208 59389>
>> 12:01:25.594247 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16130376 59389>
>> 12:01:27.274242 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16130544 59389>
>> 12:01:28.954242 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16130712 59389>
>> 12:01:30.634248 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16130880 59389>
>> 12:01:32.314245 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16131048 59389>
>> 12:01:33.994243 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16131216 59389>
>> 12:01:35.674250 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16131384 59389>
>> 12:01:37.354244 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16131552 59389>
>> 12:01:39.034245 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16131720 59389>
>> 12:01:40.714245 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16131888 59389>
>> 12:01:42.394245 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16132056 59389>
>> 12:01:44.074242 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16132224 59389>
>> 12:01:45.754249 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16132392 59389>
>> 12:01:47.434242 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16132560 59389>
>> 12:01:49.114247 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16132728 59389>
>> 12:01:50.794250 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16132896 59389>
>> 12:01:52.474247 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16133064 59389>
>> 12:01:54.154242 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16133232 59389>
>> 12:01:55.834246 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16133400 59389>
>> 12:01:57.514243 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16133568 59389>
>> 12:01:59.194247 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16133736 59389>
>> 12:02:00.874250 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16133904 59389>
>> 12:02:02.554242 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16134072 59389>
>> 12:02:04.234243 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16134240 59389>
>> 12:02:05.914245 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16134408 59389>
>> 12:02:07.594244 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16134576 59389>
>> 12:02:09.274249 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16134744 59389>
>> 12:02:10.954241 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16134912 59389>
>> 12:02:12.634249 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16135080 59389>
>>
>> tcp_retransmit_timer() does the exponential backoff, but something
>> resets icsk_rto to a low value ?
>>
>> Ah, it seems to be because of commit f1ecd5d9e7366609
>> (Revert Backoff [v3]: Revert RTO on ICMP destination unreachable)
>>
>> Since ARP resolution (or routing, I don't know yet) fails, an
>> internal/loopback ICMP host/network unreachable message is
>> generated and handled in tcp_v4_err() :
>
> Yeah, you have a local connectivity disruption. This is one
> possible scenario.
>
>>
>> icsk_backoff-- and icsk_rto is reset.
>>
>> I am afraid this can generate a storm (cpu time at very least),
>> in case we have many tcp sessions in this state.
>
> Hmm, maybe. I don't know. Arnd or Damian, what do you think about this point?
>
>>
>> I guess it's time for me to read RFC 6069
>
> If you find a bug, let me know.
>
> Alex
>
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> //
> // Dipl.-Inform. Alexander Zimmermann
> // Department of Computer Science, Informatik 4
> // RWTH Aachen University
> // Ahornstr. 55, 52056 Aachen, Germany
> // phone: (49-241) 80-21422, fax: (49-241) 80-22222
> // email: zimmermann@cs.rwth-aachen.de
> // web: http://www.umic-mesh.net
> //
>
>


* Re: [BUG] tcp : how many times a frame can possibly be retransmitted ?
From: Eric Dumazet @ 2011-08-24 19:45 UTC (permalink / raw)
  To: Alexander Zimmermann; +Cc: netdev, Jerry Chu, Lukowski Damian, Hannemann Arnd
In-Reply-To: <3482698A-C35B-4BED-AEEF-EBA135991705@comsys.rwth-aachen.de>

On Wednesday, 24 August 2011 at 21:03 +0200, Alexander Zimmermann wrote:
> Hi Eric,
> 
> On 24.08.2011 at 18:21, Eric Dumazet wrote:
> 
> > On one dev machine running net-next, I just found strange tcp sessions
> > that retransmit a frame forever (The other peer disappeared)
> 
> not forever...
> If I remember correctly you will stop after 120s.
> 

Hi Alexander

I just tried again one session, and got much more delay than that.

It stops because of a side effect: "icsk_retransmits" is an 8-bit
field.

Every 256 retransmits, it becomes 255+1 -> 0

retransmits_timed_out() immediately returns false.

And backoff increases at this time.

Eventually, we retransmit 256*15 times, process 256*15 ICMP messages.
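
A toy model of the arithmetic above, assuming (as described) that the backoff only advances when the 8-bit counter wraps to zero and that the connection is given up after 15 such steps. The names are hypothetical and this is not the kernel's code path; it only shows where 256*15 comes from.

```c
#include <stdint.h>

/*
 * Count retransmissions until the backoff reaches backoff_limit, in a
 * model where every other backoff step is undone (e.g. by an ICMP
 * unreachable) and only a wrap of the 8-bit counter lets it advance.
 */
unsigned int simulate_retransmits(unsigned int backoff_limit)
{
	uint8_t retransmits = 0;	/* 8-bit, like icsk_retransmits */
	unsigned int backoff = 0;
	unsigned int total = 0;

	while (backoff < backoff_limit) {
		retransmits++;		/* 255 + 1 wraps to 0 */
		total++;
		if (retransmits == 0)
			backoff++;	/* model: backoff grows only on a wrap */
	}
	return total;
}
```

simulate_retransmits(15) returns 3840, i.e. 256*15.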

Thanks


* Re: [BUG] tcp : how many times a frame can possibly be retransmitted ?
From: Alexander Zimmermann @ 2011-08-24 19:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Jerry Chu, Lukowski Damian, Hannemann Arnd
In-Reply-To: <1314202918.2296.39.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

Hi Eric,

On 24.08.2011 at 18:21, Eric Dumazet wrote:

> On one dev machine running net-next, I just found strange tcp sessions
> that retransmit a frame forever (The other peer disappeared)

not forever...
If I remember correctly you will stop after 120s.

> 
> # ss -emoi dst 10.2.1.1
> State      Recv-Q Send-Q      Local Address:Port          Peer Address:Port   
> ESTAB      0      816              10.2.1.2:37930             10.2.1.1:ssh      timer:(on,630ms,246) ino:60786 sk:ffff8801189aa400
> 	 mem:(r0,w3776,f320,t0) ts sack ecn cubic wscale:8,6 rto:1680 rtt:16.25/7.5 ato:40 ssthresh:7 send 1.4Mbps rcv_rtt:10 rcv_space:16632
> 
> 
> You can see the retransmit count : 246 
> 
> What possibly can be going on ?
> 
> What happened to backoff ?
> 
> # grep . /proc/sys/net/ipv4/tcp_retries*
> /proc/sys/net/ipv4/tcp_retries1:3
> /proc/sys/net/ipv4/tcp_retries2:15
> 
> 
> 
> extract of tcpdump :
> 
> 12:01:02.074244 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16128024 59389>
> 12:01:03.754243 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16128192 59389>
> 12:01:05.434245 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16128360 59389>
> 12:01:07.114243 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16128528 59389>
> 12:01:08.794248 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16128696 59389>
> 12:01:10.474242 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16128864 59389>
> 12:01:12.154243 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16129032 59389>
> 12:01:13.834241 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16129200 59389>
> 12:01:15.514246 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16129368 59389>
> 12:01:17.194244 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16129536 59389>
> 12:01:18.874248 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16129704 59389>
> 12:01:20.554243 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16129872 59389>
> 12:01:22.234244 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16130040 59389>
> 12:01:23.914244 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16130208 59389>
> 12:01:25.594247 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16130376 59389>
> 12:01:27.274242 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16130544 59389>
> 12:01:28.954242 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16130712 59389>
> 12:01:30.634248 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16130880 59389>
> 12:01:32.314245 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16131048 59389>
> 12:01:33.994243 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16131216 59389>
> 12:01:35.674250 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16131384 59389>
> 12:01:37.354244 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16131552 59389>
> 12:01:39.034245 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16131720 59389>
> 12:01:40.714245 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16131888 59389>
> 12:01:42.394245 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16132056 59389>
> 12:01:44.074242 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16132224 59389>
> 12:01:45.754249 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16132392 59389>
> 12:01:47.434242 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16132560 59389>
> 12:01:49.114247 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16132728 59389>
> 12:01:50.794250 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16132896 59389>
> 12:01:52.474247 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16133064 59389>
> 12:01:54.154242 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16133232 59389>
> 12:01:55.834246 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16133400 59389>
> 12:01:57.514243 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16133568 59389>
> 12:01:59.194247 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16133736 59389>
> 12:02:00.874250 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16133904 59389>
> 12:02:02.554242 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16134072 59389>
> 12:02:04.234243 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16134240 59389>
> 12:02:05.914245 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16134408 59389>
> 12:02:07.594244 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16134576 59389>
> 12:02:09.274249 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16134744 59389>
> 12:02:10.954241 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16134912 59389>
> 12:02:12.634249 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16135080 59389>
> 
> tcp_retransmit_timer() does the exponential backoff, but something
> resets icsk_rto to a low value ?
> 
> Ah, it seems to be because of commit f1ecd5d9e7366609 
> (Revert Backoff [v3]: Revert RTO on ICMP destination unreachable)
> 
> Since ARP resolution (or routing, I don't know yet) fails, an
> internal/loopback ICMP host/network unreachable message is 
> generated and handled in tcp_v4_err() :

Yeah, you have a local connectivity disruption. This is one
possible scenario.

> 
> icsk_backoff-- and icsk_rto is reset.
> 
> I am afraid this can generate a storm (cpu time at very least),
> in case we have many tcp sessions in this state.

Hmm, maybe. I don't know. Arnd or Damian, what do you think about this point?

> 
> I guess it's time for me to read RFC 6069

If you find a bug, let me know.

Alex


//
// Dipl.-Inform. Alexander Zimmermann
// Department of Computer Science, Informatik 4
// RWTH Aachen University
// Ahornstr. 55, 52056 Aachen, Germany
// phone: (49-241) 80-21422, fax: (49-241) 80-22222
// email: zimmermann@cs.rwth-aachen.de
// web: http://www.umic-mesh.net
//


* [PATCH 5/5] SUNRPC: remove rpcbind clients destruction on module cleanup
From: Stanislav Kinsbursky @ 2011-08-24 18:34 UTC (permalink / raw)
  To: Trond.Myklebust
  Cc: linux-nfs, xemul, neilb, netdev, linux-kernel, bfields, davem
In-Reply-To: <20110824183304.4924.94670.stgit@localhost6.localdomain6>

We don't need this anymore, since rpcbind clients are now destroyed during the
last RPC service shutdown.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>

---
 net/sunrpc/rpcb_clnt.c   |   12 ------------
 net/sunrpc/sunrpc_syms.c |    3 ---
 2 files changed, 0 insertions(+), 15 deletions(-)

diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index f363efe..94a310d 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -1098,15 +1098,3 @@ static struct rpc_program rpcb_program = {
 	.version	= rpcb_version,
 	.stats		= &rpcb_stats,
 };
-
-/**
- * cleanup_rpcb_clnt - remove xprtsock's sysctls, unregister
- *
- */
-void cleanup_rpcb_clnt(void)
-{
-	if (rpcb_local_clnt4)
-		rpc_shutdown_client(rpcb_local_clnt4);
-	if (rpcb_local_clnt)
-		rpc_shutdown_client(rpcb_local_clnt);
-}
diff --git a/net/sunrpc/sunrpc_syms.c b/net/sunrpc/sunrpc_syms.c
index 9d08091..8ec9778 100644
--- a/net/sunrpc/sunrpc_syms.c
+++ b/net/sunrpc/sunrpc_syms.c
@@ -61,8 +61,6 @@ static struct pernet_operations sunrpc_net_ops = {
 
 extern struct cache_detail unix_gid_cache;
 
-extern void cleanup_rpcb_clnt(void);
-
 static int __init
 init_sunrpc(void)
 {
@@ -102,7 +100,6 @@ out:
 static void __exit
 cleanup_sunrpc(void)
 {
-	cleanup_rpcb_clnt();
 	rpcauth_remove_module();
 	cleanup_socket_xprt();
 	svc_cleanup_xprt_sock();


* [PATCH 4/5] SUNRPC: remove rpcbind clients creation during service registring
From: Stanislav Kinsbursky @ 2011-08-24 18:34 UTC (permalink / raw)
  To: Trond.Myklebust
  Cc: linux-nfs, xemul, neilb, netdev, linux-kernel, bfields, davem
In-Reply-To: <20110824183304.4924.94670.stgit@localhost6.localdomain6>

We don't need this code, since rpcbind clients are now created during RPC
service creation.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>

---
 net/sunrpc/rpcb_clnt.c |    9 ---------
 1 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 437ec60..f363efe 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -429,11 +429,6 @@ int rpcb_register(u32 prog, u32 vers, int prot, unsigned short port)
 	struct rpc_message msg = {
 		.rpc_argp	= &map,
 	};
-	int error;
-
-	error = rpcb_create_local();
-	if (error)
-		return error;
 
 	dprintk("RPC:       %sregistering (%u, %u, %d, %u) with local "
 			"rpcbind\n", (port ? "" : "un"),
@@ -569,11 +564,7 @@ int rpcb_v4_register(const u32 program, const u32 version,
 	struct rpc_message msg = {
 		.rpc_argp	= &map,
 	};
-	int error;
 
-	error = rpcb_create_local();
-	if (error)
-		return error;
 	if (rpcb_local_clnt4 == NULL)
 		return -EPROTONOSUPPORT;
 


* [PATCH 3/5] SUNRPC: make RPC service dependable on rpcbind clients creation
From: Stanislav Kinsbursky @ 2011-08-24 18:33 UTC (permalink / raw)
  To: Trond.Myklebust
  Cc: linux-nfs, xemul, neilb, netdev, linux-kernel, bfields, davem
In-Reply-To: <20110824183304.4924.94670.stgit@localhost6.localdomain6>

We create the rpcbind clients, or increase their users counter, during RPC
service creation, and decrease this counter (possibly destroying those clients)
on RPC service destruction.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>

---
 include/linux/sunrpc/clnt.h |    2 ++
 net/sunrpc/rpcb_clnt.c      |    2 +-
 net/sunrpc/svc.c            |    5 +++++
 3 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index db7bcaf..65a8115 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -135,10 +135,12 @@ void		rpc_shutdown_client(struct rpc_clnt *);
 void		rpc_release_client(struct rpc_clnt *);
 void		rpc_task_release_client(struct rpc_task *);
 
+int		rpcb_create_local(void);
 int		rpcb_register(u32, u32, int, unsigned short);
 int		rpcb_v4_register(const u32 program, const u32 version,
 				 const struct sockaddr *address,
 				 const char *netid);
+void		rpcb_put_local(void);
 void		rpcb_getport_async(struct rpc_task *);
 
 void		rpc_call_start(struct rpc_task *);
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index b4cc0f1..437ec60 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -318,7 +318,7 @@ out:
  * Returns zero on success, otherwise a negative errno value
  * is returned.
  */
-static int rpcb_create_local(void)
+int rpcb_create_local(void)
 {
 	static DEFINE_MUTEX(rpcb_create_local_mutex);
 	int result = 0;
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 6a69a11..0df8532 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -367,6 +367,9 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
 	unsigned int xdrsize;
 	unsigned int i;
 
+	if (rpcb_create_local() < 0)
+		return NULL;
+
 	if (!(serv = kzalloc(sizeof(*serv), GFP_KERNEL)))
 		return NULL;
 	serv->sv_name      = prog->pg_name;
@@ -491,6 +494,8 @@ svc_destroy(struct svc_serv *serv)
 	svc_unregister(serv);
 	kfree(serv->sv_pools);
 	kfree(serv);
+
+	rpcb_put_local();
 }
 EXPORT_SYMBOL_GPL(svc_destroy);
 


* [PATCH 2/5] SUNRPC: use reference count helpers
From: Stanislav Kinsbursky @ 2011-08-24 18:33 UTC (permalink / raw)
  To: Trond.Myklebust
  Cc: linux-nfs, xemul, neilb, netdev, linux-kernel, bfields, davem
In-Reply-To: <20110824183304.4924.94670.stgit@localhost6.localdomain6>

It is simple: we just increase the users counter if the rpcbind clients are
already present. Otherwise we create new rpcbind clients and set the users
counter to 1.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>

---
 net/sunrpc/rpcb_clnt.c |   12 ++++--------
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index c84e6a3..b4cc0f1 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -256,9 +256,7 @@ static int rpcb_create_local_unix(void)
 		clnt4 = NULL;
 	}
 
-	/* Protected by rpcb_create_local_mutex */
-	rpcb_local_clnt = clnt;
-	rpcb_local_clnt4 = clnt4;
+	rpcb_set_local(clnt, clnt4);
 
 out:
 	return result;
@@ -310,9 +308,7 @@ static int rpcb_create_local_net(void)
 		clnt4 = NULL;
 	}
 
-	/* Protected by rpcb_create_local_mutex */
-	rpcb_local_clnt = clnt;
-	rpcb_local_clnt4 = clnt4;
+	rpcb_set_local(clnt, clnt4);
 
 out:
 	return result;
@@ -327,11 +323,11 @@ static int rpcb_create_local(void)
 	static DEFINE_MUTEX(rpcb_create_local_mutex);
 	int result = 0;
 
-	if (rpcb_local_clnt)
+	if (rpcb_get_local())
 		return result;
 
 	mutex_lock(&rpcb_create_local_mutex);
-	if (rpcb_local_clnt)
+	if (rpcb_get_local())
 		goto out;
 
 	if (rpcb_create_local_unix() != 0)


* [PATCH 1/5] SUNRPC: introduce helpers for reference counted rpcbind clients
From: Stanislav Kinsbursky @ 2011-08-24 18:33 UTC (permalink / raw)
  To: Trond.Myklebust
  Cc: linux-nfs, xemul, neilb, netdev, linux-kernel, bfields, davem
In-Reply-To: <20110824183304.4924.94670.stgit@localhost6.localdomain6>

These helpers will be used for dynamic creation and destruction of rpcbind
clients.
The variable rpcb_users is actually a counter of launched RPC services. If the
rpcbind client has already been created, we just increase rpcb_users.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>

---
 net/sunrpc/rpcb_clnt.c |   51 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 51 insertions(+), 0 deletions(-)

diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index e45d2fb..c84e6a3 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -114,6 +114,9 @@ static struct rpc_program	rpcb_program;
 static struct rpc_clnt *	rpcb_local_clnt;
 static struct rpc_clnt *	rpcb_local_clnt4;
 
+DEFINE_SPINLOCK(rpcb_clnt_lock);
+unsigned int			rpcb_users;
+
 struct rpcbind_args {
 	struct rpc_xprt *	r_xprt;
 
@@ -161,6 +164,54 @@ static void rpcb_map_release(void *data)
 	kfree(map);
 }
 
+static int rpcb_get_local(void)
+{
+	spin_lock(&rpcb_clnt_lock);
+	if (rpcb_users)
+		rpcb_users++;
+	spin_unlock(&rpcb_clnt_lock);
+
+	return rpcb_users;
+}
+
+void rpcb_put_local(void)
+{
+	struct rpc_clnt *clnt = rpcb_local_clnt;
+	struct rpc_clnt *clnt4 = rpcb_local_clnt4;
+	int shutdown;
+
+	spin_lock(&rpcb_clnt_lock);
+	if (--rpcb_users == 0) {
+		rpcb_local_clnt = NULL;
+		rpcb_local_clnt4 = NULL;
+	}
+	shutdown = !rpcb_users;
+	spin_unlock(&rpcb_clnt_lock);
+
+	if (shutdown) {
+		/*
+		 * cleanup_rpcb_clnt - remove xprtsock's sysctls, unregister
+		 */
+		if (clnt4)
+			rpc_shutdown_client(clnt4);
+		if (clnt)
+			rpc_shutdown_client(clnt);
+	}
+	return;
+}
+
+static void rpcb_set_local(struct rpc_clnt *clnt, struct rpc_clnt *clnt4)
+{
+	/* Protected by rpcb_create_local_mutex */
+	rpcb_local_clnt = clnt;
+	rpcb_local_clnt4 = clnt4;
+	rpcb_users++;
+	dprintk("RPC:       created new rpcb local clients (rpcb_local_clnt: "
+			"0x%p, rpcb_local_clnt4: 0x%p)\n", rpcb_local_clnt,
+			rpcb_local_clnt4);
+
+}
+
 /*
  * Returns zero on success, otherwise a negative errno value
  * is returned.


* [PATCH 0/5] SUNRPC: make rpcbind clients allocated and destroyed on dynamically
From: Stanislav Kinsbursky @ 2011-08-24 18:33 UTC (permalink / raw)
  To: Trond.Myklebust
  Cc: linux-nfs, xemul, neilb, netdev, linux-kernel, bfields, davem

This patch set is required for further RPC layer virtualization, because rpcbind
clients have to be per network namespace.
To achieve this, we have to untie the network namespace from the rpcbind client
sockets.
The idea of this patch set is to make rpcbind clients non-static, i.e. rpcbind
clients will be created during the first RPC service creation and destroyed when
the last RPC service is stopped.
With this patch set, rpcbind clients can be virtualized easily.
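
The create-on-first-use, destroy-on-last-put idea can be sketched in plain userspace C as follows. This is only an illustration under stated assumptions: struct client and all function names are hypothetical stand-ins, and locking is left out on purpose (the real code has to serialize these paths, which is what the mutex and spinlock in the patches are for).

```c
#include <stddef.h>
#include <stdlib.h>

struct client { int id; };		/* hypothetical stand-in for struct rpc_clnt */

static struct client *local_clnt;	/* shared client, NULL while unused */
static unsigned int users;		/* number of running services */

/* A service starts: create the client on first use, else take a reference. */
int client_get(void)
{
	if (users == 0) {
		local_clnt = malloc(sizeof(*local_clnt));
		if (!local_clnt)
			return -1;
		local_clnt->id = 1;
	}
	users++;
	return 0;
}

/* A service stops: destroy the client when the last user goes away. */
void client_put(void)
{
	if (users == 0)
		return;			/* unbalanced put, ignored in this sketch */
	if (--users == 0) {
		free(local_clnt);
		local_clnt = NULL;
	}
}

/* Small accessors so callers can observe the state. */
unsigned int client_users(void) { return users; }
int client_exists(void) { return local_clnt != NULL; }
```

Two client_get() calls followed by two client_put() calls leave the client allocated after the first put and freed after the second, which is the lifetime the series aims for.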

The following series consists of:

---

Stanislav Kinsbursky (5):
      SUNRPC: introduce helpers for reference counted rpcbind clients
      SUNRPC: use reference count helpers
      SUNRPC: make RPC service dependable on rpcbind clients creation
      SUNRPC: remove rpcbind clients creation during service registring
      SUNRPC: remove rpcbind clients destruction on module cleanup


 include/linux/sunrpc/clnt.h |    2 +
 net/sunrpc/rpcb_clnt.c      |   86 ++++++++++++++++++++++++++++---------------
 net/sunrpc/sunrpc_syms.c    |    3 --
 net/sunrpc/svc.c            |    5 +++
 4 files changed, 63 insertions(+), 33 deletions(-)



* Re: [PATCH 68/75] hv: netvsc: convert to SKB paged frag API.
From: Konrad Rzeszutek Wilk @ 2011-08-24 18:30 UTC (permalink / raw)
  To: Ian Campbell
  Cc: netdev, linux-kernel, Hank Janssen, Haiyang Zhang,
	Greg Kroah-Hartman, K. Y. Srinivasan, Abhishek Kane, devel
In-Reply-To: <1313760467-8598-68-git-send-email-ian.campbell@citrix.com>

What is with the 'XXX' ?

> diff --git a/drivers/net/cxgb4/sge.c b/drivers/net/cxgb4/sge.c
> index f1813b5..3e7c4b3 100644
> --- a/drivers/net/cxgb4/sge.c
> +++ b/drivers/net/cxgb4/sge.c
> @@ -1416,7 +1416,7 @@ static inline void copy_frags(struct sk_buff *skb,
>  	unsigned int n;
>  
>  	/* usually there's just one frag */
> -	skb_frag_set_page(skb, 0, gl->frags[0].page);
> +	skb_frag_set_page(skb, 0, gl->frags[0].page.p);	/* XXX */
>  	ssi->frags[0].page_offset = gl->frags[0].page_offset + offset;
>  	ssi->frags[0].size = gl->frags[0].size - offset;
>  	ssi->nr_frags = gl->nfrags;
> @@ -1425,7 +1425,7 @@ static inline void copy_frags(struct sk_buff *skb,
>  		memcpy(&ssi->frags[1], &gl->frags[1], n * sizeof(skb_frag_t));
>  
>  	/* get a reference to the last page, we don't own it */
> -	get_page(gl->frags[n].page);
> +	get_page(gl->frags[n].page.p);	/* XXX */
>  }
>  
>  /**
> @@ -1482,7 +1482,7 @@ static void t4_pktgl_free(const struct pkt_gl *gl)
>  	const skb_frag_t *p;
>  
>  	for (p = gl->frags, n = gl->nfrags - 1; n--; p++)
> -		put_page(p->page);
> +		put_page(p->page.p); /* XXX */
>  }
>  
>  /*
> @@ -1635,7 +1635,7 @@ static void restore_rx_bufs(const struct pkt_gl *si, struct sge_fl *q,
>  		else
>  			q->cidx--;
>  		d = &q->sdesc[q->cidx];
> -		d->page = si->frags[frags].page;
> +		d->page = si->frags[frags].page.p; /* XXX */
>  		d->dma_addr |= RX_UNMAPPED_BUF;
>  		q->avail++;
>  	}
> @@ -1717,7 +1717,7 @@ static int process_responses(struct sge_rspq *q, int budget)
>  			for (frags = 0, fp = si.frags; ; frags++, fp++) {
>  				rsd = &rxq->fl.sdesc[rxq->fl.cidx];
>  				bufsz = get_buf_size(rsd);
> -				fp->page = rsd->page;
> +				fp->page.p = rsd->page; /* XXX */
>  				fp->page_offset = q->offset;
>  				fp->size = min(bufsz, len);
>  				len -= fp->size;
> @@ -1734,8 +1734,8 @@ static int process_responses(struct sge_rspq *q, int budget)
>  						get_buf_addr(rsd),
>  						fp->size, DMA_FROM_DEVICE);
>  
> -			si.va = page_address(si.frags[0].page) +
> -				si.frags[0].page_offset;
> +			si.va = page_address(si.frags[0].page.p) +
> +				si.frags[0].page_offset; /* XXX */
>  
>  			prefetch(si.va);
>  
> diff --git a/drivers/net/cxgb4vf/sge.c b/drivers/net/cxgb4vf/sge.c
> index 6d6060e..3688423 100644
> --- a/drivers/net/cxgb4vf/sge.c
> +++ b/drivers/net/cxgb4vf/sge.c
> @@ -1397,7 +1397,7 @@ struct sk_buff *t4vf_pktgl_to_skb(const struct pkt_gl *gl,
>  		skb_copy_to_linear_data(skb, gl->va, pull_len);
>  
>  		ssi = skb_shinfo(skb);
> -		skb_frag_set_page(skb, 0, gl->frags[0].page);
> +		skb_frag_set_page(skb, 0, gl->frags[0].page.p); /* XXX */
>  		ssi->frags[0].page_offset = gl->frags[0].page_offset + pull_len;
>  		ssi->frags[0].size = gl->frags[0].size - pull_len;
>  		if (gl->nfrags > 1)
> @@ -1410,7 +1410,7 @@ struct sk_buff *t4vf_pktgl_to_skb(const struct pkt_gl *gl,
>  		skb->truesize += skb->data_len;
>  
>  		/* Get a reference for the last page, we don't own it */
> -		get_page(gl->frags[gl->nfrags - 1].page);
> +		get_page(gl->frags[gl->nfrags - 1].page.p); /* XXX */
>  	}
>  
>  out:
> @@ -1430,7 +1430,7 @@ void t4vf_pktgl_free(const struct pkt_gl *gl)
>  
>  	frag = gl->nfrags - 1;
>  	while (frag--)
> -		put_page(gl->frags[frag].page);
> +		put_page(gl->frags[frag].page.p); /* XXX */
>  }
>  
>  /**
> @@ -1450,7 +1450,7 @@ static inline void copy_frags(struct sk_buff *skb,
>  	unsigned int n;
>  
>  	/* usually there's just one frag */
> -	skb_frag_set_page(skb, 0, gl->frags[0].page);
> +	skb_frag_set_page(skb, 0, gl->frags[0].page.p);	/* XXX */
>  	si->frags[0].page_offset = gl->frags[0].page_offset + offset;
>  	si->frags[0].size = gl->frags[0].size - offset;
>  	si->nr_frags = gl->nfrags;
> @@ -1460,7 +1460,7 @@ static inline void copy_frags(struct sk_buff *skb,
>  		memcpy(&si->frags[1], &gl->frags[1], n * sizeof(skb_frag_t));
>  
>  	/* get a reference to the last page, we don't own it */
> -	get_page(gl->frags[n].page);
> +	get_page(gl->frags[n].page.p); /* XXX */
>  }
>  
>  /**
> @@ -1613,7 +1613,7 @@ static void restore_rx_bufs(const struct pkt_gl *gl, struct sge_fl *fl,
>  		else
>  			fl->cidx--;
>  		sdesc = &fl->sdesc[fl->cidx];
> -		sdesc->page = gl->frags[frags].page;
> +		sdesc->page = gl->frags[frags].page.p; /* XXX */
>  		sdesc->dma_addr |= RX_UNMAPPED_BUF;
>  		fl->avail++;
>  	}
> @@ -1701,7 +1701,7 @@ int process_responses(struct sge_rspq *rspq, int budget)
>  				BUG_ON(rxq->fl.avail == 0);
>  				sdesc = &rxq->fl.sdesc[rxq->fl.cidx];
>  				bufsz = get_buf_size(sdesc);
> -				fp->page = sdesc->page;
> +				fp->page.p = sdesc->page; /* XXX */
>  				fp->page_offset = rspq->offset;
>  				fp->size = min(bufsz, len);
>  				len -= fp->size;
> @@ -1719,8 +1719,8 @@ int process_responses(struct sge_rspq *rspq, int budget)
>  			dma_sync_single_for_cpu(rspq->adapter->pdev_dev,
>  						get_buf_addr(sdesc),
>  						fp->size, DMA_FROM_DEVICE);
> -			gl.va = (page_address(gl.frags[0].page) +
> -				 gl.frags[0].page_offset);
> +			gl.va = (page_address(gl.frags[0].page.p) +
> +				 gl.frags[0].page_offset); /* XXX */
>  			prefetch(gl.va);
>  


* Re: [PATCH 01/75] net: add APIs for manipulating skb page fragments.
From: Konrad Rzeszutek Wilk @ 2011-08-24 18:21 UTC (permalink / raw)
  To: Ian Campbell
  Cc: netdev, linux-kernel, David S. Miller, Eric Dumazet,
	Michał Mirosław
In-Reply-To: <1313760467-8598-1-git-send-email-ian.campbell@citrix.com>

On Fri, Aug 19, 2011 at 02:26:33PM +0100, Ian Campbell wrote:
> The primary aim is to add skb_frag_(ref|unref) in order to remove the use of
> bare get/put_page on SKB pages fragments and to isolate users from subsequent
> changes to the skb_frag_t data structure.
> 
> Also included are helper APIs for passing a paged fragment to kmap and
> dma_map_page since I was seeing the same pattern a lot. A helper for
> pci_map_page is omitted due to Michał Mirosław's recommendation that users
> should transition to pci_map_page instead.

You mean "transition to dma_map_page instead." ?


* Re: pull request: batman-adv 2011-08-24
From: David Miller @ 2011-08-24 17:35 UTC (permalink / raw)
  To: lindner_marek; +Cc: netdev, b.a.t.m.a.n
In-Reply-To: <1314190838-2273-1-git-send-email-lindner_marek@yahoo.de>

From: Marek Lindner <lindner_marek@yahoo.de>
Date: Wed, 24 Aug 2011 15:00:30 +0200

> the following 8 patches constitute the first batch I'd like to get pulled
> into net-next-2.6/3.2. They bring a new feature (AP isolation on the mesh
> layer), some minor cleanups, spelling fixes and some additional debugfs 
> output. 
...
>   git://git.open-mesh.org/linux-merge.git batman-adv/next

Pulled, thanks.


* RE: [patch -next] bna: off by one in bfa_msgq_rspq_pi_update()
From: Rasesh Mody @ 2011-08-24 17:33 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Debashis Dutt, open list:BROCADE BNA 10 GI...,
	kernel-janitors@vger.kernel.org, Jing Huang
In-Reply-To: <20110824113028.GD5975@shale.localdomain>

>From: Dan Carpenter [mailto:error27@gmail.com]
>Sent: Wednesday, August 24, 2011 4:30 AM
>
>The rspq->rsphdlr[] array has BFI_MC_MAX elements, so this test was
>off by one.
>
>Signed-off-by: Dan Carpenter <error27@gmail.com>
>
>diff --git a/drivers/net/ethernet/brocade/bna/bfa_msgq.c
>b/drivers/net/ethernet/brocade/bna/bfa_msgq.c
>index ed52187..dd36427 100644
>--- a/drivers/net/ethernet/brocade/bna/bfa_msgq.c
>+++ b/drivers/net/ethernet/brocade/bna/bfa_msgq.c
>@@ -483,7 +483,7 @@ bfa_msgq_rspq_pi_update(struct bfa_msgq_rspq *rspq,
>struct bfi_mbmsg *mb)
> 		mc = msghdr->msg_class;
> 		num_entries = ntohs(msghdr->num_entries);
>
>-		if ((mc > BFI_MC_MAX) || (rspq->rsphdlr[mc].cbfn == NULL))
>+		if ((mc >= BFI_MC_MAX) || (rspq->rsphdlr[mc].cbfn == NULL))
> 			break;
>
> 		(rspq->rsphdlr[mc].cbfn)(rspq->rsphdlr[mc].cbarg, msghdr);

Acked-by: Rasesh Mody <rmody@brocade.com>

Thanks,
Rasesh
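
The bounds check discussed in this patch can be sketched in isolation. This is
only an illustration: DEMO_MC_MAX, struct demo_rspq, demo_noop() and
demo_mc_is_dispatchable() are hypothetical stand-ins, not the bna driver's
definitions. It shows why an array with N elements needs a ">= N" test before
indexing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's types; not the real bna code. */
#define DEMO_MC_MAX 4

typedef void (*demo_cbfn_t)(void *arg);

struct demo_handler {
	demo_cbfn_t cbfn;
	void *cbarg;
};

struct demo_rspq {
	/* Valid indices are 0 .. DEMO_MC_MAX - 1. */
	struct demo_handler rsphdlr[DEMO_MC_MAX];
};

/* Trivial callback so a handler slot can be marked as registered. */
static void demo_noop(void *arg)
{
	(void)arg;
}

/* Returns true only when mc indexes a slot with a registered handler. */
static bool demo_mc_is_dispatchable(const struct demo_rspq *rspq,
				    unsigned int mc)
{
	/* Using '>' here would let mc == DEMO_MC_MAX read past the array. */
	if (mc >= DEMO_MC_MAX)
		return false;
	return rspq->rsphdlr[mc].cbfn != NULL;
}
```

With the ">=" form, the index equal to the array size is rejected before the
array is touched, which is the case the original "mc > BFI_MC_MAX" test missed.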


* Re: [net-next 00/10][pull request] Complete drivers/net/ move
From: David Miller @ 2011-08-24 17:31 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo
In-Reply-To: <1314176570-20298-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 24 Aug 2011 02:02:40 -0700

> The following are changes since commit 131ea6675c761f655d43b808dd0fe83d15d5cdd3:
>   net: add APIs for manipulating skb page fragments.
> and are available in the git repository at:
>   master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/next-organize master

Try again:

ERROR: "pppox_ioctl" [net/l2tp/l2tp_ppp.ko] undefined!
ERROR: "unregister_pppox_proto" [net/l2tp/l2tp_ppp.ko] undefined!
ERROR: "register_pppox_proto" [net/l2tp/l2tp_ppp.ko] undefined!
ERROR: "ppp_input" [net/l2tp/l2tp_ppp.ko] undefined!
ERROR: "pppox_unbind_sock" [net/l2tp/l2tp_ppp.ko] undefined!
ERROR: "ppp_register_net_channel" [net/l2tp/l2tp_ppp.ko] undefined!
ERROR: "ppp_dev_name" [net/l2tp/l2tp_ppp.ko] undefined!
ERROR: "ppp_unit_number" [net/irda/irnet/irnet.ko] undefined!
ERROR: "ppp_unregister_channel" [net/irda/irnet/irnet.ko] undefined!
ERROR: "ppp_output_wakeup" [net/irda/irnet/irnet.ko] undefined!
ERROR: "ppp_input_error" [net/irda/irnet/irnet.ko] undefined!
ERROR: "ppp_input" [net/irda/irnet/irnet.ko] undefined!
ERROR: "ppp_register_channel" [net/irda/irnet/irnet.ko] undefined!
ERROR: "ppp_channel_index" [net/irda/irnet/irnet.ko] undefined!
ERROR: "ppp_input_error" [net/atm/pppoatm.ko] undefined!
ERROR: "ppp_input" [net/atm/pppoatm.ko] undefined!
ERROR: "ppp_unregister_channel" [net/atm/pppoatm.ko] undefined!
ERROR: "ppp_output_wakeup" [net/atm/pppoatm.ko] undefined!
ERROR: "ppp_unit_number" [net/atm/pppoatm.ko] undefined!
ERROR: "ppp_channel_index" [net/atm/pppoatm.ko] undefined!
ERROR: "ppp_register_channel" [net/atm/pppoatm.ko] undefined!
ERROR: "ppp_unit_number" [drivers/tty/ipwireless/ipwireless.ko] undefined!
ERROR: "ppp_unregister_channel" [drivers/tty/ipwireless/ipwireless.ko] undefined!
ERROR: "ppp_output_wakeup" [drivers/tty/ipwireless/ipwireless.ko] undefined!
ERROR: "ppp_input" [drivers/tty/ipwireless/ipwireless.ko] undefined!
ERROR: "ppp_register_channel" [drivers/tty/ipwireless/ipwireless.ko] undefined!
ERROR: "ppp_channel_index" [drivers/tty/ipwireless/ipwireless.ko] undefined!
ERROR: "slhc_init" [drivers/isdn/i4l/isdn.ko] undefined!
ERROR: "slhc_remember" [drivers/isdn/i4l/isdn.ko] undefined!
ERROR: "slhc_uncompress" [drivers/isdn/i4l/isdn.ko] undefined!
ERROR: "slhc_free" [drivers/isdn/i4l/isdn.ko] undefined!
ERROR: "slhc_compress" [drivers/isdn/i4l/isdn.ko] undefined!


* RE: [patch -next] bna: unlock on error path in pnad_pci_probe()
From: Rasesh Mody @ 2011-08-24 17:25 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Debashis Dutt, open list:BROCADE BNA 10 GI...,
	kernel-janitors@vger.kernel.org, Jing Huang
In-Reply-To: <20110824112922.GC5975@shale.localdomain>

>From: Dan Carpenter [mailto:error27@gmail.com]
>Sent: Wednesday, August 24, 2011 4:29 AM
>
>We introduced a new lock here, so there is an error path which needs
>an unlock now.
>
>Signed-off-by: Dan Carpenter <error27@gmail.com>
>
>diff --git a/drivers/net/ethernet/brocade/bna/bnad.c
>b/drivers/net/ethernet/brocade/bna/bnad.c
>index bdfda07..6ad4b47 100644
>--- a/drivers/net/ethernet/brocade/bna/bnad.c
>+++ b/drivers/net/ethernet/brocade/bna/bnad.c
>@@ -3167,7 +3167,7 @@ bnad_pci_probe(struct pci_dev *pdev,
> 	 */
> 	err = bnad_pci_init(bnad, pdev, &using_dac);
> 	if (err)
>-		goto free_netdev;
>+		goto unlock_mutex;
>
> 	/*
> 	 * Initialize bnad structure
>@@ -3296,9 +3296,9 @@ drv_uninit:
> 	bnad_uninit(bnad);
> pci_uninit:
> 	bnad_pci_uninit(pdev);
>+unlock_mutex:
> 	mutex_unlock(&bnad->conf_mutex);
> 	bnad_lock_uninit(bnad);
>-free_netdev:
> 	free_netdev(netdev);
> 	return err;
> }

Acked-by: Rasesh Mody <rmody@brocade.com>

Thanks,
Rasesh
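
The label ordering that this patch fixes can be sketched outside the driver.
This is a simplified model under stated assumptions: struct demo_dev and
demo_probe() are hypothetical, the "mutex" is just a counter, and the success
path releasing the lock before returning is assumed rather than taken from the
bnad source. The point is that each error label undoes only what was set up
before the failing step, in reverse order.

```c
#include <stdbool.h>

/* Hypothetical device state; lock_depth is 1 while the "mutex" is held. */
struct demo_dev {
	int lock_depth;
	bool pci_ready;
	bool freed;
};

/*
 * Probe sketch: take the lock, initialise PCI, then run a later step.
 * Any failure after the lock is taken must pass through unlock_mutex.
 */
static int demo_probe(struct demo_dev *dev, bool pci_init_fails,
		      bool later_step_fails)
{
	int err;

	dev->lock_depth++;			/* stands in for mutex_lock() */

	if (pci_init_fails) {
		err = -1;
		goto unlock_mutex;		/* not a label below the unlock */
	}
	dev->pci_ready = true;

	if (later_step_fails) {
		err = -2;
		goto pci_uninit;
	}

	dev->lock_depth--;			/* assumed: unlock on success */
	return 0;

pci_uninit:
	dev->pci_ready = false;
unlock_mutex:
	dev->lock_depth--;			/* stands in for mutex_unlock() */
	dev->freed = true;			/* stands in for free_netdev() */
	return err;
}
```

Jumping straight to the final cleanup step on a PCI init failure, as the
pre-patch "goto free_netdev" did, would skip the unlock and leave lock_depth
at 1 in this model.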


* [BUG] tcp : how many times a frame can possibly be retransmitted ?
From: Eric Dumazet @ 2011-08-24 16:21 UTC (permalink / raw)
  To: netdev, Jerry Chu, Damian Lukowski

On one dev machine running net-next, I just found strange TCP sessions
that retransmit a frame forever (the other peer disappeared).

# ss -emoi dst 10.2.1.1
State      Recv-Q Send-Q      Local Address:Port          Peer Address:Port   
ESTAB      0      816              10.2.1.2:37930             10.2.1.1:ssh      timer:(on,630ms,246) ino:60786 sk:ffff8801189aa400
	 mem:(r0,w3776,f320,t0) ts sack ecn cubic wscale:8,6 rto:1680 rtt:16.25/7.5 ato:40 ssthresh:7 send 1.4Mbps rcv_rtt:10 rcv_space:16632


You can see the retransmit count: 246

What can possibly be going on?

What happened to backoff?

# grep . /proc/sys/net/ipv4/tcp_retries*
/proc/sys/net/ipv4/tcp_retries1:3
/proc/sys/net/ipv4/tcp_retries2:15



extract of tcpdump :

12:01:02.074244 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16128024 59389>
12:01:03.754243 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16128192 59389>
12:01:05.434245 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16128360 59389>
12:01:07.114243 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16128528 59389>
12:01:08.794248 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16128696 59389>
12:01:10.474242 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16128864 59389>
12:01:12.154243 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16129032 59389>
12:01:13.834241 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16129200 59389>
12:01:15.514246 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16129368 59389>
12:01:17.194244 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16129536 59389>
12:01:18.874248 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16129704 59389>
12:01:20.554243 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16129872 59389>
12:01:22.234244 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16130040 59389>
12:01:23.914244 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16130208 59389>
12:01:25.594247 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16130376 59389>
12:01:27.274242 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16130544 59389>
12:01:28.954242 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16130712 59389>
12:01:30.634248 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16130880 59389>
12:01:32.314245 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16131048 59389>
12:01:33.994243 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16131216 59389>
12:01:35.674250 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16131384 59389>
12:01:37.354244 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16131552 59389>
12:01:39.034245 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16131720 59389>
12:01:40.714245 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16131888 59389>
12:01:42.394245 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16132056 59389>
12:01:44.074242 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16132224 59389>
12:01:45.754249 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16132392 59389>
12:01:47.434242 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16132560 59389>
12:01:49.114247 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16132728 59389>
12:01:50.794250 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16132896 59389>
12:01:52.474247 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16133064 59389>
12:01:54.154242 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16133232 59389>
12:01:55.834246 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16133400 59389>
12:01:57.514243 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16133568 59389>
12:01:59.194247 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16133736 59389>
12:02:00.874250 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16133904 59389>
12:02:02.554242 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16134072 59389>
12:02:04.234243 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16134240 59389>
12:02:05.914245 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16134408 59389>
12:02:07.594244 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16134576 59389>
12:02:09.274249 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16134744 59389>
12:02:10.954241 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16134912 59389>
12:02:12.634249 IP 10.2.1.2.37930 > 10.2.1.1.ssh: P 0:144(144) ack 1 win 1002 <nop,nop,timestamp 16135080 59389>

tcp_retransmit_timer() does the exponential backoff, but something
resets icsk_rto to a low value.

Ah, it seems to be because of commit f1ecd5d9e7366609
(Revert Backoff [v3]: Revert RTO on ICMP destination unreachable)

Since ARP resolution (or routing, I don't know yet) fails, an
internal/loopback ICMP host/network unreachable message is
generated and handled in tcp_v4_err():

icsk_backoff-- and icsk_rto is reset.

I am afraid this can generate a storm (CPU time at the very least)
if we have many TCP sessions in this state.

I guess it's time for me to read RFC 6069.
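
The interaction described above can be modelled with a small sketch. It is a
deliberately simplified model, not the kernel code: struct demo_sock and the
demo_*() helpers are hypothetical, times are in milliseconds, and the only
kernel value borrowed is the 120 s upper bound on the RTO. It shows that if
every retransmit timeout is immediately followed by an ICMP unreachable that
undoes one backoff step, the RTO never grows, which matches the constant
1.68 s spacing in the tcpdump extract.

```c
#include <stdbool.h>

#define DEMO_RTO_MAX_MS 120000ULL	/* mirrors the kernel's 120 s RTO cap */

/* Hypothetical, heavily simplified stand-in for the socket's timer state. */
struct demo_sock {
	unsigned long long base_rto_ms;	/* RTO derived from the RTT estimate */
	unsigned int backoff;		/* plays the role of icsk_backoff */
	unsigned long long rto_ms;	/* plays the role of icsk_rto */
};

static unsigned long long demo_bound_rto(unsigned long long rto_ms)
{
	return rto_ms > DEMO_RTO_MAX_MS ? DEMO_RTO_MAX_MS : rto_ms;
}

/* Retransmit timer fired: back off exponentially, up to the cap. */
static void demo_retransmit_timeout(struct demo_sock *sk)
{
	sk->backoff++;
	sk->rto_ms = demo_bound_rto(sk->rto_ms * 2);
}

/* ICMP unreachable received: undo one backoff step and recompute the RTO. */
static void demo_icmp_unreachable(struct demo_sock *sk)
{
	if (sk->backoff == 0)
		return;
	sk->backoff--;
	if (sk->backoff >= 32)		/* avoid an oversized shift */
		sk->rto_ms = DEMO_RTO_MAX_MS;
	else
		sk->rto_ms = demo_bound_rto(sk->base_rto_ms << sk->backoff);
}

/* RTO after `rounds` timeouts, each optionally followed by an ICMP error. */
static unsigned long long demo_rto_after(unsigned long long base_rto_ms,
					 unsigned int rounds,
					 bool icmp_each_round)
{
	struct demo_sock sk = { base_rto_ms, 0, base_rto_ms };
	unsigned int i;

	for (i = 0; i < rounds; i++) {
		demo_retransmit_timeout(&sk);
		if (icmp_each_round)
			demo_icmp_unreachable(&sk);
	}
	return sk.rto_ms;
}
```

In this model, five timeouts starting from 1680 ms give an RTO of 53760 ms
without ICMP errors, but the RTO stays at 1680 ms when a locally generated
ICMP error follows every retransmit.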


* Re: Interface without IP address can route??
From: Ben Greear @ 2011-08-24 16:20 UTC (permalink / raw)
  To: David Lamparter; +Cc: jhs, jamal, netdev
In-Reply-To: <20110824161557.GC611458@jupiter.n2.diac24.net>

On 08/24/2011 09:15 AM, David Lamparter wrote:
> On Wed, Aug 24, 2011 at 06:24:54AM -0700, Ben Greear wrote:
>> On 08/24/2011 06:01 AM, jamal wrote:
>>> It makes sense to behave this way.
>>> IPv4 addresses are owned by the system not interfaces.
>>> If you want to control the forwarding behavior, control ARP so it doesnt
>>> respond on the interfaces with no IP.
>
> I agree.
>
>> I understand your argument about IPs being owned by system instead of
>> interface, but I think it's the wrong behaviour in this case.  Can
>> you think of any case where this behaviour actually helps?
>
> It's used for oddball /32 setups at server hosting farms that look like:
>        /--- eth0, no ip ---- server 0.1.4.5/32, default via 0.1.2.3
> router --- eth1, no ip ---- server 0.1.6.7/32, default via 0.1.2.3
>        \--- eth2, no ip ---- server 0.1.8.9/32, default via 0.1.2.3
>     \- eth3: 0.1.2.3/28 - to rest of internet
>
> The general idea is to a) conserve IPs and b) not renumber servers even
> when they move, so you end up with random scattered /32s on the servers
> and the router has no sensible IP.
>
>> Either way, it appears I can work around this by explicitly disabling
>> forwarding for this particular interface.
>
> I was about to suggest exactly this :)

Ok..glad to know there are folks with even crazier setups than mine :)

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: Interface without IP address can route??
From: David Lamparter @ 2011-08-24 16:15 UTC (permalink / raw)
  To: Ben Greear; +Cc: jhs, jamal, netdev
In-Reply-To: <4E54FBA6.6090905@candelatech.com>

On Wed, Aug 24, 2011 at 06:24:54AM -0700, Ben Greear wrote:
> On 08/24/2011 06:01 AM, jamal wrote:
> > It makes sense to behave this way.
> > IPv4 addresses are owned by the system not interfaces.
> > If you want to control the forwarding behavior, control ARP so it doesnt
> > respond on the interfaces with no IP.

I agree.

> I understand your argument about IPs being owned by system instead of
> interface, but I think it's the wrong behaviour in this case.  Can
> you think of any case where this behaviour actually helps?

It's used for oddball /32 setups at server hosting farms that look like:
      /--- eth0, no ip ---- server 0.1.4.5/32, default via 0.1.2.3
router --- eth1, no ip ---- server 0.1.6.7/32, default via 0.1.2.3
      \--- eth2, no ip ---- server 0.1.8.9/32, default via 0.1.2.3
   \- eth3: 0.1.2.3/28 - to rest of internet

The general idea is to a) conserve IPs and b) not renumber servers even
when they move, so you end up with random scattered /32s on the servers
and the router has no sensible IP.

> Either way, it appears I can work around this by explicitly disabling
> forwarding for this particular interface.

I was about to suggest exactly this :)


David
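
The workaround agreed on here, turning off forwarding for a single interface,
amounts to writing 0 to that interface's "forwarding" file under
/proc/sys/net/ipv4/conf/, the same file read earlier in the thread. A minimal
sketch follows; demo_set_forwarding() is a hypothetical helper, and the
configuration root is passed in as a parameter purely so the function can be
exercised without touching /proc.

```c
#include <stdio.h>

/*
 * Writes "0\n" or "1\n" to <conf_root>/<ifname>/forwarding.
 * Returns 0 on success and -1 on any error (path too long, open or
 * write failure). On a real system conf_root would be
 * "/proc/sys/net/ipv4/conf" and the caller would need root privileges.
 */
static int demo_set_forwarding(const char *conf_root, const char *ifname,
			       int enable)
{
	char path[256];
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "%s/%s/forwarding", conf_root, ifname);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	if (fputs(enable ? "1\n" : "0\n", f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

For the interface in this thread, the call would be
demo_set_forwarding("/proc/sys/net/ipv4/conf", "sta1", 0).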


* Re: Interface without IP address can route??
From: Ben Greear @ 2011-08-24 13:24 UTC (permalink / raw)
  To: jhs; +Cc: jamal, netdev
In-Reply-To: <1314190890.25967.114.camel@mojatatu>

On 08/24/2011 06:01 AM, jamal wrote:
>
> It makes sense to behave this way.
> IPv4 addresses are owned by the system not interfaces.
> If you want to control the forwarding behavior, control ARP so it doesnt
> respond on the interfaces with no IP.

ARP is already controlled, but the interface was effectively in promiscuous
mode, so it received packets anyway.  This allows me to bridge packets
in user-space using packet sockets.

I understand your argument about IPs being owned by system instead of
interface, but I think it's the wrong behaviour in this case.  Can
you think of any case where this behaviour actually helps?

Either way, it appears I can work around this by explicitly disabling
forwarding for this particular interface.

Thanks,
Ben

>
> cheers,
> jamal
> On Tue, 2011-08-23 at 17:20 -0700, Ben Greear wrote:
>> I just noticed on a 3.0.1 kernel that the system is routing packets
>> received on an interface without an IP address. (I was trying to use the
>> interface in a user-space wifi_station-to-wired bridge application).
>>
>> [root@lf0301-demo1 lanforge]# cat /proc/sys/net/ipv4/conf/sta1/forwarding
>> 1
>> [root@lf0301-demo1 lanforge]# ifconfig sta1
>> sta1      Link encap:Ethernet  HWaddr 00:03:2D:12:16:0D
>>             UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>             RX packets:85248 errors:0 dropped:0 overruns:0 frame:0
>>             TX packets:1419 errors:0 dropped:0 overruns:0 carrier:0
>>             collisions:0 txqueuelen:1000
>>             RX bytes:67423391 (64.2 MiB)  TX bytes:1087581 (1.0 MiB)
>>
>>
>> Seems that older stock kernels have forwarding set for interfaces without
>> IP addresses too, so maybe it's always been this way...
>>
>> Anyway, I can add some logic to my config to explicitly disable
>> routing for interfaces w/out IP address, but it seems to me that
>> it should automatically not route packets received on an interface
>> that had no IP address on it..
>>
>> Thanks,
>> Ben
>>
>
>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: Interface without IP address can route??
From: jamal @ 2011-08-24 13:01 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev
In-Reply-To: <4E5443CD.60502@candelatech.com>


It makes sense to behave this way.
IPv4 addresses are owned by the system, not by interfaces.
If you want to control the forwarding behavior, control ARP so it doesn't
respond on the interfaces with no IP.

cheers,
jamal
On Tue, 2011-08-23 at 17:20 -0700, Ben Greear wrote:
> I just noticed on a 3.0.1 kernel that the system is routing packets
> received on an interface without an IP address. (I was trying to use the
> interface in a user-space wifi_station-to-wired bridge application).
> 
> [root@lf0301-demo1 lanforge]# cat /proc/sys/net/ipv4/conf/sta1/forwarding
> 1
> [root@lf0301-demo1 lanforge]# ifconfig sta1
> sta1      Link encap:Ethernet  HWaddr 00:03:2D:12:16:0D
>            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>            RX packets:85248 errors:0 dropped:0 overruns:0 frame:0
>            TX packets:1419 errors:0 dropped:0 overruns:0 carrier:0
>            collisions:0 txqueuelen:1000
>            RX bytes:67423391 (64.2 MiB)  TX bytes:1087581 (1.0 MiB)
> 
> 
> Seems that older stock kernels have forwarding set for interfaces without
> IP addresses too, so maybe it's always been this way...
> 
> Anyway, I can add some logic to my config to explicitly disable
> routing for interfaces w/out IP address, but it seems to me that
> it should automatically not route packets received on an interface
> that had no IP address on it..
> 
> Thanks,
> Ben
> 


* [PATCH 8/8] batman-adv: merge update_transtable() into tt related code
From: Marek Lindner @ 2011-08-24 13:00 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Marek Lindner
In-Reply-To: <1314190838-2273-1-git-send-email-lindner_marek@yahoo.de>

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
---
 net/batman-adv/routing.c           |   66 ++--------------------------------
 net/batman-adv/translation-table.c |   69 +++++++++++++++++++++++++++++++++---
 net/batman-adv/translation-table.h |    9 ++---
 3 files changed, 70 insertions(+), 74 deletions(-)

diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c
index 91a7860..1949928 100644
--- a/net/batman-adv/routing.c
+++ b/net/batman-adv/routing.c
@@ -64,65 +64,6 @@ void slide_own_bcast_window(struct hard_iface *hard_iface)
 	}
 }
 
-static void update_transtable(struct bat_priv *bat_priv,
-			      struct orig_node *orig_node,
-			      const unsigned char *tt_buff,
-			      uint8_t tt_num_changes, uint8_t ttvn,
-			      uint16_t tt_crc)
-{
-	uint8_t orig_ttvn = (uint8_t)atomic_read(&orig_node->last_ttvn);
-	bool full_table = true;
-
-	/* the ttvn increased by one -> we can apply the attached changes */
-	if (ttvn - orig_ttvn == 1) {
-		/* the OGM could not contain the changes due to their size or
-		 * because they have already been sent TT_OGM_APPEND_MAX times.
-		 * In this case send a tt request */
-		if (!tt_num_changes) {
-			full_table = false;
-			goto request_table;
-		}
-
-		tt_update_changes(bat_priv, orig_node, tt_num_changes, ttvn,
-				  (struct tt_change *)tt_buff);
-
-		/* Even if we received the precomputed crc with the OGM, we
-		 * prefer to recompute it to spot any possible inconsistency
-		 * in the global table */
-		orig_node->tt_crc = tt_global_crc(bat_priv, orig_node);
-
-		/* The ttvn alone is not enough to guarantee consistency
-		 * because a single value could represent different states
-		 * (due to the wrap around). Thus a node has to check whether
-		 * the resulting table (after applying the changes) is still
-		 * consistent or not. E.g. a node could disconnect while its
-		 * ttvn is X and reconnect on ttvn = X + TTVN_MAX: in this case
-		 * checking the CRC value is mandatory to detect the
-		 * inconsistency */
-		if (orig_node->tt_crc != tt_crc)
-			goto request_table;
-
-		/* Roaming phase is over: tables are in sync again. I can
-		 * unset the flag */
-		orig_node->tt_poss_change = false;
-	} else {
-		/* if we missed more than one change or our tables are not
-		 * in sync anymore -> request fresh tt data */
-		if (ttvn != orig_ttvn || orig_node->tt_crc != tt_crc) {
-request_table:
-			bat_dbg(DBG_TT, bat_priv, "TT inconsistency for %pM. "
-				"Need to retrieve the correct information "
-				"(ttvn: %u last_ttvn: %u crc: %u last_crc: "
-				"%u num_changes: %u)\n", orig_node->orig, ttvn,
-				orig_ttvn, tt_crc, orig_node->tt_crc,
-				tt_num_changes);
-			send_tt_request(bat_priv, orig_node, ttvn, tt_crc,
-					full_table);
-			return;
-		}
-	}
-}
-
 static void update_route(struct bat_priv *bat_priv,
 			 struct orig_node *orig_node,
 			 struct neigh_node *neigh_node)
@@ -499,10 +440,9 @@ update_tt:
 	if (((batman_packet->orig != ethhdr->h_source) &&
 				(batman_packet->ttl > 2)) ||
 				(batman_packet->flags & PRIMARIES_FIRST_HOP))
-		update_transtable(bat_priv, orig_node, tt_buff,
-				  batman_packet->tt_num_changes,
-				  batman_packet->ttvn,
-				  batman_packet->tt_crc);
+		tt_update_orig(bat_priv, orig_node, tt_buff,
+			       batman_packet->tt_num_changes,
+			       batman_packet->ttvn, batman_packet->tt_crc);
 
 	if (orig_node->gw_flags != batman_packet->gw_flags)
 		gw_node_update(bat_priv, orig_node, batman_packet->gw_flags);
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index e8f849f..cc53f78 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -1079,8 +1079,9 @@ out:
 	return skb;
 }
 
-int send_tt_request(struct bat_priv *bat_priv, struct orig_node *dst_orig_node,
-		    uint8_t ttvn, uint16_t tt_crc, bool full_table)
+static int send_tt_request(struct bat_priv *bat_priv,
+			   struct orig_node *dst_orig_node,
+			   uint8_t ttvn, uint16_t tt_crc, bool full_table)
 {
 	struct sk_buff *skb = NULL;
 	struct tt_query_packet *tt_request;
@@ -1455,9 +1456,10 @@ out:
 		orig_node_free_ref(orig_node);
 }
 
-void tt_update_changes(struct bat_priv *bat_priv, struct orig_node *orig_node,
-		       uint16_t tt_num_changes, uint8_t ttvn,
-		       struct tt_change *tt_change)
+static void tt_update_changes(struct bat_priv *bat_priv,
+			      struct orig_node *orig_node,
+			      uint16_t tt_num_changes, uint8_t ttvn,
+			      struct tt_change *tt_change)
 {
 	_tt_update_changes(bat_priv, orig_node, tt_change, tt_num_changes,
 			   ttvn);
@@ -1802,3 +1804,60 @@ out:
 		tt_local_entry_free_ref(tt_local_entry);
 	return ret;
 }
+
+void tt_update_orig(struct bat_priv *bat_priv, struct orig_node *orig_node,
+		    const unsigned char *tt_buff, uint8_t tt_num_changes,
+		    uint8_t ttvn, uint16_t tt_crc)
+{
+	uint8_t orig_ttvn = (uint8_t)atomic_read(&orig_node->last_ttvn);
+	bool full_table = true;
+
+	/* the ttvn increased by one -> we can apply the attached changes */
+	if (ttvn - orig_ttvn == 1) {
+		/* the OGM could not contain the changes due to their size or
+		 * because they have already been sent TT_OGM_APPEND_MAX times.
+		 * In this case send a tt request */
+		if (!tt_num_changes) {
+			full_table = false;
+			goto request_table;
+		}
+
+		tt_update_changes(bat_priv, orig_node, tt_num_changes, ttvn,
+				  (struct tt_change *)tt_buff);
+
+		/* Even if we received the precomputed crc with the OGM, we
+		 * prefer to recompute it to spot any possible inconsistency
+		 * in the global table */
+		orig_node->tt_crc = tt_global_crc(bat_priv, orig_node);
+
+		/* The ttvn alone is not enough to guarantee consistency
+		 * because a single value could represent different states
+		 * (due to the wrap around). Thus a node has to check whether
+		 * the resulting table (after applying the changes) is still
+		 * consistent or not. E.g. a node could disconnect while its
+		 * ttvn is X and reconnect on ttvn = X + TTVN_MAX: in this case
+		 * checking the CRC value is mandatory to detect the
+		 * inconsistency */
+		if (orig_node->tt_crc != tt_crc)
+			goto request_table;
+
+		/* Roaming phase is over: tables are in sync again. I can
+		 * unset the flag */
+		orig_node->tt_poss_change = false;
+	} else {
+		/* if we missed more than one change or our tables are not
+		 * in sync anymore -> request fresh tt data */
+		if (ttvn != orig_ttvn || orig_node->tt_crc != tt_crc) {
+request_table:
+			bat_dbg(DBG_TT, bat_priv, "TT inconsistency for %pM. "
+				"Need to retrieve the correct information "
+				"(ttvn: %u last_ttvn: %u crc: %u last_crc: "
+				"%u num_changes: %u)\n", orig_node->orig, ttvn,
+				orig_ttvn, tt_crc, orig_node->tt_crc,
+				tt_num_changes);
+			send_tt_request(bat_priv, orig_node, ttvn, tt_crc,
+					full_table);
+			return;
+		}
+	}
+}
diff --git a/net/batman-adv/translation-table.h b/net/batman-adv/translation-table.h
index b47e876..30efd49 100644
--- a/net/batman-adv/translation-table.h
+++ b/net/batman-adv/translation-table.h
@@ -49,14 +49,8 @@ void tt_save_orig_buffer(struct bat_priv *bat_priv, struct orig_node *orig_node,
 uint16_t tt_local_crc(struct bat_priv *bat_priv);
 uint16_t tt_global_crc(struct bat_priv *bat_priv, struct orig_node *orig_node);
 void tt_free(struct bat_priv *bat_priv);
-int send_tt_request(struct bat_priv *bat_priv,
-		    struct orig_node *dst_orig_node, uint8_t ttvn,
-		    uint16_t tt_crc, bool full_table);
 bool send_tt_response(struct bat_priv *bat_priv,
 		      struct tt_query_packet *tt_request);
-void tt_update_changes(struct bat_priv *bat_priv, struct orig_node *orig_node,
-		       uint16_t tt_num_changes, uint8_t ttvn,
-		       struct tt_change *tt_change);
 bool is_my_client(struct bat_priv *bat_priv, const uint8_t *addr);
 void handle_tt_response(struct bat_priv *bat_priv,
 			struct tt_query_packet *tt_response);
@@ -64,5 +58,8 @@ void send_roam_adv(struct bat_priv *bat_priv, uint8_t *client,
 		   struct orig_node *orig_node);
 void tt_commit_changes(struct bat_priv *bat_priv);
 bool is_ap_isolated(struct bat_priv *bat_priv, uint8_t *src, uint8_t *dst);
+void tt_update_orig(struct bat_priv *bat_priv, struct orig_node *orig_node,
+		    const unsigned char *tt_buff, uint8_t tt_num_changes,
+		    uint8_t ttvn, uint16_t tt_crc);
 
 #endif /* _NET_BATMAN_ADV_TRANSLATION_TABLE_H_ */
-- 
1.7.5.3
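
The else branch at the end of the translation-table.c hunk above requests fresh TT data whenever the announced version number or CRC no longer matches what is stored for the originator. A minimal sketch of that test in isolation follows; `struct tt_state` and `tt_needs_request()` are hypothetical names used only for illustration, and the part of the function that is not visible in this hunk is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of what was last recorded for an originator. */
struct tt_state {
	uint8_t ttvn;	/* translation table version number */
	uint16_t crc;	/* checksum over the table contents */
};

/* Mirrors the visible condition
 *   ttvn != orig_ttvn || orig_node->tt_crc != tt_crc
 * A mismatch in either field means the local copy is out of sync. */
static bool tt_needs_request(const struct tt_state *stored,
			     uint8_t announced_ttvn, uint16_t announced_crc)
{
	return announced_ttvn != stored->ttvn ||
	       announced_crc != stored->crc;
}
```

In the patch itself a mismatch leads to send_tt_request(); the sketch only isolates the decision.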


* [PATCH 7/8] batman-adv: reuse tt_len() to calculate tt buffer length
From: Marek Lindner @ 2011-08-24 13:00 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Marek Lindner
In-Reply-To: <1314190838-2273-1-git-send-email-lindner_marek@yahoo.de>

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/aggregation.h |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/net/batman-adv/aggregation.h b/net/batman-adv/aggregation.h
index 216337b..df4a5a9 100644
--- a/net/batman-adv/aggregation.h
+++ b/net/batman-adv/aggregation.h
@@ -28,8 +28,7 @@
 static inline int aggregated_packet(int buff_pos, int packet_len,
 				    int tt_num_changes)
 {
-	int next_buff_pos = buff_pos + BAT_PACKET_LEN + (tt_num_changes *
-						sizeof(struct tt_change));
+	int next_buff_pos = buff_pos + BAT_PACKET_LEN + tt_len(tt_num_changes);
 
 	return (next_buff_pos <= packet_len) &&
 		(next_buff_pos <= MAX_AGGREGATION_BYTES);
-- 
1.7.5.3
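
For readers without the tree at hand, the change above swaps an open-coded multiplication for a helper. The sketch below shows the idea as a standalone file. The `tt_len()` body, the `struct tt_change` layout and the two constants are assumptions made for illustration and are not taken from this patch; the real definitions live in the batman-adv headers.

```c
#include <stdint.h>

#define ETH_ALEN 6

/* Assumed values, for illustration only. */
#define BAT_PACKET_LEN 26
#define MAX_AGGREGATION_BYTES 512

/* Assumed layout of one TT change record. */
struct tt_change {
	uint8_t flags;
	uint8_t addr[ETH_ALEN];
};

/* Assumed helper: bytes occupied by changes_num change records. */
static inline int tt_len(int changes_num)
{
	return changes_num * (int)sizeof(struct tt_change);
}

/* Same shape as the patched function: is there room for another
 * aggregated packet, including its TT changes, in the buffer? */
static inline int aggregated_packet(int buff_pos, int packet_len,
				    int tt_num_changes)
{
	int next_buff_pos = buff_pos + BAT_PACKET_LEN + tt_len(tt_num_changes);

	return (next_buff_pos <= packet_len) &&
		(next_buff_pos <= MAX_AGGREGATION_BYTES);
}
```

Keeping the size calculation in one helper means a later change to the record layout only has to be made in one place.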


* [PATCH 6/8] batman-adv: print client flags in the local/global transtables output
From: Marek Lindner @ 2011-08-24 13:00 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Antonio Quartulli, Marek Lindner
In-Reply-To: <1314190838-2273-1-git-send-email-lindner_marek@yahoo.de>

From: Antonio Quartulli <ordex@autistici.org>

Since clients can have several flags on or off, this patch makes them
appear in the local/global transtable output so that they can be checked
for debugging purposes.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
---
 net/batman-adv/translation-table.c |   37 ++++++++++++++++++++++++++---------
 1 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index 1f128e1..e8f849f 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -332,7 +332,7 @@ int tt_local_seq_print_text(struct seq_file *seq, void *offset)
 
 		rcu_read_lock();
 		__hlist_for_each_rcu(node, head)
-			buf_size += 21;
+			buf_size += 29;
 		rcu_read_unlock();
 	}
 
@@ -351,8 +351,19 @@ int tt_local_seq_print_text(struct seq_file *seq, void *offset)
 		rcu_read_lock();
 		hlist_for_each_entry_rcu(tt_local_entry, node,
 					 head, hash_entry) {
-			pos += snprintf(buff + pos, 22, " * %pM\n",
-					tt_local_entry->addr);
+			pos += snprintf(buff + pos, 30, " * %pM "
+					"[%c%c%c%c%c]\n",
+					tt_local_entry->addr,
+					(tt_local_entry->flags &
+					 TT_CLIENT_ROAM ? 'R' : '.'),
+					(tt_local_entry->flags &
+					 TT_CLIENT_NOPURGE ? 'P' : '.'),
+					(tt_local_entry->flags &
+					 TT_CLIENT_NEW ? 'N' : '.'),
+					(tt_local_entry->flags &
+					 TT_CLIENT_PENDING ? 'X' : '.'),
+					(tt_local_entry->flags &
+					 TT_CLIENT_WIFI ? 'W' : '.'));
 		}
 		rcu_read_unlock();
 	}
@@ -589,8 +600,8 @@ int tt_global_seq_print_text(struct seq_file *seq, void *offset)
 	seq_printf(seq,
 		   "Globally announced TT entries received via the mesh %s\n",
 		   net_dev->name);
-	seq_printf(seq, "       %-13s %s       %-15s %s\n",
-		   "Client", "(TTVN)", "Originator", "(Curr TTVN)");
+	seq_printf(seq, "       %-13s %s       %-15s %s %s\n",
+		   "Client", "(TTVN)", "Originator", "(Curr TTVN)", "Flags");
 
 	buf_size = 1;
 	/* Estimate length for: " * xx:xx:xx:xx:xx:xx (ttvn) via
@@ -600,7 +611,7 @@ int tt_global_seq_print_text(struct seq_file *seq, void *offset)
 
 		rcu_read_lock();
 		__hlist_for_each_rcu(node, head)
-			buf_size += 59;
+			buf_size += 67;
 		rcu_read_unlock();
 	}
 
@@ -619,14 +630,20 @@ int tt_global_seq_print_text(struct seq_file *seq, void *offset)
 		rcu_read_lock();
 		hlist_for_each_entry_rcu(tt_global_entry, node,
 					 head, hash_entry) {
-			pos += snprintf(buff + pos, 61,
-					" * %pM  (%3u) via %pM     (%3u)\n",
-					tt_global_entry->addr,
+			pos += snprintf(buff + pos, 69,
+					" * %pM  (%3u) via %pM     (%3u)   "
+					"[%c%c%c]\n", tt_global_entry->addr,
 					tt_global_entry->ttvn,
 					tt_global_entry->orig_node->orig,
 					(uint8_t) atomic_read(
 						&tt_global_entry->orig_node->
-						last_ttvn));
+						last_ttvn),
+					(tt_global_entry->flags &
+					 TT_CLIENT_ROAM ? 'R' : '.'),
+					(tt_global_entry->flags &
+					 TT_CLIENT_PENDING ? 'X' : '.'),
+					(tt_global_entry->flags &
+					 TT_CLIENT_WIFI ? 'W' : '.'));
 		}
 		rcu_read_unlock();
 	}
-- 
1.7.5.3
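
The buffer sizes in the patch above follow from the length of one output line: " * " (3) + MAC address (17) + space (1) + "[.....]" (7) + newline (1) = 29 bytes, plus one for the terminating NUL passed to snprintf(). A userspace sketch of the local-table line follows. The flag bit values and the function name `format_local_entry()` are assumptions for illustration, and `%pM` is replaced by explicit hex formatting because it is a kernel-only specifier.

```c
#include <stdint.h>
#include <stdio.h>

/* Flag bit positions are assumed here, not taken from the patch. */
enum {
	TT_CLIENT_ROAM    = 1 << 0,
	TT_CLIENT_WIFI    = 1 << 1,
	TT_CLIENT_NOPURGE = 1 << 8,
	TT_CLIENT_NEW     = 1 << 9,
	TT_CLIENT_PENDING = 1 << 10,
};

/* Hypothetical helper: format one local TT entry the way the patched
 * tt_local_seq_print_text() does. Returns the snprintf() result. */
static int format_local_entry(char *buf, size_t size,
			      const uint8_t mac[6], uint16_t flags)
{
	return snprintf(buf, size,
			" * %02x:%02x:%02x:%02x:%02x:%02x [%c%c%c%c%c]\n",
			mac[0], mac[1], mac[2], mac[3], mac[4], mac[5],
			(flags & TT_CLIENT_ROAM ? 'R' : '.'),
			(flags & TT_CLIENT_NOPURGE ? 'P' : '.'),
			(flags & TT_CLIENT_NEW ? 'N' : '.'),
			(flags & TT_CLIENT_PENDING ? 'X' : '.'),
			(flags & TT_CLIENT_WIFI ? 'W' : '.'));
}
```

With a 30-byte buffer, an entry that has the ROAM and WIFI flags set would be rendered with `[R...W]` at the end of the line.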


* [PATCH 5/8] batman-adv: implement AP-isolation on the sender side
From: Marek Lindner @ 2011-08-24 13:00 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Antonio Quartulli, Marek Lindner
In-Reply-To: <1314190838-2273-1-git-send-email-lindner_marek@yahoo.de>

From: Antonio Quartulli <ordex@autistici.org>

If AP isolation is enabled and a node has to send a packet issued by a WIFI
client to another WIFI client, the packet is dropped.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
---
 net/batman-adv/routing.c           |    2 +-
 net/batman-adv/soft-interface.c    |    3 ++-
 net/batman-adv/translation-table.c |   28 +++++++++++++++++++++-------
 net/batman-adv/translation-table.h |    2 +-
 net/batman-adv/unicast.c           |    6 ++++--
 5 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c
index 13444e9..91a7860 100644
--- a/net/batman-adv/routing.c
+++ b/net/batman-adv/routing.c
@@ -1535,7 +1535,7 @@ static int check_unicast_ttvn(struct bat_priv *bat_priv,
 
 		ethhdr = (struct ethhdr *)(skb->data +
 			sizeof(struct unicast_packet));
-		orig_node = transtable_search(bat_priv, ethhdr->h_dest);
+		orig_node = transtable_search(bat_priv, NULL, ethhdr->h_dest);
 
 		if (!orig_node) {
 			if (!is_my_client(bat_priv, ethhdr->h_dest))
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index 9addbab..402fd96 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -597,7 +597,8 @@ static int interface_tx(struct sk_buff *skb, struct net_device *soft_iface)
 	/* Register the client MAC in the transtable */
 	tt_local_add(soft_iface, ethhdr->h_source, skb->skb_iif);
 
-	orig_node = transtable_search(bat_priv, ethhdr->h_dest);
+	orig_node = transtable_search(bat_priv, ethhdr->h_source,
+				      ethhdr->h_dest);
 	if (is_multicast_ether_addr(ethhdr->h_dest) ||
 				(orig_node && orig_node->gw_flags)) {
 		ret = gw_is_target(bat_priv, skb, orig_node);
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index d0ed931..1f128e1 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -794,29 +794,43 @@ static bool _is_ap_isolated(struct tt_local_entry *tt_local_entry,
 }
 
 struct orig_node *transtable_search(struct bat_priv *bat_priv,
-				    const uint8_t *addr)
+				    const uint8_t *src, const uint8_t *addr)
 {
-	struct tt_global_entry *tt_global_entry;
+	struct tt_local_entry *tt_local_entry = NULL;
+	struct tt_global_entry *tt_global_entry = NULL;
 	struct orig_node *orig_node = NULL;
 
+	if (src && atomic_read(&bat_priv->ap_isolation)) {
+		tt_local_entry = tt_local_hash_find(bat_priv, src);
+		if (!tt_local_entry)
+			goto out;
+	}
+
 	tt_global_entry = tt_global_hash_find(bat_priv, addr);
-
 	if (!tt_global_entry)
 		goto out;
 
+	/* check whether the clients should not communicate due to AP
+	 * isolation */
+	if (tt_local_entry && _is_ap_isolated(tt_local_entry, tt_global_entry))
+		goto out;
+
 	if (!atomic_inc_not_zero(&tt_global_entry->orig_node->refcount))
-		goto free_tt;
+		goto out;
 
 	/* A global client marked as PENDING has already moved from that
 	 * originator */
 	if (tt_global_entry->flags & TT_CLIENT_PENDING)
-		goto free_tt;
+		goto out;
 
 	orig_node = tt_global_entry->orig_node;
 
-free_tt:
-	tt_global_entry_free_ref(tt_global_entry);
 out:
+	if (tt_global_entry)
+		tt_global_entry_free_ref(tt_global_entry);
+	if (tt_local_entry)
+		tt_local_entry_free_ref(tt_local_entry);
+
 	return orig_node;
 }
 
diff --git a/net/batman-adv/translation-table.h b/net/batman-adv/translation-table.h
index f1d148e..b47e876 100644
--- a/net/batman-adv/translation-table.h
+++ b/net/batman-adv/translation-table.h
@@ -43,7 +43,7 @@ void tt_global_del(struct bat_priv *bat_priv,
 		   struct orig_node *orig_node, const unsigned char *addr,
 		   const char *message, bool roaming);
 struct orig_node *transtable_search(struct bat_priv *bat_priv,
-				    const uint8_t *addr);
+				    const uint8_t *src, const uint8_t *addr);
 void tt_save_orig_buffer(struct bat_priv *bat_priv, struct orig_node *orig_node,
 			 const unsigned char *tt_buff, uint8_t tt_num_changes);
 uint16_t tt_local_crc(struct bat_priv *bat_priv);
diff --git a/net/batman-adv/unicast.c b/net/batman-adv/unicast.c
index 32b125f..07d1c1d 100644
--- a/net/batman-adv/unicast.c
+++ b/net/batman-adv/unicast.c
@@ -299,8 +299,10 @@ int unicast_send_skb(struct sk_buff *skb, struct bat_priv *bat_priv)
 			goto find_router;
 	}
 
-	/* check for tt host - increases orig_node refcount */
-	orig_node = transtable_search(bat_priv, ethhdr->h_dest);
+	/* check for tt host - increases orig_node refcount.
+	 * returns NULL in case of AP isolation */
+	orig_node = transtable_search(bat_priv, ethhdr->h_source,
+				      ethhdr->h_dest);
 
 find_router:
 	/**
-- 
1.7.5.3
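
The sender-side decision made in transtable_search() above can be sketched outside the kernel as follows. The body of _is_ap_isolated() is not part of this patch, so the WIFI-flag comparison below is an assumption based on the commit message; `struct tt_entry`, `may_forward()` and the flag value are hypothetical. Hash lookups and reference counting are left out, as is the case where a source address is given but has no local entry (the patch returns NULL there).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TT_CLIENT_WIFI (1 << 1)	/* assumed bit position */

/* Hypothetical stand-in for both local and global TT entries. */
struct tt_entry {
	uint16_t flags;
};

/* Assumed rule: two clients are isolated when both are WIFI clients. */
static bool is_ap_isolated(const struct tt_entry *local,
			   const struct tt_entry *global)
{
	return (local->flags & TT_CLIENT_WIFI) &&
	       (global->flags & TT_CLIENT_WIFI);
}

/* src_entry == NULL models a caller that passes no source address,
 * as the routing.c call site does; the isolation check is then skipped.
 * dst_entry == NULL models a destination missing from the global table. */
static bool may_forward(bool ap_isolation, const struct tt_entry *src_entry,
			const struct tt_entry *dst_entry)
{
	if (!dst_entry)
		return false;

	if (ap_isolation && src_entry &&
	    is_ap_isolated(src_entry, dst_entry))
		return false;

	return true;
}
```

A false result here corresponds to transtable_search() returning NULL, which the callers in soft-interface.c and unicast.c treat as "no originator found".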


