linux-wireless.vger.kernel.org archive mirror
* help: 802.11s bad performance with 802.11n enabled
@ 2012-11-16 17:41 Chaoxing Lin
  0 siblings, 0 replies; 25+ messages in thread
From: Chaoxing Lin @ 2012-11-16 17:41 UTC (permalink / raw)
  To: linux-wireless@vger.kernel.org

I set up a 7-node 802.11s mesh network and am trying to evaluate its performance.

My first test measures packet loss.
The test utility is very simple: it pings all 7 nodes continuously and counts the replies. The ping rate is about 10 requests per second to each node.
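
The loss figure such a utility reports can be sketched as below. This is only an illustration of the accounting, assuming one sent/received counter pair per node; the structure and function names are hypothetical and not taken from the actual test tool.

```c
#include <stdio.h>

/* Per-node counters; the field names are illustrative only. */
struct ping_stats {
    unsigned long sent;      /* echo requests transmitted */
    unsigned long received;  /* echo replies counted */
};

/* Loss for one node in percent. Returns 0.0 when nothing has been sent
 * yet or when duplicate replies push received above sent. */
static double loss_percent(const struct ping_stats *s)
{
    if (s->sent == 0 || s->received >= s->sent)
        return 0.0;
    return 100.0 * (double)(s->sent - s->received) / (double)s->sent;
}

/* Print one line per node so that unstable links stand out. */
static void print_report(const struct ping_stats stats[], int n)
{
    for (int i = 0; i < n; i++)
        printf("node %d: sent=%lu received=%lu loss=%.1f%%\n",
               i, stats[i].sent, stats[i].received,
               loss_percent(&stats[i]));
}
```

For example, a node with 1000 requests sent and 700 replies counted would be reported with a loss of 30%.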

The mesh runs on 802.11a channel 40 in a clean RF environment; nobody else is on this channel.

When 802.11n is NOT enabled, the ping loss rate is very low. Only a few packets are lost during an overnight test.

However, when 802.11n (HT40+ or HT20) is enabled, the network is extremely unstable. The ping loss is about 30% or more to each node.

FYI, 802.11n itself seems to work well with 802.11s when there are only 2 nodes (standalone). I say so because I ran a throughput test on a 2-node mesh on channel 40 with HT40+, and the throughput was good: iperf TCP throughput was about 170 Mbps out of 300 Mbps (2 streams).


Does anyone know what's going on?
Or has anyone done an 802.11s performance test and can share the test data, setup, etc.?


Thanks,

Chaoxing

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: help: 802.11s bad performance with 802.11n enabled
@ 2012-11-17  9:20 Yeoh Chun-Yeow
  0 siblings, 0 replies; 25+ messages in thread
From: Yeoh Chun-Yeow @ 2012-11-17  9:20 UTC (permalink / raw)
  To: linux-wireless

Hi, Chaoxing

Do you have a network diagram to explain your setup? Are all the nodes
able to talk to each other directly?

Which version of compat-wireless are you using?

---
Chun-Yeow

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: help: 802.11s bad performance with 802.11n enabled
@ 2012-12-03 14:37 Chaoxing Lin
  2012-12-03 14:45 ` Georgiewskiy Yuriy
  2012-12-04  4:35 ` Thomas Pedersen
  0 siblings, 2 replies; 25+ messages in thread
From: Chaoxing Lin @ 2012-12-03 14:37 UTC (permalink / raw)
  To: linux-wireless@vger.kernel.org

After a lot of experiments, here are the various problems I observed.

1. The "Fail to stop Tx DMA" issue plays a role, but not the major one. It accounts for about 3% of packet loss in my testbed.
Is anyone looking at this issue? It is now very easy to reproduce.

2. Security keys for peer links and mesh paths get out of sync

For example, in one case device A cannot ping device B, but it can ping device C.
If I telnet from device A to device C, I can ping device B from device C.
This means device A can actually reach device B, but the user has to do it manually
(through a third device).

Below is a "reachability graph" from one real scenario.

147 ----> 115
    ----> 111 ------>103
              ------>104
              ------>113
              ------>115

Device 147 can only ping 115 and 111, although its mpath table says it has a direct mpath to every node.
But from a telnet session from 147 to 111, I can ping the remaining devices 103, 104, 113 and 115.

Further analysis of the peer link between 147 and 104 reveals the following.

147 has its peer link to 104 in "ESTAB" and has all 3 keys (CCM pairwise, CMAC group key, CCM group key) installed for peer 104.
But 104 has its peer link to 147 in "LISTEN" and does not have any keys installed for 147.
That is to say, the peer link between 147 and 104 is bad. Worse, the mpath table on 147 keeps saying the path to 104 is active, so all packets to 104 are sent over this peer link but cannot be decrypted on the other end.

I run meshd-nl80211, compiled from auth-sae, for the encryption. Does anyone know what the problem is here? Is this a protocol defect, e.g. failure to cover a certain error condition? Or is it an auth-sae/kernel implementation bug?
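
The asymmetry described above can be stated as a small consistency check. This is a sketch only: the enum and structure are local stand-ins invented for illustration, not the kernel's or auth-sae's definitions, and the check assumes both ends' views can be collected in one place (for instance from station dumps taken on each node).

```c
#include <stdbool.h>

/* Local stand-ins for the two peer link states named in the report;
 * these are not the kernel's enum values. */
enum plink_state { PLINK_LISTEN, PLINK_ESTAB };

/* One side's view of a peer link (hypothetical structure). */
struct plink_view {
    enum plink_state state;
    bool keys_installed;   /* pairwise and group keys present for the peer */
};

/* A secured link can carry traffic only if both ends are established
 * and both ends hold keys for each other. */
static bool link_usable(const struct plink_view *local,
                        const struct plink_view *remote)
{
    return local->state == PLINK_ESTAB && remote->state == PLINK_ESTAB &&
           local->keys_installed && remote->keys_installed;
}
```

With the values from the report (147: ESTAB with keys, 104: LISTEN without keys), link_usable() returns false even though 147's mpath table still marks the path as active.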


3. 802.11n packet aggregation plays a big role in 802.11s mesh network instability

As an experiment, I changed the ath9k driver to disable 802.11n packet aggregation. The network becomes much better.
It's as stable as running in 802.11a-only mode.
So it seems that aggregation plays a big role in the instability of an 802.11s network with 802.11n.
Does anyone have any idea why?





^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: help: 802.11s bad performance with 802.11n enabled
  2012-12-03 14:37 help: 802.11s bad performance with 802.11n enabled Chaoxing Lin
@ 2012-12-03 14:45 ` Georgiewskiy Yuriy
  2012-12-03 14:56   ` Chaoxing Lin
                     ` (2 more replies)
  2012-12-04  4:35 ` Thomas Pedersen
  1 sibling, 3 replies; 25+ messages in thread
From: Georgiewskiy Yuriy @ 2012-12-03 14:45 UTC (permalink / raw)
  To: Chaoxing Lin; +Cc: linux-wireless@vger.kernel.org


On 2012-12-03 14:37 -0000, Chaoxing Lin wrote linux-wireless@vger.kernel.org:

CL>After a lot of experiments, here are various problems observed.
CL>
CL>1. The "Fail to stop Tx DMA" related issue plays a role. But not the major part. It accounts for about 3% of packet loss in my testbed.
CL>Is anyone looking at this issue? This issue is now very easy to recreate.

In my case it is much more than 3%.

CL>
CL>3. 802.11n packet aggregation plays a big role in 802.11s mesh network in-stability
CL>
CL>For experiment, I changed ath9k driver to disable 802.11n packet aggregation. The network becomes much better.
CL>It's as stable as running 802.11a only mode.
CL>So it seems that the aggregation plays a big role in in-stability of 802.11s network with 802.11n.
CL>Any one has any idea why?

Can you post a patch? I want to test this too.


C уважением                       With Best Regards
Георгиевский Юрий.                Georgiewskiy Yuriy
+7 4872 711666                    +7 4872 711666
факс +7 4872 711143               fax +7 4872 711143
Компания ООО "Ай Ти Сервис"       IT Service Ltd
http://nkoort.ru                  http://nkoort.ru
JID: GHhost@icf.org.ru            JID: GHhost@icf.org.ru
YG129-RIPE                        YG129-RIPE

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: help: 802.11s bad performance with 802.11n enabled
  2012-12-03 14:45 ` Georgiewskiy Yuriy
@ 2012-12-03 14:56   ` Chaoxing Lin
  2012-12-03 15:43     ` Georgiewskiy Yuriy
  2012-12-08  3:17   ` Thomas Pedersen
  2012-12-10 15:11   ` Chaoxing Lin
  2 siblings, 1 reply; 25+ messages in thread
From: Chaoxing Lin @ 2012-12-03 14:56 UTC (permalink / raw)
  To: Georgiewskiy Yuriy; +Cc: linux-wireless@vger.kernel.org

CL>For experiment, I changed ath9k driver to disable 802.11n packet aggregation. The network becomes much better.
CL>It's as stable as running 802.11a only mode.
CL>So it seems that the aggregation plays a big role in in-stability of 802.11s network with 802.11n.
CL>Any one has any idea why?

> Can you post a patch? i want test this too.

The change is easy. In ath9k/init.c, in the function ath9k_set_hw_capab(), replace

    if (sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_HT)
        hw->flags |= IEEE80211_HW_AMPDU_AGGREGATION;

with

    /* experiment only: never advertise A-MPDU aggregation to mac80211 */
    hw->flags &= ~IEEE80211_HW_AMPDU_AGGREGATION;
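
For anyone who wants to flip between the two behaviours without editing the source each time, the same logic can be written with a switch. The sketch below is a standalone mock, not ath9k code: the structure, the flag values and the disable_ampdu parameter are all assumptions made up for illustration, and the real constants live in the mac80211 and ath9k headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel definitions; the real values are
 * defined in the mac80211 and ath9k headers and are not reproduced here. */
#define MOCK_HW_AMPDU_AGGREGATION (1u << 0)
#define MOCK_HW_CAP_HT            (1u << 1)

struct mock_hw {
    uint32_t flags;
};

/* Advertise aggregation only when the chip supports HT and the
 * experiment switch is off; otherwise make sure the flag is cleared. */
static void mock_set_hw_capab(struct mock_hw *hw, uint32_t hw_caps,
                              bool disable_ampdu)
{
    if ((hw_caps & MOCK_HW_CAP_HT) && !disable_ampdu)
        hw->flags |= MOCK_HW_AMPDU_AGGREGATION;
    else
        hw->flags &= ~MOCK_HW_AMPDU_AGGREGATION;
}
```

With disable_ampdu set, the result matches the one-line change above; with it clear, the original behaviour is kept.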


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: help: 802.11s bad performance with 802.11n enabled
  2012-12-03 14:56   ` Chaoxing Lin
@ 2012-12-03 15:43     ` Georgiewskiy Yuriy
  2012-12-03 15:47       ` Chaoxing Lin
  2012-12-03 18:21       ` Paul Stoaks
  0 siblings, 2 replies; 25+ messages in thread
From: Georgiewskiy Yuriy @ 2012-12-03 15:43 UTC (permalink / raw)
  To: Chaoxing Lin; +Cc: linux-wireless@vger.kernel.org


On 2012-12-03 14:56 -0000, Chaoxing Lin wrote Georgiewskiy Yuriy:

CL>CL>For experiment, I changed ath9k driver to disable 802.11n packet aggregation. The network becomes much better.
CL>CL>It's as stable as running 802.11a only mode.
CL>CL>So it seems that the aggregation plays a big role in in-stability of 802.11s network with 802.11n.
CL>CL>Any one has any idea why?
CL>
CL>Can you post a patch? i want test this too.
CL>
CL>The change is easy
CL>In ath9k/init.c
CL>Function ath9k_set_hw_capab()
CL>Replace below
CL>    if (sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_HT)
CL>         hw->flags |= IEEE80211_HW_AMPDU_AGGREGATION;
CL>with
CL>	hw->flags &= ~IEEE80211_HW_AMPDU_AGGREGATION; 

At first look, in my case disabling aggregation reduces packet loss and
makes the link more reliable, but it also drops throughput to 15-20 Mbit/s
from about 50 Mbit/s with aggregation enabled.

C уважением                       With Best Regards
Георгиевский Юрий.                Georgiewskiy Yuriy
+7 4872 711666                    +7 4872 711666
факс +7 4872 711143               fax +7 4872 711143
Компания ООО "Ай Ти Сервис"       IT Service Ltd
http://nkoort.ru                  http://nkoort.ru
JID: GHhost@icf.org.ru            JID: GHhost@icf.org.ru
YG129-RIPE                        YG129-RIPE

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: help: 802.11s bad performance with 802.11n enabled
  2012-12-03 15:43     ` Georgiewskiy Yuriy
@ 2012-12-03 15:47       ` Chaoxing Lin
  2012-12-03 18:21       ` Paul Stoaks
  1 sibling, 0 replies; 25+ messages in thread
From: Chaoxing Lin @ 2012-12-03 15:47 UTC (permalink / raw)
  To: Georgiewskiy Yuriy; +Cc: linux-wireless@vger.kernel.org


>On first look in my case disabled aggregation reduces packet loss, link is more reliable, but it's also drop throughput to 15-20 Mbits/sec from about ~50 with aggregation enabled.

Yes. This is the penalty of disabling aggregation. It's just a way to narrow down where the problem is.

Can any expert on 802.11n aggregation and/or 802.11s tell what's going on?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: help: 802.11s bad performance with 802.11n enabled
@ 2012-12-03 16:33 Chaoxing Lin
  0 siblings, 0 replies; 25+ messages in thread
From: Chaoxing Lin @ 2012-12-03 16:33 UTC (permalink / raw)
  To: linux-wireless@vger.kernel.org

A 4th problem is that the AES-CCM pairwise key complains about "packet replay".
All keys are good and the mpaths are good, but pings get no reply and the counter in /debugfs/ieee80211/phy0/keys/[key-num]/replays keeps going up.

I have seen this problem many times when running 802.11s traffic.
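
For context, CCMP replay detection requires the 48-bit packet number (PN) of each received frame to be strictly greater than the last PN accepted for that key. The sketch below shows that comparison in isolation. It is a simplified illustration, not the mac80211 code: the structure and function names are made up, and it keeps a single counter per key, whereas real receivers track the PN per traffic class. A frame that reaches this check after a frame with a higher PN is counted as a replay, which is why the check has to see frames in their original order; whether that is what happens in the case reported here is not established.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PN_LEN 6  /* the CCMP packet number is 48 bits */

/* Hypothetical per-key receive state: the last accepted PN, stored
 * most-significant byte first so that memcmp() orders values numerically. */
struct rx_key_state {
    uint8_t last_pn[PN_LEN];
    unsigned long replays;   /* analogous to the debugfs "replays" counter */
};

/* Accept a frame only if its PN is strictly greater than the last one. */
static bool ccmp_replay_check(struct rx_key_state *k,
                              const uint8_t pn[PN_LEN])
{
    if (memcmp(pn, k->last_pn, PN_LEN) <= 0) {
        k->replays++;        /* old or repeated PN: drop and count */
        return false;
    }
    memcpy(k->last_pn, pn, PN_LEN);
    return true;
}
```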



^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: help: 802.11s bad performance with 802.11n enabled
  2012-12-03 15:43     ` Georgiewskiy Yuriy
  2012-12-03 15:47       ` Chaoxing Lin
@ 2012-12-03 18:21       ` Paul Stoaks
  2012-12-03 18:33         ` Georgiewskiy Yuriy
  2012-12-03 19:02         ` Chaoxing Lin
  1 sibling, 2 replies; 25+ messages in thread
From: Paul Stoaks @ 2012-12-03 18:21 UTC (permalink / raw)
  To: 'Georgiewskiy Yuriy', 'Chaoxing Lin'; +Cc: linux-wireless

What kind of traffic are you pushing through (packet sizes?)  Are they fixed
size, fixed rate, or ...?

Paul




^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: help: 802.11s bad performance with 802.11n enabled
  2012-12-03 18:21       ` Paul Stoaks
@ 2012-12-03 18:33         ` Georgiewskiy Yuriy
  2012-12-03 19:02         ` Chaoxing Lin
  1 sibling, 0 replies; 25+ messages in thread
From: Georgiewskiy Yuriy @ 2012-12-03 18:33 UTC (permalink / raw)
  To: Paul Stoaks; +Cc: 'Chaoxing Lin', linux-wireless


On 2012-12-03 10:21 -0800, Paul Stoaks wrote 'Georgiewskiy Yuriy' and...':

I test with ping -A at sizes of 64 and 1500 bytes, and with iperf using default parameters plus
just -c -i1 -t100500

PS>What kind of traffic are you pushing through (packet sizes?)  Are they fixed
PS>size, fixed rate, or ...?
PS>
PS>Paul
PS>
PS>

C уважением                       With Best Regards
Георгиевский Юрий.                Georgiewskiy Yuriy
+7 4872 711666                    +7 4872 711666
факс +7 4872 711143               fax +7 4872 711143
Компания ООО "Ай Ти Сервис"       IT Service Ltd
http://nkoort.ru                  http://nkoort.ru
JID: GHhost@icf.org.ru            JID: GHhost@icf.org.ru
YG129-RIPE                        YG129-RIPE

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: help: 802.11s bad performance with 802.11n enabled
  2012-12-03 18:21       ` Paul Stoaks
  2012-12-03 18:33         ` Georgiewskiy Yuriy
@ 2012-12-03 19:02         ` Chaoxing Lin
  1 sibling, 0 replies; 25+ messages in thread
From: Chaoxing Lin @ 2012-12-03 19:02 UTC (permalink / raw)
  To: paul@foresight-mands.com, 'Georgiewskiy Yuriy'
  Cc: linux-wireless@vger.kernel.org

My test is very simple.

Continuous ping at about 10 pings per node per second.
The ping size is 64 bytes.


-----Original Message-----
From: Paul Stoaks [mailto:paul@foresight-mands.com] 
Sent: Monday, December 03, 2012 1:22 PM
To: 'Georgiewskiy Yuriy'; Chaoxing Lin
Cc: linux-wireless@vger.kernel.org
Subject: RE: help: 802.11s bad performance with 802.11n enabled

What kind of traffic are you pushing through (packet sizes?)  Are they fixed size, fixed rate, or ...?

Paul




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: help: 802.11s bad performance with 802.11n enabled
  2012-12-03 14:37 help: 802.11s bad performance with 802.11n enabled Chaoxing Lin
  2012-12-03 14:45 ` Georgiewskiy Yuriy
@ 2012-12-04  4:35 ` Thomas Pedersen
  2012-12-04  8:03   ` Adrian Chadd
  1 sibling, 1 reply; 25+ messages in thread
From: Thomas Pedersen @ 2012-12-04  4:35 UTC (permalink / raw)
  To: Chaoxing Lin; +Cc: linux-wireless@vger.kernel.org

Hi Chaoxing,

On Mon, Dec 3, 2012 at 6:37 AM, Chaoxing Lin
<Chaoxing.Lin@ultra-3eti.com> wrote:
> After a lot of experiments, here are various problems observed.
>
> 3. 802.11n packet aggregation plays a big role in 802.11s mesh network in-stability
>
> For experiment, I changed ath9k driver to disable 802.11n packet aggregation. The network becomes much better.
> It's as stable as running 802.11a only mode.
> So it seems that the aggregation plays a big role in in-stability of 802.11s network with 802.11n.
> Any one has any idea why?

I just learned that BA and BAR frames only have a 16-bit field for the
"starting sequence number", while mesh uses 32-bit "mesh sequence
numbers". Try to investigate whether these two counters interact
properly.
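
As a starting point for that investigation, note that the 16-bit Sequence Control value (and the Starting Sequence Control in BA/BAR frames) carries a 4-bit fragment number and a 12-bit sequence number, so block-ack windows work in a circular space of 4096 values. The 32-bit mesh sequence number sits in a separate field, the Mesh Control field, and to my understanding is used for duplicate detection of forwarded frames rather than for block ack. The sketch below shows the 12-bit extraction and the wraparound comparison; the macro and function names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define SN_MASK   0x0FFFu   /* 12-bit sequence number space: 0..4095 */
#define SN_MODULO 0x1000u

/* Extract the 12-bit sequence number from a 16-bit Sequence Control
 * (or BA Starting Sequence Control) value; the low 4 bits are the
 * fragment number. */
static uint16_t seq_ctrl_to_sn(uint16_t seq_ctrl)
{
    return (uint16_t)((seq_ctrl >> 4) & SN_MASK);
}

/* True if sn1 comes before sn2 in the circular 12-bit space. */
static bool sn_less(uint16_t sn1, uint16_t sn2)
{
    return ((uint16_t)(sn1 - sn2) & SN_MASK) > (SN_MODULO >> 1);
}
```

For instance, sn_less(4095, 0) is true because 0 follows 4095 after wraparound, while sn_less(0, 4095) is false.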

Thomas

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: help: 802.11s bad performance with 802.11n enabled
  2012-12-04  4:35 ` Thomas Pedersen
@ 2012-12-04  8:03   ` Adrian Chadd
  0 siblings, 0 replies; 25+ messages in thread
From: Adrian Chadd @ 2012-12-04  8:03 UTC (permalink / raw)
  To: Thomas Pedersen; +Cc: Chaoxing Lin, linux-wireless@vger.kernel.org

... well, how do you implement aggregation? Aggregation also has a 16
bit sequence space (the 802.11 seqno..)



Adrian

On 3 December 2012 20:35, Thomas Pedersen <thomas@cozybit.com> wrote:
> Hi Chaoxing,
>
> I just learned BA and BAR frames only have a 16 bit field for
> "starting sequence number", while mesh uses 32-bit "mesh sequence
> numbers". Try to investigate whether these two counters interact
> properly.
>
> Thomas

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: help: 802.11s bad performance with 802.11n enabled
  2012-12-03 14:45 ` Georgiewskiy Yuriy
  2012-12-03 14:56   ` Chaoxing Lin
@ 2012-12-08  3:17   ` Thomas Pedersen
  2012-12-08  3:23     ` Georgiewskiy Yuriy
  2012-12-10 15:23     ` Chaoxing Lin
  2012-12-10 15:11   ` Chaoxing Lin
  2 siblings, 2 replies; 25+ messages in thread
From: Thomas Pedersen @ 2012-12-08  3:17 UTC (permalink / raw)
  To: Georgiewskiy Yuriy; +Cc: Chaoxing Lin, linux-wireless@vger.kernel.org, open11s

Hi Chaoxing and Georgiewsky,

On Mon, Dec 3, 2012 at 6:45 AM, Georgiewskiy Yuriy <bottleman@icf.org.ru> wrote:
> On 2012-12-03 14:37 -0000, Chaoxing Lin wrote linux-wireless@vger.kernel.org:
>
> CL>After a lot of experiments, here are various problems observed.
> CL>
> CL>1. The "Fail to stop Tx DMA" related issue plays a role. But not the major part. It accounts for about 3% of packet loss in my testbed.
> CL>Is anyone looking at this issue? This issue is now very easy to recreate.
>
> In my case it is much more than 3%.

With wireless-testing HEAD (671c924) I made the following observations
with 3 nodes in a mesh using ch. 149 HT20 on AR9280.

1. ping -i0.1 does not cause aggregation to take place, and losses are 0%
2. a UDP iperf test with two nodes generating traffic shows losses
around 1%. We can observe aggregation taking place in this case.

Can either of you guys reproduce this with the latest
wireless-testing? Also please CC devel@lists.open80211s.org on any
mesh bugs in the future.

Thanks!
Thomas

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: help: 802.11s bad performance with 802.11n enabled
  2012-12-08  3:17   ` Thomas Pedersen
@ 2012-12-08  3:23     ` Georgiewskiy Yuriy
  2012-12-08  3:29       ` Georgiewskiy Yuriy
  2012-12-08  3:37       ` Thomas Pedersen
  2012-12-10 15:23     ` Chaoxing Lin
  1 sibling, 2 replies; 25+ messages in thread
From: Georgiewskiy Yuriy @ 2012-12-08  3:23 UTC (permalink / raw)
  To: Thomas Pedersen; +Cc: Chaoxing Lin, linux-wireless@vger.kernel.org, open11s

On 2012-12-07 19:17 -0800, Thomas Pedersen wrote Georgiewskiy Yuriy:

OK, I'll try this and report the results. Can you also test on 2.4 GHz? As I
understand it, ch. 149 is 802.11a. Or does that make no difference here?

C уважением                       With Best Regards
Георгиевский Юрий.                Georgiewskiy Yuriy
+7 4872 711666                    +7 4872 711666
факс +7 4872 711143               fax +7 4872 711143
Компания ООО "Ай Ти Сервис"       IT Service Ltd
http://nkoort.ru                  http://nkoort.ru
JID: GHhost@icf.org.ru            JID: GHhost@icf.org.ru
YG129-RIPE                        YG129-RIPE

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: help: 802.11s bad performance with 802.11n enabled
  2012-12-08  3:23     ` Georgiewskiy Yuriy
@ 2012-12-08  3:29       ` Georgiewskiy Yuriy
  2012-12-08  3:37         ` Thomas Pedersen
  2012-12-08  3:37       ` Thomas Pedersen
  1 sibling, 1 reply; 25+ messages in thread
From: Georgiewskiy Yuriy @ 2012-12-08  3:29 UTC (permalink / raw)
  To: Thomas Pedersen; +Cc: Chaoxing Lin, linux-wireless@vger.kernel.org, open11s

On 2012-12-08 07:23 +0400, Georgiewskiy Yuriy wrote Thomas Pedersen:

GY>On 2012-12-07 19:17 -0800, Thomas Pedersen wrote Georgiewskiy Yuriy:
GY>
GY>OK, I'll try this and report the results. Can you also test on 2.4 GHz? As I
GY>understand it, ch. 149 is 802.11a. Or does that make no difference here?

Signal level also matters in my case. I just removed the antennas from one of the nodes
within a range of 3 meters, so it runs on the pigtails only. The signal drops to between -70 and -80 dBm, and that
immediately triggers "Failed to stop Tx DMA".

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: help: 802.11s bad performance with 802.11n enabled
  2012-12-08  3:23     ` Georgiewskiy Yuriy
  2012-12-08  3:29       ` Georgiewskiy Yuriy
@ 2012-12-08  3:37       ` Thomas Pedersen
  1 sibling, 0 replies; 25+ messages in thread
From: Thomas Pedersen @ 2012-12-08  3:37 UTC (permalink / raw)
  To: Georgiewskiy Yuriy; +Cc: Chaoxing Lin, linux-wireless@vger.kernel.org, open11s

On Fri, Dec 7, 2012 at 7:23 PM, Georgiewskiy Yuriy <bottleman@icf.org.ru> wrote:
> On 2012-12-07 19:17 -0800, Thomas Pedersen wrote Georgiewskiy Yuriy:
>
> OK, I'll try this and report the results. Can you also test on 2.4 GHz? As I
> understand it, ch. 149 is 802.11a. Or does that make no difference here?

Yes, I get similar results on the 2.4 GHz band, and no, it shouldn't make
a difference here :)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: help: 802.11s bad performance with 802.11n enabled
  2012-12-08  3:29       ` Georgiewskiy Yuriy
@ 2012-12-08  3:37         ` Thomas Pedersen
  2012-12-08  3:47           ` Georgiewskiy Yuriy
  0 siblings, 1 reply; 25+ messages in thread
From: Thomas Pedersen @ 2012-12-08  3:37 UTC (permalink / raw)
  To: Georgiewskiy Yuriy; +Cc: Chaoxing Lin, linux-wireless@vger.kernel.org, open11s

On Fri, Dec 7, 2012 at 7:29 PM, Georgiewskiy Yuriy <bottleman@icf.org.ru> wrote:
> On 2012-12-08 07:23 +0400, Georgiewskiy Yuriy wrote Thomas Pedersen:
>
> GY>On 2012-12-07 19:17 -0800, Thomas Pedersen wrote Georgiewskiy Yuriy:
> GY>
> GY>OK, I'll try this and report the results. Can you also test on 2.4 GHz? As I
> GY>understand it, ch. 149 is 802.11a. Or does that make no difference here?
>
> Signal level also matters in my case. I just removed the antennas from one of the nodes
> within a range of 3 meters, so it runs on the pigtails only. The signal drops to between -70 and -80 dBm, and that
> immediately triggers "Failed to stop Tx DMA".

Are you talking about a different bug?

Thomas

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: help: 802.11s bad performance with 802.11n enabled
  2012-12-08  3:37         ` Thomas Pedersen
@ 2012-12-08  3:47           ` Georgiewskiy Yuriy
  2012-12-10 15:48             ` Chaoxing Lin
  0 siblings, 1 reply; 25+ messages in thread
From: Georgiewskiy Yuriy @ 2012-12-08  3:47 UTC (permalink / raw)
  To: Thomas Pedersen; +Cc: Chaoxing Lin, linux-wireless@vger.kernel.org, open11s

On 2012-12-07 19:37 -0800, Thomas Pedersen wrote Georgiewskiy Yuriy:

TP>On Fri, Dec 7, 2012 at 7:29 PM, Georgiewskiy Yuriy <bottleman@icf.org.ru> wrote:
TP>> On 2012-12-08 07:23 +0400, Georgiewskiy Yuriy wrote Thomas Pedersen:
TP>>
TP>> GY>On 2012-12-07 19:17 -0800, Thomas Pedersen wrote Georgiewskiy Yuriy:
TP>> GY>
TP>> GY>OK, I'll try this and report the results. Can you also test on 2.4 GHz? As I
TP>> GY>understand it, ch. 149 is 802.11a. Or does that make no difference here?
TP>>
TP>> Signal level also matters in my case. I just removed the antennas from one of the nodes
TP>> within a range of 3 meters, so it runs on the pigtails only. The signal drops to between -70 and -80 dBm, and that
TP>> immediately triggers "Failed to stop Tx DMA".
TP>
TP>Are you talking about a different bug?

Hm, maybe, but according to Chaoxing Lin's emails there are several bugs which cause
performance degradation in 802.11s mode, and the symptoms in my case are identical. I get the same results
as Chaoxing Lin, and apparently the same troubles. I will run the tests you want anyway and report
the results.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: help: 802.11s bad performance with 802.11n enabled
  2012-12-03 14:45 ` Georgiewskiy Yuriy
  2012-12-03 14:56   ` Chaoxing Lin
  2012-12-08  3:17   ` Thomas Pedersen
@ 2012-12-10 15:11   ` Chaoxing Lin
  2 siblings, 0 replies; 25+ messages in thread
From: Chaoxing Lin @ 2012-12-10 15:11 UTC (permalink / raw)
  To: Georgiewskiy Yuriy; +Cc: linux-wireless@vger.kernel.org



-----Original Message-----
From: Georgiewskiy Yuriy [mailto:bottleman@icf.org.ru] 
Sent: Monday, December 03, 2012 9:45 AM
To: Chaoxing Lin
Cc: linux-wireless@vger.kernel.org
Subject: RE: help: 802.11s bad performance with 802.11n enabled

On 2012-12-03 14:37 -0000, Chaoxing Lin wrote linux-wireless@vger.kernel.org:

CL>After a lot of experiments, here are various problems observed.
CL>
CL>1. The "Fail to stop Tx DMA" related issue plays a role. But not the major part. It accounts for about 3% of packet loss in my testbed.
CL>Is anyone looking at this issue? This issue is now very easy to recreate.

GY>>In my case it is much more than 3%.

When I say 3% loss due to "Tx DMA", I mean loss measured after eliminating other factors as much as possible (turning off aggregation, etc.). Below are the numbers from a 3-day continuous test. During this 3-day test, "Fail to stop Tx DMA" happened once in a while on various nodes.



                CLIN network activity/stability monitoring system

       Total targets: 6 Total instability monitored: 3221

192.168.5.103    ICMP Tx 5899087 Rx 5883235 Seq=846  OutOfOrder 427 Pkt loss 15852(0.27%) RTT min/avg/max = 55.0/57.9/11580.0 ms
192.168.5.104    ICMP Tx 5899087 Rx 5721023 Seq=846  OutOfOrder 121213 Pkt loss 178064(3.02%) RTT min/avg/max = 54.0/57.7/9032.0 ms
192.168.5.111    ICMP Tx 5899087 Rx 5726094 Seq=846  OutOfOrder 164950 Pkt loss 172993(2.93%) RTT min/avg/max = 54.0/59.0/9421.0 ms
192.168.5.113    ICMP Tx 5899087 Rx 5894984 Seq=846  OutOfOrder 686 Pkt loss 4103(0.07%) RTT min/avg/max = 54.0/58.0/11524.0 ms
192.168.5.115    ICMP Tx 5899087 Rx 5869967 Seq=846  OutOfOrder 66782 Pkt loss 29120(0.49%) RTT min/avg/max = 54.0/67.7/11801.0 ms
192.168.5.147    ICMP Tx 5899087 Rx 5899086 Seq=846  OutOfOrder 0 Pkt loss   1(0.00%) RTT min/avg/max = 0.0/54.0/110.0 ms



Bad Packets: 0 short pkt, 3217664 not-my-echo, 0 not-echo-reply, 0 unknown sender
  Application Starts: Thu Dec  6 13:59:58 2012
        Current Time: Mon Dec 10 08:50:34 2012


Notes:
1. Ignore the big numbers in the "OutOfOrder" column. When the ICMP sequence number is about to overflow (65535), if the last few packets (e.g. ICMP seq 65535) get lost, all packets after the overflow are counted as out of order.
2. If anyone is interested in the test tool I used in this test, it's here:
http://sites.google.com/site/ebaylinkan5709pictures/files-to-share/clinmonitor.gz
It's an executable that runs on any 32-bit Linux.
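
As a rough illustration of note 1, the sketch below recomputes the loss percentage from the Tx/Rx counters in the table and shows one way to compare 16-bit ICMP sequence numbers so that a wrap from 65535 to 0 is not counted as out of order. This is not the clinmonitor implementation. The names `loss_percent` and `seq_newer` are made up for this example, and the half-window comparison is the usual serial-number-arithmetic trick, assumed here rather than taken from the tool.

```python
SEQ_MOD = 1 << 16  # ICMP echo sequence numbers are 16 bits wide


def loss_percent(tx: int, rx: int) -> float:
    """Return packet loss as a percentage of transmitted requests."""
    if tx <= 0:
        return 0.0
    return 100.0 * (tx - rx) / tx


def seq_newer(seq: int, last: int) -> bool:
    """True if seq is ahead of last, treating the 16-bit space as circular."""
    diff = (seq - last) % SEQ_MOD
    # Anything less than half the sequence space ahead counts as "newer".
    return 0 < diff < SEQ_MOD // 2


if __name__ == "__main__":
    # Counters for 192.168.5.104 taken from the table in this message.
    print(f"{loss_percent(5899087, 5721023):.2f}%")
    # Sequence 65535 was lost; 2 arriving after 65534 is still in order.
    print(seq_newer(2, 65534))
    # A late copy of 65530 arriving after 3 is genuinely out of order.
    print(seq_newer(65530, 3))
```

With a comparison like this, losing the last few packets before the wrap would not mark every later reply as out of order.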


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: help: 802.11s bad performance with 802.11n enabled
  2012-12-08  3:17   ` Thomas Pedersen
  2012-12-08  3:23     ` Georgiewskiy Yuriy
@ 2012-12-10 15:23     ` Chaoxing Lin
  1 sibling, 0 replies; 25+ messages in thread
From: Chaoxing Lin @ 2012-12-10 15:23 UTC (permalink / raw)
  To: Thomas Pedersen, Georgiewskiy Yuriy
  Cc: linux-wireless@vger.kernel.org, open11s

Thanks Thomas.

TP> With wireless-testing HEAD (671c924) I made the following observations with 3 nodes in a mesh using ch. 149 HT20 on AR9280.

3 nodes may not be enough to see the problem.

TP> 1. ping -i0.1 does not cause aggregation to take place, and losses are 0% 

My tool pings each node (7 nodes in total) at about 10~15 packets/second. Maybe in theory that should not cause aggregation.
The fact is that there is a huge difference in ping loss with and without aggregation.

If you are interested in my test tool, it's here.
http://sites.google.com/site/ebaylinkan5709pictures/files-to-share/clinmonitor.gz
It's an executable that runs on any 32-bit Linux.
I lost the source code for this tool and only found the executable on my old machine.


TP> 2. a UDP iperf test with two nodes generating traffic shows losses around 1%. We can observe aggregation taking place in this case.

For all the kernel versions I have tested, I did not see a problem with a two-node 802.11s network.
Before the stability test, I did fairly extensive two-node throughput tests and did not see any problem in overnight tests: 150 ~ 220 Mbps TCP throughput (varying with the Atheros 11n chipset).


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: help: 802.11s bad performance with 802.11n enabled
  2012-12-08  3:47           ` Georgiewskiy Yuriy
@ 2012-12-10 15:48             ` Chaoxing Lin
  2012-12-10 19:13               ` Thomas Pedersen
  0 siblings, 1 reply; 25+ messages in thread
From: Chaoxing Lin @ 2012-12-10 15:48 UTC (permalink / raw)
  To: Georgiewskiy Yuriy, Thomas Pedersen
  Cc: linux-wireless@vger.kernel.org, open11s

TP>
TP>Are you talking about a different bug?

GY> Hm, maybe, but according to Chaoxing Lin's emails there are several bugs which cause performance degradation in 802.11s mode, and the symptoms in my case are identical. I get the same results as Chaoxing Lin, and apparently the same troubles. I will run the tests you want anyway and report the results.

For easy reference, here is a summary of the 4 problems I have uncovered so far that contribute to the instability of the 7-node 802.11s network.

1. ath9k "Tx DMA error". Ping packet loss is seen each time the "Fail to stop Tx DMA" log is seen. It's NOT the main cause.

2. authsae or 802.11s kernel problem: the two ends of a peer link get out of sync for whatever reason. One end says the peer link is "ESTAB" and all 3 keys are in place, while the other end says this peer link is not "ESTAB" and no keys are installed for the peer.

3. The AES-CCM pairwise key sometimes reports packet replays, so ping packets are dropped. A kernel key dump in this error case is below. (I modified the key_key_read() function in debugfs_key.c to dump all info.)

        Key 362:
        0xcf393800 AES-CCM Key: 49305a736a8b6d5fcb34057ee6983d44   Pairwise
        Peer MAC: 00:0e:8e:38:36:03
        tx_pn: 000000000000009f

        rx_pn[ 0]: 0000000d788b  rx_pn[ 1]: 000000000000  rx_pn[ 2]: 000000000000  rx_pn[ 3]: 000000000000
        rx_pn[ 4]: 000000000000  rx_pn[ 5]: 000000000000  rx_pn[ 6]: 000000000000  rx_pn[ 7]: 000000000000
        rx_pn[ 8]: 000000000000  rx_pn[ 9]: 000000000000  rx_pn[10]: 000000000000  rx_pn[11]: 000000000000
        rx_pn[12]: 000000000000  rx_pn[13]: 000000000000  rx_pn[14]: 000000000000  rx_pn[15]: 000000000000
        rx_pn[16]: 000000003580

        replays: 11970 icverror:  <=======================problem here===========

What makes problems 2 and 3 worse is that, once the link gets into this state, the mpath still stays active. So all packets are still routed to the bad peer link/mpath and are dropped by the peer.

4. 802.11n packet aggregation. I believe this is the main problem, because disabling 802.11n packet aggregation in the ath9k driver makes the network stable and problems 2 and 3 are not seen. In other words, problems 2 and 3 may be caused by aggregation (my guess: aggregation causes a certain error condition that is not handled properly, which triggers problems 2 and 3).
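
For context on problem 3: CCMP replay protection keeps a last-accepted receive packet number (PN) per key and per TID (the rx_pn[] entries in the key dump) and drops any frame whose PN is not greater than that value, counting each drop as a replay. The sketch below models only that rule, to show how frames delivered out of PN order would be counted as replays. It is a toy model and not the mac80211 code; the class name `ReplayWindow` and its method are hypothetical, and whether reordering is what actually happens on this testbed remains an open question.

```python
class ReplayWindow:
    """Toy model of per-TID CCMP replay checking (hypothetical helper)."""

    def __init__(self, num_counters: int = 17):
        # One last-accepted PN per counter; the key dump shows rx_pn[0..16].
        self.rx_pn = [0] * num_counters
        self.replays = 0

    def accept(self, tid: int, pn: int) -> bool:
        """Accept a frame only if its PN is strictly greater than the last one."""
        if pn <= self.rx_pn[tid]:
            self.replays += 1
            return False
        self.rx_pn[tid] = pn
        return True


if __name__ == "__main__":
    win = ReplayWindow()
    # PNs 1..5 were sent in order, but frame 3 is delivered after frame 5.
    for pn in (1, 2, 4, 5, 3):
        print(pn, win.accept(tid=0, pn=pn))
    print("replays:", win.replays)
```

In this model a single late frame is enough to bump the replay counter, even though nothing was actually replayed.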

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: help: 802.11s bad performance with 802.11n enabled
  2012-12-10 15:48             ` Chaoxing Lin
@ 2012-12-10 19:13               ` Thomas Pedersen
  2012-12-10 19:28                 ` Chaoxing Lin
  2013-01-17 16:14                 ` Chaoxing Lin
  0 siblings, 2 replies; 25+ messages in thread
From: Thomas Pedersen @ 2012-12-10 19:13 UTC (permalink / raw)
  To: Chaoxing Lin; +Cc: Georgiewskiy Yuriy, linux-wireless@vger.kernel.org, open11s

On Mon, Dec 10, 2012 at 7:48 AM, Chaoxing Lin
<Chaoxing.Lin@ultra-3eti.com> wrote:
> TP>
> TP>Are you talking about a different bug?
>
> GY> Hm, maybe, but according to Chaoxing Lin's emails there are several bugs which cause performance degradation in 802.11s mode, and the symptoms in my case are identical. I get the same results as Chaoxing Lin, and apparently the same troubles. I will run the tests you want anyway and report the results.
>
> For easy reference, here is a summary of the 4 problems I have uncovered so far that contribute to the instability of the 7-node 802.11s network.
>
> 1. ath9k "Tx DMA error". Ping packet loss is seen each time the "Fail to stop Tx DMA" log is seen. It's NOT the main cause.
>
> 2. authsae or 802.11s kernel problem: the two ends of a peer link get out of sync for whatever reason. One end says the peer link is "ESTAB" and all 3 keys are in place, while the other end says this peer link is not "ESTAB" and no keys are installed for the peer.

We recently applied
https://github.com/cozybit/authsae/commit/0e5c65c3f773db820d6cee7b365cd4a70181c72d
which may fix your issue.

> 3. The AES-CCM pairwise key sometimes reports packet replays, so ping packets are dropped. A kernel key dump in this error case is below. (I modified the key_key_read() function in debugfs_key.c to dump all info.)
>
>         Key 362:
>         0xcf393800 AES-CCM Key: 49305a736a8b6d5fcb34057ee6983d44   Pairwise
>         Peer MAC: 00:0e:8e:38:36:03
>         tx_pn: 000000000000009f
>
>
>         rx_pn[ 0]: 0000000d788b  rx_pn[ 1]: 000000000000  rx_pn[ 2]: 000000000000
>         rx_pn[ 3]: 000000000000
>         rx_pn[ 4]: 000000000000  rx_pn[ 5]: 000000000000  rx_pn[ 6]: 000000000000
>         rx_pn[ 7]: 000000000000
>         rx_pn[ 8]: 000000000000  rx_pn[ 9]: 000000000000  rx_pn[10]: 000000000000
>         rx_pn[11]: 000000000000
>         rx_pn[12]: 000000000000  rx_pn[13]: 000000000000  rx_pn[14]: 000000000000
>         rx_pn[15]: 000000000000
>         rx_pn[16]: 000000003580
>
>         replays: 11970 icverror:  <=======================problem here===========
>
> What makes problems 2 and 3 worse is that, once the link gets into this state, the mpath still stays active. So all packets are still routed to the bad peer link/mpath and are dropped by the peer.

ok. Patches are welcome.

> 4. 802.11n packet aggregation. I believe this is the main problem, because disabling 802.11n packet aggregation in the ath9k driver makes the network stable and problems 2 and 3 are not seen. In other words, problems 2 and 3 may be caused by aggregation (my guess: aggregation causes a certain error condition that is not handled properly, which triggers problems 2 and 3).

And to reproduce, you run a simultaneous ping from one node to ~6
others? It will take me a few days to find time to reproduce this, so
any interesting observations you can offer in the meantime would be
helpful.

Thanks,
Thomas

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: help: 802.11s bad performance with 802.11n enabled
  2012-12-10 19:13               ` Thomas Pedersen
@ 2012-12-10 19:28                 ` Chaoxing Lin
  2013-01-17 16:14                 ` Chaoxing Lin
  1 sibling, 0 replies; 25+ messages in thread
From: Chaoxing Lin @ 2012-12-10 19:28 UTC (permalink / raw)
  To: Thomas Pedersen
  Cc: Georgiewskiy Yuriy, linux-wireless@vger.kernel.org, open11s


> 4. 802.11n packet aggregation. I believe this is the main problem,
> because disabling 802.11n packet aggregation in the ath9k driver
> makes the network stable and problems 2 and 3 are not seen. In
> other words, problems 2 and 3 may be caused by aggregation (my
> guess: aggregation causes a certain error condition that is not
> handled properly, which triggers problems 2 and 3).

TP> And to reproduce, you run a simultaneous ping from one node to ~6 others? It will take me a few days to find time to reproduce this, so any interesting observations you can offer in the meantime would be helpful.

Yes, I run simultaneous pings from one node to all 6 other nodes.

It should not take long to see: when 802.11n is enabled, ping loss shows up fairly fast (within a few minutes). Problems 2 and 3 are not that predictable, but once the network is in that state it stays stuck there, which gives me enough time to troubleshoot.

I posted the results of a test running for a few days just to show that disabling 802.11n really makes the network stable, rather than "stable by chance over a short period".

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: help: 802.11s bad performance with 802.11n enabled
  2012-12-10 19:13               ` Thomas Pedersen
  2012-12-10 19:28                 ` Chaoxing Lin
@ 2013-01-17 16:14                 ` Chaoxing Lin
  1 sibling, 0 replies; 25+ messages in thread
From: Chaoxing Lin @ 2013-01-17 16:14 UTC (permalink / raw)
  To: Thomas Pedersen
  Cc: Georgiewskiy Yuriy, linux-wireless@vger.kernel.org, open11s


TP> We recently applied
TP> https://github.com/cozybit/authsae/commit/0e5c65c3f773db820d6cee7b365cd4a70181c72d
TP> which may fix your issue.

All, I just found that the patch above introduces a segmentation fault.

Below is the patch content. Look at line 970: "cand->state" dereferences a NULL pointer, because the "if" statement guarantees that "cand" is NULL at that point.



 old  new
 967       -    if ((cand = find_peer(mgmt->sa, 0)) == NULL) {
 968       -    sae_debug(AMPE_DEBUG_FSM, "Mesh plink: plink open from unauthed peer\n");
      967  +    /* "1" here means only get peers in SAE_ACCEPTED */
      968  +    if ((cand = find_peer(mgmt->sa, 1)) == NULL) {
      969  +    sae_debug(AMPE_DEBUG_FSM, "Mesh plink: plink open from unauthed peer "MACSTR" state=%d\n",
      970  +                  MAC2STR(mgmt->sa), cand->state);
 969  971           return 0;
 970  972       }



^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2013-01-17 16:26 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-03 14:37 help: 802.11s bad performance with 802.11n enabled Chaoxing Lin
2012-12-03 14:45 ` Georgiewskiy Yuriy
2012-12-03 14:56   ` Chaoxing Lin
2012-12-03 15:43     ` Georgiewskiy Yuriy
2012-12-03 15:47       ` Chaoxing Lin
2012-12-03 18:21       ` Paul Stoaks
2012-12-03 18:33         ` Georgiewskiy Yuriy
2012-12-03 19:02         ` Chaoxing Lin
2012-12-08  3:17   ` Thomas Pedersen
2012-12-08  3:23     ` Georgiewskiy Yuriy
2012-12-08  3:29       ` Georgiewskiy Yuriy
2012-12-08  3:37         ` Thomas Pedersen
2012-12-08  3:47           ` Georgiewskiy Yuriy
2012-12-10 15:48             ` Chaoxing Lin
2012-12-10 19:13               ` Thomas Pedersen
2012-12-10 19:28                 ` Chaoxing Lin
2013-01-17 16:14                 ` Chaoxing Lin
2012-12-08  3:37       ` Thomas Pedersen
2012-12-10 15:23     ` Chaoxing Lin
2012-12-10 15:11   ` Chaoxing Lin
2012-12-04  4:35 ` Thomas Pedersen
2012-12-04  8:03   ` Adrian Chadd
  -- strict thread matches above, loose matches on Subject: below --
2012-12-03 16:33 Chaoxing Lin
2012-11-17  9:20 Yeoh Chun-Yeow
2012-11-16 17:41 Chaoxing Lin
