netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] firewire: net: rate-limit log spam at transmit failure
       [not found]     ` <tkrat.276aeae22ec60090@s5r6.in-berlin.de>
@ 2010-11-07  3:26       ` Maxim Levitsky
  2010-11-07  3:32         ` Maxim Levitsky
  2010-11-07 11:10         ` Stefan Richter
  0 siblings, 2 replies; 7+ messages in thread
From: Maxim Levitsky @ 2010-11-07  3:26 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux1394-devel, netdev@vger.kernel.org

On Sun, 2010-11-07 at 00:23 +0100, Stefan Richter wrote:
> On  6 Nov, Stefan Richter wrote:
> > Then I tried an XIO2213A card in the AMD PC (again the Intel PC as peer)
> > and got 243 times "failed: 12" i.e. RCODE_BUSY and 81 times "failed: 10"
> > i.e. RCODE_SEND_ERROR during ftp transfer of a >500 MB large file from
> > XIO2213A to FW323.
> 

I also am getting strange results (but very good compared to what I had
recently).

With all your patches, I get very stable TCP and UDP streams from laptop
to desktop at 180~190 Mbits/s.

However, the opposite direction (desktop->laptop) still suffers from
tlabel exhaustion.
I added some printks, and I see, clearly that netif_stop_queue doesn't
always work (probably this is intended?).

If I replace == with >= in inc_queue_packets and similar in
dec_queued_packets, then tlabel exhaustion disappears, and I get ~240
Mbit/s on TCP and UDP.

UDP transfers work quite well, tested for few minutes.
TCP transfers unfortunelly trigger (probably a hardware) bug in notebook
OHCI controller (I have seen that meny times so far.)

Transfer just stops, and controller goes south.
If I unload the firewire-ohci, then when I load it:

[ 2062.632532] firewire_ohci 0000:07:00.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[ 2072.650173] firewire_ohci: Failed to reset ohci card.
[ 2072.650267] firewire_ohci 0000:07:00.0: PCI INT A disabled
[ 2072.650314] firewire_ohci: probe of 0000:07:00.0 failed with error -16


Only suspend to ram helps bring it back from that state.

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] firewire: net: rate-limit log spam at transmit failure
  2010-11-07  3:26       ` [PATCH] firewire: net: rate-limit log spam at transmit failure Maxim Levitsky
@ 2010-11-07  3:32         ` Maxim Levitsky
  2010-11-07 12:03           ` Stefan Richter
  2010-11-07 11:10         ` Stefan Richter
  1 sibling, 1 reply; 7+ messages in thread
From: Maxim Levitsky @ 2010-11-07  3:32 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux1394-devel, netdev@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 1979 bytes --]

On Sun, 2010-11-07 at 05:26 +0200, Maxim Levitsky wrote:
> On Sun, 2010-11-07 at 00:23 +0100, Stefan Richter wrote:
> > On  6 Nov, Stefan Richter wrote:
> > > Then I tried an XIO2213A card in the AMD PC (again the Intel PC as peer)
> > > and got 243 times "failed: 12" i.e. RCODE_BUSY and 81 times "failed: 10"
> > > i.e. RCODE_SEND_ERROR during ftp transfer of a >500 MB large file from
> > > XIO2213A to FW323.
> > 
> 
> I also am getting strange results (but very good compared to what I had
> recently).
> 
> With all your patches, I get very stable TCP and UDP streams from laptop
> to desktop at 180~190 Mbits/s.
> 
> However, the opposite direction (desktop->laptop) still suffers from
> tlabel exhaustion.
> I added some printks, and I see, clearly that netif_stop_queue doesn't
> always work (probably this is intended?).
> 
> If I replace == with >= in inc_queue_packets and similar in
> dec_queued_packets, then tlabel exhaustion disappears, and I get ~240
> Mbit/s on TCP and UDP.
> 
> UDP transfers work quite well, tested for few minutes.
> TCP transfers unfortunelly trigger (probably a hardware) bug in notebook
> OHCI controller (I have seen that meny times so far.)
> 
> Transfer just stops, and controller goes south.
> If I unload the firewire-ohci, then when I load it:
> 
> [ 2062.632532] firewire_ohci 0000:07:00.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
> [ 2072.650173] firewire_ohci: Failed to reset ohci card.
> [ 2072.650267] firewire_ohci 0000:07:00.0: PCI INT A disabled
> [ 2072.650314] firewire_ohci: probe of 0000:07:00.0 failed with error -16
> 
> 
> Only suspend to ram helps bring it back from that state.
> 
> Best regards,
> 	Maxim Levitsky
> 

To be clear, I use patches I attached on top of today mainline.
I have here my own hack to set the transaction timeout, because
update_split_timeout isn't called here at all, because nobody writes
to corresponding CSR registers.
Maybe I miss something though.

Best regards,
	Maxim Levitsky


[-- Attachment #2: patches.tar.bz2 --]
[-- Type: application/x-bzip-compressed-tar, Size: 4498 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] firewire: net: rate-limit log spam at transmit failure
  2010-11-07  3:26       ` [PATCH] firewire: net: rate-limit log spam at transmit failure Maxim Levitsky
  2010-11-07  3:32         ` Maxim Levitsky
@ 2010-11-07 11:10         ` Stefan Richter
  2010-11-07 12:04           ` Maxim Levitsky
  1 sibling, 1 reply; 7+ messages in thread
From: Stefan Richter @ 2010-11-07 11:10 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: netdev@vger.kernel.org, linux1394-devel

Maxim Levitsky wrote:
> On Sun, 2010-11-07 at 00:23 +0100, Stefan Richter wrote:
>> On  6 Nov, Stefan Richter wrote:
>>> Then I tried an XIO2213A card in the AMD PC (again the Intel PC as peer)
>>> and got 243 times "failed: 12" i.e. RCODE_BUSY and 81 times "failed: 10"
>>> i.e. RCODE_SEND_ERROR during ftp transfer of a >500 MB large file from
>>> XIO2213A to FW323.
> 
> I also am getting strange results (but very good compared to what I had
> recently).
> 
> With all your patches, I get very stable TCP and UDP streams from laptop
> to desktop at 180~190 Mbits/s.
> 
> However, the opposite direction (desktop->laptop) still suffers from
> tlabel exhaustion.
> I added some printks, and I see, clearly that netif_stop_queue doesn't
> always work (probably this is intended?).
> 
> If I replace == with >= in inc_queue_packets and similar in
> dec_queued_packets, then tlabel exhaustion disappears, and I get ~240
> Mbit/s on TCP and UDP.

Remind me, is this FireWire 800?  And what controllers in particular?  I get
about half of your numbers with FireWire 400 connections.

The == vs. >= is a good hint.  If .ndo_start_xmit can be entered by multiple
CPUs, the upper limit will clearly exceeded eventually.

With >= instead of ==, the same test as that quoted above gives 71x RCODE_BUSY
+ 0x RCODE_SEND_ERROR, and 59x RCODE_BUSY + 0x RCODE_SEND_ERROR in a
repetition.  (0x + 0x in the other direction.)  There were no RCODE_CANCELLED
occurrences, which I had occasionally in the past.

I then tried

	if (dev->queued_packets >= FWNET_MAX_QUEUED_PACKETS)
		return NETDEV_TX_BUSY;

at the top of fwnet_tx but it did not change the amount of RCODE_BUSY, which
is not too surprising.  So next I should have a look at the responder side again.

BTW, FireWire 400 CardBus controllers usually feature a limitation of max_rec
= 1024 (maximum size of asynchronous packets they can receive).  Incidentally,
the VT6306 card that I used in my other tests from yesterday is one of those.
So, since link fragmentation is quite common due to this kind of cards, I
should perhaps count queued fragments instead of queued datagrams.

> UDP transfers work quite well, tested for few minutes.
> TCP transfers unfortunelly trigger (probably a hardware) bug in notebook
> OHCI controller (I have seen that meny times so far.)
> 
> Transfer just stops, and controller goes south.
> If I unload the firewire-ohci, then when I load it:
> 
> [ 2062.632532] firewire_ohci 0000:07:00.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
> [ 2072.650173] firewire_ohci: Failed to reset ohci card.
> [ 2072.650267] firewire_ohci 0000:07:00.0: PCI INT A disabled
> [ 2072.650314] firewire_ohci: probe of 0000:07:00.0 failed with error -16
> 
> 
> Only suspend to ram helps bring it back from that state.

On the bright side, s2ram fixes things for once instead of breaking them...
-- 
Stefan Richter
-=====-==-=- =-== --===
http://arcgraph.de/sr/

------------------------------------------------------------------------------
The Next 800 Companies to Lead America's Growth: New Video Whitepaper
David G. Thomson, author of the best-selling book "Blueprint to a 
Billion" shares his insights and actions to help propel your 
business during the next growth cycle. Listen Now!
http://p.sf.net/sfu/SAP-dev2dev

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] firewire: net: rate-limit log spam at transmit failure
  2010-11-07  3:32         ` Maxim Levitsky
@ 2010-11-07 12:03           ` Stefan Richter
  2010-11-08  1:41             ` Maxim Levitsky
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Richter @ 2010-11-07 12:03 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: linux1394-devel, netdev@vger.kernel.org

Maxim Levitsky wrote:
> I have here my own hack to set the transaction timeout,

You can use firecontrol to set it on-the-fly.  To set it on node ffc2 i.e. the
node with phy ID 2, and controller 1 (i.e. libraw1394 "port" 1):
# echo "w . 2 0xfffff0000018 4 3" | firecontrol 1

This sets the whole-seconds part to 3, which gives you 3.1 seconds timeout if
the fractional part at 0xfffff000001c is still at its default value of 0.1
seconds, i.e. 800 << 19 (subsecond part in units of (1/8000)s shifted by 19).

# { echo "r . 2 0xfffff0000018 4"; echo "r . 2 0xfffff000001c 4"; } |
firecontrol 1

displays the current register value on node ffc2 at port 1.  Type "help" in
firecontrol for more available firecontrol commands.
-- 
Stefan Richter
-=====-==-=- =-== --===
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] firewire: net: rate-limit log spam at transmit failure
  2010-11-07 11:10         ` Stefan Richter
@ 2010-11-07 12:04           ` Maxim Levitsky
  0 siblings, 0 replies; 7+ messages in thread
From: Maxim Levitsky @ 2010-11-07 12:04 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux1394-devel, netdev@vger.kernel.org

On Sun, 2010-11-07 at 12:10 +0100, Stefan Richter wrote:
> Maxim Levitsky wrote:
> > On Sun, 2010-11-07 at 00:23 +0100, Stefan Richter wrote:
> >> On  6 Nov, Stefan Richter wrote:
> >>> Then I tried an XIO2213A card in the AMD PC (again the Intel PC as peer)
> >>> and got 243 times "failed: 12" i.e. RCODE_BUSY and 81 times "failed: 10"
> >>> i.e. RCODE_SEND_ERROR during ftp transfer of a >500 MB large file from
> >>> XIO2213A to FW323.
> > 
> > I also am getting strange results (but very good compared to what I had
> > recently).
> > 
> > With all your patches, I get very stable TCP and UDP streams from laptop
> > to desktop at 180~190 Mbits/s.
> > 
> > However, the opposite direction (desktop->laptop) still suffers from
> > tlabel exhaustion.
> > I added some printks, and I see, clearly that netif_stop_queue doesn't
> > always work (probably this is intended?).
> > 
> > If I replace == with >= in inc_queue_packets and similar in
> > dec_queued_packets, then tlabel exhaustion disappears, and I get ~240
> > Mbit/s on TCP and UDP.
> 
> Remind me, is this FireWire 800?  And what controllers in particular?  I get
> about half of your numbers with FireWire 400 connections.
Firewire 400.

Laptop: Ricoh R5C832
Desktop: TI TSB43AB22A


> 
> The == vs. >= is a good hint.  If .ndo_start_xmit can be entered by multiple
> CPUs, the upper limit will clearly exceeded eventually.
> 
> With >= instead of ==, the same test as that quoted above gives 71x RCODE_BUSY
> + 0x RCODE_SEND_ERROR, and 59x RCODE_BUSY + 0x RCODE_SEND_ERROR in a
> repetition.  (0x + 0x in the other direction.)  There were no RCODE_CANCELLED
> occurrences, which I had occasionally in the past.
RCODE_CANCELLED is result of timeout timer which is set to run with 0
jiffies in the future.

> 
> I then tried
> 
> 	if (dev->queued_packets >= FWNET_MAX_QUEUED_PACKETS)
> 		return NETDEV_TX_BUSY;
> 
> at the top of fwnet_tx but it did not change the amount of RCODE_BUSY, which
> is not too surprising.  So next I should have a look at the responder side again.
I see RCODE_BUSY sometimes too.

> 
> BTW, FireWire 400 CardBus controllers usually feature a limitation of max_rec
> = 1024 (maximum size of asynchronous packets they can receive).  Incidentally,
> the VT6306 card that I used in my other tests from yesterday is one of those.
> So, since link fragmentation is quite common due to this kind of cards, I
> should perhaps count queued fragments instead of queued datagrams.
> 
> > UDP transfers work quite well, tested for few minutes.
> > TCP transfers unfortunelly trigger (probably a hardware) bug in notebook
> > OHCI controller (I have seen that meny times so far.)
> > 
> > Transfer just stops, and controller goes south.
> > If I unload the firewire-ohci, then when I load it:
> > 
> > [ 2062.632532] firewire_ohci 0000:07:00.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
> > [ 2072.650173] firewire_ohci: Failed to reset ohci card.
> > [ 2072.650267] firewire_ohci 0000:07:00.0: PCI INT A disabled
> > [ 2072.650314] firewire_ohci: probe of 0000:07:00.0 failed with error -16
> > 
> > 
> > Only suspend to ram helps bring it back from that state.
> 
> On the bright side, s2ram fixes things for once instead of breaking them...
Well, of course after s2ram I have to reload the firewire-ohci to make
it work, but that is separate issue.

Best regards,
	Maxim Levitsky





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] firewire: net: rate-limit log spam at transmit failure
  2010-11-07 12:03           ` Stefan Richter
@ 2010-11-08  1:41             ` Maxim Levitsky
  2010-11-08  8:12               ` Stefan Richter
  0 siblings, 1 reply; 7+ messages in thread
From: Maxim Levitsky @ 2010-11-08  1:41 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux1394-devel, netdev@vger.kernel.org

On Sun, 2010-11-07 at 13:03 +0100, Stefan Richter wrote:
> Maxim Levitsky wrote:
> > I have here my own hack to set the transaction timeout,
> 
> You can use firecontrol to set it on-the-fly.  To set it on node ffc2 i.e. the
> node with phy ID 2, and controller 1 (i.e. libraw1394 "port" 1):
> # echo "w . 2 0xfffff0000018 4 3" | firecontrol 1
> 
> This sets the whole-seconds part to 3, which gives you 3.1 seconds timeout if
> the fractional part at 0xfffff000001c is still at its default value of 0.1
> seconds, i.e. 800 << 19 (subsecond part in units of (1/8000)s shifted by 19).
> 
> # { echo "r . 2 0xfffff0000018 4"; echo "r . 2 0xfffff000001c 4"; } |
> firecontrol 1
> 
> displays the current register value on node ffc2 at port 1.  Type "help" in
> firecontrol for more available firecontrol commands.
Agreed.

But why the timeout is  never set?
What is the default?

I think that timeout_jiffies is never initialized, thus, it is 0 due to
kzalloc.

Also, note that I see here that if I send a TCP stream from one system
to another then the system that recieves the packets (and sends TCP
acks), still overflows the queue (error 10, and confirmed by printks).

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] firewire: net: rate-limit log spam at transmit failure
  2010-11-08  1:41             ` Maxim Levitsky
@ 2010-11-08  8:12               ` Stefan Richter
  0 siblings, 0 replies; 7+ messages in thread
From: Stefan Richter @ 2010-11-08  8:12 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: linux1394-devel, netdev

Maxim Levitsky wrote:
> But why the timeout is  never set?

It is set to the default 0.1s per IEEE 1394 in core-card.c::fw_card_initialize.

If card->split_timeout_jiffies or card->split_timeout_cycles /ever/ become
zero, then only due to a memory corrupting bug.

> Also, note that I see here that if I send a TCP stream from one system
> to another then the system that recieves the packets (and sends TCP
> acks), still overflows the queue (error 10, and confirmed by printks).

OK, I'll send a stricter version of "firewire: net: throttle TX queue before
running out of tlabels".
-- 
Stefan Richter
-=====-==-=- =-== -=---
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-11-08  8:12 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <tkrat.01ca17fba0508ae0@s5r6.in-berlin.de>
     [not found] ` <tkrat.1b9925fa1d199c23@s5r6.in-berlin.de>
     [not found]   ` <tkrat.18b9f67ac78dcbea@s5r6.in-berlin.de>
     [not found]     ` <tkrat.276aeae22ec60090@s5r6.in-berlin.de>
2010-11-07  3:26       ` [PATCH] firewire: net: rate-limit log spam at transmit failure Maxim Levitsky
2010-11-07  3:32         ` Maxim Levitsky
2010-11-07 12:03           ` Stefan Richter
2010-11-08  1:41             ` Maxim Levitsky
2010-11-08  8:12               ` Stefan Richter
2010-11-07 11:10         ` Stefan Richter
2010-11-07 12:04           ` Maxim Levitsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).