From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefan Richter
Subject: Re: [PATCH] firewire: net: rate-limit log spam at transmit failure
Date: Sun, 07 Nov 2010 12:10:29 +0100
Message-ID: <4CD68925.8080302@s5r6.in-berlin.de>
References: <1289100404.3277.28.camel@maxim-laptop>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Cc: netdev@vger.kernel.org, linux1394-devel@lists.sourceforge.net
To: Maxim Levitsky
In-Reply-To: <1289100404.3277.28.camel@maxim-laptop>
Errors-To: linux1394-devel-bounces@lists.sourceforge.net
List-Id: netdev.vger.kernel.org

Maxim Levitsky wrote:
> On Sun, 2010-11-07 at 00:23 +0100, Stefan Richter wrote:
>> On 6 Nov, Stefan Richter wrote:
>>> Then I tried an XIO2213A card in the AMD PC (again the Intel PC as peer)
>>> and got 243 times "failed: 12" i.e. RCODE_BUSY and 81 times "failed: 10"
>>> i.e. RCODE_SEND_ERROR during ftp transfer of a >500 MB large file from
>>> XIO2213A to FW323.
>
> I also am getting strange results (but very good compared to what I had
> recently).
>
> With all your patches, I get very stable TCP and UDP streams from laptop
> to desktop at 180~190 Mbits/s.
>
> However, the opposite direction (desktop->laptop) still suffers from
> tlabel exhaustion.
> I added some printks, and I see, clearly that netif_stop_queue doesn't
> always work (probably this is intended?).
>
> If I replace == with >= in inc_queue_packets and similar in
> dec_queued_packets, then tlabel exhaustion disappears, and I get ~240
> Mbit/s on TCP and UDP.

Remind me, is this FireWire 800?  And what controllers in particular?
I get about half of your numbers with FireWire 400 connections.

The == vs. >= is a good hint.  If .ndo_start_xmit can be entered by
multiple CPUs, the upper limit will clearly be exceeded eventually.
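
To make the == vs. >= point concrete, here is a minimal user-space sketch.
It is not the firewire-net code: the struct, the limit of 4, and all
function names are made up for illustration, and the "stopped" flag merely
stands in for what netif_stop_queue would set.  It assumes that, by
whatever interleaving, the counter is already at the limit while senders
still get through, and then shows that an equality test can no longer
fire once the counter has passed the limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limit; stands in for FWNET_MAX_QUEUED_PACKETS. */
#define MAX_QUEUED 4

/* Hypothetical model of the per-device transmit state. */
struct txq {
	int queued;   /* packets currently in flight */
	bool stopped; /* models the effect of netif_stop_queue() */
};

/* Equality check: fires only when the counter lands exactly on the limit. */
static void inc_queued_eq(struct txq *q)
{
	if (++q->queued == MAX_QUEUED)
		q->stopped = true;
}

/* Threshold check: fires at the limit and on every value above it. */
static void inc_queued_ge(struct txq *q)
{
	if (++q->queued >= MAX_QUEUED)
		q->stopped = true;
}

/*
 * Assumption for illustration: the counter is already at the limit but
 * the queue is not (or no longer) marked stopped, e.g. because another
 * sender slipped in.  Enqueue 'extra' more packets and report whether
 * the queue ends up stopped.
 */
static bool overshoot_stops_queue(bool use_ge, int extra)
{
	struct txq q = { .queued = MAX_QUEUED, .stopped = false };

	assert(extra >= 0);
	for (int i = 0; i < extra; i++) {
		if (use_ge)
			inc_queued_ge(&q);
		else
			inc_queued_eq(&q);
	}
	return q.stopped;
}
```

With three extra packets, overshoot_stops_queue(false, 3) returns false
(the counter runs 5, 6, 7 and never equals 4 again), whereas
overshoot_stops_queue(true, 3) returns true.  How the overshoot comes
about in the real driver is left open here.
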
With >= instead of ==, the same test as that quoted above gives 71x
RCODE_BUSY + 0x RCODE_SEND_ERROR, and 59x RCODE_BUSY + 0x
RCODE_SEND_ERROR in a repetition.  (0x + 0x in the other direction.)
There were no RCODE_CANCELLED occurrences, which I had occasionally in
the past.

I then tried

	if (dev->queued_packets >= FWNET_MAX_QUEUED_PACKETS)
		return NETDEV_TX_BUSY;

at the top of fwnet_tx but it did not change the number of RCODE_BUSY
occurrences, which is not too surprising.  So next I should have a look
at the responder side again.

BTW, FireWire 400 CardBus controllers usually feature a limitation of
max_rec = 1024 (maximum size of asynchronous packets they can receive).
Incidentally, the VT6306 card that I used in my other tests from
yesterday is one of those.  So, since link fragmentation is quite common
due to this kind of card, I should perhaps count queued fragments
instead of queued datagrams.

> UDP transfers work quite well, tested for few minutes.
> TCP transfers unfortunately trigger (probably a hardware) bug in notebook
> OHCI controller (I have seen that many times so far.)
>
> Transfer just stops, and controller goes south.
> If I unload the firewire-ohci, then when I load it:
>
> [ 2062.632532] firewire_ohci 0000:07:00.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
> [ 2072.650173] firewire_ohci: Failed to reset ohci card.
> [ 2072.650267] firewire_ohci 0000:07:00.0: PCI INT A disabled
> [ 2072.650314] firewire_ohci: probe of 0000:07:00.0 failed with error -16
>
>
> Only suspend to ram helps bring it back from that state.

On the bright side, s2ram fixes things for once instead of breaking
them...
-- 
Stefan Richter
-=====-==-=- =-== --===
http://arcgraph.de/sr/