From mboxrd@z Thu Jan 1 00:00:00 1970 From: David Jander Subject: Re: [RESEND] [PATCH] net: CAN: at91_can.c: decrease likelyhood of RX overruns Date: Mon, 6 Oct 2014 13:39:42 +0200 Message-ID: <20141006133942.0f1c820b@archvile> References: <1403775686-19352-1-git-send-email-david@protonic.nl> <5282990.80afdvE4aW@ws-stein> <20141006112644.672440b2@archvile> <2417882.RxeThGvgsF@ws-stein> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Return-path: Received: from protonic.xs4all.nl ([83.163.252.89]:28864 "EHLO protonic.xs4all.nl" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751316AbaJFLje (ORCPT ); Mon, 6 Oct 2014 07:39:34 -0400 In-Reply-To: <2417882.RxeThGvgsF@ws-stein> Sender: linux-can-owner@vger.kernel.org List-ID: To: Alexander Stein Cc: Marc Kleine-Budde , linux-can@vger.kernel.org, Wolfgang Grandegger , Oliver Hartkopp , "Hans J. Koch" On Mon, 06 Oct 2014 13:21:22 +0200 Alexander Stein wrote: > On Monday 06 October 2014 11:26:44, David Jander wrote: > > Alexander Stein wrote: > > > > > Hello David, > > > > > > On Friday 03 October 2014 11:01:41, David Jander wrote: > > > > On Thu, 02 Oct 2014 14:41:25 +0200 > > > > Alexander Stein wrote: > > > > > > > > > finally I got the chance to test your patch. I originally expected to > > > > > test it on a AT91SAM9263, but I did it now on a AT91SAM9X35. The > > > > > tests were done on a v3.17-rc7 kernel + a DT patch. If I only run my > > > > > CAN burst test without any other load on the ARM everything works > > > > > fine, on the unpatched kernel, with your patch and also with rx-fifo > > > > > branch of https://gitorious.org/linux-can/linux-can-next. When > > > > > running an iperf (client on PC) in parallel, the situation is as > > > > > follows: unpatched kernel: driver hangs after ~15s. No messages are > > > > > received again while the kernel is still running. your patch: 37346 > > > > > of 500000 msg lost rx-fifo: 36806 of 500000 msg lost > > > > > > > > Thanks a lot for taking the time to look at this. > > > > I just looked at the rx-fifo patch, but I still don't understand how > > > > it is supposed to improve the situation of this driver.... beats me. > > > > Nevertheless you just proved that it is at least as good as my patch. > > > > AFAIK, there is nothing that should work as well as off-loading the CAN > > > > controller in the IRQ handler by a far margin. But the rx-fifo patch > > > > does not do that, so it is hard for me to believe it is really that > > > > good. Could you repeat your test at a lower bitrate? The only thing I > > > > can think of is that 37000 out of 500000 messages the latency has > > > > spiked on your system, but that spike should be a lot more contained > > > > with my patch than with rx-fifo, so if I'm right, then lowering the > > > > bitrate we might see a situation in which rx-fifo still loses a > > > > message here and there, while my patch doesn't. Other than that, I am > > > > tempted to think my patch is simply broken. > > > > > > Ok, here is another test run (including iperf) at 250kBit/s. Did all > > > tests 3 times. plain: 0, 2, lockup > > > your patch: 0, 0, 0 > > > rx-fifo: 26, 0, 43 > > > > Ok, this confirms what I suspected... latency-peaks are more contained when > > emptying the CAN controller in the interrupt handler. > > > > > When the plain driver lockups I see those kernel messages: > > > at91_can f8004000.can can0: order of incoming frames cannot be guaranteed > > > > > > And the same with 500kBit/s: > > > plain: 0, 0, lockup > > > your patch: 0, 0, 0 > > > rx-fifo: 0, 0, 0 > > > > This is weird. Either you were lucky, your embedded devices aren't able to > > send back-to-back at that rate specifically, or the situation regarding > > load and latency spikes changed somehow. The results don't make sense to > > me. > > Well, I guess this will change if I would run more than 3 times, but as > overruns already occured at 250kBit/s there _is_ still a problem in rx-fifo, > independently from 1MBit/s drops due to heavy load. Well, I now think that rx-fifo was never intended to improve the driver performance (correct me if I'm wrong, Marc), but only to build a common subsystem around the same concept that seems to be re-invented in at91_can, flexcan and ti_hecc. It does fix the lockup-bug in at91_can though. > > One interesting control-metric would be to monitor the amount of > > messages/second your test-devices are able to generate. > > I just noticed that this testing hardware has a DDR2 with only 16bit > interface. I think this will also reduce performance considerably. The 16-bit DDR2 (even at its lowest clock of 200MHz) is not exactly slow... why do you think that could be a bottleneck? > embedded device send ~1000 CAN frames/s, each which is an average busload of > 20%, but in burst time, it should be 100%. You mean the bursts of 250 messages you talked about earlier should produce a bus load of 100%? I think it is important to get some certainty about what's really going on on the bus, specially if we see things we cannot explain. You don't have access to a PC with a CAN interface or an oscilloscope, do you? Best regards, -- David Jander Protonic Holland.