From: David Jander
Subject: Re: [PATCH v5] can: flexcan: Re-write receive path to use MB queue instead of FIFO
Date: Tue, 30 Sep 2014 09:13:55 +0200
Message-ID: <20140930091355.770fac72@archvile>
References: <1411995175-13540-1-git-send-email-david@protonic.nl>
 <8124948.gcTnPkg5PL@ws-stein> <20140929163932.055fae8f@archvile>
 <5017123.OgYMn6dde4@ws-stein>
In-Reply-To: <5017123.OgYMn6dde4@ws-stein>
To: Alexander Stein
Cc: Marc Kleine-Budde, Wolfgang Grandegger, linux-can@vger.kernel.org

Dear Alexander,

On Mon, 29 Sep 2014 17:02:39 +0200
Alexander Stein wrote:

> On Monday 29 September 2014 16:39:32, David Jander wrote:
> >
> > Dear Alexander,
> >
> > On Mon, 29 Sep 2014 15:29:28 +0200
> > Alexander Stein wrote:
> >
> > > On Monday 29 September 2014 14:52:55, David Jander wrote:
> > > > The FlexCAN controller has an RX FIFO that is only 6 messages deep, and
> > > > a mailbox space capable of holding up to 63 messages.
> > > >
> > > > This space was largely unused, limiting the permissible latency from
> > > > interrupt to NAPI to only 6 messages. This patch uses all available MBs
> > > > for message reception and frees the MBs in the IRQ handler to greatly
> > > > decrease the likelihood of receive overruns.
> > > >
> > > > Signed-off-by: David Jander
> > >
> > > AFAICT, if you disable Rx FIFO mode, you essentially break RTR reception
> > > on (at least) i.MX3. Please refer to the reference manual, 24.4.8.1
> > > Remote Frames. Vybrid and i.MX6 (not sure about i.MX5) seem to have more
> > > features for RTR reception.
> >
> > Argh! Looks like you are right!
> > RTR reception did not work for i.MX6 either, but that is because I forgot
> > to set the RRS bit in CTRL2... which does not exist on either i.MX53 or
> > i.MX35. What's strange is that the i.MX53 RM does not contain the chapter
> > you mention (it is contained in the i.MX35 RM though), and this is the
> > only place that clearly seems to indicate that this indeed will not work
> > on the i.MX3:
> >
> > "A received remote request frame is not stored in a receive buffer. It is
> > only used to trigger a transmission of a frame in response."
>
> Yep, it seems that the FlexCAN part of the RM is even shorter in i.MX5
> than in i.MX3 or i.MX6...
>
> > AFAICS, we have little choice but to use the Rx FIFO, at least for i.MX3/5
> > or older IPs...
> >
> > Maybe I can refactor the code in such a way that the same construction is
> > used outside the IRQ context, but the IRQ routine will either empty the
> > FIFO (for revision 3 and older FlexCAN) or the whole MB area (for revision
> > 10 and newer).
> >
> > The BIG drawback of using the RX FIFO is that it is really tiny. Not using
> > it is a really big win for i.MX6 and newer... which I'd like to keep.
>
> I don't know how the MBs actually work, but I know about race conditions in
> C_CAN (actually pch_can) with the pseudo FIFO implemented using message
> boxes. Could this also happen here? That's a reason I'm really happy there
> is a real FIFO in hardware.

I can think of a few possible race conditions that can happen when doing this,
but AFAIK I have them all covered in this patch. I have done quite a lot of
testing of message ordering and message loss, and it seems very robust.
If you read the comment for the function flexcan_copy_rxmbs(), you can see
that there is a condition that can produce out-of-order messages, but that
only happens if interrupt latency goes beyond 30 messages... and you get a
nice warning in the kernel message log.
When I first looked at the FlexCAN peripheral I also thought: "Cool, at last a
CAN controller with a real FIFO", but unfortunately that FIFO is only 6
messages deep... compared to an MB area of a whopping 64 MBs that goes almost
completely unused!

> > Nevertheless, emptying the FIFO in the IRQ handler will still be a big
> > improvement, since the only thing that could still kill the driver and
> > cause message loss is interrupt latency, which normally should not be so
> > high. NAPI scheduling latency is probably much worse, and this is the
> > biggest issue with the current driver.
> >
> > Any suggestion on what to do?
>
> Get rid of NAPI and use RT-preempt with proper priorities :) But joke aside,
> which workload increases the NAPI latency so much that an overrun occurs? I
> tested CAN bursts on i.MX35 without any loss.

I have seen overruns on an i.MX6 at only 250 kbaud, receiving back-to-back
messages with a 1-byte payload. I usually test bursts of 10000 messages or
more. Things get a lot worse if you also happen to have kernel messages output
to a serial console and plug in a USB device (because there are printk()s in
the EHCI driver inside spinlocks with interrupts disabled!), but that's a
different story.

6 messages at 250 kbaud give a latency budget of a little over 1 ms in the
worst case (back-to-back short frames), and at 1 Mbaud just a few hundred
microseconds. I am not an expert in Linux schedulers, but IMHO a non-RT kernel
cannot be expected to meet latency bounds this tight. Maybe if the controller
is off-loaded in the IRQ handler and interrupt priorities are well adjusted,
6 messages can be feasible at 250 kbaud, but I wouldn't dare to go beyond
that.

Best regards,

-- 
David Jander
Protonic Holland.