Re: [PATCH v5] can: flexcan: Re-write receive path to use MB queue instead of FIFO

linux-can.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: David Jander <david@protonic.nl>
To: David Jander <david@protonic.nl>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>,
	Alexander Stein <alexander.stein@systec-electronic.com>,
	Wolfgang Grandegger <wg@grandegger.com>,
	linux-can@vger.kernel.org
Subject: Re: [PATCH v5] can: flexcan: Re-write receive path to use MB queue instead of FIFO
Date: Thu, 9 Oct 2014 12:37:25 +0200	[thread overview]
Message-ID: <20141009123725.391b2db9@archvile> (raw)
In-Reply-To: <20141008160113.10c2e887@archvile>


Hi Marc,

On Wed, 8 Oct 2014 16:01:13 +0200
David Jander <david@protonic.nl> wrote:

> On Wed, 08 Oct 2014 11:56:04 +0200
> Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> 
> > On 10/08/2014 11:08 AM, David Jander wrote:
> > > 
> > > Dear Marc,
> > > 
> > > On Mon, 06 Oct 2014 12:00:03 +0200
> > > Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> > > 
> > >> On 10/06/2014 09:28 AM, David Jander wrote:
> > >>>>> 2.- Since the problem addressed by my patch to at91_can is very
> > >>>>> similar, what about solving these problems in the SocketCAN framework
> > >>>>> (if that is possible)?
> > >>>>
> > >>>> Have you had a look at my rx-fifo branch in
> > >>>> https://gitorious.org/linux-can/linux-can-next? It already tries to
> > >>>> abstract the simulation of the FIFO with the linear mailboxes.
> > >>>
> > >>> Looks interesting. I think it is a good idea to do this in dev.c, since
> > >>> there are obviously more CAN drivers that can use this. Unfortunately
> > >>> it seems you are still pretending the napi-poll handler to call
> > >>> can_rx_fifo_poll(). Wouldn't it be better to just empty all MBs into a
> > >>> circular buffer or kfifo from the interrupt handler instead?
> > >>
> > >> Yes probably, I started the rx-fifo patch before you came up with that
> > >> idea.
> > >>
> > >>> I still don't understand the results Alexander is getting, though....
> > >>>
> > >>> What are you going to do with the rx-fifo work? Do you recommend to
> > >>> base my patch on that? In that case, calling can_rx_fifo_poll() from
> > >>> the interrupt handler will look a little awkward... but it should
> > >>> work. Or should I propose an extension to rx-fifo?
> > >>
> > >> My plans, or rather the points that need to be addressed for the rx-fifo
> > >> are:
> > >> - improve to work with more than 32 mailboxes. 64 are probably enough
> > >>   for everybody :)
> > >> - make it work with the flexcan linear buffers
> > >> - make it work with the ti_hecc driver
> > >> - add option or convert to run from interrupt handler and copy to
> > >>   kfifo/cyclic buffer/...
> > > 
> > > Can you lend you brain for a sec on this problem... I think I discovered
> > > a race condition in your rx-fifo code, but it might just be me not
> > > understanding it correctly....
> > > 
> > > 	do {
> > > 		pending = fifo->read_pending(fifo);
> > > 		pending &= fifo->active;
> > > 
> > > 		if (!(pending & BIT_ULL(fifo->next))) {
> > > 			/*
> > > 			 * Wrap around only if:
> > > 			 * - we are in the upper group and
> > > 			 * - there is a CAN frame in the first mailbox
> > > 			 *   of the lower group.
> > > 			 */
> > > 			if (can_rx_fifo_ge(fifo, fifo->next,
> > > fifo->high_first) && (pending & BIT_ULL(fifo->low_first))) {
> > > 				fifo->next = fifo->low_first;
> > > 				fifo->active |= fifo->mask_high;
> > > 				fifo->mailbox_enable_mask(fifo,
> > > fifo->mask_high); } else {
> > > 				break;
> > > 			}
> > > 		}
> > > 
> > > In this piece of code, suppose that fifo->next is in the upper half and a
> > > message is being written to the MB it is pointing at, but it is still not
> > > pending. The lower half has already been enabled and filled up completely
> > > (due to latency). In that case, fifo-next will jump back to
> > > fifo->low_first and leave a lonely filled MB in the middle of the upper
> > > half, that will trigger and infinite loop between IRQ and
> > > can_rx_fifo_poll(). The interrupt will never get cleared again.
> > > I know this is an extreme case of latency being so high as to fill more
> > > than the complete lower half, but if it strikes, it results in a lock-up.
> > > Am I right, or did I screw up somewhere?
> > 
> > Correct analysis :( At least it shows it makes sense to have this code
> > in a central place.....
> 
> I placed a printk() in the interrupt handler for debugging, and that made
> this issue quite likely to hit.
> 
> > If we handle the low_first mailbox, we might have to check if the "old"
> > next, or better, if any of the mailboxes >= old_next are pending *and*
> > there is a non pending mailbox < "old" next. This should be possible
> > with one or two clever bitmasks.
> 
> I've tried to do that, but if this situation happens, then one should first
> read the lower MBs anyway and then continue with the "old" next MB and
> re-enable them immediately....
> Problem is, I can't seem to fit this logic nicely into the current do-while
> loop anymore.
> Part of the problem could be solved if we could dismiss the requirement of
> the "quota" parameter, i.e. support only the "empty from interrupt into
> circular buffer"-scenario. Then the whole loop could get a lot simpler. Any
> reason not to do this?
> 
> Besides, what about simplifying the logic just a little bit like what I did
> in the original flexcan patch (mildly reworded):
> 
> " We split the MB area into two halves:
> The lower half and the upper half.
> We start with all MBs in EMPTY state.
> The CAN controller will start filling from the lowest MB. Once this function
> is called, we copy and disable in an atomic operation all FULL MBs.
> The first EMPTY one we encounter may be about to get filled so we stop there.
> Next time the CAN controller will have filled more MBs. Once there are no
> EMPTY MBs in the lower half, we EMPTY all FULL or locked MBs again.
> Next time we have the following situation:
> If there is a FULL MB in the upper half, it is because it was about to get
> filled when we scanned last time, or got filled just before emptying the
> lowest MB, so this will be the first MB we need to copy now. If there are
> no EMPTY MBs in the lower half at this time, it means we cannot guarantee
> ordering anymore. It also means latency exceeded the maximum amount of
> messages this controller can safely handle."
> 
> This small change avoids the pitfall we have now, because we always enable
> all MBs once we crossed the split limit. To ensure ordering we only need to
> check whether there are pending messages in the upper half before reading
> the lower half, if previously we had crossed the split.

Ok, I have just re-implemented rx-fifo this way, and after some testing with
a modified flexcan driver I'm reasonably confident that it works correctly at
least for the flexcan case. See the V2 patches that just hit the list.

Unfortunately there are a few assumptions made by the API that I haven't found
a nice way to make otherwise fool proof or self-explanatory:

1.- fifo->mailbox_move_to_buffer() must atomically disable a mailbox before or
after reading it. I guess for at91_can it needs to be disabled before reading,
while for flexcan it uses its hardware MB locking feature.

2.- fifo->mailbox_move_to_buffer() should only read out valid full MBs and
ignore empty ones, and should reflect that in the return value. This is
necessary for the algorithm to work correcly with flexcan (due to the MB
locking thing), but should also be implementable in other CAN controllers

3.- fifo->mailbox_enable_mask() and fifo->mailbox_enable() should only enable
mailboxes that are actually disabled to avoid race conditions.

> This means also, we can tweak performance by making an un-even split. The MBs
> below the split are our "latency-buffer", while the MBs above the split are
> only needed to handle incoming messages while we are busy in the read-out
> loop. This part can theoretically be much smaller.

This is BS, please ignore.

> The old at91_can driver intended to work more or less like this also, so I
> think this will work at least for both drivers... not sure about ti_hecc
> though.

Best regards,

-- 
David Jander
Protonic Holland.

next prev parent reply	other threads:[~2014-10-09 10:37 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-29 12:52 [PATCH v5] can: flexcan: Re-write receive path to use MB queue instead of FIFO David Jander
2014-09-29 13:29 ` Alexander Stein
2014-09-29 14:39   ` David Jander
2014-09-29 15:02     ` Alexander Stein
2014-09-30  7:13       ` David Jander
2014-09-30  7:43         ` Alexander Stein
2014-10-01  6:29           ` David Jander
2014-10-01  7:11             ` Alexander Stein
2014-10-01  7:15               ` Marc Kleine-Budde
2014-10-01  8:29                 ` Alexander Stein
2014-10-01  9:07                   ` David Jander
2014-10-01  9:19                     ` Alexander Stein
2014-10-01  9:34                       ` David Jander
2014-10-01  9:58                         ` Marc Kleine-Budde
2014-10-06  7:28                           ` David Jander
2014-10-06 10:00                             ` Marc Kleine-Budde
2014-10-06 11:17                               ` David Jander
2014-10-07  9:30                                 ` [RFC PATCH 1/2] can: rx-fifo: Increase MB size limit from 32 to 64 David Jander
2014-10-07  9:30                                   ` [RFC PATCH 2/2] can: rx-fifo: Add support for IRQ readout and NAPI poll David Jander
2014-10-07 13:17                                   ` [RFC PATCH 1/2] can: rx-fifo: Increase MB size limit from 32 to 64 Marc Kleine-Budde
2014-10-07 13:27                                     ` David Jander
2014-10-07 14:18                                       ` Marc Kleine-Budde
2014-10-08  9:08                               ` [PATCH v5] can: flexcan: Re-write receive path to use MB queue instead of FIFO David Jander
2014-10-08  9:56                                 ` Marc Kleine-Budde
2014-10-08 10:36                                   ` Alexander Stein
2014-10-08 10:43                                     ` Marc Kleine-Budde
2014-10-08 14:01                                   ` David Jander
2014-10-09 10:37                                     ` David Jander [this message]
2014-10-01  9:19               ` David Jander

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141009123725.391b2db9@archvile \
    --to=david@protonic.nl \
    --cc=alexander.stein@systec-electronic.com \
    --cc=linux-can@vger.kernel.org \
    --cc=mkl@pengutronix.de \
    --cc=wg@grandegger.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).