linux-can.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Wolfgang Grandegger <wg@grandegger.com>
To: Christian Bendele <Christian.Bendele@gmx.net>
Cc: linux-can@vger.kernel.org
Subject: Re: pch_can probable bug
Date: Tue, 11 Dec 2012 23:05:56 +0100	[thread overview]
Message-ID: <50C7AE44.4040603@grandegger.com> (raw)
In-Reply-To: <loom.20121211T155025-697@post.gmane.org>

On 12/11/2012 04:43 PM, Christian Bendele wrote:
> Hi all,
> 
> I believe I have found a bug in the pch_can driver. It _might_ also be 
> related to what was discussed recently in the thread at
> http://thread.gmane.org/gmane.linux.can/2483/focus=2573
> however, a problem with transmission was reported there and this bug is not
> related to transmission but to receiving.
> 
> This is what I observed:
> 
> If very high load on the can bus coincides with very high load on the pc,
> so that the 26 message fifo buffer configured in the can controller 
> actually runs full, then receiving stops completely and the pc gets
> very sluggish.

OK. Two other people are reporting somehow different problems with tx
but also rx.

> This is what I think is going on:
> 
> According to Sections 13.5.9 and 13.5.9.1 of the Intel PCH EG20T
> Datasheet setting up a FIFO buffer on the pch can works like this:
> 
> "...The EOB bit of all message objects of a FIFO buffer except the last have
> to be programmed to 0. The EOB bits of the last message object of a FIFO
> buffer is set to 1, configuring it as the End of the Block...."
> "When a message is stored into a message object [...] the NEWDAT bit of
> that object is set. [...]the message object is locked for further write
> accesses [...] until the CPU has written the NEWDAT bit back to 0."
> "Messaged are stored into a FIFO buffer until the last message object [...]
> is reached. If none of the preceding objects is released by writing NEWDAT
> to 0, all further messages [...] are written into the last message object
> [...] and therefore overwrite previous messages."

Yes, note that the PCH does use a C_CAN core (even the manual is identical).

> 
> The pch_c_can defines a 26 messages deep FIFO buffer (from message object 1 
> to message object 26) for receiving. 

Please do not invest time fixing the pch_can driver. We want to get rid
of it asap.

> However, In the pch_can_rx_normal() method of the pch_can.c driver, right at
> the top of the main loop in which pending messages are read from the
> controller, the loop is aborted if the EOB bit is send in the message object
> currently considered:
> 
> 		if (reg & PCH_IF_MCONT_EOB)
> 			break;
> 
> That means that the message(s) stored in this message object are never read
> out, and the INTPND (interrupt pending, is set together with NEWDAT when a
> message is stored into the object) bit in that message object's control
> register is never cleared.

Looking to c_can.c, I see this problem as well. The above check should
be done after reading out the message.

> Thus a kind of "endless loop" happens where the same message object (26
> = PCH_RX_OBJ_END) is considered again and again but never handled, until the
> napi quota is consumed, and then immediately the device interrupt happens
> again because the INTPND bit in the message object was never cleared.
> The system is completely swamped.
> 
> Due to the way the FIFO buffer is handled by the driver (kind of like a 16
> deep fifo with a 10 deep overflow, it doesn't matter, though), that
> message object is only ever used if the FIFO actually runs completely
> full. Thus the problem only manifests if very high loads on the canbus and
> on the host happen at the same time. If that never happens the system will
> work fine for a long time.
> 
> I'm still trying to fix this problem here, but this is the first time I
> actually did anything using the napi and the whole driver looks slightly
> confusing to me (half the time I don't understand why it does things the
> way it does them). Just moving down the offending break towards the bottom
> of the loop unfortunately doesn't work (The driver breaks completely). I
> would appreciate it if somebody who knows more about this stuff could
> try to fix this.

Could you please use my "RFC V3" c_can_pci driver patches I sent out an
hour ago. Anyway, avoiding out-of-order reception with a message FIFO is
tricky and we need to fix that part as well. Unfortunately, I do not
have a system for development and testing.

Wolfgang.


  parent reply	other threads:[~2012-12-11 22:05 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-11 15:43 pch_can probable bug Christian Bendele
2012-12-11 16:08 ` Christian Bendele
2012-12-11 16:22   ` Wolfgang Grandegger
2012-12-11 22:05 ` Wolfgang Grandegger [this message]
2012-12-17 13:21   ` Christian Bendele
2012-12-17 14:50     ` Wolfgang Grandegger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50C7AE44.4040603@grandegger.com \
    --to=wg@grandegger.com \
    --cc=Christian.Bendele@gmx.net \
    --cc=linux-can@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).