From: Wolfgang Grandegger <wg@grandegger.com>
To: Christian Bendele <Christian.Bendele@gmx.net>
Cc: linux-can@vger.kernel.org
Subject: Re: pch_can probable bug
Date: Tue, 11 Dec 2012 23:05:56 +0100 [thread overview]
Message-ID: <50C7AE44.4040603@grandegger.com> (raw)
In-Reply-To: <loom.20121211T155025-697@post.gmane.org>
On 12/11/2012 04:43 PM, Christian Bendele wrote:
> Hi all,
>
> I believe I have found a bug in the pch_can driver. It _might_ also be
> related to what was discussed recently in the thread at
> http://thread.gmane.org/gmane.linux.can/2483/focus=2573
> however, a problem with transmission was reported there and this bug is not
> related to transmission but to receiving.
>
> This is what I observed:
>
> If very high load on the can bus coincides with very high load on the pc,
> so that the 26 message fifo buffer configured in the can controller
> actually runs full, then receiving stops completely and the pc gets
> very sluggish.
OK. Two other people are reporting somehow different problems with tx
but also rx.
> This is what I think is going on:
>
> According to Sections 13.5.9 and 13.5.9.1 of the Intel PCH EG20T
> Datasheet setting up a FIFO buffer on the pch can works like this:
>
> "...The EOB bit of all message objects of a FIFO buffer except the last have
> to be programmed to 0. The EOB bits of the last message object of a FIFO
> buffer is set to 1, configuring it as the End of the Block...."
> "When a message is stored into a message object [...] the NEWDAT bit of
> that object is set. [...]the message object is locked for further write
> accesses [...] until the CPU has written the NEWDAT bit back to 0."
> "Messaged are stored into a FIFO buffer until the last message object [...]
> is reached. If none of the preceding objects is released by writing NEWDAT
> to 0, all further messages [...] are written into the last message object
> [...] and therefore overwrite previous messages."
Yes, note that the PCH does use a C_CAN core (even the manual is identical).
>
> The pch_c_can defines a 26 messages deep FIFO buffer (from message object 1
> to message object 26) for receiving.
Please do not invest time fixing the pch_can driver. We want to get rid
of it asap.
> However, In the pch_can_rx_normal() method of the pch_can.c driver, right at
> the top of the main loop in which pending messages are read from the
> controller, the loop is aborted if the EOB bit is send in the message object
> currently considered:
>
> if (reg & PCH_IF_MCONT_EOB)
> break;
>
> That means that the message(s) stored in this message object are never read
> out, and the INTPND (interrupt pending, is set together with NEWDAT when a
> message is stored into the object) bit in that message object's control
> register is never cleared.
Looking to c_can.c, I see this problem as well. The above check should
be done after reading out the message.
> Thus a kind of "endless loop" happens where the same message object (26
> = PCH_RX_OBJ_END) is considered again and again but never handled, until the
> napi quota is consumed, and then immediately the device interrupt happens
> again because the INTPND bit in the message object was never cleared.
> The system is completely swamped.
>
> Due to the way the FIFO buffer is handled by the driver (kind of like a 16
> deep fifo with a 10 deep overflow, it doesn't matter, though), that
> message object is only ever used if the FIFO actually runs completely
> full. Thus the problem only manifests if very high loads on the canbus and
> on the host happen at the same time. If that never happens the system will
> work fine for a long time.
>
> I'm still trying to fix this problem here, but this is the first time I
> actually did anything using the napi and the whole driver looks slightly
> confusing to me (half the time I don't understand why it does things the
> way it does them). Just moving down the offending break towards the bottom
> of the loop unfortunately doesn't work (The driver breaks completely). I
> would appreciate it if somebody who knows more about this stuff could
> try to fix this.
Could you please use my "RFC V3" c_can_pci driver patches I sent out an
hour ago. Anyway, avoiding out-of-order reception with a message FIFO is
tricky and we need to fix that part as well. Unfortunately, I do not
have a system for development and testing.
Wolfgang.
next prev parent reply other threads:[~2012-12-11 22:05 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-12-11 15:43 pch_can probable bug Christian Bendele
2012-12-11 16:08 ` Christian Bendele
2012-12-11 16:22 ` Wolfgang Grandegger
2012-12-11 22:05 ` Wolfgang Grandegger [this message]
2012-12-17 13:21 ` Christian Bendele
2012-12-17 14:50 ` Wolfgang Grandegger
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=50C7AE44.4040603@grandegger.com \
--to=wg@grandegger.com \
--cc=Christian.Bendele@gmx.net \
--cc=linux-can@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.