linux-can.vger.kernel.org archive mirror
From: Stephane Grosjean <s.grosjean@peak-system.com>
To: Andri Yngvason <andri.yngvason@marel.com>, linux-can@vger.kernel.org
Cc: wg@grandegger.com, mkl@pengutronix.de
Subject: Re: peak_pci: TX Frame Loss
Date: Thu, 19 Nov 2015 09:38:17 +0100
Message-ID: <564D8A79.7090605@peak-system.com>
In-Reply-To: <20151118145121.32487.38169@maxwell.marel.net>

Hi Andri,

Could you first give me the result of sudo lspci -d 1c: -vvv please?

Regards,

Stéphane

On 18/11/2015 15:51, Andri Yngvason wrote:
> Hi all,
>
> We've been experiencing frame loss on transmission in the peak_pci netdev
> driver.
>
> The frames are not reported as "dropped" by the netlink interface.
>
> We are running CANopen and this manifests sporadically as nodes dropping off the
> network due to failure to answer node guarding RTR and as SDO request timeouts.
>
> Example with can0 and can1 on the same bus where the CANopen master is on can0
> and can1 is set up to listen only:
> (1446688151.783844)  can0  701  [1] remote request <- node guarding request on can0
> (1446688151.784296)  can0  70A  [1] remote request <- another node guarding request
> (1446688151.784304)  can1  70A  [1] remote request <- only the latter is seen by can1
> (1446688151.784751)  can0  720  [1] remote request
> (1446688151.784763)  can1  720  [1] remote request
> (1446688151.785793)  can1  283  [8] 00 00 00 00 00 00 00 00
> (1446688151.785792)  can0  283  [8] 00 00 00 00 00 00 00 00
> (1446688151.786164)  can0  70A  [1] 85 <-- node guarding response
> (1446688151.786163)  can1  70A  [1] 85
> (1446688151.786641)  can1  720  [1] 85
> (1446688151.786641)  can0  720  [1] 85 <-- node guarding response
> (1446688151.787057)  can0  721  [1] remote request
> (1446688151.787063)  can1  721  [1] remote request
> (1446688151.787728)  can0  721  [1] 05
> (1446688151.787733)  can1  721  [1] 05
>
> Node 1 never responded because it never received the request.
>
> The node guarding requests are sent in bursts where lower ids appear before
> higher ids. A curious observation is that it's always the lowest id that drops
> out first. I.e. the first frame in a burst of frames is the one that's lost.
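
[Editorial sketch] The comparison described above (every frame logged on can0 should also appear on the listen-only can1) can be automated. The following is a minimal sketch, not part of the original report. It assumes the log keeps the line layout shown in the example, and the names `parse_line`, `find_unmatched`, and the 1 ms matching window are hypothetical choices made for illustration.

```python
import re
from decimal import Decimal

# Matches lines such as:
# (1446688151.783844)  can0  701  [1] remote request <- optional note
LINE_RE = re.compile(
    r"\((?P<ts>\d+\.\d+)\)\s+(?P<iface>\S+)\s+(?P<can_id>[0-9A-Fa-f]+)\s+"
    r"\[(?P<dlc>\d+)\]\s*(?P<rest>.*)"
)


def parse_line(line):
    """Return (timestamp, interface, CAN id, payload) or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    # Drop trailing annotations such as "<- node guarding request".
    payload = m.group("rest").split("<")[0].strip()
    return (Decimal(m.group("ts")), m.group("iface"),
            m.group("can_id").upper(), payload)


def find_unmatched(lines, tx_if="can0", rx_if="can1", window=Decimal("0.001")):
    """List frames logged on tx_if that have no counterpart on rx_if.

    A counterpart has the same id and payload and a timestamp within
    `window` seconds (an assumed tolerance, not a measured one).
    """
    frames = [f for f in map(parse_line, lines) if f is not None]
    rx = [f for f in frames if f[1] == rx_if]
    used = set()
    missing = []
    for ts, iface, can_id, payload in frames:
        if iface != tx_if:
            continue
        for i, (rts, _, rid, rpayload) in enumerate(rx):
            if (i not in used and rid == can_id and rpayload == payload
                    and abs(rts - ts) <= window):
                used.add(i)
                break
        else:
            missing.append((ts, can_id, payload))
    return missing


# First lines of the log quoted above.
SAMPLE = """\
(1446688151.783844)  can0  701  [1] remote request
(1446688151.784296)  can0  70A  [1] remote request
(1446688151.784304)  can1  70A  [1] remote request
(1446688151.784751)  can0  720  [1] remote request
(1446688151.784763)  can1  720  [1] remote request
"""

if __name__ == "__main__":
    for ts, can_id, payload in find_unmatched(SAMPLE.splitlines()):
        print(f"{ts} {can_id} {payload}: seen on can0 only")
```

Run over a longer capture, a check like this would show whether the lost frame is always the first of a burst, as observed above.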
>
> Another interesting thing that we've found out is that if we turn off SMP on the
> system, the problem disappears. But obviously we don't want to disable SMP in a
> production system. ;)
> It helps to set the CPU affinity of all threads and processes that touch the CAN
> bus to a single core but sadly it doesn't eliminate the problem.
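
[Editorial sketch] For readers who want to reproduce the affinity workaround mentioned above, the following is a minimal, Linux-only sketch. It is not the reporter's actual setup: the helper name `pin_all_threads` is hypothetical, and it assumes the process has permission to change the affinity of the target threads. On Linux the affinity call applies to a single thread ID, so each thread listed under /proc/<pid>/task is pinned separately.

```python
import os


def pin_all_threads(pid, cpu):
    """Pin every thread of process `pid` to the single core `cpu`.

    Returns the list of thread IDs that were pinned. Threads that
    exit while we iterate are skipped.
    """
    pinned = []
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            os.sched_setaffinity(int(tid), {cpu})
        except ProcessLookupError:
            continue  # thread ended between listing and pinning
        pinned.append(int(tid))
    return pinned


if __name__ == "__main__":
    # Pick a core this process is already allowed to run on.
    core = min(os.sched_getaffinity(0))
    print(pin_all_threads(os.getpid(), core), "pinned to core", core)
```

As noted above, pinning reduced the frame loss but did not eliminate it, so this is a diagnostic aid rather than a fix.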
>
> Our systems are running on kernel version 3.14.3 with the rt patch. I tried
> running 4.1.12-rt13 but that did not eliminate the problem. We also tried
> running with the pcan netdev driver from peak which does in fact run without
> frame loss. Thus, this is probably an issue with either peak_pci or sja1000.
>
> I tried poking around in sja1000.c. I noticed that sja1000_start_xmit() is not
> guarded against trying to transmit when the tx buffer is occupied, so I added a
> check and a print-out:
> diff --git a/drivers/net/can/sja1000/sja1000.c b/drivers/net/can/sja1000/sja1000.c
> index 32bd7f4..adc49db 100644
> --- a/drivers/net/can/sja1000/sja1000.c
> +++ b/drivers/net/can/sja1000/sja1000.c
> @@ -292,6 +292,11 @@ static netdev_tx_t sja1000_start_xmit(struct sk_buff *skb,
>   
>          netif_stop_queue(dev);
>   
> +       if (!(priv->read_reg(priv, SJA1000_SR) & SR_TBS)) {
> +               netdev_err(dev, "BUG!, TX FIFO full when queue awake!\n");
> +               return NETDEV_TX_BUSY;
> +       }
> +
>          fi = dlc = cf->can_dlc;
>          id = cf->can_id;
>
> There was no error message in dmesg after frame loss, so that's not the problem.
>
> The CPU is an Intel i7-4700EQ and the CAN interface is a Peak PCIe dual channel.
>
> Does anyone have an idea what might be wrong? :)
>
> Best regards,
> Andri

--
PEAK-System Technik GmbH
Sitz der Gesellschaft Darmstadt
Handelsregister Darmstadt HRB 9183 
Geschaeftsfuehrung: Alexander Gach, Uwe Wilhelm
--

Thread overview: 17+ messages
2015-11-18 14:51 peak_pci: TX Frame Loss Andri Yngvason
2015-11-19  8:38 ` Stephane Grosjean [this message]
2015-11-19 10:12   ` Andri Yngvason
2015-12-02 18:09 ` Andri Yngvason
2015-12-02 19:19   ` Oliver Hartkopp
2015-12-03  6:37     ` Oliver Hartkopp
2015-12-03 11:23       ` Andri Yngvason
2015-12-03 11:44         ` Marc Kleine-Budde
2015-12-08 10:21           ` Stephane Grosjean
2015-12-08 10:50             ` Andri Yngvason
2015-12-08 11:42               ` Stephane Grosjean
2015-12-08 12:24                 ` Andri Yngvason
2015-12-08 14:12                   ` [BULK]Re: " Stephane Grosjean
2015-12-22  8:13                   ` Stephane Grosjean
2015-12-22 11:51                     ` Andri Yngvason
2015-12-03 16:37         ` Stephane Grosjean
2015-12-03  8:20     ` Marc Kleine-Budde
