linux-can.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Wolfgang Grandegger <wg@grandegger.com>
To: Alexander Stein <alexander.stein@systec-electronic.com>
Cc: linux-can@vger.kernel.org, bhupesh.sharma@st.com, tomoya.rohm@gmail.com
Subject: Re: [RFC v2 0/7] pch_can/c_can: fix races and add PCH support to c_can
Date: Thu, 06 Dec 2012 15:31:20 +0100	[thread overview]
Message-ID: <50C0AC38.8080405@grandegger.com> (raw)
In-Reply-To: <4036287.fuKZ6k5idx@ws-stein>

On 12/06/2012 02:38 PM, Alexander Stein wrote:
> Hello Wolfgang,
> 
> On Wednesday 05 December 2012 18:35:25, Wolfgang Grandegger wrote:
>>>> - The messages look still ok (not currupted, I mean)?
>>>
>>> The received frames all look good (despite wrong counter sometimes due to 
>>> wrong order or lost frames).
>>>
>>>>> Even worse, if I use the following patch to check if PCI writes were 
>>>>> successfully, I notices that some writes (or the consecutive read) don't 
>>>>> succeed. And I also get lots of I2C timeouts waiting for a xfer complete.
>>>>
>>>> Be careful, there might be some registers changing their values after
>>>> writing. Can you show the value read after writing and the register
>>>> offset? The influence on the I2C bus looks more like an overload or
>>>> hardware problem. What is your CAN interrupt rate?
>>>
>>> I get about 33 interrupts per second on i2c. On a successful run I get 366886 
>>> interrupts for 500000 messages with the c_can driver.
>>
>> In what time? Is the CAN bus highly loaded.
> 
> Busload is about 14-18% according to canbusload with 1MBit. A complete sucessful run takes about 5 minutes.
> 
>>> On the second line you can see that the register isn't written at all (or the 
>>> read failed for some reason).
>>
>> I assume the latter. Could you please retry reading the register until
>> the correct value shows up. With some timeout, of course.
> 
> I notices having all error events printed on serial console using pch_uart driver has negative effect (I guess one problem causes another one), so I setup 'dmesg -n1' to reduce serial load before the test.
> Running the test with just one heartbeat triggered LED set using I2C the test runs without errors. I only see
> Lots of 'c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000' at the beginning. It seems this (reserved?) bit is always 1 no matter what we write.

There are reserved bit! From the manual:

 "Reserved: This bit is reserved for future expansion. Only “0” is
  accepted as the write data to the reserved bit. When “1” is
  written, the operation is not guaranteed."

  When reading the reserved bit of the IFmMASK2 register (m = 1), a “1”
  is read. Write a “1” for write.

Well, I don't fully understand but it's clear that we always read "1!.

> But things start to break when running the test while running 'watch sensors' (sensors queries several temperature sensors via I2C) in a seconds ssh session.
> First off the driver error messages (omitting the messages as written above):
> [  384.466217] c_can_pci 0000:02:0c.3: can0: write 0x73 to offset 0x24 failed. got: 0xbc


Hm...

> [  384.466630] c_can_pci 0000:02:0c.3: can0: write 0x73 to offset 0x24 failed. got: 0xb8
...
> [  700.646608] c_can_pci 0000:02:0c.3: can0: write 0xe to offset 0x0 failed. got: 0x28
> [  700.647000] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x0 failed. got: 0x88
> 
> As you can see, sometimes it needs several write retries to succeed. Nevertheless my test application detects also some problems:

OK, obviously the real value needs some time to show up when there is
other activity on the bus. This hardware is really special. We need to
contact the hardware developers.

>> Error on MSG ID 0x252. Got counter 96480 and expected 96466
>> Error on MSG ID 0x251. Got counter 96480 and expected 96466
>> Error on MSG ID 0x252. Got counter 101706 and expected 101696
>> Error on MSG ID 0x251. Got counter 101706 and expected 101696
> Here were messages missed/dropped for both CAN-IDs.

It is interesting that the same number of messages are missing for both
IDs... which come from different CAN nodes, right?

Do you see dropped messages with "ip -d -s link show canX"?
Did you check if the protocol layer did drop messages. You can
use "candump -d" to find that out.

>> Error on MSG ID 0x251. Got counter 108673 and expected 108672
>> Error on MSG ID 0x251. Got counter 108672 and expected 108674
>> Error on MSG ID 0x251. Got counter 108674 and expected 108673
>                                      ^^^^^^
> Here you can see the CAN frame with counter 108673 is read before 108672.

You could add some offset/mask to the counter data of MSG ID 0x252 to
see if they get mixed up.

>> Error on MSG ID 0x251. Got counter 117488 and expected 117487
>> Error on MSG ID 0x251. Got counter 117487 and expected 117489
>> Error on MSG ID 0x251. Got counter 117489 and expected 117488
>                                      ^^^^^^
> Here you can see the CAN frame with counter 117488 is read before 117487.
>> Error on MSG ID 0x251. Got counter 142297 and expected 142296
>> Error on MSG ID 0x251. Got counter 142296 and expected 142298
>> Error on MSG ID 0x251. Got counter 142298 and expected 142297
> Same again.
>> Error on MSG ID 0x251. Got counter 147932 and expected 147924
>> Error on MSG ID 0x251. Got counter 147936 and expected 147933
>> Error on MSG ID 0x252. Got counter 147936 and expected 147924
>> Error on MSG ID 0x251. Got counter 165268 and expected 165267
>> Error on MSG ID 0x251. Got counter 218560 and expected 218558
>> Error on MSG ID 0x252. Got counter 218560 and expected 218558
>> Error on MSG ID 0x252. Got counter 231076 and expected 231075
>> Error on MSG ID 0x251. Got counter 231077 and expected 231076
>> Error on MSG ID 0x251. Got counter 241959 and expected 241958
> Messages missed/dropped again.
> 
> Below is the patch for c_can.
> 
> Best regards,
> Alexander
> 
> diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
> index d63aaa3..da9bbc0 100644
> --- a/drivers/net/can/c_can/c_can.c
> +++ b/drivers/net/can/c_can/c_can.c
> @@ -1186,6 +1186,7 @@ struct net_device *alloc_c_can_dev(void)
>                                         CAN_CTRLMODE_BERR_REPORTING;
>  
>         spin_lock_init(&priv->lock);
> +       spin_lock_init(&priv->testlock);
>  
>         return dev;
>  }
> diff --git a/drivers/net/can/c_can/c_can.h b/drivers/net/can/c_can/c_can.h
> index 3487d5e..2b58b75 100644
> --- a/drivers/net/can/c_can/c_can.h
> +++ b/drivers/net/can/c_can/c_can.h
> @@ -173,6 +173,7 @@ struct c_can_priv {
>         unsigned int instance;
>         void (*init) (const struct c_can_priv *priv, bool enable);
>         spinlock_t lock;        /* to protect tx and rx message objects */
> +       spinlock_t testlock;    /* to protect tx and rx message objects */
>  };
>  
>  struct net_device *alloc_c_can_dev(void);
> diff --git a/drivers/net/can/c_can/c_can_pci.c b/drivers/net/can/c_can/c_can_pci.c
> index 2516ea9..0ac4d43 100644
> --- a/drivers/net/can/c_can/c_can_pci.c
> +++ b/drivers/net/can/c_can/c_can_pci.c
> @@ -74,13 +74,37 @@ static void c_can_pci_write_reg_aligned_to_32bit(struct c_can_priv *priv,
>  static u16 c_can_pci_read_reg_32bit(struct c_can_priv *priv,
>                                     enum reg index)
>  {
> -       return (u16)ioread32(priv->base + 2 * priv->regs[index]);
> +       unsigned long flags;
> +       u16 reg;
> +
> +       spin_lock_irqsave(&priv->testlock, flags);
> +       reg = (u16)ioread32(priv->base + 2 * priv->regs[index]);
> +       spin_unlock_irqrestore(&priv->testlock, flags);
> +
> +       return reg;
>  }
>  
>  static void c_can_pci_write_reg_32bit(struct c_can_priv *priv,
>                                       enum reg index, u16 val)
>  {
> -       iowrite32((u32)val, priv->base + 2 * priv->regs[index]);
> +       u16 reg;
> +       unsigned long flags;
> +       int retries;
> +
> +       retries = 0;
> +
> +       spin_lock_irqsave(&priv->testlock, flags);
> +
> +       do
> +       {
> +               iowrite32((u32)val, priv->base + 2 * priv->regs[index]);

I think it's enough to write the message once.

> +               reg = (u16)ioread32(priv->base + 2 * priv->regs[index]);
> +               if (reg != val)
> +               {
> +                       netdev_err(priv->dev, "write 0x%x to offset 0x%x failed. got: 0x%x\n", val, 2 * priv->regs[index], reg);
> +               }
> +       } while ((reg != val) && (retries++ < 20));
> +       spin_unlock_irqrestore(&priv->testlock, flags);
>  }
>  
>  static void c_can_pci_reset_pch(const struct c_can_priv *priv, bool enable)

Wolfgang.


  parent reply	other threads:[~2012-12-06 14:31 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-29 14:39 [RFC v2 0/7] pch_can/c_can: fix races and add PCH support to c_can Wolfgang Grandegger
2012-11-29 14:39 ` [RFC v2 1/7] pch_can: add spinlocks to protect tx objects Wolfgang Grandegger
2012-11-29 14:39 ` [RFC v2 2/7] c_can: rename callback "initram" to "init" to more general usage Wolfgang Grandegger
2012-12-03 14:20   ` Alexander Stein
2012-12-03 14:32     ` Wolfgang Grandegger
2012-11-29 14:39 ` [RFC v2 3/7] c_can: use different sets of interface registers for rx and tx Wolfgang Grandegger
2012-11-30  8:39   ` Marc Kleine-Budde
2012-11-30  9:15     ` Wolfgang Grandegger
2012-11-29 14:39 ` [RFC v2 4/7] c_can_pci: introduce board specific PCI bar Wolfgang Grandegger
2012-11-30  8:45   ` Marc Kleine-Budde
2012-11-30  9:11     ` Wolfgang Grandegger
2012-11-30  9:19       ` Marc Kleine-Budde
2012-11-29 14:39 ` [RFC v2 5/7] c_can_pci: enable PCI bus master only for MSI Wolfgang Grandegger
2012-11-30  8:54   ` Marc Kleine-Budde
2012-11-29 14:39 ` [RFC v2 6/7] c_can_pci: add support for PCH CAN on Intel EG20T PCH Wolfgang Grandegger
2012-11-29 14:39 ` [RFC v2 7/7] c_can: add spinlock to protect tx and rx objects Wolfgang Grandegger
2012-12-05 12:09 ` [RFC v2 0/7] pch_can/c_can: fix races and add PCH support to c_can Alexander Stein
2012-12-05 12:50   ` Wolfgang Grandegger
2012-12-05 14:46     ` Alexander Stein
2012-12-05 17:35       ` Wolfgang Grandegger
2012-12-05 21:52         ` Marc Kleine-Budde
2012-12-06  7:09           ` Wolfgang Grandegger
2012-12-06  8:35             ` Marc Kleine-Budde
2012-12-06  8:17         ` Wolfgang Grandegger
2012-12-06 13:38         ` Alexander Stein
2012-12-06 14:02           ` Marc Kleine-Budde
2012-12-06 14:31           ` Wolfgang Grandegger [this message]
2012-12-06 14:37             ` Marc Kleine-Budde
2012-12-06 14:56             ` Alexander Stein
2012-12-06 15:15               ` Wolfgang Grandegger
2012-12-06 15:27                 ` Wolfgang Grandegger
2012-12-06 15:55                   ` Alexander Stein
2012-12-06 17:14             ` Alexander Stein
2012-12-06 23:34               ` Marc Kleine-Budde
2012-12-07  9:26                 ` Wolfgang Grandegger
2012-12-07  9:55                   ` Marc Kleine-Budde
2012-12-07 10:00                     ` Bhupesh SHARMA
2012-12-07 10:09                       ` Marc Kleine-Budde

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50C0AC38.8080405@grandegger.com \
    --to=wg@grandegger.com \
    --cc=alexander.stein@systec-electronic.com \
    --cc=bhupesh.sharma@st.com \
    --cc=linux-can@vger.kernel.org \
    --cc=tomoya.rohm@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).