[PATCH] can: c_can: tx interrupts can be missed, check for tx outside of just the tx interrupt routine

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] can: c_can: tx interrupts can be missed, check for tx outside of just the tx interrupt routine
@ 2013-12-19 19:38 Glen, Andrew
  2013-12-19 20:55 ` Marc Kleine-Budde
  0 siblings, 1 reply; 2+ messages in thread
From: Glen, Andrew @ 2013-12-19 19:38 UTC (permalink / raw)
  To: linux-can@vger.kernel.org

Using the c_can driver on an AM335x platform (the Beaglebone Black) I have found that tx interrupts can be occasionally missed, causing the tx queue to halt indefinitely (i.e. until manually restarted with a can down/up).

The fault occurs when there is a high can-bus load (~40% @ 250k) and I hot-swapping a usb devices (i.e. continuously plugging and unplugging a usb flash drive the fault will occur about 1/10 of the time) - I'd assume there would be other processor intensive combinations of things that would also cause this problem.

The can device continues to rx fine, and reports no obvious fault, other than rejecting new message writes due to a tx buffer overflow, which can then only be cleared by bringing the device down and then back up again.

To ensure tx events are always handled even if a tx interrupt is missed, I propose the following patch to check for a successful tx on every can interrupt, including rx and state changes. If a successful tx is detected in any of these cases a conditional call to netif_wake_queue is called.

My initial thought was to perform this check the 'c_can_start_xmit' function, but it is my understanding that once 'netif_stop_queue' has been called there will be no further calls made to 'c_can_start_xmit' until 'netif_wake_queue' has been called - which is only ever occurs after a tx interrupt has been handled - which as noted earlier can be problematic. So even checking for a successful tx in 'c_can_start_xmit' would catch most cases of missed tx interrupts, it would still leave open the corner case that if the tx buffer did become full and then very next successful tx was missed, the tx queue would become stuck.

This obviously introduces some redundancy, in terms of the frequency at which the tx register is checked, but I think this is a necessary backstop for missed interrupts under all circumstances.

Reported-by: Andrew Glen <AGlen@bepmarine.com>
Tested-by: Andrew Glen <AGlen@bepmarine.com>
---

diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
index 77061ee..447c873 100644
--- a/drivers/net/can/c_can/c_can.c
+++ b/drivers/net/can/c_can/c_can.c
@@ -756,6 +756,7 @@ static void c_can_do_tx(struct net_device *dev)
        u32 msg_obj_no;
        struct c_can_priv *priv = netdev_priv(dev);
        struct net_device_stats *stats = &dev->stats;
+       bool tx_done = false;

        for (/* nix */; (priv->tx_next - priv->tx_echo) > 0; priv->tx_echo++) {
                msg_obj_no = get_tx_echo_msg_obj(priv);
@@ -770,15 +771,19 @@ static void c_can_do_tx(struct net_device *dev)
                        stats->tx_packets++;
                        can_led_event(dev, CAN_LED_EVENT_TX);
                        c_can_inval_msg_object(dev, 0, msg_obj_no);
+                       tx_done = true;
                } else {
                        break;
                }
        }

-       /* restart queue if wrap-up or if queue stalled on last pkt */
-       if (((priv->tx_next & C_CAN_NEXT_MSG_OBJ_MASK) != 0) ||
-                       ((priv->tx_echo & C_CAN_NEXT_MSG_OBJ_MASK) == 0))
-               netif_wake_queue(dev);
+    if (tx_done)
+    {
+               /* restart queue if wrap-up or if queue stalled on last pkt */
+               if (((priv->tx_next & C_CAN_NEXT_MSG_OBJ_MASK) != 0) ||
+                               ((priv->tx_echo & C_CAN_NEXT_MSG_OBJ_MASK) == 0))
+                   netif_wake_queue(dev);
+       }
 }

 /*
@@ -1082,6 +1087,9 @@ static int c_can_poll(struct napi_struct *napi, int quota)
                c_can_do_tx(dev);
        }

+       /* Always check for successful tx */
+       c_can_do_tx(dev);
+
 end:
        if (work_done < quota) {
                napi_complete(napi);

________________________________

Actuant Corporation Email Notice

This message is intended only for the use of the Addressee and may contain information that is PRIVILEGED and/or CONFIDENTIAL.
This email is intended only for the personal and confidential use of the recipient(s) named above. If the reader of this email is not an intended recipient, you have received this email in error and any review, dissemination, distribution or copying is strictly prohibited.
If you have received this email in error, please notify the sender immediately by return mail and permanently delete the copy you received.

Thank you.

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] can: c_can: tx interrupts can be missed, check for tx outside of just the tx interrupt routine
  2013-12-19 19:38 [PATCH] can: c_can: tx interrupts can be missed, check for tx outside of just the tx interrupt routine Glen, Andrew
@ 2013-12-19 20:55 ` Marc Kleine-Budde
  0 siblings, 0 replies; 2+ messages in thread
From: Marc Kleine-Budde @ 2013-12-19 20:55 UTC (permalink / raw)
  To: Glen, Andrew; +Cc: linux-can@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 5207 bytes --]

On 12/19/2013 08:38 PM, Glen, Andrew wrote:
> Using the c_can driver on an AM335x platform (the Beaglebone Black) I
> have found that tx interrupts can be occasionally missed, causing the
> tx queue to halt indefinitely (i.e. until manually restarted with a
> can down/up).
> 
> The fault occurs when there is a high can-bus load (~40% @ 250k) and
> I hot-swapping a usb devices (i.e. continuously plugging and
> unplugging a usb flash drive the fault will occur about 1/10 of the
> time) - I'd assume there would be other processor intensive
> combinations of things that would also cause this problem.
> 
> The can device continues to rx fine, and reports no obvious fault,
> other than rejecting new message writes due to a tx buffer overflow,
> which can then only be cleared by bringing the device down and then
> back up again.
> 
> To ensure tx events are always handled even if a tx interrupt is
> missed, I propose the following patch to check for a successful tx on
> every can interrupt, including rx and state changes. If a successful
> tx is detected in any of these cases a conditional call to
> netif_wake_queue is called.
> 
> My initial thought was to perform this check the 'c_can_start_xmit'
> function, but it is my understanding that once 'netif_stop_queue' has
> been called there will be no further calls made to 'c_can_start_xmit'
> until 'netif_wake_queue' has been called - which is only ever occurs
> after a tx interrupt has been handled - which as noted earlier can be
> problematic. So even checking for a successful tx in
> 'c_can_start_xmit' would catch most cases of missed tx interrupts, it
> would still leave open the corner case that if the tx buffer did
> become full and then very next successful tx was missed, the tx queue
> would become stuck.
> 
> This obviously introduces some redundancy, in terms of the frequency
> at which the tx register is checked, but I think this is a necessary
> backstop for missed interrupts under all circumstances.
> 
> Reported-by: Andrew Glen <AGlen@bepmarine.com>
> Tested-by: Andrew Glen <AGlen@bepmarine.com>

When you write a patch, you should add your Signed-off-by here, please
remove the Reported and Tested by.

> ---
> 
> diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
> index 77061ee..447c873 100644
> --- a/drivers/net/can/c_can/c_can.c
> +++ b/drivers/net/can/c_can/c_can.c
> @@ -756,6 +756,7 @@ static void c_can_do_tx(struct net_device *dev)
>         u32 msg_obj_no;
>         struct c_can_priv *priv = netdev_priv(dev);
>         struct net_device_stats *stats = &dev->stats;
> +       bool tx_done = false;
> 
>         for (/* nix */; (priv->tx_next - priv->tx_echo) > 0; priv->tx_echo++) {
>                 msg_obj_no = get_tx_echo_msg_obj(priv);
> @@ -770,15 +771,19 @@ static void c_can_do_tx(struct net_device *dev)
>                         stats->tx_packets++;
>                         can_led_event(dev, CAN_LED_EVENT_TX);
>                         c_can_inval_msg_object(dev, 0, msg_obj_no);
> +                       tx_done = true;
>                 } else {
>                         break;
>                 }
>         }
> 
> -       /* restart queue if wrap-up or if queue stalled on last pkt */
> -       if (((priv->tx_next & C_CAN_NEXT_MSG_OBJ_MASK) != 0) ||
> -                       ((priv->tx_echo & C_CAN_NEXT_MSG_OBJ_MASK) == 0))
> -               netif_wake_queue(dev);
> +    if (tx_done)
> +    {
> +               /* restart queue if wrap-up or if queue stalled on last pkt */
> +               if (((priv->tx_next & C_CAN_NEXT_MSG_OBJ_MASK) != 0) ||
> +                               ((priv->tx_echo & C_CAN_NEXT_MSG_OBJ_MASK) == 0))
> +                   netif_wake_queue(dev);
> +       }
>  }
> 
>  /*
> @@ -1082,6 +1087,9 @@ static int c_can_poll(struct napi_struct *napi, int quota)
>                 c_can_do_tx(dev);
>         }
> 
> +       /* Always check for successful tx */
> +       c_can_do_tx(dev);
> +
>  end:
>         if (work_done < quota) {
>                 napi_complete(napi);
> 
> ________________________________

You change the driver to always check for a TX Interrupt, right? Having
a quick look at c_can_pollc_can_poll() I think the driver has a problem,
when more than one Interrupt occurs:

	if (irqstatus == STATUS_INTERRUPT) {
	} else if ((irqstatus >= C_CAN_MSG_OBJ_RX_FIRST) &&
			(irqstatus <= C_CAN_MSG_OBJ_RX_LAST)) {
	} else if ((irqstatus >= C_CAN_MSG_OBJ_TX_FIRST) &&
			(irqstatus <= C_CAN_MSG_OBJ_TX_LAST)) {
	}

I think there should be individual if()s checking for the bits set,
about like this:

	if (irqstatus & STATUS_INTERRUPT) {
	}
	if (irqstatus & C_CAN_MSG_OBJ_RX_MASK) {
	}
	if ((irqstatus & C_CAN_MSG_OBJ_TX_MASK) {
	}

C_CAN_MSG_OBJ_RX_MASK and C_CAN_MSG_OBJ_TX_MASK must be defined
appropriately.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2013-12-19 20:55 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-12-19 19:38 [PATCH] can: c_can: tx interrupts can be missed, check for tx outside of just the tx interrupt routine Glen, Andrew
2013-12-19 20:55 ` Marc Kleine-Budde

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.