* [PATCH 5/5] can: kvaser_usb: Fix tx queue start/stop race conditions [not found] ` <20150226152558.GD6075@linux> @ 2015-02-26 15:29 ` Ahmed S. Darwish 0 siblings, 0 replies; 15+ messages in thread From: Ahmed S. Darwish @ 2015-02-26 15:29 UTC (permalink / raw) To: Olivier Sobrie, Oliver Hartkopp, Wolfgang Grandegger, Marc Kleine-Budde, Andri Yngvason Cc: Linux-CAN, netdev, LKML From: Ahmed S. Darwish <ahmed.darwish@valeo.com> A number of tx queue wake-up events went missing due to the outlined scenario below. Start state is a pool of 16 tx URBs, active tx_urbs count = 15, with the netdev tx queue open. start_xmit() tx_acknowledge() ............ ................ atomic_inc(&tx_urbs); if (atomic_read(&tx_urbs) >= 16) { URB completion IRQ! --> atomic_dec(&tx_urbs); netif_wake_queue(); return; <-- end of IRQ! netif_stop_queue(); } At the end, the correct state expected is a 15 tx_urbs count value with the tx queue state _open_. Due to the race, we get the same tx_urbs value but with the tx queue state _stopped_. The wake-up event is completely lost. Thus avoid hand-rolled concurrency mechanisms and use a proper lock for contexts protection. Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com> --- drivers/net/can/usb/kvaser_usb.c | 82 +++++++++++++++++++++++++--------------- 1 file changed, 51 insertions(+), 31 deletions(-) diff --git a/drivers/net/can/usb/kvaser_usb.c b/drivers/net/can/usb/kvaser_usb.c index 13bae86..807ab0c 100644 --- a/drivers/net/can/usb/kvaser_usb.c +++ b/drivers/net/can/usb/kvaser_usb.c @@ -14,6 +14,7 @@ * Copyright (C) 2015 Valeo S.A. */ +#include <linux/spinlock.h> #include <linux/kernel.h> #include <linux/completion.h> #include <linux/module.h> @@ -471,10 +472,12 @@ struct kvaser_usb { struct kvaser_usb_net_priv { struct can_priv can; - atomic_t active_tx_urbs; - struct usb_anchor tx_submitted; + spinlock_t tx_contexts_lock; + int active_tx_contexts; struct kvaser_usb_tx_urb_context *tx_contexts; + struct usb_anchor tx_submitted; + struct completion start_comp, stop_comp; struct kvaser_usb *dev; @@ -702,6 +705,7 @@ static void kvaser_usb_tx_acknowledge(const struct kvaser_usb *dev, struct kvaser_usb_net_priv *priv; struct sk_buff *skb; struct can_frame *cf; + unsigned long flags; u8 channel, tid; channel = msg->u.tx_acknowledge_header.channel; @@ -745,12 +749,15 @@ static void kvaser_usb_tx_acknowledge(const struct kvaser_usb *dev, stats->tx_packets++; stats->tx_bytes += context->dlc; - can_get_echo_skb(priv->netdev, context->echo_index); - context->echo_index = dev->max_tx_urbs; - atomic_dec(&priv->active_tx_urbs); + spin_lock_irqsave(&priv->tx_contexts_lock, flags); + can_get_echo_skb(priv->netdev, context->echo_index); + context->echo_index = dev->max_tx_urbs; + --priv->active_tx_contexts; netif_wake_queue(priv->netdev); + + spin_unlock_irqrestore(&priv->tx_contexts_lock, flags); } static void kvaser_usb_simple_msg_callback(struct urb *urb) @@ -811,18 +818,6 @@ static int kvaser_usb_simple_msg_async(struct kvaser_usb_net_priv *priv, return 0; } -static void kvaser_usb_unlink_tx_urbs(struct kvaser_usb_net_priv *priv) -{ - struct kvaser_usb *dev = priv->dev; - int i; - - usb_kill_anchored_urbs(&priv->tx_submitted); - atomic_set(&priv->active_tx_urbs, 0); - - for (i = 0; i < dev->max_tx_urbs; i++) - priv->tx_contexts[i].echo_index = dev->max_tx_urbs; -} - static void kvaser_usb_rx_error_update_can_state(struct kvaser_usb_net_priv *priv, const struct kvaser_usb_error_summary *es, struct can_frame *cf) @@ -1524,6 +1519,26 @@ error: return err; } +static void kvaser_usb_reset_tx_urb_contexts(struct kvaser_usb_net_priv *priv) +{ + int i, max_tx_urbs; + + max_tx_urbs = priv->dev->max_tx_urbs; + + priv->active_tx_contexts = 0; + for (i = 0; i < max_tx_urbs; i++) + priv->tx_contexts[i].echo_index = max_tx_urbs; +} + +/* This method might sleep. Do not call it in the atomic context + * of URB completions. + */ +static void kvaser_usb_unlink_tx_urbs(struct kvaser_usb_net_priv *priv) +{ + usb_kill_anchored_urbs(&priv->tx_submitted); + kvaser_usb_reset_tx_urb_contexts(priv); +} + static void kvaser_usb_unlink_all_urbs(struct kvaser_usb *dev) { int i; @@ -1643,6 +1658,7 @@ static netdev_tx_t kvaser_usb_start_xmit(struct sk_buff *skb, struct kvaser_msg *msg; int i, err, ret = NETDEV_TX_OK; u8 *msg_tx_can_flags = NULL; /* GCC */ + unsigned long flags; if (can_dropped_invalid_skb(netdev, skb)) return NETDEV_TX_OK; @@ -1696,12 +1712,21 @@ static netdev_tx_t kvaser_usb_start_xmit(struct sk_buff *skb, if (cf->can_id & CAN_RTR_FLAG) *msg_tx_can_flags |= MSG_FLAG_REMOTE_FRAME; + spin_lock_irqsave(&priv->tx_contexts_lock, flags); for (i = 0; i < dev->max_tx_urbs; i++) { if (priv->tx_contexts[i].echo_index == dev->max_tx_urbs) { context = &priv->tx_contexts[i]; + + context->echo_index = i; + can_put_echo_skb(skb, netdev, context->echo_index); + ++priv->active_tx_contexts; + if (priv->active_tx_contexts >= dev->max_tx_urbs) + netif_stop_queue(netdev); + break; } } + spin_unlock_irqrestore(&priv->tx_contexts_lock, flags); /* This should never happen; it implies a flow control bug */ if (!context) { @@ -1713,7 +1738,6 @@ static netdev_tx_t kvaser_usb_start_xmit(struct sk_buff *skb, } context->priv = priv; - context->echo_index = i; context->dlc = cf->can_dlc; msg->u.tx_can.tid = context->echo_index; @@ -1725,18 +1749,17 @@ static netdev_tx_t kvaser_usb_start_xmit(struct sk_buff *skb, kvaser_usb_write_bulk_callback, context); usb_anchor_urb(urb, &priv->tx_submitted); - can_put_echo_skb(skb, netdev, context->echo_index); - - atomic_inc(&priv->active_tx_urbs); - - if (atomic_read(&priv->active_tx_urbs) >= dev->max_tx_urbs) - netif_stop_queue(netdev); - err = usb_submit_urb(urb, GFP_ATOMIC); if (unlikely(err)) { + spin_lock_irqsave(&priv->tx_contexts_lock, flags); + can_free_echo_skb(netdev, context->echo_index); + context->echo_index = dev->max_tx_urbs; + --priv->active_tx_contexts; + netif_wake_queue(netdev); + + spin_unlock_irqrestore(&priv->tx_contexts_lock, flags); - atomic_dec(&priv->active_tx_urbs); usb_unanchor_urb(urb); stats->tx_dropped++; @@ -1863,7 +1886,7 @@ static int kvaser_usb_init_one(struct usb_interface *intf, struct kvaser_usb *dev = usb_get_intfdata(intf); struct net_device *netdev; struct kvaser_usb_net_priv *priv; - int i, err; + int err; err = kvaser_usb_send_simple_msg(dev, CMD_RESET_CHIP, channel); if (err) @@ -1892,10 +1915,7 @@ static int kvaser_usb_init_one(struct usb_interface *intf, priv->channel = channel; init_usb_anchor(&priv->tx_submitted); - atomic_set(&priv->active_tx_urbs, 0); - - for (i = 0; i < dev->max_tx_urbs; i++) - priv->tx_contexts[i].echo_index = dev->max_tx_urbs; + kvaser_usb_reset_tx_urb_contexts(priv); priv->can.state = CAN_STATE_STOPPED; priv->can.clock.freq = CAN_USB_CLOCK; -- 1.9.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
[parent not found: <20150311173744.GA13095@linux>]
[parent not found: <5500B6EB.8050905@pengutronix.de>]
* Re: [PATCH v3 1/3] can: kvaser_usb: Fix tx queue start/stop race conditions [not found] ` <5500B6EB.8050905@pengutronix.de> @ 2015-03-12 19:30 ` Ahmed S. Darwish 0 siblings, 0 replies; 15+ messages in thread From: Ahmed S. Darwish @ 2015-03-12 19:30 UTC (permalink / raw) To: Marc Kleine-Budde Cc: Olivier Sobrie, Oliver Hartkopp, Wolfgang Grandegger, Andri Yngvason, Linux-CAN, LKML, Linux-netdev Hi Marc, On Wed, Mar 11, 2015 at 10:43:07PM +0100, Marc Kleine-Budde wrote: > On 03/11/2015 06:37 PM, Ahmed S. Darwish wrote: > > From: Ahmed S. Darwish <ahmed.darwish@valeo.com> > > > > A number of tx queue wake-up events went missing due to the > > outlined scenario below. Start state is a pool of 16 tx URBs, > > active tx_urbs count = 15, with the netdev tx queue open. > > > > start_xmit() tx_acknowledge() > > ............ ................ > > atomic_inc(&tx_urbs); > > if (atomic_read(&tx_urbs) >= 16) { > > URB completion IRQ! > > --> > > atomic_dec(&tx_urbs); > > netif_wake_queue(); > > return; > > <-- > > end of IRQ! > > netif_stop_queue(); > > } > > > > At the end, the correct state expected is a 15 tx_urbs count > > value with the tx queue state _open_. Due to the race, we get > > the same tx_urbs value but with the tx queue state _stopped_. > > The wake-up event is completely lost. > > > > Thus avoid hand-rolled concurrency mechanisms and use a proper > > lock for contexts protection. > > > > Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com> > > --- > > drivers/net/can/usb/kvaser_usb.c | 83 ++++++++++++++++++++++++---------------- > > 1 file changed, 51 insertions(+), 32 deletions(-) > > > > changelog-v3: Added missing spin_lock_init(). With all kernel > > lock debugging options set, I've been running my test suite > > for an hour now without apparent problems in dmesg so far. > > > > changelog-v2: Put bugfix patch at the start of the series. > > > > diff --git a/drivers/net/can/usb/kvaser_usb.c b/drivers/net/can/usb/kvaser_usb.c > > index a316fa4..e97a08c 100644 > > --- a/drivers/net/can/usb/kvaser_usb.c > > +++ b/drivers/net/can/usb/kvaser_usb.c > > @@ -14,6 +14,7 @@ > > * Copyright (C) 2015 Valeo S.A. > > */ > > > > +#include <linux/spinlock.h> > > #include <linux/kernel.h> > > #include <linux/completion.h> > > #include <linux/module.h> > > @@ -467,10 +468,11 @@ struct kvaser_usb { > > struct kvaser_usb_net_priv { > > struct can_priv can; > > > > - atomic_t active_tx_urbs; > > - struct usb_anchor tx_submitted; > > + spinlock_t tx_contexts_lock; > > + int active_tx_contexts; > > struct kvaser_usb_tx_urb_context tx_contexts[MAX_TX_URBS]; > > > > + struct usb_anchor tx_submitted; > > struct completion start_comp, stop_comp; > > > > struct kvaser_usb *dev; > > @@ -694,6 +696,7 @@ static void kvaser_usb_tx_acknowledge(const struct kvaser_usb *dev, > > struct kvaser_usb_net_priv *priv; > > struct sk_buff *skb; > > struct can_frame *cf; > > + unsigned long flags; > > u8 channel, tid; > > > > channel = msg->u.tx_acknowledge_header.channel; > > @@ -737,12 +740,15 @@ static void kvaser_usb_tx_acknowledge(const struct kvaser_usb *dev, > > > > stats->tx_packets++; > > stats->tx_bytes += context->dlc; > > - can_get_echo_skb(priv->netdev, context->echo_index); > > > > - context->echo_index = MAX_TX_URBS; > > - atomic_dec(&priv->active_tx_urbs); > > + spin_lock_irqsave(&priv->tx_contexts_lock, flags); > > > > + can_get_echo_skb(priv->netdev, context->echo_index); > > + context->echo_index = MAX_TX_URBS; > > + --priv->active_tx_contexts; > > netif_wake_queue(priv->netdev); > > + > > + spin_unlock_irqrestore(&priv->tx_contexts_lock, flags); > > I think it should be possible to move tun unlock before waking up the queue? > Hmmm... I've kept thinking about this today. Here's my understanding of the whole situation: 1) start_xmit() runs in NET_TX_SOFTIRQ softirq context 2) tx_acknowledge() occurs in URB completion softirq context 3) In an SMP machine, softirqs can run parallel to each other 4) start_xmit is protected from itself by the _xmit_lock spin 5) start_xmit() and tx_acknowledge() can, and do, run in parallel to each other >From the above, we can imagine the following: ################################################################ # Transfer queue length = 16 # Transfer queue index = 15 # # CPU #1 CPU #2 # start_xmit() tx_acknowledge() # ------------ ---------------- # ... # spin_lock(ctx_lock) # index-- # spin_unlock(ctx__lock) # <--- # ... # spin_lock(ctx_lock) # index++ # spin_unlock(ctx_lock) # return; # # /* Another invocation of start_xmit */ # # ... # spin_lock(ctx_lock) # index++ # /* We've filled the tx queue */ # if (index == 16) # netif_stop_queue() # spin_unlock(ctx_lock) # return; # # /* Transfer queue stopped */ # # ---> # /* Queue woken up # while it's full */ # netif_wake_queue() # ################################################################ I admit that unlike the race in the patch commit message, which actually appeared in practice in a normal but heavy use case, the one in the box above is highly theoretical. Nonetheless, I was actually able to fully and reliably produce it by inserting a busy loop as follows: static void kvaser_usb_tx_acknowledge(...) { ... spin_lock_irqsave(&priv->tx_contexts_lock, flags); can_get_echo_skb(priv->netdev, context->echo_index); context->echo_index = dev->max_tx_urbs; --priv->active_tx_contexts; spin_unlock_irqrestore(&priv->tx_contexts_lock, flags); /* Manual delay; use cpu_relax() not to be optimized out */ for (n = 0; n < 1000; n++) cpu_relax(); netif_wake_queue(priv->netdev); ... } Then running a very heavy transmission load: $ for i in {1..10}; do (cangen -i -I 111 -g1 -Di -L4 can0 &); done And as I've expected, dmesg began seeing entries of "cannot find free context" due to waking up the tx queue while being full: [ +0.000002] kvaser_usb 1-1.2.1.1:1.0 can0: cannot find free context [ +0.000003] kvaser_usb 1-1.2.1.1:1.0 can0: cannot find free context [ +0.000002] kvaser_usb 1-1.2.1.1:1.0 can0: cannot find free context [ +0.000002] kvaser_usb 1-1.2.1.1:1.0 can0: cannot find free context Puting the netif_wake_queue() back inside the critical section, even while keeping the delay loop in place, avoided such a race condition. Under normal conditions, such a heavy delay between the critical section exit and netif_wake_queue() will not usually occur. This is even more true since a softirq cannot preempt another softirq running on the same CPU. Nonetheless, since the race can be manually triggered as seen above, I'd be safe and wake the queue inside the critical section rather than sorry... What do you think? Regards, Darwish ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v4 1/3] can: kvaser_usb: Fix tx queue start/stop race conditions [not found] <20150226152011.GA6075@linux> [not found] ` <20150226152202.GB6075@linux> [not found] ` <20150311173744.GA13095@linux> @ 2015-03-14 13:02 ` Ahmed S. Darwish 2015-03-14 13:09 ` [PATCH v4 2/3] can: kvaser_usb: Utilize all possible tx URBs Ahmed S. Darwish 2015-03-14 13:41 ` [PATCH v4 1/3] can: kvaser_usb: Fix tx queue start/stop race conditions Marc Kleine-Budde 2 siblings, 2 replies; 15+ messages in thread From: Ahmed S. Darwish @ 2015-03-14 13:02 UTC (permalink / raw) To: Olivier Sobrie, Oliver Hartkopp, Wolfgang Grandegger, Marc Kleine-Budde, Andri Yngvason Cc: Linux-CAN, LKML, netdev From: Ahmed S. Darwish <ahmed.darwish@valeo.com> A number of tx queue wake-up events went missing due to the outlined scenario below. Start state is a pool of 16 tx URBs, active tx_urbs count = 15, with the netdev tx queue open. CPU #1 [softirq] CPU #2 [softirq] start_xmit() tx_acknowledge() ................ ................ atomic_inc(&tx_urbs); if (atomic_read(&tx_urbs) >= 16) { --> atomic_dec(&tx_urbs); netif_wake_queue(); return; <-- netif_stop_queue(); } At the end, the correct state expected is a 15 tx_urbs count value with the tx queue state _open_. Due to the race, we get the same tx_urbs value but with the tx queue state _stopped_. The wake-up event is completely lost. Thus avoid hand-rolled concurrency mechanisms and use a proper lock for contexts and tx queue protection. Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com> --- drivers/net/can/usb/kvaser_usb.c | 83 ++++++++++++++++++++++++---------------- 1 file changed, 51 insertions(+), 32 deletions(-) Changelog v4: ------------- Improve the commit log not to give the impression of a softirq preempting another softirq, which can never happen. The race condition occurs by having the softirqs running in parallel. For why are we waking up the queue inside the newly created critical section, kindly check the explanation here: http://article.gmane.org/gmane.linux.kernel/1907377 Archived at: http://www.webcitation.org/6X1SNi708 Meanwhile, I've been running the driver for 30 hours now under very heavy and ordered "cangen -Di" traffic from both ends. Analyzing the tens of gigabytes candump traces (generated, in parallel, using in-kernel CAN ID filters to avoid SO_RXQ_OVFL overflows) shows that all the frames were sent and received in the expected sequence. Changelog v3: ------------- Add missing spin_lock_init(). Run driver tests with locking and memory management debugging options on. Changelog v2: ------------- Put this bugfix patch at the top of the series diff --git a/drivers/net/can/usb/kvaser_usb.c b/drivers/net/can/usb/kvaser_usb.c index a316fa4..e97a08c 100644 --- a/drivers/net/can/usb/kvaser_usb.c +++ b/drivers/net/can/usb/kvaser_usb.c @@ -14,6 +14,7 @@ * Copyright (C) 2015 Valeo S.A. */ +#include <linux/spinlock.h> #include <linux/kernel.h> #include <linux/completion.h> #include <linux/module.h> @@ -467,10 +468,11 @@ struct kvaser_usb { struct kvaser_usb_net_priv { struct can_priv can; - atomic_t active_tx_urbs; - struct usb_anchor tx_submitted; + spinlock_t tx_contexts_lock; + int active_tx_contexts; struct kvaser_usb_tx_urb_context tx_contexts[MAX_TX_URBS]; + struct usb_anchor tx_submitted; struct completion start_comp, stop_comp; struct kvaser_usb *dev; @@ -694,6 +696,7 @@ static void kvaser_usb_tx_acknowledge(const struct kvaser_usb *dev, struct kvaser_usb_net_priv *priv; struct sk_buff *skb; struct can_frame *cf; + unsigned long flags; u8 channel, tid; channel = msg->u.tx_acknowledge_header.channel; @@ -737,12 +740,15 @@ static void kvaser_usb_tx_acknowledge(const struct kvaser_usb *dev, stats->tx_packets++; stats->tx_bytes += context->dlc; - can_get_echo_skb(priv->netdev, context->echo_index); - context->echo_index = MAX_TX_URBS; - atomic_dec(&priv->active_tx_urbs); + spin_lock_irqsave(&priv->tx_contexts_lock, flags); + can_get_echo_skb(priv->netdev, context->echo_index); + context->echo_index = MAX_TX_URBS; + --priv->active_tx_contexts; netif_wake_queue(priv->netdev); + + spin_unlock_irqrestore(&priv->tx_contexts_lock, flags); } static void kvaser_usb_simple_msg_callback(struct urb *urb) @@ -803,17 +809,6 @@ static int kvaser_usb_simple_msg_async(struct kvaser_usb_net_priv *priv, return 0; } -static void kvaser_usb_unlink_tx_urbs(struct kvaser_usb_net_priv *priv) -{ - int i; - - usb_kill_anchored_urbs(&priv->tx_submitted); - atomic_set(&priv->active_tx_urbs, 0); - - for (i = 0; i < MAX_TX_URBS; i++) - priv->tx_contexts[i].echo_index = MAX_TX_URBS; -} - static void kvaser_usb_rx_error_update_can_state(struct kvaser_usb_net_priv *priv, const struct kvaser_usb_error_summary *es, struct can_frame *cf) @@ -1515,6 +1510,24 @@ error: return err; } +static void kvaser_usb_reset_tx_urb_contexts(struct kvaser_usb_net_priv *priv) +{ + int i; + + priv->active_tx_contexts = 0; + for (i = 0; i < MAX_TX_URBS; i++) + priv->tx_contexts[i].echo_index = MAX_TX_URBS; +} + +/* This method might sleep. Do not call it in the atomic context + * of URB completions. + */ +static void kvaser_usb_unlink_tx_urbs(struct kvaser_usb_net_priv *priv) +{ + usb_kill_anchored_urbs(&priv->tx_submitted); + kvaser_usb_reset_tx_urb_contexts(priv); +} + static void kvaser_usb_unlink_all_urbs(struct kvaser_usb *dev) { int i; @@ -1634,6 +1647,7 @@ static netdev_tx_t kvaser_usb_start_xmit(struct sk_buff *skb, struct kvaser_msg *msg; int i, err, ret = NETDEV_TX_OK; u8 *msg_tx_can_flags = NULL; /* GCC */ + unsigned long flags; if (can_dropped_invalid_skb(netdev, skb)) return NETDEV_TX_OK; @@ -1687,12 +1701,21 @@ static netdev_tx_t kvaser_usb_start_xmit(struct sk_buff *skb, if (cf->can_id & CAN_RTR_FLAG) *msg_tx_can_flags |= MSG_FLAG_REMOTE_FRAME; + spin_lock_irqsave(&priv->tx_contexts_lock, flags); for (i = 0; i < ARRAY_SIZE(priv->tx_contexts); i++) { if (priv->tx_contexts[i].echo_index == MAX_TX_URBS) { context = &priv->tx_contexts[i]; + + context->echo_index = i; + can_put_echo_skb(skb, netdev, context->echo_index); + ++priv->active_tx_contexts; + if (priv->active_tx_contexts >= MAX_TX_URBS) + netif_stop_queue(netdev); + break; } } + spin_unlock_irqrestore(&priv->tx_contexts_lock, flags); /* This should never happen; it implies a flow control bug */ if (!context) { @@ -1704,7 +1727,6 @@ static netdev_tx_t kvaser_usb_start_xmit(struct sk_buff *skb, } context->priv = priv; - context->echo_index = i; context->dlc = cf->can_dlc; msg->u.tx_can.tid = context->echo_index; @@ -1716,18 +1738,17 @@ static netdev_tx_t kvaser_usb_start_xmit(struct sk_buff *skb, kvaser_usb_write_bulk_callback, context); usb_anchor_urb(urb, &priv->tx_submitted); - can_put_echo_skb(skb, netdev, context->echo_index); - - atomic_inc(&priv->active_tx_urbs); - - if (atomic_read(&priv->active_tx_urbs) >= MAX_TX_URBS) - netif_stop_queue(netdev); - err = usb_submit_urb(urb, GFP_ATOMIC); if (unlikely(err)) { + spin_lock_irqsave(&priv->tx_contexts_lock, flags); + can_free_echo_skb(netdev, context->echo_index); + context->echo_index = MAX_TX_URBS; + --priv->active_tx_contexts; + netif_wake_queue(netdev); + + spin_unlock_irqrestore(&priv->tx_contexts_lock, flags); - atomic_dec(&priv->active_tx_urbs); usb_unanchor_urb(urb); stats->tx_dropped++; @@ -1854,7 +1875,7 @@ static int kvaser_usb_init_one(struct usb_interface *intf, struct kvaser_usb *dev = usb_get_intfdata(intf); struct net_device *netdev; struct kvaser_usb_net_priv *priv; - int i, err; + int err; err = kvaser_usb_send_simple_msg(dev, CMD_RESET_CHIP, channel); if (err) @@ -1868,19 +1889,17 @@ static int kvaser_usb_init_one(struct usb_interface *intf, priv = netdev_priv(netdev); + init_usb_anchor(&priv->tx_submitted); init_completion(&priv->start_comp); init_completion(&priv->stop_comp); - init_usb_anchor(&priv->tx_submitted); - atomic_set(&priv->active_tx_urbs, 0); - - for (i = 0; i < ARRAY_SIZE(priv->tx_contexts); i++) - priv->tx_contexts[i].echo_index = MAX_TX_URBS; - priv->dev = dev; priv->netdev = netdev; priv->channel = channel; + spin_lock_init(&priv->tx_contexts_lock); + kvaser_usb_reset_tx_urb_contexts(priv); + priv->can.state = CAN_STATE_STOPPED; priv->can.clock.freq = CAN_USB_CLOCK; priv->can.bittiming_const = &kvaser_usb_bittiming_const; -- 1.9.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v4 2/3] can: kvaser_usb: Utilize all possible tx URBs 2015-03-14 13:02 ` [PATCH v4 " Ahmed S. Darwish @ 2015-03-14 13:09 ` Ahmed S. Darwish 2015-03-14 13:11 ` [PATCH v4 3/3] can: kvaser_usb: Use can-dev unregistration mechanism Ahmed S. Darwish 2015-03-14 13:41 ` [PATCH v4 1/3] can: kvaser_usb: Fix tx queue start/stop race conditions Marc Kleine-Budde 1 sibling, 1 reply; 15+ messages in thread From: Ahmed S. Darwish @ 2015-03-14 13:09 UTC (permalink / raw) To: Olivier Sobrie, Oliver Hartkopp, Wolfgang Grandegger, Marc Kleine-Budde, Andri Yngvason Cc: Linux-CAN, LKML, netdev From: Ahmed S. Darwish <ahmed.darwish@valeo.com> The driver currently limits the number of outstanding, not yet ACKed, transfers to 16 URBs. Meanwhile, the Kvaser firmware provides its actual max supported number of outstanding transmissions in its reply to the CMD_GET_SOFTWARE_INFO message. One example is the UsbCan-II HS/LS device which reports support of up to 48 tx URBs instead of just 16, increasing the driver throughput by two-fold and reducing the possibility of -ENOBUFs. Dynamically set the max tx URBs value according to firmware replies. Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com> --- drivers/net/can/usb/kvaser_usb.c | 64 ++++++++++++++++++++++++---------------- 1 file changed, 39 insertions(+), 25 deletions(-) Changelog v4: Allocate the now-dynamically-sized tx_contexts[] array as a C99 "flexible array member", insuring automatic proper de-allocation on driver exit. diff --git a/drivers/net/can/usb/kvaser_usb.c b/drivers/net/can/usb/kvaser_usb.c index e97a08c..60eadf5 100644 --- a/drivers/net/can/usb/kvaser_usb.c +++ b/drivers/net/can/usb/kvaser_usb.c @@ -25,7 +25,6 @@ #include <linux/can/dev.h> #include <linux/can/error.h> -#define MAX_TX_URBS 16 #define MAX_RX_URBS 4 #define START_TIMEOUT 1000 /* msecs */ #define STOP_TIMEOUT 1000 /* msecs */ @@ -443,6 +442,7 @@ struct kvaser_usb_error_summary { }; }; +/* Context for an outstanding, not yet ACKed, transmission */ struct kvaser_usb_tx_urb_context { struct kvaser_usb_net_priv *priv; u32 echo_index; @@ -456,8 +456,13 @@ struct kvaser_usb { struct usb_endpoint_descriptor *bulk_in, *bulk_out; struct usb_anchor rx_submitted; + /* @max_tx_urbs: Firmware-reported maximum number of oustanding, + * not yet ACKed, transmissions on this device. This value is + * also used as a sentinel for marking free tx contexts. + */ u32 fw_version; unsigned int nchannels; + unsigned int max_tx_urbs; enum kvaser_usb_family family; bool rxinitdone; @@ -467,19 +472,18 @@ struct kvaser_usb { struct kvaser_usb_net_priv { struct can_priv can; - - spinlock_t tx_contexts_lock; - int active_tx_contexts; - struct kvaser_usb_tx_urb_context tx_contexts[MAX_TX_URBS]; - - struct usb_anchor tx_submitted; - struct completion start_comp, stop_comp; + struct can_berr_counter bec; struct kvaser_usb *dev; struct net_device *netdev; int channel; - struct can_berr_counter bec; + struct completion start_comp, stop_comp; + struct usb_anchor tx_submitted; + + spinlock_t tx_contexts_lock; + int active_tx_contexts; + struct kvaser_usb_tx_urb_context tx_contexts[]; }; static const struct usb_device_id kvaser_usb_table[] = { @@ -657,9 +661,13 @@ static int kvaser_usb_get_software_info(struct kvaser_usb *dev) switch (dev->family) { case KVASER_LEAF: dev->fw_version = le32_to_cpu(msg.u.leaf.softinfo.fw_version); + dev->max_tx_urbs = + le16_to_cpu(msg.u.leaf.softinfo.max_outstanding_tx); break; case KVASER_USBCAN: dev->fw_version = le32_to_cpu(msg.u.usbcan.softinfo.fw_version); + dev->max_tx_urbs = + le16_to_cpu(msg.u.usbcan.softinfo.max_outstanding_tx); break; } @@ -715,7 +723,7 @@ static void kvaser_usb_tx_acknowledge(const struct kvaser_usb *dev, stats = &priv->netdev->stats; - context = &priv->tx_contexts[tid % MAX_TX_URBS]; + context = &priv->tx_contexts[tid % dev->max_tx_urbs]; /* Sometimes the state change doesn't come after a bus-off event */ if (priv->can.restart_ms && @@ -744,7 +752,7 @@ static void kvaser_usb_tx_acknowledge(const struct kvaser_usb *dev, spin_lock_irqsave(&priv->tx_contexts_lock, flags); can_get_echo_skb(priv->netdev, context->echo_index); - context->echo_index = MAX_TX_URBS; + context->echo_index = dev->max_tx_urbs; --priv->active_tx_contexts; netif_wake_queue(priv->netdev); @@ -1512,11 +1520,13 @@ error: static void kvaser_usb_reset_tx_urb_contexts(struct kvaser_usb_net_priv *priv) { - int i; + int i, max_tx_urbs; + + max_tx_urbs = priv->dev->max_tx_urbs; priv->active_tx_contexts = 0; - for (i = 0; i < MAX_TX_URBS; i++) - priv->tx_contexts[i].echo_index = MAX_TX_URBS; + for (i = 0; i < max_tx_urbs; i++) + priv->tx_contexts[i].echo_index = max_tx_urbs; } /* This method might sleep. Do not call it in the atomic context @@ -1702,14 +1712,14 @@ static netdev_tx_t kvaser_usb_start_xmit(struct sk_buff *skb, *msg_tx_can_flags |= MSG_FLAG_REMOTE_FRAME; spin_lock_irqsave(&priv->tx_contexts_lock, flags); - for (i = 0; i < ARRAY_SIZE(priv->tx_contexts); i++) { - if (priv->tx_contexts[i].echo_index == MAX_TX_URBS) { + for (i = 0; i < dev->max_tx_urbs; i++) { + if (priv->tx_contexts[i].echo_index == dev->max_tx_urbs) { context = &priv->tx_contexts[i]; context->echo_index = i; can_put_echo_skb(skb, netdev, context->echo_index); ++priv->active_tx_contexts; - if (priv->active_tx_contexts >= MAX_TX_URBS) + if (priv->active_tx_contexts >= dev->max_tx_urbs) netif_stop_queue(netdev); break; @@ -1743,7 +1753,7 @@ static netdev_tx_t kvaser_usb_start_xmit(struct sk_buff *skb, spin_lock_irqsave(&priv->tx_contexts_lock, flags); can_free_echo_skb(netdev, context->echo_index); - context->echo_index = MAX_TX_URBS; + context->echo_index = dev->max_tx_urbs; --priv->active_tx_contexts; netif_wake_queue(netdev); @@ -1881,7 +1891,9 @@ static int kvaser_usb_init_one(struct usb_interface *intf, if (err) return err; - netdev = alloc_candev(sizeof(*priv), MAX_TX_URBS); + netdev = alloc_candev(sizeof(*priv) + + dev->max_tx_urbs * sizeof(*priv->tx_contexts), + dev->max_tx_urbs); if (!netdev) { dev_err(&intf->dev, "Cannot alloc candev\n"); return -ENOMEM; @@ -1928,7 +1940,7 @@ static int kvaser_usb_init_one(struct usb_interface *intf, return err; } - netdev_dbg(netdev, "device registered\n"); + netdev_info(netdev, "device registered\n"); return 0; } @@ -2009,6 +2021,13 @@ static int kvaser_usb_probe(struct usb_interface *intf, return err; } + dev_dbg(&intf->dev, "Firmware version: %d.%d.%d\n", + ((dev->fw_version >> 24) & 0xff), + ((dev->fw_version >> 16) & 0xff), + (dev->fw_version & 0xffff)); + + dev_dbg(&intf->dev, "Max oustanding tx = %d URBs\n", dev->max_tx_urbs); + err = kvaser_usb_get_card_info(dev); if (err) { dev_err(&intf->dev, @@ -2016,11 +2035,6 @@ static int kvaser_usb_probe(struct usb_interface *intf, return err; } - dev_dbg(&intf->dev, "Firmware version: %d.%d.%d\n", - ((dev->fw_version >> 24) & 0xff), - ((dev->fw_version >> 16) & 0xff), - (dev->fw_version & 0xffff)); - for (i = 0; i < dev->nchannels; i++) { err = kvaser_usb_init_one(intf, id, i); if (err) { -- 1.9.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v4 3/3] can: kvaser_usb: Use can-dev unregistration mechanism 2015-03-14 13:09 ` [PATCH v4 2/3] can: kvaser_usb: Utilize all possible tx URBs Ahmed S. Darwish @ 2015-03-14 13:11 ` Ahmed S. Darwish 2015-03-14 15:26 ` Marc Kleine-Budde 0 siblings, 1 reply; 15+ messages in thread From: Ahmed S. Darwish @ 2015-03-14 13:11 UTC (permalink / raw) To: Olivier Sobrie, Oliver Hartkopp, Wolfgang Grandegger, Marc Kleine-Budde, Andri Yngvason Cc: Linux-CAN, LKML, netdev From: Ahmed S. Darwish <ahmed.darwish@valeo.com> Use can-dev's unregister_candev() instead of directly calling networking unregister_netdev(). While both are functionally equivalent, unregister_candev() might do extra stuff in the future than just calling networking layer unregistration code. Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com> --- drivers/net/can/usb/kvaser_usb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/can/usb/kvaser_usb.c b/drivers/net/can/usb/kvaser_usb.c index 60eadf5..d44fb1e 100644 --- a/drivers/net/can/usb/kvaser_usb.c +++ b/drivers/net/can/usb/kvaser_usb.c @@ -1866,7 +1866,7 @@ static void kvaser_usb_remove_interfaces(struct kvaser_usb *dev) if (!dev->nets[i]) continue; - unregister_netdev(dev->nets[i]->netdev); + unregister_candev(dev->nets[i]->netdev); } kvaser_usb_unlink_all_urbs(dev); -- 1.9.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v4 3/3] can: kvaser_usb: Use can-dev unregistration mechanism 2015-03-14 13:11 ` [PATCH v4 3/3] can: kvaser_usb: Use can-dev unregistration mechanism Ahmed S. Darwish @ 2015-03-14 15:26 ` Marc Kleine-Budde 2015-03-14 15:41 ` Ahmed S. Darwish 0 siblings, 1 reply; 15+ messages in thread From: Marc Kleine-Budde @ 2015-03-14 15:26 UTC (permalink / raw) To: Ahmed S. Darwish, Olivier Sobrie, Oliver Hartkopp, Wolfgang Grandegger, Andri Yngvason Cc: Linux-CAN, LKML, netdev [-- Attachment #1: Type: text/plain, Size: 706 bytes --] On 03/14/2015 02:11 PM, Ahmed S. Darwish wrote: > From: Ahmed S. Darwish <ahmed.darwish@valeo.com> > > Use can-dev's unregister_candev() instead of directly calling > networking unregister_netdev(). While both are functionally > equivalent, unregister_candev() might do extra stuff in the > future than just calling networking layer unregistration code. Since 2 goes into can, I've applied this into can-next. Marc -- Pengutronix e.K. | Marc Kleine-Budde | Industrial Linux Solutions | Phone: +49-231-2826-924 | Vertretung West/Dortmund | Fax: +49-5121-206917-5555 | Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de | [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 3/3] can: kvaser_usb: Use can-dev unregistration mechanism 2015-03-14 15:26 ` Marc Kleine-Budde @ 2015-03-14 15:41 ` Ahmed S. Darwish 2015-03-14 15:55 ` Marc Kleine-Budde 0 siblings, 1 reply; 15+ messages in thread From: Ahmed S. Darwish @ 2015-03-14 15:41 UTC (permalink / raw) To: Marc Kleine-Budde Cc: Olivier Sobrie, Oliver Hartkopp, Wolfgang Grandegger, Andri Yngvason, Linux-CAN, LKML, netdev On Sat, Mar 14, 2015 at 04:26:56PM +0100, Marc Kleine-Budde wrote: > On 03/14/2015 02:11 PM, Ahmed S. Darwish wrote: > > From: Ahmed S. Darwish <ahmed.darwish@valeo.com> > > > > Use can-dev's unregister_candev() instead of directly calling > > networking unregister_netdev(). While both are functionally > > equivalent, unregister_candev() might do extra stuff in the > > future than just calling networking layer unregistration code. > > Since 2 goes into can, I've applied this into can-next. > Merci. Was this a cherry-pick? Because I was going to send a new series with patch #2 better worded, and with a new patch for the endiannes issue. Regards, Darwish ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 3/3] can: kvaser_usb: Use can-dev unregistration mechanism 2015-03-14 15:41 ` Ahmed S. Darwish @ 2015-03-14 15:55 ` Marc Kleine-Budde 2015-03-14 16:06 ` Ahmed S. Darwish 0 siblings, 1 reply; 15+ messages in thread From: Marc Kleine-Budde @ 2015-03-14 15:55 UTC (permalink / raw) To: Ahmed S. Darwish Cc: Olivier Sobrie, Oliver Hartkopp, Wolfgang Grandegger, Andri Yngvason, Linux-CAN, LKML, netdev [-- Attachment #1: Type: text/plain, Size: 1785 bytes --] On 03/14/2015 04:41 PM, Ahmed S. Darwish wrote: > On Sat, Mar 14, 2015 at 04:26:56PM +0100, Marc Kleine-Budde wrote: >> On 03/14/2015 02:11 PM, Ahmed S. Darwish wrote: >>> From: Ahmed S. Darwish <ahmed.darwish@valeo.com> >>> >>> Use can-dev's unregister_candev() instead of directly calling >>> networking unregister_netdev(). While both are functionally >>> equivalent, unregister_candev() might do extra stuff in the >>> future than just calling networking layer unregistration code. >> >> Since 2 goes into can, I've applied this into can-next. > Was this a cherry-pick? Because I was going to send a new > series with patch #2 better worded, and with a new patch for > the endiannes issue. Yes, no need to resend patch #3, as it's already applied to can-next. regards, Marc 1 = can: kvaser_usb: Fix tx queue start/stop race conditions 2 = can: kvaser_usb: Utilize all possible tx URBs 3 = can: kvaser_usb: Use can-dev unregistration mechanism 4 = the endianess issue 1 = is in linux-can and included in linux-can-fixes-for-4.0-20150314 2 = will go into linux-can with a better commit message which is currently prepare by you will be in the next pull request for net 3 = is in linux-can-next and will be included in the next pull request for net-next 4 = is currently prepared by you will be in the next pull request for net This means, you'll send me patches 2 and 4 in a new v5 series. (This patches will of course have new numbers, 1 and 2.) -- Pengutronix e.K. | Marc Kleine-Budde | Industrial Linux Solutions | Phone: +49-231-2826-924 | Vertretung West/Dortmund | Fax: +49-5121-206917-5555 | Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de | [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 3/3] can: kvaser_usb: Use can-dev unregistration mechanism 2015-03-14 15:55 ` Marc Kleine-Budde @ 2015-03-14 16:06 ` Ahmed S. Darwish 0 siblings, 0 replies; 15+ messages in thread From: Ahmed S. Darwish @ 2015-03-14 16:06 UTC (permalink / raw) To: Marc Kleine-Budde Cc: Olivier Sobrie, Oliver Hartkopp, Wolfgang Grandegger, Andri Yngvason, Linux-CAN, LKML, netdev On Sat, Mar 14, 2015 at 04:55:11PM +0100, Marc Kleine-Budde wrote: > On 03/14/2015 04:41 PM, Ahmed S. Darwish wrote: > > On Sat, Mar 14, 2015 at 04:26:56PM +0100, Marc Kleine-Budde wrote: > >> On 03/14/2015 02:11 PM, Ahmed S. Darwish wrote: > >>> From: Ahmed S. Darwish <ahmed.darwish@valeo.com> > >>> > >>> Use can-dev's unregister_candev() instead of directly calling > >>> networking unregister_netdev(). While both are functionally > >>> equivalent, unregister_candev() might do extra stuff in the > >>> future than just calling networking layer unregistration code. > >> > >> Since 2 goes into can, I've applied this into can-next. > > > Was this a cherry-pick? Because I was going to send a new > > series with patch #2 better worded, and with a new patch for > > the endiannes issue. > > Yes, no need to resend patch #3, as it's already applied to can-next. > > regards, > Marc > > 1 = can: kvaser_usb: Fix tx queue start/stop race conditions > 2 = can: kvaser_usb: Utilize all possible tx URBs > 3 = can: kvaser_usb: Use can-dev unregistration mechanism > 4 = the endianess issue > > 1 = is in linux-can and included in linux-can-fixes-for-4.0-20150314 > 2 = will go into linux-can with a better commit message > which is currently prepare by you > will be in the next pull request for net > 3 = is in linux-can-next and will be included in the next pull request > for net-next > 4 = is currently prepared by you > will be in the next pull request for net > > This means, you'll send me patches 2 and 4 in a new v5 series. (This > patches will of course have new numbers, 1 and 2.) > A piece of cake :D ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 1/3] can: kvaser_usb: Fix tx queue start/stop race conditions 2015-03-14 13:02 ` [PATCH v4 " Ahmed S. Darwish 2015-03-14 13:09 ` [PATCH v4 2/3] can: kvaser_usb: Utilize all possible tx URBs Ahmed S. Darwish @ 2015-03-14 13:41 ` Marc Kleine-Budde 2015-03-14 14:02 ` according net pull-request - was " Oliver Hartkopp 2015-03-14 14:38 ` Ahmed S. Darwish 1 sibling, 2 replies; 15+ messages in thread From: Marc Kleine-Budde @ 2015-03-14 13:41 UTC (permalink / raw) To: Ahmed S. Darwish, Olivier Sobrie, Oliver Hartkopp, Wolfgang Grandegger, Andri Yngvason Cc: Linux-CAN, LKML, netdev [-- Attachment #1: Type: text/plain, Size: 1683 bytes --] On 03/14/2015 02:02 PM, Ahmed S. Darwish wrote: > From: Ahmed S. Darwish <ahmed.darwish@valeo.com> > > A number of tx queue wake-up events went missing due to the > outlined scenario below. Start state is a pool of 16 tx URBs, > active tx_urbs count = 15, with the netdev tx queue open. > > CPU #1 [softirq] CPU #2 [softirq] > start_xmit() tx_acknowledge() > ................ ................ > > atomic_inc(&tx_urbs); > if (atomic_read(&tx_urbs) >= 16) { > --> > atomic_dec(&tx_urbs); > netif_wake_queue(); > return; > <-- > netif_stop_queue(); > } > > At the end, the correct state expected is a 15 tx_urbs count > value with the tx queue state _open_. Due to the race, we get > the same tx_urbs value but with the tx queue state _stopped_. > The wake-up event is completely lost. > > Thus avoid hand-rolled concurrency mechanisms and use a proper > lock for contexts and tx queue protection. > > Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com> Applied to can. This will go into David's net tree and finally into net-next. Then I'll apply patches 2+3. Nag me, if I forget about them ;) Thanks, Marc -- Pengutronix e.K. | Marc Kleine-Budde | Industrial Linux Solutions | Phone: +49-231-2826-924 | Vertretung West/Dortmund | Fax: +49-5121-206917-5555 | Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de | [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* according net pull-request - was Re: [PATCH v4 1/3] can: kvaser_usb: Fix tx queue start/stop race conditions 2015-03-14 13:41 ` [PATCH v4 1/3] can: kvaser_usb: Fix tx queue start/stop race conditions Marc Kleine-Budde @ 2015-03-14 14:02 ` Oliver Hartkopp 2015-03-14 14:15 ` Marc Kleine-Budde 2015-03-14 14:38 ` Ahmed S. Darwish 1 sibling, 1 reply; 15+ messages in thread From: Oliver Hartkopp @ 2015-03-14 14:02 UTC (permalink / raw) To: Marc Kleine-Budde, Stephane Grosjean; +Cc: Linux-CAN, netdev Hi Marc, On 03/14/2015 02:41 PM, Marc Kleine-Budde wrote: > Applied to can. This will go into David's net tree and finally into > net-next. Then I'll apply patches 2+3. Nag me, if I forget about them ;) Stephane sent an oot-driver to me for the new CAN FD USB adapter driver. I splitted up this driver into three patches for a review from Stephane. I would suggest to wait until Tuesday so that we can add these three patches. I tested the patches and they work as expected. But Stephane should take a final look at them. Would it make sense to post the splitted patches on linux-can right now? Regards, Oliver ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: according net pull-request - was Re: [PATCH v4 1/3] can: kvaser_usb: Fix tx queue start/stop race conditions 2015-03-14 14:02 ` according net pull-request - was " Oliver Hartkopp @ 2015-03-14 14:15 ` Marc Kleine-Budde 0 siblings, 0 replies; 15+ messages in thread From: Marc Kleine-Budde @ 2015-03-14 14:15 UTC (permalink / raw) To: Oliver Hartkopp, Stephane Grosjean; +Cc: Linux-CAN, netdev [-- Attachment #1: Type: text/plain, Size: 1206 bytes --] On 03/14/2015 03:02 PM, Oliver Hartkopp wrote: > Hi Marc, > > On 03/14/2015 02:41 PM, Marc Kleine-Budde wrote: >> Applied to can. This will go into David's net tree and finally into >> net-next. Then I'll apply patches 2+3. Nag me, if I forget about them ;) > > Stephane sent an oot-driver to me for the new CAN FD USB adapter driver. > I splitted up this driver into three patches for a review from Stephane. > > I would suggest to wait until Tuesday so that we can add these three patches. You mean for the pull request? I'll be quite busy until Thursday, for #reasons, maybe the whole coming week. So I'll send this pull request and make another one after Stephane's review and of next week. > I tested the patches and they work as expected. But Stephane should take a > final look at them. > > Would it make sense to post the splitted patches on linux-can right now? Sure - Fire at will. Marc -- Pengutronix e.K. | Marc Kleine-Budde | Industrial Linux Solutions | Phone: +49-231-2826-924 | Vertretung West/Dortmund | Fax: +49-5121-206917-5555 | Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de | [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 1/3] can: kvaser_usb: Fix tx queue start/stop race conditions 2015-03-14 13:41 ` [PATCH v4 1/3] can: kvaser_usb: Fix tx queue start/stop race conditions Marc Kleine-Budde 2015-03-14 14:02 ` according net pull-request - was " Oliver Hartkopp @ 2015-03-14 14:38 ` Ahmed S. Darwish 2015-03-14 14:58 ` Marc Kleine-Budde 1 sibling, 1 reply; 15+ messages in thread From: Ahmed S. Darwish @ 2015-03-14 14:38 UTC (permalink / raw) To: Marc Kleine-Budde Cc: Olivier Sobrie, Oliver Hartkopp, Wolfgang Grandegger, Andri Yngvason, Linux-CAN, LKML, netdev Hi Marc, On Sat, Mar 14, 2015 at 02:41:18PM +0100, Marc Kleine-Budde wrote: > On 03/14/2015 02:02 PM, Ahmed S. Darwish wrote: > > From: Ahmed S. Darwish <ahmed.darwish@valeo.com> > > > > A number of tx queue wake-up events went missing due to the > > outlined scenario below. Start state is a pool of 16 tx URBs, > > active tx_urbs count = 15, with the netdev tx queue open. > > > > CPU #1 [softirq] CPU #2 [softirq] > > start_xmit() tx_acknowledge() > > ................ ................ > > > > atomic_inc(&tx_urbs); > > if (atomic_read(&tx_urbs) >= 16) { > > --> > > atomic_dec(&tx_urbs); > > netif_wake_queue(); > > return; > > <-- > > netif_stop_queue(); > > } > > > > At the end, the correct state expected is a 15 tx_urbs count > > value with the tx queue state _open_. Due to the race, we get > > the same tx_urbs value but with the tx queue state _stopped_. > > The wake-up event is completely lost. > > > > Thus avoid hand-rolled concurrency mechanisms and use a proper > > lock for contexts and tx queue protection. > > > > Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com> > > Applied to can. This will go into David's net tree and finally into > net-next. Then I'll apply patches 2+3. Nag me, if I forget about them ;) > Thanks! :-) So if I've understood correctly, this patch will go to -rc5 and the rest will go into net-next? If so, IMHO patch #2 should also go to -rc5. I know it's usually frowned up on to add further patches at this late -rc stage, but here's my logic: The original driver code just _arbitrarily_ assumed a MAX_TX_URB value of 16 parallel transmissions. This value was chosen, it seems, because the driver was heavily based on esd_usb2.c and the esd code just did so :-( Meanwhile, in the Kvaser hardware at hand, if I've increased the driver's max parallel transmissions little above the recommended limit reported by firmware, the firmware breaks up badly, reports a massive list of internal errors, and the candump traces becomes sort of an internal mess hardly related to the actual frames sent and received. In my case, I was lucky that the brand we own here (*) had a higher max outstanding transmissions value than 16. But if there is hardware out there with a max oustanding tx support < 16 (#), such hardware will break badly under a heavy transmission load :-( (*) http://www.kvaser.com/products/kvaser-usb-hs-ii-hsls/ (#) There are a huge list of Kvaser products having the same controller but with different performance metrics, so this is quite a possiblity. Thanks, Darwish ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 1/3] can: kvaser_usb: Fix tx queue start/stop race conditions 2015-03-14 14:38 ` Ahmed S. Darwish @ 2015-03-14 14:58 ` Marc Kleine-Budde 2015-03-14 15:19 ` Ahmed S. Darwish 0 siblings, 1 reply; 15+ messages in thread From: Marc Kleine-Budde @ 2015-03-14 14:58 UTC (permalink / raw) To: Ahmed S. Darwish Cc: Olivier Sobrie, Oliver Hartkopp, Wolfgang Grandegger, Andri Yngvason, Linux-CAN, LKML, netdev [-- Attachment #1: Type: text/plain, Size: 2196 bytes --] On 03/14/2015 03:38 PM, Ahmed S. Darwish wrote: >> Applied to can. This will go into David's net tree and finally into >> net-next. Then I'll apply patches 2+3. Nag me, if I forget about them ;) >> > > Thanks! :-) > > So if I've understood correctly, this patch will go to -rc5 and > the rest will go into net-next? > > If so, IMHO patch #2 should also go to -rc5. I know it's usually > frowned up on to add further patches at this late -rc stage, but > here's my logic: > > The original driver code just _arbitrarily_ assumed a MAX_TX_URB > value of 16 parallel transmissions. This value was chosen, it seems, > because the driver was heavily based on esd_usb2.c and the esd code > just did so :-( > > Meanwhile, in the Kvaser hardware at hand, if I've increased the > driver's max parallel transmissions little above the recommended > limit reported by firmware, the firmware breaks up badly, reports a > massive list of internal errors, and the candump traces becomes > sort of an internal mess hardly related to the actual frames sent > and received. In this particular hardware, what's the limit as reported by the firmware? > In my case, I was lucky that the brand we own here (*) had a higher > max outstanding transmissions value than 16. But if there is hardware Okay - higher. > out there with a max oustanding tx support < 16 (#), such hardware > will break badly under a heavy transmission load :-( I see. If you add this motivation to the patch description and let the subject reflect that this is a "fix" or safety measure rather than a possible performance improvement, no-one will say anything against this patch :D > (*) http://www.kvaser.com/products/kvaser-usb-hs-ii-hsls/ > > (#) There are a huge list of Kvaser products having the same controller > but with different performance metrics, so this is quite a > possiblity. Marc -- Pengutronix e.K. | Marc Kleine-Budde | Industrial Linux Solutions | Phone: +49-231-2826-924 | Vertretung West/Dortmund | Fax: +49-5121-206917-5555 | Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de | [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 1/3] can: kvaser_usb: Fix tx queue start/stop race conditions 2015-03-14 14:58 ` Marc Kleine-Budde @ 2015-03-14 15:19 ` Ahmed S. Darwish 0 siblings, 0 replies; 15+ messages in thread From: Ahmed S. Darwish @ 2015-03-14 15:19 UTC (permalink / raw) To: Marc Kleine-Budde Cc: Olivier Sobrie, Oliver Hartkopp, Wolfgang Grandegger, Andri Yngvason, Linux-CAN, LKML, netdev On Sat, Mar 14, 2015 at 03:58:39PM +0100, Marc Kleine-Budde wrote: > On 03/14/2015 03:38 PM, Ahmed S. Darwish wrote: > >> Applied to can. This will go into David's net tree and finally into > >> net-next. Then I'll apply patches 2+3. Nag me, if I forget about them ;) > >> > > > > Thanks! :-) > > > > So if I've understood correctly, this patch will go to -rc5 and > > the rest will go into net-next? > > > > If so, IMHO patch #2 should also go to -rc5. I know it's usually > > frowned up on to add further patches at this late -rc stage, but > > here's my logic: > > > > The original driver code just _arbitrarily_ assumed a MAX_TX_URB > > value of 16 parallel transmissions. This value was chosen, it seems, > > because the driver was heavily based on esd_usb2.c and the esd code > > just did so :-( > > > > Meanwhile, in the Kvaser hardware at hand, if I've increased the > > driver's max parallel transmissions little above the recommended > > limit reported by firmware, the firmware breaks up badly, reports a > > massive list of internal errors, and the candump traces becomes > > sort of an internal mess hardly related to the actual frames sent > > and received. > > In this particular hardware, what's the limit as reported by the firmware? > 48 max oustanding tx, which is quite big in itself it seems. Other drivers in the tree range between 10 (Peak) and 20 tx (8dev). > > In my case, I was lucky that the brand we own here (*) had a higher > > max outstanding transmissions value than 16. But if there is hardware > > Okay - higher. > > > out there with a max oustanding tx support < 16 (#), such hardware > > will break badly under a heavy transmission load :-( > > I see. > > If you add this motivation to the patch description and let the subject > reflect that this is a "fix" or safety measure rather than a possible > performance improvement, no-one will say anything against this patch :D > True; I admit the "fix" part should've been clearer :-) Will send a better worded commit message then. Thanks a lot, Darwish ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2015-03-14 16:06 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20150226152011.GA6075@linux>
[not found] ` <20150226152202.GB6075@linux>
[not found] ` <20150226152436.GC6075@linux>
[not found] ` <20150226152558.GD6075@linux>
2015-02-26 15:29 ` [PATCH 5/5] can: kvaser_usb: Fix tx queue start/stop race conditions Ahmed S. Darwish
[not found] ` <20150311173744.GA13095@linux>
[not found] ` <5500B6EB.8050905@pengutronix.de>
2015-03-12 19:30 ` [PATCH v3 1/3] " Ahmed S. Darwish
2015-03-14 13:02 ` [PATCH v4 " Ahmed S. Darwish
2015-03-14 13:09 ` [PATCH v4 2/3] can: kvaser_usb: Utilize all possible tx URBs Ahmed S. Darwish
2015-03-14 13:11 ` [PATCH v4 3/3] can: kvaser_usb: Use can-dev unregistration mechanism Ahmed S. Darwish
2015-03-14 15:26 ` Marc Kleine-Budde
2015-03-14 15:41 ` Ahmed S. Darwish
2015-03-14 15:55 ` Marc Kleine-Budde
2015-03-14 16:06 ` Ahmed S. Darwish
2015-03-14 13:41 ` [PATCH v4 1/3] can: kvaser_usb: Fix tx queue start/stop race conditions Marc Kleine-Budde
2015-03-14 14:02 ` according net pull-request - was " Oliver Hartkopp
2015-03-14 14:15 ` Marc Kleine-Budde
2015-03-14 14:38 ` Ahmed S. Darwish
2015-03-14 14:58 ` Marc Kleine-Budde
2015-03-14 15:19 ` Ahmed S. Darwish
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).