From: Lino Sanfilippo <LinoSanfilippo-Mmb7MZpHnFY@public.gmane.org>
To: Shuyu Wei <wsy2220-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: heiko-4mtYJXux2i+zQB+pC5nmwQ@public.gmane.org,
al.kochet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-rockchip-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
Francois Romieu <romieu-W8zweXLXuWQS+FvcfC7Uqw@public.gmane.org>,
David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>,
wxt-TNX95d0MmH7DzftRWevZcw@public.gmane.org
Subject: Re: [PATCH v2] ethernet:arc: Fix racing of TX ring buffer
Date: Sun, 22 May 2016 13:30:27 +0200 [thread overview]
Message-ID: <57419853.9050701@gmx.de> (raw)
In-Reply-To: <20160522091742.GA8681@debian-dorm>
On 22.05.2016 11:17, Shuyu Wei wrote:
> Hi Lino,
>
> I tested this patch, it still got panic under stress.
> Just wget 2 large files simultaneously and it failed.
>
> Looks like the problem comes from the if statement in tx_clean().
> I changed your patch to use
>
> - if (info & FOR_EMAC)
> + if ((info & FOR_EMAC) || !txbd->data || !skb)
>
> and it worked.
Thanks for testing. However that extra check for skb not being NULL should not be
necessary if the code were correct. The changes I suggested were all about having
skb and info consistent with txbd_curr.
But I just realized that there is still a big flaw in the last changes. While
tx() looks correct now (we first set up the descriptor and assign the skb and _then_
advance txbd_curr) tx_clean still is not:
We _first_ have to read tx_curr and _then_ read the corresponding descriptor and its skb.
(The last patch implemented just the reverse - and thus wrong - order, first get skb and
descriptor and then read tx_curr).
So the patch below hopefully handles also tx_clean correctly. Could you please do once more a test
with this one?
>
> After further test, my patch to barrier timestamp() didn't work.
> Just like the original code in the tree, the emac still got stuck under
> high load, even if I changed the smp_wmb() to dma_wmb(). So the original
> code do have race somewhere.
So to make this clear: with the current code in net-next you still see a problem (lockup), right?
>
> I'm new to kernel development, and still trying to understand how memory
> barrier works
Its an interresting topic and thats what I am trying to understand, too :)
> ... and why Francois' fix worked. Please be patient with me :-).
So which fix(es) exactly work for you and solve your lockup issue?
--- a/drivers/net/ethernet/arc/emac_main.c
+++ b/drivers/net/ethernet/arc/emac_main.c
@@ -159,12 +159,22 @@ static void arc_emac_tx_clean(struct net_device *ndev)
unsigned int *txbd_dirty = &priv->txbd_dirty;
struct arc_emac_bd *txbd = &priv->txbd[*txbd_dirty];
struct buffer_state *tx_buff = &priv->tx_buff[*txbd_dirty];
- struct sk_buff *skb = tx_buff->skb;
unsigned int info = le32_to_cpu(txbd->info);
+ struct sk_buff *skb;
- if ((info & FOR_EMAC) || !txbd->data || !skb)
+ if (*txbd_dirty == priv->txbd_curr)
break;
+ /* Make sure curr pointer is consistent with info */
+ rmb();
+
+ info = le32_to_cpu(txbd->info);
+
+ if (info & FOR_EMAC)
+ break;
+
+ skb = tx_buff->skb;
+
if (unlikely(info & (DROP | DEFR | LTCL | UFLO))) {
stats->tx_errors++;
stats->tx_dropped++;
@@ -195,8 +205,8 @@ static void arc_emac_tx_clean(struct net_device *ndev)
*txbd_dirty = (*txbd_dirty + 1) % TX_BD_NUM;
}
- /* Ensure that txbd_dirty is visible to tx() before checking
- * for queue stopped.
+ /* Ensure that txbd_dirty is visible to tx() and we see the most recent
+ * value for txbd_curr.
*/
smp_mb();
@@ -680,35 +690,29 @@ static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev)
dma_unmap_len_set(&priv->tx_buff[*txbd_curr], len, len);
priv->txbd[*txbd_curr].data = cpu_to_le32(addr);
-
- /* Make sure pointer to data buffer is set */
- wmb();
+ priv->tx_buff[*txbd_curr].skb = skb;
skb_tx_timestamp(skb);
*info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
- /* Make sure info word is set */
+ /* 1. Make sure that with respect to tx_clean everything is set up
+ * properly before we advance txbd_curr.
+ * 2. Make sure writes to DMA descriptors are completed before we inform
+ * the hardware.
+ */
wmb();
- priv->tx_buff[*txbd_curr].skb = skb;
-
/* Increment index to point to the next BD */
*txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;
- /* Ensure that tx_clean() sees the new txbd_curr before
- * checking the queue status. This prevents an unneeded wake
- * of the queue in tx_clean().
+ /* Ensure we see the most recent value of txbd_dirty and tx_clean() sees
+ * the updated value of txbd_curr.
*/
smp_mb();
- if (!arc_emac_tx_avail(priv)) {
+ if (!arc_emac_tx_avail(priv))
netif_stop_queue(ndev);
- /* Refresh tx_dirty */
- smp_mb();
- if (arc_emac_tx_avail(priv))
- netif_start_queue(ndev);
- }
arc_reg_set(priv, R_STATUS, TXPL_MASK);
--
1.9.1
next prev parent reply other threads:[~2016-05-22 11:30 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-17 15:25 [PATCH v2] ethernet:arc: Fix racing of TX ring buffer Shuyu Wei
2016-05-17 16:36 ` Aw: " Lino Sanfilippo
2016-05-17 18:24 ` David Miller
[not found] ` <20160517.142456.2247845107325931733.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2016-05-18 0:01 ` Francois Romieu
[not found] ` <20160518000153.GA21757-lmTtMILVy1jWQcoT9B9Ug5SCg42XY1Uw0E9HWUfgJXw@public.gmane.org>
2016-05-18 20:29 ` Lino Sanfilippo
[not found] ` <573CD09D.1060307-Mmb7MZpHnFY@public.gmane.org>
2016-05-18 22:55 ` Francois Romieu
[not found] ` <20160518225529.GA18671-lmTtMILVy1jWQcoT9B9Ug5SCg42XY1Uw0E9HWUfgJXw@public.gmane.org>
2016-05-19 21:15 ` Lino Sanfilippo
[not found] ` <573E2D0C.604-Mmb7MZpHnFY@public.gmane.org>
2016-05-20 0:31 ` Francois Romieu
2016-05-21 16:09 ` Shuyu Wei
2016-05-21 19:47 ` Francois Romieu
[not found] ` <20160521194733.GA30557-lmTtMILVy1jWQcoT9B9Ug5SCg42XY1Uw0E9HWUfgJXw@public.gmane.org>
2016-05-21 23:04 ` Lino Sanfilippo
[not found] ` <5740E98A.5050803-Mmb7MZpHnFY@public.gmane.org>
2016-05-22 21:21 ` Francois Romieu
2016-05-21 22:58 ` Lino Sanfilippo
2016-05-22 9:17 ` Shuyu Wei
2016-05-22 11:30 ` Lino Sanfilippo [this message]
[not found] ` <57419853.9050701-Mmb7MZpHnFY@public.gmane.org>
2016-05-22 22:36 ` Francois Romieu
[not found] ` <20160522223659.GB5086-lmTtMILVy1jWQcoT9B9Ug5SCg42XY1Uw0E9HWUfgJXw@public.gmane.org>
2016-05-24 1:09 ` Lino Sanfilippo
[not found] ` <5743A9DD.8010202-Mmb7MZpHnFY@public.gmane.org>
2016-05-24 19:02 ` Francois Romieu
2016-05-24 23:56 ` Lino Sanfilippo
2016-05-28 6:43 ` Shuyu Wei
2016-05-30 21:41 ` Lino Sanfilippo
2016-06-05 14:02 ` Shuyu Wei
2016-06-08 7:54 ` Lino Sanfilippo
2016-05-23 11:36 ` Shuyu Wei
2016-05-24 1:14 ` Lino Sanfilippo
2016-05-21 13:46 ` Shuyu Wei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=57419853.9050701@gmx.de \
--to=linosanfilippo-mmb7mzphnfy@public.gmane.org \
--cc=al.kochet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
--cc=davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org \
--cc=heiko-4mtYJXux2i+zQB+pC5nmwQ@public.gmane.org \
--cc=linux-rockchip-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org \
--cc=netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=romieu-W8zweXLXuWQS+FvcfC7Uqw@public.gmane.org \
--cc=wsy2220-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
--cc=wxt-TNX95d0MmH7DzftRWevZcw@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.