netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Commit 05cf0d1bf4 ("net: stmmac: free an skb first when there are no longer any descriptors using it") breaks stmmac?
@ 2017-11-27 14:41 Jose Abreu
  2017-11-30  3:51 ` Niklas Cassel
  0 siblings, 1 reply; 7+ messages in thread
From: Jose Abreu @ 2017-11-27 14:41 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: Joao Pinto, linux-netdev, Giuseppe CAVALLARO

Hi Niklas,

I think your commit 05cf0d1bf4 ("net: stmmac: free an skb first
when there are no longer any descriptors using it") is breaking
stmmac driver in multi-queue configuration (this stacktrace may
contain some extra characters as I was using serial port):

------------->8-------------
general protection fault: 0000 [#1] SMP
Modules linked in: stmmac_pci stmmac libphy igb ptp pps_core
x86_pkg_temp_thermal
CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W       4.14.0-rc5 #2
Hardware name: Default string Default string/SKYBAY, BIOS 5.0.1.1
10/06/2016
task: ffffa2fe14d8b100 task.stack: ffffb8c6000b8000
RIP: 0010:skb_release_data+0x66/0x110
RSP: 0018:ffffa2fe2dd43d98 EFLAGS: 00010206
RAX: 0000000000000030 RBX: ffffa2fe13fab100 RCX: 00000000000005aa
RDX: ffffa2fe12a50000 RSI: 0000000000000000 RDI: fffcfffdfffbfffc
RBP: ffffa2fe2dd43db0 R08: ffffa2fe2dfcd000 R09: 0000000000000001
R10: ffffffffa06245d0 R11: ffffa2fe14c03700 R12: 0000000000000000
R13: ffffa2fe11e686c0 R14: ffffa2fe13fab100 R15: ffffa2fe129b8940
FS:  0000000000000000(0000) GS:ffffa2fe2dd40000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe26c457000 CR3: 000000002b609003 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <IRQ>
 skb_release_all+0x1f/0x30
 consume_skb+0x1d/0x40
 __dev_kfree_skb_any+0x2a/0x30
 stmmac_tx_clean+0x230/0x4d0 [stmmac]
 stmmac_poll+0x4b/0x980 [stmmac]
 net_rx_action+0x1ad/0x290
 __do_softirq+0xdd/0x1d6
 irq_exit+0x77/0x80
 do_IRQ+0x4a/0xc0
 common_interrupt+0x93/0x93
 </IRQ>
RIP: 0010:cpuidle_enter_state+0x16a/0x210
RSP: 0018:ffffb8c6000bbe90 EFLAGS: 00000286 ORIG_RAX:
ffffffffffffffae
RAX: ffffa2fe2dd575c0 RBX: ffffa2fe14560200 RCX: 000000000000001f
RDX: 0000000000000000 RSI: 00000122dbd7ae06 RDI: 0000000000000000
RBP: ffffb8c6000bbec0 R08: 0000000000000020 R09: 0000000000000002
R10: ffffb8c6000bbe60 R11: 0000000000123400 R12: 0000004f3c11b47a
R13: 0000000000000003 R14: ffffffffa0e3aa58 R15: 0000000000000003
 cpuidle_enter+0x12/0x20
 call_cpuidle+0x1e/0x40
 do_idle+0x16a/0x1c0
 cpu_startup_entry+0x18/0x20
 start_secondary+0x10d/0x110
 secondary_startup_64+0xa5/0xa5
------------->8-------------

Using tree with your commit I get this stacktrace upon streaming
data at random time (stacktrace does not appear everytime),
without the commit I get no errors.

I understand the reason for your commit but we already have
dirty_tx in stmmac_tx_clean(), shouldn't it be enough to prevent
skb use-after-free? Can you please review your patch to make sure
nothing else is missing?

BTW, this was not *officially* a git bissect, but I only reverted
your commit and it is working fine now.

Best Regards,
Jose Miguel Abreu

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Commit 05cf0d1bf4 ("net: stmmac: free an skb first when there are no longer any descriptors using it") breaks stmmac?
  2017-11-27 14:41 Commit 05cf0d1bf4 ("net: stmmac: free an skb first when there are no longer any descriptors using it") breaks stmmac? Jose Abreu
@ 2017-11-30  3:51 ` Niklas Cassel
  2017-11-30 16:50   ` Jose Abreu
  0 siblings, 1 reply; 7+ messages in thread
From: Niklas Cassel @ 2017-11-30  3:51 UTC (permalink / raw)
  To: Jose Abreu; +Cc: Joao Pinto, linux-netdev, Giuseppe CAVALLARO

On Mon, Nov 27, 2017 at 02:41:00PM +0000, Jose Abreu wrote:
> Hi Niklas,

Hello Jose,

> 
> I think your commit 05cf0d1bf4 ("net: stmmac: free an skb first
> when there are no longer any descriptors using it") is breaking
> stmmac driver in multi-queue configuration (this stacktrace may
> contain some extra characters as I was using serial port):
> 
> ------------->8-------------
> general protection fault: 0000 [#1] SMP
> Modules linked in: stmmac_pci stmmac libphy igb ptp pps_core
> x86_pkg_temp_thermal
> CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W       4.14.0-rc5 #2
> Hardware name: Default string Default string/SKYBAY, BIOS 5.0.1.1
> 10/06/2016
> task: ffffa2fe14d8b100 task.stack: ffffb8c6000b8000
> RIP: 0010:skb_release_data+0x66/0x110
> RSP: 0018:ffffa2fe2dd43d98 EFLAGS: 00010206
> RAX: 0000000000000030 RBX: ffffa2fe13fab100 RCX: 00000000000005aa
> RDX: ffffa2fe12a50000 RSI: 0000000000000000 RDI: fffcfffdfffbfffc
> RBP: ffffa2fe2dd43db0 R08: ffffa2fe2dfcd000 R09: 0000000000000001
> R10: ffffffffa06245d0 R11: ffffa2fe14c03700 R12: 0000000000000000
> R13: ffffa2fe11e686c0 R14: ffffa2fe13fab100 R15: ffffa2fe129b8940
> FS:  0000000000000000(0000) GS:ffffa2fe2dd40000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fe26c457000 CR3: 000000002b609003 CR4: 00000000003606e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <IRQ>
>  skb_release_all+0x1f/0x30
>  consume_skb+0x1d/0x40
>  __dev_kfree_skb_any+0x2a/0x30
>  stmmac_tx_clean+0x230/0x4d0 [stmmac]
>  stmmac_poll+0x4b/0x980 [stmmac]
>  net_rx_action+0x1ad/0x290
>  __do_softirq+0xdd/0x1d6
>  irq_exit+0x77/0x80
>  do_IRQ+0x4a/0xc0
>  common_interrupt+0x93/0x93
>  </IRQ>
> RIP: 0010:cpuidle_enter_state+0x16a/0x210
> RSP: 0018:ffffb8c6000bbe90 EFLAGS: 00000286 ORIG_RAX:
> ffffffffffffffae
> RAX: ffffa2fe2dd575c0 RBX: ffffa2fe14560200 RCX: 000000000000001f
> RDX: 0000000000000000 RSI: 00000122dbd7ae06 RDI: 0000000000000000
> RBP: ffffb8c6000bbec0 R08: 0000000000000020 R09: 0000000000000002
> R10: ffffb8c6000bbe60 R11: 0000000000123400 R12: 0000004f3c11b47a
> R13: 0000000000000003 R14: ffffffffa0e3aa58 R15: 0000000000000003
>  cpuidle_enter+0x12/0x20
>  call_cpuidle+0x1e/0x40
>  do_idle+0x16a/0x1c0
>  cpu_startup_entry+0x18/0x20
>  start_secondary+0x10d/0x110
>  secondary_startup_64+0xa5/0xa5
> ------------->8-------------
> 
> Using tree with your commit I get this stacktrace upon streaming
> data at random time (stacktrace does not appear everytime),
> without the commit I get no errors.
> 
> I understand the reason for your commit but we already have
> dirty_tx in stmmac_tx_clean(), shouldn't it be enough to prevent
> skb use-after-free? Can you please review your patch to make sure
> nothing else is missing?

Dirty is not enough.

The problem that I thought I solved,
was if an single skb was spread out over
3 tx descs like this:

TX DESC #1: first descriptor bit set
TX DESC #2: 
TX DESC #3: last descriptor bit set

before my patch the
tx_q->tx_skbuff[first_entry] would have the skb pointer.

let's say that curr_tx is pointing to TX DESC4,
since curr_tx is supposed to point to the place where
we are going to start filling next time.

The problem I thought I solved was if stmmac_tx_clean
came and started cleaning, it could clean TX DESC #1,
and thus freeing the skb (which TX DESC #2 and TX DESC #3 uses.)

However, by reading the databook more carefully:
If an Ethernet packet is stored over data buffers in multiple
descriptors, the DMA will fetch all descriptors, and
will only update the own bit in all descriptors,
after the complete packet has been sent.

Because of this, I think that the case where
TX DESC #1 gets its own bit cleared, while the hardware
hasn't yet fetched the TX DESC #2 and TX DESC #3 cannot
happen, so my patch 05cf0d1bf4 ("net: stmmac: free an skb first
when there are no longer any descriptors using it")
has no real purpose, and could be reverted.



However, I don't see why is should cause your
kernel to crash.

It looks like skb_release_data is trying to access a member in a pointer
that is null, i.e. a double free of an skb.

I assume this is because the same skb address exists in several queues.

My guess is that the problem is that tx_q->tx_skbuff[] does not get set to
NULL properly when having multiple queues. (It obviously works for a single
tx queue).

stmmac_tx_clean() will do:
if (skb)
    tx_q->tx_skbuff[entry] = NULL;

so skbs should always be set to NULL.

However, one difference seems to be that stmmac_xmit and stmmac_tso_xmit
inside the nfrags for-loop, seems to set the skb pointer to NULL.

for (i = 0; i < nfrags; i++) {
    tx_q->tx_skbuff[entry] = NULL;

Considering this is already done in stmmac_tx_clean, this seems redundant.

But one difference is that before my patch,
all frags, except first TX DESC, would get NULL:ed
(first desc would be assigned the skb).

After my patch, all descs, except first TX DESC,
would get NULL:ed, last desc would get assigned
the skb, but first desc is not explicitly cleared.

Could you try to see if the following patch
solves your problem:
(still don't see why it should be needed,
considering stmmac_tx_clean() should set all non-NULL
entries to NULL)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 1763e48c84e2..1d30b20b3740 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2830,6 +2830,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 
        tx_q->tx_skbuff_dma[first_entry].buf = des;
        tx_q->tx_skbuff_dma[first_entry].len = skb_headlen(skb);
+       tx_q->tx_skbuff[first_entry] = NULL;
 
        first->des0 = cpu_to_le32(des);
 
@@ -3003,6 +3004,8 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 
        first = desc;
 
+       tx_q->tx_skbuff[first_entry] = NULL;
+
        enh_desc = priv->plat->enh_desc;
        /* To program the descriptors according to the size of the frame */
        if (enh_desc)

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: Commit 05cf0d1bf4 ("net: stmmac: free an skb first when there are no longer any descriptors using it") breaks stmmac?
  2017-11-30  3:51 ` Niklas Cassel
@ 2017-11-30 16:50   ` Jose Abreu
  2017-12-06 12:14     ` Niklas Cassel
  0 siblings, 1 reply; 7+ messages in thread
From: Jose Abreu @ 2017-11-30 16:50 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: Joao Pinto, linux-netdev, Giuseppe CAVALLARO

Hi Niklas,

Thanks for the detailed explanation!

On 30-11-2017 03:51, Niklas Cassel wrote:
>
> Could you try to see if the following patch
> solves your problem:
> (still don't see why it should be needed,
> considering stmmac_tx_clean() should set all non-NULL
> entries to NULL)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 1763e48c84e2..1d30b20b3740 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -2830,6 +2830,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
>  
>         tx_q->tx_skbuff_dma[first_entry].buf = des;
>         tx_q->tx_skbuff_dma[first_entry].len = skb_headlen(skb);
> +       tx_q->tx_skbuff[first_entry] = NULL;
>  
>         first->des0 = cpu_to_le32(des);
>  
> @@ -3003,6 +3004,8 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
>  
>         first = desc;
>  
> +       tx_q->tx_skbuff[first_entry] = NULL;
> +
>         enh_desc = priv->plat->enh_desc;
>         /* To program the descriptors according to the size of the frame */
>         if (enh_desc)

I confirm that applying 05cf0d1bf4 ("net: stmmac: free an skb
first when there are no longer any descriptors using it") and
this patch makes my setup work perfectly in multi-queue
configuration. Can you send an official patch to -net?

You can add my:
Tested-by: Jose Abreu <joabreu@synopsys.com>

Also, maybe we should try to understand why stmmac_tx_clean() is
not setting the entry to NULL?

Thanks and Best Regards,
Jose Miguel Abreu

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Commit 05cf0d1bf4 ("net: stmmac: free an skb first when there are no longer any descriptors using it") breaks stmmac?
  2017-11-30 16:50   ` Jose Abreu
@ 2017-12-06 12:14     ` Niklas Cassel
  2018-02-13 13:33       ` Niklas Cassel
  0 siblings, 1 reply; 7+ messages in thread
From: Niklas Cassel @ 2017-12-06 12:14 UTC (permalink / raw)
  To: Jose Abreu; +Cc: Joao Pinto, linux-netdev, Giuseppe CAVALLARO, alexandre.torgue

On Thu, Nov 30, 2017 at 04:50:38PM +0000, Jose Abreu wrote:
> Hi Niklas,
> 
> Thanks for the detailed explanation!
> 
> On 30-11-2017 03:51, Niklas Cassel wrote:
> >
> > Could you try to see if the following patch
> > solves your problem:
> > (still don't see why it should be needed,
> > considering stmmac_tx_clean() should set all non-NULL
> > entries to NULL)
> >
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > index 1763e48c84e2..1d30b20b3740 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > @@ -2830,6 +2830,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
> >  
> >         tx_q->tx_skbuff_dma[first_entry].buf = des;
> >         tx_q->tx_skbuff_dma[first_entry].len = skb_headlen(skb);
> > +       tx_q->tx_skbuff[first_entry] = NULL;
> >  
> >         first->des0 = cpu_to_le32(des);
> >  
> > @@ -3003,6 +3004,8 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
> >  
> >         first = desc;
> >  
> > +       tx_q->tx_skbuff[first_entry] = NULL;
> > +
> >         enh_desc = priv->plat->enh_desc;
> >         /* To program the descriptors according to the size of the frame */
> >         if (enh_desc)
> 
> I confirm that applying 05cf0d1bf4 ("net: stmmac: free an skb
> first when there are no longer any descriptors using it") and
> this patch makes my setup work perfectly in multi-queue
> configuration. Can you send an official patch to -net?
> 
> You can add my:
> Tested-by: Jose Abreu <joabreu@synopsys.com>
> 
> Also, maybe we should try to understand why stmmac_tx_clean() is
> not setting the entry to NULL?

Hello Jose,

It is not easy to decode the databook, however,
after reading the databook several times more,
looking at:

3.3.2.2 Tx DMA Operation: OSP Mode

"The DMA writes the status, with a cleared Own bit, to the
corresponding TDES3, thus closing the descriptor."

So closing the descriptor means clearing the own bit.


Now looking at the mode we are interested in:

3.3.2.1 Tx DMA Operation: Default (Non-OSP) Mode

8. If an Ethernet packet is stored over data buffers in multiple descriptors,
the DMA closes the intermediate descriptor and fetches the next descriptor.
Steps 3 through 7 are repeated until the end-of-Ethernet-packet data is
transferred to the MTL.

So I think that my initial understanding was correct after all,
i.e., the commit 05cf0d1bf4 ("net: stmmac: free an skb first when
there are no longer any descriptors using it") is needed to make
sure that the DMA is not using skbs that might have been freed.

How certain are you that this is the offending commit?

Your crash is on v4.14-rc5, I can see an interesting
commit: 8d5f4b071749 ("stmmac: Don't access tx_q->dirty_tx before
netif_tx_lock"), which got included first in v4.14-rc6.



I've tried to understand if this could be a use-after-free,
but I don't see it.

An skb has a certain queue-mapping, so it shouldn't be in more
than one queue.

tx_q->tx_skbuff is allocated with kmalloc_array(), but is
initialized in init_dma_tx_desc_rings().

Other than that, it is only used in stmmac_tso_xmit()/stmmac_xmit()
and stmmac_tx_clean(). stmmac_tx_clean() will free and non-NULL skb,
and set tx_q->tx_skbuff[entry] to NULL.

I don't see why stmmac does:

for (i = 0; i < nfrags; i++) {
    ...
    tx_q->tx_skbuff[tx_q->cur_tx] = NULL;

since it is initialized to NULL in init_dma_tx_desc_rings(),
and if it has been cleaned, it should be set to NULL again.

If we haven't cleaned for a long time, we will not be able
to queue more skbs, since stmmac_tso_xmit()/stmmac_xmit()
checks current "clean" entries, see stmmac_tx_avail().

I removed the tx_q->tx_skbuff[tx_q->cur_tx] = NULL
from both stmmac_tso_xmit()/stmmac_xmit() locally,
and I could not see any problems with ping (different sizes)
or with iperf3.


Could you try to reproduce the bug with CONFIG_KASAN enabled?
(And please don't cut anything from the oops message).


Regards,
Niklas

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Re: Commit 05cf0d1bf4 ("net: stmmac: free an skb first when there are no longer any descriptors using it") breaks stmmac?
  2017-12-06 12:14     ` Niklas Cassel
@ 2018-02-13 13:33       ` Niklas Cassel
  2018-02-16  9:34         ` Jose Abreu
  0 siblings, 1 reply; 7+ messages in thread
From: Niklas Cassel @ 2018-02-13 13:33 UTC (permalink / raw)
  To: Niklas Cassel, Jose Abreu
  Cc: Joao Pinto, linux-netdev, Giuseppe CAVALLARO, alexandre.torgue

Hello Jose,


I remember that you had a problem
with a use after free in stmmac_tx_clean().
I still don't think that it is related to
commit 05cf0d1bf4, however, when comparing
the stmmac driver to the amd-xgbe driver
I realized that:

xgbe_tx_poll() has both a smp_rmb() after fetching
cur_tx, and also a dma_rmb() after reading the own
bit, before reading any other descriptor fields.

stmmac_tx_clean() has neither a smp_rmb() or a
dma_rmb().


Also
xgbe_dev_xmit() has a dma_wmb() _before_ setting
the own bit, and a smp_wmb() after setting the own
bit.

stmmac simply has a dma_wmb() _after_ setting the
own bit.


I assume you are using a SMP system.

If you can still reproduce your problem quite easily,
perhaps you could try to make stmmac look more like
xgbe in these regards, and see if that solves your
use after free crash in stmmac_tx_clean().


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Commit 05cf0d1bf4 ("net: stmmac: free an skb first when there are no longer any descriptors using it") breaks stmmac?
  2018-02-13 13:33       ` Niklas Cassel
@ 2018-02-16  9:34         ` Jose Abreu
  2018-02-16 10:10           ` Niklas Cassel
  0 siblings, 1 reply; 7+ messages in thread
From: Jose Abreu @ 2018-02-16  9:34 UTC (permalink / raw)
  To: Niklas Cassel, Niklas Cassel, Jose Abreu
  Cc: Joao Pinto, linux-netdev, Giuseppe CAVALLARO, alexandre.torgue

Hi Niklas,

Thank you for looking into this!

On 13-02-2018 13:33, Niklas Cassel wrote:
> Hello Jose,
>
>
> I remember that you had a problem
> with a use after free in stmmac_tx_clean().
> I still don't think that it is related to
> commit 05cf0d1bf4, however, when comparing
> the stmmac driver to the amd-xgbe driver
> I realized that:
>
> xgbe_tx_poll() has both a smp_rmb() after fetching
> cur_tx, and also a dma_rmb() after reading the own
> bit, before reading any other descriptor fields.
>
> stmmac_tx_clean() has neither a smp_rmb() or a
> dma_rmb().
>
>
> Also
> xgbe_dev_xmit() has a dma_wmb() _before_ setting
> the own bit, and a smp_wmb() after setting the own
> bit.
>
> stmmac simply has a dma_wmb() _after_ setting the
> own bit.
>
>
> I assume you are using a SMP system.

Yes.

>
> If you can still reproduce your problem quite easily,
> perhaps you could try to make stmmac look more like
> xgbe in these regards, and see if that solves your
> use after free crash in stmmac_tx_clean().

At the time it would be easily reproduced if I flooded the
interface. I will check xgbe barriers and check if it should also
be used in stmmac.

Thanks again!

Best Regards,
Jose Miguel Abreu

>
>
> Kind regards,
> Niklas

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Commit 05cf0d1bf4 ("net: stmmac: free an skb first when there are no longer any descriptors using it") breaks stmmac?
  2018-02-16  9:34         ` Jose Abreu
@ 2018-02-16 10:10           ` Niklas Cassel
  0 siblings, 0 replies; 7+ messages in thread
From: Niklas Cassel @ 2018-02-16 10:10 UTC (permalink / raw)
  To: Jose Abreu; +Cc: Joao Pinto, linux-netdev, Giuseppe CAVALLARO, alexandre.torgue

On Fri, Feb 16, 2018 at 09:34:39AM +0000, Jose Abreu wrote:
> Hi Niklas,
> 
> Thank you for looking into this!
> 
> On 13-02-2018 13:33, Niklas Cassel wrote:
> > Hello Jose,
> >
> >
> > I remember that you had a problem
> > with a use after free in stmmac_tx_clean().
> > I still don't think that it is related to
> > commit 05cf0d1bf4, however, when comparing
> > the stmmac driver to the amd-xgbe driver
> > I realized that:
> >
> > xgbe_tx_poll() has both a smp_rmb() after fetching
> > cur_tx, and also a dma_rmb() after reading the own
> > bit, before reading any other descriptor fields.
> >
> > stmmac_tx_clean() has neither a smp_rmb() or a
> > dma_rmb().

This still applies as far as I can tell.

> >
> >
> > Also
> > xgbe_dev_xmit() has a dma_wmb() _before_ setting
> > the own bit, and a smp_wmb() after setting the own
> > bit.
> >
> > stmmac simply has a dma_wmb() _after_ setting the
> > own bit.

For dwmac4, stmmac actually has a another dma_wmb() hidden in
dwmac4_rd_prepare_tx_desc().

It might be more obvious, since there already is a dma_wmb()
in stmmac_xmit()/stmmac_tso_xmit(), if this was also placed
there. We already have a set_owner() abstraction.


xgbe also has a dma_wmb() in dwmac4_release_tx_desc(),
which we appear to be missing.
Pehaps we want it right after the
priv->hw->desc->release_tx_desc(p, priv->mode)
call, so we don't hide it in the different implementations.

> > If you can still reproduce your problem quite easily,
> > perhaps you could try to make stmmac look more like
> > xgbe in these regards, and see if that solves your
> > use after free crash in stmmac_tx_clean().
> 
> At the time it would be easily reproduced if I flooded the
> interface. I will check xgbe barriers and check if it should also
> be used in stmmac.

I've fixed the tx timeout for multiqueues when running iperf3
(mss was not set per tx queue, as reported in another thread).
Will send out a patch today.


Regards,
Niklas

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-02-16 10:10 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-11-27 14:41 Commit 05cf0d1bf4 ("net: stmmac: free an skb first when there are no longer any descriptors using it") breaks stmmac? Jose Abreu
2017-11-30  3:51 ` Niklas Cassel
2017-11-30 16:50   ` Jose Abreu
2017-12-06 12:14     ` Niklas Cassel
2018-02-13 13:33       ` Niklas Cassel
2018-02-16  9:34         ` Jose Abreu
2018-02-16 10:10           ` Niklas Cassel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).