All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stanislaw Gruszka <sgruszka@redhat.com>
To: Soeren Moch <smoch@web.de>
Cc: Helmut Schaa <helmut.schaa@googlemail.com>,
	Kalle Valo <kvalo@codeaurora.org>,
	"David S. Miller" <davem@davemloft.net>,
	linux-wireless@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] rt2x00: fix rx queue hang
Date: Mon, 1 Jul 2019 13:04:51 +0200	[thread overview]
Message-ID: <20190701110451.GA16985@redhat.com> (raw)
In-Reply-To: <06c55c1d-6da6-76b2-f6e7-c8eeccd5aa35@web.de>

On Mon, Jul 01, 2019 at 12:49:50PM +0200, Soeren Moch wrote:
> Hello!
> 
> On 29.06.19 10:50, Stanislaw Gruszka wrote:
> > Hello
> >
> > On Wed, Jun 26, 2019 at 03:28:00PM +0200, Soeren Moch wrote:
> >> Hi Stanislaw,
> >>
> >> the good news is: your patch below also solves the issue for me. But
> >> removing the ENTRY_DATA_STATUS_PENDING check in
> >> rt2x00usb_kick_rx_entry() alone does not help, while removing this check
> >> in rt2x00usb_work_rxdone() alone does the trick.
> >>
> >> So the real race seems to be that the flags set in the completion
> >> handler are not yet visible on the cpu core running the workqueue. And
> >> because the worker is not rescheduled when aborted, the entry can just
> >> wait forever.
> >> Do you think this could make sense?
> > Yes.
> >
> >>> I'm somewhat reluctant to change the order, because TX processing
> >>> might relay on it (we first mark we wait for TX status and
> >>> then mark entry is no longer owned by hardware).
> >> OK, maybe it's just good luck that changing the order solves the rx
> >> problem. Or can memory barriers associated with the spinlock in
> >> rt2x00lib_dmadone() be responsible for that?
> >> (I'm testing on a armv7 system, Cortex-A9 quadcore.)
> > I'm not sure, rt2x00queue_index_inc() also disable/enable interrupts,
> > so maybe that make race not reproducible. 
> I tested some more, the race is between setting ENTRY_DATA_IO_FAILED (if
> needed) and enabling workqueue processing. This enabling was done via
> ENTRY_DATA_STATUS_PENDING in my patch. So setting
> ENTRY_DATA_STATUS_PENDING behind the spinlock in
> rt2x00lib_dmadone()/rt2x00queue_index_inc() moved this very close to
> setting of ENTRY_DATA_IO_FAILED (if needed). While still in the wrong
> order, this made it very unlikely for the race to show up.
> >
> >> While looking at it, why we double-clear ENTRY_OWNER_DEVICE_DATA in
> >> rt2x00usb_interrupt_rxdone() directly and in rt2x00lib_dmadone() again,
> > rt2x00lib_dmadone() is called also on other palaces (error paths)
> > when we have to clear flags.
> Yes, but also clearing ENTRY_OWNER_DEVICE_DATA in
> rt2x00usb_interrupt_rxdone() directly is not necessary and can lead to
> the wrong processing order.
> >>  while not doing the same for tx? 
> > If I remember correctly we have some races on rx (not happened on tx)
> > that was solved by using test_and_clear_bit(ENTRY_OWNER_DEVICE_DATA).
> I searched in the history, it actually was the other way around. You
> changed test_and_clear_bit() to test_bit() in the TX path. I think this
> is also the right way to go in RX.
> >> Would it make more sense to possibly
> >> set ENTRY_DATA_IO_FAILED before clearing ENTRY_OWNER_DEVICE_DATA in
> >> rt2x00usb_interrupt_rxdone() as for tx?
> > I don't think so, ENTRY_DATA_IO_FAILED should be only set on error
> > case.
> 
> Yes of course. But if the error occurs, it should be signalled before
> enabling the workqueue processing, see the race described above.
> 
> After some more testing I'm convinced that this would be the right fix
> for this problem. I will send a v2 of this patch accordingly.

Great, now I understand the problem. Thank you very much!

Stanislaw

      reply	other threads:[~2019-07-01 11:05 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-17  9:46 [PATCH] rt2x00: fix rx queue hang Soeren Moch
2019-06-18  9:34 ` Stanislaw Gruszka
2019-06-21 11:30   ` Soeren Moch
2019-06-25  9:57     ` Stanislaw Gruszka
2019-06-26 13:28       ` Soeren Moch
2019-06-29  8:50         ` Stanislaw Gruszka
2019-07-01 10:49           ` Soeren Moch
2019-07-01 11:04             ` Stanislaw Gruszka [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190701110451.GA16985@redhat.com \
    --to=sgruszka@redhat.com \
    --cc=davem@davemloft.net \
    --cc=helmut.schaa@googlemail.com \
    --cc=kvalo@codeaurora.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=smoch@web.de \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.