All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stanislaw Gruszka <sgruszka@redhat.com>
To: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Cc: linux-wireless@vger.kernel.org, Felix Fietkau <nbd@nbd.name>,
	Ryder Lee <ryder.lee@mediatek.com>, Roy Luo <royluo@google.com>
Subject: Re: [RFC] mt76: fix tx hung regression on MT7630E
Date: Tue, 30 Jul 2019 15:54:51 +0200	[thread overview]
Message-ID: <20190730135450.GA2361@redhat.com> (raw)
In-Reply-To: <20190729140241.GC4030@localhost.localdomain>

On Mon, Jul 29, 2019 at 04:02:41PM +0200, Lorenzo Bianconi wrote:
> > On Fri, Jul 26, 2019 at 02:10:56PM +0200, Stanislaw Gruszka wrote:
> > > Since 41634aa8d6db ("mt76: only schedule txqs from the tx tasklet")
> > > I can observe firmware hangs on MT7630E on station mode: tx stop
> > > functioning after minor activity (rx keep working) and on module
> > > unload device fail to stop with messages:
> > > 
> > > [ 5446.141413] mt76x0e 0000:06:00.0: TX DMA did not stop
> > > [ 5449.176764] mt76x0e 0000:06:00.0: TX DMA did not stop
> > > 
> > > Loading module again results in failure to associate with AP.
> > > Only machine power off / power on cycle can make device work again.
> > > 
> > > I have no idea why the commit caused F/W hangs. Most likely some proper
> > > fix is needed of changing registers programming (or assuring proper order
> > > of actions), but so far I can not came up with any better fix than
> > > a partial revert of 41634aa8d6db.
> > 
> > The difference is that with 41634aa8d6db we can run mt76x02_poll_tx()
> > and mt76x02_tx_tasklet() in parallel on 2 different CPUs. Without
> > the commit, since tasklet is not scheduled from mt76_wake_tx_queue(),
> > it does not happen.
> > 
> > I'm not quite sure why, but MT7630E does not like when we poll tx status
> > on 2 cpus at once. Change like below:
> 
> Hi Stanislaw,

Hi

> have you tried to look at the commit: 6fe533378795f87
> ("mt76: mt76x02: remove irqsave/restore in locking for tx status fifo")?
> Could it be a race between a registermap update and a stats load? (we are using a
> different lock now)

This commit seems to be fine, reverting it did not solve the issue.

> > diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c
> > index 467b28379870..622251faa415 100644
> > --- a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c
> > +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c
> > @@ -170,7 +170,7 @@ static int mt76x02_poll_tx(struct napi_struct *napi, int budget)
> >  					       mt76.tx_napi);
> >  	int i;
> >  
> > -	mt76x02_mac_poll_tx_status(dev, false);
> > +	mt76x02_mac_poll_tx_status(dev, true);
> 
> I am not sure if we really need mt76x02_mac_poll_tx_status() here since we run
> it in mt76x02_tx_complete_skb() and in mt76x02_tx_tasklet(). Anyway the only
> difference doing so is we do not run mt76x02_send_tx_status().

I thought this is the problem, but it was my mistake during testing.
I tested the above change together with mt76_txq_schedule(dev, txq->ac)
change and get wrong impression it fixes the issue. But above change
alone does not help.

I tried to add some locking to avoid parallel execution of mt76x02_poll_tx()
and mt76x02_tx_tasklet(), but it didn't help either. So far only patch
originally posted here make the problem gone.

Stanislaw


  reply	other threads:[~2019-07-30 13:54 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-26 12:10 [RFC] mt76: fix tx hung regression on MT7630E Stanislaw Gruszka
2019-07-29 12:53 ` Stanislaw Gruszka
2019-07-29 14:02   ` Lorenzo Bianconi
2019-07-30 13:54     ` Stanislaw Gruszka [this message]
2019-07-30 14:55       ` Lorenzo Bianconi
2019-07-31  8:19         ` Stanislaw Gruszka
2019-07-31  8:51           ` Stanislaw Gruszka
2019-07-31  9:09             ` Lorenzo Bianconi
2019-08-05 10:01               ` Stanislaw Gruszka
2019-08-05 11:27                 ` Lorenzo Bianconi
2019-08-05 12:39                   ` Stanislaw Gruszka
2019-08-12 11:50                     ` Stanislaw Gruszka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190730135450.GA2361@redhat.com \
    --to=sgruszka@redhat.com \
    --cc=linux-wireless@vger.kernel.org \
    --cc=lorenzo.bianconi@redhat.com \
    --cc=nbd@nbd.name \
    --cc=royluo@google.com \
    --cc=ryder.lee@mediatek.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.