From: Michael Buesch <mb-fseUSCV1ubazQB+pC5nmwQ@public.gmane.org>
To: Larry Finger <Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Bcm43xx-dev-0fE9KPoRgkgATYTw5x5z8w@public.gmane.org,
Stefano Brivio <st3-sGOZH3hwPm2sTnJN9+BGXg@public.gmane.org>
Subject: Re: RFC/T: Possible fix for bcm43xx periodic work bug
Date: Thu, 7 Sep 2006 22:39:19 +0200 [thread overview]
Message-ID: <200609072239.20074.mb@bu3sch.de> (raw)
In-Reply-To: <45006221.4090603-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
On Thursday 07 September 2006 20:17, Larry Finger wrote:
> Hi all,
>
> I think I have a fix for the bcm43xx bug that leads to NETDEV WATCHDOG tx timeouts and would like it
> to get as much testing as possible as this bug affects V2.6.18-rcX. If the problem is truly
> fixed, I hope to get the fix into mainline before release of the bug into the stable series.
>
> I got the idea for the fix when I discovered that the timeout interval used for the watchdog is the
> default value of 5 seconds. Obviously, the few milliseconds used in the periodic work handler
> weren't causing us to "just miss".
>
> To exacerbate the problem, I changed the repeat timer for periodic work from 15 to 1 sec. I also set
> BADNESS_LIMIT to 0. As a result, I was running the problem code once per second instead of once per
> minute. Now failures would occur in minutes instead of hours.
>
> Operating from the premise that the DMA needed some time to reach the idle state after the MAC was
> suspended, I tried various delays, but nothing worked.
>
> Then I decided to test the premise that the problem was associated with shutting down and restarting
> the network. That lead to the current patch, which has run for what is effectively 100 times longer
> than previous versions.
>
> Larry
> -------------------------------------------------------------------
>
>
> Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
> ===================================================================
> --- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_main.c
> +++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
> @@ -3244,8 +3244,6 @@ static void bcm43xx_periodic_work_handle
> * be preemtible.
> */
> mutex_lock(&bcm->mutex);
> - netif_stop_queue(bcm->net_dev);
> - synchronize_net();
> spin_lock_irqsave(&bcm->irq_lock, flags);
> bcm43xx_mac_suspend(bcm);
> if (bcm43xx_using_pio(bcm))
> @@ -3270,7 +3268,6 @@ static void bcm43xx_periodic_work_handle
> if (bcm43xx_using_pio(bcm))
> bcm43xx_pio_thaw_txqueues(bcm);
> bcm43xx_mac_enable(bcm);
> - netif_wake_queue(bcm->net_dev);
> }
> mmiowb();
> spin_unlock_irqrestore(&bcm->irq_lock, flags);
The real question is: Why does this patch help?
Let's explain it. We don't stop networking just for fun there.
While executing long preemptible periodic work, we must ensure
that the TX path into the driver is not entered. It's the same
reason why we disable IRQs in the first place. We can't take the
mutex in the TX path and the IRQ handler. (That are the only places
where we can't take the mutex).
Short: We must stop netif here.
The question is: Why does stopping netif queue cause a watchdog
trigger here? The maximum time it can take for the periodic
work inside of the critical section is about 0.2sec. So the queue
is stopped for about 0.2sec max. Why does the watchdog trigger?
Any idea from some networking guru?
Could synchronize_net() take over 5sec in some worst case? Why?
Questions over questions :D
--
Greetings Michael.
next prev parent reply other threads:[~2006-09-07 20:39 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-09-07 18:17 RFC/T: Possible fix for bcm43xx periodic work bug Larry Finger
[not found] ` <45006221.4090603-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
2006-09-07 20:39 ` Michael Buesch [this message]
[not found] ` <200609072239.20074.mb-fseUSCV1ubazQB+pC5nmwQ@public.gmane.org>
2006-09-07 22:08 ` Larry Finger
[not found] ` <45016F63.9000200@lwfinger.net>
[not found] ` <45016F63.9000200-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
2006-09-08 20:42 ` Michael Buesch
2006-09-08 9:42 ` Erik Mouw
2006-09-08 9:45 ` Michael Buesch
2006-09-08 13:23 ` Erik Mouw
[not found] ` <20060908132317.GS26916-7jKlSzr1t1+rFW3l5+5NieqUGfbH9hYC@public.gmane.org>
2006-09-08 13:36 ` Larry Finger
[not found] ` <450171E4.8070901-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
2006-09-11 10:52 ` Erik Mouw
[not found] ` <20060911105202.GA7903-7jKlSzr1t1+rFW3l5+5NieqUGfbH9hYC@public.gmane.org>
2006-09-11 14:37 ` Larry Finger
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200609072239.20074.mb@bu3sch.de \
--to=mb-fseuscv1ubazqb+pc5nmwq@public.gmane.org \
--cc=Bcm43xx-dev-0fE9KPoRgkgATYTw5x5z8w@public.gmane.org \
--cc=Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org \
--cc=netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=st3-sGOZH3hwPm2sTnJN9+BGXg@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).