* [Bug #14925] sky2 panic under load
2009-12-29 15:05 2.6.33-rc2: Reported regressions from 2.6.32 Rafael J. Wysocki
@ 2009-12-29 15:10 ` Rafael J. Wysocki
0 siblings, 0 replies; 60+ messages in thread
From: Rafael J. Wysocki @ 2009-12-29 15:10 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Berck E. Nash, Berck Nash, Daniel Hazelton,
Michael Breuer, "netdev, Stephen Hemminger
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14925
Subject : sky2 panic under load
Submitter : Berck E. Nash <flyboy@gmail.com>
Date : 2009-12-21 23:52 (9 days old)
References : http://marc.info/?l=linux-kernel&m=126143955730347&w=4
http://marc.info/?l=linux-kernel&m=126160893126548&w=4
Handled-By : Stephen Hemminger <shemminger@vyatta.com>
^ permalink raw reply [flat|nested] 60+ messages in thread
* 2.6.33-rc3-git3: Reported regressions from 2.6.32
@ 2010-01-10 22:27 Rafael J. Wysocki
2010-01-10 22:32 ` [Bug #14925] sky2 panic under load Rafael J. Wysocki
` (3 more replies)
0 siblings, 4 replies; 60+ messages in thread
From: Rafael J. Wysocki @ 2010-01-10 22:27 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich,
Kernel Testers List, Network Development, Linux ACPI,
Linux PM List, Linux SCSI List, Linux Wireless List, DRI
This message contains a list of some regressions from 2.6.32, for which there
are no fixes in the mainline I know of. If any of them have been fixed already,
please let me know.
If you know of any other unresolved regressions from 2.6.32, please let me know
either and I'll add them to the list. Also, please let me know if any of the
entries below are invalid.
Each entry from the list will be sent additionally in an automatic reply to
this message with CCs to the people involved in reporting and handling the
issue.
Listed regressions statistics:
Date Total Pending Unresolved
----------------------------------------
2010-01-10 55 33 21
2009-12-29 36 34 27
Unresolved regressions
----------------------
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15043
Subject : Display goes off with i915.powersave=1
Submitter : Soeren Sonnenburg <sonne@debian.org>
Date : 2010-01-10 20:09 (1 days old)
References : http://marc.info/?l=linux-kernel&m=126315457519505&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15041
Subject : Pagemap endless read loop with LTP
Submitter : Andi Kleen <andi@firstfloor.org>
Date : 2010-01-10 2:09 (1 days old)
References : http://marc.info/?l=linux-kernel&m=126308941423848&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15038
Subject : drm/ksm: fbdev blanking regression
Submitter : Johan Hovold <jhovold@gmail.com>
Date : 2010-01-06 17:00 (5 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=731b5a15a3b1474a41c2ca29b4c32b0f21bc852e
References : http://marc.info/?l=linux-kernel&m=126279726418748&w=4
Handled-By : James Simmons <jsimmons@infradead.org>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15036
Subject : soft lockup in dmesg after suspend/resume
Submitter : ykzhao <yakui.zhao@intel.com>
Date : 2010-01-04 5:36 (7 days old)
References : http://marc.info/?l=linux-kernel&m=126258356202722&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15035
Subject : BUG: unable to handle kernel paging request in rs600_gart_set_page()
Submitter : Xiaotian Feng <dfeng@redhat.com>
Date : 2010-01-05 19:10 (6 days old)
References : http://lkml.org/lkml/2010/1/5/95
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15033
Subject : drm: gem_object_free without struct_mutex
Submitter : Hugh Dickins <hugh.dickins@tiscali.co.uk>
Date : 2009-12-30 19:45 (12 days old)
References : http://marc.info/?l=linux-kernel&m=126220236201529&w=4
Handled-By : Jesse Barnes <jbarnes@virtuousgeek.org>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15032
Subject : Oops in uart_resume_port() on resume
Submitter : Zdenek Kabelac <zdenek.kabelac@gmail.com>
Date : 2010-01-04 15:47 (7 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ba15ab0e8de0d4439a91342ad52d55ca9e313f3d
References : http://marc.info/?l=linux-kernel&m=126262008815689&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14976
Subject : No sound on snd_hda_codec_via in 2.6.33-rc2
Submitter : Mark Rosenstand <rosenstand@gmail.com>
Date : 2010-01-02 08:27 (9 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14954
Subject : warning from alloc_pages_nodemask on boot -- caused by commit 78f1699659963fff97975df44db6d5dbe7218e55
Submitter : Stephen Hemminger <shemminger@linux-foundation.org>
Date : 2009-12-29 19:57 (13 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=78f1699659963fff97975df44db6d5dbe7218e55
Handled-By : Alex Chiang <achiang@hp.com>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14950
Subject : tbench regression with 2.6.33-rc1
Submitter : Lin Ming <ming.m.lin@intel.com>
Date : 2009-12-25 11:11 (17 days old)
References : http://marc.info/?l=linux-kernel&m=126174044213172&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14949
Subject : drm_vm.c:drm_mmap: possible circular locking dependency detected
Submitter : Borislav Petkov <petkovbb@googlemail.com>
Date : 2009-12-26 9:45 (16 days old)
References : http://marc.info/?l=linux-kernel&m=126182073616279&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14948
Subject : EHCI resume sysfs duplicates
Submitter : Borislav Petkov <petkovbb@googlemail.com>
Date : 2009-12-26 9:36 (16 days old)
References : http://marc.info/?l=linux-kernel&m=126182020615709&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14946
Subject : All kernels after 2.6.32-git10 show only 1 CPU
Submitter : Sid Boyce <sboyce@blueyonder.co.uk>
Date : 2009-12-23 16:55 (19 days old)
References : http://marc.info/?l=linux-kernel&m=126158734326801&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14937
Subject : WARNING: at kernel/lockdep.c:2830
Submitter : Grant Wilson <grant.wilson@zen.co.uk>
Date : 2009-12-27 13:35 (15 days old)
References : http://marc.info/?l=linux-kernel&m=126192220404829&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14934
Subject : kernel crash during boot
Submitter : werner <w.landgraf@ru.ru>
Date : 2009-12-25 5:37 (17 days old)
References : http://marc.info/?l=linux-kernel&m=126171951030608&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14925
Subject : sky2 panic under load
Submitter : Berck E. Nash <flyboy@gmail.com>
Date : 2009-12-21 23:52 (21 days old)
References : http://marc.info/?l=linux-kernel&m=126143955730347&w=4
http://marc.info/?l=linux-kernel&m=126160893126548&w=4
Handled-By : Stephen Hemminger <shemminger@vyatta.com>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14924
Subject : Weird hard hangs when rendering 'some' web-sites in Firefox
Submitter : David <david@unsolicited.net>
Date : 2009-12-21 21:53 (21 days old)
References : http://marc.info/?l=linux-kernel&m=126143375823340&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14860
Subject : No DP-DVI output when laptop is docked
Submitter : Pär Andersson <paran@lysator.liu.se>
Date : 2009-12-21 21:02 (21 days old)
Handled-By : ykzhao <yakui.zhao@intel.com>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14859
Subject : System timer firing too much without cause
Submitter : Shawn Starr <shawn.starr@rogers.com>
Date : 2009-12-21 19:16 (21 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14792
Subject : Misdetection of the TV output
Submitter : Santi <santi@agolina.net>
Date : 2009-12-12 13:28 (30 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14791
Subject : Something has been broken in the network stack this week
Submitter : Delete This Account <speedyboyinovator@hotmail.com>
Date : 2009-12-12 13:06 (30 days old)
Regressions with patches
------------------------
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15039
Subject : leds_alix2: can't allocate I/O for GPIO
Submitter : Arnd Hannemann <hannemann@nets.rwth-aachen.de>
Date : 2010-01-07 10:26 (4 days old)
References : http://marc.info/?l=linux-kernel&m=126286001106257&w=4
Handled-By : Daniel Mack <daniel@caiaq.de>
Patch : http://patchwork.kernel.org/patch/72006/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15037
Subject : BUG during shutdown - bisected to commit e2912009
Submitter : Marc Dionne <marc.c.dionne@gmail.com>
Date : 2010-01-02 0:27 (9 days old)
References : http://marc.info/?l=linux-kernel&m=126239207631034&w=4
Handled-By : Xiaotian Feng <dfeng@redhat.com>
Patch : http://patchwork.kernel.org/patch/71537/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15034
Subject : volano ~30% regression
Submitter : Lin Ming <ming.m.lin@intel.com>
Date : 2010-01-04 8:15 (7 days old)
References : http://marc.info/?l=linux-kernel&m=126259391411982&w=4
Handled-By : Mike Galbraith <efault@gmx.de>
Patch : http://patchwork.kernel.org/patch/70623/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15031
Subject : bug in fs/btrfs/ordered-data.c:672
Submitter : Carlos R. Mafra <crmafra2@gmail.com>
Date : 2010-01-02 11:32 (9 days old)
References : http://lkml.org/lkml/2010/1/2/17
Handled-By : Yan, Zheng <yanzheng@21cn.com>
Patch : http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg03686.html
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15026
Subject : i915: Resume regression on MSI Wind U100 w/o KMS
Submitter : Rafael J. Wysocki <rjw@sisk.pl>
Date : 2010-01-08 23:45 (3 days old)
References : http://marc.info/?l=linux-kernel&m=126299433427673&w=4
Handled-By : Rafael J. Wysocki <rjw@sisk.pl>
Patch : http://patchwork.kernel.org/patch/71887/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14974
Subject : tg3 does not resume from hibernation properly on BCM5787M
Submitter : Chow Loong Jin <hyperair@ubuntu.com>
Date : 2010-01-02 05:30 (9 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=87668d352aa8d135bd695a050f18bbfc7b50b506
Handled-By : Matt Carlson <mcarlson@broadcom.com>
Patch : http://bugzilla.kernel.org/show_bug.cgi?id=14974#c12
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14957
Subject : Blank screen with KMS enabled
Submitter : Philipp Kohlbecher <xt28@gmx.de>
Date : 2009-12-30 00:08 (12 days old)
Handled-By : Zhao Yakui <yakui.zhao@intel.com>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=24476
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14944
Subject : 2.6.32.2 SATA link detect failed, 2.6.32.1 works fine
Submitter : fengxiangjun <fengxiangjun@neusoft.com>
Date : 2009-12-21 11:12 (21 days old)
References : http://marc.info/?l=linux-kernel&m=126139442328314&w=4
Handled-By : Tejun Heo <tj@kernel.org>
Patch : http://patchwork.kernel.org/patch/69746/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14942
Subject : gkrellm no longer shows all the temperatures on thinkpad x60
Submitter : Pavel Machek <pavel@ucw.cz>
Date : 2009-12-27 21:57 (15 days old)
References : http://marc.info/?l=linux-kernel&m=126195107005565&w=4
Handled-By : Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Patch : http://patchwork.kernel.org/patch/69809/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14907
Subject : Commit "kbuild: fix bzImage build for x86" breaks build
Submitter : Oliver Hartkopp <oliver@hartkopp.net>
Date : 2009-12-19 19:21 (23 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4a2ff67c88211026afcbdbc190c13f705dae1b59
References : http://marc.info/?l=linux-kernel&m=126125050622195&w=4
http://marc.info/?l=linux-kernel&m=126114565103300&w=4
http://marc.info/?l=linux-kbuild&m=126202819526623&w=2
Handled-By : Jonathan Nieder <jrnieder@gmail.com>
Patch : http://thread.gmane.org/gmane.linux.kbuild.devel/4313/focus=4319
http://patchwork.kernel.org/patch/71957/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14899
Subject : reiserfs: inconsistent lock state
Submitter : Alexander Beregalov <a.beregalov@gmail.com>
Date : 2009-12-11 22:06 (31 days old)
References : http://marc.info/?l=linux-kernel&m=126056920702515&w=4
Handled-By : Frederic Weisbecker <fweisbec@gmail.com>
Patch : http://patchwork.kernel.org/patch/67256/
http://patchwork.kernel.org/patch/67257/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14874
Subject : Ath5k regression with commit 8bf3d79bc401ca417ccf9fc076d3295d1a71dbf5
Submitter : Joshua Covington <joshuacov@googlemail.com>
Date : 2009-12-25 00:49 (17 days old)
Handled-By : Luis R. Rodriguez <mcgrof@gmail.com>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=24335
For details, please visit the bug entries and follow the links given in
references.
As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.32,
unresolved as well as resolved, at:
http://bugzilla.kernel.org/show_bug.cgi?id=14885
Please let me know if there are any Bugzilla entries that should be added to
the list in there.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 60+ messages in thread
* [Bug #14925] sky2 panic under load
2010-01-10 22:27 2.6.33-rc3-git3: Reported regressions from 2.6.32 Rafael J. Wysocki
@ 2010-01-10 22:32 ` Rafael J. Wysocki
2010-01-11 0:36 ` Berck E. Nash
2010-01-11 16:00 ` 2.6.33-rc3-git3: Reported regressions from 2.6.32 Luis R. Rodriguez
` (2 subsequent siblings)
3 siblings, 1 reply; 60+ messages in thread
From: Rafael J. Wysocki @ 2010-01-10 22:32 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Berck E. Nash, Berck Nash, Daniel Hazelton,
Michael Breuer, "netdev, Stephen Hemminger
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.32. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14925
Subject : sky2 panic under load
Submitter : Berck E. Nash <flyboy@gmail.com>
Date : 2009-12-21 23:52 (21 days old)
References : http://marc.info/?l=linux-kernel&m=126143955730347&w=4
http://marc.info/?l=linux-kernel&m=126160893126548&w=4
Handled-By : Stephen Hemminger <shemminger@vyatta.com>
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Bug #14925] sky2 panic under load
2010-01-10 22:32 ` [Bug #14925] sky2 panic under load Rafael J. Wysocki
@ 2010-01-11 0:36 ` Berck E. Nash
2010-01-11 13:26 ` Jarek Poplawski
2010-01-11 14:03 ` Mike McCormack
0 siblings, 2 replies; 60+ messages in thread
From: Berck E. Nash @ 2010-01-11 0:36 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: netdev, Jarek Poplawski
Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.32. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14925
> Subject : sky2 panic under load
> Submitter : Berck E. Nash <flyboy@gmail.com>
> Date : 2009-12-21 23:52 (21 days old)
> References : http://marc.info/?l=linux-kernel&m=126143955730347&w=4
> http://marc.info/?l=linux-kernel&m=126160893126548&w=4
> Handled-By : Stephen Hemminger <shemminger@vyatta.com>
The patch attached to the bug report did not fix the problem, but I'm
fairly certain that this one from Jarek P. did:
During TX timeout procedure dev could be awoken too early, e.g. by
sky2_complete_tx() called from sky2_down(). Then sky2_xmit_frame()
can run while buffers are freed causing an oops. This patch fixes it
by adding netif_device_present() test in sky2_tx_complete().
Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=14925
With debugging by: Mike McCormack <mikem@ring3k.org>
Reported-by: Berck E. Nash <flyboy@gmail.com>
Tested-by: Berck E. Nash <flyboy@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---
drivers/net/sky2.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index 1c01b96..7650f73 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -1844,7 +1844,8 @@ static void sky2_tx_complete(struct sky2_port
*sky2, u16 done)
sky2->tx_cons = idx;
smp_mb();
- if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
+ /* Wake unless it's detached, and called e.g. from sky2_down() */
+ if (tx_avail(sky2) > MAX_SKB_TX_LE + 4 && netif_device_present(dev))
netif_wake_queue(dev);
}
^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [Bug #14925] sky2 panic under load
2010-01-11 0:36 ` Berck E. Nash
@ 2010-01-11 13:26 ` Jarek Poplawski
2010-01-11 19:32 ` Rafael J. Wysocki
2010-01-11 14:03 ` Mike McCormack
1 sibling, 1 reply; 60+ messages in thread
From: Jarek Poplawski @ 2010-01-11 13:26 UTC (permalink / raw)
To: Berck E. Nash; +Cc: Rafael J. Wysocki, netdev
On Sun, Jan 10, 2010 at 05:36:46PM -0700, Berck E. Nash wrote:
> Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.32. Please verify if it still should be listed and let me know
> > (either way).
BTW, I don't know why Berck didn't experience such a panic before
2.6.32, but seems not a regression to me. There might be new/more sky2
TX timeouts which trigger this panic and would make a real regression.
Cheers,
Jarek P.
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14925
> > Subject : sky2 panic under load
> > Submitter : Berck E. Nash <flyboy@gmail.com>
> > Date : 2009-12-21 23:52 (21 days old)
> > References : http://marc.info/?l=linux-kernel&m=126143955730347&w=4
> > http://marc.info/?l=linux-kernel&m=126160893126548&w=4
> > Handled-By : Stephen Hemminger <shemminger@vyatta.com>
>
> The patch attached to the bug report did not fix the problem, but I'm
> fairly certain that this one from Jarek P. did:
>
> During TX timeout procedure dev could be awoken too early, e.g. by
> sky2_complete_tx() called from sky2_down(). Then sky2_xmit_frame()
> can run while buffers are freed causing an oops. This patch fixes it
> by adding netif_device_present() test in sky2_tx_complete().
>
> Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=14925
>
> With debugging by: Mike McCormack <mikem@ring3k.org>
>
> Reported-by: Berck E. Nash <flyboy@gmail.com>
> Tested-by: Berck E. Nash <flyboy@gmail.com>
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
>
> ---
>
> drivers/net/sky2.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> index 1c01b96..7650f73 100644
> --- a/drivers/net/sky2.c
> +++ b/drivers/net/sky2.c
> @@ -1844,7 +1844,8 @@ static void sky2_tx_complete(struct sky2_port
> *sky2, u16 done)
> sky2->tx_cons = idx;
> smp_mb();
>
> - if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
> + /* Wake unless it's detached, and called e.g. from sky2_down() */
> + if (tx_avail(sky2) > MAX_SKB_TX_LE + 4 && netif_device_present(dev))
> netif_wake_queue(dev);
> }
>
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Bug #14925] sky2 panic under load
2010-01-11 0:36 ` Berck E. Nash
2010-01-11 13:26 ` Jarek Poplawski
@ 2010-01-11 14:03 ` Mike McCormack
2010-01-11 16:45 ` Stephen Hemminger
1 sibling, 1 reply; 60+ messages in thread
From: Mike McCormack @ 2010-01-11 14:03 UTC (permalink / raw)
To: Berck E. Nash
Cc: Rafael J. Wysocki, netdev, Jarek Poplawski, Stephen Hemminger
Berck E. Nash wrote:
>
> During TX timeout procedure dev could be awoken too early, e.g. by
> sky2_complete_tx() called from sky2_down(). Then sky2_xmit_frame()
> can run while buffers are freed causing an oops. This patch fixes it
> by adding netif_device_present() test in sky2_tx_complete().
>
> Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=14925
>
> With debugging by: Mike McCormack <mikem@ring3k.org>
>
> Reported-by: Berck E. Nash <flyboy@gmail.com>
> Tested-by: Berck E. Nash <flyboy@gmail.com>
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
>
> ---
>
> drivers/net/sky2.c | 3 ++-
Perhaps only sky2_tx_done should wake the queue?
Does the patch below fix the problem too?
thanks,
Mike
Subject: [PATCH] sky2: Don't wake queue in sky2_tx_complete()
We should only wake the tx queue on completion of
transmits, not after cleaning the tx ring.
Signed-off-by: Mike McCormack <mikem@ring3k.org>
---
drivers/net/sky2.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index 93d9635..6d9111d 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -1843,9 +1843,6 @@ static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
sky2->tx_cons = idx;
smp_mb();
-
- if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
- netif_wake_queue(dev);
}
static void sky2_tx_reset(struct sky2_hw *hw, unsigned port)
@@ -2416,8 +2413,11 @@ static inline void sky2_tx_done(struct net_device *dev, u16 last)
{
struct sky2_port *sky2 = netdev_priv(dev);
- if (netif_running(dev))
+ if (netif_running(dev)) {
sky2_tx_complete(sky2, last);
+ if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
+ netif_wake_queue(dev);
+ }
}
static inline void sky2_skb_rx(const struct sky2_port *sky2,
-- 1.5.6.5
^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: 2.6.33-rc3-git3: Reported regressions from 2.6.32
2010-01-10 22:27 2.6.33-rc3-git3: Reported regressions from 2.6.32 Rafael J. Wysocki
2010-01-10 22:32 ` [Bug #14925] sky2 panic under load Rafael J. Wysocki
@ 2010-01-11 16:00 ` Luis R. Rodriguez
2010-01-11 21:47 ` Nick Bowler
[not found] ` <omZ8aBZReyD.A.rz.COlSLB@chimera>
3 siblings, 0 replies; 60+ messages in thread
From: Luis R. Rodriguez @ 2010-01-11 16:00 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
Linus Torvalds, Natalie Protasevich, Kernel Testers List,
Network Development, Linux ACPI, Linux PM List, Linux SCSI List,
Linux Wireless List, DRI
On Sun, Jan 10, 2010 at 2:27 PM, Rafael J. Wysocki <rjw-KKrjLPT3xs0@public.gmane.org> wrote:
> This message contains a list of some regressions from 2.6.32, for which there
> are no fixes in the mainline I know of. If any of them have been fixed already,
> please let me know.
>
> If you know of any other unresolved regressions from 2.6.32, please let me know
> either and I'll add them to the list. Also, please let me know if any of the
> entries below are invalid.
>
> Each entry from the list will be sent additionally in an automatic reply to
> this message with CCs to the people involved in reporting and handling the
> issue.
>
Of these only the one below was wireless, which is actually fixed now.
Not bad, rc3 and no wireless regressions yet.
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14874
> Subject : Ath5k regression with commit 8bf3d79bc401ca417ccf9fc076d3295d1a71dbf5
> Submitter : Joshua Covington <joshuacov@googlemail.com>
> Date : 2009-12-25 00:49 (17 days old)
> Handled-By : Luis R. Rodriguez <mcgrof-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Patch : http://bugzilla.kernel.org/attachment.cgi?id=24335
Luis
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Bug #14925] sky2 panic under load
2010-01-11 14:03 ` Mike McCormack
@ 2010-01-11 16:45 ` Stephen Hemminger
2010-01-11 22:07 ` Jarek Poplawski
2010-01-11 22:31 ` [Bug #14925] sky2 panic under load Jarek Poplawski
0 siblings, 2 replies; 60+ messages in thread
From: Stephen Hemminger @ 2010-01-11 16:45 UTC (permalink / raw)
To: Mike McCormack; +Cc: Berck E. Nash, Rafael J. Wysocki, netdev, Jarek Poplawski
On Mon, 11 Jan 2010 23:03:41 +0900
Mike McCormack <mikem@ring3k.org> wrote:
> Perhaps only sky2_tx_done should wake the queue?
> Does the patch below fix the problem too?
>
> thanks,
>
> Mike
The idea is good, but what if transmit queue was full (stopped),
and TX FIFO gets stuck. Then Transmit timeout happens and
the reset logic clears the queue. What will start the queue?
Something like this:
-----------------------------------------------------------
--- a/drivers/net/sky2.c 2010-01-11 08:36:42.617426300 -0800
+++ b/drivers/net/sky2.c 2010-01-11 08:42:51.295237661 -0800
@@ -1843,9 +1843,6 @@ static void sky2_tx_complete(struct sky2
sky2->tx_cons = idx;
smp_mb();
-
- if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
- netif_wake_queue(dev);
}
static void sky2_tx_reset(struct sky2_hw *hw, unsigned port)
@@ -2416,8 +2413,12 @@ static inline void sky2_tx_done(struct n
{
struct sky2_port *sky2 = netdev_priv(dev);
- if (netif_running(dev))
+ if (netif_running(dev) && netif_device_present(dev)) {
sky2_tx_complete(sky2, last);
+
+ if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
+ netif_wake_queue(dev);
+ }
}
static inline void sky2_skb_rx(const struct sky2_port *sky2,
@@ -3197,6 +3198,7 @@ static int sky2_reattach(struct net_devi
} else {
netif_device_attach(dev);
sky2_set_multicast(dev);
+ netif_start_queue(dev);
}
}
--
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Bug #14925] sky2 panic under load
2010-01-11 13:26 ` Jarek Poplawski
@ 2010-01-11 19:32 ` Rafael J. Wysocki
2010-01-11 20:31 ` Jarek Poplawski
0 siblings, 1 reply; 60+ messages in thread
From: Rafael J. Wysocki @ 2010-01-11 19:32 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Berck E. Nash, netdev
On Monday 11 January 2010, Jarek Poplawski wrote:
> On Sun, Jan 10, 2010 at 05:36:46PM -0700, Berck E. Nash wrote:
> > Rafael J. Wysocki wrote:
> > > This message has been generated automatically as a part of a report
> > > of recent regressions.
> > >
> > > The following bug entry is on the current list of known regressions
> > > from 2.6.32. Please verify if it still should be listed and let me know
> > > (either way).
>
> BTW, I don't know why Berck didn't experience such a panic before
> 2.6.32, but seems not a regression to me. There might be new/more sky2
> TX timeouts which trigger this panic and would make a real regression.
Even if the code has always been broken, but it's only become visible after
2.6.32, that still counts as a regression IMO, because now the users are
affected who weren't before.
Rafael
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Bug #14925] sky2 panic under load
2010-01-11 19:32 ` Rafael J. Wysocki
@ 2010-01-11 20:31 ` Jarek Poplawski
2010-01-11 20:50 ` Rafael J. Wysocki
2010-01-11 21:02 ` Berck E. Nash
0 siblings, 2 replies; 60+ messages in thread
From: Jarek Poplawski @ 2010-01-11 20:31 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Berck E. Nash, netdev
On Mon, Jan 11, 2010 at 08:32:24PM +0100, Rafael J. Wysocki wrote:
> On Monday 11 January 2010, Jarek Poplawski wrote:
> > On Sun, Jan 10, 2010 at 05:36:46PM -0700, Berck E. Nash wrote:
> > > Rafael J. Wysocki wrote:
> > > > This message has been generated automatically as a part of a report
> > > > of recent regressions.
> > > >
> > > > The following bug entry is on the current list of known regressions
> > > > from 2.6.32. Please verify if it still should be listed and let me know
> > > > (either way).
> >
> > BTW, I don't know why Berck didn't experience such a panic before
> > 2.6.32, but seems not a regression to me. There might be new/more sky2
> > TX timeouts which trigger this panic and would make a real regression.
>
> Even if the code has always been broken, but it's only become visible after
> 2.6.32, that still counts as a regression IMO, because now the users are
> affected who weren't before.
Right, but:
1) someone with a similar but older problem might be mislead a fix is
not for them;
2) someone with exactly this one problem (i.e. Berck ;-) might be
mislead "no oops" is enough, while their linux might be still worse
than before. (So I intended Berck to re-consider or even re-check
this problem wrt. 2.6.31, and maybe even reporting another
regression.)
Jarek P.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Bug #14925] sky2 panic under load
2010-01-11 20:31 ` Jarek Poplawski
@ 2010-01-11 20:50 ` Rafael J. Wysocki
2010-01-11 21:02 ` Berck E. Nash
1 sibling, 0 replies; 60+ messages in thread
From: Rafael J. Wysocki @ 2010-01-11 20:50 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Berck E. Nash, netdev
On Monday 11 January 2010, Jarek Poplawski wrote:
> On Mon, Jan 11, 2010 at 08:32:24PM +0100, Rafael J. Wysocki wrote:
> > On Monday 11 January 2010, Jarek Poplawski wrote:
> > > On Sun, Jan 10, 2010 at 05:36:46PM -0700, Berck E. Nash wrote:
> > > > Rafael J. Wysocki wrote:
> > > > > This message has been generated automatically as a part of a report
> > > > > of recent regressions.
> > > > >
> > > > > The following bug entry is on the current list of known regressions
> > > > > from 2.6.32. Please verify if it still should be listed and let me know
> > > > > (either way).
> > >
> > > BTW, I don't know why Berck didn't experience such a panic before
> > > 2.6.32, but seems not a regression to me. There might be new/more sky2
> > > TX timeouts which trigger this panic and would make a real regression.
> >
> > Even if the code has always been broken, but it's only become visible after
> > 2.6.32, that still counts as a regression IMO, because now the users are
> > affected who weren't before.
>
> Right, but:
> 1) someone with a similar but older problem might be mislead a fix is
> not for them;
Not if the fix changelog mentions the older kernels explicitly.
> 2) someone with exactly this one problem (i.e. Berck ;-) might be
> mislead "no oops" is enough, while their linux might be still worse
> than before. (So I intended Berck to re-consider or even re-check
> this problem wrt. 2.6.31, and maybe even reporting another
> regression.)
Yes, rechecking the problem with 2.6.31 would be useful.
Rafael
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Bug #14925] sky2 panic under load
2010-01-11 20:31 ` Jarek Poplawski
2010-01-11 20:50 ` Rafael J. Wysocki
@ 2010-01-11 21:02 ` Berck E. Nash
2010-01-11 21:47 ` Jarek Poplawski
1 sibling, 1 reply; 60+ messages in thread
From: Berck E. Nash @ 2010-01-11 21:02 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Rafael J. Wysocki, netdev
Jarek Poplawski wrote:
> On Mon, Jan 11, 2010 at 08:32:24PM +0100, Rafael J. Wysocki wrote:
>> On Monday 11 January 2010, Jarek Poplawski wrote:
>>> On Sun, Jan 10, 2010 at 05:36:46PM -0700, Berck E. Nash wrote:
>>>> Rafael J. Wysocki wrote:
>>>>> This message has been generated automatically as a part of a report
>>>>> of recent regressions.
>>>>>
>>>>> The following bug entry is on the current list of known regressions
>>>>> from 2.6.32. Please verify if it still should be listed and let me know
>>>>> (either way).
>>> BTW, I don't know why Berck didn't experience such a panic before
>>> 2.6.32, but seems not a regression to me. There might be new/more sky2
>>> TX timeouts which trigger this panic and would make a real regression.
>> Even if the code has always been broken, but it's only become visible after
>> 2.6.32, that still counts as a regression IMO, because now the users are
>> affected who weren't before.
>
> Right, but:
> 1) someone with a similar but older problem might be mislead a fix is
> not for them;
> 2) someone with exactly this one problem (i.e. Berck ;-) might be
> mislead "no oops" is enough, while their linux might be still worse
> than before. (So I intended Berck to re-consider or even re-check
> this problem wrt. 2.6.31, and maybe even reporting another
> regression.)
Well, the problem with this bug is how hard it is for me to reproduce.
I'm willing to admit that just because I never got it before 2.6.32
isn't proof that it wasn't there in 2.6.31. But it's a regression
somewhere along the line, since I've been using this hardware for over 3
years now. There were lots of bugs in the sky2 driver years ago, but
for the last 2+ years or so, I haven't had any trouble at all until now.
The bug only shows up for me with bittorrent traffic. I also use the
same adapter to transfer backups over the network from several
computers, and that doesn't trigger it...
I used 2.6.31 for however long it was the current stable, and I never
got a crash with it. After I got several crashes in 2.6.32, I reverted
to 2.6.31 until Jarek sent this patch that seems to have fixed it. I've
never gotten it to crash in 2.6.31, so I'm pretty sure it's a 2.6.32
regression, but I can't prove it.
I would love to do more testing, but since I can't reproduce the bug at
will, I'm not really sure what to offer?
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: 2.6.33-rc3-git3: Reported regressions from 2.6.32
2010-01-10 22:27 2.6.33-rc3-git3: Reported regressions from 2.6.32 Rafael J. Wysocki
2010-01-10 22:32 ` [Bug #14925] sky2 panic under load Rafael J. Wysocki
2010-01-11 16:00 ` 2.6.33-rc3-git3: Reported regressions from 2.6.32 Luis R. Rodriguez
@ 2010-01-11 21:47 ` Nick Bowler
2010-01-11 22:10 ` Rafael J. Wysocki
[not found] ` <omZ8aBZReyD.A.rz.COlSLB@chimera>
3 siblings, 1 reply; 60+ messages in thread
From: Nick Bowler @ 2010-01-11 21:47 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
Linus Torvalds, Natalie Protasevich, Kernel Testers List,
Network Development, Linux ACPI, Linux PM List, Linux SCSI List,
Linux Wireless List, DRI
On 23:27 Sun 10 Jan , Rafael J. Wysocki wrote:
> This message contains a list of some regressions from 2.6.32, for which there
> are no fixes in the mainline I know of. If any of them have been fixed already,
> please let me know.
>
> If you know of any other unresolved regressions from 2.6.32, please let me know
> either and I'll add them to the list. Also, please let me know if any of the
> entries below are invalid.
Here are two that I don't see in the list:
* LVDS downclocking breaks on G45/thinkpad T500
LKML: http://lkml.org/lkml/2009/12/26/40
FDO: http://bugs.freedesktop.org/show_bug.cgi?id=25809
* Receive failure in et131x (Alan Cox sent me a patch for this but
it's not in mainline yet)
LKML: http://lkml.org/lkml/2009/12/29/210
--
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Bug #14925] sky2 panic under load
2010-01-11 21:02 ` Berck E. Nash
@ 2010-01-11 21:47 ` Jarek Poplawski
0 siblings, 0 replies; 60+ messages in thread
From: Jarek Poplawski @ 2010-01-11 21:47 UTC (permalink / raw)
To: Berck E. Nash; +Cc: Rafael J. Wysocki, netdev
On Mon, Jan 11, 2010 at 02:02:40PM -0700, Berck E. Nash wrote:
> Jarek Poplawski wrote:
> > 2) someone with exactly this one problem (i.e. Berck ;-) might be
> > mislead "no oops" is enough, while their linux might be still worse
> > than before. (So I intended Berck to re-consider or even re-check
> > this problem wrt. 2.6.31, and maybe even reporting another
> > regression.)
>
> Well, the problem with this bug is how hard it is for me to reproduce.
> I'm willing to admit that just because I never got it before 2.6.32
> isn't proof that it wasn't there in 2.6.31. But it's a regression
> somewhere along the line, since I've been using this hardware for over 3
> years now. There were lots of bugs in the sky2 driver years ago, but
> for the last 2+ years or so, I haven't had any trouble at all until now.
>
> The bug only shows up for me with bittorrent traffic. I also use the
> same adapter to transfer backups over the network from several
> computers, and that doesn't trigger it...
>
> I used 2.6.31 for however long it was the current stable, and I never
> got a crash with it. After I got several crashes in 2.6.32, I reverted
> to 2.6.31 until Jarek sent this patch that seems to have fixed it. I've
> never gotten it to crash in 2.6.31, so I'm pretty sure it's a 2.6.32
> regression, but I can't prove it.
>
> I would love to do more testing, but since I can't reproduce the bug at
> will, I'm not really sure what to offer?
I mainly thought about looking at the logs during such bittorrent to
compare if there is any visible change in "sky2 eth0: disabling
interface"/"sky2 eth0: enabling interface" etc. frequency between
2.6.31 and 2.6.32.
Jarek P.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Bug #14925] sky2 panic under load
2010-01-11 16:45 ` Stephen Hemminger
@ 2010-01-11 22:07 ` Jarek Poplawski
2010-01-12 0:14 ` David Miller
2010-01-11 22:31 ` [Bug #14925] sky2 panic under load Jarek Poplawski
1 sibling, 1 reply; 60+ messages in thread
From: Jarek Poplawski @ 2010-01-11 22:07 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Mike McCormack, Berck E. Nash, Rafael J. Wysocki, netdev
On Mon, Jan 11, 2010 at 08:45:04AM -0800, Stephen Hemminger wrote:
> On Mon, 11 Jan 2010 23:03:41 +0900
> Mike McCormack <mikem@ring3k.org> wrote:
>
> > Perhaps only sky2_tx_done should wake the queue?
> > Does the patch below fix the problem too?
> >
> > thanks,
> >
> > Mike
>
> The idea is good,
I don't agree: you both try to make this check more specific.
Actually, since this problem is quite tricky, I wondered about
making it more genaral and visible for other maintainers by
adding something like netif_wake_queue_present() with the
check and some comment.
Jarek P.
> but what if transmit queue was full (stopped),
> and TX FIFO gets stuck. Then Transmit timeout happens and
> the reset logic clears the queue. What will start the queue?
>
> Something like this:
> -----------------------------------------------------------
>
>
>
> --- a/drivers/net/sky2.c 2010-01-11 08:36:42.617426300 -0800
> +++ b/drivers/net/sky2.c 2010-01-11 08:42:51.295237661 -0800
> @@ -1843,9 +1843,6 @@ static void sky2_tx_complete(struct sky2
>
> sky2->tx_cons = idx;
> smp_mb();
> -
> - if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
> - netif_wake_queue(dev);
> }
>
> static void sky2_tx_reset(struct sky2_hw *hw, unsigned port)
> @@ -2416,8 +2413,12 @@ static inline void sky2_tx_done(struct n
> {
> struct sky2_port *sky2 = netdev_priv(dev);
>
> - if (netif_running(dev))
> + if (netif_running(dev) && netif_device_present(dev)) {
> sky2_tx_complete(sky2, last);
> +
> + if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
> + netif_wake_queue(dev);
> + }
> }
>
> static inline void sky2_skb_rx(const struct sky2_port *sky2,
> @@ -3197,6 +3198,7 @@ static int sky2_reattach(struct net_devi
> } else {
> netif_device_attach(dev);
> sky2_set_multicast(dev);
> + netif_start_queue(dev);
> }
> }
>
>
> --
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: 2.6.33-rc3-git3: Reported regressions from 2.6.32
2010-01-11 21:47 ` Nick Bowler
@ 2010-01-11 22:10 ` Rafael J. Wysocki
0 siblings, 0 replies; 60+ messages in thread
From: Rafael J. Wysocki @ 2010-01-11 22:10 UTC (permalink / raw)
To: Nick Bowler
Cc: Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
Linus Torvalds, Natalie Protasevich, Kernel Testers List,
Network Development, Linux ACPI, Linux PM List, Linux SCSI List,
Linux Wireless List, DRI
On Monday 11 January 2010, Nick Bowler wrote:
> On 23:27 Sun 10 Jan , Rafael J. Wysocki wrote:
> > This message contains a list of some regressions from 2.6.32, for which there
> > are no fixes in the mainline I know of. If any of them have been fixed already,
> > please let me know.
> >
> > If you know of any other unresolved regressions from 2.6.32, please let me know
> > either and I'll add them to the list. Also, please let me know if any of the
> > entries below are invalid.
>
> Here are two that I don't see in the list:
>
> * LVDS downclocking breaks on G45/thinkpad T500
>
> LKML: http://lkml.org/lkml/2009/12/26/40
> FDO: http://bugs.freedesktop.org/show_bug.cgi?id=25809
Added this one.
> * Receive failure in et131x (Alan Cox sent me a patch for this but
> it's not in mainline yet)
>
> LKML: http://lkml.org/lkml/2009/12/29/210
This one is a staging driver, so I'm not going to list it.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Bug #14925] sky2 panic under load
2010-01-11 16:45 ` Stephen Hemminger
2010-01-11 22:07 ` Jarek Poplawski
@ 2010-01-11 22:31 ` Jarek Poplawski
1 sibling, 0 replies; 60+ messages in thread
From: Jarek Poplawski @ 2010-01-11 22:31 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Mike McCormack, Berck E. Nash, Rafael J. Wysocki, netdev
On Mon, Jan 11, 2010 at 08:45:04AM -0800, Stephen Hemminger wrote:
> On Mon, 11 Jan 2010 23:03:41 +0900
> Mike McCormack <mikem@ring3k.org> wrote:
>
> > Perhaps only sky2_tx_done should wake the queue?
> > Does the patch below fix the problem too?
> >
> > thanks,
> >
> > Mike
>
> The idea is good, but what if transmit queue was full (stopped),
> and TX FIFO gets stuck. Then Transmit timeout happens and
> the reset logic clears the queue. What will start the queue?
>
> Something like this:
> -----------------------------------------------------------
>
>
>
> --- a/drivers/net/sky2.c 2010-01-11 08:36:42.617426300 -0800
> +++ b/drivers/net/sky2.c 2010-01-11 08:42:51.295237661 -0800
> @@ -1843,9 +1843,6 @@ static void sky2_tx_complete(struct sky2
>
> sky2->tx_cons = idx;
> smp_mb();
> -
> - if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
> - netif_wake_queue(dev);
> }
>
> static void sky2_tx_reset(struct sky2_hw *hw, unsigned port)
> @@ -2416,8 +2413,12 @@ static inline void sky2_tx_done(struct n
> {
> struct sky2_port *sky2 = netdev_priv(dev);
>
> - if (netif_running(dev))
> + if (netif_running(dev) && netif_device_present(dev)) {
> sky2_tx_complete(sky2, last);
> +
> + if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
> + netif_wake_queue(dev);
> + }
> }
>
> static inline void sky2_skb_rx(const struct sky2_port *sky2,
> @@ -3197,6 +3198,7 @@ static int sky2_reattach(struct net_devi
> } else {
> netif_device_attach(dev);
> sky2_set_multicast(dev);
> + netif_start_queue(dev);
Why netif_device_attach() is not enough?
Jarek P.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Bug #14925] sky2 panic under load
2010-01-11 22:07 ` Jarek Poplawski
@ 2010-01-12 0:14 ` David Miller
2010-01-12 7:50 ` Jarek Poplawski
0 siblings, 1 reply; 60+ messages in thread
From: David Miller @ 2010-01-12 0:14 UTC (permalink / raw)
To: jarkao2; +Cc: shemminger, mikem, flyboy, rjw, netdev
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Mon, 11 Jan 2010 23:07:54 +0100
> I don't agree: you both try to make this check more specific.
> Actually, since this problem is quite tricky, I wondered about
> making it more genaral and visible for other maintainers by
> adding something like netif_wake_queue_present() with the
> check and some comment.
This detracts from the real problem.
The issue is that we have a code path bringing a device down, which
uses helper routines which are meant to be executed when the device is
up and functioning normally.
That's the bug.
No other driver does silly things like call helper routines which wake
the TX queue when taking the chip down.
Fix that, not the immediate symptoms. Write a routine that
unconditionally clears the TX queue, frees the packets, etc. and
has none of the wake logic.
That's what most other driver do, and those that don't should be
fixed similarly.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Bug #14925] sky2 panic under load
2010-01-12 0:14 ` David Miller
@ 2010-01-12 7:50 ` Jarek Poplawski
2010-01-12 8:08 ` David Miller
0 siblings, 1 reply; 60+ messages in thread
From: Jarek Poplawski @ 2010-01-12 7:50 UTC (permalink / raw)
To: David Miller; +Cc: shemminger, mikem, flyboy, rjw, netdev
On Mon, Jan 11, 2010 at 04:14:19PM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Mon, 11 Jan 2010 23:07:54 +0100
>
> > I don't agree: you both try to make this check more specific.
> > Actually, since this problem is quite tricky, I wondered about
> > making it more genaral and visible for other maintainers by
> > adding something like netif_wake_queue_present() with the
> > check and some comment.
>
> This detracts from the real problem.
>
> The issue is that we have a code path bringing a device down, which
> uses helper routines which are meant to be executed when the device is
> up and functioning normally.
>
> That's the bug.
>
> No other driver does silly things like call helper routines which wake
> the TX queue when taking the chip down.
>
> Fix that, not the immediate symptoms. Write a routine that
> unconditionally clears the TX queue, frees the packets, etc. and
> has none of the wake logic.
>
> That's what most other driver do, and those that don't should be
> fixed similarly.
As I wrote it's tricky and could be hard to track even if you "heard"
about it, e.g. Mike moves netif_wake_queue() to another place in his
last proposal, but it still can run after netif_device_detach() until
napi_synchronize() in sky2_down().
I think, I can see similar problems e.g. in gianfar or netxen, where
napi_disable() is done after netif_device_detach(), especially in
suspend procedures (there might be less severe (than oops) effects
yet). IMHO, it all looks simply error prone (sometime you have to
know a driver well to track all possible paths to say it's really
safe).
In the meantime users are endangered with known oopses or at least
suspend problems, while we know the reasons, can fix it (even
temporarily) with one-liners, but wait for the right fixes. (Btw,
such fixes might require a significant rewrite, risking inserting
new bugs, so I doubt they are "right" for -stable.) Anyway, I'm
OK with this (I hope what should be done will be done, and I
don't have sky2 ;-)
Cheers,
Jarek P.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Bug #14925] sky2 panic under load
2010-01-12 7:50 ` Jarek Poplawski
@ 2010-01-12 8:08 ` David Miller
2010-01-12 8:56 ` Jarek Poplawski
0 siblings, 1 reply; 60+ messages in thread
From: David Miller @ 2010-01-12 8:08 UTC (permalink / raw)
To: jarkao2; +Cc: shemminger, mikem, flyboy, rjw, netdev
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Tue, 12 Jan 2010 07:50:59 +0000
> I think, I can see similar problems e.g. in gianfar or netxen, where
> napi_disable() is done after netif_device_detach(), especially in
> suspend procedures (there might be less severe (than oops) effects
> yet). IMHO, it all looks simply error prone (sometime you have to
> know a driver well to track all possible paths to say it's really
> safe).
Then that's an even larger bug.
Until you do napi_disable(), the device can be touched.
Asynchronous paths outside of the driver's control, even
with interrupts disabled, can call back into the driver
and touch the chip.
F.e. netpoll via netconsole output on another cpu
So it therefore must be done before doing the actual work of bringing
the device down or suspending it.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Bug #14925] sky2 panic under load
2010-01-12 8:08 ` David Miller
@ 2010-01-12 8:56 ` Jarek Poplawski
2010-01-12 9:42 ` David Miller
0 siblings, 1 reply; 60+ messages in thread
From: Jarek Poplawski @ 2010-01-12 8:56 UTC (permalink / raw)
To: David Miller; +Cc: shemminger, mikem, flyboy, rjw, netdev, Michael Breuer
On Tue, Jan 12, 2010 at 12:08:04AM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Tue, 12 Jan 2010 07:50:59 +0000
>
> > I think, I can see similar problems e.g. in gianfar or netxen, where
> > napi_disable() is done after netif_device_detach(), especially in
> > suspend procedures (there might be less severe (than oops) effects
> > yet). IMHO, it all looks simply error prone (sometime you have to
> > know a driver well to track all possible paths to say it's really
> > safe).
>
> Then that's an even larger bug.
>
> Until you do napi_disable(), the device can be touched.
>
> Asynchronous paths outside of the driver's control, even
> with interrupts disabled, can call back into the driver
> and touch the chip.
>
> F.e. netpoll via netconsole output on another cpu
>
> So it therefore must be done before doing the actual work of bringing
> the device down or suspending it.
Maybe I miss something, but once more: this patch mentioned by Berck
Nash has been tested by at least two users, Berck himself, and
probably even more intensively by Michael Breuer, during af_packet
debugging. Both guys acknowledged it helped, so it can't be that bad.
Jarek P.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Bug #14925] sky2 panic under load
2010-01-12 8:56 ` Jarek Poplawski
@ 2010-01-12 9:42 ` David Miller
2010-01-12 10:31 ` Jarek Poplawski
2010-01-12 10:56 ` David Miller
0 siblings, 2 replies; 60+ messages in thread
From: David Miller @ 2010-01-12 9:42 UTC (permalink / raw)
To: jarkao2; +Cc: shemminger, mikem, flyboy, rjw, netdev, mbreuer
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Tue, 12 Jan 2010 08:56:34 +0000
> Maybe I miss something, but once more: this patch mentioned by Berck
> Nash has been tested by at least two users, Berck himself, and
> probably even more intensively by Michael Breuer, during af_packet
> debugging. Both guys acknowledged it helped, so it can't be that bad.
I agree, as a short term fix, this patch is fine.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Bug #14925] sky2 panic under load
2010-01-12 9:42 ` David Miller
@ 2010-01-12 10:31 ` Jarek Poplawski
2010-01-12 10:56 ` David Miller
1 sibling, 0 replies; 60+ messages in thread
From: Jarek Poplawski @ 2010-01-12 10:31 UTC (permalink / raw)
To: David Miller; +Cc: shemminger, mikem, flyboy, rjw, netdev, mbreuer
On Tue, Jan 12, 2010 at 01:42:18AM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Tue, 12 Jan 2010 08:56:34 +0000
>
> > Maybe I miss something, but once more: this patch mentioned by Berck
> > Nash has been tested by at least two users, Berck himself, and
> > probably even more intensively by Michael Breuer, during af_packet
> > debugging. Both guys acknowledged it helped, so it can't be that bad.
>
> I agree, as a short term fix, this patch is fine.
And I agree real problems might be deeper.
Jarek P.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Bug #14925] sky2 panic under load
2010-01-12 9:42 ` David Miller
2010-01-12 10:31 ` Jarek Poplawski
@ 2010-01-12 10:56 ` David Miller
2010-01-12 11:04 ` Jarek Poplawski
` (2 more replies)
1 sibling, 3 replies; 60+ messages in thread
From: David Miller @ 2010-01-12 10:56 UTC (permalink / raw)
To: jarkao2; +Cc: shemminger, mikem, flyboy, rjw, netdev, mbreuer
From: David Miller <davem@davemloft.net>
Date: Tue, 12 Jan 2010 01:42:18 -0800 (PST)
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Tue, 12 Jan 2010 08:56:34 +0000
>
>> Maybe I miss something, but once more: this patch mentioned by Berck
>> Nash has been tested by at least two users, Berck himself, and
>> probably even more intensively by Michael Breuer, during af_packet
>> debugging. Both guys acknowledged it helped, so it can't be that bad.
>
> I agree, as a short term fix, this patch is fine.
In case it isn't painfully obvious, I'm applying Jarek's "v2"
fix and will also queue that up for -stable.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Bug #14925] sky2 panic under load
2010-01-12 10:56 ` David Miller
@ 2010-01-12 11:04 ` Jarek Poplawski
2010-01-12 15:39 ` Stephen Hemminger
2010-01-12 16:15 ` [PATCH] sky2: safer transmit ring cleaning Stephen Hemminger
2 siblings, 0 replies; 60+ messages in thread
From: Jarek Poplawski @ 2010-01-12 11:04 UTC (permalink / raw)
To: David Miller; +Cc: shemminger, mikem, flyboy, rjw, netdev, mbreuer
On Tue, Jan 12, 2010 at 02:56:20AM -0800, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Tue, 12 Jan 2010 01:42:18 -0800 (PST)
>
> > From: Jarek Poplawski <jarkao2@gmail.com>
> > Date: Tue, 12 Jan 2010 08:56:34 +0000
> >
> >> Maybe I miss something, but once more: this patch mentioned by Berck
> >> Nash has been tested by at least two users, Berck himself, and
> >> probably even more intensively by Michael Breuer, during af_packet
> >> debugging. Both guys acknowledged it helped, so it can't be that bad.
> >
> > I agree, as a short term fix, this patch is fine.
>
> In case it isn't painfully obvious, I'm applying Jarek's "v2"
> fix and will also queue that up for -stable.
>
Thanks!
Jarek P.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Bug #14925] sky2 panic under load
2010-01-12 10:56 ` David Miller
2010-01-12 11:04 ` Jarek Poplawski
@ 2010-01-12 15:39 ` Stephen Hemminger
2010-01-12 16:15 ` [PATCH] sky2: safer transmit ring cleaning Stephen Hemminger
2 siblings, 0 replies; 60+ messages in thread
From: Stephen Hemminger @ 2010-01-12 15:39 UTC (permalink / raw)
To: David Miller; +Cc: mikem, flyboy, rjw, netdev, mbreuer, jarkao2
Please hold off applying anything. I am testing a cleaner solution
^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH] sky2: safer transmit ring cleaning
2010-01-12 10:56 ` David Miller
2010-01-12 11:04 ` Jarek Poplawski
2010-01-12 15:39 ` Stephen Hemminger
@ 2010-01-12 16:15 ` Stephen Hemminger
2010-01-12 16:32 ` Michael Breuer
` (2 more replies)
2 siblings, 3 replies; 60+ messages in thread
From: Stephen Hemminger @ 2010-01-12 16:15 UTC (permalink / raw)
To: David Miller; +Cc: jarkao2, mikem, flyboy, rjw, netdev, mbreuer
This code makes transmit path and transmit reset safer by:
* adding memory barrier before checking available ring slots
* reseting state of tx ring elements after free
* seperate cleanup function from ring done function
* removing mostly unused tx_next element
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
---
Please apply this instead of the various bits and pieces flying
around labeled as sky2 panic under load
--- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800
+++ b/drivers/net/sky2.c 2010-01-11 17:36:22.027429875 -0800
@@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct
/* Number of list elements available for next tx */
static inline int tx_avail(const struct sky2_port *sky2)
{
+ /* Makes sure update of tx_prod from start_xmit and
+ tx_cons from tx_done are seen. */
+ smp_mb();
return sky2->tx_pending - tx_inuse(sky2);
}
@@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s
return count;
}
-static void sky2_tx_unmap(struct pci_dev *pdev,
- const struct tx_ring_info *re)
+static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
{
if (re->flags & TX_MAP_SINGLE)
pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
@@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev
pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
pci_unmap_len(re, maplen),
PCI_DMA_TODEVICE);
+ re->flags = 0;
}
/*
@@ -1804,7 +1807,8 @@ mapping_error:
}
/*
- * Free ring elements from starting at tx_cons until "done"
+ * Transmit complete processing
+ * Free ring elements from starting at tx_cons until done index
*
* NB:
* 1. The hardware will tell us about partial completion of multi-part
@@ -1813,9 +1817,9 @@ mapping_error:
* looks at the tail of the queue of FIFO (tx_cons), not
* the head (tx_prod)
*/
-static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
+static void sky2_tx_done(struct net_device *dev, u16 done)
{
- struct net_device *dev = sky2->netdev;
+ struct sky2_port *sky2 = netdev_priv(dev);
unsigned idx;
BUG_ON(done >= sky2->tx_ring_size);
@@ -1828,6 +1832,8 @@ static void sky2_tx_complete(struct sky2
sky2_tx_unmap(sky2->hw->pdev, re);
if (skb) {
+ re->skb = NULL;
+
if (unlikely(netif_msg_tx_done(sky2)))
printk(KERN_DEBUG "%s: tx done %u\n",
dev->name, idx);
@@ -1836,13 +1842,10 @@ static void sky2_tx_complete(struct sky2
dev->stats.tx_bytes += skb->len;
dev_kfree_skb_any(skb);
-
- sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
}
}
sky2->tx_cons = idx;
- smp_mb();
if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
netif_wake_queue(dev);
@@ -1870,6 +1873,21 @@ static void sky2_tx_reset(struct sky2_hw
sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET);
}
+static void sky2_tx_clean(struct sky2_port *sky2)
+{
+ u16 idx;
+
+ for (idx = 0; idx < sky2->tx_ring_size; idx++) {
+ struct tx_ring_info *re = sky2->tx_ring + idx;
+
+ sky2_tx_unmap(sky2->hw->pdev, re);
+ if (re->skb) {
+ dev_kfree_skb_any(re->skb);
+ re->skb = NULL;
+ }
+ }
+}
+
/* Network shutdown */
static int sky2_down(struct net_device *dev)
{
@@ -1933,8 +1951,7 @@ static int sky2_down(struct net_device *
sky2_tx_reset(hw, port);
/* Free any pending frames stuck in HW queue */
- sky2_tx_complete(sky2, sky2->tx_prod);
-
+ sky2_tx_clean(sky2);
sky2_rx_clean(sky2);
sky2_free_buffers(sky2);
@@ -2411,15 +2428,6 @@ error:
goto resubmit;
}
-/* Transmit complete */
-static inline void sky2_tx_done(struct net_device *dev, u16 last)
-{
- struct sky2_port *sky2 = netdev_priv(dev);
-
- if (netif_running(dev))
- sky2_tx_complete(sky2, last);
-}
-
static inline void sky2_skb_rx(const struct sky2_port *sky2,
u32 status, struct sk_buff *skb)
{
@@ -4201,7 +4209,7 @@ static int sky2_debug_show(struct seq_fi
/* Dump contents of tx ring */
sop = 1;
- for (idx = sky2->tx_next; idx != sky2->tx_prod && idx < sky2->tx_ring_size;
+ for (idx = sky2->tx_cons; idx != sky2->tx_prod && idx < sky2->tx_ring_size;
idx = RING_NEXT(idx, sky2->tx_ring_size)) {
const struct sky2_tx_le *le = sky2->tx_le + idx;
u32 a = le32_to_cpu(le->addr);
--- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800
+++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800
@@ -2187,7 +2187,6 @@ struct sky2_port {
u16 tx_ring_size;
u16 tx_cons; /* next le to check */
u16 tx_prod; /* next le to use */
- u16 tx_next; /* debug only */
u16 tx_pending;
u16 tx_last_mss;
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning
2010-01-12 16:15 ` [PATCH] sky2: safer transmit ring cleaning Stephen Hemminger
@ 2010-01-12 16:32 ` Michael Breuer
2010-01-12 17:02 ` Stephen Hemminger
2010-01-12 18:04 ` Jarek Poplawski
2010-01-12 18:35 ` [PATCH] sky2: safer transmit ring cleaning Michael Breuer
2 siblings, 1 reply; 60+ messages in thread
From: Michael Breuer @ 2010-01-12 16:32 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, jarkao2, mikem, flyboy, rjw, netdev
On 01/12/2010 11:15 AM, Stephen Hemminger wrote:
> This code makes transmit path and transmit reset safer by:
> * adding memory barrier before checking available ring slots
> * reseting state of tx ring elements after free
> * seperate cleanup function from ring done function
> * removing mostly unused tx_next element
>
>
>
Stephen,
Just want to confirm that this supersedes both of the patches I'm
currently running with:
1. skb_may_pull
2. Nash patch
--
Mike
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning
2010-01-12 16:32 ` Michael Breuer
@ 2010-01-12 17:02 ` Stephen Hemminger
0 siblings, 0 replies; 60+ messages in thread
From: Stephen Hemminger @ 2010-01-12 17:02 UTC (permalink / raw)
To: Michael Breuer; +Cc: David Miller, jarkao2, mikem, flyboy, rjw, netdev
On Tue, 12 Jan 2010 11:32:45 -0500
Michael Breuer <mbreuer@majjas.com> wrote:
> On 01/12/2010 11:15 AM, Stephen Hemminger wrote:
> > This code makes transmit path and transmit reset safer by:
> > * adding memory barrier before checking available ring slots
> > * reseting state of tx ring elements after free
> > * seperate cleanup function from ring done function
> > * removing mostly unused tx_next element
> >
> >
> >
> Stephen,
>
> Just want to confirm that this supersedes both of the patches I'm
> currently running with:
>
> 1. skb_may_pull
> 2. Nash patch
You want:
1. AF_PACKET patch that makes sure skb is not modified after send
2. This patch.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning
2010-01-12 16:15 ` [PATCH] sky2: safer transmit ring cleaning Stephen Hemminger
2010-01-12 16:32 ` Michael Breuer
@ 2010-01-12 18:04 ` Jarek Poplawski
2010-01-12 18:13 ` Stephen Hemminger
2010-01-12 18:35 ` [PATCH] sky2: safer transmit ring cleaning Michael Breuer
2 siblings, 1 reply; 60+ messages in thread
From: Jarek Poplawski @ 2010-01-12 18:04 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, mikem, flyboy, rjw, netdev, mbreuer
On Tue, Jan 12, 2010 at 08:15:13AM -0800, Stephen Hemminger wrote:
> This code makes transmit path and transmit reset safer by:
> * adding memory barrier before checking available ring slots
> * reseting state of tx ring elements after free
> * seperate cleanup function from ring done function
> * removing mostly unused tx_next element
Does this patch prevent re-enabling tx after netif_device_detach(),
e.g. when sky2_detach() and sky2_tx_done() run at the same time on
different cpus?
Jarek P.
>
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>
> ---
> Please apply this instead of the various bits and pieces flying
> around labeled as sky2 panic under load
>
>
> --- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800
> +++ b/drivers/net/sky2.c 2010-01-11 17:36:22.027429875 -0800
> @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct
> /* Number of list elements available for next tx */
> static inline int tx_avail(const struct sky2_port *sky2)
> {
> + /* Makes sure update of tx_prod from start_xmit and
> + tx_cons from tx_done are seen. */
> + smp_mb();
> return sky2->tx_pending - tx_inuse(sky2);
> }
>
> @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s
> return count;
> }
>
> -static void sky2_tx_unmap(struct pci_dev *pdev,
> - const struct tx_ring_info *re)
> +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
> {
> if (re->flags & TX_MAP_SINGLE)
> pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
> @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev
> pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
> pci_unmap_len(re, maplen),
> PCI_DMA_TODEVICE);
> + re->flags = 0;
> }
>
> /*
> @@ -1804,7 +1807,8 @@ mapping_error:
> }
>
> /*
> - * Free ring elements from starting at tx_cons until "done"
> + * Transmit complete processing
> + * Free ring elements from starting at tx_cons until done index
> *
> * NB:
> * 1. The hardware will tell us about partial completion of multi-part
> @@ -1813,9 +1817,9 @@ mapping_error:
> * looks at the tail of the queue of FIFO (tx_cons), not
> * the head (tx_prod)
> */
> -static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
> +static void sky2_tx_done(struct net_device *dev, u16 done)
> {
> - struct net_device *dev = sky2->netdev;
> + struct sky2_port *sky2 = netdev_priv(dev);
> unsigned idx;
>
> BUG_ON(done >= sky2->tx_ring_size);
> @@ -1828,6 +1832,8 @@ static void sky2_tx_complete(struct sky2
> sky2_tx_unmap(sky2->hw->pdev, re);
>
> if (skb) {
> + re->skb = NULL;
> +
> if (unlikely(netif_msg_tx_done(sky2)))
> printk(KERN_DEBUG "%s: tx done %u\n",
> dev->name, idx);
> @@ -1836,13 +1842,10 @@ static void sky2_tx_complete(struct sky2
> dev->stats.tx_bytes += skb->len;
>
> dev_kfree_skb_any(skb);
> -
> - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
> }
> }
>
> sky2->tx_cons = idx;
> - smp_mb();
>
> if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
> netif_wake_queue(dev);
> @@ -1870,6 +1873,21 @@ static void sky2_tx_reset(struct sky2_hw
> sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET);
> }
>
> +static void sky2_tx_clean(struct sky2_port *sky2)
> +{
> + u16 idx;
> +
> + for (idx = 0; idx < sky2->tx_ring_size; idx++) {
> + struct tx_ring_info *re = sky2->tx_ring + idx;
> +
> + sky2_tx_unmap(sky2->hw->pdev, re);
> + if (re->skb) {
> + dev_kfree_skb_any(re->skb);
> + re->skb = NULL;
> + }
> + }
> +}
> +
> /* Network shutdown */
> static int sky2_down(struct net_device *dev)
> {
> @@ -1933,8 +1951,7 @@ static int sky2_down(struct net_device *
> sky2_tx_reset(hw, port);
>
> /* Free any pending frames stuck in HW queue */
> - sky2_tx_complete(sky2, sky2->tx_prod);
> -
> + sky2_tx_clean(sky2);
> sky2_rx_clean(sky2);
>
> sky2_free_buffers(sky2);
> @@ -2411,15 +2428,6 @@ error:
> goto resubmit;
> }
>
> -/* Transmit complete */
> -static inline void sky2_tx_done(struct net_device *dev, u16 last)
> -{
> - struct sky2_port *sky2 = netdev_priv(dev);
> -
> - if (netif_running(dev))
> - sky2_tx_complete(sky2, last);
> -}
> -
> static inline void sky2_skb_rx(const struct sky2_port *sky2,
> u32 status, struct sk_buff *skb)
> {
> @@ -4201,7 +4209,7 @@ static int sky2_debug_show(struct seq_fi
>
> /* Dump contents of tx ring */
> sop = 1;
> - for (idx = sky2->tx_next; idx != sky2->tx_prod && idx < sky2->tx_ring_size;
> + for (idx = sky2->tx_cons; idx != sky2->tx_prod && idx < sky2->tx_ring_size;
> idx = RING_NEXT(idx, sky2->tx_ring_size)) {
> const struct sky2_tx_le *le = sky2->tx_le + idx;
> u32 a = le32_to_cpu(le->addr);
> --- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800
> +++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800
> @@ -2187,7 +2187,6 @@ struct sky2_port {
> u16 tx_ring_size;
> u16 tx_cons; /* next le to check */
> u16 tx_prod; /* next le to use */
> - u16 tx_next; /* debug only */
>
> u16 tx_pending;
> u16 tx_last_mss;
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning
2010-01-12 18:04 ` Jarek Poplawski
@ 2010-01-12 18:13 ` Stephen Hemminger
2010-01-12 18:24 ` Jarek Poplawski
0 siblings, 1 reply; 60+ messages in thread
From: Stephen Hemminger @ 2010-01-12 18:13 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: David Miller, mikem, flyboy, rjw, netdev, mbreuer
On Tue, 12 Jan 2010 19:04:30 +0100
Jarek Poplawski <jarkao2@gmail.com> wrote:
> On Tue, Jan 12, 2010 at 08:15:13AM -0800, Stephen Hemminger wrote:
> > This code makes transmit path and transmit reset safer by:
> > * adding memory barrier before checking available ring slots
> > * reseting state of tx ring elements after free
> > * seperate cleanup function from ring done function
> > * removing mostly unused tx_next element
>
> Does this patch prevent re-enabling tx after netif_device_detach(),
> e.g. when sky2_detach() and sky2_tx_done() run at the same time on
> different cpus?
>
Yes.
The napi is disabled during the detach so transmit completion can
not be done during that period.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning
2010-01-12 18:13 ` Stephen Hemminger
@ 2010-01-12 18:24 ` Jarek Poplawski
2010-01-12 18:49 ` [PATCH] sky2: safer transmit ring cleaning (v2) Stephen Hemminger
0 siblings, 1 reply; 60+ messages in thread
From: Jarek Poplawski @ 2010-01-12 18:24 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, mikem, flyboy, rjw, netdev, mbreuer
On Tue, Jan 12, 2010 at 10:13:06AM -0800, Stephen Hemminger wrote:
> On Tue, 12 Jan 2010 19:04:30 +0100
> Jarek Poplawski <jarkao2@gmail.com> wrote:
>
> > On Tue, Jan 12, 2010 at 08:15:13AM -0800, Stephen Hemminger wrote:
> > > This code makes transmit path and transmit reset safer by:
> > > * adding memory barrier before checking available ring slots
> > > * reseting state of tx ring elements after free
> > > * seperate cleanup function from ring done function
> > > * removing mostly unused tx_next element
> >
> > Does this patch prevent re-enabling tx after netif_device_detach(),
> > e.g. when sky2_detach() and sky2_tx_done() run at the same time on
> > different cpus?
> >
>
> Yes.
> The napi is disabled during the detach so transmit completion can
> not be done during that period.
Could you point me where exactly the napi is disabled, probably I
missed this?
Jarek P.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning
2010-01-12 16:15 ` [PATCH] sky2: safer transmit ring cleaning Stephen Hemminger
2010-01-12 16:32 ` Michael Breuer
2010-01-12 18:04 ` Jarek Poplawski
@ 2010-01-12 18:35 ` Michael Breuer
2010-01-12 18:42 ` Michael Breuer
2 siblings, 1 reply; 60+ messages in thread
From: Michael Breuer @ 2010-01-12 18:35 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, jarkao2, mikem, flyboy, rjw, netdev
On 1/12/2010 11:15 AM, Stephen Hemminger wrote:
> This code makes transmit path and transmit reset safer by:
> * adding memory barrier before checking available ring slots
> * reseting state of tx ring elements after free
> * seperate cleanup function from ring done function
> * removing mostly unused tx_next element
>
> Signed-off-by: Stephen Hemminger<shemminger@vyatta.com>
>
> ---
> Please apply this instead of the various bits and pieces flying
> around labeled as sky2 panic under load
>
>
> --- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800
> +++ b/drivers/net/sky2.c 2010-01-11 17:36:22.027429875 -0800
> @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct
> /* Number of list elements available for next tx */
> static inline int tx_avail(const struct sky2_port *sky2)
> {
> + /* Makes sure update of tx_prod from start_xmit and
> + tx_cons from tx_done are seen. */
> + smp_mb();
> return sky2->tx_pending - tx_inuse(sky2);
> }
>
> @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s
> return count;
> }
>
> -static void sky2_tx_unmap(struct pci_dev *pdev,
> - const struct tx_ring_info *re)
> +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
> {
> if (re->flags& TX_MAP_SINGLE)
> pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
> @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev
> pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
> pci_unmap_len(re, maplen),
> PCI_DMA_TODEVICE);
> + re->flags = 0;
> }
>
> /*
> @@ -1804,7 +1807,8 @@ mapping_error:
> }
>
> /*
> - * Free ring elements from starting at tx_cons until "done"
> + * Transmit complete processing
> + * Free ring elements from starting at tx_cons until done index
> *
> * NB:
> * 1. The hardware will tell us about partial completion of multi-part
> @@ -1813,9 +1817,9 @@ mapping_error:
> * looks at the tail of the queue of FIFO (tx_cons), not
> * the head (tx_prod)
> */
> -static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
> +static void sky2_tx_done(struct net_device *dev, u16 done)
> {
> - struct net_device *dev = sky2->netdev;
> + struct sky2_port *sky2 = netdev_priv(dev);
> unsigned idx;
>
> BUG_ON(done>= sky2->tx_ring_size);
> @@ -1828,6 +1832,8 @@ static void sky2_tx_complete(struct sky2
> sky2_tx_unmap(sky2->hw->pdev, re);
>
> if (skb) {
> + re->skb = NULL;
> +
> if (unlikely(netif_msg_tx_done(sky2)))
> printk(KERN_DEBUG "%s: tx done %u\n",
> dev->name, idx);
> @@ -1836,13 +1842,10 @@ static void sky2_tx_complete(struct sky2
> dev->stats.tx_bytes += skb->len;
>
> dev_kfree_skb_any(skb);
> -
> - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
> }
> }
>
> sky2->tx_cons = idx;
> - smp_mb();
>
> if (tx_avail(sky2)> MAX_SKB_TX_LE + 4)
> netif_wake_queue(dev);
> @@ -1870,6 +1873,21 @@ static void sky2_tx_reset(struct sky2_hw
> sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET);
> }
>
> +static void sky2_tx_clean(struct sky2_port *sky2)
> +{
> + u16 idx;
> +
> + for (idx = 0; idx< sky2->tx_ring_size; idx++) {
> + struct tx_ring_info *re = sky2->tx_ring + idx;
> +
> + sky2_tx_unmap(sky2->hw->pdev, re);
> + if (re->skb) {
> + dev_kfree_skb_any(re->skb);
> + re->skb = NULL;
> + }
> + }
> +}
> +
> /* Network shutdown */
> static int sky2_down(struct net_device *dev)
> {
> @@ -1933,8 +1951,7 @@ static int sky2_down(struct net_device *
> sky2_tx_reset(hw, port);
>
> /* Free any pending frames stuck in HW queue */
> - sky2_tx_complete(sky2, sky2->tx_prod);
> -
> + sky2_tx_clean(sky2);
> sky2_rx_clean(sky2);
>
> sky2_free_buffers(sky2);
> @@ -2411,15 +2428,6 @@ error:
> goto resubmit;
> }
>
> -/* Transmit complete */
> -static inline void sky2_tx_done(struct net_device *dev, u16 last)
> -{
> - struct sky2_port *sky2 = netdev_priv(dev);
> -
> - if (netif_running(dev))
> - sky2_tx_complete(sky2, last);
> -}
> -
> static inline void sky2_skb_rx(const struct sky2_port *sky2,
> u32 status, struct sk_buff *skb)
> {
> @@ -4201,7 +4209,7 @@ static int sky2_debug_show(struct seq_fi
>
> /* Dump contents of tx ring */
> sop = 1;
> - for (idx = sky2->tx_next; idx != sky2->tx_prod&& idx< sky2->tx_ring_size;
> + for (idx = sky2->tx_cons; idx != sky2->tx_prod&& idx< sky2->tx_ring_size;
> idx = RING_NEXT(idx, sky2->tx_ring_size)) {
> const struct sky2_tx_le *le = sky2->tx_le + idx;
> u32 a = le32_to_cpu(le->addr);
> --- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800
> +++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800
> @@ -2187,7 +2187,6 @@ struct sky2_port {
> u16 tx_ring_size;
> u16 tx_cons; /* next le to check */
> u16 tx_prod; /* next le to use */
> - u16 tx_next; /* debug only */
>
> u16 tx_pending;
> u16 tx_last_mss;
>
Test observations:
1. DHCP request/response sequence having some issues... can't confirm
that it's a result of this patch, but I don't see this with the prior
patch. Prior to this patch, if I connect a new device (Blackberry in
this case) I see DHCPDISCOVER;DHCPOFFER;DHCPREQUEST;DHCPACK (just the
four messages). With this patch I'm seeing repeated attempts - i.e.,
DISCOVER/OFFER 6 times before the REQUEST/ACK. This is repeatable and
happening whether or not under load. As my original problem started with
DHCP packets, this seems interesting. I don't see any errors logged
(dmesg, messages, etc.).
2. Probably not related to this patch, but perhaps to the driver - I've
finally tracked the source of my dropped RX packets. It's happening when
a sending data to a CIFS client on a MacOS system. The RX drop rate
seems proportional to the TX rate for SMB to that client - at a tx rate
of about 200Kb/s I see about 20 dropped RX packets/sec - at 400 I see
about 40. I'm thinking therefore it's ACKs being dropped on RX. Why? no
idea (yet). Nothing reported by ethtool or netstat -s remotely
correlates to the number of dropped RX packets.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning
2010-01-12 18:35 ` [PATCH] sky2: safer transmit ring cleaning Michael Breuer
@ 2010-01-12 18:42 ` Michael Breuer
2010-01-12 20:31 ` Michael Breuer
0 siblings, 1 reply; 60+ messages in thread
From: Michael Breuer @ 2010-01-12 18:42 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, jarkao2, mikem, flyboy, rjw, netdev
On 1/12/2010 1:35 PM, Michael Breuer wrote:
> On 1/12/2010 11:15 AM, Stephen Hemminger wrote:
>> This code makes transmit path and transmit reset safer by:
>> * adding memory barrier before checking available ring slots
>> * reseting state of tx ring elements after free
>> * seperate cleanup function from ring done function
>> * removing mostly unused tx_next element
>>
>> Signed-off-by: Stephen Hemminger<shemminger@vyatta.com>
>>
>> ---
>> Please apply this instead of the various bits and pieces flying
>> around labeled as sky2 panic under load
>>
>>
>> --- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800
>> +++ b/drivers/net/sky2.c 2010-01-11 17:36:22.027429875 -0800
>> @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct
>> /* Number of list elements available for next tx */
>> static inline int tx_avail(const struct sky2_port *sky2)
>> {
>> + /* Makes sure update of tx_prod from start_xmit and
>> + tx_cons from tx_done are seen. */
>> + smp_mb();
>> return sky2->tx_pending - tx_inuse(sky2);
>> }
>>
>> @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s
>> return count;
>> }
>>
>> -static void sky2_tx_unmap(struct pci_dev *pdev,
>> - const struct tx_ring_info *re)
>> +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info
>> *re)
>> {
>> if (re->flags& TX_MAP_SINGLE)
>> pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
>> @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev
>> pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
>> pci_unmap_len(re, maplen),
>> PCI_DMA_TODEVICE);
>> + re->flags = 0;
>> }
>>
>> /*
>> @@ -1804,7 +1807,8 @@ mapping_error:
>> }
>>
>> /*
>> - * Free ring elements from starting at tx_cons until "done"
>> + * Transmit complete processing
>> + * Free ring elements from starting at tx_cons until done index
>> *
>> * NB:
>> * 1. The hardware will tell us about partial completion of
>> multi-part
>> @@ -1813,9 +1817,9 @@ mapping_error:
>> * looks at the tail of the queue of FIFO (tx_cons), not
>> * the head (tx_prod)
>> */
>> -static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
>> +static void sky2_tx_done(struct net_device *dev, u16 done)
>> {
>> - struct net_device *dev = sky2->netdev;
>> + struct sky2_port *sky2 = netdev_priv(dev);
>> unsigned idx;
>>
>> BUG_ON(done>= sky2->tx_ring_size);
>> @@ -1828,6 +1832,8 @@ static void sky2_tx_complete(struct sky2
>> sky2_tx_unmap(sky2->hw->pdev, re);
>>
>> if (skb) {
>> + re->skb = NULL;
>> +
>> if (unlikely(netif_msg_tx_done(sky2)))
>> printk(KERN_DEBUG "%s: tx done %u\n",
>> dev->name, idx);
>> @@ -1836,13 +1842,10 @@ static void sky2_tx_complete(struct sky2
>> dev->stats.tx_bytes += skb->len;
>>
>> dev_kfree_skb_any(skb);
>> -
>> - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
>> }
>> }
>>
>> sky2->tx_cons = idx;
>> - smp_mb();
>>
>> if (tx_avail(sky2)> MAX_SKB_TX_LE + 4)
>> netif_wake_queue(dev);
>> @@ -1870,6 +1873,21 @@ static void sky2_tx_reset(struct sky2_hw
>> sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET);
>> }
>>
>> +static void sky2_tx_clean(struct sky2_port *sky2)
>> +{
>> + u16 idx;
>> +
>> + for (idx = 0; idx< sky2->tx_ring_size; idx++) {
>> + struct tx_ring_info *re = sky2->tx_ring + idx;
>> +
>> + sky2_tx_unmap(sky2->hw->pdev, re);
>> + if (re->skb) {
>> + dev_kfree_skb_any(re->skb);
>> + re->skb = NULL;
>> + }
>> + }
>> +}
>> +
>> /* Network shutdown */
>> static int sky2_down(struct net_device *dev)
>> {
>> @@ -1933,8 +1951,7 @@ static int sky2_down(struct net_device *
>> sky2_tx_reset(hw, port);
>>
>> /* Free any pending frames stuck in HW queue */
>> - sky2_tx_complete(sky2, sky2->tx_prod);
>> -
>> + sky2_tx_clean(sky2);
>> sky2_rx_clean(sky2);
>>
>> sky2_free_buffers(sky2);
>> @@ -2411,15 +2428,6 @@ error:
>> goto resubmit;
>> }
>>
>> -/* Transmit complete */
>> -static inline void sky2_tx_done(struct net_device *dev, u16 last)
>> -{
>> - struct sky2_port *sky2 = netdev_priv(dev);
>> -
>> - if (netif_running(dev))
>> - sky2_tx_complete(sky2, last);
>> -}
>> -
>> static inline void sky2_skb_rx(const struct sky2_port *sky2,
>> u32 status, struct sk_buff *skb)
>> {
>> @@ -4201,7 +4209,7 @@ static int sky2_debug_show(struct seq_fi
>>
>> /* Dump contents of tx ring */
>> sop = 1;
>> - for (idx = sky2->tx_next; idx != sky2->tx_prod&& idx<
>> sky2->tx_ring_size;
>> + for (idx = sky2->tx_cons; idx != sky2->tx_prod&& idx<
>> sky2->tx_ring_size;
>> idx = RING_NEXT(idx, sky2->tx_ring_size)) {
>> const struct sky2_tx_le *le = sky2->tx_le + idx;
>> u32 a = le32_to_cpu(le->addr);
>> --- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800
>> +++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800
>> @@ -2187,7 +2187,6 @@ struct sky2_port {
>> u16 tx_ring_size;
>> u16 tx_cons; /* next le to check */
>> u16 tx_prod; /* next le to use */
>> - u16 tx_next; /* debug only */
>>
>> u16 tx_pending;
>> u16 tx_last_mss;
> Test observations:
>
> 1. DHCP request/response sequence having some issues... can't confirm
> that it's a result of this patch, but I don't see this with the prior
> patch. Prior to this patch, if I connect a new device (Blackberry in
> this case) I see DHCPDISCOVER;DHCPOFFER;DHCPREQUEST;DHCPACK (just the
> four messages). With this patch I'm seeing repeated attempts - i.e.,
> DISCOVER/OFFER 6 times before the REQUEST/ACK. This is repeatable and
> happening whether or not under load. As my original problem started
> with DHCP packets, this seems interesting. I don't see any errors
> logged (dmesg, messages, etc.).
> 2. Probably not related to this patch, but perhaps to the driver -
> I've finally tracked the source of my dropped RX packets. It's
> happening when a sending data to a CIFS client on a MacOS system. The
> RX drop rate seems proportional to the TX rate for SMB to that client
> - at a tx rate of about 200Kb/s I see about 20 dropped RX packets/sec
> - at 400 I see about 40. I'm thinking therefore it's ACKs being
> dropped on RX. Why? no idea (yet). Nothing reported by ethtool or
> netstat -s remotely correlates to the number of dropped RX packets.
>
>
Let me add: the CIFS client from which the packets are dropped is
connected via a dd-wrt router (on wifi) connected to the sky2 1G port. A
Windows client connected directly to the 1G port does not exhibit the
same symptoms (. I'll try later the Mac directly connected & a Wintel
box over wifi if possible. The DD-WRT router (linksys wrt54g-tm) is
bridged (wifi clients on same subnet as wired & serviced by DHCPD
running on the linux box). This is also the source of the aforementioned
perhaps-flaky DHCP connections.
^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH] sky2: safer transmit ring cleaning (v2)
2010-01-12 18:24 ` Jarek Poplawski
@ 2010-01-12 18:49 ` Stephen Hemminger
2010-01-12 19:16 ` Jarek Poplawski
2010-01-12 19:34 ` Michael Breuer
0 siblings, 2 replies; 60+ messages in thread
From: Stephen Hemminger @ 2010-01-12 18:49 UTC (permalink / raw)
To: Jarek Poplawski, David Miller; +Cc: mikem, flyboy, rjw, netdev, mbreuer
This code makes transmit path and transmit reset safer by:
* adding memory barrier before checking available ring slots
* reseting state of tx ring elements after free
* seperate cleanup function from ring done function
* removing mostly unused tx_next element
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
---
What is supposed to happen:
* restart sky2_restart calls napi_disable while cleaning
* dev_close we can't call napi_disable() because of two ports
sharing same NAPI, so napi_synchronize() is used to make sure that
any NAPI running on other CPU has completed.
* if status is reported by chip for down device, then tx_done should ignore
But, last patch was missing last step
--- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800
+++ b/drivers/net/sky2.c 2010-01-12 10:44:56.068391575 -0800
@@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct
/* Number of list elements available for next tx */
static inline int tx_avail(const struct sky2_port *sky2)
{
+ /* Makes sure update of tx_prod from start_xmit and
+ tx_cons from tx_done are seen. */
+ smp_mb();
return sky2->tx_pending - tx_inuse(sky2);
}
@@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s
return count;
}
-static void sky2_tx_unmap(struct pci_dev *pdev,
- const struct tx_ring_info *re)
+static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
{
if (re->flags & TX_MAP_SINGLE)
pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
@@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev
pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
pci_unmap_len(re, maplen),
PCI_DMA_TODEVICE);
+ re->flags = 0;
}
/*
@@ -1804,7 +1807,8 @@ mapping_error:
}
/*
- * Free ring elements from starting at tx_cons until "done"
+ * Transmit complete processing
+ * Free ring elements from starting at tx_cons until done index
*
* NB:
* 1. The hardware will tell us about partial completion of multi-part
@@ -1813,11 +1817,14 @@ mapping_error:
* looks at the tail of the queue of FIFO (tx_cons), not
* the head (tx_prod)
*/
-static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
+static void sky2_tx_done(struct net_device *dev, u16 done)
{
- struct net_device *dev = sky2->netdev;
+ struct sky2_port *sky2 = netdev_priv(dev);
unsigned idx;
+ if (unlikely(!netif_running(dev)))
+ return;
+
BUG_ON(done >= sky2->tx_ring_size);
for (idx = sky2->tx_cons; idx != done;
@@ -1828,6 +1835,8 @@ static void sky2_tx_complete(struct sky2
sky2_tx_unmap(sky2->hw->pdev, re);
if (skb) {
+ re->skb = NULL;
+
if (unlikely(netif_msg_tx_done(sky2)))
printk(KERN_DEBUG "%s: tx done %u\n",
dev->name, idx);
@@ -1836,13 +1845,10 @@ static void sky2_tx_complete(struct sky2
dev->stats.tx_bytes += skb->len;
dev_kfree_skb_any(skb);
-
- sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
}
}
sky2->tx_cons = idx;
- smp_mb();
if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
netif_wake_queue(dev);
@@ -1870,6 +1876,21 @@ static void sky2_tx_reset(struct sky2_hw
sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET);
}
+static void sky2_tx_clean(struct sky2_port *sky2)
+{
+ u16 idx;
+
+ for (idx = 0; idx < sky2->tx_ring_size; idx++) {
+ struct tx_ring_info *re = sky2->tx_ring + idx;
+
+ sky2_tx_unmap(sky2->hw->pdev, re);
+ if (re->skb) {
+ dev_kfree_skb_any(re->skb);
+ re->skb = NULL;
+ }
+ }
+}
+
/* Network shutdown */
static int sky2_down(struct net_device *dev)
{
@@ -1933,8 +1954,7 @@ static int sky2_down(struct net_device *
sky2_tx_reset(hw, port);
/* Free any pending frames stuck in HW queue */
- sky2_tx_complete(sky2, sky2->tx_prod);
-
+ sky2_tx_clean(sky2);
sky2_rx_clean(sky2);
sky2_free_buffers(sky2);
@@ -2411,15 +2431,6 @@ error:
goto resubmit;
}
-/* Transmit complete */
-static inline void sky2_tx_done(struct net_device *dev, u16 last)
-{
- struct sky2_port *sky2 = netdev_priv(dev);
-
- if (netif_running(dev))
- sky2_tx_complete(sky2, last);
-}
-
static inline void sky2_skb_rx(const struct sky2_port *sky2,
u32 status, struct sk_buff *skb)
{
@@ -4201,7 +4212,7 @@ static int sky2_debug_show(struct seq_fi
/* Dump contents of tx ring */
sop = 1;
- for (idx = sky2->tx_next; idx != sky2->tx_prod && idx < sky2->tx_ring_size;
+ for (idx = sky2->tx_cons; idx != sky2->tx_prod && idx < sky2->tx_ring_size;
idx = RING_NEXT(idx, sky2->tx_ring_size)) {
const struct sky2_tx_le *le = sky2->tx_le + idx;
u32 a = le32_to_cpu(le->addr);
--- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800
+++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800
@@ -2187,7 +2187,6 @@ struct sky2_port {
u16 tx_ring_size;
u16 tx_cons; /* next le to check */
u16 tx_prod; /* next le to use */
- u16 tx_next; /* debug only */
u16 tx_pending;
u16 tx_last_mss;
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v2)
2010-01-12 18:49 ` [PATCH] sky2: safer transmit ring cleaning (v2) Stephen Hemminger
@ 2010-01-12 19:16 ` Jarek Poplawski
2010-01-12 19:23 ` Stephen Hemminger
2010-01-12 19:34 ` Michael Breuer
1 sibling, 1 reply; 60+ messages in thread
From: Jarek Poplawski @ 2010-01-12 19:16 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, mikem, flyboy, rjw, netdev, mbreuer
On Tue, Jan 12, 2010 at 10:49:45AM -0800, Stephen Hemminger wrote:
>
>
> This code makes transmit path and transmit reset safer by:
> * adding memory barrier before checking available ring slots
> * reseting state of tx ring elements after free
> * seperate cleanup function from ring done function
> * removing mostly unused tx_next element
>
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>
> ---
>
> What is supposed to happen:
> * restart sky2_restart calls napi_disable while cleaning
Yes, but it's after the detach; similarly to sky2_suspend().
(I'm not sure how safe vs such re-enabling is sky2_set_ringparam().)
> * dev_close we can't call napi_disable() because of two ports
> sharing same NAPI, so napi_synchronize() is used to make sure that
> any NAPI running on other CPU has completed.
So it seems still endangered.
> * if status is reported by chip for down device, then tx_done should ignore
>
> But, last patch was missing last step
>
> --- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800
> +++ b/drivers/net/sky2.c 2010-01-12 10:44:56.068391575 -0800
> @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct
> /* Number of list elements available for next tx */
> static inline int tx_avail(const struct sky2_port *sky2)
> {
> + /* Makes sure update of tx_prod from start_xmit and
> + tx_cons from tx_done are seen. */
> + smp_mb();
> return sky2->tx_pending - tx_inuse(sky2);
> }
>
> @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s
> return count;
> }
>
> -static void sky2_tx_unmap(struct pci_dev *pdev,
> - const struct tx_ring_info *re)
> +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
> {
> if (re->flags & TX_MAP_SINGLE)
> pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
> @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev
> pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
> pci_unmap_len(re, maplen),
> PCI_DMA_TODEVICE);
> + re->flags = 0;
> }
>
> /*
> @@ -1804,7 +1807,8 @@ mapping_error:
> }
>
> /*
> - * Free ring elements from starting at tx_cons until "done"
> + * Transmit complete processing
> + * Free ring elements from starting at tx_cons until done index
> *
> * NB:
> * 1. The hardware will tell us about partial completion of multi-part
> @@ -1813,11 +1817,14 @@ mapping_error:
> * looks at the tail of the queue of FIFO (tx_cons), not
> * the head (tx_prod)
> */
> -static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
> +static void sky2_tx_done(struct net_device *dev, u16 done)
> {
> - struct net_device *dev = sky2->netdev;
> + struct sky2_port *sky2 = netdev_priv(dev);
> unsigned idx;
>
> + if (unlikely(!netif_running(dev)))
> + return;
> +
> BUG_ON(done >= sky2->tx_ring_size);
>
> for (idx = sky2->tx_cons; idx != done;
> @@ -1828,6 +1835,8 @@ static void sky2_tx_complete(struct sky2
> sky2_tx_unmap(sky2->hw->pdev, re);
>
> if (skb) {
> + re->skb = NULL;
> +
> if (unlikely(netif_msg_tx_done(sky2)))
> printk(KERN_DEBUG "%s: tx done %u\n",
> dev->name, idx);
> @@ -1836,13 +1845,10 @@ static void sky2_tx_complete(struct sky2
> dev->stats.tx_bytes += skb->len;
>
> dev_kfree_skb_any(skb);
> -
> - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
> }
> }
>
> sky2->tx_cons = idx;
> - smp_mb();
>
> if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
> netif_wake_queue(dev);
> @@ -1870,6 +1876,21 @@ static void sky2_tx_reset(struct sky2_hw
> sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET);
> }
>
> +static void sky2_tx_clean(struct sky2_port *sky2)
> +{
> + u16 idx;
> +
> + for (idx = 0; idx < sky2->tx_ring_size; idx++) {
> + struct tx_ring_info *re = sky2->tx_ring + idx;
> +
> + sky2_tx_unmap(sky2->hw->pdev, re);
> + if (re->skb) {
> + dev_kfree_skb_any(re->skb);
> + re->skb = NULL;
> + }
> + }
> +}
> +
> /* Network shutdown */
> static int sky2_down(struct net_device *dev)
> {
> @@ -1933,8 +1954,7 @@ static int sky2_down(struct net_device *
> sky2_tx_reset(hw, port);
>
> /* Free any pending frames stuck in HW queue */
> - sky2_tx_complete(sky2, sky2->tx_prod);
> -
> + sky2_tx_clean(sky2);
> sky2_rx_clean(sky2);
>
> sky2_free_buffers(sky2);
> @@ -2411,15 +2431,6 @@ error:
> goto resubmit;
> }
>
> -/* Transmit complete */
> -static inline void sky2_tx_done(struct net_device *dev, u16 last)
> -{
> - struct sky2_port *sky2 = netdev_priv(dev);
> -
> - if (netif_running(dev))
> - sky2_tx_complete(sky2, last);
> -}
> -
> static inline void sky2_skb_rx(const struct sky2_port *sky2,
> u32 status, struct sk_buff *skb)
> {
> @@ -4201,7 +4212,7 @@ static int sky2_debug_show(struct seq_fi
>
> /* Dump contents of tx ring */
> sop = 1;
> - for (idx = sky2->tx_next; idx != sky2->tx_prod && idx < sky2->tx_ring_size;
> + for (idx = sky2->tx_cons; idx != sky2->tx_prod && idx < sky2->tx_ring_size;
> idx = RING_NEXT(idx, sky2->tx_ring_size)) {
> const struct sky2_tx_le *le = sky2->tx_le + idx;
> u32 a = le32_to_cpu(le->addr);
> --- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800
> +++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800
> @@ -2187,7 +2187,6 @@ struct sky2_port {
> u16 tx_ring_size;
> u16 tx_cons; /* next le to check */
> u16 tx_prod; /* next le to use */
> - u16 tx_next; /* debug only */
>
> u16 tx_pending;
> u16 tx_last_mss;
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v2)
2010-01-12 19:16 ` Jarek Poplawski
@ 2010-01-12 19:23 ` Stephen Hemminger
2010-01-12 19:50 ` Jarek Poplawski
0 siblings, 1 reply; 60+ messages in thread
From: Stephen Hemminger @ 2010-01-12 19:23 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: David Miller, mikem, flyboy, rjw, netdev, mbreuer
On Tue, 12 Jan 2010 20:16:11 +0100
Jarek Poplawski <jarkao2@gmail.com> wrote:
> >
> > What is supposed to happen:
> > * restart sky2_restart calls napi_disable while cleaning
>
> Yes, but it's after the detach; similarly to sky2_suspend().
> (I'm not sure how safe vs such re-enabling is sky2_set_ringparam().
set_ringparam happens under rtnl_lock() so reset and ringparams can't
conflict.
>
> > * dev_close we can't call napi_disable() because of two ports
> > sharing same NAPI, so napi_synchronize() is used to make sure that
> > any NAPI running on other CPU has completed.
>
> So it seems still endangered.
It was but not in revised v2 patch.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v2)
2010-01-12 18:49 ` [PATCH] sky2: safer transmit ring cleaning (v2) Stephen Hemminger
2010-01-12 19:16 ` Jarek Poplawski
@ 2010-01-12 19:34 ` Michael Breuer
1 sibling, 0 replies; 60+ messages in thread
From: Michael Breuer @ 2010-01-12 19:34 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Jarek Poplawski, David Miller, mikem, flyboy, rjw, netdev
On 1/12/2010 1:49 PM, Stephen Hemminger wrote:
>
> This code makes transmit path and transmit reset safer by:
> * adding memory barrier before checking available ring slots
> * reseting state of tx ring elements after free
> * seperate cleanup function from ring done function
> * removing mostly unused tx_next element
>
> Signed-off-by: Stephen Hemminger<shemminger@vyatta.com>
>
> ---
>
> What is supposed to happen:
> * restart sky2_restart calls napi_disable while cleaning
> * dev_close we can't call napi_disable() because of two ports
> sharing same NAPI, so napi_synchronize() is used to make sure that
> any NAPI running on other CPU has completed.
> * if status is reported by chip for down device, then tx_done should ignore
>
> But, last patch was missing last step
>
> --- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800
> +++ b/drivers/net/sky2.c 2010-01-12 10:44:56.068391575 -0800
> @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct
> /* Number of list elements available for next tx */
> static inline int tx_avail(const struct sky2_port *sky2)
> {
> + /* Makes sure update of tx_prod from start_xmit and
> + tx_cons from tx_done are seen. */
> + smp_mb();
> return sky2->tx_pending - tx_inuse(sky2);
> }
>
> @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s
> return count;
> }
>
> -static void sky2_tx_unmap(struct pci_dev *pdev,
> - const struct tx_ring_info *re)
> +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
> {
> if (re->flags& TX_MAP_SINGLE)
> pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
> @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev
> pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
> pci_unmap_len(re, maplen),
> PCI_DMA_TODEVICE);
> + re->flags = 0;
> }
>
> /*
> @@ -1804,7 +1807,8 @@ mapping_error:
> }
>
> /*
> - * Free ring elements from starting at tx_cons until "done"
> + * Transmit complete processing
> + * Free ring elements from starting at tx_cons until done index
> *
> * NB:
> * 1. The hardware will tell us about partial completion of multi-part
> @@ -1813,11 +1817,14 @@ mapping_error:
> * looks at the tail of the queue of FIFO (tx_cons), not
> * the head (tx_prod)
> */
> -static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
> +static void sky2_tx_done(struct net_device *dev, u16 done)
> {
> - struct net_device *dev = sky2->netdev;
> + struct sky2_port *sky2 = netdev_priv(dev);
> unsigned idx;
>
> + if (unlikely(!netif_running(dev)))
> + return;
> +
> BUG_ON(done>= sky2->tx_ring_size);
>
> for (idx = sky2->tx_cons; idx != done;
> @@ -1828,6 +1835,8 @@ static void sky2_tx_complete(struct sky2
> sky2_tx_unmap(sky2->hw->pdev, re);
>
> if (skb) {
> + re->skb = NULL;
> +
> if (unlikely(netif_msg_tx_done(sky2)))
> printk(KERN_DEBUG "%s: tx done %u\n",
> dev->name, idx);
> @@ -1836,13 +1845,10 @@ static void sky2_tx_complete(struct sky2
> dev->stats.tx_bytes += skb->len;
>
> dev_kfree_skb_any(skb);
> -
> - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
> }
> }
>
> sky2->tx_cons = idx;
> - smp_mb();
>
> if (tx_avail(sky2)> MAX_SKB_TX_LE + 4)
> netif_wake_queue(dev);
> @@ -1870,6 +1876,21 @@ static void sky2_tx_reset(struct sky2_hw
> sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET);
> }
>
> +static void sky2_tx_clean(struct sky2_port *sky2)
> +{
> + u16 idx;
> +
> + for (idx = 0; idx< sky2->tx_ring_size; idx++) {
> + struct tx_ring_info *re = sky2->tx_ring + idx;
> +
> + sky2_tx_unmap(sky2->hw->pdev, re);
> + if (re->skb) {
> + dev_kfree_skb_any(re->skb);
> + re->skb = NULL;
> + }
> + }
> +}
> +
> /* Network shutdown */
> static int sky2_down(struct net_device *dev)
> {
> @@ -1933,8 +1954,7 @@ static int sky2_down(struct net_device *
> sky2_tx_reset(hw, port);
>
> /* Free any pending frames stuck in HW queue */
> - sky2_tx_complete(sky2, sky2->tx_prod);
> -
> + sky2_tx_clean(sky2);
> sky2_rx_clean(sky2);
>
> sky2_free_buffers(sky2);
> @@ -2411,15 +2431,6 @@ error:
> goto resubmit;
> }
>
> -/* Transmit complete */
> -static inline void sky2_tx_done(struct net_device *dev, u16 last)
> -{
> - struct sky2_port *sky2 = netdev_priv(dev);
> -
> - if (netif_running(dev))
> - sky2_tx_complete(sky2, last);
> -}
> -
> static inline void sky2_skb_rx(const struct sky2_port *sky2,
> u32 status, struct sk_buff *skb)
> {
> @@ -4201,7 +4212,7 @@ static int sky2_debug_show(struct seq_fi
>
> /* Dump contents of tx ring */
> sop = 1;
> - for (idx = sky2->tx_next; idx != sky2->tx_prod&& idx< sky2->tx_ring_size;
> + for (idx = sky2->tx_cons; idx != sky2->tx_prod&& idx< sky2->tx_ring_size;
> idx = RING_NEXT(idx, sky2->tx_ring_size)) {
> const struct sky2_tx_le *le = sky2->tx_le + idx;
> u32 a = le32_to_cpu(le->addr);
> --- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800
> +++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800
> @@ -2187,7 +2187,6 @@ struct sky2_port {
> u16 tx_ring_size;
> u16 tx_cons; /* next le to check */
> u16 tx_prod; /* next le to use */
> - u16 tx_next; /* debug only */
>
> u16 tx_pending;
> u16 tx_last_mss;
>
Testing observation:
This makes no sense to me, but the DHCP multiple request/inform issue I
noted with V1 of this patch is gone with V2 (this is a repeatable test).
I don't see how this patch could make the difference as the device
should never be down during this test. What am I missing?
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v2)
2010-01-12 19:23 ` Stephen Hemminger
@ 2010-01-12 19:50 ` Jarek Poplawski
2010-01-13 1:23 ` Stephen Hemminger
0 siblings, 1 reply; 60+ messages in thread
From: Jarek Poplawski @ 2010-01-12 19:50 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, mikem, flyboy, rjw, netdev, mbreuer
On Tue, Jan 12, 2010 at 11:23:14AM -0800, Stephen Hemminger wrote:
> On Tue, 12 Jan 2010 20:16:11 +0100
> Jarek Poplawski <jarkao2@gmail.com> wrote:
>
> > >
> > > What is supposed to happen:
> > > * restart sky2_restart calls napi_disable while cleaning
> >
> > Yes, but it's after the detach; similarly to sky2_suspend().
> > (I'm not sure how safe vs such re-enabling is sky2_set_ringparam().
>
> set_ringparam happens under rtnl_lock() so reset and ringparams can't
> conflict.
I didn't mean reset. I meant tx (dev_queue_xmit()) during ringparams.
>
> >
> > > * dev_close we can't call napi_disable() because of two ports
> > > sharing same NAPI, so napi_synchronize() is used to make sure that
> > > any NAPI running on other CPU has completed.
> >
> > So it seems still endangered.
>
> It was but not in revised v2 patch.
Sorry, I meant sky2_down() (except in dev_close()).
Jarek P.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning
2010-01-12 18:42 ` Michael Breuer
@ 2010-01-12 20:31 ` Michael Breuer
2010-01-13 4:10 ` [PATCH] sky2: safer transmit ring cleaning (v3) Stephen Hemminger
0 siblings, 1 reply; 60+ messages in thread
From: Michael Breuer @ 2010-01-12 20:31 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, jarkao2, mikem, flyboy, rjw, netdev
On 1/12/2010 1:42 PM, Michael Breuer wrote:
> On 1/12/2010 1:35 PM, Michael Breuer wrote:
>> On 1/12/2010 11:15 AM, Stephen Hemminger wrote:
>>> This code makes transmit path and transmit reset safer by:
>>> * adding memory barrier before checking available ring slots
>>> * reseting state of tx ring elements after free
>>> * seperate cleanup function from ring done function
>>> * removing mostly unused tx_next element
>>>
>>> Signed-off-by: Stephen Hemminger<shemminger@vyatta.com>
>>>
>>> ---
>>> Please apply this instead of the various bits and pieces flying
>>> around labeled as sky2 panic under load
>>>
>>>
>>> --- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800
>>> +++ b/drivers/net/sky2.c 2010-01-11 17:36:22.027429875 -0800
>>> @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct
>>> /* Number of list elements available for next tx */
>>> static inline int tx_avail(const struct sky2_port *sky2)
>>> {
>>> + /* Makes sure update of tx_prod from start_xmit and
>>> + tx_cons from tx_done are seen. */
>>> + smp_mb();
>>> return sky2->tx_pending - tx_inuse(sky2);
>>> }
>>>
>>> @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s
>>> return count;
>>> }
>>>
>>> -static void sky2_tx_unmap(struct pci_dev *pdev,
>>> - const struct tx_ring_info *re)
>>> +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info
>>> *re)
>>> {
>>> if (re->flags& TX_MAP_SINGLE)
>>> pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
>>> @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev
>>> pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
>>> pci_unmap_len(re, maplen),
>>> PCI_DMA_TODEVICE);
>>> + re->flags = 0;
>>> }
>>>
>>> /*
>>> @@ -1804,7 +1807,8 @@ mapping_error:
>>> }
>>>
>>> /*
>>> - * Free ring elements from starting at tx_cons until "done"
>>> + * Transmit complete processing
>>> + * Free ring elements from starting at tx_cons until done index
>>> *
>>> * NB:
>>> * 1. The hardware will tell us about partial completion of
>>> multi-part
>>> @@ -1813,9 +1817,9 @@ mapping_error:
>>> * looks at the tail of the queue of FIFO (tx_cons), not
>>> * the head (tx_prod)
>>> */
>>> -static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
>>> +static void sky2_tx_done(struct net_device *dev, u16 done)
>>> {
>>> - struct net_device *dev = sky2->netdev;
>>> + struct sky2_port *sky2 = netdev_priv(dev);
>>> unsigned idx;
>>>
>>> BUG_ON(done>= sky2->tx_ring_size);
>>> @@ -1828,6 +1832,8 @@ static void sky2_tx_complete(struct sky2
>>> sky2_tx_unmap(sky2->hw->pdev, re);
>>>
>>> if (skb) {
>>> + re->skb = NULL;
>>> +
>>> if (unlikely(netif_msg_tx_done(sky2)))
>>> printk(KERN_DEBUG "%s: tx done %u\n",
>>> dev->name, idx);
>>> @@ -1836,13 +1842,10 @@ static void sky2_tx_complete(struct sky2
>>> dev->stats.tx_bytes += skb->len;
>>>
>>> dev_kfree_skb_any(skb);
>>> -
>>> - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
>>> }
>>> }
>>>
>>> sky2->tx_cons = idx;
>>> - smp_mb();
>>>
>>> if (tx_avail(sky2)> MAX_SKB_TX_LE + 4)
>>> netif_wake_queue(dev);
>>> @@ -1870,6 +1873,21 @@ static void sky2_tx_reset(struct sky2_hw
>>> sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET);
>>> }
>>>
>>> +static void sky2_tx_clean(struct sky2_port *sky2)
>>> +{
>>> + u16 idx;
>>> +
>>> + for (idx = 0; idx< sky2->tx_ring_size; idx++) {
>>> + struct tx_ring_info *re = sky2->tx_ring + idx;
>>> +
>>> + sky2_tx_unmap(sky2->hw->pdev, re);
>>> + if (re->skb) {
>>> + dev_kfree_skb_any(re->skb);
>>> + re->skb = NULL;
>>> + }
>>> + }
>>> +}
>>> +
>>> /* Network shutdown */
>>> static int sky2_down(struct net_device *dev)
>>> {
>>> @@ -1933,8 +1951,7 @@ static int sky2_down(struct net_device *
>>> sky2_tx_reset(hw, port);
>>>
>>> /* Free any pending frames stuck in HW queue */
>>> - sky2_tx_complete(sky2, sky2->tx_prod);
>>> -
>>> + sky2_tx_clean(sky2);
>>> sky2_rx_clean(sky2);
>>>
>>> sky2_free_buffers(sky2);
>>> @@ -2411,15 +2428,6 @@ error:
>>> goto resubmit;
>>> }
>>>
>>> -/* Transmit complete */
>>> -static inline void sky2_tx_done(struct net_device *dev, u16 last)
>>> -{
>>> - struct sky2_port *sky2 = netdev_priv(dev);
>>> -
>>> - if (netif_running(dev))
>>> - sky2_tx_complete(sky2, last);
>>> -}
>>> -
>>> static inline void sky2_skb_rx(const struct sky2_port *sky2,
>>> u32 status, struct sk_buff *skb)
>>> {
>>> @@ -4201,7 +4209,7 @@ static int sky2_debug_show(struct seq_fi
>>>
>>> /* Dump contents of tx ring */
>>> sop = 1;
>>> - for (idx = sky2->tx_next; idx != sky2->tx_prod&& idx<
>>> sky2->tx_ring_size;
>>> + for (idx = sky2->tx_cons; idx != sky2->tx_prod&& idx<
>>> sky2->tx_ring_size;
>>> idx = RING_NEXT(idx, sky2->tx_ring_size)) {
>>> const struct sky2_tx_le *le = sky2->tx_le + idx;
>>> u32 a = le32_to_cpu(le->addr);
>>> --- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800
>>> +++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800
>>> @@ -2187,7 +2187,6 @@ struct sky2_port {
>>> u16 tx_ring_size;
>>> u16 tx_cons; /* next le to check */
>>> u16 tx_prod; /* next le to use */
>>> - u16 tx_next; /* debug only */
>>>
>>> u16 tx_pending;
>>> u16 tx_last_mss;
>> Test observations:
>>
>> 1. DHCP request/response sequence having some issues... can't confirm
>> that it's a result of this patch, but I don't see this with the prior
>> patch. Prior to this patch, if I connect a new device (Blackberry in
>> this case) I see DHCPDISCOVER;DHCPOFFER;DHCPREQUEST;DHCPACK (just the
>> four messages). With this patch I'm seeing repeated attempts - i.e.,
>> DISCOVER/OFFER 6 times before the REQUEST/ACK. This is repeatable
>> and happening whether or not under load. As my original problem
>> started with DHCP packets, this seems interesting. I don't see any
>> errors logged (dmesg, messages, etc.).
>> 2. Probably not related to this patch, but perhaps to the driver -
>> I've finally tracked the source of my dropped RX packets. It's
>> happening when a sending data to a CIFS client on a MacOS system. The
>> RX drop rate seems proportional to the TX rate for SMB to that client
>> - at a tx rate of about 200Kb/s I see about 20 dropped RX packets/sec
>> - at 400 I see about 40. I'm thinking therefore it's ACKs being
>> dropped on RX. Why? no idea (yet). Nothing reported by ethtool or
>> netstat -s remotely correlates to the number of dropped RX packets.
>>
>>
> Let me add: the CIFS client from which the packets are dropped is
> connected via a dd-wrt router (on wifi) connected to the sky2 1G port.
> A Windows client connected directly to the 1G port does not exhibit
> the same symptoms (. I'll try later the Mac directly connected & a
> Wintel box over wifi if possible. The DD-WRT router (linksys
> wrt54g-tm) is bridged (wifi clients on same subnet as wired & serviced
> by DHCPD running on the linux box). This is also the source of the
> aforementioned perhaps-flaky DHCP connections.
Ok - a little more on the dropped packets - only happens when connected
through the wifi router - but happens using a wired connection as well
(via the router's ethernet port).
From sniffer traces: I see large frames outbound (even though
MTU=1500). I see the fragmented result arriving on the remote side. I
see ACK's for each of the fragmented frames outbound from the Mac, but
not received on the Linux side.
I also don't see any retransmissions or duplicate ACKS on either sniffer
trace. I'm wondering whether there is fragmentation offload to the
Yukon2, and whether the individual fragment acks are what's showing up
dropped. If so, I don't understand why it would only happen with the
wifi router vs. switch in the middle. Maybe the router is doing
something to the packets which is causing the Yukon to forward the acks
up differently? The router MTU is also 1500 on all ports, and does not
show dropped packets or errors corresponding to what I'm seeing on the
sky2 adapter.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v2)
2010-01-12 19:50 ` Jarek Poplawski
@ 2010-01-13 1:23 ` Stephen Hemminger
0 siblings, 0 replies; 60+ messages in thread
From: Stephen Hemminger @ 2010-01-13 1:23 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: David Miller, mikem, flyboy, rjw, netdev, mbreuer
On Tue, 12 Jan 2010 20:50:04 +0100
Jarek Poplawski <jarkao2@gmail.com> wrote:
> On Tue, Jan 12, 2010 at 11:23:14AM -0800, Stephen Hemminger wrote:
> > On Tue, 12 Jan 2010 20:16:11 +0100
> > Jarek Poplawski <jarkao2@gmail.com> wrote:
> >
> > > >
> > > > What is supposed to happen:
> > > > * restart sky2_restart calls napi_disable while cleaning
> > >
> > > Yes, but it's after the detach; similarly to sky2_suspend().
> > > (I'm not sure how safe vs such re-enabling is sky2_set_ringparam().
> >
> > set_ringparam happens under rtnl_lock() so reset and ringparams can't
> > conflict.
>
> I didn't mean reset. I meant tx (dev_queue_xmit()) during ringparams.
sky2_detach does
tx_lock
netif_detach() -- stops the queue
tx_unlock
So if another CPU was about to send, it will have to wait
behind the tx_global_lock, and then it will see the queue as frozen.
BUT, you prod me to look harder and there is a race with
softirq (bottom half) here that needs to be fixed.
^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH] sky2: safer transmit ring cleaning (v3)
2010-01-12 20:31 ` Michael Breuer
@ 2010-01-13 4:10 ` Stephen Hemminger
2010-01-13 4:31 ` Michael Breuer
` (2 more replies)
0 siblings, 3 replies; 60+ messages in thread
From: Stephen Hemminger @ 2010-01-13 4:10 UTC (permalink / raw)
To: Michael Breuer, David Miller; +Cc: jarkao2, mikem, flyboy, rjw, netdev
Subject: sky2: safer transmit cleanup
This code makes transmit path and transmit reset safer by:
* adding memory barrier before checking available ring slots
* reseting state of tx ring elements after free
* seperate cleanup function from ring done function
* removing mostly unused tx_next element
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
---
This version adds missing _bh to sky2_detach
--- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800
+++ b/drivers/net/sky2.c 2010-01-12 17:21:58.415268802 -0800
@@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct
/* Number of list elements available for next tx */
static inline int tx_avail(const struct sky2_port *sky2)
{
+ /* Makes sure update of tx_prod from start_xmit and
+ tx_cons from tx_done are seen. */
+ smp_mb();
return sky2->tx_pending - tx_inuse(sky2);
}
@@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s
return count;
}
-static void sky2_tx_unmap(struct pci_dev *pdev,
- const struct tx_ring_info *re)
+static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
{
if (re->flags & TX_MAP_SINGLE)
pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
@@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev
pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
pci_unmap_len(re, maplen),
PCI_DMA_TODEVICE);
+ re->flags = 0;
}
/*
@@ -1804,7 +1807,8 @@ mapping_error:
}
/*
- * Free ring elements from starting at tx_cons until "done"
+ * Transmit complete processing
+ * Free ring elements from starting at tx_cons until done index
*
* NB:
* 1. The hardware will tell us about partial completion of multi-part
@@ -1813,11 +1817,14 @@ mapping_error:
* looks at the tail of the queue of FIFO (tx_cons), not
* the head (tx_prod)
*/
-static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
+static void sky2_tx_done(struct net_device *dev, u16 done)
{
- struct net_device *dev = sky2->netdev;
+ struct sky2_port *sky2 = netdev_priv(dev);
unsigned idx;
+ if (unlikely(!netif_running(dev)))
+ return;
+
BUG_ON(done >= sky2->tx_ring_size);
for (idx = sky2->tx_cons; idx != done;
@@ -1828,6 +1835,8 @@ static void sky2_tx_complete(struct sky2
sky2_tx_unmap(sky2->hw->pdev, re);
if (skb) {
+ re->skb = NULL;
+
if (unlikely(netif_msg_tx_done(sky2)))
printk(KERN_DEBUG "%s: tx done %u\n",
dev->name, idx);
@@ -1836,13 +1845,10 @@ static void sky2_tx_complete(struct sky2
dev->stats.tx_bytes += skb->len;
dev_kfree_skb_any(skb);
-
- sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
}
}
sky2->tx_cons = idx;
- smp_mb();
if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
netif_wake_queue(dev);
@@ -1870,6 +1876,21 @@ static void sky2_tx_reset(struct sky2_hw
sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET);
}
+static void sky2_tx_clean(struct sky2_port *sky2)
+{
+ u16 idx;
+
+ for (idx = 0; idx < sky2->tx_ring_size; idx++) {
+ struct tx_ring_info *re = sky2->tx_ring + idx;
+
+ sky2_tx_unmap(sky2->hw->pdev, re);
+ if (re->skb) {
+ dev_kfree_skb_any(re->skb);
+ re->skb = NULL;
+ }
+ }
+}
+
/* Network shutdown */
static int sky2_down(struct net_device *dev)
{
@@ -1933,8 +1954,7 @@ static int sky2_down(struct net_device *
sky2_tx_reset(hw, port);
/* Free any pending frames stuck in HW queue */
- sky2_tx_complete(sky2, sky2->tx_prod);
-
+ sky2_tx_clean(sky2);
sky2_rx_clean(sky2);
sky2_free_buffers(sky2);
@@ -2411,15 +2431,6 @@ error:
goto resubmit;
}
-/* Transmit complete */
-static inline void sky2_tx_done(struct net_device *dev, u16 last)
-{
- struct sky2_port *sky2 = netdev_priv(dev);
-
- if (netif_running(dev))
- sky2_tx_complete(sky2, last);
-}
-
static inline void sky2_skb_rx(const struct sky2_port *sky2,
u32 status, struct sk_buff *skb)
{
@@ -3176,9 +3187,9 @@ static void sky2_reset(struct sky2_hw *h
static void sky2_detach(struct net_device *dev)
{
if (netif_running(dev)) {
- netif_tx_lock(dev);
+ netif_tx_lock_bh(dev);
netif_device_detach(dev); /* stop txq */
- netif_tx_unlock(dev);
+ netif_tx_unlock_bh(dev);
sky2_down(dev);
}
}
@@ -4201,7 +4212,7 @@ static int sky2_debug_show(struct seq_fi
/* Dump contents of tx ring */
sop = 1;
- for (idx = sky2->tx_next; idx != sky2->tx_prod && idx < sky2->tx_ring_size;
+ for (idx = sky2->tx_cons; idx != sky2->tx_prod && idx < sky2->tx_ring_size;
idx = RING_NEXT(idx, sky2->tx_ring_size)) {
const struct sky2_tx_le *le = sky2->tx_le + idx;
u32 a = le32_to_cpu(le->addr);
--- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800
+++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800
@@ -2187,7 +2187,6 @@ struct sky2_port {
u16 tx_ring_size;
u16 tx_cons; /* next le to check */
u16 tx_prod; /* next le to use */
- u16 tx_next; /* debug only */
u16 tx_pending;
u16 tx_last_mss;
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v3)
2010-01-13 4:10 ` [PATCH] sky2: safer transmit ring cleaning (v3) Stephen Hemminger
@ 2010-01-13 4:31 ` Michael Breuer
2010-01-13 7:35 ` Jarek Poplawski
2010-01-13 16:04 ` Michael Breuer
2 siblings, 0 replies; 60+ messages in thread
From: Michael Breuer @ 2010-01-13 4:31 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, jarkao2, mikem, flyboy, rjw, netdev
On 01/12/2010 11:10 PM, Stephen Hemminger wrote:
> Subject: sky2: safer transmit cleanup
>
> This code makes transmit path and transmit reset safer by:
> * adding memory barrier before checking available ring slots
> * reseting state of tx ring elements after free
> * seperate cleanup function from ring done function
> * removing mostly unused tx_next element
>
> Signed-off-by: Stephen Hemminger<shemminger@vyatta.com>
>
> ---
> This version adds missing _bh to sky2_detach
>
>
> --- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800
> +++ b/drivers/net/sky2.c 2010-01-12 17:21:58.415268802 -0800
> @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct
> /* Number of list elements available for next tx */
> static inline int tx_avail(const struct sky2_port *sky2)
> {
> + /* Makes sure update of tx_prod from start_xmit and
> + tx_cons from tx_done are seen. */
> + smp_mb();
> return sky2->tx_pending - tx_inuse(sky2);
> }
>
> @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s
> return count;
> }
>
> -static void sky2_tx_unmap(struct pci_dev *pdev,
> - const struct tx_ring_info *re)
> +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
> {
> if (re->flags& TX_MAP_SINGLE)
> pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
> @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev
> pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
> pci_unmap_len(re, maplen),
> PCI_DMA_TODEVICE);
> + re->flags = 0;
> }
>
> /*
> @@ -1804,7 +1807,8 @@ mapping_error:
> }
>
> /*
> - * Free ring elements from starting at tx_cons until "done"
> + * Transmit complete processing
> + * Free ring elements from starting at tx_cons until done index
> *
> * NB:
> * 1. The hardware will tell us about partial completion of multi-part
> @@ -1813,11 +1817,14 @@ mapping_error:
> * looks at the tail of the queue of FIFO (tx_cons), not
> * the head (tx_prod)
> */
> -static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
> +static void sky2_tx_done(struct net_device *dev, u16 done)
> {
> - struct net_device *dev = sky2->netdev;
> + struct sky2_port *sky2 = netdev_priv(dev);
> unsigned idx;
>
> + if (unlikely(!netif_running(dev)))
> + return;
> +
> BUG_ON(done>= sky2->tx_ring_size);
>
> for (idx = sky2->tx_cons; idx != done;
> @@ -1828,6 +1835,8 @@ static void sky2_tx_complete(struct sky2
> sky2_tx_unmap(sky2->hw->pdev, re);
>
> if (skb) {
> + re->skb = NULL;
> +
> if (unlikely(netif_msg_tx_done(sky2)))
> printk(KERN_DEBUG "%s: tx done %u\n",
> dev->name, idx);
> @@ -1836,13 +1845,10 @@ static void sky2_tx_complete(struct sky2
> dev->stats.tx_bytes += skb->len;
>
> dev_kfree_skb_any(skb);
> -
> - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
> }
> }
>
> sky2->tx_cons = idx;
> - smp_mb();
>
> if (tx_avail(sky2)> MAX_SKB_TX_LE + 4)
> netif_wake_queue(dev);
> @@ -1870,6 +1876,21 @@ static void sky2_tx_reset(struct sky2_hw
> sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET);
> }
>
> +static void sky2_tx_clean(struct sky2_port *sky2)
> +{
> + u16 idx;
> +
> + for (idx = 0; idx< sky2->tx_ring_size; idx++) {
> + struct tx_ring_info *re = sky2->tx_ring + idx;
> +
> + sky2_tx_unmap(sky2->hw->pdev, re);
> + if (re->skb) {
> + dev_kfree_skb_any(re->skb);
> + re->skb = NULL;
> + }
> + }
> +}
> +
> /* Network shutdown */
> static int sky2_down(struct net_device *dev)
> {
> @@ -1933,8 +1954,7 @@ static int sky2_down(struct net_device *
> sky2_tx_reset(hw, port);
>
> /* Free any pending frames stuck in HW queue */
> - sky2_tx_complete(sky2, sky2->tx_prod);
> -
> + sky2_tx_clean(sky2);
> sky2_rx_clean(sky2);
>
> sky2_free_buffers(sky2);
> @@ -2411,15 +2431,6 @@ error:
> goto resubmit;
> }
>
> -/* Transmit complete */
> -static inline void sky2_tx_done(struct net_device *dev, u16 last)
> -{
> - struct sky2_port *sky2 = netdev_priv(dev);
> -
> - if (netif_running(dev))
> - sky2_tx_complete(sky2, last);
> -}
> -
> static inline void sky2_skb_rx(const struct sky2_port *sky2,
> u32 status, struct sk_buff *skb)
> {
> @@ -3176,9 +3187,9 @@ static void sky2_reset(struct sky2_hw *h
> static void sky2_detach(struct net_device *dev)
> {
> if (netif_running(dev)) {
> - netif_tx_lock(dev);
> + netif_tx_lock_bh(dev);
> netif_device_detach(dev); /* stop txq */
> - netif_tx_unlock(dev);
> + netif_tx_unlock_bh(dev);
> sky2_down(dev);
> }
> }
> @@ -4201,7 +4212,7 @@ static int sky2_debug_show(struct seq_fi
>
> /* Dump contents of tx ring */
> sop = 1;
> - for (idx = sky2->tx_next; idx != sky2->tx_prod&& idx< sky2->tx_ring_size;
> + for (idx = sky2->tx_cons; idx != sky2->tx_prod&& idx< sky2->tx_ring_size;
> idx = RING_NEXT(idx, sky2->tx_ring_size)) {
> const struct sky2_tx_le *le = sky2->tx_le + idx;
> u32 a = le32_to_cpu(le->addr);
> --- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800
> +++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800
> @@ -2187,7 +2187,6 @@ struct sky2_port {
> u16 tx_ring_size;
> u16 tx_cons; /* next le to check */
> u16 tx_prod; /* next le to use */
> - u16 tx_next; /* debug only */
>
> u16 tx_pending;
> u16 tx_last_mss;
>
Will test later - just an FYI - hunk 11 depends on an earlier patch
which added the netif_tx_lock(dev). I don't think that was committed.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v3)
2010-01-13 4:10 ` [PATCH] sky2: safer transmit ring cleaning (v3) Stephen Hemminger
2010-01-13 4:31 ` Michael Breuer
@ 2010-01-13 7:35 ` Jarek Poplawski
2010-01-13 16:04 ` Michael Breuer
2 siblings, 0 replies; 60+ messages in thread
From: Jarek Poplawski @ 2010-01-13 7:35 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Michael Breuer, David Miller, mikem, flyboy, rjw, netdev
On Tue, Jan 12, 2010 at 08:10:12PM -0800, Stephen Hemminger wrote:
> Subject: sky2: safer transmit cleanup
>
> This code makes transmit path and transmit reset safer by:
> * adding memory barrier before checking available ring slots
> * reseting state of tx ring elements after free
> * seperate cleanup function from ring done function
> * removing mostly unused tx_next element
I agree this patch makes it safer, but still not really safe, at least
compared with the "sky2: Fix oops in sky2_xmit_frame() after TX
timeout" patch.
Btw, I'm not sure adding netif_running() test to protect dev_close()
is really needed, because it's after dev_deactivate(), so re-enabled
tx doesn't probably matter.
>
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>
> ---
> This version adds missing _bh to sky2_detach
As I wrote before, I doubt adding tx lock alone at this place is the
proper fix for anything (the rx (napi) race is the real problem), but
without this _bh it's a bug.
Jarek P.
>
>
> --- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800
> +++ b/drivers/net/sky2.c 2010-01-12 17:21:58.415268802 -0800
> @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct
> /* Number of list elements available for next tx */
> static inline int tx_avail(const struct sky2_port *sky2)
> {
> + /* Makes sure update of tx_prod from start_xmit and
> + tx_cons from tx_done are seen. */
> + smp_mb();
> return sky2->tx_pending - tx_inuse(sky2);
> }
>
> @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s
> return count;
> }
>
> -static void sky2_tx_unmap(struct pci_dev *pdev,
> - const struct tx_ring_info *re)
> +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
> {
> if (re->flags & TX_MAP_SINGLE)
> pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
> @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev
> pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
> pci_unmap_len(re, maplen),
> PCI_DMA_TODEVICE);
> + re->flags = 0;
> }
>
> /*
> @@ -1804,7 +1807,8 @@ mapping_error:
> }
>
> /*
> - * Free ring elements from starting at tx_cons until "done"
> + * Transmit complete processing
> + * Free ring elements from starting at tx_cons until done index
> *
> * NB:
> * 1. The hardware will tell us about partial completion of multi-part
> @@ -1813,11 +1817,14 @@ mapping_error:
> * looks at the tail of the queue of FIFO (tx_cons), not
> * the head (tx_prod)
> */
> -static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
> +static void sky2_tx_done(struct net_device *dev, u16 done)
> {
> - struct net_device *dev = sky2->netdev;
> + struct sky2_port *sky2 = netdev_priv(dev);
> unsigned idx;
>
> + if (unlikely(!netif_running(dev)))
> + return;
> +
> BUG_ON(done >= sky2->tx_ring_size);
>
> for (idx = sky2->tx_cons; idx != done;
> @@ -1828,6 +1835,8 @@ static void sky2_tx_complete(struct sky2
> sky2_tx_unmap(sky2->hw->pdev, re);
>
> if (skb) {
> + re->skb = NULL;
> +
> if (unlikely(netif_msg_tx_done(sky2)))
> printk(KERN_DEBUG "%s: tx done %u\n",
> dev->name, idx);
> @@ -1836,13 +1845,10 @@ static void sky2_tx_complete(struct sky2
> dev->stats.tx_bytes += skb->len;
>
> dev_kfree_skb_any(skb);
> -
> - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
> }
> }
>
> sky2->tx_cons = idx;
> - smp_mb();
>
> if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
> netif_wake_queue(dev);
> @@ -1870,6 +1876,21 @@ static void sky2_tx_reset(struct sky2_hw
> sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET);
> }
>
> +static void sky2_tx_clean(struct sky2_port *sky2)
> +{
> + u16 idx;
> +
> + for (idx = 0; idx < sky2->tx_ring_size; idx++) {
> + struct tx_ring_info *re = sky2->tx_ring + idx;
> +
> + sky2_tx_unmap(sky2->hw->pdev, re);
> + if (re->skb) {
> + dev_kfree_skb_any(re->skb);
> + re->skb = NULL;
> + }
> + }
> +}
> +
> /* Network shutdown */
> static int sky2_down(struct net_device *dev)
> {
> @@ -1933,8 +1954,7 @@ static int sky2_down(struct net_device *
> sky2_tx_reset(hw, port);
>
> /* Free any pending frames stuck in HW queue */
> - sky2_tx_complete(sky2, sky2->tx_prod);
> -
> + sky2_tx_clean(sky2);
> sky2_rx_clean(sky2);
>
> sky2_free_buffers(sky2);
> @@ -2411,15 +2431,6 @@ error:
> goto resubmit;
> }
>
> -/* Transmit complete */
> -static inline void sky2_tx_done(struct net_device *dev, u16 last)
> -{
> - struct sky2_port *sky2 = netdev_priv(dev);
> -
> - if (netif_running(dev))
> - sky2_tx_complete(sky2, last);
> -}
> -
> static inline void sky2_skb_rx(const struct sky2_port *sky2,
> u32 status, struct sk_buff *skb)
> {
> @@ -3176,9 +3187,9 @@ static void sky2_reset(struct sky2_hw *h
> static void sky2_detach(struct net_device *dev)
> {
> if (netif_running(dev)) {
> - netif_tx_lock(dev);
> + netif_tx_lock_bh(dev);
> netif_device_detach(dev); /* stop txq */
> - netif_tx_unlock(dev);
> + netif_tx_unlock_bh(dev);
> sky2_down(dev);
> }
> }
> @@ -4201,7 +4212,7 @@ static int sky2_debug_show(struct seq_fi
>
> /* Dump contents of tx ring */
> sop = 1;
> - for (idx = sky2->tx_next; idx != sky2->tx_prod && idx < sky2->tx_ring_size;
> + for (idx = sky2->tx_cons; idx != sky2->tx_prod && idx < sky2->tx_ring_size;
> idx = RING_NEXT(idx, sky2->tx_ring_size)) {
> const struct sky2_tx_le *le = sky2->tx_le + idx;
> u32 a = le32_to_cpu(le->addr);
> --- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800
> +++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800
> @@ -2187,7 +2187,6 @@ struct sky2_port {
> u16 tx_ring_size;
> u16 tx_cons; /* next le to check */
> u16 tx_prod; /* next le to use */
> - u16 tx_next; /* debug only */
>
> u16 tx_pending;
> u16 tx_last_mss;
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v3)
2010-01-13 4:10 ` [PATCH] sky2: safer transmit ring cleaning (v3) Stephen Hemminger
2010-01-13 4:31 ` Michael Breuer
2010-01-13 7:35 ` Jarek Poplawski
@ 2010-01-13 16:04 ` Michael Breuer
2010-01-14 3:41 ` [PATCH] sky2: safer transmit ring cleaning (v4) Stephen Hemminger
2 siblings, 1 reply; 60+ messages in thread
From: Michael Breuer @ 2010-01-13 16:04 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, jarkao2, mikem, flyboy, rjw, netdev
On 1/12/2010 11:10 PM, Stephen Hemminger wrote:
> Subject: sky2: safer transmit cleanup
>
> This code makes transmit path and transmit reset safer by:
> * adding memory barrier before checking available ring slots
> * reseting state of tx ring elements after free
> * seperate cleanup function from ring done function
> * removing mostly unused tx_next element
>
> Signed-off-by: Stephen Hemminger<shemminger@vyatta.com>
>
> ---
> This version adds missing _bh to sky2_detach
>
>
> --- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800
> +++ b/drivers/net/sky2.c 2010-01-12 17:21:58.415268802 -0800
> @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct
> /* Number of list elements available for next tx */
> static inline int tx_avail(const struct sky2_port *sky2)
> {
> + /* Makes sure update of tx_prod from start_xmit and
> + tx_cons from tx_done are seen. */
> + smp_mb();
> return sky2->tx_pending - tx_inuse(sky2);
> }
>
> @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s
> return count;
> }
>
> -static void sky2_tx_unmap(struct pci_dev *pdev,
> - const struct tx_ring_info *re)
> +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
> {
> if (re->flags& TX_MAP_SINGLE)
> pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
> @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev
> pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
> pci_unmap_len(re, maplen),
> PCI_DMA_TODEVICE);
> + re->flags = 0;
> }
>
> /*
> @@ -1804,7 +1807,8 @@ mapping_error:
> }
>
> /*
> - * Free ring elements from starting at tx_cons until "done"
> + * Transmit complete processing
> + * Free ring elements from starting at tx_cons until done index
> *
> * NB:
> * 1. The hardware will tell us about partial completion of multi-part
> @@ -1813,11 +1817,14 @@ mapping_error:
> * looks at the tail of the queue of FIFO (tx_cons), not
> * the head (tx_prod)
> */
> -static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
> +static void sky2_tx_done(struct net_device *dev, u16 done)
> {
> - struct net_device *dev = sky2->netdev;
> + struct sky2_port *sky2 = netdev_priv(dev);
> unsigned idx;
>
> + if (unlikely(!netif_running(dev)))
> + return;
> +
> BUG_ON(done>= sky2->tx_ring_size);
>
> for (idx = sky2->tx_cons; idx != done;
> @@ -1828,6 +1835,8 @@ static void sky2_tx_complete(struct sky2
> sky2_tx_unmap(sky2->hw->pdev, re);
>
> if (skb) {
> + re->skb = NULL;
> +
> if (unlikely(netif_msg_tx_done(sky2)))
> printk(KERN_DEBUG "%s: tx done %u\n",
> dev->name, idx);
> @@ -1836,13 +1845,10 @@ static void sky2_tx_complete(struct sky2
> dev->stats.tx_bytes += skb->len;
>
> dev_kfree_skb_any(skb);
> -
> - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
> }
> }
>
> sky2->tx_cons = idx;
> - smp_mb();
>
> if (tx_avail(sky2)> MAX_SKB_TX_LE + 4)
> netif_wake_queue(dev);
> @@ -1870,6 +1876,21 @@ static void sky2_tx_reset(struct sky2_hw
> sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET);
> }
>
> +static void sky2_tx_clean(struct sky2_port *sky2)
> +{
> + u16 idx;
> +
> + for (idx = 0; idx< sky2->tx_ring_size; idx++) {
> + struct tx_ring_info *re = sky2->tx_ring + idx;
> +
> + sky2_tx_unmap(sky2->hw->pdev, re);
> + if (re->skb) {
> + dev_kfree_skb_any(re->skb);
> + re->skb = NULL;
> + }
> + }
> +}
> +
> /* Network shutdown */
> static int sky2_down(struct net_device *dev)
> {
> @@ -1933,8 +1954,7 @@ static int sky2_down(struct net_device *
> sky2_tx_reset(hw, port);
>
> /* Free any pending frames stuck in HW queue */
> - sky2_tx_complete(sky2, sky2->tx_prod);
> -
> + sky2_tx_clean(sky2);
> sky2_rx_clean(sky2);
>
> sky2_free_buffers(sky2);
> @@ -2411,15 +2431,6 @@ error:
> goto resubmit;
> }
>
> -/* Transmit complete */
> -static inline void sky2_tx_done(struct net_device *dev, u16 last)
> -{
> - struct sky2_port *sky2 = netdev_priv(dev);
> -
> - if (netif_running(dev))
> - sky2_tx_complete(sky2, last);
> -}
> -
> static inline void sky2_skb_rx(const struct sky2_port *sky2,
> u32 status, struct sk_buff *skb)
> {
> @@ -3176,9 +3187,9 @@ static void sky2_reset(struct sky2_hw *h
> static void sky2_detach(struct net_device *dev)
> {
> if (netif_running(dev)) {
> - netif_tx_lock(dev);
> + netif_tx_lock_bh(dev);
> netif_device_detach(dev); /* stop txq */
> - netif_tx_unlock(dev);
> + netif_tx_unlock_bh(dev);
> sky2_down(dev);
> }
> }
> @@ -4201,7 +4212,7 @@ static int sky2_debug_show(struct seq_fi
>
> /* Dump contents of tx ring */
> sop = 1;
> - for (idx = sky2->tx_next; idx != sky2->tx_prod&& idx< sky2->tx_ring_size;
> + for (idx = sky2->tx_cons; idx != sky2->tx_prod&& idx< sky2->tx_ring_size;
> idx = RING_NEXT(idx, sky2->tx_ring_size)) {
> const struct sky2_tx_le *le = sky2->tx_le + idx;
> u32 a = le32_to_cpu(le->addr);
> --- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800
> +++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800
> @@ -2187,7 +2187,6 @@ struct sky2_port {
> u16 tx_ring_size;
> u16 tx_cons; /* next le to check */
> u16 tx_prod; /* next le to use */
> - u16 tx_next; /* debug only */
>
> u16 tx_pending;
> u16 tx_last_mss;
>
Not sure why, but with this version my system is running hot (literally
- higher MB temp & fan speed). This is happening at low throughput. CPU
utilization is low - no apparent change from prior versions. The only
indication of something amiss is seen using powertop. With older
versions, I never noticed (except under load) sky2 interrupt as the main
source of system activity. With this version, I see:
Top causes for wakeups:
22.9% (525.4) <kernel IPI> : Rescheduling interrupts
20.9% (480.0) <interrupt> : sky2@pci:0000:04:00.0
19.2% (439.2) <interrupt> : extra timer interrupt
18.9% (432.8) <interrupt> : sky2@pci:0000:06:00.0
10.4% (237.4) <kernel core> : hrtimer_start_range_ns (tick_sched_timer)
1.7% ( 38.8) <kernel core> : hrtimer_start (tick_sched_timer)
This is pretty consistent, btw regardless of what's going on. 4:00 is
the external (100M) interface.
Total network activity while this is going on is about 70KB/sec - mostly
internal.
I'm using msi interrupt (or think I am, anyway).
Also - I'm seeing what appears to be increased packet latency (not
surprising) and slightly decreased throughput.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4)
2010-01-13 16:04 ` Michael Breuer
@ 2010-01-14 3:41 ` Stephen Hemminger
2010-01-14 10:14 ` Jarek Poplawski
2010-01-14 15:46 ` [PATCH] sky2: safer transmit ring cleaning (v4) Michael Breuer
0 siblings, 2 replies; 60+ messages in thread
From: Stephen Hemminger @ 2010-01-14 3:41 UTC (permalink / raw)
To: Michael Breuer, David Miller; +Cc: jarkao2, mikem, flyboy, rjw, netdev
Subject: sky2: safer transmit cleanup
This code makes transmit path and transmit reset safer by:
* adding memory barrier before checking available ring slots
* reseting state of tx ring elements after free
* seperate cleanup function from ring done function
* removing mostly unused tx_next element
* ignoring transmit completion if device is offline
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
---
This patch is against the current net-next-2.6 tree.
This version handles the case of dual port shared transmit status
and other cases where it is possible for tx_done to be called when
device is being changed.
--- a/drivers/net/sky2.c 2010-01-13 08:32:51.360161158 -0800
+++ b/drivers/net/sky2.c 2010-01-13 08:35:37.685531490 -0800
@@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct
/* Number of list elements available for next tx */
static inline int tx_avail(const struct sky2_port *sky2)
{
+ /* Makes sure update of tx_prod from start_xmit and
+ tx_cons from tx_done are seen. */
+ smp_mb();
return sky2->tx_pending - tx_inuse(sky2);
}
@@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s
return count;
}
-static void sky2_tx_unmap(struct pci_dev *pdev,
- const struct tx_ring_info *re)
+static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
{
if (re->flags & TX_MAP_SINGLE)
pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
@@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev
pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
pci_unmap_len(re, maplen),
PCI_DMA_TODEVICE);
+ re->flags = 0;
}
/*
@@ -1804,7 +1807,8 @@ mapping_error:
}
/*
- * Free ring elements from starting at tx_cons until "done"
+ * Transmit complete processing
+ * Free ring elements from starting at tx_cons until done index
*
* NB:
* 1. The hardware will tell us about partial completion of multi-part
@@ -1813,11 +1817,14 @@ mapping_error:
* looks at the tail of the queue of FIFO (tx_cons), not
* the head (tx_prod)
*/
-static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
+static void sky2_tx_done(struct net_device *dev, u16 done)
{
- struct net_device *dev = sky2->netdev;
+ struct sky2_port *sky2 = netdev_priv(dev);
unsigned idx;
+ if (!(netif_running(dev) & netif_device_present(dev)))
+ return;
+
BUG_ON(done >= sky2->tx_ring_size);
for (idx = sky2->tx_cons; idx != done;
@@ -1828,6 +1835,8 @@ static void sky2_tx_complete(struct sky2
sky2_tx_unmap(sky2->hw->pdev, re);
if (skb) {
+ re->skb = NULL;
+
if (unlikely(netif_msg_tx_done(sky2)))
printk(KERN_DEBUG "%s: tx done %u\n",
dev->name, idx);
@@ -1836,16 +1845,12 @@ static void sky2_tx_complete(struct sky2
dev->stats.tx_bytes += skb->len;
dev_kfree_skb_any(skb);
-
- sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
}
}
sky2->tx_cons = idx;
- smp_mb();
- /* Wake unless it's detached, and called e.g. from sky2_down() */
- if (tx_avail(sky2) > MAX_SKB_TX_LE + 4 && netif_device_present(dev))
+ if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
netif_wake_queue(dev);
}
@@ -1871,6 +1876,21 @@ static void sky2_tx_reset(struct sky2_hw
sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET);
}
+static void sky2_tx_clean(struct sky2_port *sky2)
+{
+ u16 idx;
+
+ for (idx = 0; idx < sky2->tx_ring_size; idx++) {
+ struct tx_ring_info *re = sky2->tx_ring + idx;
+
+ sky2_tx_unmap(sky2->hw->pdev, re);
+ if (re->skb) {
+ dev_kfree_skb_any(re->skb);
+ re->skb = NULL;
+ }
+ }
+}
+
/* Network shutdown */
static int sky2_down(struct net_device *dev)
{
@@ -1934,8 +1954,7 @@ static int sky2_down(struct net_device *
sky2_tx_reset(hw, port);
/* Free any pending frames stuck in HW queue */
- sky2_tx_complete(sky2, sky2->tx_prod);
-
+ sky2_tx_clean(sky2);
sky2_rx_clean(sky2);
sky2_free_buffers(sky2);
@@ -2412,15 +2431,6 @@ error:
goto resubmit;
}
-/* Transmit complete */
-static inline void sky2_tx_done(struct net_device *dev, u16 last)
-{
- struct sky2_port *sky2 = netdev_priv(dev);
-
- if (netif_running(dev))
- sky2_tx_complete(sky2, last);
-}
-
static inline void sky2_skb_rx(const struct sky2_port *sky2,
u32 status, struct sk_buff *skb)
{
@@ -3177,9 +3187,9 @@ static void sky2_reset(struct sky2_hw *h
static void sky2_detach(struct net_device *dev)
{
if (netif_running(dev)) {
- netif_tx_lock(dev);
+ netif_tx_lock_bh(dev);
netif_device_detach(dev); /* stop txq */
- netif_tx_unlock(dev);
+ netif_tx_unlock_bh(dev);
sky2_down(dev);
}
}
@@ -4202,7 +4212,7 @@ static int sky2_debug_show(struct seq_fi
/* Dump contents of tx ring */
sop = 1;
- for (idx = sky2->tx_next; idx != sky2->tx_prod && idx < sky2->tx_ring_size;
+ for (idx = sky2->tx_cons; idx != sky2->tx_prod && idx < sky2->tx_ring_size;
idx = RING_NEXT(idx, sky2->tx_ring_size)) {
const struct sky2_tx_le *le = sky2->tx_le + idx;
u32 a = le32_to_cpu(le->addr);
--- a/drivers/net/sky2.h 2010-01-13 08:32:27.919849429 -0800
+++ b/drivers/net/sky2.h 2010-01-13 08:33:03.410162026 -0800
@@ -2187,7 +2187,6 @@ struct sky2_port {
u16 tx_ring_size;
u16 tx_cons; /* next le to check */
u16 tx_prod; /* next le to use */
- u16 tx_next; /* debug only */
u16 tx_pending;
u16 tx_last_mss;
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4)
2010-01-14 3:41 ` [PATCH] sky2: safer transmit ring cleaning (v4) Stephen Hemminger
@ 2010-01-14 10:14 ` Jarek Poplawski
2010-01-14 11:16 ` Jarek Poplawski
2010-01-14 17:52 ` Stephen Hemminger
2010-01-14 15:46 ` [PATCH] sky2: safer transmit ring cleaning (v4) Michael Breuer
1 sibling, 2 replies; 60+ messages in thread
From: Jarek Poplawski @ 2010-01-14 10:14 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Michael Breuer, David Miller, mikem, flyboy, rjw, netdev
On Wed, Jan 13, 2010 at 07:41:48PM -0800, Stephen Hemminger wrote:
> Subject: sky2: safer transmit cleanup
>
> This code makes transmit path and transmit reset safer by:
> * adding memory barrier before checking available ring slots
> * reseting state of tx ring elements after free
> * seperate cleanup function from ring done function
> * removing mostly unused tx_next element
> * ignoring transmit completion if device is offline
>
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>
> ---
> This patch is against the current net-next-2.6 tree.
> This version handles the case of dual port shared transmit status
> and other cases where it is possible for tx_done to be called when
> device is being changed.
>
> --- a/drivers/net/sky2.c 2010-01-13 08:32:51.360161158 -0800
> +++ b/drivers/net/sky2.c 2010-01-13 08:35:37.685531490 -0800
> @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct
> /* Number of list elements available for next tx */
> static inline int tx_avail(const struct sky2_port *sky2)
> {
> + /* Makes sure update of tx_prod from start_xmit and
> + tx_cons from tx_done are seen. */
> + smp_mb();
> return sky2->tx_pending - tx_inuse(sky2);
> }
>
> @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s
> return count;
> }
>
> -static void sky2_tx_unmap(struct pci_dev *pdev,
> - const struct tx_ring_info *re)
> +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
> {
> if (re->flags & TX_MAP_SINGLE)
> pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
> @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev
> pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
> pci_unmap_len(re, maplen),
> PCI_DMA_TODEVICE);
> + re->flags = 0;
> }
>
> /*
> @@ -1804,7 +1807,8 @@ mapping_error:
> }
>
> /*
> - * Free ring elements from starting at tx_cons until "done"
> + * Transmit complete processing
> + * Free ring elements from starting at tx_cons until done index
> *
> * NB:
> * 1. The hardware will tell us about partial completion of multi-part
> @@ -1813,11 +1817,14 @@ mapping_error:
> * looks at the tail of the queue of FIFO (tx_cons), not
> * the head (tx_prod)
> */
> -static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
> +static void sky2_tx_done(struct net_device *dev, u16 done)
> {
> - struct net_device *dev = sky2->netdev;
> + struct sky2_port *sky2 = netdev_priv(dev);
> unsigned idx;
>
> + if (!(netif_running(dev) & netif_device_present(dev)))
This makes it safe, but it still resembles the "short term fix"
according do David's opinion.
This change seems to affect dev->stats too. Since they are not
updated in sky2_tx_clean(). Btw, I hope "&" is some optimization
because it's less readable than "&&".
Jarek P.
> + return;
> +
> BUG_ON(done >= sky2->tx_ring_size);
>
> for (idx = sky2->tx_cons; idx != done;
> @@ -1828,6 +1835,8 @@ static void sky2_tx_complete(struct sky2
> sky2_tx_unmap(sky2->hw->pdev, re);
>
> if (skb) {
> + re->skb = NULL;
> +
> if (unlikely(netif_msg_tx_done(sky2)))
> printk(KERN_DEBUG "%s: tx done %u\n",
> dev->name, idx);
> @@ -1836,16 +1845,12 @@ static void sky2_tx_complete(struct sky2
> dev->stats.tx_bytes += skb->len;
>
> dev_kfree_skb_any(skb);
> -
> - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
> }
> }
>
> sky2->tx_cons = idx;
> - smp_mb();
>
> - /* Wake unless it's detached, and called e.g. from sky2_down() */
> - if (tx_avail(sky2) > MAX_SKB_TX_LE + 4 && netif_device_present(dev))
> + if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
> netif_wake_queue(dev);
> }
>
> @@ -1871,6 +1876,21 @@ static void sky2_tx_reset(struct sky2_hw
> sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET);
> }
>
> +static void sky2_tx_clean(struct sky2_port *sky2)
> +{
> + u16 idx;
> +
> + for (idx = 0; idx < sky2->tx_ring_size; idx++) {
> + struct tx_ring_info *re = sky2->tx_ring + idx;
> +
> + sky2_tx_unmap(sky2->hw->pdev, re);
> + if (re->skb) {
> + dev_kfree_skb_any(re->skb);
> + re->skb = NULL;
> + }
> + }
> +}
> +
> /* Network shutdown */
> static int sky2_down(struct net_device *dev)
> {
> @@ -1934,8 +1954,7 @@ static int sky2_down(struct net_device *
> sky2_tx_reset(hw, port);
>
> /* Free any pending frames stuck in HW queue */
> - sky2_tx_complete(sky2, sky2->tx_prod);
> -
> + sky2_tx_clean(sky2);
> sky2_rx_clean(sky2);
>
> sky2_free_buffers(sky2);
> @@ -2412,15 +2431,6 @@ error:
> goto resubmit;
> }
>
> -/* Transmit complete */
> -static inline void sky2_tx_done(struct net_device *dev, u16 last)
> -{
> - struct sky2_port *sky2 = netdev_priv(dev);
> -
> - if (netif_running(dev))
> - sky2_tx_complete(sky2, last);
> -}
> -
> static inline void sky2_skb_rx(const struct sky2_port *sky2,
> u32 status, struct sk_buff *skb)
> {
> @@ -3177,9 +3187,9 @@ static void sky2_reset(struct sky2_hw *h
> static void sky2_detach(struct net_device *dev)
> {
> if (netif_running(dev)) {
> - netif_tx_lock(dev);
> + netif_tx_lock_bh(dev);
> netif_device_detach(dev); /* stop txq */
> - netif_tx_unlock(dev);
> + netif_tx_unlock_bh(dev);
> sky2_down(dev);
> }
> }
> @@ -4202,7 +4212,7 @@ static int sky2_debug_show(struct seq_fi
>
> /* Dump contents of tx ring */
> sop = 1;
> - for (idx = sky2->tx_next; idx != sky2->tx_prod && idx < sky2->tx_ring_size;
> + for (idx = sky2->tx_cons; idx != sky2->tx_prod && idx < sky2->tx_ring_size;
> idx = RING_NEXT(idx, sky2->tx_ring_size)) {
> const struct sky2_tx_le *le = sky2->tx_le + idx;
> u32 a = le32_to_cpu(le->addr);
> --- a/drivers/net/sky2.h 2010-01-13 08:32:27.919849429 -0800
> +++ b/drivers/net/sky2.h 2010-01-13 08:33:03.410162026 -0800
> @@ -2187,7 +2187,6 @@ struct sky2_port {
> u16 tx_ring_size;
> u16 tx_cons; /* next le to check */
> u16 tx_prod; /* next le to use */
> - u16 tx_next; /* debug only */
>
> u16 tx_pending;
> u16 tx_last_mss;
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4)
2010-01-14 10:14 ` Jarek Poplawski
@ 2010-01-14 11:16 ` Jarek Poplawski
2010-01-14 11:20 ` David Miller
2010-01-14 17:52 ` Stephen Hemminger
1 sibling, 1 reply; 60+ messages in thread
From: Jarek Poplawski @ 2010-01-14 11:16 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Michael Breuer, David Miller, mikem, flyboy, rjw, netdev
On Thu, Jan 14, 2010 at 10:14:45AM +0000, Jarek Poplawski wrote:
> On Wed, Jan 13, 2010 at 07:41:48PM -0800, Stephen Hemminger wrote:
> > Subject: sky2: safer transmit cleanup
...
> > + if (!(netif_running(dev) & netif_device_present(dev)))
>
> This makes it safe, but it still resembles the "short term fix"
> according do David's opinion.
Hmm... Actually, it's not safe, but still safer again. It looks like
David was right (this time ;-). Since netif_device_present() isn't
protected by a lock here, this is still racy against napi, since the
flag can be set between the test and the wakeup. Btw, there is even no
barrier, and in your patch this dangerous distance is made much wider.
So, now I really ;-) agree with David: this needs a proper fix.
Jarek P.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4)
2010-01-14 11:16 ` Jarek Poplawski
@ 2010-01-14 11:20 ` David Miller
2010-01-14 11:26 ` Jarek Poplawski
0 siblings, 1 reply; 60+ messages in thread
From: David Miller @ 2010-01-14 11:20 UTC (permalink / raw)
To: jarkao2; +Cc: shemminger, mbreuer, mikem, flyboy, rjw, netdev
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Thu, 14 Jan 2010 11:16:36 +0000
> So, now I really ;-) agree with David: this needs a proper fix.
Now Jarek, do you now see my dirty little secret?
When people disagree with me, I just silently sit around waiting for
them to eventually change their mind.
Isn't it brilliant? -)
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4)
2010-01-14 11:20 ` David Miller
@ 2010-01-14 11:26 ` Jarek Poplawski
2010-01-14 13:19 ` Mike McCormack
0 siblings, 1 reply; 60+ messages in thread
From: Jarek Poplawski @ 2010-01-14 11:26 UTC (permalink / raw)
To: David Miller; +Cc: shemminger, mbreuer, mikem, flyboy, rjw, netdev
On Thu, Jan 14, 2010 at 03:20:09AM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Thu, 14 Jan 2010 11:16:36 +0000
>
> > So, now I really ;-) agree with David: this needs a proper fix.
>
> Now Jarek, do you now see my dirty little secret?
>
> When people disagree with me, I just silently sit around waiting for
> them to eventually change their mind.
>
> Isn't it brilliant? -)
If it were that easy... (it never works for me :-()
Probably, there is something more... ;-)
Jarek P.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4)
2010-01-14 11:26 ` Jarek Poplawski
@ 2010-01-14 13:19 ` Mike McCormack
2010-01-14 15:43 ` Michael Breuer
` (2 more replies)
0 siblings, 3 replies; 60+ messages in thread
From: Mike McCormack @ 2010-01-14 13:19 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: David Miller, shemminger, mbreuer, flyboy, rjw, netdev
Jarek Poplawski wrote:
> On Thu, Jan 14, 2010 at 03:20:09AM -0800, David Miller wrote:
>> From: Jarek Poplawski <jarkao2@gmail.com>
>> Date: Thu, 14 Jan 2010 11:16:36 +0000
>>
>>> So, now I really ;-) agree with David: this needs a proper fix.
>> Now Jarek, do you now see my dirty little secret?
>>
>> When people disagree with me, I just silently sit around waiting for
>> them to eventually change their mind.
>>
>> Isn't it brilliant? -)
>
> If it were that easy... (it never works for me :-()
>
> Probably, there is something more... ;-)
Here's what was sitting in my tree...
Subject: [PATCH] sky2: Don't detach device when restarting
Block the tx queue from transmitting when restarting
rather than trying to take the device offline.
Signed-off-by: Mike McCormack <mikem@ring3k.org>
---
drivers/net/sky2.c | 13 +++++++++----
1 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index d76d907..061f6f2 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -1580,6 +1580,8 @@ static int sky2_up(struct net_device *dev)
if (netif_msg_ifup(sky2))
printk(KERN_INFO PFX "%s: enabling interface\n", dev->name);
+ netif_start_queue(dev);
+
return 0;
err_out:
@@ -1596,6 +1598,8 @@ static inline int tx_inuse(const struct sky2_port *sky2)
/* Number of list elements available for next tx */
static inline int tx_avail(const struct sky2_port *sky2)
{
+ if (unlikely(!sky2->tx_ring))
+ return 0;
return sky2->tx_pending - tx_inuse(sky2);
}
@@ -1925,7 +1929,9 @@ static int sky2_down(struct net_device *dev)
sky2_read32(hw, B0_IMSK);
synchronize_irq(hw->pdev->irq);
- napi_synchronize(&hw->napi);
+ netif_tx_lock_bh(dev);
+ napi_disable(&hw->napi);
+ netif_stop_queue(dev);
spin_lock_bh(&sky2->phy_lock);
sky2_phy_power_down(hw, port);
@@ -1939,6 +1945,8 @@ static int sky2_down(struct net_device *dev)
sky2_rx_clean(sky2);
sky2_free_buffers(sky2);
+ napi_enable(&hw->napi);
+ netif_tx_unlock_bh(dev);
return 0;
}
@@ -3177,9 +3185,6 @@ static void sky2_reset(struct sky2_hw *hw)
static void sky2_detach(struct net_device *dev)
{
if (netif_running(dev)) {
- netif_tx_lock_bh(dev);
- netif_device_detach(dev); /* stop txq */
- netif_tx_unlock_bh(dev);
sky2_down(dev);
}
}
-- 1.5.6.5
^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4)
2010-01-14 13:19 ` Mike McCormack
@ 2010-01-14 15:43 ` Michael Breuer
2010-01-14 16:46 ` Michael Breuer
2010-01-14 17:51 ` Stephen Hemminger
2 siblings, 0 replies; 60+ messages in thread
From: Michael Breuer @ 2010-01-14 15:43 UTC (permalink / raw)
To: Mike McCormack
Cc: Jarek Poplawski, David Miller, shemminger, flyboy, rjw, netdev
On 1/14/2010 8:19 AM, Mike McCormack wrote:
> Here's what was sitting in my tree...
> ...
> err_out:
> @@ -1596,6 +1598,8 @@ static inline int tx_inuse(const struct sky2_port *sky2)
> /* Number of list elements available for next tx */
> static inline int tx_avail(const struct sky2_port *sky2)
> {
> + if (unlikely(!sky2->tx_ring))
> + return 0;
> return sky2->tx_pending - tx_inuse(sky2);
> }
>
This hunk (patch) conflicts with the v4 patch Stephen sent out last
night. He added an smp_mb in front of the return. I'm going to give this
a go with the smb_mb before the unlikely test.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4)
2010-01-14 3:41 ` [PATCH] sky2: safer transmit ring cleaning (v4) Stephen Hemminger
2010-01-14 10:14 ` Jarek Poplawski
@ 2010-01-14 15:46 ` Michael Breuer
1 sibling, 0 replies; 60+ messages in thread
From: Michael Breuer @ 2010-01-14 15:46 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, jarkao2, mikem, flyboy, rjw, netdev
On 1/13/2010 10:41 PM, Stephen Hemminger wrote:
> Subject: sky2: safer transmit cleanup
>
> This code makes transmit path and transmit reset safer by:
> * adding memory barrier before checking available ring slots
> * reseting state of tx ring elements after free
> * seperate cleanup function from ring done function
> * removing mostly unused tx_next element
> * ignoring transmit completion if device is offline
>
> Signed-off-by: Stephen Hemminger<shemminger@vyatta.com>
>
> ---
> This patch is against the current net-next-2.6 tree.
> This version handles the case of dual port shared transmit status
> and other cases where it is possible for tx_done to be called when
> device is being changed.
>
> --- a/drivers/net/sky2.c 2010-01-13 08:32:51.360161158 -0800
> +++ b/drivers/net/sky2.c 2010-01-13 08:35:37.685531490 -0800
> @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct
> /* Number of list elements available for next tx */
> static inline int tx_avail(const struct sky2_port *sky2)
> {
> + /* Makes sure update of tx_prod from start_xmit and
> + tx_cons from tx_done are seen. */
> + smp_mb();
> return sky2->tx_pending - tx_inuse(sky2);
> }
>
> @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s
> return count;
> }
>
> -static void sky2_tx_unmap(struct pci_dev *pdev,
> - const struct tx_ring_info *re)
> +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
> {
> if (re->flags& TX_MAP_SINGLE)
> pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
> @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev
> pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
> pci_unmap_len(re, maplen),
> PCI_DMA_TODEVICE);
> + re->flags = 0;
> }
>
> /*
> @@ -1804,7 +1807,8 @@ mapping_error:
> }
>
> /*
> - * Free ring elements from starting at tx_cons until "done"
> + * Transmit complete processing
> + * Free ring elements from starting at tx_cons until done index
> *
> * NB:
> * 1. The hardware will tell us about partial completion of multi-part
> @@ -1813,11 +1817,14 @@ mapping_error:
> * looks at the tail of the queue of FIFO (tx_cons), not
> * the head (tx_prod)
> */
> -static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
> +static void sky2_tx_done(struct net_device *dev, u16 done)
> {
> - struct net_device *dev = sky2->netdev;
> + struct sky2_port *sky2 = netdev_priv(dev);
> unsigned idx;
>
> + if (!(netif_running(dev)& netif_device_present(dev)))
> + return;
> +
> BUG_ON(done>= sky2->tx_ring_size);
>
> for (idx = sky2->tx_cons; idx != done;
> @@ -1828,6 +1835,8 @@ static void sky2_tx_complete(struct sky2
> sky2_tx_unmap(sky2->hw->pdev, re);
>
> if (skb) {
> + re->skb = NULL;
> +
> if (unlikely(netif_msg_tx_done(sky2)))
> printk(KERN_DEBUG "%s: tx done %u\n",
> dev->name, idx);
> @@ -1836,16 +1845,12 @@ static void sky2_tx_complete(struct sky2
> dev->stats.tx_bytes += skb->len;
>
> dev_kfree_skb_any(skb);
> -
> - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
> }
> }
>
> sky2->tx_cons = idx;
> - smp_mb();
>
> - /* Wake unless it's detached, and called e.g. from sky2_down() */
> - if (tx_avail(sky2)> MAX_SKB_TX_LE + 4&& netif_device_present(dev))
> + if (tx_avail(sky2)> MAX_SKB_TX_LE + 4)
> netif_wake_queue(dev);
> }
>
> @@ -1871,6 +1876,21 @@ static void sky2_tx_reset(struct sky2_hw
> sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET);
> }
>
> +static void sky2_tx_clean(struct sky2_port *sky2)
> +{
> + u16 idx;
> +
> + for (idx = 0; idx< sky2->tx_ring_size; idx++) {
> + struct tx_ring_info *re = sky2->tx_ring + idx;
> +
> + sky2_tx_unmap(sky2->hw->pdev, re);
> + if (re->skb) {
> + dev_kfree_skb_any(re->skb);
> + re->skb = NULL;
> + }
> + }
> +}
> +
> /* Network shutdown */
> static int sky2_down(struct net_device *dev)
> {
> @@ -1934,8 +1954,7 @@ static int sky2_down(struct net_device *
> sky2_tx_reset(hw, port);
>
> /* Free any pending frames stuck in HW queue */
> - sky2_tx_complete(sky2, sky2->tx_prod);
> -
> + sky2_tx_clean(sky2);
> sky2_rx_clean(sky2);
>
> sky2_free_buffers(sky2);
> @@ -2412,15 +2431,6 @@ error:
> goto resubmit;
> }
>
> -/* Transmit complete */
> -static inline void sky2_tx_done(struct net_device *dev, u16 last)
> -{
> - struct sky2_port *sky2 = netdev_priv(dev);
> -
> - if (netif_running(dev))
> - sky2_tx_complete(sky2, last);
> -}
> -
> static inline void sky2_skb_rx(const struct sky2_port *sky2,
> u32 status, struct sk_buff *skb)
> {
> @@ -3177,9 +3187,9 @@ static void sky2_reset(struct sky2_hw *h
> static void sky2_detach(struct net_device *dev)
> {
> if (netif_running(dev)) {
> - netif_tx_lock(dev);
> + netif_tx_lock_bh(dev);
> netif_device_detach(dev); /* stop txq */
> - netif_tx_unlock(dev);
> + netif_tx_unlock_bh(dev);
> sky2_down(dev);
> }
> }
> @@ -4202,7 +4212,7 @@ static int sky2_debug_show(struct seq_fi
>
> /* Dump contents of tx ring */
> sop = 1;
> - for (idx = sky2->tx_next; idx != sky2->tx_prod&& idx< sky2->tx_ring_size;
> + for (idx = sky2->tx_cons; idx != sky2->tx_prod&& idx< sky2->tx_ring_size;
> idx = RING_NEXT(idx, sky2->tx_ring_size)) {
> const struct sky2_tx_le *le = sky2->tx_le + idx;
> u32 a = le32_to_cpu(le->addr);
> --- a/drivers/net/sky2.h 2010-01-13 08:32:27.919849429 -0800
> +++ b/drivers/net/sky2.h 2010-01-13 08:33:03.410162026 -0800
> @@ -2187,7 +2187,6 @@ struct sky2_port {
> u16 tx_ring_size;
> u16 tx_cons; /* next le to check */
> u16 tx_prod; /* next le to use */
> - u16 tx_next; /* debug only */
>
> u16 tx_pending;
> u16 tx_last_mss;
>
Tested this a bit (w/o Mike's patch from this morning). Overall, works
(vs. v3), but still something flaky going on WRT dhcp traffic:
Under load (and only under load) I'm seeing mutliple failed dhcp client
requests - 4-6 discover/offer sequences before I see request/ack. I
don't see errors, dropped packets, etc., at the time this is happening.
I'd blow it off to load, but I don't see this with the earlier
(pskb_may_pull) patch.
Going to rebuild with the inclusion of Mike's patch and see what happens.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4)
2010-01-14 13:19 ` Mike McCormack
2010-01-14 15:43 ` Michael Breuer
@ 2010-01-14 16:46 ` Michael Breuer
2010-01-14 17:51 ` Stephen Hemminger
2 siblings, 0 replies; 60+ messages in thread
From: Michael Breuer @ 2010-01-14 16:46 UTC (permalink / raw)
To: Mike McCormack
Cc: Jarek Poplawski, David Miller, shemminger, flyboy, rjw, netdev
On 1/14/2010 8:19 AM, Mike McCormack wrote:
> Jarek Poplawski wrote:
>
>> On Thu, Jan 14, 2010 at 03:20:09AM -0800, David Miller wrote:
>>
>>> From: Jarek Poplawski<jarkao2@gmail.com>
>>> Date: Thu, 14 Jan 2010 11:16:36 +0000
>>>
>>>
>>>> So, now I really ;-) agree with David: this needs a proper fix.
>>>>
>>> Now Jarek, do you now see my dirty little secret?
>>>
>>> When people disagree with me, I just silently sit around waiting for
>>> them to eventually change their mind.
>>>
>>> Isn't it brilliant? -)
>>>
>> If it were that easy... (it never works for me :-()
>>
>> Probably, there is something more... ;-)
>>
> Here's what was sitting in my tree...
>
>
> Subject: [PATCH] sky2: Don't detach device when restarting
>
> Block the tx queue from transmitting when restarting
> rather than trying to take the device offline.
>
> Signed-off-by: Mike McCormack<mikem@ring3k.org>
> ---
> drivers/net/sky2.c | 13 +++++++++----
> 1 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> index d76d907..061f6f2 100644
> --- a/drivers/net/sky2.c
> +++ b/drivers/net/sky2.c
> @@ -1580,6 +1580,8 @@ static int sky2_up(struct net_device *dev)
> if (netif_msg_ifup(sky2))
> printk(KERN_INFO PFX "%s: enabling interface\n", dev->name);
>
> + netif_start_queue(dev);
> +
> return 0;
>
> err_out:
> @@ -1596,6 +1598,8 @@ static inline int tx_inuse(const struct sky2_port *sky2)
> /* Number of list elements available for next tx */
> static inline int tx_avail(const struct sky2_port *sky2)
> {
> + if (unlikely(!sky2->tx_ring))
> + return 0;
> return sky2->tx_pending - tx_inuse(sky2);
> }
>
> @@ -1925,7 +1929,9 @@ static int sky2_down(struct net_device *dev)
> sky2_read32(hw, B0_IMSK);
>
> synchronize_irq(hw->pdev->irq);
> - napi_synchronize(&hw->napi);
> + netif_tx_lock_bh(dev);
> + napi_disable(&hw->napi);
> + netif_stop_queue(dev);
>
> spin_lock_bh(&sky2->phy_lock);
> sky2_phy_power_down(hw, port);
> @@ -1939,6 +1945,8 @@ static int sky2_down(struct net_device *dev)
> sky2_rx_clean(sky2);
>
> sky2_free_buffers(sky2);
> + napi_enable(&hw->napi);
> + netif_tx_unlock_bh(dev);
>
> return 0;
> }
> @@ -3177,9 +3185,6 @@ static void sky2_reset(struct sky2_hw *hw)
> static void sky2_detach(struct net_device *dev)
> {
> if (netif_running(dev)) {
> - netif_tx_lock_bh(dev);
> - netif_device_detach(dev); /* stop txq */
> - netif_tx_unlock_bh(dev);
> sky2_down(dev);
> }
> }
> -- 1.5.6.5
>
Ok - no obvious difference with this patch + Stephen's.
I still see:
No reported errors; decent throughput; earlier issues with IRQ resolved.
But... still seeing DHCP DISCOVER/OFFER (no REQUEST/ACK) while under
load. I can try to sniff this... but given that it's under load, I'd
probably have to filter out non DHCP stuff and might miss whatever is
really going on. Again - main point here is that absent these patches I
don't see this issue. It's also possible that these two patches allow
the load to be heavier thus actually causing a different problem.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4)
2010-01-14 13:19 ` Mike McCormack
2010-01-14 15:43 ` Michael Breuer
2010-01-14 16:46 ` Michael Breuer
@ 2010-01-14 17:51 ` Stephen Hemminger
2 siblings, 0 replies; 60+ messages in thread
From: Stephen Hemminger @ 2010-01-14 17:51 UTC (permalink / raw)
To: Mike McCormack
Cc: Jarek Poplawski, David Miller, mbreuer, flyboy, rjw, netdev
On Thu, 14 Jan 2010 22:19:39 +0900
Mike McCormack <mikem@ring3k.org> wrote:
> /* Number of list elements available for next tx */
> static inline int tx_avail(const struct sky2_port *sky2)
> {
> + if (unlikely(!sky2->tx_ring))
> + return 0;
> return sky2->tx_pending - tx_inuse(sky2);
> }
>
Breaks in detach case
> @@ -1925,7 +1929,9 @@ static int sky2_down(struct net_device *dev)
> sky2_read32(hw, B0_IMSK);
>
> synchronize_irq(hw->pdev->irq);
> - napi_synchronize(&hw->napi);
> + netif_tx_lock_bh(dev);
> + napi_disable(&hw->napi);
> + netif_stop_queue(dev);
>
> spin_lock_bh(&sky2->phy_lock);
> sky2_phy_power_down(hw, port);
> @@ -1939,6 +1945,8 @@ static int sky2_down(struct net_device *dev)
> sky2_rx_clean(sky2);
>
> sky2_free_buffers(sky2);
> + napi_enable(&hw->napi);
> + netif_tx_unlock_bh(dev);
>
> return 0;
Breaks on dual ported boards
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4)
2010-01-14 10:14 ` Jarek Poplawski
2010-01-14 11:16 ` Jarek Poplawski
@ 2010-01-14 17:52 ` Stephen Hemminger
2010-01-14 23:51 ` Michael Breuer
1 sibling, 1 reply; 60+ messages in thread
From: Stephen Hemminger @ 2010-01-14 17:52 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Michael Breuer, David Miller, mikem, flyboy, rjw, netdev
On Thu, 14 Jan 2010 10:14:45 +0000
Jarek Poplawski <jarkao2@gmail.com> wrote:
> This makes it safe, but it still resembles the "short term fix"
> according do David's opinion.
>
> This change seems to affect dev->stats too. Since they are not
> updated in sky2_tx_clean(). Btw, I hope "&" is some optimization
> because it's less readable than "&&".
Stats don't matter for packets flushed during device reset.
The & is because in the most common case device is up,
and we don't want the additional conditional branch.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4)
2010-01-14 17:52 ` Stephen Hemminger
@ 2010-01-14 23:51 ` Michael Breuer
2010-01-16 18:35 ` sky2 DHCPOFFER packet loss under load (Was Re: [PATCH] sky2: safer transmit ring cleaning (v4)) Michael Breuer
0 siblings, 1 reply; 60+ messages in thread
From: Michael Breuer @ 2010-01-14 23:51 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Jarek Poplawski, David Miller, mikem, flyboy, rjw, netdev
On 1/14/2010 12:52 PM, Stephen Hemminger wrote:
> On Thu, 14 Jan 2010 10:14:45 +0000
> Jarek Poplawski<jarkao2@gmail.com> wrote:
>
>
>> This makes it safe, but it still resembles the "short term fix"
>> according do David's opinion.
>>
>> This change seems to affect dev->stats too. Since they are not
>> updated in sky2_tx_clean(). Btw, I hope "&" is some optimization
>> because it's less readable than "&&".
>>
> Stats don't matter for packets flushed during device reset.
>
> The& is because in the most common case device is up,
> and we don't want the additional conditional branch.
>
I've been looking at what might explain the dhcp stuff - as well as the
dropped packets only when there's an extra hop. I came across one path
that seems suspect - although I'm really not familiar with the network
stack code... that said, I'm wondering about neigh_compat_output (and
eth_rebuild_header and arp_find). If I'm following things correctly (or
perhaps mostly correctly), the only time anything goes this route (pun
intentional) is when the packet was routed to this box. I'm guessing
that bridging makes this more likely. So my dhcp stuff would all be
going through here, as would the smb stuff that seemed flaky. The race
I'm seeing (maybe) is that when the arp table is being rebuilt, there's
a possibility that arp_find frees the skb. There's some other locking
and stuff going on that seems maybe races with sky2.c in places on both
the rx and tx path. I *think* it's right from looking at it, but test
results suggest otherwise. Aside from the potential race, I think
there's also a corner case where neigh_compat_output can return either
with or without freeing the skb depending on the return from
dev_hard_header... this may also be part of the race.
Maybe I've missed something... but as far as I can see, this is just
about the only difference in code path taken between stuff that is
working and stuff that is occasionally not.
^ permalink raw reply [flat|nested] 60+ messages in thread
* sky2 DHCPOFFER packet loss under load (Was Re: [PATCH] sky2: safer transmit ring cleaning (v4))
2010-01-14 23:51 ` Michael Breuer
@ 2010-01-16 18:35 ` Michael Breuer
0 siblings, 0 replies; 60+ messages in thread
From: Michael Breuer @ 2010-01-16 18:35 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Jarek Poplawski, David Miller, mikem, flyboy, rjw, netdev
On 1/14/2010 6:51 PM, Michael Breuer wrote:
> On 1/14/2010 12:52 PM, Stephen Hemminger wrote:
>> On Thu, 14 Jan 2010 10:14:45 +0000
>> Jarek Poplawski<jarkao2@gmail.com> wrote:
>>
>>> This makes it safe, but it still resembles the "short term fix"
>>> according do David's opinion.
>>>
>>> This change seems to affect dev->stats too. Since they are not
>>> updated in sky2_tx_clean(). Btw, I hope "&" is some optimization
>>> because it's less readable than "&&".
>> Stats don't matter for packets flushed during device reset.
>>
>> The& is because in the most common case device is up,
>> and we don't want the additional conditional branch.
> I've been looking at what might explain the dhcp stuff - as well as
> the dropped packets only when there's an extra hop. I came across one
> path that seems suspect - although I'm really not familiar with the
> network stack code... that said, I'm wondering about
> neigh_compat_output (and eth_rebuild_header and arp_find). If I'm
> following things correctly (or perhaps mostly correctly), the only
> time anything goes this route (pun intentional) is when the packet was
> routed to this box. I'm guessing that bridging makes this more likely.
> So my dhcp stuff would all be going through here, as would the smb
> stuff that seemed flaky. The race I'm seeing (maybe) is that when the
> arp table is being rebuilt, there's a possibility that arp_find frees
> the skb. There's some other locking and stuff going on that seems
> maybe races with sky2.c in places on both the rx and tx path. I
> *think* it's right from looking at it, but test results suggest
> otherwise. Aside from the potential race, I think there's also a
> corner case where neigh_compat_output can return either with or
> without freeing the skb depending on the return from
> dev_hard_header... this may also be part of the race.
>
> Maybe I've missed something... but as far as I can see, this is just
> about the only difference in code path taken between stuff that is
> working and stuff that is occasionally not.
Ok - brief update. I've confirmed that under load, outgoing DHCPOFFER
packets are being silently dropped. I don't know yet where.
Test scenario, what I do know, etc.:
Scenario:
Two systems; one Gb switch; one wifi router; one Blackberry client.
System A: Linux host; Asus P6T Deluxe V2/Sky2. eth1-> internet eth0->
internal (10.0.0.1/24).
Switch: HP Procurve unmanaged - one port connected to System A; another
to the wifi router; another to System B.
System B: Win7; Asus M2N Deluxe SLI/ Nforce 5 (10.0.0.11)
Router: Wrt54g-tm (dd-wrt) Connected to switch & various wifi clients
including a Blackberry. WEP enabled. (10.0.0.60)
Blackberry Curve 8320 (wifi enabled). (10.0.0.56 via dhcp lease)
Test that causes packet loss:
1. Turn off BB wifi.
2. Start copy of large files (4GB) from System B to an CIFS share on
System A.
3. Start nethogs on system A.
4. Start tcpdump on the wifi router (interface br0 - wired 10.0.0.60
connection)
5. Start wireshark on System A
6. tail system A /var/log/messages - watching for DHCP activity
7. When smb traffic load (incoming) exceeds 40,000KBPS (nethogs) -
enable wifi on the blackberry.
8. Stop test after multiple DHCPDISCOVER/OFFER observed without REQUEST/ACK.
Results:
1. It seems that the problem occurs intermittently below 40,000KBPS, and
consistently over that number as reported by nethogs. Lots of
fluctuation, so figure that the 40k is approximate.
2. wireshark (system A) shows DHCPOFFER traffic outgoing.
3. tcpdump (wifi - wired incoming interface) does NOT show DHCPOFFER
traffic when this problem occurs.
4. Both traces show arp activity during the DISCOVER/OFFER sequence.
5. There is no evidence of tx errors or packet drops, in any statistics
I can find.
Thoughts:
I still think there's a race happening between the arp neighbor update
and sky2. Might be higher up, but as I'm seeing the outgoing packets
when sniffing eth0, can't be too much higher up. This problem seems to
be exacerbated by the more recent patches, however I believe that this
is a result of the higher throughput achievable with these patches. With
the older set, I saw this problem less frequently, but found it much
harder to get over the 40K RX number.
I am also seeing (as previously reported - but haven't sniffed yet) SMB
ACK dropped packets, but only when traversing the wifi router. Not sure
if this is related, but hey, you never know.
^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH net-2.6] cdc_ether: Partially revert "usbnet: Set link down initially ..."
[not found] ` <omZ8aBZReyD.A.rz.COlSLB@chimera>
@ 2010-01-28 23:18 ` Ben Hutchings
2010-01-29 5:37 ` David Miller
0 siblings, 1 reply; 60+ messages in thread
From: Ben Hutchings @ 2010-01-28 23:18 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, avi.rozen
Commit 37e8273cd30592d3a82bcb70cbb1bdc4eaeb6b71 ("usbnet: Set link down
initially for drivers that update link state") changed the initial link
state in cdc_ether and other drivers based on the understanding that the
devices they support generate link change interrupts. However, this is
optional in the CDC Ethernet protocol, and two users have reported in
<http://bugzilla.kernel.org/show_bug.cgi?id=14791> that the link state
for their devices remains down. Therefore, revert the change in
cdc_ether.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Tested-by: Avi Rozen <avi.rozen@gmail.com>
---
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -419,7 +419,7 @@
static const struct driver_info cdc_info = {
.description = "CDC Ethernet Device",
- .flags = FLAG_ETHER | FLAG_LINK_INTR,
+ .flags = FLAG_ETHER,
// .check_connect = cdc_check_connect,
.bind = cdc_bind,
.unbind = usbnet_cdc_unbind,
--
Ben Hutchings
The generation of random numbers is too important to be left to chance.
- Robert Coveyou
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH net-2.6] cdc_ether: Partially revert "usbnet: Set link down initially ..."
2010-01-28 23:18 ` [PATCH net-2.6] cdc_ether: Partially revert "usbnet: Set link down initially ..." Ben Hutchings
@ 2010-01-29 5:37 ` David Miller
0 siblings, 0 replies; 60+ messages in thread
From: David Miller @ 2010-01-29 5:37 UTC (permalink / raw)
To: ben; +Cc: netdev, avi.rozen
From: Ben Hutchings <ben@decadent.org.uk>
Date: Thu, 28 Jan 2010 23:18:44 +0000
> Commit 37e8273cd30592d3a82bcb70cbb1bdc4eaeb6b71 ("usbnet: Set link down
> initially for drivers that update link state") changed the initial link
> state in cdc_ether and other drivers based on the understanding that the
> devices they support generate link change interrupts. However, this is
> optional in the CDC Ethernet protocol, and two users have reported in
> <http://bugzilla.kernel.org/show_bug.cgi?id=14791> that the link state
> for their devices remains down. Therefore, revert the change in
> cdc_ether.
>
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
> Tested-by: Avi Rozen <avi.rozen@gmail.com>
Applied, thanks a lot Ben.
^ permalink raw reply [flat|nested] 60+ messages in thread
end of thread, other threads:[~2010-01-29 5:37 UTC | newest]
Thread overview: 60+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-01-10 22:27 2.6.33-rc3-git3: Reported regressions from 2.6.32 Rafael J. Wysocki
2010-01-10 22:32 ` [Bug #14925] sky2 panic under load Rafael J. Wysocki
2010-01-11 0:36 ` Berck E. Nash
2010-01-11 13:26 ` Jarek Poplawski
2010-01-11 19:32 ` Rafael J. Wysocki
2010-01-11 20:31 ` Jarek Poplawski
2010-01-11 20:50 ` Rafael J. Wysocki
2010-01-11 21:02 ` Berck E. Nash
2010-01-11 21:47 ` Jarek Poplawski
2010-01-11 14:03 ` Mike McCormack
2010-01-11 16:45 ` Stephen Hemminger
2010-01-11 22:07 ` Jarek Poplawski
2010-01-12 0:14 ` David Miller
2010-01-12 7:50 ` Jarek Poplawski
2010-01-12 8:08 ` David Miller
2010-01-12 8:56 ` Jarek Poplawski
2010-01-12 9:42 ` David Miller
2010-01-12 10:31 ` Jarek Poplawski
2010-01-12 10:56 ` David Miller
2010-01-12 11:04 ` Jarek Poplawski
2010-01-12 15:39 ` Stephen Hemminger
2010-01-12 16:15 ` [PATCH] sky2: safer transmit ring cleaning Stephen Hemminger
2010-01-12 16:32 ` Michael Breuer
2010-01-12 17:02 ` Stephen Hemminger
2010-01-12 18:04 ` Jarek Poplawski
2010-01-12 18:13 ` Stephen Hemminger
2010-01-12 18:24 ` Jarek Poplawski
2010-01-12 18:49 ` [PATCH] sky2: safer transmit ring cleaning (v2) Stephen Hemminger
2010-01-12 19:16 ` Jarek Poplawski
2010-01-12 19:23 ` Stephen Hemminger
2010-01-12 19:50 ` Jarek Poplawski
2010-01-13 1:23 ` Stephen Hemminger
2010-01-12 19:34 ` Michael Breuer
2010-01-12 18:35 ` [PATCH] sky2: safer transmit ring cleaning Michael Breuer
2010-01-12 18:42 ` Michael Breuer
2010-01-12 20:31 ` Michael Breuer
2010-01-13 4:10 ` [PATCH] sky2: safer transmit ring cleaning (v3) Stephen Hemminger
2010-01-13 4:31 ` Michael Breuer
2010-01-13 7:35 ` Jarek Poplawski
2010-01-13 16:04 ` Michael Breuer
2010-01-14 3:41 ` [PATCH] sky2: safer transmit ring cleaning (v4) Stephen Hemminger
2010-01-14 10:14 ` Jarek Poplawski
2010-01-14 11:16 ` Jarek Poplawski
2010-01-14 11:20 ` David Miller
2010-01-14 11:26 ` Jarek Poplawski
2010-01-14 13:19 ` Mike McCormack
2010-01-14 15:43 ` Michael Breuer
2010-01-14 16:46 ` Michael Breuer
2010-01-14 17:51 ` Stephen Hemminger
2010-01-14 17:52 ` Stephen Hemminger
2010-01-14 23:51 ` Michael Breuer
2010-01-16 18:35 ` sky2 DHCPOFFER packet loss under load (Was Re: [PATCH] sky2: safer transmit ring cleaning (v4)) Michael Breuer
2010-01-14 15:46 ` [PATCH] sky2: safer transmit ring cleaning (v4) Michael Breuer
2010-01-11 22:31 ` [Bug #14925] sky2 panic under load Jarek Poplawski
2010-01-11 16:00 ` 2.6.33-rc3-git3: Reported regressions from 2.6.32 Luis R. Rodriguez
2010-01-11 21:47 ` Nick Bowler
2010-01-11 22:10 ` Rafael J. Wysocki
[not found] ` <omZ8aBZReyD.A.rz.COlSLB@chimera>
2010-01-28 23:18 ` [PATCH net-2.6] cdc_ether: Partially revert "usbnet: Set link down initially ..." Ben Hutchings
2010-01-29 5:37 ` David Miller
-- strict thread matches above, loose matches on Subject: below --
2009-12-29 15:05 2.6.33-rc2: Reported regressions from 2.6.32 Rafael J. Wysocki
2009-12-29 15:10 ` [Bug #14925] sky2 panic under load Rafael J. Wysocki
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).