* 2.6.33-rc3-git3: Reported regressions from 2.6.32 @ 2010-01-10 22:27 Rafael J. Wysocki 2010-01-10 22:32 ` [Bug #14925] sky2 panic under load Rafael J. Wysocki ` (3 more replies) 0 siblings, 4 replies; 59+ messages in thread From: Rafael J. Wysocki @ 2010-01-10 22:27 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich, Kernel Testers List, Network Development, Linux ACPI, Linux PM List, Linux SCSI List, Linux Wireless List, DRI This message contains a list of some regressions from 2.6.32, for which there are no fixes in the mainline I know of. If any of them have been fixed already, please let me know. If you know of any other unresolved regressions from 2.6.32, please let me know either and I'll add them to the list. Also, please let me know if any of the entries below are invalid. Each entry from the list will be sent additionally in an automatic reply to this message with CCs to the people involved in reporting and handling the issue. Listed regressions statistics: Date Total Pending Unresolved ---------------------------------------- 2010-01-10 55 33 21 2009-12-29 36 34 27 Unresolved regressions ---------------------- Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15043 Subject : Display goes off with i915.powersave=1 Submitter : Soeren Sonnenburg <sonne@debian.org> Date : 2010-01-10 20:09 (1 days old) References : http://marc.info/?l=linux-kernel&m=126315457519505&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15041 Subject : Pagemap endless read loop with LTP Submitter : Andi Kleen <andi@firstfloor.org> Date : 2010-01-10 2:09 (1 days old) References : http://marc.info/?l=linux-kernel&m=126308941423848&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15038 Subject : drm/ksm: fbdev blanking regression Submitter : Johan Hovold <jhovold@gmail.com> Date : 2010-01-06 17:00 (5 days old) First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=731b5a15a3b1474a41c2ca29b4c32b0f21bc852e References : http://marc.info/?l=linux-kernel&m=126279726418748&w=4 Handled-By : James Simmons <jsimmons@infradead.org> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15036 Subject : soft lockup in dmesg after suspend/resume Submitter : ykzhao <yakui.zhao@intel.com> Date : 2010-01-04 5:36 (7 days old) References : http://marc.info/?l=linux-kernel&m=126258356202722&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15035 Subject : BUG: unable to handle kernel paging request in rs600_gart_set_page() Submitter : Xiaotian Feng <dfeng@redhat.com> Date : 2010-01-05 19:10 (6 days old) References : http://lkml.org/lkml/2010/1/5/95 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15033 Subject : drm: gem_object_free without struct_mutex Submitter : Hugh Dickins <hugh.dickins@tiscali.co.uk> Date : 2009-12-30 19:45 (12 days old) References : http://marc.info/?l=linux-kernel&m=126220236201529&w=4 Handled-By : Jesse Barnes <jbarnes@virtuousgeek.org> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15032 Subject : Oops in uart_resume_port() on resume Submitter : Zdenek Kabelac <zdenek.kabelac@gmail.com> Date : 2010-01-04 15:47 (7 days old) First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ba15ab0e8de0d4439a91342ad52d55ca9e313f3d References : http://marc.info/?l=linux-kernel&m=126262008815689&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14976 Subject : No sound on snd_hda_codec_via in 2.6.33-rc2 Submitter : Mark Rosenstand <rosenstand@gmail.com> Date : 2010-01-02 08:27 (9 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14954 Subject : warning from alloc_pages_nodemask on boot -- caused by commit 78f1699659963fff97975df44db6d5dbe7218e55 Submitter : Stephen Hemminger <shemminger@linux-foundation.org> Date : 2009-12-29 19:57 (13 days old) First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=78f1699659963fff97975df44db6d5dbe7218e55 Handled-By : Alex Chiang <achiang@hp.com> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14950 Subject : tbench regression with 2.6.33-rc1 Submitter : Lin Ming <ming.m.lin@intel.com> Date : 2009-12-25 11:11 (17 days old) References : http://marc.info/?l=linux-kernel&m=126174044213172&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14949 Subject : drm_vm.c:drm_mmap: possible circular locking dependency detected Submitter : Borislav Petkov <petkovbb@googlemail.com> Date : 2009-12-26 9:45 (16 days old) References : http://marc.info/?l=linux-kernel&m=126182073616279&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14948 Subject : EHCI resume sysfs duplicates Submitter : Borislav Petkov <petkovbb@googlemail.com> Date : 2009-12-26 9:36 (16 days old) References : http://marc.info/?l=linux-kernel&m=126182020615709&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14946 Subject : All kernels after 2.6.32-git10 show only 1 CPU Submitter : Sid Boyce <sboyce@blueyonder.co.uk> Date : 2009-12-23 16:55 (19 days old) References : http://marc.info/?l=linux-kernel&m=126158734326801&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14937 Subject : WARNING: at kernel/lockdep.c:2830 Submitter : Grant Wilson <grant.wilson@zen.co.uk> Date : 2009-12-27 13:35 (15 days old) References : http://marc.info/?l=linux-kernel&m=126192220404829&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14934 Subject : kernel crash during boot Submitter : werner <w.landgraf@ru.ru> Date : 2009-12-25 5:37 (17 days old) References : http://marc.info/?l=linux-kernel&m=126171951030608&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14925 Subject : sky2 panic under load Submitter : Berck E. Nash <flyboy@gmail.com> Date : 2009-12-21 23:52 (21 days old) References : http://marc.info/?l=linux-kernel&m=126143955730347&w=4 http://marc.info/?l=linux-kernel&m=126160893126548&w=4 Handled-By : Stephen Hemminger <shemminger@vyatta.com> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14924 Subject : Weird hard hangs when rendering 'some' web-sites in Firefox Submitter : David <david@unsolicited.net> Date : 2009-12-21 21:53 (21 days old) References : http://marc.info/?l=linux-kernel&m=126143375823340&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14860 Subject : No DP-DVI output when laptop is docked Submitter : Pär Andersson <paran@lysator.liu.se> Date : 2009-12-21 21:02 (21 days old) Handled-By : ykzhao <yakui.zhao@intel.com> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14859 Subject : System timer firing too much without cause Submitter : Shawn Starr <shawn.starr@rogers.com> Date : 2009-12-21 19:16 (21 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14792 Subject : Misdetection of the TV output Submitter : Santi <santi@agolina.net> Date : 2009-12-12 13:28 (30 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14791 Subject : Something has been broken in the network stack this week Submitter : Delete This Account <speedyboyinovator@hotmail.com> Date : 2009-12-12 13:06 (30 days old) Regressions with patches ------------------------ Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15039 Subject : leds_alix2: can't allocate I/O for GPIO Submitter : Arnd Hannemann <hannemann@nets.rwth-aachen.de> Date : 2010-01-07 10:26 (4 days old) References : http://marc.info/?l=linux-kernel&m=126286001106257&w=4 Handled-By : Daniel Mack <daniel@caiaq.de> Patch : http://patchwork.kernel.org/patch/72006/ Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15037 Subject : BUG during shutdown - bisected to commit e2912009 Submitter : Marc Dionne <marc.c.dionne@gmail.com> Date : 2010-01-02 0:27 (9 days old) References : http://marc.info/?l=linux-kernel&m=126239207631034&w=4 Handled-By : Xiaotian Feng <dfeng@redhat.com> Patch : http://patchwork.kernel.org/patch/71537/ Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15034 Subject : volano ~30% regression Submitter : Lin Ming <ming.m.lin@intel.com> Date : 2010-01-04 8:15 (7 days old) References : http://marc.info/?l=linux-kernel&m=126259391411982&w=4 Handled-By : Mike Galbraith <efault@gmx.de> Patch : http://patchwork.kernel.org/patch/70623/ Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15031 Subject : bug in fs/btrfs/ordered-data.c:672 Submitter : Carlos R. Mafra <crmafra2@gmail.com> Date : 2010-01-02 11:32 (9 days old) References : http://lkml.org/lkml/2010/1/2/17 Handled-By : Yan, Zheng <yanzheng@21cn.com> Patch : http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg03686.html Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15026 Subject : i915: Resume regression on MSI Wind U100 w/o KMS Submitter : Rafael J. Wysocki <rjw@sisk.pl> Date : 2010-01-08 23:45 (3 days old) References : http://marc.info/?l=linux-kernel&m=126299433427673&w=4 Handled-By : Rafael J. Wysocki <rjw@sisk.pl> Patch : http://patchwork.kernel.org/patch/71887/ Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14974 Subject : tg3 does not resume from hibernation properly on BCM5787M Submitter : Chow Loong Jin <hyperair@ubuntu.com> Date : 2010-01-02 05:30 (9 days old) First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=87668d352aa8d135bd695a050f18bbfc7b50b506 Handled-By : Matt Carlson <mcarlson@broadcom.com> Patch : http://bugzilla.kernel.org/show_bug.cgi?id=14974#c12 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14957 Subject : Blank screen with KMS enabled Submitter : Philipp Kohlbecher <xt28@gmx.de> Date : 2009-12-30 00:08 (12 days old) Handled-By : Zhao Yakui <yakui.zhao@intel.com> Patch : http://bugzilla.kernel.org/attachment.cgi?id=24476 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14944 Subject : 2.6.32.2 SATA link detect failed, 2.6.32.1 works fine Submitter : fengxiangjun <fengxiangjun@neusoft.com> Date : 2009-12-21 11:12 (21 days old) References : http://marc.info/?l=linux-kernel&m=126139442328314&w=4 Handled-By : Tejun Heo <tj@kernel.org> Patch : http://patchwork.kernel.org/patch/69746/ Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14942 Subject : gkrellm no longer shows all the temperatures on thinkpad x60 Submitter : Pavel Machek <pavel@ucw.cz> Date : 2009-12-27 21:57 (15 days old) References : http://marc.info/?l=linux-kernel&m=126195107005565&w=4 Handled-By : Henrique de Moraes Holschuh <hmh@hmh.eng.br> Patch : http://patchwork.kernel.org/patch/69809/ Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14907 Subject : Commit "kbuild: fix bzImage build for x86" breaks build Submitter : Oliver Hartkopp <oliver@hartkopp.net> Date : 2009-12-19 19:21 (23 days old) First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4a2ff67c88211026afcbdbc190c13f705dae1b59 References : http://marc.info/?l=linux-kernel&m=126125050622195&w=4 http://marc.info/?l=linux-kernel&m=126114565103300&w=4 http://marc.info/?l=linux-kbuild&m=126202819526623&w=2 Handled-By : Jonathan Nieder <jrnieder@gmail.com> Patch : http://thread.gmane.org/gmane.linux.kbuild.devel/4313/focus=4319 http://patchwork.kernel.org/patch/71957/ Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14899 Subject : reiserfs: inconsistent lock state Submitter : Alexander Beregalov <a.beregalov@gmail.com> Date : 2009-12-11 22:06 (31 days old) References : http://marc.info/?l=linux-kernel&m=126056920702515&w=4 Handled-By : Frederic Weisbecker <fweisbec@gmail.com> Patch : http://patchwork.kernel.org/patch/67256/ http://patchwork.kernel.org/patch/67257/ Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14874 Subject : Ath5k regression with commit 8bf3d79bc401ca417ccf9fc076d3295d1a71dbf5 Submitter : Joshua Covington <joshuacov@googlemail.com> Date : 2009-12-25 00:49 (17 days old) Handled-By : Luis R. Rodriguez <mcgrof@gmail.com> Patch : http://bugzilla.kernel.org/attachment.cgi?id=24335 For details, please visit the bug entries and follow the links given in references. As you can see, there is a Bugzilla entry for each of the listed regressions. There also is a Bugzilla entry used for tracking the regressions from 2.6.32, unresolved as well as resolved, at: http://bugzilla.kernel.org/show_bug.cgi?id=14885 Please let me know if there are any Bugzilla entries that should be added to the list in there. Thanks, Rafael ^ permalink raw reply [flat|nested] 59+ messages in thread
* [Bug #14925] sky2 panic under load 2010-01-10 22:27 2.6.33-rc3-git3: Reported regressions from 2.6.32 Rafael J. Wysocki @ 2010-01-10 22:32 ` Rafael J. Wysocki 2010-01-11 0:36 ` Berck E. Nash 2010-01-11 16:00 ` 2.6.33-rc3-git3: Reported regressions from 2.6.32 Luis R. Rodriguez ` (2 subsequent siblings) 3 siblings, 1 reply; 59+ messages in thread From: Rafael J. Wysocki @ 2010-01-10 22:32 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Kernel Testers List, Berck E. Nash, Berck Nash, Daniel Hazelton, Michael Breuer, "netdev, Stephen Hemminger This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.32. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14925 Subject : sky2 panic under load Submitter : Berck E. Nash <flyboy@gmail.com> Date : 2009-12-21 23:52 (21 days old) References : http://marc.info/?l=linux-kernel&m=126143955730347&w=4 http://marc.info/?l=linux-kernel&m=126160893126548&w=4 Handled-By : Stephen Hemminger <shemminger@vyatta.com> ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Bug #14925] sky2 panic under load 2010-01-10 22:32 ` [Bug #14925] sky2 panic under load Rafael J. Wysocki @ 2010-01-11 0:36 ` Berck E. Nash 2010-01-11 13:26 ` Jarek Poplawski 2010-01-11 14:03 ` Mike McCormack 0 siblings, 2 replies; 59+ messages in thread From: Berck E. Nash @ 2010-01-11 0:36 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: netdev, Jarek Poplawski Rafael J. Wysocki wrote: > This message has been generated automatically as a part of a report > of recent regressions. > > The following bug entry is on the current list of known regressions > from 2.6.32. Please verify if it still should be listed and let me know > (either way). > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14925 > Subject : sky2 panic under load > Submitter : Berck E. Nash <flyboy@gmail.com> > Date : 2009-12-21 23:52 (21 days old) > References : http://marc.info/?l=linux-kernel&m=126143955730347&w=4 > http://marc.info/?l=linux-kernel&m=126160893126548&w=4 > Handled-By : Stephen Hemminger <shemminger@vyatta.com> The patch attached to the bug report did not fix the problem, but I'm fairly certain that this one from Jarek P. did: During TX timeout procedure dev could be awoken too early, e.g. by sky2_complete_tx() called from sky2_down(). Then sky2_xmit_frame() can run while buffers are freed causing an oops. This patch fixes it by adding netif_device_present() test in sky2_tx_complete(). Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=14925 With debugging by: Mike McCormack <mikem@ring3k.org> Reported-by: Berck E. Nash <flyboy@gmail.com> Tested-by: Berck E. Nash <flyboy@gmail.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> --- drivers/net/sky2.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c index 1c01b96..7650f73 100644 --- a/drivers/net/sky2.c +++ b/drivers/net/sky2.c @@ -1844,7 +1844,8 @@ static void sky2_tx_complete(struct sky2_port *sky2, u16 done) sky2->tx_cons = idx; smp_mb(); - if (tx_avail(sky2) > MAX_SKB_TX_LE + 4) + /* Wake unless it's detached, and called e.g. from sky2_down() */ + if (tx_avail(sky2) > MAX_SKB_TX_LE + 4 && netif_device_present(dev)) netif_wake_queue(dev); } ^ permalink raw reply related [flat|nested] 59+ messages in thread
* Re: [Bug #14925] sky2 panic under load 2010-01-11 0:36 ` Berck E. Nash @ 2010-01-11 13:26 ` Jarek Poplawski 2010-01-11 19:32 ` Rafael J. Wysocki 2010-01-11 14:03 ` Mike McCormack 1 sibling, 1 reply; 59+ messages in thread From: Jarek Poplawski @ 2010-01-11 13:26 UTC (permalink / raw) To: Berck E. Nash; +Cc: Rafael J. Wysocki, netdev On Sun, Jan 10, 2010 at 05:36:46PM -0700, Berck E. Nash wrote: > Rafael J. Wysocki wrote: > > This message has been generated automatically as a part of a report > > of recent regressions. > > > > The following bug entry is on the current list of known regressions > > from 2.6.32. Please verify if it still should be listed and let me know > > (either way). BTW, I don't know why Berck didn't experience such a panic before 2.6.32, but seems not a regression to me. There might be new/more sky2 TX timeouts which trigger this panic and would make a real regression. Cheers, Jarek P. > > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14925 > > Subject : sky2 panic under load > > Submitter : Berck E. Nash <flyboy@gmail.com> > > Date : 2009-12-21 23:52 (21 days old) > > References : http://marc.info/?l=linux-kernel&m=126143955730347&w=4 > > http://marc.info/?l=linux-kernel&m=126160893126548&w=4 > > Handled-By : Stephen Hemminger <shemminger@vyatta.com> > > The patch attached to the bug report did not fix the problem, but I'm > fairly certain that this one from Jarek P. did: > > During TX timeout procedure dev could be awoken too early, e.g. by > sky2_complete_tx() called from sky2_down(). Then sky2_xmit_frame() > can run while buffers are freed causing an oops. This patch fixes it > by adding netif_device_present() test in sky2_tx_complete(). > > Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=14925 > > With debugging by: Mike McCormack <mikem@ring3k.org> > > Reported-by: Berck E. Nash <flyboy@gmail.com> > Tested-by: Berck E. Nash <flyboy@gmail.com> > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> > > --- > > drivers/net/sky2.c | 3 ++- > 1 files changed, 2 insertions(+), 1 deletions(-) > > diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c > index 1c01b96..7650f73 100644 > --- a/drivers/net/sky2.c > +++ b/drivers/net/sky2.c > @@ -1844,7 +1844,8 @@ static void sky2_tx_complete(struct sky2_port > *sky2, u16 done) > sky2->tx_cons = idx; > smp_mb(); > > - if (tx_avail(sky2) > MAX_SKB_TX_LE + 4) > + /* Wake unless it's detached, and called e.g. from sky2_down() */ > + if (tx_avail(sky2) > MAX_SKB_TX_LE + 4 && netif_device_present(dev)) > netif_wake_queue(dev); > } > ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Bug #14925] sky2 panic under load 2010-01-11 13:26 ` Jarek Poplawski @ 2010-01-11 19:32 ` Rafael J. Wysocki 2010-01-11 20:31 ` Jarek Poplawski 0 siblings, 1 reply; 59+ messages in thread From: Rafael J. Wysocki @ 2010-01-11 19:32 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Berck E. Nash, netdev On Monday 11 January 2010, Jarek Poplawski wrote: > On Sun, Jan 10, 2010 at 05:36:46PM -0700, Berck E. Nash wrote: > > Rafael J. Wysocki wrote: > > > This message has been generated automatically as a part of a report > > > of recent regressions. > > > > > > The following bug entry is on the current list of known regressions > > > from 2.6.32. Please verify if it still should be listed and let me know > > > (either way). > > BTW, I don't know why Berck didn't experience such a panic before > 2.6.32, but seems not a regression to me. There might be new/more sky2 > TX timeouts which trigger this panic and would make a real regression. Even if the code has always been broken, but it's only become visible after 2.6.32, that still counts as a regression IMO, because now the users are affected who weren't before. Rafael ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Bug #14925] sky2 panic under load 2010-01-11 19:32 ` Rafael J. Wysocki @ 2010-01-11 20:31 ` Jarek Poplawski 2010-01-11 20:50 ` Rafael J. Wysocki 2010-01-11 21:02 ` Berck E. Nash 0 siblings, 2 replies; 59+ messages in thread From: Jarek Poplawski @ 2010-01-11 20:31 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Berck E. Nash, netdev On Mon, Jan 11, 2010 at 08:32:24PM +0100, Rafael J. Wysocki wrote: > On Monday 11 January 2010, Jarek Poplawski wrote: > > On Sun, Jan 10, 2010 at 05:36:46PM -0700, Berck E. Nash wrote: > > > Rafael J. Wysocki wrote: > > > > This message has been generated automatically as a part of a report > > > > of recent regressions. > > > > > > > > The following bug entry is on the current list of known regressions > > > > from 2.6.32. Please verify if it still should be listed and let me know > > > > (either way). > > > > BTW, I don't know why Berck didn't experience such a panic before > > 2.6.32, but seems not a regression to me. There might be new/more sky2 > > TX timeouts which trigger this panic and would make a real regression. > > Even if the code has always been broken, but it's only become visible after > 2.6.32, that still counts as a regression IMO, because now the users are > affected who weren't before. Right, but: 1) someone with a similar but older problem might be mislead a fix is not for them; 2) someone with exactly this one problem (i.e. Berck ;-) might be mislead "no oops" is enough, while their linux might be still worse than before. (So I intended Berck to re-consider or even re-check this problem wrt. 2.6.31, and maybe even reporting another regression.) Jarek P. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Bug #14925] sky2 panic under load 2010-01-11 20:31 ` Jarek Poplawski @ 2010-01-11 20:50 ` Rafael J. Wysocki 2010-01-11 21:02 ` Berck E. Nash 1 sibling, 0 replies; 59+ messages in thread From: Rafael J. Wysocki @ 2010-01-11 20:50 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Berck E. Nash, netdev On Monday 11 January 2010, Jarek Poplawski wrote: > On Mon, Jan 11, 2010 at 08:32:24PM +0100, Rafael J. Wysocki wrote: > > On Monday 11 January 2010, Jarek Poplawski wrote: > > > On Sun, Jan 10, 2010 at 05:36:46PM -0700, Berck E. Nash wrote: > > > > Rafael J. Wysocki wrote: > > > > > This message has been generated automatically as a part of a report > > > > > of recent regressions. > > > > > > > > > > The following bug entry is on the current list of known regressions > > > > > from 2.6.32. Please verify if it still should be listed and let me know > > > > > (either way). > > > > > > BTW, I don't know why Berck didn't experience such a panic before > > > 2.6.32, but seems not a regression to me. There might be new/more sky2 > > > TX timeouts which trigger this panic and would make a real regression. > > > > Even if the code has always been broken, but it's only become visible after > > 2.6.32, that still counts as a regression IMO, because now the users are > > affected who weren't before. > > Right, but: > 1) someone with a similar but older problem might be mislead a fix is > not for them; Not if the fix changelog mentions the older kernels explicitly. > 2) someone with exactly this one problem (i.e. Berck ;-) might be > mislead "no oops" is enough, while their linux might be still worse > than before. (So I intended Berck to re-consider or even re-check > this problem wrt. 2.6.31, and maybe even reporting another > regression.) Yes, rechecking the problem with 2.6.31 would be useful. Rafael ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Bug #14925] sky2 panic under load 2010-01-11 20:31 ` Jarek Poplawski 2010-01-11 20:50 ` Rafael J. Wysocki @ 2010-01-11 21:02 ` Berck E. Nash 2010-01-11 21:47 ` Jarek Poplawski 1 sibling, 1 reply; 59+ messages in thread From: Berck E. Nash @ 2010-01-11 21:02 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Rafael J. Wysocki, netdev Jarek Poplawski wrote: > On Mon, Jan 11, 2010 at 08:32:24PM +0100, Rafael J. Wysocki wrote: >> On Monday 11 January 2010, Jarek Poplawski wrote: >>> On Sun, Jan 10, 2010 at 05:36:46PM -0700, Berck E. Nash wrote: >>>> Rafael J. Wysocki wrote: >>>>> This message has been generated automatically as a part of a report >>>>> of recent regressions. >>>>> >>>>> The following bug entry is on the current list of known regressions >>>>> from 2.6.32. Please verify if it still should be listed and let me know >>>>> (either way). >>> BTW, I don't know why Berck didn't experience such a panic before >>> 2.6.32, but seems not a regression to me. There might be new/more sky2 >>> TX timeouts which trigger this panic and would make a real regression. >> Even if the code has always been broken, but it's only become visible after >> 2.6.32, that still counts as a regression IMO, because now the users are >> affected who weren't before. > > Right, but: > 1) someone with a similar but older problem might be mislead a fix is > not for them; > 2) someone with exactly this one problem (i.e. Berck ;-) might be > mislead "no oops" is enough, while their linux might be still worse > than before. (So I intended Berck to re-consider or even re-check > this problem wrt. 2.6.31, and maybe even reporting another > regression.) Well, the problem with this bug is how hard it is for me to reproduce. I'm willing to admit that just because I never got it before 2.6.32 isn't proof that it wasn't there in 2.6.31. But it's a regression somewhere along the line, since I've been using this hardware for over 3 years now. There were lots of bugs in the sky2 driver years ago, but for the last 2+ years or so, I haven't had any trouble at all until now. The bug only shows up for me with bittorrent traffic. I also use the same adapter to transfer backups over the network from several computers, and that doesn't trigger it... I used 2.6.31 for however long it was the current stable, and I never got a crash with it. After I got several crashes in 2.6.32, I reverted to 2.6.31 until Jarek sent this patch that seems to have fixed it. I've never gotten it to crash in 2.6.31, so I'm pretty sure it's a 2.6.32 regression, but I can't prove it. I would love to do more testing, but since I can't reproduce the bug at will, I'm not really sure what to offer? ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Bug #14925] sky2 panic under load 2010-01-11 21:02 ` Berck E. Nash @ 2010-01-11 21:47 ` Jarek Poplawski 0 siblings, 0 replies; 59+ messages in thread From: Jarek Poplawski @ 2010-01-11 21:47 UTC (permalink / raw) To: Berck E. Nash; +Cc: Rafael J. Wysocki, netdev On Mon, Jan 11, 2010 at 02:02:40PM -0700, Berck E. Nash wrote: > Jarek Poplawski wrote: > > 2) someone with exactly this one problem (i.e. Berck ;-) might be > > mislead "no oops" is enough, while their linux might be still worse > > than before. (So I intended Berck to re-consider or even re-check > > this problem wrt. 2.6.31, and maybe even reporting another > > regression.) > > Well, the problem with this bug is how hard it is for me to reproduce. > I'm willing to admit that just because I never got it before 2.6.32 > isn't proof that it wasn't there in 2.6.31. But it's a regression > somewhere along the line, since I've been using this hardware for over 3 > years now. There were lots of bugs in the sky2 driver years ago, but > for the last 2+ years or so, I haven't had any trouble at all until now. > > The bug only shows up for me with bittorrent traffic. I also use the > same adapter to transfer backups over the network from several > computers, and that doesn't trigger it... > > I used 2.6.31 for however long it was the current stable, and I never > got a crash with it. After I got several crashes in 2.6.32, I reverted > to 2.6.31 until Jarek sent this patch that seems to have fixed it. I've > never gotten it to crash in 2.6.31, so I'm pretty sure it's a 2.6.32 > regression, but I can't prove it. > > I would love to do more testing, but since I can't reproduce the bug at > will, I'm not really sure what to offer? I mainly thought about looking at the logs during such bittorrent to compare if there is any visible change in "sky2 eth0: disabling interface"/"sky2 eth0: enabling interface" etc. frequency between 2.6.31 and 2.6.32. Jarek P. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Bug #14925] sky2 panic under load 2010-01-11 0:36 ` Berck E. Nash 2010-01-11 13:26 ` Jarek Poplawski @ 2010-01-11 14:03 ` Mike McCormack 2010-01-11 16:45 ` Stephen Hemminger 1 sibling, 1 reply; 59+ messages in thread From: Mike McCormack @ 2010-01-11 14:03 UTC (permalink / raw) To: Berck E. Nash Cc: Rafael J. Wysocki, netdev, Jarek Poplawski, Stephen Hemminger Berck E. Nash wrote: > > During TX timeout procedure dev could be awoken too early, e.g. by > sky2_complete_tx() called from sky2_down(). Then sky2_xmit_frame() > can run while buffers are freed causing an oops. This patch fixes it > by adding netif_device_present() test in sky2_tx_complete(). > > Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=14925 > > With debugging by: Mike McCormack <mikem@ring3k.org> > > Reported-by: Berck E. Nash <flyboy@gmail.com> > Tested-by: Berck E. Nash <flyboy@gmail.com> > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> > > --- > > drivers/net/sky2.c | 3 ++- Perhaps only sky2_tx_done should wake the queue? Does the patch below fix the problem too? thanks, Mike Subject: [PATCH] sky2: Don't wake queue in sky2_tx_complete() We should only wake the tx queue on completion of transmits, not after cleaning the tx ring. Signed-off-by: Mike McCormack <mikem@ring3k.org> --- drivers/net/sky2.c | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c index 93d9635..6d9111d 100644 --- a/drivers/net/sky2.c +++ b/drivers/net/sky2.c @@ -1843,9 +1843,6 @@ static void sky2_tx_complete(struct sky2_port *sky2, u16 done) sky2->tx_cons = idx; smp_mb(); - - if (tx_avail(sky2) > MAX_SKB_TX_LE + 4) - netif_wake_queue(dev); } static void sky2_tx_reset(struct sky2_hw *hw, unsigned port) @@ -2416,8 +2413,11 @@ static inline void sky2_tx_done(struct net_device *dev, u16 last) { struct sky2_port *sky2 = netdev_priv(dev); - if (netif_running(dev)) + if (netif_running(dev)) { sky2_tx_complete(sky2, last); + if (tx_avail(sky2) > MAX_SKB_TX_LE + 4) + netif_wake_queue(dev); + } } static inline void sky2_skb_rx(const struct sky2_port *sky2, -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 59+ messages in thread
* Re: [Bug #14925] sky2 panic under load 2010-01-11 14:03 ` Mike McCormack @ 2010-01-11 16:45 ` Stephen Hemminger 2010-01-11 22:07 ` Jarek Poplawski 2010-01-11 22:31 ` [Bug #14925] sky2 panic under load Jarek Poplawski 0 siblings, 2 replies; 59+ messages in thread From: Stephen Hemminger @ 2010-01-11 16:45 UTC (permalink / raw) To: Mike McCormack; +Cc: Berck E. Nash, Rafael J. Wysocki, netdev, Jarek Poplawski On Mon, 11 Jan 2010 23:03:41 +0900 Mike McCormack <mikem@ring3k.org> wrote: > Perhaps only sky2_tx_done should wake the queue? > Does the patch below fix the problem too? > > thanks, > > Mike The idea is good, but what if transmit queue was full (stopped), and TX FIFO gets stuck. Then Transmit timeout happens and the reset logic clears the queue. What will start the queue? Something like this: ----------------------------------------------------------- --- a/drivers/net/sky2.c 2010-01-11 08:36:42.617426300 -0800 +++ b/drivers/net/sky2.c 2010-01-11 08:42:51.295237661 -0800 @@ -1843,9 +1843,6 @@ static void sky2_tx_complete(struct sky2 sky2->tx_cons = idx; smp_mb(); - - if (tx_avail(sky2) > MAX_SKB_TX_LE + 4) - netif_wake_queue(dev); } static void sky2_tx_reset(struct sky2_hw *hw, unsigned port) @@ -2416,8 +2413,12 @@ static inline void sky2_tx_done(struct n { struct sky2_port *sky2 = netdev_priv(dev); - if (netif_running(dev)) + if (netif_running(dev) && netif_device_present(dev)) { sky2_tx_complete(sky2, last); + + if (tx_avail(sky2) > MAX_SKB_TX_LE + 4) + netif_wake_queue(dev); + } } static inline void sky2_skb_rx(const struct sky2_port *sky2, @@ -3197,6 +3198,7 @@ static int sky2_reattach(struct net_devi } else { netif_device_attach(dev); sky2_set_multicast(dev); + netif_start_queue(dev); } } -- ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Bug #14925] sky2 panic under load 2010-01-11 16:45 ` Stephen Hemminger @ 2010-01-11 22:07 ` Jarek Poplawski 2010-01-12 0:14 ` David Miller 2010-01-11 22:31 ` [Bug #14925] sky2 panic under load Jarek Poplawski 1 sibling, 1 reply; 59+ messages in thread From: Jarek Poplawski @ 2010-01-11 22:07 UTC (permalink / raw) To: Stephen Hemminger Cc: Mike McCormack, Berck E. Nash, Rafael J. Wysocki, netdev On Mon, Jan 11, 2010 at 08:45:04AM -0800, Stephen Hemminger wrote: > On Mon, 11 Jan 2010 23:03:41 +0900 > Mike McCormack <mikem@ring3k.org> wrote: > > > Perhaps only sky2_tx_done should wake the queue? > > Does the patch below fix the problem too? > > > > thanks, > > > > Mike > > The idea is good, I don't agree: you both try to make this check more specific. Actually, since this problem is quite tricky, I wondered about making it more genaral and visible for other maintainers by adding something like netif_wake_queue_present() with the check and some comment. Jarek P. > but what if transmit queue was full (stopped), > and TX FIFO gets stuck. Then Transmit timeout happens and > the reset logic clears the queue. What will start the queue? > > Something like this: > ----------------------------------------------------------- > > > > --- a/drivers/net/sky2.c 2010-01-11 08:36:42.617426300 -0800 > +++ b/drivers/net/sky2.c 2010-01-11 08:42:51.295237661 -0800 > @@ -1843,9 +1843,6 @@ static void sky2_tx_complete(struct sky2 > > sky2->tx_cons = idx; > smp_mb(); > - > - if (tx_avail(sky2) > MAX_SKB_TX_LE + 4) > - netif_wake_queue(dev); > } > > static void sky2_tx_reset(struct sky2_hw *hw, unsigned port) > @@ -2416,8 +2413,12 @@ static inline void sky2_tx_done(struct n > { > struct sky2_port *sky2 = netdev_priv(dev); > > - if (netif_running(dev)) > + if (netif_running(dev) && netif_device_present(dev)) { > sky2_tx_complete(sky2, last); > + > + if (tx_avail(sky2) > MAX_SKB_TX_LE + 4) > + netif_wake_queue(dev); > + } > } > > static inline void sky2_skb_rx(const struct sky2_port *sky2, > @@ -3197,6 +3198,7 @@ static int sky2_reattach(struct net_devi > } else { > netif_device_attach(dev); > sky2_set_multicast(dev); > + netif_start_queue(dev); > } > } > > > -- ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Bug #14925] sky2 panic under load 2010-01-11 22:07 ` Jarek Poplawski @ 2010-01-12 0:14 ` David Miller 2010-01-12 7:50 ` Jarek Poplawski 0 siblings, 1 reply; 59+ messages in thread From: David Miller @ 2010-01-12 0:14 UTC (permalink / raw) To: jarkao2; +Cc: shemminger, mikem, flyboy, rjw, netdev From: Jarek Poplawski <jarkao2@gmail.com> Date: Mon, 11 Jan 2010 23:07:54 +0100 > I don't agree: you both try to make this check more specific. > Actually, since this problem is quite tricky, I wondered about > making it more genaral and visible for other maintainers by > adding something like netif_wake_queue_present() with the > check and some comment. This detracts from the real problem. The issue is that we have a code path bringing a device down, which uses helper routines which are meant to be executed when the device is up and functioning normally. That's the bug. No other driver does silly things like call helper routines which wake the TX queue when taking the chip down. Fix that, not the immediate symptoms. Write a routine that unconditionally clears the TX queue, frees the packets, etc. and has none of the wake logic. That's what most other driver do, and those that don't should be fixed similarly. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Bug #14925] sky2 panic under load 2010-01-12 0:14 ` David Miller @ 2010-01-12 7:50 ` Jarek Poplawski 2010-01-12 8:08 ` David Miller 0 siblings, 1 reply; 59+ messages in thread From: Jarek Poplawski @ 2010-01-12 7:50 UTC (permalink / raw) To: David Miller; +Cc: shemminger, mikem, flyboy, rjw, netdev On Mon, Jan 11, 2010 at 04:14:19PM -0800, David Miller wrote: > From: Jarek Poplawski <jarkao2@gmail.com> > Date: Mon, 11 Jan 2010 23:07:54 +0100 > > > I don't agree: you both try to make this check more specific. > > Actually, since this problem is quite tricky, I wondered about > > making it more genaral and visible for other maintainers by > > adding something like netif_wake_queue_present() with the > > check and some comment. > > This detracts from the real problem. > > The issue is that we have a code path bringing a device down, which > uses helper routines which are meant to be executed when the device is > up and functioning normally. > > That's the bug. > > No other driver does silly things like call helper routines which wake > the TX queue when taking the chip down. > > Fix that, not the immediate symptoms. Write a routine that > unconditionally clears the TX queue, frees the packets, etc. and > has none of the wake logic. > > That's what most other driver do, and those that don't should be > fixed similarly. As I wrote it's tricky and could be hard to track even if you "heard" about it, e.g. Mike moves netif_wake_queue() to another place in his last proposal, but it still can run after netif_device_detach() until napi_synchronize() in sky2_down(). I think, I can see similar problems e.g. in gianfar or netxen, where napi_disable() is done after netif_device_detach(), especially in suspend procedures (there might be less severe (than oops) effects yet). IMHO, it all looks simply error prone (sometime you have to know a driver well to track all possible paths to say it's really safe). In the meantime users are endangered with known oopses or at least suspend problems, while we know the reasons, can fix it (even temporarily) with one-liners, but wait for the right fixes. (Btw, such fixes might require a significant rewrite, risking inserting new bugs, so I doubt they are "right" for -stable.) Anyway, I'm OK with this (I hope what should be done will be done, and I don't have sky2 ;-) Cheers, Jarek P. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Bug #14925] sky2 panic under load 2010-01-12 7:50 ` Jarek Poplawski @ 2010-01-12 8:08 ` David Miller 2010-01-12 8:56 ` Jarek Poplawski 0 siblings, 1 reply; 59+ messages in thread From: David Miller @ 2010-01-12 8:08 UTC (permalink / raw) To: jarkao2; +Cc: shemminger, mikem, flyboy, rjw, netdev From: Jarek Poplawski <jarkao2@gmail.com> Date: Tue, 12 Jan 2010 07:50:59 +0000 > I think, I can see similar problems e.g. in gianfar or netxen, where > napi_disable() is done after netif_device_detach(), especially in > suspend procedures (there might be less severe (than oops) effects > yet). IMHO, it all looks simply error prone (sometime you have to > know a driver well to track all possible paths to say it's really > safe). Then that's an even larger bug. Until you do napi_disable(), the device can be touched. Asynchronous paths outside of the driver's control, even with interrupts disabled, can call back into the driver and touch the chip. F.e. netpoll via netconsole output on another cpu So it therefore must be done before doing the actual work of bringing the device down or suspending it. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Bug #14925] sky2 panic under load 2010-01-12 8:08 ` David Miller @ 2010-01-12 8:56 ` Jarek Poplawski 2010-01-12 9:42 ` David Miller 0 siblings, 1 reply; 59+ messages in thread From: Jarek Poplawski @ 2010-01-12 8:56 UTC (permalink / raw) To: David Miller; +Cc: shemminger, mikem, flyboy, rjw, netdev, Michael Breuer On Tue, Jan 12, 2010 at 12:08:04AM -0800, David Miller wrote: > From: Jarek Poplawski <jarkao2@gmail.com> > Date: Tue, 12 Jan 2010 07:50:59 +0000 > > > I think, I can see similar problems e.g. in gianfar or netxen, where > > napi_disable() is done after netif_device_detach(), especially in > > suspend procedures (there might be less severe (than oops) effects > > yet). IMHO, it all looks simply error prone (sometime you have to > > know a driver well to track all possible paths to say it's really > > safe). > > Then that's an even larger bug. > > Until you do napi_disable(), the device can be touched. > > Asynchronous paths outside of the driver's control, even > with interrupts disabled, can call back into the driver > and touch the chip. > > F.e. netpoll via netconsole output on another cpu > > So it therefore must be done before doing the actual work of bringing > the device down or suspending it. Maybe I miss something, but once more: this patch mentioned by Berck Nash has been tested by at least two users, Berck himself, and probably even more intensively by Michael Breuer, during af_packet debugging. Both guys acknowledged it helped, so it can't be that bad. Jarek P. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Bug #14925] sky2 panic under load 2010-01-12 8:56 ` Jarek Poplawski @ 2010-01-12 9:42 ` David Miller 2010-01-12 10:31 ` Jarek Poplawski 2010-01-12 10:56 ` David Miller 0 siblings, 2 replies; 59+ messages in thread From: David Miller @ 2010-01-12 9:42 UTC (permalink / raw) To: jarkao2; +Cc: shemminger, mikem, flyboy, rjw, netdev, mbreuer From: Jarek Poplawski <jarkao2@gmail.com> Date: Tue, 12 Jan 2010 08:56:34 +0000 > Maybe I miss something, but once more: this patch mentioned by Berck > Nash has been tested by at least two users, Berck himself, and > probably even more intensively by Michael Breuer, during af_packet > debugging. Both guys acknowledged it helped, so it can't be that bad. I agree, as a short term fix, this patch is fine. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Bug #14925] sky2 panic under load 2010-01-12 9:42 ` David Miller @ 2010-01-12 10:31 ` Jarek Poplawski 2010-01-12 10:56 ` David Miller 1 sibling, 0 replies; 59+ messages in thread From: Jarek Poplawski @ 2010-01-12 10:31 UTC (permalink / raw) To: David Miller; +Cc: shemminger, mikem, flyboy, rjw, netdev, mbreuer On Tue, Jan 12, 2010 at 01:42:18AM -0800, David Miller wrote: > From: Jarek Poplawski <jarkao2@gmail.com> > Date: Tue, 12 Jan 2010 08:56:34 +0000 > > > Maybe I miss something, but once more: this patch mentioned by Berck > > Nash has been tested by at least two users, Berck himself, and > > probably even more intensively by Michael Breuer, during af_packet > > debugging. Both guys acknowledged it helped, so it can't be that bad. > > I agree, as a short term fix, this patch is fine. And I agree real problems might be deeper. Jarek P. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Bug #14925] sky2 panic under load 2010-01-12 9:42 ` David Miller 2010-01-12 10:31 ` Jarek Poplawski @ 2010-01-12 10:56 ` David Miller 2010-01-12 11:04 ` Jarek Poplawski ` (2 more replies) 1 sibling, 3 replies; 59+ messages in thread From: David Miller @ 2010-01-12 10:56 UTC (permalink / raw) To: jarkao2; +Cc: shemminger, mikem, flyboy, rjw, netdev, mbreuer From: David Miller <davem@davemloft.net> Date: Tue, 12 Jan 2010 01:42:18 -0800 (PST) > From: Jarek Poplawski <jarkao2@gmail.com> > Date: Tue, 12 Jan 2010 08:56:34 +0000 > >> Maybe I miss something, but once more: this patch mentioned by Berck >> Nash has been tested by at least two users, Berck himself, and >> probably even more intensively by Michael Breuer, during af_packet >> debugging. Both guys acknowledged it helped, so it can't be that bad. > > I agree, as a short term fix, this patch is fine. In case it isn't painfully obvious, I'm applying Jarek's "v2" fix and will also queue that up for -stable. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Bug #14925] sky2 panic under load 2010-01-12 10:56 ` David Miller @ 2010-01-12 11:04 ` Jarek Poplawski 2010-01-12 15:39 ` Stephen Hemminger 2010-01-12 16:15 ` [PATCH] sky2: safer transmit ring cleaning Stephen Hemminger 2 siblings, 0 replies; 59+ messages in thread From: Jarek Poplawski @ 2010-01-12 11:04 UTC (permalink / raw) To: David Miller; +Cc: shemminger, mikem, flyboy, rjw, netdev, mbreuer On Tue, Jan 12, 2010 at 02:56:20AM -0800, David Miller wrote: > From: David Miller <davem@davemloft.net> > Date: Tue, 12 Jan 2010 01:42:18 -0800 (PST) > > > From: Jarek Poplawski <jarkao2@gmail.com> > > Date: Tue, 12 Jan 2010 08:56:34 +0000 > > > >> Maybe I miss something, but once more: this patch mentioned by Berck > >> Nash has been tested by at least two users, Berck himself, and > >> probably even more intensively by Michael Breuer, during af_packet > >> debugging. Both guys acknowledged it helped, so it can't be that bad. > > > > I agree, as a short term fix, this patch is fine. > > In case it isn't painfully obvious, I'm applying Jarek's "v2" > fix and will also queue that up for -stable. > Thanks! Jarek P. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Bug #14925] sky2 panic under load 2010-01-12 10:56 ` David Miller 2010-01-12 11:04 ` Jarek Poplawski @ 2010-01-12 15:39 ` Stephen Hemminger 2010-01-12 16:15 ` [PATCH] sky2: safer transmit ring cleaning Stephen Hemminger 2 siblings, 0 replies; 59+ messages in thread From: Stephen Hemminger @ 2010-01-12 15:39 UTC (permalink / raw) To: David Miller; +Cc: mikem, flyboy, rjw, netdev, mbreuer, jarkao2 Please hold off applying anything. I am testing a cleaner solution ^ permalink raw reply [flat|nested] 59+ messages in thread
* [PATCH] sky2: safer transmit ring cleaning 2010-01-12 10:56 ` David Miller 2010-01-12 11:04 ` Jarek Poplawski 2010-01-12 15:39 ` Stephen Hemminger @ 2010-01-12 16:15 ` Stephen Hemminger 2010-01-12 16:32 ` Michael Breuer ` (2 more replies) 2 siblings, 3 replies; 59+ messages in thread From: Stephen Hemminger @ 2010-01-12 16:15 UTC (permalink / raw) To: David Miller; +Cc: jarkao2, mikem, flyboy, rjw, netdev, mbreuer This code makes transmit path and transmit reset safer by: * adding memory barrier before checking available ring slots * reseting state of tx ring elements after free * seperate cleanup function from ring done function * removing mostly unused tx_next element Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> --- Please apply this instead of the various bits and pieces flying around labeled as sky2 panic under load --- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800 +++ b/drivers/net/sky2.c 2010-01-11 17:36:22.027429875 -0800 @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct /* Number of list elements available for next tx */ static inline int tx_avail(const struct sky2_port *sky2) { + /* Makes sure update of tx_prod from start_xmit and + tx_cons from tx_done are seen. */ + smp_mb(); return sky2->tx_pending - tx_inuse(sky2); } @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s return count; } -static void sky2_tx_unmap(struct pci_dev *pdev, - const struct tx_ring_info *re) +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re) { if (re->flags & TX_MAP_SINGLE) pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr), @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr), pci_unmap_len(re, maplen), PCI_DMA_TODEVICE); + re->flags = 0; } /* @@ -1804,7 +1807,8 @@ mapping_error: } /* - * Free ring elements from starting at tx_cons until "done" + * Transmit complete processing + * Free ring elements from starting at tx_cons until done index * * NB: * 1. The hardware will tell us about partial completion of multi-part @@ -1813,9 +1817,9 @@ mapping_error: * looks at the tail of the queue of FIFO (tx_cons), not * the head (tx_prod) */ -static void sky2_tx_complete(struct sky2_port *sky2, u16 done) +static void sky2_tx_done(struct net_device *dev, u16 done) { - struct net_device *dev = sky2->netdev; + struct sky2_port *sky2 = netdev_priv(dev); unsigned idx; BUG_ON(done >= sky2->tx_ring_size); @@ -1828,6 +1832,8 @@ static void sky2_tx_complete(struct sky2 sky2_tx_unmap(sky2->hw->pdev, re); if (skb) { + re->skb = NULL; + if (unlikely(netif_msg_tx_done(sky2))) printk(KERN_DEBUG "%s: tx done %u\n", dev->name, idx); @@ -1836,13 +1842,10 @@ static void sky2_tx_complete(struct sky2 dev->stats.tx_bytes += skb->len; dev_kfree_skb_any(skb); - - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size); } } sky2->tx_cons = idx; - smp_mb(); if (tx_avail(sky2) > MAX_SKB_TX_LE + 4) netif_wake_queue(dev); @@ -1870,6 +1873,21 @@ static void sky2_tx_reset(struct sky2_hw sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET); } +static void sky2_tx_clean(struct sky2_port *sky2) +{ + u16 idx; + + for (idx = 0; idx < sky2->tx_ring_size; idx++) { + struct tx_ring_info *re = sky2->tx_ring + idx; + + sky2_tx_unmap(sky2->hw->pdev, re); + if (re->skb) { + dev_kfree_skb_any(re->skb); + re->skb = NULL; + } + } +} + /* Network shutdown */ static int sky2_down(struct net_device *dev) { @@ -1933,8 +1951,7 @@ static int sky2_down(struct net_device * sky2_tx_reset(hw, port); /* Free any pending frames stuck in HW queue */ - sky2_tx_complete(sky2, sky2->tx_prod); - + sky2_tx_clean(sky2); sky2_rx_clean(sky2); sky2_free_buffers(sky2); @@ -2411,15 +2428,6 @@ error: goto resubmit; } -/* Transmit complete */ -static inline void sky2_tx_done(struct net_device *dev, u16 last) -{ - struct sky2_port *sky2 = netdev_priv(dev); - - if (netif_running(dev)) - sky2_tx_complete(sky2, last); -} - static inline void sky2_skb_rx(const struct sky2_port *sky2, u32 status, struct sk_buff *skb) { @@ -4201,7 +4209,7 @@ static int sky2_debug_show(struct seq_fi /* Dump contents of tx ring */ sop = 1; - for (idx = sky2->tx_next; idx != sky2->tx_prod && idx < sky2->tx_ring_size; + for (idx = sky2->tx_cons; idx != sky2->tx_prod && idx < sky2->tx_ring_size; idx = RING_NEXT(idx, sky2->tx_ring_size)) { const struct sky2_tx_le *le = sky2->tx_le + idx; u32 a = le32_to_cpu(le->addr); --- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800 +++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800 @@ -2187,7 +2187,6 @@ struct sky2_port { u16 tx_ring_size; u16 tx_cons; /* next le to check */ u16 tx_prod; /* next le to use */ - u16 tx_next; /* debug only */ u16 tx_pending; u16 tx_last_mss; ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning 2010-01-12 16:15 ` [PATCH] sky2: safer transmit ring cleaning Stephen Hemminger @ 2010-01-12 16:32 ` Michael Breuer 2010-01-12 17:02 ` Stephen Hemminger 2010-01-12 18:04 ` Jarek Poplawski 2010-01-12 18:35 ` [PATCH] sky2: safer transmit ring cleaning Michael Breuer 2 siblings, 1 reply; 59+ messages in thread From: Michael Breuer @ 2010-01-12 16:32 UTC (permalink / raw) To: Stephen Hemminger; +Cc: David Miller, jarkao2, mikem, flyboy, rjw, netdev On 01/12/2010 11:15 AM, Stephen Hemminger wrote: > This code makes transmit path and transmit reset safer by: > * adding memory barrier before checking available ring slots > * reseting state of tx ring elements after free > * seperate cleanup function from ring done function > * removing mostly unused tx_next element > > > Stephen, Just want to confirm that this supersedes both of the patches I'm currently running with: 1. skb_may_pull 2. Nash patch -- Mike ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning 2010-01-12 16:32 ` Michael Breuer @ 2010-01-12 17:02 ` Stephen Hemminger 0 siblings, 0 replies; 59+ messages in thread From: Stephen Hemminger @ 2010-01-12 17:02 UTC (permalink / raw) To: Michael Breuer; +Cc: David Miller, jarkao2, mikem, flyboy, rjw, netdev On Tue, 12 Jan 2010 11:32:45 -0500 Michael Breuer <mbreuer@majjas.com> wrote: > On 01/12/2010 11:15 AM, Stephen Hemminger wrote: > > This code makes transmit path and transmit reset safer by: > > * adding memory barrier before checking available ring slots > > * reseting state of tx ring elements after free > > * seperate cleanup function from ring done function > > * removing mostly unused tx_next element > > > > > > > Stephen, > > Just want to confirm that this supersedes both of the patches I'm > currently running with: > > 1. skb_may_pull > 2. Nash patch You want: 1. AF_PACKET patch that makes sure skb is not modified after send 2. This patch. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning 2010-01-12 16:15 ` [PATCH] sky2: safer transmit ring cleaning Stephen Hemminger 2010-01-12 16:32 ` Michael Breuer @ 2010-01-12 18:04 ` Jarek Poplawski 2010-01-12 18:13 ` Stephen Hemminger 2010-01-12 18:35 ` [PATCH] sky2: safer transmit ring cleaning Michael Breuer 2 siblings, 1 reply; 59+ messages in thread From: Jarek Poplawski @ 2010-01-12 18:04 UTC (permalink / raw) To: Stephen Hemminger; +Cc: David Miller, mikem, flyboy, rjw, netdev, mbreuer On Tue, Jan 12, 2010 at 08:15:13AM -0800, Stephen Hemminger wrote: > This code makes transmit path and transmit reset safer by: > * adding memory barrier before checking available ring slots > * reseting state of tx ring elements after free > * seperate cleanup function from ring done function > * removing mostly unused tx_next element Does this patch prevent re-enabling tx after netif_device_detach(), e.g. when sky2_detach() and sky2_tx_done() run at the same time on different cpus? Jarek P. > > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> > > --- > Please apply this instead of the various bits and pieces flying > around labeled as sky2 panic under load > > > --- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800 > +++ b/drivers/net/sky2.c 2010-01-11 17:36:22.027429875 -0800 > @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct > /* Number of list elements available for next tx */ > static inline int tx_avail(const struct sky2_port *sky2) > { > + /* Makes sure update of tx_prod from start_xmit and > + tx_cons from tx_done are seen. */ > + smp_mb(); > return sky2->tx_pending - tx_inuse(sky2); > } > > @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s > return count; > } > > -static void sky2_tx_unmap(struct pci_dev *pdev, > - const struct tx_ring_info *re) > +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re) > { > if (re->flags & TX_MAP_SINGLE) > pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr), > @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev > pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr), > pci_unmap_len(re, maplen), > PCI_DMA_TODEVICE); > + re->flags = 0; > } > > /* > @@ -1804,7 +1807,8 @@ mapping_error: > } > > /* > - * Free ring elements from starting at tx_cons until "done" > + * Transmit complete processing > + * Free ring elements from starting at tx_cons until done index > * > * NB: > * 1. The hardware will tell us about partial completion of multi-part > @@ -1813,9 +1817,9 @@ mapping_error: > * looks at the tail of the queue of FIFO (tx_cons), not > * the head (tx_prod) > */ > -static void sky2_tx_complete(struct sky2_port *sky2, u16 done) > +static void sky2_tx_done(struct net_device *dev, u16 done) > { > - struct net_device *dev = sky2->netdev; > + struct sky2_port *sky2 = netdev_priv(dev); > unsigned idx; > > BUG_ON(done >= sky2->tx_ring_size); > @@ -1828,6 +1832,8 @@ static void sky2_tx_complete(struct sky2 > sky2_tx_unmap(sky2->hw->pdev, re); > > if (skb) { > + re->skb = NULL; > + > if (unlikely(netif_msg_tx_done(sky2))) > printk(KERN_DEBUG "%s: tx done %u\n", > dev->name, idx); > @@ -1836,13 +1842,10 @@ static void sky2_tx_complete(struct sky2 > dev->stats.tx_bytes += skb->len; > > dev_kfree_skb_any(skb); > - > - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size); > } > } > > sky2->tx_cons = idx; > - smp_mb(); > > if (tx_avail(sky2) > MAX_SKB_TX_LE + 4) > netif_wake_queue(dev); > @@ -1870,6 +1873,21 @@ static void sky2_tx_reset(struct sky2_hw > sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET); > } > > +static void sky2_tx_clean(struct sky2_port *sky2) > +{ > + u16 idx; > + > + for (idx = 0; idx < sky2->tx_ring_size; idx++) { > + struct tx_ring_info *re = sky2->tx_ring + idx; > + > + sky2_tx_unmap(sky2->hw->pdev, re); > + if (re->skb) { > + dev_kfree_skb_any(re->skb); > + re->skb = NULL; > + } > + } > +} > + > /* Network shutdown */ > static int sky2_down(struct net_device *dev) > { > @@ -1933,8 +1951,7 @@ static int sky2_down(struct net_device * > sky2_tx_reset(hw, port); > > /* Free any pending frames stuck in HW queue */ > - sky2_tx_complete(sky2, sky2->tx_prod); > - > + sky2_tx_clean(sky2); > sky2_rx_clean(sky2); > > sky2_free_buffers(sky2); > @@ -2411,15 +2428,6 @@ error: > goto resubmit; > } > > -/* Transmit complete */ > -static inline void sky2_tx_done(struct net_device *dev, u16 last) > -{ > - struct sky2_port *sky2 = netdev_priv(dev); > - > - if (netif_running(dev)) > - sky2_tx_complete(sky2, last); > -} > - > static inline void sky2_skb_rx(const struct sky2_port *sky2, > u32 status, struct sk_buff *skb) > { > @@ -4201,7 +4209,7 @@ static int sky2_debug_show(struct seq_fi > > /* Dump contents of tx ring */ > sop = 1; > - for (idx = sky2->tx_next; idx != sky2->tx_prod && idx < sky2->tx_ring_size; > + for (idx = sky2->tx_cons; idx != sky2->tx_prod && idx < sky2->tx_ring_size; > idx = RING_NEXT(idx, sky2->tx_ring_size)) { > const struct sky2_tx_le *le = sky2->tx_le + idx; > u32 a = le32_to_cpu(le->addr); > --- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800 > +++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800 > @@ -2187,7 +2187,6 @@ struct sky2_port { > u16 tx_ring_size; > u16 tx_cons; /* next le to check */ > u16 tx_prod; /* next le to use */ > - u16 tx_next; /* debug only */ > > u16 tx_pending; > u16 tx_last_mss; ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning 2010-01-12 18:04 ` Jarek Poplawski @ 2010-01-12 18:13 ` Stephen Hemminger 2010-01-12 18:24 ` Jarek Poplawski 0 siblings, 1 reply; 59+ messages in thread From: Stephen Hemminger @ 2010-01-12 18:13 UTC (permalink / raw) To: Jarek Poplawski; +Cc: David Miller, mikem, flyboy, rjw, netdev, mbreuer On Tue, 12 Jan 2010 19:04:30 +0100 Jarek Poplawski <jarkao2@gmail.com> wrote: > On Tue, Jan 12, 2010 at 08:15:13AM -0800, Stephen Hemminger wrote: > > This code makes transmit path and transmit reset safer by: > > * adding memory barrier before checking available ring slots > > * reseting state of tx ring elements after free > > * seperate cleanup function from ring done function > > * removing mostly unused tx_next element > > Does this patch prevent re-enabling tx after netif_device_detach(), > e.g. when sky2_detach() and sky2_tx_done() run at the same time on > different cpus? > Yes. The napi is disabled during the detach so transmit completion can not be done during that period. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning 2010-01-12 18:13 ` Stephen Hemminger @ 2010-01-12 18:24 ` Jarek Poplawski 2010-01-12 18:49 ` [PATCH] sky2: safer transmit ring cleaning (v2) Stephen Hemminger 0 siblings, 1 reply; 59+ messages in thread From: Jarek Poplawski @ 2010-01-12 18:24 UTC (permalink / raw) To: Stephen Hemminger; +Cc: David Miller, mikem, flyboy, rjw, netdev, mbreuer On Tue, Jan 12, 2010 at 10:13:06AM -0800, Stephen Hemminger wrote: > On Tue, 12 Jan 2010 19:04:30 +0100 > Jarek Poplawski <jarkao2@gmail.com> wrote: > > > On Tue, Jan 12, 2010 at 08:15:13AM -0800, Stephen Hemminger wrote: > > > This code makes transmit path and transmit reset safer by: > > > * adding memory barrier before checking available ring slots > > > * reseting state of tx ring elements after free > > > * seperate cleanup function from ring done function > > > * removing mostly unused tx_next element > > > > Does this patch prevent re-enabling tx after netif_device_detach(), > > e.g. when sky2_detach() and sky2_tx_done() run at the same time on > > different cpus? > > > > Yes. > The napi is disabled during the detach so transmit completion can > not be done during that period. Could you point me where exactly the napi is disabled, probably I missed this? Jarek P. ^ permalink raw reply [flat|nested] 59+ messages in thread
* [PATCH] sky2: safer transmit ring cleaning (v2) 2010-01-12 18:24 ` Jarek Poplawski @ 2010-01-12 18:49 ` Stephen Hemminger 2010-01-12 19:16 ` Jarek Poplawski 2010-01-12 19:34 ` Michael Breuer 0 siblings, 2 replies; 59+ messages in thread From: Stephen Hemminger @ 2010-01-12 18:49 UTC (permalink / raw) To: Jarek Poplawski, David Miller; +Cc: mikem, flyboy, rjw, netdev, mbreuer This code makes transmit path and transmit reset safer by: * adding memory barrier before checking available ring slots * reseting state of tx ring elements after free * seperate cleanup function from ring done function * removing mostly unused tx_next element Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> --- What is supposed to happen: * restart sky2_restart calls napi_disable while cleaning * dev_close we can't call napi_disable() because of two ports sharing same NAPI, so napi_synchronize() is used to make sure that any NAPI running on other CPU has completed. * if status is reported by chip for down device, then tx_done should ignore But, last patch was missing last step --- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800 +++ b/drivers/net/sky2.c 2010-01-12 10:44:56.068391575 -0800 @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct /* Number of list elements available for next tx */ static inline int tx_avail(const struct sky2_port *sky2) { + /* Makes sure update of tx_prod from start_xmit and + tx_cons from tx_done are seen. */ + smp_mb(); return sky2->tx_pending - tx_inuse(sky2); } @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s return count; } -static void sky2_tx_unmap(struct pci_dev *pdev, - const struct tx_ring_info *re) +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re) { if (re->flags & TX_MAP_SINGLE) pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr), @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr), pci_unmap_len(re, maplen), PCI_DMA_TODEVICE); + re->flags = 0; } /* @@ -1804,7 +1807,8 @@ mapping_error: } /* - * Free ring elements from starting at tx_cons until "done" + * Transmit complete processing + * Free ring elements from starting at tx_cons until done index * * NB: * 1. The hardware will tell us about partial completion of multi-part @@ -1813,11 +1817,14 @@ mapping_error: * looks at the tail of the queue of FIFO (tx_cons), not * the head (tx_prod) */ -static void sky2_tx_complete(struct sky2_port *sky2, u16 done) +static void sky2_tx_done(struct net_device *dev, u16 done) { - struct net_device *dev = sky2->netdev; + struct sky2_port *sky2 = netdev_priv(dev); unsigned idx; + if (unlikely(!netif_running(dev))) + return; + BUG_ON(done >= sky2->tx_ring_size); for (idx = sky2->tx_cons; idx != done; @@ -1828,6 +1835,8 @@ static void sky2_tx_complete(struct sky2 sky2_tx_unmap(sky2->hw->pdev, re); if (skb) { + re->skb = NULL; + if (unlikely(netif_msg_tx_done(sky2))) printk(KERN_DEBUG "%s: tx done %u\n", dev->name, idx); @@ -1836,13 +1845,10 @@ static void sky2_tx_complete(struct sky2 dev->stats.tx_bytes += skb->len; dev_kfree_skb_any(skb); - - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size); } } sky2->tx_cons = idx; - smp_mb(); if (tx_avail(sky2) > MAX_SKB_TX_LE + 4) netif_wake_queue(dev); @@ -1870,6 +1876,21 @@ static void sky2_tx_reset(struct sky2_hw sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET); } +static void sky2_tx_clean(struct sky2_port *sky2) +{ + u16 idx; + + for (idx = 0; idx < sky2->tx_ring_size; idx++) { + struct tx_ring_info *re = sky2->tx_ring + idx; + + sky2_tx_unmap(sky2->hw->pdev, re); + if (re->skb) { + dev_kfree_skb_any(re->skb); + re->skb = NULL; + } + } +} + /* Network shutdown */ static int sky2_down(struct net_device *dev) { @@ -1933,8 +1954,7 @@ static int sky2_down(struct net_device * sky2_tx_reset(hw, port); /* Free any pending frames stuck in HW queue */ - sky2_tx_complete(sky2, sky2->tx_prod); - + sky2_tx_clean(sky2); sky2_rx_clean(sky2); sky2_free_buffers(sky2); @@ -2411,15 +2431,6 @@ error: goto resubmit; } -/* Transmit complete */ -static inline void sky2_tx_done(struct net_device *dev, u16 last) -{ - struct sky2_port *sky2 = netdev_priv(dev); - - if (netif_running(dev)) - sky2_tx_complete(sky2, last); -} - static inline void sky2_skb_rx(const struct sky2_port *sky2, u32 status, struct sk_buff *skb) { @@ -4201,7 +4212,7 @@ static int sky2_debug_show(struct seq_fi /* Dump contents of tx ring */ sop = 1; - for (idx = sky2->tx_next; idx != sky2->tx_prod && idx < sky2->tx_ring_size; + for (idx = sky2->tx_cons; idx != sky2->tx_prod && idx < sky2->tx_ring_size; idx = RING_NEXT(idx, sky2->tx_ring_size)) { const struct sky2_tx_le *le = sky2->tx_le + idx; u32 a = le32_to_cpu(le->addr); --- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800 +++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800 @@ -2187,7 +2187,6 @@ struct sky2_port { u16 tx_ring_size; u16 tx_cons; /* next le to check */ u16 tx_prod; /* next le to use */ - u16 tx_next; /* debug only */ u16 tx_pending; u16 tx_last_mss; ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v2) 2010-01-12 18:49 ` [PATCH] sky2: safer transmit ring cleaning (v2) Stephen Hemminger @ 2010-01-12 19:16 ` Jarek Poplawski 2010-01-12 19:23 ` Stephen Hemminger 2010-01-12 19:34 ` Michael Breuer 1 sibling, 1 reply; 59+ messages in thread From: Jarek Poplawski @ 2010-01-12 19:16 UTC (permalink / raw) To: Stephen Hemminger; +Cc: David Miller, mikem, flyboy, rjw, netdev, mbreuer On Tue, Jan 12, 2010 at 10:49:45AM -0800, Stephen Hemminger wrote: > > > This code makes transmit path and transmit reset safer by: > * adding memory barrier before checking available ring slots > * reseting state of tx ring elements after free > * seperate cleanup function from ring done function > * removing mostly unused tx_next element > > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> > > --- > > What is supposed to happen: > * restart sky2_restart calls napi_disable while cleaning Yes, but it's after the detach; similarly to sky2_suspend(). (I'm not sure how safe vs such re-enabling is sky2_set_ringparam().) > * dev_close we can't call napi_disable() because of two ports > sharing same NAPI, so napi_synchronize() is used to make sure that > any NAPI running on other CPU has completed. So it seems still endangered. > * if status is reported by chip for down device, then tx_done should ignore > > But, last patch was missing last step > > --- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800 > +++ b/drivers/net/sky2.c 2010-01-12 10:44:56.068391575 -0800 > @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct > /* Number of list elements available for next tx */ > static inline int tx_avail(const struct sky2_port *sky2) > { > + /* Makes sure update of tx_prod from start_xmit and > + tx_cons from tx_done are seen. */ > + smp_mb(); > return sky2->tx_pending - tx_inuse(sky2); > } > > @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s > return count; > } > > -static void sky2_tx_unmap(struct pci_dev *pdev, > - const struct tx_ring_info *re) > +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re) > { > if (re->flags & TX_MAP_SINGLE) > pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr), > @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev > pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr), > pci_unmap_len(re, maplen), > PCI_DMA_TODEVICE); > + re->flags = 0; > } > > /* > @@ -1804,7 +1807,8 @@ mapping_error: > } > > /* > - * Free ring elements from starting at tx_cons until "done" > + * Transmit complete processing > + * Free ring elements from starting at tx_cons until done index > * > * NB: > * 1. The hardware will tell us about partial completion of multi-part > @@ -1813,11 +1817,14 @@ mapping_error: > * looks at the tail of the queue of FIFO (tx_cons), not > * the head (tx_prod) > */ > -static void sky2_tx_complete(struct sky2_port *sky2, u16 done) > +static void sky2_tx_done(struct net_device *dev, u16 done) > { > - struct net_device *dev = sky2->netdev; > + struct sky2_port *sky2 = netdev_priv(dev); > unsigned idx; > > + if (unlikely(!netif_running(dev))) > + return; > + > BUG_ON(done >= sky2->tx_ring_size); > > for (idx = sky2->tx_cons; idx != done; > @@ -1828,6 +1835,8 @@ static void sky2_tx_complete(struct sky2 > sky2_tx_unmap(sky2->hw->pdev, re); > > if (skb) { > + re->skb = NULL; > + > if (unlikely(netif_msg_tx_done(sky2))) > printk(KERN_DEBUG "%s: tx done %u\n", > dev->name, idx); > @@ -1836,13 +1845,10 @@ static void sky2_tx_complete(struct sky2 > dev->stats.tx_bytes += skb->len; > > dev_kfree_skb_any(skb); > - > - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size); > } > } > > sky2->tx_cons = idx; > - smp_mb(); > > if (tx_avail(sky2) > MAX_SKB_TX_LE + 4) > netif_wake_queue(dev); > @@ -1870,6 +1876,21 @@ static void sky2_tx_reset(struct sky2_hw > sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET); > } > > +static void sky2_tx_clean(struct sky2_port *sky2) > +{ > + u16 idx; > + > + for (idx = 0; idx < sky2->tx_ring_size; idx++) { > + struct tx_ring_info *re = sky2->tx_ring + idx; > + > + sky2_tx_unmap(sky2->hw->pdev, re); > + if (re->skb) { > + dev_kfree_skb_any(re->skb); > + re->skb = NULL; > + } > + } > +} > + > /* Network shutdown */ > static int sky2_down(struct net_device *dev) > { > @@ -1933,8 +1954,7 @@ static int sky2_down(struct net_device * > sky2_tx_reset(hw, port); > > /* Free any pending frames stuck in HW queue */ > - sky2_tx_complete(sky2, sky2->tx_prod); > - > + sky2_tx_clean(sky2); > sky2_rx_clean(sky2); > > sky2_free_buffers(sky2); > @@ -2411,15 +2431,6 @@ error: > goto resubmit; > } > > -/* Transmit complete */ > -static inline void sky2_tx_done(struct net_device *dev, u16 last) > -{ > - struct sky2_port *sky2 = netdev_priv(dev); > - > - if (netif_running(dev)) > - sky2_tx_complete(sky2, last); > -} > - > static inline void sky2_skb_rx(const struct sky2_port *sky2, > u32 status, struct sk_buff *skb) > { > @@ -4201,7 +4212,7 @@ static int sky2_debug_show(struct seq_fi > > /* Dump contents of tx ring */ > sop = 1; > - for (idx = sky2->tx_next; idx != sky2->tx_prod && idx < sky2->tx_ring_size; > + for (idx = sky2->tx_cons; idx != sky2->tx_prod && idx < sky2->tx_ring_size; > idx = RING_NEXT(idx, sky2->tx_ring_size)) { > const struct sky2_tx_le *le = sky2->tx_le + idx; > u32 a = le32_to_cpu(le->addr); > --- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800 > +++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800 > @@ -2187,7 +2187,6 @@ struct sky2_port { > u16 tx_ring_size; > u16 tx_cons; /* next le to check */ > u16 tx_prod; /* next le to use */ > - u16 tx_next; /* debug only */ > > u16 tx_pending; > u16 tx_last_mss; ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v2) 2010-01-12 19:16 ` Jarek Poplawski @ 2010-01-12 19:23 ` Stephen Hemminger 2010-01-12 19:50 ` Jarek Poplawski 0 siblings, 1 reply; 59+ messages in thread From: Stephen Hemminger @ 2010-01-12 19:23 UTC (permalink / raw) To: Jarek Poplawski; +Cc: David Miller, mikem, flyboy, rjw, netdev, mbreuer On Tue, 12 Jan 2010 20:16:11 +0100 Jarek Poplawski <jarkao2@gmail.com> wrote: > > > > What is supposed to happen: > > * restart sky2_restart calls napi_disable while cleaning > > Yes, but it's after the detach; similarly to sky2_suspend(). > (I'm not sure how safe vs such re-enabling is sky2_set_ringparam(). set_ringparam happens under rtnl_lock() so reset and ringparams can't conflict. > > > * dev_close we can't call napi_disable() because of two ports > > sharing same NAPI, so napi_synchronize() is used to make sure that > > any NAPI running on other CPU has completed. > > So it seems still endangered. It was but not in revised v2 patch. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v2) 2010-01-12 19:23 ` Stephen Hemminger @ 2010-01-12 19:50 ` Jarek Poplawski 2010-01-13 1:23 ` Stephen Hemminger 0 siblings, 1 reply; 59+ messages in thread From: Jarek Poplawski @ 2010-01-12 19:50 UTC (permalink / raw) To: Stephen Hemminger; +Cc: David Miller, mikem, flyboy, rjw, netdev, mbreuer On Tue, Jan 12, 2010 at 11:23:14AM -0800, Stephen Hemminger wrote: > On Tue, 12 Jan 2010 20:16:11 +0100 > Jarek Poplawski <jarkao2@gmail.com> wrote: > > > > > > > What is supposed to happen: > > > * restart sky2_restart calls napi_disable while cleaning > > > > Yes, but it's after the detach; similarly to sky2_suspend(). > > (I'm not sure how safe vs such re-enabling is sky2_set_ringparam(). > > set_ringparam happens under rtnl_lock() so reset and ringparams can't > conflict. I didn't mean reset. I meant tx (dev_queue_xmit()) during ringparams. > > > > > > * dev_close we can't call napi_disable() because of two ports > > > sharing same NAPI, so napi_synchronize() is used to make sure that > > > any NAPI running on other CPU has completed. > > > > So it seems still endangered. > > It was but not in revised v2 patch. Sorry, I meant sky2_down() (except in dev_close()). Jarek P. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v2) 2010-01-12 19:50 ` Jarek Poplawski @ 2010-01-13 1:23 ` Stephen Hemminger 0 siblings, 0 replies; 59+ messages in thread From: Stephen Hemminger @ 2010-01-13 1:23 UTC (permalink / raw) To: Jarek Poplawski; +Cc: David Miller, mikem, flyboy, rjw, netdev, mbreuer On Tue, 12 Jan 2010 20:50:04 +0100 Jarek Poplawski <jarkao2@gmail.com> wrote: > On Tue, Jan 12, 2010 at 11:23:14AM -0800, Stephen Hemminger wrote: > > On Tue, 12 Jan 2010 20:16:11 +0100 > > Jarek Poplawski <jarkao2@gmail.com> wrote: > > > > > > > > > > What is supposed to happen: > > > > * restart sky2_restart calls napi_disable while cleaning > > > > > > Yes, but it's after the detach; similarly to sky2_suspend(). > > > (I'm not sure how safe vs such re-enabling is sky2_set_ringparam(). > > > > set_ringparam happens under rtnl_lock() so reset and ringparams can't > > conflict. > > I didn't mean reset. I meant tx (dev_queue_xmit()) during ringparams. sky2_detach does tx_lock netif_detach() -- stops the queue tx_unlock So if another CPU was about to send, it will have to wait behind the tx_global_lock, and then it will see the queue as frozen. BUT, you prod me to look harder and there is a race with softirq (bottom half) here that needs to be fixed. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v2) 2010-01-12 18:49 ` [PATCH] sky2: safer transmit ring cleaning (v2) Stephen Hemminger 2010-01-12 19:16 ` Jarek Poplawski @ 2010-01-12 19:34 ` Michael Breuer 1 sibling, 0 replies; 59+ messages in thread From: Michael Breuer @ 2010-01-12 19:34 UTC (permalink / raw) To: Stephen Hemminger Cc: Jarek Poplawski, David Miller, mikem, flyboy, rjw, netdev On 1/12/2010 1:49 PM, Stephen Hemminger wrote: > > This code makes transmit path and transmit reset safer by: > * adding memory barrier before checking available ring slots > * reseting state of tx ring elements after free > * seperate cleanup function from ring done function > * removing mostly unused tx_next element > > Signed-off-by: Stephen Hemminger<shemminger@vyatta.com> > > --- > > What is supposed to happen: > * restart sky2_restart calls napi_disable while cleaning > * dev_close we can't call napi_disable() because of two ports > sharing same NAPI, so napi_synchronize() is used to make sure that > any NAPI running on other CPU has completed. > * if status is reported by chip for down device, then tx_done should ignore > > But, last patch was missing last step > > --- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800 > +++ b/drivers/net/sky2.c 2010-01-12 10:44:56.068391575 -0800 > @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct > /* Number of list elements available for next tx */ > static inline int tx_avail(const struct sky2_port *sky2) > { > + /* Makes sure update of tx_prod from start_xmit and > + tx_cons from tx_done are seen. */ > + smp_mb(); > return sky2->tx_pending - tx_inuse(sky2); > } > > @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s > return count; > } > > -static void sky2_tx_unmap(struct pci_dev *pdev, > - const struct tx_ring_info *re) > +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re) > { > if (re->flags& TX_MAP_SINGLE) > pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr), > @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev > pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr), > pci_unmap_len(re, maplen), > PCI_DMA_TODEVICE); > + re->flags = 0; > } > > /* > @@ -1804,7 +1807,8 @@ mapping_error: > } > > /* > - * Free ring elements from starting at tx_cons until "done" > + * Transmit complete processing > + * Free ring elements from starting at tx_cons until done index > * > * NB: > * 1. The hardware will tell us about partial completion of multi-part > @@ -1813,11 +1817,14 @@ mapping_error: > * looks at the tail of the queue of FIFO (tx_cons), not > * the head (tx_prod) > */ > -static void sky2_tx_complete(struct sky2_port *sky2, u16 done) > +static void sky2_tx_done(struct net_device *dev, u16 done) > { > - struct net_device *dev = sky2->netdev; > + struct sky2_port *sky2 = netdev_priv(dev); > unsigned idx; > > + if (unlikely(!netif_running(dev))) > + return; > + > BUG_ON(done>= sky2->tx_ring_size); > > for (idx = sky2->tx_cons; idx != done; > @@ -1828,6 +1835,8 @@ static void sky2_tx_complete(struct sky2 > sky2_tx_unmap(sky2->hw->pdev, re); > > if (skb) { > + re->skb = NULL; > + > if (unlikely(netif_msg_tx_done(sky2))) > printk(KERN_DEBUG "%s: tx done %u\n", > dev->name, idx); > @@ -1836,13 +1845,10 @@ static void sky2_tx_complete(struct sky2 > dev->stats.tx_bytes += skb->len; > > dev_kfree_skb_any(skb); > - > - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size); > } > } > > sky2->tx_cons = idx; > - smp_mb(); > > if (tx_avail(sky2)> MAX_SKB_TX_LE + 4) > netif_wake_queue(dev); > @@ -1870,6 +1876,21 @@ static void sky2_tx_reset(struct sky2_hw > sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET); > } > > +static void sky2_tx_clean(struct sky2_port *sky2) > +{ > + u16 idx; > + > + for (idx = 0; idx< sky2->tx_ring_size; idx++) { > + struct tx_ring_info *re = sky2->tx_ring + idx; > + > + sky2_tx_unmap(sky2->hw->pdev, re); > + if (re->skb) { > + dev_kfree_skb_any(re->skb); > + re->skb = NULL; > + } > + } > +} > + > /* Network shutdown */ > static int sky2_down(struct net_device *dev) > { > @@ -1933,8 +1954,7 @@ static int sky2_down(struct net_device * > sky2_tx_reset(hw, port); > > /* Free any pending frames stuck in HW queue */ > - sky2_tx_complete(sky2, sky2->tx_prod); > - > + sky2_tx_clean(sky2); > sky2_rx_clean(sky2); > > sky2_free_buffers(sky2); > @@ -2411,15 +2431,6 @@ error: > goto resubmit; > } > > -/* Transmit complete */ > -static inline void sky2_tx_done(struct net_device *dev, u16 last) > -{ > - struct sky2_port *sky2 = netdev_priv(dev); > - > - if (netif_running(dev)) > - sky2_tx_complete(sky2, last); > -} > - > static inline void sky2_skb_rx(const struct sky2_port *sky2, > u32 status, struct sk_buff *skb) > { > @@ -4201,7 +4212,7 @@ static int sky2_debug_show(struct seq_fi > > /* Dump contents of tx ring */ > sop = 1; > - for (idx = sky2->tx_next; idx != sky2->tx_prod&& idx< sky2->tx_ring_size; > + for (idx = sky2->tx_cons; idx != sky2->tx_prod&& idx< sky2->tx_ring_size; > idx = RING_NEXT(idx, sky2->tx_ring_size)) { > const struct sky2_tx_le *le = sky2->tx_le + idx; > u32 a = le32_to_cpu(le->addr); > --- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800 > +++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800 > @@ -2187,7 +2187,6 @@ struct sky2_port { > u16 tx_ring_size; > u16 tx_cons; /* next le to check */ > u16 tx_prod; /* next le to use */ > - u16 tx_next; /* debug only */ > > u16 tx_pending; > u16 tx_last_mss; > Testing observation: This makes no sense to me, but the DHCP multiple request/inform issue I noted with V1 of this patch is gone with V2 (this is a repeatable test). I don't see how this patch could make the difference as the device should never be down during this test. What am I missing? ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning 2010-01-12 16:15 ` [PATCH] sky2: safer transmit ring cleaning Stephen Hemminger 2010-01-12 16:32 ` Michael Breuer 2010-01-12 18:04 ` Jarek Poplawski @ 2010-01-12 18:35 ` Michael Breuer 2010-01-12 18:42 ` Michael Breuer 2 siblings, 1 reply; 59+ messages in thread From: Michael Breuer @ 2010-01-12 18:35 UTC (permalink / raw) To: Stephen Hemminger; +Cc: David Miller, jarkao2, mikem, flyboy, rjw, netdev On 1/12/2010 11:15 AM, Stephen Hemminger wrote: > This code makes transmit path and transmit reset safer by: > * adding memory barrier before checking available ring slots > * reseting state of tx ring elements after free > * seperate cleanup function from ring done function > * removing mostly unused tx_next element > > Signed-off-by: Stephen Hemminger<shemminger@vyatta.com> > > --- > Please apply this instead of the various bits and pieces flying > around labeled as sky2 panic under load > > > --- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800 > +++ b/drivers/net/sky2.c 2010-01-11 17:36:22.027429875 -0800 > @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct > /* Number of list elements available for next tx */ > static inline int tx_avail(const struct sky2_port *sky2) > { > + /* Makes sure update of tx_prod from start_xmit and > + tx_cons from tx_done are seen. */ > + smp_mb(); > return sky2->tx_pending - tx_inuse(sky2); > } > > @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s > return count; > } > > -static void sky2_tx_unmap(struct pci_dev *pdev, > - const struct tx_ring_info *re) > +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re) > { > if (re->flags& TX_MAP_SINGLE) > pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr), > @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev > pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr), > pci_unmap_len(re, maplen), > PCI_DMA_TODEVICE); > + re->flags = 0; > } > > /* > @@ -1804,7 +1807,8 @@ mapping_error: > } > > /* > - * Free ring elements from starting at tx_cons until "done" > + * Transmit complete processing > + * Free ring elements from starting at tx_cons until done index > * > * NB: > * 1. The hardware will tell us about partial completion of multi-part > @@ -1813,9 +1817,9 @@ mapping_error: > * looks at the tail of the queue of FIFO (tx_cons), not > * the head (tx_prod) > */ > -static void sky2_tx_complete(struct sky2_port *sky2, u16 done) > +static void sky2_tx_done(struct net_device *dev, u16 done) > { > - struct net_device *dev = sky2->netdev; > + struct sky2_port *sky2 = netdev_priv(dev); > unsigned idx; > > BUG_ON(done>= sky2->tx_ring_size); > @@ -1828,6 +1832,8 @@ static void sky2_tx_complete(struct sky2 > sky2_tx_unmap(sky2->hw->pdev, re); > > if (skb) { > + re->skb = NULL; > + > if (unlikely(netif_msg_tx_done(sky2))) > printk(KERN_DEBUG "%s: tx done %u\n", > dev->name, idx); > @@ -1836,13 +1842,10 @@ static void sky2_tx_complete(struct sky2 > dev->stats.tx_bytes += skb->len; > > dev_kfree_skb_any(skb); > - > - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size); > } > } > > sky2->tx_cons = idx; > - smp_mb(); > > if (tx_avail(sky2)> MAX_SKB_TX_LE + 4) > netif_wake_queue(dev); > @@ -1870,6 +1873,21 @@ static void sky2_tx_reset(struct sky2_hw > sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET); > } > > +static void sky2_tx_clean(struct sky2_port *sky2) > +{ > + u16 idx; > + > + for (idx = 0; idx< sky2->tx_ring_size; idx++) { > + struct tx_ring_info *re = sky2->tx_ring + idx; > + > + sky2_tx_unmap(sky2->hw->pdev, re); > + if (re->skb) { > + dev_kfree_skb_any(re->skb); > + re->skb = NULL; > + } > + } > +} > + > /* Network shutdown */ > static int sky2_down(struct net_device *dev) > { > @@ -1933,8 +1951,7 @@ static int sky2_down(struct net_device * > sky2_tx_reset(hw, port); > > /* Free any pending frames stuck in HW queue */ > - sky2_tx_complete(sky2, sky2->tx_prod); > - > + sky2_tx_clean(sky2); > sky2_rx_clean(sky2); > > sky2_free_buffers(sky2); > @@ -2411,15 +2428,6 @@ error: > goto resubmit; > } > > -/* Transmit complete */ > -static inline void sky2_tx_done(struct net_device *dev, u16 last) > -{ > - struct sky2_port *sky2 = netdev_priv(dev); > - > - if (netif_running(dev)) > - sky2_tx_complete(sky2, last); > -} > - > static inline void sky2_skb_rx(const struct sky2_port *sky2, > u32 status, struct sk_buff *skb) > { > @@ -4201,7 +4209,7 @@ static int sky2_debug_show(struct seq_fi > > /* Dump contents of tx ring */ > sop = 1; > - for (idx = sky2->tx_next; idx != sky2->tx_prod&& idx< sky2->tx_ring_size; > + for (idx = sky2->tx_cons; idx != sky2->tx_prod&& idx< sky2->tx_ring_size; > idx = RING_NEXT(idx, sky2->tx_ring_size)) { > const struct sky2_tx_le *le = sky2->tx_le + idx; > u32 a = le32_to_cpu(le->addr); > --- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800 > +++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800 > @@ -2187,7 +2187,6 @@ struct sky2_port { > u16 tx_ring_size; > u16 tx_cons; /* next le to check */ > u16 tx_prod; /* next le to use */ > - u16 tx_next; /* debug only */ > > u16 tx_pending; > u16 tx_last_mss; > Test observations: 1. DHCP request/response sequence having some issues... can't confirm that it's a result of this patch, but I don't see this with the prior patch. Prior to this patch, if I connect a new device (Blackberry in this case) I see DHCPDISCOVER;DHCPOFFER;DHCPREQUEST;DHCPACK (just the four messages). With this patch I'm seeing repeated attempts - i.e., DISCOVER/OFFER 6 times before the REQUEST/ACK. This is repeatable and happening whether or not under load. As my original problem started with DHCP packets, this seems interesting. I don't see any errors logged (dmesg, messages, etc.). 2. Probably not related to this patch, but perhaps to the driver - I've finally tracked the source of my dropped RX packets. It's happening when a sending data to a CIFS client on a MacOS system. The RX drop rate seems proportional to the TX rate for SMB to that client - at a tx rate of about 200Kb/s I see about 20 dropped RX packets/sec - at 400 I see about 40. I'm thinking therefore it's ACKs being dropped on RX. Why? no idea (yet). Nothing reported by ethtool or netstat -s remotely correlates to the number of dropped RX packets. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning 2010-01-12 18:35 ` [PATCH] sky2: safer transmit ring cleaning Michael Breuer @ 2010-01-12 18:42 ` Michael Breuer 2010-01-12 20:31 ` Michael Breuer 0 siblings, 1 reply; 59+ messages in thread From: Michael Breuer @ 2010-01-12 18:42 UTC (permalink / raw) To: Stephen Hemminger; +Cc: David Miller, jarkao2, mikem, flyboy, rjw, netdev On 1/12/2010 1:35 PM, Michael Breuer wrote: > On 1/12/2010 11:15 AM, Stephen Hemminger wrote: >> This code makes transmit path and transmit reset safer by: >> * adding memory barrier before checking available ring slots >> * reseting state of tx ring elements after free >> * seperate cleanup function from ring done function >> * removing mostly unused tx_next element >> >> Signed-off-by: Stephen Hemminger<shemminger@vyatta.com> >> >> --- >> Please apply this instead of the various bits and pieces flying >> around labeled as sky2 panic under load >> >> >> --- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800 >> +++ b/drivers/net/sky2.c 2010-01-11 17:36:22.027429875 -0800 >> @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct >> /* Number of list elements available for next tx */ >> static inline int tx_avail(const struct sky2_port *sky2) >> { >> + /* Makes sure update of tx_prod from start_xmit and >> + tx_cons from tx_done are seen. */ >> + smp_mb(); >> return sky2->tx_pending - tx_inuse(sky2); >> } >> >> @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s >> return count; >> } >> >> -static void sky2_tx_unmap(struct pci_dev *pdev, >> - const struct tx_ring_info *re) >> +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info >> *re) >> { >> if (re->flags& TX_MAP_SINGLE) >> pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr), >> @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev >> pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr), >> pci_unmap_len(re, maplen), >> PCI_DMA_TODEVICE); >> + re->flags = 0; >> } >> >> /* >> @@ -1804,7 +1807,8 @@ mapping_error: >> } >> >> /* >> - * Free ring elements from starting at tx_cons until "done" >> + * Transmit complete processing >> + * Free ring elements from starting at tx_cons until done index >> * >> * NB: >> * 1. The hardware will tell us about partial completion of >> multi-part >> @@ -1813,9 +1817,9 @@ mapping_error: >> * looks at the tail of the queue of FIFO (tx_cons), not >> * the head (tx_prod) >> */ >> -static void sky2_tx_complete(struct sky2_port *sky2, u16 done) >> +static void sky2_tx_done(struct net_device *dev, u16 done) >> { >> - struct net_device *dev = sky2->netdev; >> + struct sky2_port *sky2 = netdev_priv(dev); >> unsigned idx; >> >> BUG_ON(done>= sky2->tx_ring_size); >> @@ -1828,6 +1832,8 @@ static void sky2_tx_complete(struct sky2 >> sky2_tx_unmap(sky2->hw->pdev, re); >> >> if (skb) { >> + re->skb = NULL; >> + >> if (unlikely(netif_msg_tx_done(sky2))) >> printk(KERN_DEBUG "%s: tx done %u\n", >> dev->name, idx); >> @@ -1836,13 +1842,10 @@ static void sky2_tx_complete(struct sky2 >> dev->stats.tx_bytes += skb->len; >> >> dev_kfree_skb_any(skb); >> - >> - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size); >> } >> } >> >> sky2->tx_cons = idx; >> - smp_mb(); >> >> if (tx_avail(sky2)> MAX_SKB_TX_LE + 4) >> netif_wake_queue(dev); >> @@ -1870,6 +1873,21 @@ static void sky2_tx_reset(struct sky2_hw >> sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET); >> } >> >> +static void sky2_tx_clean(struct sky2_port *sky2) >> +{ >> + u16 idx; >> + >> + for (idx = 0; idx< sky2->tx_ring_size; idx++) { >> + struct tx_ring_info *re = sky2->tx_ring + idx; >> + >> + sky2_tx_unmap(sky2->hw->pdev, re); >> + if (re->skb) { >> + dev_kfree_skb_any(re->skb); >> + re->skb = NULL; >> + } >> + } >> +} >> + >> /* Network shutdown */ >> static int sky2_down(struct net_device *dev) >> { >> @@ -1933,8 +1951,7 @@ static int sky2_down(struct net_device * >> sky2_tx_reset(hw, port); >> >> /* Free any pending frames stuck in HW queue */ >> - sky2_tx_complete(sky2, sky2->tx_prod); >> - >> + sky2_tx_clean(sky2); >> sky2_rx_clean(sky2); >> >> sky2_free_buffers(sky2); >> @@ -2411,15 +2428,6 @@ error: >> goto resubmit; >> } >> >> -/* Transmit complete */ >> -static inline void sky2_tx_done(struct net_device *dev, u16 last) >> -{ >> - struct sky2_port *sky2 = netdev_priv(dev); >> - >> - if (netif_running(dev)) >> - sky2_tx_complete(sky2, last); >> -} >> - >> static inline void sky2_skb_rx(const struct sky2_port *sky2, >> u32 status, struct sk_buff *skb) >> { >> @@ -4201,7 +4209,7 @@ static int sky2_debug_show(struct seq_fi >> >> /* Dump contents of tx ring */ >> sop = 1; >> - for (idx = sky2->tx_next; idx != sky2->tx_prod&& idx< >> sky2->tx_ring_size; >> + for (idx = sky2->tx_cons; idx != sky2->tx_prod&& idx< >> sky2->tx_ring_size; >> idx = RING_NEXT(idx, sky2->tx_ring_size)) { >> const struct sky2_tx_le *le = sky2->tx_le + idx; >> u32 a = le32_to_cpu(le->addr); >> --- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800 >> +++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800 >> @@ -2187,7 +2187,6 @@ struct sky2_port { >> u16 tx_ring_size; >> u16 tx_cons; /* next le to check */ >> u16 tx_prod; /* next le to use */ >> - u16 tx_next; /* debug only */ >> >> u16 tx_pending; >> u16 tx_last_mss; > Test observations: > > 1. DHCP request/response sequence having some issues... can't confirm > that it's a result of this patch, but I don't see this with the prior > patch. Prior to this patch, if I connect a new device (Blackberry in > this case) I see DHCPDISCOVER;DHCPOFFER;DHCPREQUEST;DHCPACK (just the > four messages). With this patch I'm seeing repeated attempts - i.e., > DISCOVER/OFFER 6 times before the REQUEST/ACK. This is repeatable and > happening whether or not under load. As my original problem started > with DHCP packets, this seems interesting. I don't see any errors > logged (dmesg, messages, etc.). > 2. Probably not related to this patch, but perhaps to the driver - > I've finally tracked the source of my dropped RX packets. It's > happening when a sending data to a CIFS client on a MacOS system. The > RX drop rate seems proportional to the TX rate for SMB to that client > - at a tx rate of about 200Kb/s I see about 20 dropped RX packets/sec > - at 400 I see about 40. I'm thinking therefore it's ACKs being > dropped on RX. Why? no idea (yet). Nothing reported by ethtool or > netstat -s remotely correlates to the number of dropped RX packets. > > Let me add: the CIFS client from which the packets are dropped is connected via a dd-wrt router (on wifi) connected to the sky2 1G port. A Windows client connected directly to the 1G port does not exhibit the same symptoms (. I'll try later the Mac directly connected & a Wintel box over wifi if possible. The DD-WRT router (linksys wrt54g-tm) is bridged (wifi clients on same subnet as wired & serviced by DHCPD running on the linux box). This is also the source of the aforementioned perhaps-flaky DHCP connections. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning 2010-01-12 18:42 ` Michael Breuer @ 2010-01-12 20:31 ` Michael Breuer 2010-01-13 4:10 ` [PATCH] sky2: safer transmit ring cleaning (v3) Stephen Hemminger 0 siblings, 1 reply; 59+ messages in thread From: Michael Breuer @ 2010-01-12 20:31 UTC (permalink / raw) To: Stephen Hemminger; +Cc: David Miller, jarkao2, mikem, flyboy, rjw, netdev On 1/12/2010 1:42 PM, Michael Breuer wrote: > On 1/12/2010 1:35 PM, Michael Breuer wrote: >> On 1/12/2010 11:15 AM, Stephen Hemminger wrote: >>> This code makes transmit path and transmit reset safer by: >>> * adding memory barrier before checking available ring slots >>> * reseting state of tx ring elements after free >>> * seperate cleanup function from ring done function >>> * removing mostly unused tx_next element >>> >>> Signed-off-by: Stephen Hemminger<shemminger@vyatta.com> >>> >>> --- >>> Please apply this instead of the various bits and pieces flying >>> around labeled as sky2 panic under load >>> >>> >>> --- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800 >>> +++ b/drivers/net/sky2.c 2010-01-11 17:36:22.027429875 -0800 >>> @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct >>> /* Number of list elements available for next tx */ >>> static inline int tx_avail(const struct sky2_port *sky2) >>> { >>> + /* Makes sure update of tx_prod from start_xmit and >>> + tx_cons from tx_done are seen. */ >>> + smp_mb(); >>> return sky2->tx_pending - tx_inuse(sky2); >>> } >>> >>> @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s >>> return count; >>> } >>> >>> -static void sky2_tx_unmap(struct pci_dev *pdev, >>> - const struct tx_ring_info *re) >>> +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info >>> *re) >>> { >>> if (re->flags& TX_MAP_SINGLE) >>> pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr), >>> @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev >>> pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr), >>> pci_unmap_len(re, maplen), >>> PCI_DMA_TODEVICE); >>> + re->flags = 0; >>> } >>> >>> /* >>> @@ -1804,7 +1807,8 @@ mapping_error: >>> } >>> >>> /* >>> - * Free ring elements from starting at tx_cons until "done" >>> + * Transmit complete processing >>> + * Free ring elements from starting at tx_cons until done index >>> * >>> * NB: >>> * 1. The hardware will tell us about partial completion of >>> multi-part >>> @@ -1813,9 +1817,9 @@ mapping_error: >>> * looks at the tail of the queue of FIFO (tx_cons), not >>> * the head (tx_prod) >>> */ >>> -static void sky2_tx_complete(struct sky2_port *sky2, u16 done) >>> +static void sky2_tx_done(struct net_device *dev, u16 done) >>> { >>> - struct net_device *dev = sky2->netdev; >>> + struct sky2_port *sky2 = netdev_priv(dev); >>> unsigned idx; >>> >>> BUG_ON(done>= sky2->tx_ring_size); >>> @@ -1828,6 +1832,8 @@ static void sky2_tx_complete(struct sky2 >>> sky2_tx_unmap(sky2->hw->pdev, re); >>> >>> if (skb) { >>> + re->skb = NULL; >>> + >>> if (unlikely(netif_msg_tx_done(sky2))) >>> printk(KERN_DEBUG "%s: tx done %u\n", >>> dev->name, idx); >>> @@ -1836,13 +1842,10 @@ static void sky2_tx_complete(struct sky2 >>> dev->stats.tx_bytes += skb->len; >>> >>> dev_kfree_skb_any(skb); >>> - >>> - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size); >>> } >>> } >>> >>> sky2->tx_cons = idx; >>> - smp_mb(); >>> >>> if (tx_avail(sky2)> MAX_SKB_TX_LE + 4) >>> netif_wake_queue(dev); >>> @@ -1870,6 +1873,21 @@ static void sky2_tx_reset(struct sky2_hw >>> sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET); >>> } >>> >>> +static void sky2_tx_clean(struct sky2_port *sky2) >>> +{ >>> + u16 idx; >>> + >>> + for (idx = 0; idx< sky2->tx_ring_size; idx++) { >>> + struct tx_ring_info *re = sky2->tx_ring + idx; >>> + >>> + sky2_tx_unmap(sky2->hw->pdev, re); >>> + if (re->skb) { >>> + dev_kfree_skb_any(re->skb); >>> + re->skb = NULL; >>> + } >>> + } >>> +} >>> + >>> /* Network shutdown */ >>> static int sky2_down(struct net_device *dev) >>> { >>> @@ -1933,8 +1951,7 @@ static int sky2_down(struct net_device * >>> sky2_tx_reset(hw, port); >>> >>> /* Free any pending frames stuck in HW queue */ >>> - sky2_tx_complete(sky2, sky2->tx_prod); >>> - >>> + sky2_tx_clean(sky2); >>> sky2_rx_clean(sky2); >>> >>> sky2_free_buffers(sky2); >>> @@ -2411,15 +2428,6 @@ error: >>> goto resubmit; >>> } >>> >>> -/* Transmit complete */ >>> -static inline void sky2_tx_done(struct net_device *dev, u16 last) >>> -{ >>> - struct sky2_port *sky2 = netdev_priv(dev); >>> - >>> - if (netif_running(dev)) >>> - sky2_tx_complete(sky2, last); >>> -} >>> - >>> static inline void sky2_skb_rx(const struct sky2_port *sky2, >>> u32 status, struct sk_buff *skb) >>> { >>> @@ -4201,7 +4209,7 @@ static int sky2_debug_show(struct seq_fi >>> >>> /* Dump contents of tx ring */ >>> sop = 1; >>> - for (idx = sky2->tx_next; idx != sky2->tx_prod&& idx< >>> sky2->tx_ring_size; >>> + for (idx = sky2->tx_cons; idx != sky2->tx_prod&& idx< >>> sky2->tx_ring_size; >>> idx = RING_NEXT(idx, sky2->tx_ring_size)) { >>> const struct sky2_tx_le *le = sky2->tx_le + idx; >>> u32 a = le32_to_cpu(le->addr); >>> --- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800 >>> +++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800 >>> @@ -2187,7 +2187,6 @@ struct sky2_port { >>> u16 tx_ring_size; >>> u16 tx_cons; /* next le to check */ >>> u16 tx_prod; /* next le to use */ >>> - u16 tx_next; /* debug only */ >>> >>> u16 tx_pending; >>> u16 tx_last_mss; >> Test observations: >> >> 1. DHCP request/response sequence having some issues... can't confirm >> that it's a result of this patch, but I don't see this with the prior >> patch. Prior to this patch, if I connect a new device (Blackberry in >> this case) I see DHCPDISCOVER;DHCPOFFER;DHCPREQUEST;DHCPACK (just the >> four messages). With this patch I'm seeing repeated attempts - i.e., >> DISCOVER/OFFER 6 times before the REQUEST/ACK. This is repeatable >> and happening whether or not under load. As my original problem >> started with DHCP packets, this seems interesting. I don't see any >> errors logged (dmesg, messages, etc.). >> 2. Probably not related to this patch, but perhaps to the driver - >> I've finally tracked the source of my dropped RX packets. It's >> happening when a sending data to a CIFS client on a MacOS system. The >> RX drop rate seems proportional to the TX rate for SMB to that client >> - at a tx rate of about 200Kb/s I see about 20 dropped RX packets/sec >> - at 400 I see about 40. I'm thinking therefore it's ACKs being >> dropped on RX. Why? no idea (yet). Nothing reported by ethtool or >> netstat -s remotely correlates to the number of dropped RX packets. >> >> > Let me add: the CIFS client from which the packets are dropped is > connected via a dd-wrt router (on wifi) connected to the sky2 1G port. > A Windows client connected directly to the 1G port does not exhibit > the same symptoms (. I'll try later the Mac directly connected & a > Wintel box over wifi if possible. The DD-WRT router (linksys > wrt54g-tm) is bridged (wifi clients on same subnet as wired & serviced > by DHCPD running on the linux box). This is also the source of the > aforementioned perhaps-flaky DHCP connections. Ok - a little more on the dropped packets - only happens when connected through the wifi router - but happens using a wired connection as well (via the router's ethernet port). From sniffer traces: I see large frames outbound (even though MTU=1500). I see the fragmented result arriving on the remote side. I see ACK's for each of the fragmented frames outbound from the Mac, but not received on the Linux side. I also don't see any retransmissions or duplicate ACKS on either sniffer trace. I'm wondering whether there is fragmentation offload to the Yukon2, and whether the individual fragment acks are what's showing up dropped. If so, I don't understand why it would only happen with the wifi router vs. switch in the middle. Maybe the router is doing something to the packets which is causing the Yukon to forward the acks up differently? The router MTU is also 1500 on all ports, and does not show dropped packets or errors corresponding to what I'm seeing on the sky2 adapter. ^ permalink raw reply [flat|nested] 59+ messages in thread
* [PATCH] sky2: safer transmit ring cleaning (v3) 2010-01-12 20:31 ` Michael Breuer @ 2010-01-13 4:10 ` Stephen Hemminger 2010-01-13 4:31 ` Michael Breuer ` (2 more replies) 0 siblings, 3 replies; 59+ messages in thread From: Stephen Hemminger @ 2010-01-13 4:10 UTC (permalink / raw) To: Michael Breuer, David Miller; +Cc: jarkao2, mikem, flyboy, rjw, netdev Subject: sky2: safer transmit cleanup This code makes transmit path and transmit reset safer by: * adding memory barrier before checking available ring slots * reseting state of tx ring elements after free * seperate cleanup function from ring done function * removing mostly unused tx_next element Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> --- This version adds missing _bh to sky2_detach --- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800 +++ b/drivers/net/sky2.c 2010-01-12 17:21:58.415268802 -0800 @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct /* Number of list elements available for next tx */ static inline int tx_avail(const struct sky2_port *sky2) { + /* Makes sure update of tx_prod from start_xmit and + tx_cons from tx_done are seen. */ + smp_mb(); return sky2->tx_pending - tx_inuse(sky2); } @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s return count; } -static void sky2_tx_unmap(struct pci_dev *pdev, - const struct tx_ring_info *re) +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re) { if (re->flags & TX_MAP_SINGLE) pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr), @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr), pci_unmap_len(re, maplen), PCI_DMA_TODEVICE); + re->flags = 0; } /* @@ -1804,7 +1807,8 @@ mapping_error: } /* - * Free ring elements from starting at tx_cons until "done" + * Transmit complete processing + * Free ring elements from starting at tx_cons until done index * * NB: * 1. The hardware will tell us about partial completion of multi-part @@ -1813,11 +1817,14 @@ mapping_error: * looks at the tail of the queue of FIFO (tx_cons), not * the head (tx_prod) */ -static void sky2_tx_complete(struct sky2_port *sky2, u16 done) +static void sky2_tx_done(struct net_device *dev, u16 done) { - struct net_device *dev = sky2->netdev; + struct sky2_port *sky2 = netdev_priv(dev); unsigned idx; + if (unlikely(!netif_running(dev))) + return; + BUG_ON(done >= sky2->tx_ring_size); for (idx = sky2->tx_cons; idx != done; @@ -1828,6 +1835,8 @@ static void sky2_tx_complete(struct sky2 sky2_tx_unmap(sky2->hw->pdev, re); if (skb) { + re->skb = NULL; + if (unlikely(netif_msg_tx_done(sky2))) printk(KERN_DEBUG "%s: tx done %u\n", dev->name, idx); @@ -1836,13 +1845,10 @@ static void sky2_tx_complete(struct sky2 dev->stats.tx_bytes += skb->len; dev_kfree_skb_any(skb); - - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size); } } sky2->tx_cons = idx; - smp_mb(); if (tx_avail(sky2) > MAX_SKB_TX_LE + 4) netif_wake_queue(dev); @@ -1870,6 +1876,21 @@ static void sky2_tx_reset(struct sky2_hw sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET); } +static void sky2_tx_clean(struct sky2_port *sky2) +{ + u16 idx; + + for (idx = 0; idx < sky2->tx_ring_size; idx++) { + struct tx_ring_info *re = sky2->tx_ring + idx; + + sky2_tx_unmap(sky2->hw->pdev, re); + if (re->skb) { + dev_kfree_skb_any(re->skb); + re->skb = NULL; + } + } +} + /* Network shutdown */ static int sky2_down(struct net_device *dev) { @@ -1933,8 +1954,7 @@ static int sky2_down(struct net_device * sky2_tx_reset(hw, port); /* Free any pending frames stuck in HW queue */ - sky2_tx_complete(sky2, sky2->tx_prod); - + sky2_tx_clean(sky2); sky2_rx_clean(sky2); sky2_free_buffers(sky2); @@ -2411,15 +2431,6 @@ error: goto resubmit; } -/* Transmit complete */ -static inline void sky2_tx_done(struct net_device *dev, u16 last) -{ - struct sky2_port *sky2 = netdev_priv(dev); - - if (netif_running(dev)) - sky2_tx_complete(sky2, last); -} - static inline void sky2_skb_rx(const struct sky2_port *sky2, u32 status, struct sk_buff *skb) { @@ -3176,9 +3187,9 @@ static void sky2_reset(struct sky2_hw *h static void sky2_detach(struct net_device *dev) { if (netif_running(dev)) { - netif_tx_lock(dev); + netif_tx_lock_bh(dev); netif_device_detach(dev); /* stop txq */ - netif_tx_unlock(dev); + netif_tx_unlock_bh(dev); sky2_down(dev); } } @@ -4201,7 +4212,7 @@ static int sky2_debug_show(struct seq_fi /* Dump contents of tx ring */ sop = 1; - for (idx = sky2->tx_next; idx != sky2->tx_prod && idx < sky2->tx_ring_size; + for (idx = sky2->tx_cons; idx != sky2->tx_prod && idx < sky2->tx_ring_size; idx = RING_NEXT(idx, sky2->tx_ring_size)) { const struct sky2_tx_le *le = sky2->tx_le + idx; u32 a = le32_to_cpu(le->addr); --- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800 +++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800 @@ -2187,7 +2187,6 @@ struct sky2_port { u16 tx_ring_size; u16 tx_cons; /* next le to check */ u16 tx_prod; /* next le to use */ - u16 tx_next; /* debug only */ u16 tx_pending; u16 tx_last_mss; ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v3) 2010-01-13 4:10 ` [PATCH] sky2: safer transmit ring cleaning (v3) Stephen Hemminger @ 2010-01-13 4:31 ` Michael Breuer 2010-01-13 7:35 ` Jarek Poplawski 2010-01-13 16:04 ` Michael Breuer 2 siblings, 0 replies; 59+ messages in thread From: Michael Breuer @ 2010-01-13 4:31 UTC (permalink / raw) To: Stephen Hemminger; +Cc: David Miller, jarkao2, mikem, flyboy, rjw, netdev On 01/12/2010 11:10 PM, Stephen Hemminger wrote: > Subject: sky2: safer transmit cleanup > > This code makes transmit path and transmit reset safer by: > * adding memory barrier before checking available ring slots > * reseting state of tx ring elements after free > * seperate cleanup function from ring done function > * removing mostly unused tx_next element > > Signed-off-by: Stephen Hemminger<shemminger@vyatta.com> > > --- > This version adds missing _bh to sky2_detach > > > --- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800 > +++ b/drivers/net/sky2.c 2010-01-12 17:21:58.415268802 -0800 > @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct > /* Number of list elements available for next tx */ > static inline int tx_avail(const struct sky2_port *sky2) > { > + /* Makes sure update of tx_prod from start_xmit and > + tx_cons from tx_done are seen. */ > + smp_mb(); > return sky2->tx_pending - tx_inuse(sky2); > } > > @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s > return count; > } > > -static void sky2_tx_unmap(struct pci_dev *pdev, > - const struct tx_ring_info *re) > +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re) > { > if (re->flags& TX_MAP_SINGLE) > pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr), > @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev > pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr), > pci_unmap_len(re, maplen), > PCI_DMA_TODEVICE); > + re->flags = 0; > } > > /* > @@ -1804,7 +1807,8 @@ mapping_error: > } > > /* > - * Free ring elements from starting at tx_cons until "done" > + * Transmit complete processing > + * Free ring elements from starting at tx_cons until done index > * > * NB: > * 1. The hardware will tell us about partial completion of multi-part > @@ -1813,11 +1817,14 @@ mapping_error: > * looks at the tail of the queue of FIFO (tx_cons), not > * the head (tx_prod) > */ > -static void sky2_tx_complete(struct sky2_port *sky2, u16 done) > +static void sky2_tx_done(struct net_device *dev, u16 done) > { > - struct net_device *dev = sky2->netdev; > + struct sky2_port *sky2 = netdev_priv(dev); > unsigned idx; > > + if (unlikely(!netif_running(dev))) > + return; > + > BUG_ON(done>= sky2->tx_ring_size); > > for (idx = sky2->tx_cons; idx != done; > @@ -1828,6 +1835,8 @@ static void sky2_tx_complete(struct sky2 > sky2_tx_unmap(sky2->hw->pdev, re); > > if (skb) { > + re->skb = NULL; > + > if (unlikely(netif_msg_tx_done(sky2))) > printk(KERN_DEBUG "%s: tx done %u\n", > dev->name, idx); > @@ -1836,13 +1845,10 @@ static void sky2_tx_complete(struct sky2 > dev->stats.tx_bytes += skb->len; > > dev_kfree_skb_any(skb); > - > - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size); > } > } > > sky2->tx_cons = idx; > - smp_mb(); > > if (tx_avail(sky2)> MAX_SKB_TX_LE + 4) > netif_wake_queue(dev); > @@ -1870,6 +1876,21 @@ static void sky2_tx_reset(struct sky2_hw > sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET); > } > > +static void sky2_tx_clean(struct sky2_port *sky2) > +{ > + u16 idx; > + > + for (idx = 0; idx< sky2->tx_ring_size; idx++) { > + struct tx_ring_info *re = sky2->tx_ring + idx; > + > + sky2_tx_unmap(sky2->hw->pdev, re); > + if (re->skb) { > + dev_kfree_skb_any(re->skb); > + re->skb = NULL; > + } > + } > +} > + > /* Network shutdown */ > static int sky2_down(struct net_device *dev) > { > @@ -1933,8 +1954,7 @@ static int sky2_down(struct net_device * > sky2_tx_reset(hw, port); > > /* Free any pending frames stuck in HW queue */ > - sky2_tx_complete(sky2, sky2->tx_prod); > - > + sky2_tx_clean(sky2); > sky2_rx_clean(sky2); > > sky2_free_buffers(sky2); > @@ -2411,15 +2431,6 @@ error: > goto resubmit; > } > > -/* Transmit complete */ > -static inline void sky2_tx_done(struct net_device *dev, u16 last) > -{ > - struct sky2_port *sky2 = netdev_priv(dev); > - > - if (netif_running(dev)) > - sky2_tx_complete(sky2, last); > -} > - > static inline void sky2_skb_rx(const struct sky2_port *sky2, > u32 status, struct sk_buff *skb) > { > @@ -3176,9 +3187,9 @@ static void sky2_reset(struct sky2_hw *h > static void sky2_detach(struct net_device *dev) > { > if (netif_running(dev)) { > - netif_tx_lock(dev); > + netif_tx_lock_bh(dev); > netif_device_detach(dev); /* stop txq */ > - netif_tx_unlock(dev); > + netif_tx_unlock_bh(dev); > sky2_down(dev); > } > } > @@ -4201,7 +4212,7 @@ static int sky2_debug_show(struct seq_fi > > /* Dump contents of tx ring */ > sop = 1; > - for (idx = sky2->tx_next; idx != sky2->tx_prod&& idx< sky2->tx_ring_size; > + for (idx = sky2->tx_cons; idx != sky2->tx_prod&& idx< sky2->tx_ring_size; > idx = RING_NEXT(idx, sky2->tx_ring_size)) { > const struct sky2_tx_le *le = sky2->tx_le + idx; > u32 a = le32_to_cpu(le->addr); > --- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800 > +++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800 > @@ -2187,7 +2187,6 @@ struct sky2_port { > u16 tx_ring_size; > u16 tx_cons; /* next le to check */ > u16 tx_prod; /* next le to use */ > - u16 tx_next; /* debug only */ > > u16 tx_pending; > u16 tx_last_mss; > Will test later - just an FYI - hunk 11 depends on an earlier patch which added the netif_tx_lock(dev). I don't think that was committed. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v3) 2010-01-13 4:10 ` [PATCH] sky2: safer transmit ring cleaning (v3) Stephen Hemminger 2010-01-13 4:31 ` Michael Breuer @ 2010-01-13 7:35 ` Jarek Poplawski 2010-01-13 16:04 ` Michael Breuer 2 siblings, 0 replies; 59+ messages in thread From: Jarek Poplawski @ 2010-01-13 7:35 UTC (permalink / raw) To: Stephen Hemminger Cc: Michael Breuer, David Miller, mikem, flyboy, rjw, netdev On Tue, Jan 12, 2010 at 08:10:12PM -0800, Stephen Hemminger wrote: > Subject: sky2: safer transmit cleanup > > This code makes transmit path and transmit reset safer by: > * adding memory barrier before checking available ring slots > * reseting state of tx ring elements after free > * seperate cleanup function from ring done function > * removing mostly unused tx_next element I agree this patch makes it safer, but still not really safe, at least compared with the "sky2: Fix oops in sky2_xmit_frame() after TX timeout" patch. Btw, I'm not sure adding netif_running() test to protect dev_close() is really needed, because it's after dev_deactivate(), so re-enabled tx doesn't probably matter. > > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> > > --- > This version adds missing _bh to sky2_detach As I wrote before, I doubt adding tx lock alone at this place is the proper fix for anything (the rx (napi) race is the real problem), but without this _bh it's a bug. Jarek P. > > > --- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800 > +++ b/drivers/net/sky2.c 2010-01-12 17:21:58.415268802 -0800 > @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct > /* Number of list elements available for next tx */ > static inline int tx_avail(const struct sky2_port *sky2) > { > + /* Makes sure update of tx_prod from start_xmit and > + tx_cons from tx_done are seen. */ > + smp_mb(); > return sky2->tx_pending - tx_inuse(sky2); > } > > @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s > return count; > } > > -static void sky2_tx_unmap(struct pci_dev *pdev, > - const struct tx_ring_info *re) > +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re) > { > if (re->flags & TX_MAP_SINGLE) > pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr), > @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev > pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr), > pci_unmap_len(re, maplen), > PCI_DMA_TODEVICE); > + re->flags = 0; > } > > /* > @@ -1804,7 +1807,8 @@ mapping_error: > } > > /* > - * Free ring elements from starting at tx_cons until "done" > + * Transmit complete processing > + * Free ring elements from starting at tx_cons until done index > * > * NB: > * 1. The hardware will tell us about partial completion of multi-part > @@ -1813,11 +1817,14 @@ mapping_error: > * looks at the tail of the queue of FIFO (tx_cons), not > * the head (tx_prod) > */ > -static void sky2_tx_complete(struct sky2_port *sky2, u16 done) > +static void sky2_tx_done(struct net_device *dev, u16 done) > { > - struct net_device *dev = sky2->netdev; > + struct sky2_port *sky2 = netdev_priv(dev); > unsigned idx; > > + if (unlikely(!netif_running(dev))) > + return; > + > BUG_ON(done >= sky2->tx_ring_size); > > for (idx = sky2->tx_cons; idx != done; > @@ -1828,6 +1835,8 @@ static void sky2_tx_complete(struct sky2 > sky2_tx_unmap(sky2->hw->pdev, re); > > if (skb) { > + re->skb = NULL; > + > if (unlikely(netif_msg_tx_done(sky2))) > printk(KERN_DEBUG "%s: tx done %u\n", > dev->name, idx); > @@ -1836,13 +1845,10 @@ static void sky2_tx_complete(struct sky2 > dev->stats.tx_bytes += skb->len; > > dev_kfree_skb_any(skb); > - > - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size); > } > } > > sky2->tx_cons = idx; > - smp_mb(); > > if (tx_avail(sky2) > MAX_SKB_TX_LE + 4) > netif_wake_queue(dev); > @@ -1870,6 +1876,21 @@ static void sky2_tx_reset(struct sky2_hw > sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET); > } > > +static void sky2_tx_clean(struct sky2_port *sky2) > +{ > + u16 idx; > + > + for (idx = 0; idx < sky2->tx_ring_size; idx++) { > + struct tx_ring_info *re = sky2->tx_ring + idx; > + > + sky2_tx_unmap(sky2->hw->pdev, re); > + if (re->skb) { > + dev_kfree_skb_any(re->skb); > + re->skb = NULL; > + } > + } > +} > + > /* Network shutdown */ > static int sky2_down(struct net_device *dev) > { > @@ -1933,8 +1954,7 @@ static int sky2_down(struct net_device * > sky2_tx_reset(hw, port); > > /* Free any pending frames stuck in HW queue */ > - sky2_tx_complete(sky2, sky2->tx_prod); > - > + sky2_tx_clean(sky2); > sky2_rx_clean(sky2); > > sky2_free_buffers(sky2); > @@ -2411,15 +2431,6 @@ error: > goto resubmit; > } > > -/* Transmit complete */ > -static inline void sky2_tx_done(struct net_device *dev, u16 last) > -{ > - struct sky2_port *sky2 = netdev_priv(dev); > - > - if (netif_running(dev)) > - sky2_tx_complete(sky2, last); > -} > - > static inline void sky2_skb_rx(const struct sky2_port *sky2, > u32 status, struct sk_buff *skb) > { > @@ -3176,9 +3187,9 @@ static void sky2_reset(struct sky2_hw *h > static void sky2_detach(struct net_device *dev) > { > if (netif_running(dev)) { > - netif_tx_lock(dev); > + netif_tx_lock_bh(dev); > netif_device_detach(dev); /* stop txq */ > - netif_tx_unlock(dev); > + netif_tx_unlock_bh(dev); > sky2_down(dev); > } > } > @@ -4201,7 +4212,7 @@ static int sky2_debug_show(struct seq_fi > > /* Dump contents of tx ring */ > sop = 1; > - for (idx = sky2->tx_next; idx != sky2->tx_prod && idx < sky2->tx_ring_size; > + for (idx = sky2->tx_cons; idx != sky2->tx_prod && idx < sky2->tx_ring_size; > idx = RING_NEXT(idx, sky2->tx_ring_size)) { > const struct sky2_tx_le *le = sky2->tx_le + idx; > u32 a = le32_to_cpu(le->addr); > --- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800 > +++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800 > @@ -2187,7 +2187,6 @@ struct sky2_port { > u16 tx_ring_size; > u16 tx_cons; /* next le to check */ > u16 tx_prod; /* next le to use */ > - u16 tx_next; /* debug only */ > > u16 tx_pending; > u16 tx_last_mss; ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v3) 2010-01-13 4:10 ` [PATCH] sky2: safer transmit ring cleaning (v3) Stephen Hemminger 2010-01-13 4:31 ` Michael Breuer 2010-01-13 7:35 ` Jarek Poplawski @ 2010-01-13 16:04 ` Michael Breuer 2010-01-14 3:41 ` [PATCH] sky2: safer transmit ring cleaning (v4) Stephen Hemminger 2 siblings, 1 reply; 59+ messages in thread From: Michael Breuer @ 2010-01-13 16:04 UTC (permalink / raw) To: Stephen Hemminger; +Cc: David Miller, jarkao2, mikem, flyboy, rjw, netdev On 1/12/2010 11:10 PM, Stephen Hemminger wrote: > Subject: sky2: safer transmit cleanup > > This code makes transmit path and transmit reset safer by: > * adding memory barrier before checking available ring slots > * reseting state of tx ring elements after free > * seperate cleanup function from ring done function > * removing mostly unused tx_next element > > Signed-off-by: Stephen Hemminger<shemminger@vyatta.com> > > --- > This version adds missing _bh to sky2_detach > > > --- a/drivers/net/sky2.c 2010-01-11 10:49:50.907113126 -0800 > +++ b/drivers/net/sky2.c 2010-01-12 17:21:58.415268802 -0800 > @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct > /* Number of list elements available for next tx */ > static inline int tx_avail(const struct sky2_port *sky2) > { > + /* Makes sure update of tx_prod from start_xmit and > + tx_cons from tx_done are seen. */ > + smp_mb(); > return sky2->tx_pending - tx_inuse(sky2); > } > > @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s > return count; > } > > -static void sky2_tx_unmap(struct pci_dev *pdev, > - const struct tx_ring_info *re) > +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re) > { > if (re->flags& TX_MAP_SINGLE) > pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr), > @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev > pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr), > pci_unmap_len(re, maplen), > PCI_DMA_TODEVICE); > + re->flags = 0; > } > > /* > @@ -1804,7 +1807,8 @@ mapping_error: > } > > /* > - * Free ring elements from starting at tx_cons until "done" > + * Transmit complete processing > + * Free ring elements from starting at tx_cons until done index > * > * NB: > * 1. The hardware will tell us about partial completion of multi-part > @@ -1813,11 +1817,14 @@ mapping_error: > * looks at the tail of the queue of FIFO (tx_cons), not > * the head (tx_prod) > */ > -static void sky2_tx_complete(struct sky2_port *sky2, u16 done) > +static void sky2_tx_done(struct net_device *dev, u16 done) > { > - struct net_device *dev = sky2->netdev; > + struct sky2_port *sky2 = netdev_priv(dev); > unsigned idx; > > + if (unlikely(!netif_running(dev))) > + return; > + > BUG_ON(done>= sky2->tx_ring_size); > > for (idx = sky2->tx_cons; idx != done; > @@ -1828,6 +1835,8 @@ static void sky2_tx_complete(struct sky2 > sky2_tx_unmap(sky2->hw->pdev, re); > > if (skb) { > + re->skb = NULL; > + > if (unlikely(netif_msg_tx_done(sky2))) > printk(KERN_DEBUG "%s: tx done %u\n", > dev->name, idx); > @@ -1836,13 +1845,10 @@ static void sky2_tx_complete(struct sky2 > dev->stats.tx_bytes += skb->len; > > dev_kfree_skb_any(skb); > - > - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size); > } > } > > sky2->tx_cons = idx; > - smp_mb(); > > if (tx_avail(sky2)> MAX_SKB_TX_LE + 4) > netif_wake_queue(dev); > @@ -1870,6 +1876,21 @@ static void sky2_tx_reset(struct sky2_hw > sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET); > } > > +static void sky2_tx_clean(struct sky2_port *sky2) > +{ > + u16 idx; > + > + for (idx = 0; idx< sky2->tx_ring_size; idx++) { > + struct tx_ring_info *re = sky2->tx_ring + idx; > + > + sky2_tx_unmap(sky2->hw->pdev, re); > + if (re->skb) { > + dev_kfree_skb_any(re->skb); > + re->skb = NULL; > + } > + } > +} > + > /* Network shutdown */ > static int sky2_down(struct net_device *dev) > { > @@ -1933,8 +1954,7 @@ static int sky2_down(struct net_device * > sky2_tx_reset(hw, port); > > /* Free any pending frames stuck in HW queue */ > - sky2_tx_complete(sky2, sky2->tx_prod); > - > + sky2_tx_clean(sky2); > sky2_rx_clean(sky2); > > sky2_free_buffers(sky2); > @@ -2411,15 +2431,6 @@ error: > goto resubmit; > } > > -/* Transmit complete */ > -static inline void sky2_tx_done(struct net_device *dev, u16 last) > -{ > - struct sky2_port *sky2 = netdev_priv(dev); > - > - if (netif_running(dev)) > - sky2_tx_complete(sky2, last); > -} > - > static inline void sky2_skb_rx(const struct sky2_port *sky2, > u32 status, struct sk_buff *skb) > { > @@ -3176,9 +3187,9 @@ static void sky2_reset(struct sky2_hw *h > static void sky2_detach(struct net_device *dev) > { > if (netif_running(dev)) { > - netif_tx_lock(dev); > + netif_tx_lock_bh(dev); > netif_device_detach(dev); /* stop txq */ > - netif_tx_unlock(dev); > + netif_tx_unlock_bh(dev); > sky2_down(dev); > } > } > @@ -4201,7 +4212,7 @@ static int sky2_debug_show(struct seq_fi > > /* Dump contents of tx ring */ > sop = 1; > - for (idx = sky2->tx_next; idx != sky2->tx_prod&& idx< sky2->tx_ring_size; > + for (idx = sky2->tx_cons; idx != sky2->tx_prod&& idx< sky2->tx_ring_size; > idx = RING_NEXT(idx, sky2->tx_ring_size)) { > const struct sky2_tx_le *le = sky2->tx_le + idx; > u32 a = le32_to_cpu(le->addr); > --- a/drivers/net/sky2.h 2010-01-11 17:29:22.817088617 -0800 > +++ b/drivers/net/sky2.h 2010-01-11 17:29:28.197120484 -0800 > @@ -2187,7 +2187,6 @@ struct sky2_port { > u16 tx_ring_size; > u16 tx_cons; /* next le to check */ > u16 tx_prod; /* next le to use */ > - u16 tx_next; /* debug only */ > > u16 tx_pending; > u16 tx_last_mss; > Not sure why, but with this version my system is running hot (literally - higher MB temp & fan speed). This is happening at low throughput. CPU utilization is low - no apparent change from prior versions. The only indication of something amiss is seen using powertop. With older versions, I never noticed (except under load) sky2 interrupt as the main source of system activity. With this version, I see: Top causes for wakeups: 22.9% (525.4) <kernel IPI> : Rescheduling interrupts 20.9% (480.0) <interrupt> : sky2@pci:0000:04:00.0 19.2% (439.2) <interrupt> : extra timer interrupt 18.9% (432.8) <interrupt> : sky2@pci:0000:06:00.0 10.4% (237.4) <kernel core> : hrtimer_start_range_ns (tick_sched_timer) 1.7% ( 38.8) <kernel core> : hrtimer_start (tick_sched_timer) This is pretty consistent, btw regardless of what's going on. 4:00 is the external (100M) interface. Total network activity while this is going on is about 70KB/sec - mostly internal. I'm using msi interrupt (or think I am, anyway). Also - I'm seeing what appears to be increased packet latency (not surprising) and slightly decreased throughput. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4) 2010-01-13 16:04 ` Michael Breuer @ 2010-01-14 3:41 ` Stephen Hemminger 2010-01-14 10:14 ` Jarek Poplawski 2010-01-14 15:46 ` [PATCH] sky2: safer transmit ring cleaning (v4) Michael Breuer 0 siblings, 2 replies; 59+ messages in thread From: Stephen Hemminger @ 2010-01-14 3:41 UTC (permalink / raw) To: Michael Breuer, David Miller; +Cc: jarkao2, mikem, flyboy, rjw, netdev Subject: sky2: safer transmit cleanup This code makes transmit path and transmit reset safer by: * adding memory barrier before checking available ring slots * reseting state of tx ring elements after free * seperate cleanup function from ring done function * removing mostly unused tx_next element * ignoring transmit completion if device is offline Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> --- This patch is against the current net-next-2.6 tree. This version handles the case of dual port shared transmit status and other cases where it is possible for tx_done to be called when device is being changed. --- a/drivers/net/sky2.c 2010-01-13 08:32:51.360161158 -0800 +++ b/drivers/net/sky2.c 2010-01-13 08:35:37.685531490 -0800 @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct /* Number of list elements available for next tx */ static inline int tx_avail(const struct sky2_port *sky2) { + /* Makes sure update of tx_prod from start_xmit and + tx_cons from tx_done are seen. */ + smp_mb(); return sky2->tx_pending - tx_inuse(sky2); } @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s return count; } -static void sky2_tx_unmap(struct pci_dev *pdev, - const struct tx_ring_info *re) +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re) { if (re->flags & TX_MAP_SINGLE) pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr), @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr), pci_unmap_len(re, maplen), PCI_DMA_TODEVICE); + re->flags = 0; } /* @@ -1804,7 +1807,8 @@ mapping_error: } /* - * Free ring elements from starting at tx_cons until "done" + * Transmit complete processing + * Free ring elements from starting at tx_cons until done index * * NB: * 1. The hardware will tell us about partial completion of multi-part @@ -1813,11 +1817,14 @@ mapping_error: * looks at the tail of the queue of FIFO (tx_cons), not * the head (tx_prod) */ -static void sky2_tx_complete(struct sky2_port *sky2, u16 done) +static void sky2_tx_done(struct net_device *dev, u16 done) { - struct net_device *dev = sky2->netdev; + struct sky2_port *sky2 = netdev_priv(dev); unsigned idx; + if (!(netif_running(dev) & netif_device_present(dev))) + return; + BUG_ON(done >= sky2->tx_ring_size); for (idx = sky2->tx_cons; idx != done; @@ -1828,6 +1835,8 @@ static void sky2_tx_complete(struct sky2 sky2_tx_unmap(sky2->hw->pdev, re); if (skb) { + re->skb = NULL; + if (unlikely(netif_msg_tx_done(sky2))) printk(KERN_DEBUG "%s: tx done %u\n", dev->name, idx); @@ -1836,16 +1845,12 @@ static void sky2_tx_complete(struct sky2 dev->stats.tx_bytes += skb->len; dev_kfree_skb_any(skb); - - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size); } } sky2->tx_cons = idx; - smp_mb(); - /* Wake unless it's detached, and called e.g. from sky2_down() */ - if (tx_avail(sky2) > MAX_SKB_TX_LE + 4 && netif_device_present(dev)) + if (tx_avail(sky2) > MAX_SKB_TX_LE + 4) netif_wake_queue(dev); } @@ -1871,6 +1876,21 @@ static void sky2_tx_reset(struct sky2_hw sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET); } +static void sky2_tx_clean(struct sky2_port *sky2) +{ + u16 idx; + + for (idx = 0; idx < sky2->tx_ring_size; idx++) { + struct tx_ring_info *re = sky2->tx_ring + idx; + + sky2_tx_unmap(sky2->hw->pdev, re); + if (re->skb) { + dev_kfree_skb_any(re->skb); + re->skb = NULL; + } + } +} + /* Network shutdown */ static int sky2_down(struct net_device *dev) { @@ -1934,8 +1954,7 @@ static int sky2_down(struct net_device * sky2_tx_reset(hw, port); /* Free any pending frames stuck in HW queue */ - sky2_tx_complete(sky2, sky2->tx_prod); - + sky2_tx_clean(sky2); sky2_rx_clean(sky2); sky2_free_buffers(sky2); @@ -2412,15 +2431,6 @@ error: goto resubmit; } -/* Transmit complete */ -static inline void sky2_tx_done(struct net_device *dev, u16 last) -{ - struct sky2_port *sky2 = netdev_priv(dev); - - if (netif_running(dev)) - sky2_tx_complete(sky2, last); -} - static inline void sky2_skb_rx(const struct sky2_port *sky2, u32 status, struct sk_buff *skb) { @@ -3177,9 +3187,9 @@ static void sky2_reset(struct sky2_hw *h static void sky2_detach(struct net_device *dev) { if (netif_running(dev)) { - netif_tx_lock(dev); + netif_tx_lock_bh(dev); netif_device_detach(dev); /* stop txq */ - netif_tx_unlock(dev); + netif_tx_unlock_bh(dev); sky2_down(dev); } } @@ -4202,7 +4212,7 @@ static int sky2_debug_show(struct seq_fi /* Dump contents of tx ring */ sop = 1; - for (idx = sky2->tx_next; idx != sky2->tx_prod && idx < sky2->tx_ring_size; + for (idx = sky2->tx_cons; idx != sky2->tx_prod && idx < sky2->tx_ring_size; idx = RING_NEXT(idx, sky2->tx_ring_size)) { const struct sky2_tx_le *le = sky2->tx_le + idx; u32 a = le32_to_cpu(le->addr); --- a/drivers/net/sky2.h 2010-01-13 08:32:27.919849429 -0800 +++ b/drivers/net/sky2.h 2010-01-13 08:33:03.410162026 -0800 @@ -2187,7 +2187,6 @@ struct sky2_port { u16 tx_ring_size; u16 tx_cons; /* next le to check */ u16 tx_prod; /* next le to use */ - u16 tx_next; /* debug only */ u16 tx_pending; u16 tx_last_mss; ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4) 2010-01-14 3:41 ` [PATCH] sky2: safer transmit ring cleaning (v4) Stephen Hemminger @ 2010-01-14 10:14 ` Jarek Poplawski 2010-01-14 11:16 ` Jarek Poplawski 2010-01-14 17:52 ` Stephen Hemminger 2010-01-14 15:46 ` [PATCH] sky2: safer transmit ring cleaning (v4) Michael Breuer 1 sibling, 2 replies; 59+ messages in thread From: Jarek Poplawski @ 2010-01-14 10:14 UTC (permalink / raw) To: Stephen Hemminger Cc: Michael Breuer, David Miller, mikem, flyboy, rjw, netdev On Wed, Jan 13, 2010 at 07:41:48PM -0800, Stephen Hemminger wrote: > Subject: sky2: safer transmit cleanup > > This code makes transmit path and transmit reset safer by: > * adding memory barrier before checking available ring slots > * reseting state of tx ring elements after free > * seperate cleanup function from ring done function > * removing mostly unused tx_next element > * ignoring transmit completion if device is offline > > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> > > --- > This patch is against the current net-next-2.6 tree. > This version handles the case of dual port shared transmit status > and other cases where it is possible for tx_done to be called when > device is being changed. > > --- a/drivers/net/sky2.c 2010-01-13 08:32:51.360161158 -0800 > +++ b/drivers/net/sky2.c 2010-01-13 08:35:37.685531490 -0800 > @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct > /* Number of list elements available for next tx */ > static inline int tx_avail(const struct sky2_port *sky2) > { > + /* Makes sure update of tx_prod from start_xmit and > + tx_cons from tx_done are seen. */ > + smp_mb(); > return sky2->tx_pending - tx_inuse(sky2); > } > > @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s > return count; > } > > -static void sky2_tx_unmap(struct pci_dev *pdev, > - const struct tx_ring_info *re) > +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re) > { > if (re->flags & TX_MAP_SINGLE) > pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr), > @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev > pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr), > pci_unmap_len(re, maplen), > PCI_DMA_TODEVICE); > + re->flags = 0; > } > > /* > @@ -1804,7 +1807,8 @@ mapping_error: > } > > /* > - * Free ring elements from starting at tx_cons until "done" > + * Transmit complete processing > + * Free ring elements from starting at tx_cons until done index > * > * NB: > * 1. The hardware will tell us about partial completion of multi-part > @@ -1813,11 +1817,14 @@ mapping_error: > * looks at the tail of the queue of FIFO (tx_cons), not > * the head (tx_prod) > */ > -static void sky2_tx_complete(struct sky2_port *sky2, u16 done) > +static void sky2_tx_done(struct net_device *dev, u16 done) > { > - struct net_device *dev = sky2->netdev; > + struct sky2_port *sky2 = netdev_priv(dev); > unsigned idx; > > + if (!(netif_running(dev) & netif_device_present(dev))) This makes it safe, but it still resembles the "short term fix" according do David's opinion. This change seems to affect dev->stats too. Since they are not updated in sky2_tx_clean(). Btw, I hope "&" is some optimization because it's less readable than "&&". Jarek P. > + return; > + > BUG_ON(done >= sky2->tx_ring_size); > > for (idx = sky2->tx_cons; idx != done; > @@ -1828,6 +1835,8 @@ static void sky2_tx_complete(struct sky2 > sky2_tx_unmap(sky2->hw->pdev, re); > > if (skb) { > + re->skb = NULL; > + > if (unlikely(netif_msg_tx_done(sky2))) > printk(KERN_DEBUG "%s: tx done %u\n", > dev->name, idx); > @@ -1836,16 +1845,12 @@ static void sky2_tx_complete(struct sky2 > dev->stats.tx_bytes += skb->len; > > dev_kfree_skb_any(skb); > - > - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size); > } > } > > sky2->tx_cons = idx; > - smp_mb(); > > - /* Wake unless it's detached, and called e.g. from sky2_down() */ > - if (tx_avail(sky2) > MAX_SKB_TX_LE + 4 && netif_device_present(dev)) > + if (tx_avail(sky2) > MAX_SKB_TX_LE + 4) > netif_wake_queue(dev); > } > > @@ -1871,6 +1876,21 @@ static void sky2_tx_reset(struct sky2_hw > sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET); > } > > +static void sky2_tx_clean(struct sky2_port *sky2) > +{ > + u16 idx; > + > + for (idx = 0; idx < sky2->tx_ring_size; idx++) { > + struct tx_ring_info *re = sky2->tx_ring + idx; > + > + sky2_tx_unmap(sky2->hw->pdev, re); > + if (re->skb) { > + dev_kfree_skb_any(re->skb); > + re->skb = NULL; > + } > + } > +} > + > /* Network shutdown */ > static int sky2_down(struct net_device *dev) > { > @@ -1934,8 +1954,7 @@ static int sky2_down(struct net_device * > sky2_tx_reset(hw, port); > > /* Free any pending frames stuck in HW queue */ > - sky2_tx_complete(sky2, sky2->tx_prod); > - > + sky2_tx_clean(sky2); > sky2_rx_clean(sky2); > > sky2_free_buffers(sky2); > @@ -2412,15 +2431,6 @@ error: > goto resubmit; > } > > -/* Transmit complete */ > -static inline void sky2_tx_done(struct net_device *dev, u16 last) > -{ > - struct sky2_port *sky2 = netdev_priv(dev); > - > - if (netif_running(dev)) > - sky2_tx_complete(sky2, last); > -} > - > static inline void sky2_skb_rx(const struct sky2_port *sky2, > u32 status, struct sk_buff *skb) > { > @@ -3177,9 +3187,9 @@ static void sky2_reset(struct sky2_hw *h > static void sky2_detach(struct net_device *dev) > { > if (netif_running(dev)) { > - netif_tx_lock(dev); > + netif_tx_lock_bh(dev); > netif_device_detach(dev); /* stop txq */ > - netif_tx_unlock(dev); > + netif_tx_unlock_bh(dev); > sky2_down(dev); > } > } > @@ -4202,7 +4212,7 @@ static int sky2_debug_show(struct seq_fi > > /* Dump contents of tx ring */ > sop = 1; > - for (idx = sky2->tx_next; idx != sky2->tx_prod && idx < sky2->tx_ring_size; > + for (idx = sky2->tx_cons; idx != sky2->tx_prod && idx < sky2->tx_ring_size; > idx = RING_NEXT(idx, sky2->tx_ring_size)) { > const struct sky2_tx_le *le = sky2->tx_le + idx; > u32 a = le32_to_cpu(le->addr); > --- a/drivers/net/sky2.h 2010-01-13 08:32:27.919849429 -0800 > +++ b/drivers/net/sky2.h 2010-01-13 08:33:03.410162026 -0800 > @@ -2187,7 +2187,6 @@ struct sky2_port { > u16 tx_ring_size; > u16 tx_cons; /* next le to check */ > u16 tx_prod; /* next le to use */ > - u16 tx_next; /* debug only */ > > u16 tx_pending; > u16 tx_last_mss; ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4) 2010-01-14 10:14 ` Jarek Poplawski @ 2010-01-14 11:16 ` Jarek Poplawski 2010-01-14 11:20 ` David Miller 2010-01-14 17:52 ` Stephen Hemminger 1 sibling, 1 reply; 59+ messages in thread From: Jarek Poplawski @ 2010-01-14 11:16 UTC (permalink / raw) To: Stephen Hemminger Cc: Michael Breuer, David Miller, mikem, flyboy, rjw, netdev On Thu, Jan 14, 2010 at 10:14:45AM +0000, Jarek Poplawski wrote: > On Wed, Jan 13, 2010 at 07:41:48PM -0800, Stephen Hemminger wrote: > > Subject: sky2: safer transmit cleanup ... > > + if (!(netif_running(dev) & netif_device_present(dev))) > > This makes it safe, but it still resembles the "short term fix" > according do David's opinion. Hmm... Actually, it's not safe, but still safer again. It looks like David was right (this time ;-). Since netif_device_present() isn't protected by a lock here, this is still racy against napi, since the flag can be set between the test and the wakeup. Btw, there is even no barrier, and in your patch this dangerous distance is made much wider. So, now I really ;-) agree with David: this needs a proper fix. Jarek P. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4) 2010-01-14 11:16 ` Jarek Poplawski @ 2010-01-14 11:20 ` David Miller 2010-01-14 11:26 ` Jarek Poplawski 0 siblings, 1 reply; 59+ messages in thread From: David Miller @ 2010-01-14 11:20 UTC (permalink / raw) To: jarkao2; +Cc: shemminger, mbreuer, mikem, flyboy, rjw, netdev From: Jarek Poplawski <jarkao2@gmail.com> Date: Thu, 14 Jan 2010 11:16:36 +0000 > So, now I really ;-) agree with David: this needs a proper fix. Now Jarek, do you now see my dirty little secret? When people disagree with me, I just silently sit around waiting for them to eventually change their mind. Isn't it brilliant? -) ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4) 2010-01-14 11:20 ` David Miller @ 2010-01-14 11:26 ` Jarek Poplawski 2010-01-14 13:19 ` Mike McCormack 0 siblings, 1 reply; 59+ messages in thread From: Jarek Poplawski @ 2010-01-14 11:26 UTC (permalink / raw) To: David Miller; +Cc: shemminger, mbreuer, mikem, flyboy, rjw, netdev On Thu, Jan 14, 2010 at 03:20:09AM -0800, David Miller wrote: > From: Jarek Poplawski <jarkao2@gmail.com> > Date: Thu, 14 Jan 2010 11:16:36 +0000 > > > So, now I really ;-) agree with David: this needs a proper fix. > > Now Jarek, do you now see my dirty little secret? > > When people disagree with me, I just silently sit around waiting for > them to eventually change their mind. > > Isn't it brilliant? -) If it were that easy... (it never works for me :-() Probably, there is something more... ;-) Jarek P. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4) 2010-01-14 11:26 ` Jarek Poplawski @ 2010-01-14 13:19 ` Mike McCormack 2010-01-14 15:43 ` Michael Breuer ` (2 more replies) 0 siblings, 3 replies; 59+ messages in thread From: Mike McCormack @ 2010-01-14 13:19 UTC (permalink / raw) To: Jarek Poplawski; +Cc: David Miller, shemminger, mbreuer, flyboy, rjw, netdev Jarek Poplawski wrote: > On Thu, Jan 14, 2010 at 03:20:09AM -0800, David Miller wrote: >> From: Jarek Poplawski <jarkao2@gmail.com> >> Date: Thu, 14 Jan 2010 11:16:36 +0000 >> >>> So, now I really ;-) agree with David: this needs a proper fix. >> Now Jarek, do you now see my dirty little secret? >> >> When people disagree with me, I just silently sit around waiting for >> them to eventually change their mind. >> >> Isn't it brilliant? -) > > If it were that easy... (it never works for me :-() > > Probably, there is something more... ;-) Here's what was sitting in my tree... Subject: [PATCH] sky2: Don't detach device when restarting Block the tx queue from transmitting when restarting rather than trying to take the device offline. Signed-off-by: Mike McCormack <mikem@ring3k.org> --- drivers/net/sky2.c | 13 +++++++++---- 1 files changed, 9 insertions(+), 4 deletions(-) diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c index d76d907..061f6f2 100644 --- a/drivers/net/sky2.c +++ b/drivers/net/sky2.c @@ -1580,6 +1580,8 @@ static int sky2_up(struct net_device *dev) if (netif_msg_ifup(sky2)) printk(KERN_INFO PFX "%s: enabling interface\n", dev->name); + netif_start_queue(dev); + return 0; err_out: @@ -1596,6 +1598,8 @@ static inline int tx_inuse(const struct sky2_port *sky2) /* Number of list elements available for next tx */ static inline int tx_avail(const struct sky2_port *sky2) { + if (unlikely(!sky2->tx_ring)) + return 0; return sky2->tx_pending - tx_inuse(sky2); } @@ -1925,7 +1929,9 @@ static int sky2_down(struct net_device *dev) sky2_read32(hw, B0_IMSK); synchronize_irq(hw->pdev->irq); - napi_synchronize(&hw->napi); + netif_tx_lock_bh(dev); + napi_disable(&hw->napi); + netif_stop_queue(dev); spin_lock_bh(&sky2->phy_lock); sky2_phy_power_down(hw, port); @@ -1939,6 +1945,8 @@ static int sky2_down(struct net_device *dev) sky2_rx_clean(sky2); sky2_free_buffers(sky2); + napi_enable(&hw->napi); + netif_tx_unlock_bh(dev); return 0; } @@ -3177,9 +3185,6 @@ static void sky2_reset(struct sky2_hw *hw) static void sky2_detach(struct net_device *dev) { if (netif_running(dev)) { - netif_tx_lock_bh(dev); - netif_device_detach(dev); /* stop txq */ - netif_tx_unlock_bh(dev); sky2_down(dev); } } -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4) 2010-01-14 13:19 ` Mike McCormack @ 2010-01-14 15:43 ` Michael Breuer 2010-01-14 16:46 ` Michael Breuer 2010-01-14 17:51 ` Stephen Hemminger 2 siblings, 0 replies; 59+ messages in thread From: Michael Breuer @ 2010-01-14 15:43 UTC (permalink / raw) To: Mike McCormack Cc: Jarek Poplawski, David Miller, shemminger, flyboy, rjw, netdev On 1/14/2010 8:19 AM, Mike McCormack wrote: > Here's what was sitting in my tree... > ... > err_out: > @@ -1596,6 +1598,8 @@ static inline int tx_inuse(const struct sky2_port *sky2) > /* Number of list elements available for next tx */ > static inline int tx_avail(const struct sky2_port *sky2) > { > + if (unlikely(!sky2->tx_ring)) > + return 0; > return sky2->tx_pending - tx_inuse(sky2); > } > This hunk (patch) conflicts with the v4 patch Stephen sent out last night. He added an smp_mb in front of the return. I'm going to give this a go with the smb_mb before the unlikely test. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4) 2010-01-14 13:19 ` Mike McCormack 2010-01-14 15:43 ` Michael Breuer @ 2010-01-14 16:46 ` Michael Breuer 2010-01-14 17:51 ` Stephen Hemminger 2 siblings, 0 replies; 59+ messages in thread From: Michael Breuer @ 2010-01-14 16:46 UTC (permalink / raw) To: Mike McCormack Cc: Jarek Poplawski, David Miller, shemminger, flyboy, rjw, netdev On 1/14/2010 8:19 AM, Mike McCormack wrote: > Jarek Poplawski wrote: > >> On Thu, Jan 14, 2010 at 03:20:09AM -0800, David Miller wrote: >> >>> From: Jarek Poplawski<jarkao2@gmail.com> >>> Date: Thu, 14 Jan 2010 11:16:36 +0000 >>> >>> >>>> So, now I really ;-) agree with David: this needs a proper fix. >>>> >>> Now Jarek, do you now see my dirty little secret? >>> >>> When people disagree with me, I just silently sit around waiting for >>> them to eventually change their mind. >>> >>> Isn't it brilliant? -) >>> >> If it were that easy... (it never works for me :-() >> >> Probably, there is something more... ;-) >> > Here's what was sitting in my tree... > > > Subject: [PATCH] sky2: Don't detach device when restarting > > Block the tx queue from transmitting when restarting > rather than trying to take the device offline. > > Signed-off-by: Mike McCormack<mikem@ring3k.org> > --- > drivers/net/sky2.c | 13 +++++++++---- > 1 files changed, 9 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c > index d76d907..061f6f2 100644 > --- a/drivers/net/sky2.c > +++ b/drivers/net/sky2.c > @@ -1580,6 +1580,8 @@ static int sky2_up(struct net_device *dev) > if (netif_msg_ifup(sky2)) > printk(KERN_INFO PFX "%s: enabling interface\n", dev->name); > > + netif_start_queue(dev); > + > return 0; > > err_out: > @@ -1596,6 +1598,8 @@ static inline int tx_inuse(const struct sky2_port *sky2) > /* Number of list elements available for next tx */ > static inline int tx_avail(const struct sky2_port *sky2) > { > + if (unlikely(!sky2->tx_ring)) > + return 0; > return sky2->tx_pending - tx_inuse(sky2); > } > > @@ -1925,7 +1929,9 @@ static int sky2_down(struct net_device *dev) > sky2_read32(hw, B0_IMSK); > > synchronize_irq(hw->pdev->irq); > - napi_synchronize(&hw->napi); > + netif_tx_lock_bh(dev); > + napi_disable(&hw->napi); > + netif_stop_queue(dev); > > spin_lock_bh(&sky2->phy_lock); > sky2_phy_power_down(hw, port); > @@ -1939,6 +1945,8 @@ static int sky2_down(struct net_device *dev) > sky2_rx_clean(sky2); > > sky2_free_buffers(sky2); > + napi_enable(&hw->napi); > + netif_tx_unlock_bh(dev); > > return 0; > } > @@ -3177,9 +3185,6 @@ static void sky2_reset(struct sky2_hw *hw) > static void sky2_detach(struct net_device *dev) > { > if (netif_running(dev)) { > - netif_tx_lock_bh(dev); > - netif_device_detach(dev); /* stop txq */ > - netif_tx_unlock_bh(dev); > sky2_down(dev); > } > } > -- 1.5.6.5 > Ok - no obvious difference with this patch + Stephen's. I still see: No reported errors; decent throughput; earlier issues with IRQ resolved. But... still seeing DHCP DISCOVER/OFFER (no REQUEST/ACK) while under load. I can try to sniff this... but given that it's under load, I'd probably have to filter out non DHCP stuff and might miss whatever is really going on. Again - main point here is that absent these patches I don't see this issue. It's also possible that these two patches allow the load to be heavier thus actually causing a different problem. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4) 2010-01-14 13:19 ` Mike McCormack 2010-01-14 15:43 ` Michael Breuer 2010-01-14 16:46 ` Michael Breuer @ 2010-01-14 17:51 ` Stephen Hemminger 2 siblings, 0 replies; 59+ messages in thread From: Stephen Hemminger @ 2010-01-14 17:51 UTC (permalink / raw) To: Mike McCormack Cc: Jarek Poplawski, David Miller, mbreuer, flyboy, rjw, netdev On Thu, 14 Jan 2010 22:19:39 +0900 Mike McCormack <mikem@ring3k.org> wrote: > /* Number of list elements available for next tx */ > static inline int tx_avail(const struct sky2_port *sky2) > { > + if (unlikely(!sky2->tx_ring)) > + return 0; > return sky2->tx_pending - tx_inuse(sky2); > } > Breaks in detach case > @@ -1925,7 +1929,9 @@ static int sky2_down(struct net_device *dev) > sky2_read32(hw, B0_IMSK); > > synchronize_irq(hw->pdev->irq); > - napi_synchronize(&hw->napi); > + netif_tx_lock_bh(dev); > + napi_disable(&hw->napi); > + netif_stop_queue(dev); > > spin_lock_bh(&sky2->phy_lock); > sky2_phy_power_down(hw, port); > @@ -1939,6 +1945,8 @@ static int sky2_down(struct net_device *dev) > sky2_rx_clean(sky2); > > sky2_free_buffers(sky2); > + napi_enable(&hw->napi); > + netif_tx_unlock_bh(dev); > > return 0; Breaks on dual ported boards ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4) 2010-01-14 10:14 ` Jarek Poplawski 2010-01-14 11:16 ` Jarek Poplawski @ 2010-01-14 17:52 ` Stephen Hemminger 2010-01-14 23:51 ` Michael Breuer 1 sibling, 1 reply; 59+ messages in thread From: Stephen Hemminger @ 2010-01-14 17:52 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Michael Breuer, David Miller, mikem, flyboy, rjw, netdev On Thu, 14 Jan 2010 10:14:45 +0000 Jarek Poplawski <jarkao2@gmail.com> wrote: > This makes it safe, but it still resembles the "short term fix" > according do David's opinion. > > This change seems to affect dev->stats too. Since they are not > updated in sky2_tx_clean(). Btw, I hope "&" is some optimization > because it's less readable than "&&". Stats don't matter for packets flushed during device reset. The & is because in the most common case device is up, and we don't want the additional conditional branch. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4) 2010-01-14 17:52 ` Stephen Hemminger @ 2010-01-14 23:51 ` Michael Breuer 2010-01-16 18:35 ` sky2 DHCPOFFER packet loss under load (Was Re: [PATCH] sky2: safer transmit ring cleaning (v4)) Michael Breuer 0 siblings, 1 reply; 59+ messages in thread From: Michael Breuer @ 2010-01-14 23:51 UTC (permalink / raw) To: Stephen Hemminger Cc: Jarek Poplawski, David Miller, mikem, flyboy, rjw, netdev On 1/14/2010 12:52 PM, Stephen Hemminger wrote: > On Thu, 14 Jan 2010 10:14:45 +0000 > Jarek Poplawski<jarkao2@gmail.com> wrote: > > >> This makes it safe, but it still resembles the "short term fix" >> according do David's opinion. >> >> This change seems to affect dev->stats too. Since they are not >> updated in sky2_tx_clean(). Btw, I hope "&" is some optimization >> because it's less readable than "&&". >> > Stats don't matter for packets flushed during device reset. > > The& is because in the most common case device is up, > and we don't want the additional conditional branch. > I've been looking at what might explain the dhcp stuff - as well as the dropped packets only when there's an extra hop. I came across one path that seems suspect - although I'm really not familiar with the network stack code... that said, I'm wondering about neigh_compat_output (and eth_rebuild_header and arp_find). If I'm following things correctly (or perhaps mostly correctly), the only time anything goes this route (pun intentional) is when the packet was routed to this box. I'm guessing that bridging makes this more likely. So my dhcp stuff would all be going through here, as would the smb stuff that seemed flaky. The race I'm seeing (maybe) is that when the arp table is being rebuilt, there's a possibility that arp_find frees the skb. There's some other locking and stuff going on that seems maybe races with sky2.c in places on both the rx and tx path. I *think* it's right from looking at it, but test results suggest otherwise. Aside from the potential race, I think there's also a corner case where neigh_compat_output can return either with or without freeing the skb depending on the return from dev_hard_header... this may also be part of the race. Maybe I've missed something... but as far as I can see, this is just about the only difference in code path taken between stuff that is working and stuff that is occasionally not. ^ permalink raw reply [flat|nested] 59+ messages in thread
* sky2 DHCPOFFER packet loss under load (Was Re: [PATCH] sky2: safer transmit ring cleaning (v4)) 2010-01-14 23:51 ` Michael Breuer @ 2010-01-16 18:35 ` Michael Breuer 0 siblings, 0 replies; 59+ messages in thread From: Michael Breuer @ 2010-01-16 18:35 UTC (permalink / raw) To: Stephen Hemminger Cc: Jarek Poplawski, David Miller, mikem, flyboy, rjw, netdev On 1/14/2010 6:51 PM, Michael Breuer wrote: > On 1/14/2010 12:52 PM, Stephen Hemminger wrote: >> On Thu, 14 Jan 2010 10:14:45 +0000 >> Jarek Poplawski<jarkao2@gmail.com> wrote: >> >>> This makes it safe, but it still resembles the "short term fix" >>> according do David's opinion. >>> >>> This change seems to affect dev->stats too. Since they are not >>> updated in sky2_tx_clean(). Btw, I hope "&" is some optimization >>> because it's less readable than "&&". >> Stats don't matter for packets flushed during device reset. >> >> The& is because in the most common case device is up, >> and we don't want the additional conditional branch. > I've been looking at what might explain the dhcp stuff - as well as > the dropped packets only when there's an extra hop. I came across one > path that seems suspect - although I'm really not familiar with the > network stack code... that said, I'm wondering about > neigh_compat_output (and eth_rebuild_header and arp_find). If I'm > following things correctly (or perhaps mostly correctly), the only > time anything goes this route (pun intentional) is when the packet was > routed to this box. I'm guessing that bridging makes this more likely. > So my dhcp stuff would all be going through here, as would the smb > stuff that seemed flaky. The race I'm seeing (maybe) is that when the > arp table is being rebuilt, there's a possibility that arp_find frees > the skb. There's some other locking and stuff going on that seems > maybe races with sky2.c in places on both the rx and tx path. I > *think* it's right from looking at it, but test results suggest > otherwise. Aside from the potential race, I think there's also a > corner case where neigh_compat_output can return either with or > without freeing the skb depending on the return from > dev_hard_header... this may also be part of the race. > > Maybe I've missed something... but as far as I can see, this is just > about the only difference in code path taken between stuff that is > working and stuff that is occasionally not. Ok - brief update. I've confirmed that under load, outgoing DHCPOFFER packets are being silently dropped. I don't know yet where. Test scenario, what I do know, etc.: Scenario: Two systems; one Gb switch; one wifi router; one Blackberry client. System A: Linux host; Asus P6T Deluxe V2/Sky2. eth1-> internet eth0-> internal (10.0.0.1/24). Switch: HP Procurve unmanaged - one port connected to System A; another to the wifi router; another to System B. System B: Win7; Asus M2N Deluxe SLI/ Nforce 5 (10.0.0.11) Router: Wrt54g-tm (dd-wrt) Connected to switch & various wifi clients including a Blackberry. WEP enabled. (10.0.0.60) Blackberry Curve 8320 (wifi enabled). (10.0.0.56 via dhcp lease) Test that causes packet loss: 1. Turn off BB wifi. 2. Start copy of large files (4GB) from System B to an CIFS share on System A. 3. Start nethogs on system A. 4. Start tcpdump on the wifi router (interface br0 - wired 10.0.0.60 connection) 5. Start wireshark on System A 6. tail system A /var/log/messages - watching for DHCP activity 7. When smb traffic load (incoming) exceeds 40,000KBPS (nethogs) - enable wifi on the blackberry. 8. Stop test after multiple DHCPDISCOVER/OFFER observed without REQUEST/ACK. Results: 1. It seems that the problem occurs intermittently below 40,000KBPS, and consistently over that number as reported by nethogs. Lots of fluctuation, so figure that the 40k is approximate. 2. wireshark (system A) shows DHCPOFFER traffic outgoing. 3. tcpdump (wifi - wired incoming interface) does NOT show DHCPOFFER traffic when this problem occurs. 4. Both traces show arp activity during the DISCOVER/OFFER sequence. 5. There is no evidence of tx errors or packet drops, in any statistics I can find. Thoughts: I still think there's a race happening between the arp neighbor update and sky2. Might be higher up, but as I'm seeing the outgoing packets when sniffing eth0, can't be too much higher up. This problem seems to be exacerbated by the more recent patches, however I believe that this is a result of the higher throughput achievable with these patches. With the older set, I saw this problem less frequently, but found it much harder to get over the 40K RX number. I am also seeing (as previously reported - but haven't sniffed yet) SMB ACK dropped packets, but only when traversing the wifi router. Not sure if this is related, but hey, you never know. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH] sky2: safer transmit ring cleaning (v4) 2010-01-14 3:41 ` [PATCH] sky2: safer transmit ring cleaning (v4) Stephen Hemminger 2010-01-14 10:14 ` Jarek Poplawski @ 2010-01-14 15:46 ` Michael Breuer 1 sibling, 0 replies; 59+ messages in thread From: Michael Breuer @ 2010-01-14 15:46 UTC (permalink / raw) To: Stephen Hemminger; +Cc: David Miller, jarkao2, mikem, flyboy, rjw, netdev On 1/13/2010 10:41 PM, Stephen Hemminger wrote: > Subject: sky2: safer transmit cleanup > > This code makes transmit path and transmit reset safer by: > * adding memory barrier before checking available ring slots > * reseting state of tx ring elements after free > * seperate cleanup function from ring done function > * removing mostly unused tx_next element > * ignoring transmit completion if device is offline > > Signed-off-by: Stephen Hemminger<shemminger@vyatta.com> > > --- > This patch is against the current net-next-2.6 tree. > This version handles the case of dual port shared transmit status > and other cases where it is possible for tx_done to be called when > device is being changed. > > --- a/drivers/net/sky2.c 2010-01-13 08:32:51.360161158 -0800 > +++ b/drivers/net/sky2.c 2010-01-13 08:35:37.685531490 -0800 > @@ -1596,6 +1596,9 @@ static inline int tx_inuse(const struct > /* Number of list elements available for next tx */ > static inline int tx_avail(const struct sky2_port *sky2) > { > + /* Makes sure update of tx_prod from start_xmit and > + tx_cons from tx_done are seen. */ > + smp_mb(); > return sky2->tx_pending - tx_inuse(sky2); > } > > @@ -1618,8 +1621,7 @@ static unsigned tx_le_req(const struct s > return count; > } > > -static void sky2_tx_unmap(struct pci_dev *pdev, > - const struct tx_ring_info *re) > +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re) > { > if (re->flags& TX_MAP_SINGLE) > pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr), > @@ -1629,6 +1631,7 @@ static void sky2_tx_unmap(struct pci_dev > pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr), > pci_unmap_len(re, maplen), > PCI_DMA_TODEVICE); > + re->flags = 0; > } > > /* > @@ -1804,7 +1807,8 @@ mapping_error: > } > > /* > - * Free ring elements from starting at tx_cons until "done" > + * Transmit complete processing > + * Free ring elements from starting at tx_cons until done index > * > * NB: > * 1. The hardware will tell us about partial completion of multi-part > @@ -1813,11 +1817,14 @@ mapping_error: > * looks at the tail of the queue of FIFO (tx_cons), not > * the head (tx_prod) > */ > -static void sky2_tx_complete(struct sky2_port *sky2, u16 done) > +static void sky2_tx_done(struct net_device *dev, u16 done) > { > - struct net_device *dev = sky2->netdev; > + struct sky2_port *sky2 = netdev_priv(dev); > unsigned idx; > > + if (!(netif_running(dev)& netif_device_present(dev))) > + return; > + > BUG_ON(done>= sky2->tx_ring_size); > > for (idx = sky2->tx_cons; idx != done; > @@ -1828,6 +1835,8 @@ static void sky2_tx_complete(struct sky2 > sky2_tx_unmap(sky2->hw->pdev, re); > > if (skb) { > + re->skb = NULL; > + > if (unlikely(netif_msg_tx_done(sky2))) > printk(KERN_DEBUG "%s: tx done %u\n", > dev->name, idx); > @@ -1836,16 +1845,12 @@ static void sky2_tx_complete(struct sky2 > dev->stats.tx_bytes += skb->len; > > dev_kfree_skb_any(skb); > - > - sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size); > } > } > > sky2->tx_cons = idx; > - smp_mb(); > > - /* Wake unless it's detached, and called e.g. from sky2_down() */ > - if (tx_avail(sky2)> MAX_SKB_TX_LE + 4&& netif_device_present(dev)) > + if (tx_avail(sky2)> MAX_SKB_TX_LE + 4) > netif_wake_queue(dev); > } > > @@ -1871,6 +1876,21 @@ static void sky2_tx_reset(struct sky2_hw > sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET); > } > > +static void sky2_tx_clean(struct sky2_port *sky2) > +{ > + u16 idx; > + > + for (idx = 0; idx< sky2->tx_ring_size; idx++) { > + struct tx_ring_info *re = sky2->tx_ring + idx; > + > + sky2_tx_unmap(sky2->hw->pdev, re); > + if (re->skb) { > + dev_kfree_skb_any(re->skb); > + re->skb = NULL; > + } > + } > +} > + > /* Network shutdown */ > static int sky2_down(struct net_device *dev) > { > @@ -1934,8 +1954,7 @@ static int sky2_down(struct net_device * > sky2_tx_reset(hw, port); > > /* Free any pending frames stuck in HW queue */ > - sky2_tx_complete(sky2, sky2->tx_prod); > - > + sky2_tx_clean(sky2); > sky2_rx_clean(sky2); > > sky2_free_buffers(sky2); > @@ -2412,15 +2431,6 @@ error: > goto resubmit; > } > > -/* Transmit complete */ > -static inline void sky2_tx_done(struct net_device *dev, u16 last) > -{ > - struct sky2_port *sky2 = netdev_priv(dev); > - > - if (netif_running(dev)) > - sky2_tx_complete(sky2, last); > -} > - > static inline void sky2_skb_rx(const struct sky2_port *sky2, > u32 status, struct sk_buff *skb) > { > @@ -3177,9 +3187,9 @@ static void sky2_reset(struct sky2_hw *h > static void sky2_detach(struct net_device *dev) > { > if (netif_running(dev)) { > - netif_tx_lock(dev); > + netif_tx_lock_bh(dev); > netif_device_detach(dev); /* stop txq */ > - netif_tx_unlock(dev); > + netif_tx_unlock_bh(dev); > sky2_down(dev); > } > } > @@ -4202,7 +4212,7 @@ static int sky2_debug_show(struct seq_fi > > /* Dump contents of tx ring */ > sop = 1; > - for (idx = sky2->tx_next; idx != sky2->tx_prod&& idx< sky2->tx_ring_size; > + for (idx = sky2->tx_cons; idx != sky2->tx_prod&& idx< sky2->tx_ring_size; > idx = RING_NEXT(idx, sky2->tx_ring_size)) { > const struct sky2_tx_le *le = sky2->tx_le + idx; > u32 a = le32_to_cpu(le->addr); > --- a/drivers/net/sky2.h 2010-01-13 08:32:27.919849429 -0800 > +++ b/drivers/net/sky2.h 2010-01-13 08:33:03.410162026 -0800 > @@ -2187,7 +2187,6 @@ struct sky2_port { > u16 tx_ring_size; > u16 tx_cons; /* next le to check */ > u16 tx_prod; /* next le to use */ > - u16 tx_next; /* debug only */ > > u16 tx_pending; > u16 tx_last_mss; > Tested this a bit (w/o Mike's patch from this morning). Overall, works (vs. v3), but still something flaky going on WRT dhcp traffic: Under load (and only under load) I'm seeing mutliple failed dhcp client requests - 4-6 discover/offer sequences before I see request/ack. I don't see errors, dropped packets, etc., at the time this is happening. I'd blow it off to load, but I don't see this with the earlier (pskb_may_pull) patch. Going to rebuild with the inclusion of Mike's patch and see what happens. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [Bug #14925] sky2 panic under load 2010-01-11 16:45 ` Stephen Hemminger 2010-01-11 22:07 ` Jarek Poplawski @ 2010-01-11 22:31 ` Jarek Poplawski 1 sibling, 0 replies; 59+ messages in thread From: Jarek Poplawski @ 2010-01-11 22:31 UTC (permalink / raw) To: Stephen Hemminger Cc: Mike McCormack, Berck E. Nash, Rafael J. Wysocki, netdev On Mon, Jan 11, 2010 at 08:45:04AM -0800, Stephen Hemminger wrote: > On Mon, 11 Jan 2010 23:03:41 +0900 > Mike McCormack <mikem@ring3k.org> wrote: > > > Perhaps only sky2_tx_done should wake the queue? > > Does the patch below fix the problem too? > > > > thanks, > > > > Mike > > The idea is good, but what if transmit queue was full (stopped), > and TX FIFO gets stuck. Then Transmit timeout happens and > the reset logic clears the queue. What will start the queue? > > Something like this: > ----------------------------------------------------------- > > > > --- a/drivers/net/sky2.c 2010-01-11 08:36:42.617426300 -0800 > +++ b/drivers/net/sky2.c 2010-01-11 08:42:51.295237661 -0800 > @@ -1843,9 +1843,6 @@ static void sky2_tx_complete(struct sky2 > > sky2->tx_cons = idx; > smp_mb(); > - > - if (tx_avail(sky2) > MAX_SKB_TX_LE + 4) > - netif_wake_queue(dev); > } > > static void sky2_tx_reset(struct sky2_hw *hw, unsigned port) > @@ -2416,8 +2413,12 @@ static inline void sky2_tx_done(struct n > { > struct sky2_port *sky2 = netdev_priv(dev); > > - if (netif_running(dev)) > + if (netif_running(dev) && netif_device_present(dev)) { > sky2_tx_complete(sky2, last); > + > + if (tx_avail(sky2) > MAX_SKB_TX_LE + 4) > + netif_wake_queue(dev); > + } > } > > static inline void sky2_skb_rx(const struct sky2_port *sky2, > @@ -3197,6 +3198,7 @@ static int sky2_reattach(struct net_devi > } else { > netif_device_attach(dev); > sky2_set_multicast(dev); > + netif_start_queue(dev); Why netif_device_attach() is not enough? Jarek P. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: 2.6.33-rc3-git3: Reported regressions from 2.6.32 2010-01-10 22:27 2.6.33-rc3-git3: Reported regressions from 2.6.32 Rafael J. Wysocki 2010-01-10 22:32 ` [Bug #14925] sky2 panic under load Rafael J. Wysocki @ 2010-01-11 16:00 ` Luis R. Rodriguez 2010-01-11 21:47 ` Nick Bowler [not found] ` <omZ8aBZReyD.A.rz.COlSLB@chimera> 3 siblings, 0 replies; 59+ messages in thread From: Luis R. Rodriguez @ 2010-01-11 16:00 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linux Kernel Mailing List, Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich, Kernel Testers List, Network Development, Linux ACPI, Linux PM List, Linux SCSI List, Linux Wireless List, DRI On Sun, Jan 10, 2010 at 2:27 PM, Rafael J. Wysocki <rjw-KKrjLPT3xs0@public.gmane.org> wrote: > This message contains a list of some regressions from 2.6.32, for which there > are no fixes in the mainline I know of. If any of them have been fixed already, > please let me know. > > If you know of any other unresolved regressions from 2.6.32, please let me know > either and I'll add them to the list. Also, please let me know if any of the > entries below are invalid. > > Each entry from the list will be sent additionally in an automatic reply to > this message with CCs to the people involved in reporting and handling the > issue. > Of these only the one below was wireless, which is actually fixed now. Not bad, rc3 and no wireless regressions yet. > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14874 > Subject : Ath5k regression with commit 8bf3d79bc401ca417ccf9fc076d3295d1a71dbf5 > Submitter : Joshua Covington <joshuacov@googlemail.com> > Date : 2009-12-25 00:49 (17 days old) > Handled-By : Luis R. Rodriguez <mcgrof-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> > Patch : http://bugzilla.kernel.org/attachment.cgi?id=24335 Luis ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: 2.6.33-rc3-git3: Reported regressions from 2.6.32 2010-01-10 22:27 2.6.33-rc3-git3: Reported regressions from 2.6.32 Rafael J. Wysocki 2010-01-10 22:32 ` [Bug #14925] sky2 panic under load Rafael J. Wysocki 2010-01-11 16:00 ` 2.6.33-rc3-git3: Reported regressions from 2.6.32 Luis R. Rodriguez @ 2010-01-11 21:47 ` Nick Bowler 2010-01-11 22:10 ` Rafael J. Wysocki [not found] ` <omZ8aBZReyD.A.rz.COlSLB@chimera> 3 siblings, 1 reply; 59+ messages in thread From: Nick Bowler @ 2010-01-11 21:47 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linux Kernel Mailing List, Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich, Kernel Testers List, Network Development, Linux ACPI, Linux PM List, Linux SCSI List, Linux Wireless List, DRI On 23:27 Sun 10 Jan , Rafael J. Wysocki wrote: > This message contains a list of some regressions from 2.6.32, for which there > are no fixes in the mainline I know of. If any of them have been fixed already, > please let me know. > > If you know of any other unresolved regressions from 2.6.32, please let me know > either and I'll add them to the list. Also, please let me know if any of the > entries below are invalid. Here are two that I don't see in the list: * LVDS downclocking breaks on G45/thinkpad T500 LKML: http://lkml.org/lkml/2009/12/26/40 FDO: http://bugs.freedesktop.org/show_bug.cgi?id=25809 * Receive failure in et131x (Alan Cox sent me a patch for this but it's not in mainline yet) LKML: http://lkml.org/lkml/2009/12/29/210 -- Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/) ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: 2.6.33-rc3-git3: Reported regressions from 2.6.32 2010-01-11 21:47 ` Nick Bowler @ 2010-01-11 22:10 ` Rafael J. Wysocki 0 siblings, 0 replies; 59+ messages in thread From: Rafael J. Wysocki @ 2010-01-11 22:10 UTC (permalink / raw) To: Nick Bowler Cc: Linux Kernel Mailing List, Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich, Kernel Testers List, Network Development, Linux ACPI, Linux PM List, Linux SCSI List, Linux Wireless List, DRI On Monday 11 January 2010, Nick Bowler wrote: > On 23:27 Sun 10 Jan , Rafael J. Wysocki wrote: > > This message contains a list of some regressions from 2.6.32, for which there > > are no fixes in the mainline I know of. If any of them have been fixed already, > > please let me know. > > > > If you know of any other unresolved regressions from 2.6.32, please let me know > > either and I'll add them to the list. Also, please let me know if any of the > > entries below are invalid. > > Here are two that I don't see in the list: > > * LVDS downclocking breaks on G45/thinkpad T500 > > LKML: http://lkml.org/lkml/2009/12/26/40 > FDO: http://bugs.freedesktop.org/show_bug.cgi?id=25809 Added this one. > * Receive failure in et131x (Alan Cox sent me a patch for this but > it's not in mainline yet) > > LKML: http://lkml.org/lkml/2009/12/29/210 This one is a staging driver, so I'm not going to list it. Thanks, Rafael ^ permalink raw reply [flat|nested] 59+ messages in thread
[parent not found: <omZ8aBZReyD.A.rz.COlSLB@chimera>]
* [PATCH net-2.6] cdc_ether: Partially revert "usbnet: Set link down initially ..." [not found] ` <omZ8aBZReyD.A.rz.COlSLB@chimera> @ 2010-01-28 23:18 ` Ben Hutchings 2010-01-29 5:37 ` David Miller 0 siblings, 1 reply; 59+ messages in thread From: Ben Hutchings @ 2010-01-28 23:18 UTC (permalink / raw) To: David S. Miller; +Cc: netdev, avi.rozen Commit 37e8273cd30592d3a82bcb70cbb1bdc4eaeb6b71 ("usbnet: Set link down initially for drivers that update link state") changed the initial link state in cdc_ether and other drivers based on the understanding that the devices they support generate link change interrupts. However, this is optional in the CDC Ethernet protocol, and two users have reported in <http://bugzilla.kernel.org/show_bug.cgi?id=14791> that the link state for their devices remains down. Therefore, revert the change in cdc_ether. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Tested-by: Avi Rozen <avi.rozen@gmail.com> --- --- a/drivers/net/usb/cdc_ether.c +++ b/drivers/net/usb/cdc_ether.c @@ -419,7 +419,7 @@ static const struct driver_info cdc_info = { .description = "CDC Ethernet Device", - .flags = FLAG_ETHER | FLAG_LINK_INTR, + .flags = FLAG_ETHER, // .check_connect = cdc_check_connect, .bind = cdc_bind, .unbind = usbnet_cdc_unbind, -- Ben Hutchings The generation of random numbers is too important to be left to chance. - Robert Coveyou ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH net-2.6] cdc_ether: Partially revert "usbnet: Set link down initially ..." 2010-01-28 23:18 ` [PATCH net-2.6] cdc_ether: Partially revert "usbnet: Set link down initially ..." Ben Hutchings @ 2010-01-29 5:37 ` David Miller 0 siblings, 0 replies; 59+ messages in thread From: David Miller @ 2010-01-29 5:37 UTC (permalink / raw) To: ben; +Cc: netdev, avi.rozen From: Ben Hutchings <ben@decadent.org.uk> Date: Thu, 28 Jan 2010 23:18:44 +0000 > Commit 37e8273cd30592d3a82bcb70cbb1bdc4eaeb6b71 ("usbnet: Set link down > initially for drivers that update link state") changed the initial link > state in cdc_ether and other drivers based on the understanding that the > devices they support generate link change interrupts. However, this is > optional in the CDC Ethernet protocol, and two users have reported in > <http://bugzilla.kernel.org/show_bug.cgi?id=14791> that the link state > for their devices remains down. Therefore, revert the change in > cdc_ether. > > Signed-off-by: Ben Hutchings <ben@decadent.org.uk> > Tested-by: Avi Rozen <avi.rozen@gmail.com> Applied, thanks a lot Ben. ^ permalink raw reply [flat|nested] 59+ messages in thread
end of thread, other threads:[~2010-01-29 5:37 UTC | newest] Thread overview: 59+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-01-10 22:27 2.6.33-rc3-git3: Reported regressions from 2.6.32 Rafael J. Wysocki 2010-01-10 22:32 ` [Bug #14925] sky2 panic under load Rafael J. Wysocki 2010-01-11 0:36 ` Berck E. Nash 2010-01-11 13:26 ` Jarek Poplawski 2010-01-11 19:32 ` Rafael J. Wysocki 2010-01-11 20:31 ` Jarek Poplawski 2010-01-11 20:50 ` Rafael J. Wysocki 2010-01-11 21:02 ` Berck E. Nash 2010-01-11 21:47 ` Jarek Poplawski 2010-01-11 14:03 ` Mike McCormack 2010-01-11 16:45 ` Stephen Hemminger 2010-01-11 22:07 ` Jarek Poplawski 2010-01-12 0:14 ` David Miller 2010-01-12 7:50 ` Jarek Poplawski 2010-01-12 8:08 ` David Miller 2010-01-12 8:56 ` Jarek Poplawski 2010-01-12 9:42 ` David Miller 2010-01-12 10:31 ` Jarek Poplawski 2010-01-12 10:56 ` David Miller 2010-01-12 11:04 ` Jarek Poplawski 2010-01-12 15:39 ` Stephen Hemminger 2010-01-12 16:15 ` [PATCH] sky2: safer transmit ring cleaning Stephen Hemminger 2010-01-12 16:32 ` Michael Breuer 2010-01-12 17:02 ` Stephen Hemminger 2010-01-12 18:04 ` Jarek Poplawski 2010-01-12 18:13 ` Stephen Hemminger 2010-01-12 18:24 ` Jarek Poplawski 2010-01-12 18:49 ` [PATCH] sky2: safer transmit ring cleaning (v2) Stephen Hemminger 2010-01-12 19:16 ` Jarek Poplawski 2010-01-12 19:23 ` Stephen Hemminger 2010-01-12 19:50 ` Jarek Poplawski 2010-01-13 1:23 ` Stephen Hemminger 2010-01-12 19:34 ` Michael Breuer 2010-01-12 18:35 ` [PATCH] sky2: safer transmit ring cleaning Michael Breuer 2010-01-12 18:42 ` Michael Breuer 2010-01-12 20:31 ` Michael Breuer 2010-01-13 4:10 ` [PATCH] sky2: safer transmit ring cleaning (v3) Stephen Hemminger 2010-01-13 4:31 ` Michael Breuer 2010-01-13 7:35 ` Jarek Poplawski 2010-01-13 16:04 ` Michael Breuer 2010-01-14 3:41 ` [PATCH] sky2: safer transmit ring cleaning (v4) Stephen Hemminger 2010-01-14 10:14 ` Jarek Poplawski 2010-01-14 11:16 ` Jarek Poplawski 2010-01-14 11:20 ` David Miller 2010-01-14 11:26 ` Jarek Poplawski 2010-01-14 13:19 ` Mike McCormack 2010-01-14 15:43 ` Michael Breuer 2010-01-14 16:46 ` Michael Breuer 2010-01-14 17:51 ` Stephen Hemminger 2010-01-14 17:52 ` Stephen Hemminger 2010-01-14 23:51 ` Michael Breuer 2010-01-16 18:35 ` sky2 DHCPOFFER packet loss under load (Was Re: [PATCH] sky2: safer transmit ring cleaning (v4)) Michael Breuer 2010-01-14 15:46 ` [PATCH] sky2: safer transmit ring cleaning (v4) Michael Breuer 2010-01-11 22:31 ` [Bug #14925] sky2 panic under load Jarek Poplawski 2010-01-11 16:00 ` 2.6.33-rc3-git3: Reported regressions from 2.6.32 Luis R. Rodriguez 2010-01-11 21:47 ` Nick Bowler 2010-01-11 22:10 ` Rafael J. Wysocki [not found] ` <omZ8aBZReyD.A.rz.COlSLB@chimera> 2010-01-28 23:18 ` [PATCH net-2.6] cdc_ether: Partially revert "usbnet: Set link down initially ..." Ben Hutchings 2010-01-29 5:37 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).