public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [2/4] 2.6.23-rc3: known regressions
       [not found] <46C098FD.1030601@googlemail.com>
@ 2007-08-13 17:59 ` Michal Piotrowski
  2007-08-13 23:29   ` Luca Tettamanti
  2007-08-13 17:59 ` [3/4] " Michal Piotrowski
  2007-08-13 17:59 ` [4/4] " Michal Piotrowski
  2 siblings, 1 reply; 29+ messages in thread
From: Michal Piotrowski @ 2007-08-13 17:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, LKML, Len Brown, linux-acpi, Plamen Petrov,
	Michael Sedkowski, Kamalesh Babulal, Norbert Preining, dth

Hi all,

Here is a list of some known regressions in 2.6.23-rc3.

Feel free to add new regressions/remove fixed etc.
http://kernelnewbies.org/known_regressions

List of Aces

Name                    Regressions fixed since 21-Jun-2007
Adrian Bunk                            9
Andi Kleen                             5
Linus Torvalds                         5
Andrew Morton                          4
Al Viro                                3
Cornelia Huck                          3
Jens Axboe                             3
Tejun Heo                              3



ACPI

Subject         : regression from 2.6.23-rc2 to -rc3 BAT missing
References      : http://lkml.org/lkml/2007/8/13/661
Last known good : ?
Submitter       : Norbert Preining <preining@logic.at>
Caused-By       : ?
Handled-By      : ?
Status          : unknown

Subject         : FATAL: drivers/acpi/video: sizeof(struct acpi_device_id)=20 is not a modulo of the size of section __mod_acpi_device_table=48
References      : http://lkml.org/lkml/2007/8/13/584
Last known good : ?
Submitter       : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Caused-By       : ?
Handled-By      : ?
Status          : unknown

Subject         : MCFG bug on hp nx6310
References      : http://lkml.org/lkml/2007/8/8/252
Last known good : ?
Submitter       : Michael Sedkowski <sedmich@gmail.com>
Caused-By       : ?
Handled-By      : ?
Status          : unknown

Subject         : 2.6.23-rc1-git10 hangs on boot - needs "acpi=off"
References      : http://lkml.org/lkml/2007/8/2/24
Last known good : ?
Submitter       : Plamen Petrov <plamen.petrov@tk.ru.acad.bg>
Caused-By       : ?
Handled-By      : Len Brown <len.brown@intel.com>
Status          : problem is being debugged



CPUFREQ

Subject         : ide problems: 2.6.22-git17 working, 2.6.23-rc1* is not
References      : http://lkml.org/lkml/2007/7/27/298
                  http://lkml.org/lkml/2007/7/29/371
Last known good : ?
Submitter       : dth <dth@dth.net>
Caused-By       : Len Brown <lenb@kernel.org>
                  commit f79e3185dd0f8650022518d7624c876d8929061b
Handled-By      : Len Brown <lenb@kernel.org>
Status          : problem is being debugged



Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [3/4] 2.6.23-rc3: known regressions
       [not found] <46C098FD.1030601@googlemail.com>
  2007-08-13 17:59 ` [2/4] 2.6.23-rc3: known regressions Michal Piotrowski
@ 2007-08-13 17:59 ` Michal Piotrowski
  2007-08-14 22:09   ` Francois Romieu
  2007-08-13 17:59 ` [4/4] " Michal Piotrowski
  2 siblings, 1 reply; 29+ messages in thread
From: Michal Piotrowski @ 2007-08-13 17:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, LKML, Netdev, Stephen Hemminger, Thomas Meyer,
	Uwe Bugla, Daniel K., Shish, Karl Meyer

Hi all,

Here is a list of some known regressions in 2.6.23-rc3.

Feel free to add new regressions/remove fixed etc.
http://kernelnewbies.org/known_regressions

List of Aces

Name                    Regressions fixed since 21-Jun-2007
Adrian Bunk                            9
Andi Kleen                             5
Linus Torvalds                         5
Andrew Morton                          4
Al Viro                                3
Cornelia Huck                          3
Jens Axboe                             3
Tejun Heo                              3



Networking

Subject         : NETDEV WATCHDOG: eth0: transmit timed out
References      : http://lkml.org/lkml/2007/8/13/737
Last known good : ?
Submitter       : Karl Meyer <adhocrocker@gmail.com>
Caused-By       : ?
Handled-By      : ?
Status          : unknown

Subject         : Weird network problems with 2.6.23-rc2
References      : http://lkml.org/lkml/2007/8/11/40
Last known good : ?
Submitter       : Shish <shish@shishnet.org>
Caused-By       : ?
Handled-By      : ?
Status          : unknown

Subject         : BUG: when using 'brctl stp'
References      : http://lkml.org/lkml/2007/8/10/441
Last known good : 2.6.23-rc1
Submitter       : Daniel K. <daniel@cluded.net>
Caused-By       : ?
Handled-By      : ?
Status          : unknown

Subject         : IP v4 routing is broken
References      : http://www.stardust.webpages.pl/files/tbf/bugs/bug_report01.txt
Last known good : 2.6.22-git2
Submitter       : Uwe Bugla <uwe.bugla@gmx.de>
Caused-By       : ?
Handled-By      : ?
Status          : unknown

Subject         : New wake ups from sky2
References      : http://lkml.org/lkml/2007/7/20/386
Last known good : ?
Submitter       : Thomas Meyer <thomas@m3y3r.de>
Caused-By       : Stephen Hemminger <shemminger@osdl.org>
                  commit eb35cf60e462491249166182e3e755d3d5d91a28
Handled-By      : Stephen Hemminger <shemminger@osdl.org>
Status          : unknown



Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [4/4] 2.6.23-rc3: known regressions
       [not found] <46C098FD.1030601@googlemail.com>
  2007-08-13 17:59 ` [2/4] 2.6.23-rc3: known regressions Michal Piotrowski
  2007-08-13 17:59 ` [3/4] " Michal Piotrowski
@ 2007-08-13 17:59 ` Michal Piotrowski
  2007-08-21  1:41   ` [linux-usb-devel] " David Brownell
  2 siblings, 1 reply; 29+ messages in thread
From: Michal Piotrowski @ 2007-08-13 17:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, LKML, linux-usb-devel, Greg KH, Alan Stern,
	Oliver Neukum, Tino Keitel, Stuart_Hayes@Dell.com, Daniel Exner,
	kvm-devel, Avi Kivity, Paolo Ornati, linux-alpha,
	Richard Henderson, Ivan Kokshaysky, Oliver Falk, linux-fsdevel,
	nfs, Trond Myklebust, Andrew Clayton

Hi all,

Here is a list of some known regressions in 2.6.23-rc3.

Feel free to add new regressions/remove fixed etc.
http://kernelnewbies.org/known_regressions

List of Aces

Name                    Regressions fixed since 21-Jun-2007
Adrian Bunk                            9
Andi Kleen                             5
Linus Torvalds                         5
Andrew Morton                          4
Al Viro                                3
Cornelia Huck                          3
Jens Axboe                             3
Tejun Heo                              3



FS

Subject         : [NFSD OOPS] 2.6.23-rc1-git10
References      : http://lkml.org/lkml/2007/8/2/462
Last known good : ?
Submitter       : Andrew Clayton <andrew@digital-domain.net>
Caused-By       : ?
Handled-By      : ?
Status          : unknown



USB

Subject         : EHCI Regression in 2.6.23-rc2
References      : http://lkml.org/lkml/2007/8/10/81
Last known good : ?
Submitter       : Daniel Exner <dex@dragonslave.de>
Caused-By       : Stuart_Hayes@Dell.com <Stuart_Hayes@Dell.com>
                  commit 196705c9bbc03540429b0f7cf9ee35c2f928a534
Handled-By      : ?
Status          : unknown

Subject         : 2.6.23-rc1: USB hard disk broken
References      : http://lkml.org/lkml/2007/7/25/62
Last known good : ?
Submitter       : Tino Keitel <tino.keitel@gmx.de>
Caused-By       : ?
Handled-By      : Oliver Neukum <oliver@neukum.org>
Status          : unknown



Virtualization

Subject         : WARNING: at arch/x86_64/kernel/smp.c:379 smp_call_function_single()
References      : http://lkml.org/lkml/2007/8/10/117
Last known good : ?
Submitter       : Paolo Ornati <ornati@fastwebnet.it>
Caused-By       : Avi Kivity <avi@qumranet.com>
                  commit cec9ad279b66793bee0b5009b7ca311060061efd
Handled-By      : Avi Kivity <avi@qumranet.com>
Status          : problem is being debugged



Alpha

Subject         : -Werror compilation problem - make[1]: *** [arch/alpha/kernel/sys_titan.o] Error 1
References      : http://lkml.org/lkml/2007/8/6/137
Last known good : ?
Submitter       : Oliver Falk <oliver@linux-kernel.at>
Caused-By       : ?
Handled-By      : ?
Status          : unknown



Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [2/4] 2.6.23-rc3: known regressions
  2007-08-13 17:59 ` [2/4] 2.6.23-rc3: known regressions Michal Piotrowski
@ 2007-08-13 23:29   ` Luca Tettamanti
  2007-08-14  6:37     ` Michal Piotrowski
  0 siblings, 1 reply; 29+ messages in thread
From: Luca Tettamanti @ 2007-08-13 23:29 UTC (permalink / raw)
  To: Michal Piotrowski; +Cc: LKML, Len Brown, linux-acpi, Kamalesh Babulal

Michal Piotrowski ha scritto: 
> Hi all,
> 
> Here is a list of some known regressions in 2.6.23-rc3.
> 
> Feel free to add new regressions/remove fixed etc.
> http://kernelnewbies.org/known_regressions
[...]
> 
> Subject         : FATAL: drivers/acpi/video: sizeof(struct acpi_device_id)=20 is not a modulo of the size of section __mod_acpi_device_table=48
> References      : http://lkml.org/lkml/2007/8/13/584
> Last known good : ?
> Submitter       : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
> Caused-By       : ?
> Handled-By      : ?
> Status          : unknown

Hi Michal,
this is the same as:

Modpost

Subject         : modpost bug breaks ia64 cross compilation
References      : http://lkml.org/lkml/2007/7/27/30
                  http://lkml.org/lkml/2007/7/27/418
Last known good : ?
Submitter       : Jan Dittmer <jdi@l4x.org>
Caused-By       : Thomas Renninger <trenn@suse.de>
                  commit 29b71a1ca74491fab9fed09e9d835d840d042690
Handled-By      : ?
Patch           : http://lkml.org/lkml/2007/8/2/211
Status          : patch was suggested

It affects every cross-compilaton from 32 bit host (e.g. x86) to a 64 bit
target (e.g. x86_64) and (AFAICS, I haven't tried) vice versa.
Very annoying :|

Luca
-- 
"New processes are created by other processes, just like new
 humans. New humans are created by other humans, of course,
 not by processes." -- Unix System Administration Handbook

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [2/4] 2.6.23-rc3: known regressions
  2007-08-13 23:29   ` Luca Tettamanti
@ 2007-08-14  6:37     ` Michal Piotrowski
  0 siblings, 0 replies; 29+ messages in thread
From: Michal Piotrowski @ 2007-08-14  6:37 UTC (permalink / raw)
  To: Luca Tettamanti; +Cc: LKML, Len Brown, linux-acpi, Kamalesh Babulal

On 14/08/07, Luca Tettamanti <kronos.it@gmail.com> wrote:
> Michal Piotrowski ha scritto:
> > Hi all,
> >
> > Here is a list of some known regressions in 2.6.23-rc3.
> >
> > Feel free to add new regressions/remove fixed etc.
> > http://kernelnewbies.org/known_regressions
> [...]
> >
> > Subject         : FATAL: drivers/acpi/video: sizeof(struct acpi_device_id)=20 is not a modulo of the size of section __mod_acpi_device_table=48
> > References      : http://lkml.org/lkml/2007/8/13/584
> > Last known good : ?
> > Submitter       : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
> > Caused-By       : ?
> > Handled-By      : ?
> > Status          : unknown
>
> Hi Michal,
> this is the same as:
>
> Modpost
>
> Subject         : modpost bug breaks ia64 cross compilation
> References      : http://lkml.org/lkml/2007/7/27/30
>                   http://lkml.org/lkml/2007/7/27/418
> Last known good : ?
> Submitter       : Jan Dittmer <jdi@l4x.org>
> Caused-By       : Thomas Renninger <trenn@suse.de>
>                   commit 29b71a1ca74491fab9fed09e9d835d840d042690
> Handled-By      : ?
> Patch           : http://lkml.org/lkml/2007/8/2/211
> Status          : patch was suggested

Indeed, thanks.

>
> It affects every cross-compilaton from 32 bit host (e.g. x86) to a 64 bit
> target (e.g. x86_64) and (AFAICS, I haven't tried) vice versa.
> Very annoying :|
>
> Luca

Regards,
Michal

-- 
LOG
http://www.stardust.webpages.pl/log/

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [3/4] 2.6.23-rc3: known regressions
  2007-08-13 17:59 ` [3/4] " Michal Piotrowski
@ 2007-08-14 22:09   ` Francois Romieu
  0 siblings, 0 replies; 29+ messages in thread
From: Francois Romieu @ 2007-08-14 22:09 UTC (permalink / raw)
  To: Michal Piotrowski
  Cc: Linus Torvalds, Andrew Morton, LKML, Netdev, Stephen Hemminger,
	Daniel K., Shish, Karl Meyer

Michal Piotrowski <michal.k.k.piotrowski@gmail.com> :
[...]
> Networking
> 
> Subject         : NETDEV WATCHDOG: eth0: transmit timed out
> References      : http://lkml.org/lkml/2007/8/13/737
> Last known good : ?
> Submitter       : Karl Meyer <adhocrocker@gmail.com>
> Caused-By       : ?
> Handled-By      : ?

  Handled-By      : romieu@fr.zoreil.com

> Status          : unknown

> Subject         : Weird network problems with 2.6.23-rc2
> References      : http://lkml.org/lkml/2007/8/11/40
> Last known good : ?
> Submitter       : Shish <shish@shishnet.org>
> Caused-By       : ?
> Handled-By      : ?
> Status          : unknown

The PR does not give any driver nor hardware detail. :o/

> Subject         : BUG: when using 'brctl stp'
> References      : http://lkml.org/lkml/2007/8/10/441
> Last known good : 2.6.23-rc1
> Submitter       : Daniel K. <daniel@cluded.net>
> Caused-By       : ?
> Handled-By      : ?

  Handled-By      : shemminger@linux-foundation.org

> Status          : unknown

  Status          : fix applied by David Miller

-- 
Ueimor

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-13 17:59 ` [4/4] " Michal Piotrowski
@ 2007-08-21  1:41   ` David Brownell
  2007-08-21  2:02     ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: David Brownell @ 2007-08-21  1:41 UTC (permalink / raw)
  To: Michal Piotrowski
  Cc: linux-usb-devel, Linus Torvalds, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner

On Monday 13 August 2007, Michal Piotrowski wrote:
> Subject         : EHCI Regression in 2.6.23-rc2
> References      : http://lkml.org/lkml/2007/8/10/81
> Last known good : ?
> Submitter       : Daniel Exner <dex@dragonslave.de>
> Caused-By       : Stuart_Hayes@Dell.com <Stuart_Hayes@Dell.com>
>                   commit 196705c9bbc03540429b0f7cf9ee35c2f928a534
> Handled-By      : ?
> Status          : unknown

Fixed I believe by Stuart's patch:

http://marc.info/?l=linux-usb-devel&m=118765934722610&w=2


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-21  1:41   ` [linux-usb-devel] " David Brownell
@ 2007-08-21  2:02     ` Linus Torvalds
  2007-08-21  4:02       ` David Brownell
  2007-08-21  4:27       ` [linux-usb-devel] " David Brownell
  0 siblings, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2007-08-21  2:02 UTC (permalink / raw)
  To: David Brownell
  Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner, Stuart Hayes

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: TEXT/PLAIN; charset=us-ascii, Size: 2149 bytes --]



On Mon, 20 Aug 2007, David Brownell wrote:

> On Monday 13 August 2007, Michal Piotrowski wrote:
> > Subject         : EHCI Regression in 2.6.23-rc2
> > References      : http://lkml.org/lkml/2007/8/10/81
> > Last known good : ?
> > Submitter       : Daniel Exner <dex@dragonslave.de>
> > Caused-By       : Stuart_Hayes@Dell.com <Stuart_Hayes@Dell.com>
> >                   commit 196705c9bbc03540429b0f7cf9ee35c2f928a534
> > Handled-By      : ?
> > Status          : unknown
> 
> Fixed I believe by Stuart's patch:
> 
> http://marc.info/?l=linux-usb-devel&m=118765934722610&w=2

Quite frankly, I'd personally prefer to just revert commit 
196705c9bbc03540429b0f7cf9ee35c2f928a534 entirely instead.

The whole dependency on cpufreq seems totally bogus. Would it not be a lot 
more natural to handle the *result* of the problem (ie the MMF errors by 
broken EHCI controllers?) rather than add totally insane workarounds for 
this case to try to hide them in the first place?

There can be *other* delays in reading memory that have nothing to do with 
cpu frequency shifting, and everything to do with exteme situations on the 
bus. If the stupid EHCI controller has some tight latency issues, that's a 
generic problem.

That commit 196705c9bbc03540429b0f7cf9ee35c2f928a534 just exemplifies what 
is wrong with USB, but it does so by adding incredibly ugly code. I'd 
rather not add even *more* ugly code - especially not for a case where we 
then seem to blame the wrong party (ie a VIA controller that didn't need 
the ugly code in the first place).

Serverworks/Broadcom makes totally crap chips (not just in USB) and then 
doesn't even document their buggy crap hardware. But that is NOT a reason 
for then making the kernel have buggy crap software in it.

So really - is there any reason why we just don't say "Broadcom chips 
suck, and get MMF errors under normal circumstances because they are 
crap". And from *that*, the obvious solution would seem to not be to 
penalize everybody else, but to just say that "We will try to recover from 
MMF errors gracefully by retrying the transaction". Hmm?

			Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-21  2:02     ` Linus Torvalds
@ 2007-08-21  4:02       ` David Brownell
  2007-08-21  4:15         ` Linus Torvalds
  2007-08-21  4:27       ` [linux-usb-devel] " David Brownell
  1 sibling, 1 reply; 29+ messages in thread
From: David Brownell @ 2007-08-21  4:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner

On Monday 20 August 2007, Linus Torvalds wrote:
> 
> On Mon, 20 Aug 2007, David Brownell wrote:
> 
> > On Monday 13 August 2007, Michal Piotrowski wrote:
> > > Subject         : EHCI Regression in 2.6.23-rc2
> > > References      : http://lkml.org/lkml/2007/8/10/81
> > > Last known good : ?
> > > Submitter       : Daniel Exner <dex@dragonslave.de>
> > > Caused-By       : Stuart_Hayes@Dell.com <Stuart_Hayes@Dell.com>
> > >                   commit 196705c9bbc03540429b0f7cf9ee35c2f928a534
> > > Handled-By      : ?
> > > Status          : unknown
> > 
> > Fixed I believe by Stuart's patch:
> > 
> > http://marc.info/?l=linux-usb-devel&m=118765934722610&w=2
> 
> Quite frankly, I'd personally prefer to just revert commit 
> 196705c9bbc03540429b0f7cf9ee35c2f928a534 entirely instead.
> 
> The whole dependency on cpufreq seems totally bogus. Would it not be a lot 
> more natural to handle the *result* of the problem (ie the MMF errors by 
> broken EHCI controllers?) rather than add totally insane workarounds for 
> this case to try to hide them in the first place?

MMF basically means the "Transaction Translating" (TT) hub had data
for the host, but the host didn't collect it in time ... so that some
data was lost.

Unfortunately, that's the type of fault that's especially hard to
recover from.  Plus, very few of the upper layer drivers have even
a minor clue about fault recovery strategies.  And I don't trust
the current hcd/usbcore code that tries to clean up after MMF.

On the plus side, MMF errors have been vanishingly rare until this
cpufreq interaction came up ... which of course implies the downside
that those "handle the result" code paths are all but untested.


> There can be *other* delays in reading memory that have nothing to do with 
> cpu frequency shifting, and everything to do with exteme situations on the 
> bus. If the stupid EHCI controller has some tight latency issues, that's a 
> generic problem.

There could be such problems, yes.  But in practice, I don't know
that we've ever seen them.  (There's a first time for everthing,
yes.  I *just* fetched a webpage where an image got overwritten
about 

> That commit 196705c9bbc03540429b0f7cf9ee35c2f928a534 just exemplifies what 
> is wrong with USB, but it does so by adding incredibly ugly code. I'd 
> rather not add even *more* ugly code - especially not for a case where we 
> then seem to blame the wrong party (ie a VIA controller that didn't need 
> the ugly code in the first place).
> 
> Serverworks/Broadcom makes totally crap chips (not just in USB) and then 
> doesn't even document their buggy crap hardware. But that is NOT a reason 
> for then making the kernel have buggy crap software in it.
> 
> So really - is there any reason why we just don't say "Broadcom chips 
> suck, and get MMF errors under normal circumstances because they are 
> crap". And from *that*, the obvious solution would seem to not be to 
> penalize everybody else, but to just say that "We will try to recover from 
> MMF errors gracefully by retrying the transaction". Hmm?
> 
> 			Linus
> 



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-21  4:02       ` David Brownell
@ 2007-08-21  4:15         ` Linus Torvalds
  2007-08-21  4:48           ` David Brownell
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2007-08-21  4:15 UTC (permalink / raw)
  To: David Brownell
  Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner



On Mon, 20 Aug 2007, David Brownell wrote:
> 
> MMF basically means the "Transaction Translating" (TT) hub had data
> for the host, but the host didn't collect it in time ... so that some
> data was lost.
> 
> Unfortunately, that's the type of fault that's especially hard to
> recover from.

Fair enough. However, it still seems particularly idiotic to
 - penalize everybody
 - mix up two totally unrelated areas (cpufreq and USB) for a bug that is 
   extremely rare and could be handled differently.

For example, if it really ends up being practically impossible to recover 
from split transaction errors, I would still suggest reverting that horrid 
commit, and then just black-listing the known-broken EHCI controllers and 
simply not *do* any split transactions on them. That way there's no 
complexity.

As far as I know, split transactions aren't required anyway, they are just 
a performance optimization.

Basically, we not only know that the commit has caused problems, it's 
fundamentally ugly, fragile, and not very maintainable, and the whole 
reason for doing it is pretty dubious.

Why not just admit that certain hardware is broken (and the vendor isn't 
worth even bothering to be polite with, since they try to screw us every 
chance they get) and cannot reliably do split transactions? Problem 
solved, no real downside, and nobody will even *notice*.

			Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-21  2:02     ` Linus Torvalds
  2007-08-21  4:02       ` David Brownell
@ 2007-08-21  4:27       ` David Brownell
  1 sibling, 0 replies; 29+ messages in thread
From: David Brownell @ 2007-08-21  4:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner

[ GRR sorry for the premature "SEND" ... mouspads-r-evil ]

On Monday 20 August 2007, Linus Torvalds wrote:
> 
> On Mon, 20 Aug 2007, David Brownell wrote:
> 
> > On Monday 13 August 2007, Michal Piotrowski wrote:
> > > Subject         : EHCI Regression in 2.6.23-rc2
> > > References      : http://lkml.org/lkml/2007/8/10/81
> > > Last known good : ?
> > > Submitter       : Daniel Exner <dex@dragonslave.de>
> > > Caused-By       : Stuart_Hayes@Dell.com <Stuart_Hayes@Dell.com>
> > >                   commit 196705c9bbc03540429b0f7cf9ee35c2f928a534
> > > Handled-By      : ?
> > > Status          : unknown
> > 
> > Fixed I believe by Stuart's patch:
> > 
> > http://marc.info/?l=linux-usb-devel&m=118765934722610&w=2
> 
> Quite frankly, I'd personally prefer to just revert commit 
> 196705c9bbc03540429b0f7cf9ee35c2f928a534 entirely instead.
> 
> The whole dependency on cpufreq seems totally bogus. Would it not be a lot 
> more natural to handle the *result* of the problem (ie the MMF errors by 
> broken EHCI controllers?) rather than add totally insane workarounds for 
> this case to try to hide them in the first place?

MMF basically means the "Transaction Translating" (TT) Hub had data
for the host, but the host didn't collect it in time ... so that some
data was lost.  In this context, it's only for periodic transfers,
meaning interrupt (for HID keyboards, mice, etc) or isochronous (mostly
audio or video, but sometimes ATM).

Unfortunately, that's the type of fault that's especially hard to
recover from.  Plus, very few of the upper layer drivers have even
a minor clue about fault recovery strategies during I/O ... it's not
supposed to happen for interrupt transfers, one hopes the drivers
will at least die gracefully.   With ISO, faults are expected (since
it's "best effort" delivery, time-priority).

On the plus side, MMF errors have been vanishingly rare until this
cpufreq interaction came up ... which of course implies the downside
that those "handle the result" code paths are all but untested.


> There can be *other* delays in reading memory that have nothing to do with 
> cpu frequency shifting, and everything to do with exteme situations on the 
> bus. If the stupid EHCI controller has some tight latency issues, that's a 
> generic problem.

There could be such problems, yes.  But in practice, I don't know
that we've ever seen them.

(There's a first time for everthing, yes.  I *just* fetched a webpage
where an image got overwritten about half way through fetching it.
Top half was today's, bottom half was tomorrow's, update 12 midnight EST.
Strangest looking JPG ever!  ;)


> That commit 196705c9bbc03540429b0f7cf9ee35c2f928a534 just exemplifies what 
> is wrong with USB, but it does so by adding incredibly ugly code. I'd 
> rather not add even *more* ugly code - especially not for a case where we 
> then seem to blame the wrong party (ie a VIA controller that didn't need 
> the ugly code in the first place).
> 
> Serverworks/Broadcom makes totally crap chips (not just in USB) and then 
> doesn't even document their buggy crap hardware.

That's pretty much how I feel about VIA's USB stuff:
buggy crap that I actively steer people away from.

And that's why it doesn't seem odd to me to add even
more workarounds for VIA-only bugs.


> But that is NOT a reason  
> for then making the kernel have buggy crap software in it.

I don't think we always have the option not to cope with
broken hardware.  We may have some options about *how* we
cope with it though ...


> So really - is there any reason why we just don't say "Broadcom chips 
> suck, and get MMF errors under normal circumstances because they are 
> crap". And from *that*, the obvious solution would seem to not be to 
> penalize everybody else, but to just say that "We will try to recover from 
> MMF errors gracefully by retrying the transaction". Hmm?

Well, see above about why retrying wouldn't work well.  Data lost, and
not recoverable ... although if the events are only USB keyboards/mice,
then the user might be able to recover.  (Stuart?)

Alternatively, if Broadcom then just veto cpufreq changes whenever
there are USB interrupt transfers active.  (We *can* veto changes
in notifiers, yes?)

- Dave



> 
> 			Linus
> 



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-21  4:15         ` Linus Torvalds
@ 2007-08-21  4:48           ` David Brownell
  2007-08-21  5:31             ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: David Brownell @ 2007-08-21  4:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner

On Monday 20 August 2007, Linus Torvalds wrote:
> 
> On Mon, 20 Aug 2007, David Brownell wrote:
> > 
> > MMF basically means the "Transaction Translating" (TT) hub had data
> > for the host, but the host didn't collect it in time ... so that some
> > data was lost.
> > 
> > Unfortunately, that's the type of fault that's especially hard to
> > recover from.
> 
> Fair enough. However, it still seems particularly idiotic to
>  - penalize everybody
>  - mix up two totally unrelated areas (cpufreq and USB) for a bug that is 
>    extremely rare and could be handled differently.

Yes on #1, but on #2 ... frequency transitions are a common place
for systems to want to hiccup.  Maybe less so on PCs, but it's
hard to say that re-clocking an I/O or memory bus shouldn't affect
the peripherals using it for "realtime" (deadlines) I/O !!


My more complete response suggested maybe just vetoing cpufreq
transitions if the Broadcom chipset (or maybe it's just specific
boards?) finds itself in the awkward configuration ... penalizing
only the people we know could have trouble.


> For example, if it really ends up being practically impossible to recover 
> from split transaction errors, I would still suggest reverting that horrid 
> commit, and then just black-listing the known-broken EHCI controllers and 
> simply not *do* any split transactions on them. That way there's no 
> complexity.
> 
> As far as I know, split transactions aren't required anyway, they are just 
> a performance optimization.

Nope.  Linus, this is at least the second or third time you've
been wrong -- sorry.  But I wish you were right, since they're
such a PITA to cope with.  ;)

Split transactions are how the full and low speed devices bridge
to high speed busses.  Think of the TT hub as a speed converter,
buffering data and then retransmitting it at the other (slower or
faster) speed.  Some systems don't even have a full/low speed host
adapter ... they just have a high speed root hub and rely on some
external TT hubs (maybe on a mainboard) to handle the rest.


> Basically, we not only know that the commit has caused problems, it's 
> fundamentally ugly, fragile, and not very maintainable, and the whole 
> reason for doing it is pretty dubious.
> 
> Why not just admit that certain hardware is broken (and the vendor isn't 
> worth even bothering to be polite with, since they try to screw us every 
> chance they get) and cannot reliably do split transactions? Problem 
> solved, no real downside, and nobody will even *notice*.

Well, I suggested an alternate fix that I hope Stuart will look at.
I think it achieves your goals (only impacting Broadcom systems).

- Dave




> 
> 			Linus
> 



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-21  4:48           ` David Brownell
@ 2007-08-21  5:31             ` Linus Torvalds
  2007-08-21  5:51               ` Arjan van de Ven
  2007-08-21  6:03               ` Linus Torvalds
  0 siblings, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2007-08-21  5:31 UTC (permalink / raw)
  To: David Brownell
  Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner



On Mon, 20 Aug 2007, David Brownell wrote:

> On Monday 20 August 2007, Linus Torvalds wrote:
> > 
> > Fair enough. However, it still seems particularly idiotic to
> >  - penalize everybody
> >  - mix up two totally unrelated areas (cpufreq and USB) for a bug that is 
> >    extremely rare and could be handled differently.
> 
> Yes on #1, but on #2 ... frequency transitions are a common place
> for systems to want to hiccup.

I disagree - it's extremely rare. We've probably had more *software* 
problems with cpufreq than we've had real hardware problems (ie all the 
locking for cpufreq has been pretty painful).

It's also something that probably depends a lot on the particular CPU. 
Some CPU's have very short PLL relocking, and continue running while the 
voltage is changed. Others seem to stop for much longer times. This is not 
somethign that the USB layer should stick its fingers in - because quite 
frankly, it really doesn't have a clue.

> Maybe less so on PCs, but it's hard to say that re-clocking an I/O or 
> memory bus shouldn't affect the peripherals using it for "realtime" 
> (deadlines) I/O !!

Normally the memory bus isn't reclocked (it's *possible* to do, but it's 
complex and can be quite fragile). I think the issue was just that the CPU 
itself was reclocked, and had long latencies for probing the cache. Not 
unlike a sleep state: there can be long DMA latencies just from the CPU 
being in S1!

So adding this special case for CPU frequency is not at all unlike adding 
a special case saying that the CPU cannot go into "halt" state, because 
the DMA latency is too high.

Have we done that? Yes. We actually had a "no-hlt" kernel command line 
flag that literally disabled halting the CPU, because it apparently caused 
problems for some floppy disk setups (and yes, the main reasonable 
explanation was some bad DMA interaction, we never figured it out).

So it might be much better if we instead re-introduced that kind of "DMA 
latency requirement", and letting different subsystems react to that as 
they may.

It really can affect more than just cpufreq - I would not be in the 
*least* surprised if C3 latencies and other things can cause these things 
too! But even within cpufreq, it's quite likely to hit certain situations 
more than others.

(Of course, if C3 latencies are high on a MB that has known DMA latency 
issues, you'd hope that the BIOS has simply disabled C3 entirely in the 
ACPI tables)

> transitions if the Broadcom chipset (or maybe it's just specific
> boards?) finds itself in the awkward configuration ... penalizing
> only the people we know could have trouble.

Yes, that would be more acceptable, I think. 

It is also quite likely that this is not a generic cpufreq issue, but one 
that happens with just a certain class of CPU's - ie some particular CPU 
that is just slower than others at re-clocking.

Just disabling things blindly on cpufreq events, when what it actually 
wants to do is say "I need low DMA latency", and then the cpu-freq layer 
(which can know about these things) may decide internally that it knows 
that a particular setup is not able to have low-latency DMA durign 
frequency relocking.

On other - saner - CPU's, the frequency relock may take a fraction of the 
time, and the CPU is running perfectly the whole time - and it would _not_ 
be affected.

> > As far as I know, split transactions aren't required anyway, they are just 
> > a performance optimization.
> 
> Nope.  Linus, this is at least the second or third time you've
> been wrong -- sorry.  But I wish you were right, since they're
> such a PITA to cope with.  ;)
> 
> Split transactions are how the full and low speed devices bridge
> to high speed busses.  Think of the TT hub as a speed converter,
> buffering data and then retransmitting it at the other (slower or
> faster) speed.  Some systems don't even have a full/low speed host
> adapter ... they just have a high speed root hub and rely on some
> external TT hubs (maybe on a mainboard) to handle the rest.

Ok. But in the meantime, I really think we should just revert the code 
that causes a known regression.

Because, quite frankly, you may not like VIA, but in the bigger picture, 
VIA has been a hell of a lot better than Broadcom. 

		Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-21  5:31             ` Linus Torvalds
@ 2007-08-21  5:51               ` Arjan van de Ven
  2007-08-21  6:04                 ` Arjan van de Ven
  2007-08-21  6:25                 ` Linus Torvalds
  2007-08-21  6:03               ` Linus Torvalds
  1 sibling, 2 replies; 29+ messages in thread
From: Arjan van de Ven @ 2007-08-21  5:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Brownell, Michal Piotrowski, linux-usb-devel, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner


> Have we done that? Yes. We actually had a "no-hlt" kernel command line 
> flag that literally disabled halting the CPU, because it apparently caused 
> problems for some floppy disk setups (and yes, the main reasonable 
> explanation was some bad DMA interaction, we never figured it out).
> 
> So it might be much better if we instead re-introduced that kind of "DMA 
> latency requirement", and letting different subsystems react to that as 
> they may.


wait.... we HAVE that infrastructure .. see kernel/latency.c ...


> It really can affect more than just cpufreq - I would not be in the 
> *least* surprised if C3 latencies and other things can cause these things 
> too! But even within cpufreq, it's quite likely to hit certain situations 
> more than others.

and kernel/latency.c was designed EXACTLY for that reason. All the USB
layer has to do is to announce it's latency requirement like this:

/* 
 * Some broadcom chips are buggy and can't take more than 5 usec as DMA
 * latency; inform the rest of kernel of this.
 */
if (weird_broadcom_chip())
	set_acceptable_latency("ehci", 5);


and the C-state code will honor it. CPUFREQ doesn't honor it yet but
that's easy to add.. (this assumes the ACPI BIOS informs us correctly
about the cpu behavior, but that's the best we can do obviously unless
you want a table inside the kernel keyed off vendor/model/stepping)




-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-21  5:31             ` Linus Torvalds
  2007-08-21  5:51               ` Arjan van de Ven
@ 2007-08-21  6:03               ` Linus Torvalds
  2007-08-21  6:34                 ` David Brownell
  1 sibling, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2007-08-21  6:03 UTC (permalink / raw)
  To: David Brownell
  Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner



On Mon, 20 Aug 2007, Linus Torvalds wrote:
> 
> Ok. But in the meantime, I really think we should just revert the code 
> that causes a known regression.

Side note: one reason I'm interested in this is that my mac mini (now used 
by the kids) has had a very flaky USB mouse lately. Is it related? I have 
no idea, and probably not, but as a result I'm very interested in any USB 
regressions. There's *something* rotten with that mouse, and while it 
could be the mouse itself going bad, I think it started happening only 
after updating that machine to 2.6.23-rc1.

			Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-21  5:51               ` Arjan van de Ven
@ 2007-08-21  6:04                 ` Arjan van de Ven
  2007-08-21  6:26                   ` Linus Torvalds
  2007-08-21  6:25                 ` Linus Torvalds
  1 sibling, 1 reply; 29+ messages in thread
From: Arjan van de Ven @ 2007-08-21  6:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Brownell, Michal Piotrowski, linux-usb-devel, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner


On Mon, 2007-08-20 at 22:51 -0700, Arjan van de Ven wrote:
> and the C-state code will honor it. CPUFREQ doesn't honor it yet but
> that's easy to add..

untested patch to add this to cpufreq; this is probably a good idea in
general even if using the latency framework doesn't end up being used
for fixing this regression...


--- linux-2.6.23-rc2/drivers/cpufreq/cpufreq.c.org	2007-08-20 22:58:32.000000000 -0700
+++ linux-2.6.23-rc2/drivers/cpufreq/cpufreq.c	2007-08-20 23:02:21.000000000 -0700
@@ -1604,6 +1604,12 @@ static int __cpufreq_set_policy(struct c
 	if (ret)
 		goto error_out;
 
+
+	if (system_latency_constraint() < policy->cpuinfo.transition_latency) {
+		ret = -EINVAL;
+		goto error_out;
+	}
+
 	/* notification of the new policy */
 	blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
 			CPUFREQ_NOTIFY, policy);



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-21  6:25                 ` Linus Torvalds
@ 2007-08-21  6:24                   ` Arjan van de Ven
  0 siblings, 0 replies; 29+ messages in thread
From: Arjan van de Ven @ 2007-08-21  6:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Brownell, Michal Piotrowski, linux-usb-devel, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner


On Mon, 2007-08-20 at 23:25 -0700, Linus Torvalds wrote:

> Do we actually have the latency information for these things? Especially 
> since I assume a number of people use the specialized direct-hw-access 
> cpufreq drivers..
> 
> I realize that we *have* "transition_latency" at the cpufreq layer, and it 
> is supposed to be in ns, but I wonder how likely it is to bear any 
> relationship to reality, considering that I don't think it's really used 
> for anything.. (yeah, it affects the heuristics, but I don't think it has 
> any _hard_ meaning, so I'd worry that it's not necessarily something that 
> people have tried to make accurate).

trusting the bios to be accurate for all machines is generally a ...
well... it's like trusting politicians in election week.

But it's sort of the best we got; at the same time, what are the odds
that the time is more than an order of magnitude off? if the latency of
the cpu is so large that the requirement ehci puts in is orders of
magnitude more strict, a bit inaccurate data from the bios doesn't
matter all that much. And worst case we make a table with quirks somehow
(probably on cpu vendor/model I suppose)

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-21  5:51               ` Arjan van de Ven
  2007-08-21  6:04                 ` Arjan van de Ven
@ 2007-08-21  6:25                 ` Linus Torvalds
  2007-08-21  6:24                   ` Arjan van de Ven
  1 sibling, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2007-08-21  6:25 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: David Brownell, Michal Piotrowski, linux-usb-devel, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner



On Mon, 20 Aug 2007, Arjan van de Ven wrote:
> > 
> > So it might be much better if we instead re-introduced that kind of "DMA 
> > latency requirement", and letting different subsystems react to that as 
> > they may.
> 
> wait.... we HAVE that infrastructure .. see kernel/latency.c ...

Heh. Just shows how wellknown that interface is - it seems like it's only 
used by the ipw2100 driver and "pcm_native".

But yes, that looks like the right thing.

> and the C-state code will honor it. CPUFREQ doesn't honor it yet but
> that's easy to add.. (this assumes the ACPI BIOS informs us correctly
> about the cpu behavior, but that's the best we can do obviously unless
> you want a table inside the kernel keyed off vendor/model/stepping)

Do we actually have the latency information for these things? Especially 
since I assume a number of people use the specialized direct-hw-access 
cpufreq drivers..

I realize that we *have* "transition_latency" at the cpufreq layer, and it 
is supposed to be in ns, but I wonder how likely it is to bear any 
relationship to reality, considering that I don't think it's really used 
for anything.. (yeah, it affects the heuristics, but I don't think it has 
any _hard_ meaning, so I'd worry that it's not necessarily something that 
people have tried to make accurate).

But I dunno.

		Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-21  6:04                 ` Arjan van de Ven
@ 2007-08-21  6:26                   ` Linus Torvalds
  2007-08-21  6:28                     ` Arjan van de Ven
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2007-08-21  6:26 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: David Brownell, Michal Piotrowski, linux-usb-devel, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner



On Mon, 20 Aug 2007, Arjan van de Ven wrote:
> 
> untested patch to add this to cpufreq; this is probably a good idea in
> general even if using the latency framework doesn't end up being used
> for fixing this regression...
> 
> 
> --- linux-2.6.23-rc2/drivers/cpufreq/cpufreq.c.org	2007-08-20 22:58:32.000000000 -0700
> +++ linux-2.6.23-rc2/drivers/cpufreq/cpufreq.c	2007-08-20 23:02:21.000000000 -0700
> @@ -1604,6 +1604,12 @@ static int __cpufreq_set_policy(struct c
>  	if (ret)
>  		goto error_out;
>  
> +
> +	if (system_latency_constraint() < policy->cpuinfo.transition_latency) {

That looks broken. "system_latency_constraint()" is in us, but 
transition_latency is in ns, afaik.

But adding a "/ 1000" to turn the ns into us, and it migth even work.

		Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-21  6:26                   ` Linus Torvalds
@ 2007-08-21  6:28                     ` Arjan van de Ven
  2007-08-21  6:45                       ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Arjan van de Ven @ 2007-08-21  6:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Brownell, Michal Piotrowski, linux-usb-devel, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner


On Mon, 2007-08-20 at 23:26 -0700, Linus Torvalds wrote:
> 
> On Mon, 20 Aug 2007, Arjan van de Ven wrote:
> > 
> > untested patch to add this to cpufreq; this is probably a good idea in
> > general even if using the latency framework doesn't end up being used
> > for fixing this regression...
> > 
> > 
> > --- linux-2.6.23-rc2/drivers/cpufreq/cpufreq.c.org	2007-08-20 22:58:32.000000000 -0700
> > +++ linux-2.6.23-rc2/drivers/cpufreq/cpufreq.c	2007-08-20 23:02:21.000000000 -0700
> > @@ -1604,6 +1604,12 @@ static int __cpufreq_set_policy(struct c
> >  	if (ret)
> >  		goto error_out;
> >  
> > +
> > +	if (system_latency_constraint() < policy->cpuinfo.transition_latency) {
> 
> That looks broken. "system_latency_constraint()" is in us, but 
> transition_latency is in ns, afaik.
> 
> But adding a "/ 1000" to turn the ns into us, and it migth even work.


eh woops yes indeed.
Shows me for not testing; I'll do that tomorrow when I'm more awake
-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-21  6:03               ` Linus Torvalds
@ 2007-08-21  6:34                 ` David Brownell
  2007-08-21  6:52                   ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: David Brownell @ 2007-08-21  6:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner

On Monday 20 August 2007, Linus Torvalds wrote:
> 
> On Mon, 20 Aug 2007, Linus Torvalds wrote:
> > 
> > Ok. But in the meantime, I really think we should just revert the code 
> > that causes a known regression.
> 
> Side note: one reason I'm interested in this is that my mac mini (now used 
> by the kids) has had a very flaky USB mouse lately. Is it related? I have 
> no idea, and probably not, but as a result I'm very interested in any USB 
> regressions. There's *something* rotten with that mouse, and while it 
> could be the mouse itself going bad, I think it started happening only 
> after updating that machine to 2.6.23-rc1.

Try disabling USB_SUSPEND ... the rather aggressive powersave
mechanism (autosuspend defaulting to always ON) has made lots
of trouble.  I think that default will change...

- Dave

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-21  6:28                     ` Arjan van de Ven
@ 2007-08-21  6:45                       ` Linus Torvalds
  0 siblings, 0 replies; 29+ messages in thread
From: Linus Torvalds @ 2007-08-21  6:45 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: David Brownell, Michal Piotrowski, linux-usb-devel, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner



On Mon, 20 Aug 2007, Arjan van de Ven wrote:

> 
> On Mon, 2007-08-20 at 23:26 -0700, Linus Torvalds wrote:
> > 
> > On Mon, 20 Aug 2007, Arjan van de Ven wrote:
> > > 
> > > untested patch to add this to cpufreq; this is probably a good idea in
> > > general even if using the latency framework doesn't end up being used
> > > for fixing this regression...
> > > 
> > > 
> > > --- linux-2.6.23-rc2/drivers/cpufreq/cpufreq.c.org	2007-08-20 22:58:32.000000000 -0700
> > > +++ linux-2.6.23-rc2/drivers/cpufreq/cpufreq.c	2007-08-20 23:02:21.000000000 -0700
> > > @@ -1604,6 +1604,12 @@ static int __cpufreq_set_policy(struct c
> > >  	if (ret)
> > >  		goto error_out;
> > >  
> > > +
> > > +	if (system_latency_constraint() < policy->cpuinfo.transition_latency) {
> > 
> > That looks broken. "system_latency_constraint()" is in us, but 
> > transition_latency is in ns, afaik.
> > 
> > But adding a "/ 1000" to turn the ns into us, and it migth even work.
> 
> 
> eh woops yes indeed.
> Shows me for not testing; I'll do that tomorrow when I'm more awake

Side note: I think we migth want to also have some way of telling the user 
*why* we're not doing frequency changes. Maybe as simple as a rate-limited 
printk() or something.

Otherwise, we'll easily be in a situation where some poor sod ends up 
running constantly at lowest frequency, and no way of even seeing why. 
Which sounds like a debugging nightmare.

If the kernel spits out the occasional warning about the latency 
violation, at least we get notified about there being potential problems.

		Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-21  6:34                 ` David Brownell
@ 2007-08-21  6:52                   ` Linus Torvalds
  2007-08-21  7:24                     ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2007-08-21  6:52 UTC (permalink / raw)
  To: David Brownell
  Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner



On Mon, 20 Aug 2007, David Brownell wrote:
> 
> Try disabling USB_SUSPEND ... the rather aggressive powersave
> mechanism (autosuspend defaulting to always ON) has made lots
> of trouble.  I think that default will change...

It's already disabled - so that's not it.

		Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-21  6:52                   ` Linus Torvalds
@ 2007-08-21  7:24                     ` Linus Torvalds
  2007-08-22  3:34                       ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2007-08-21  7:24 UTC (permalink / raw)
  To: David Brownell
  Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner



On Mon, 20 Aug 2007, Linus Torvalds wrote:
> 
> On Mon, 20 Aug 2007, David Brownell wrote:
> > 
> > Try disabling USB_SUSPEND ... the rather aggressive powersave
> > mechanism (autosuspend defaulting to always ON) has made lots
> > of trouble.  I think that default will change...
> 
> It's already disabled - so that's not it.

Side note: after reverting 196705c9bb I can't get the mouse to skip any 
more on that mac mini. But since the bad behaviour wasn't 100% reliable to 
begin with, that's not really a guarantee of anything. Two out of three 
kids are off on camp this week, so that machine probably won't be getting 
a lot of testing ;/

That's with an all-intel chipset, no VIA or Broadcom anywhere.

I do find it interesting that VIA apparently ignores the INACTIVE bit: it 
implies that Windows probably doesn't use it, which in turn implies that 
nobody has ever tested it in any real life situation. So I would not be 
shocked to hear that others have problems too.

But as mentioned, the mouse behaviour was flaky (but very noticeable and 
irritating when it did happen), so it's hard to be sure my quick testing 
really was convincing. Which is why it's even harder to bisect ;(.

			Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-21  7:24                     ` Linus Torvalds
@ 2007-08-22  3:34                       ` Linus Torvalds
  2007-08-22 14:42                         ` Stuart_Hayes
  2007-08-22 23:35                         ` Junio C Hamano
  0 siblings, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2007-08-22  3:34 UTC (permalink / raw)
  To: David Brownell
  Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner



On Tue, 21 Aug 2007, Linus Torvalds wrote:
> 
> Side note: after reverting 196705c9bb I can't get the mouse to skip any 
> more on that mac mini. But since the bad behaviour wasn't 100% reliable to 
> begin with, that's not really a guarantee of anything. Two out of three 
> kids are off on camp this week, so that machine probably won't be getting 
> a lot of testing ;/

Well, my one remaining child said today that "I got so much time on 
webkinz today - yesterday the mouse locked up after five minutes".

Apparently it hadn't had the mouse lock up at all today.

So I really do believe that that 196705c9bb commit caused problems on 
intel-only USB machines too ("ondemand" cpufreq governor, switching 
between 1.0-1.66 Ghz using acpi-cpufreq: totally bog-standard in all 
respects, in other words).

		Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-22  3:34                       ` Linus Torvalds
@ 2007-08-22 14:42                         ` Stuart_Hayes
  2007-08-22 18:41                           ` Linus Torvalds
  2007-08-22 23:35                         ` Junio C Hamano
  1 sibling, 1 reply; 29+ messages in thread
From: Stuart_Hayes @ 2007-08-22 14:42 UTC (permalink / raw)
  To: torvalds, david-b
  Cc: michal.k.k.piotrowski, linux-usb-devel, gregkh, linux-kernel,
	akpm, dex

Linus Torvalds wrote:
> On Tue, 21 Aug 2007, Linus Torvalds wrote:
>> 
>> Side note: after reverting 196705c9bb I can't get the mouse to skip
>> any more on that mac mini. But since the bad behaviour wasn't 100%
>> reliable to begin with, that's not really a guarantee of anything.
>> Two out of three kids are off on camp this week, so that machine
>> probably won't be getting a lot of testing ;/
> 
> Well, my one remaining child said today that "I got so much time on
> webkinz today - yesterday the mouse locked up after five minutes". 
> 
> Apparently it hadn't had the mouse lock up at all today.
> 
> So I really do believe that that 196705c9bb commit caused problems on
> intel-only USB machines too ("ondemand" cpufreq governor, switching
> between 1.0-1.66 Ghz using acpi-cpufreq: totally bog-standard in all
> respects, in other words).   
> 
> 		Linus

If you were running 2.6.26-rc3, that's quite possibly because you didn't
have the follow-up patch that fixed my original patch... it wasn't in
2.6.26-rc3
(http://www.mail-archive.com/linux-usb-devel@lists.sourceforge.net/msg56
523.html).  It fixed a bug with my patch that wasn't necessary with
Broadcom, but was with nVidia (and Intel, I believe).

That could definitely cause mouse lock-ups.  Sorry, that should have
occurred to me yesterday when you mentioned the problem your kids were
seeing, but it didn't for some reason.

Stuart

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-22 14:42                         ` Stuart_Hayes
@ 2007-08-22 18:41                           ` Linus Torvalds
  2007-08-22 20:41                             ` Stuart_Hayes
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2007-08-22 18:41 UTC (permalink / raw)
  To: Stuart_Hayes
  Cc: david-b, michal.k.k.piotrowski, linux-usb-devel, gregkh,
	linux-kernel, akpm, dex



On Wed, 22 Aug 2007, Stuart_Hayes@Dell.com wrote:
> 
> If you were running 2.6.26-rc3, that's quite possibly because you didn't
> have the follow-up patch that fixed my original patch... it wasn't in
> 2.6.26-rc3

Well, I was running "current git", and it's never been there. So not just 
a -rc3 issue.

> That could definitely cause mouse lock-ups.  Sorry, that should have
> occurred to me yesterday when you mentioned the problem your kids were
> seeing, but it didn't for some reason.

Btw, could it have caused the USB stack to be *really* confused? Some of 
those mouse lockups ended up also locking the machine hard (ie no ping, no 
nothing), and I'm a bit worried that there was something else going on 
too..

That said, if you can actually re-create the MMF problems, could you 
please try the patch that Arjan suggested? Ie add a

	/*
	 * Some broadcom chips are buggy and can't take more than 5 usec as DMA
	 * latency; inform the rest of kernel of this.
	 */
	if (weird_broadcom_chip())
		set_acceptable_latency("ehci", 5);

to the USB driver, and then add something like

	static inline int cpufreq_acceptable_latency(struct cpufreq_policy *policy)
	{
		unsigned long latency;

		/* Policy latency in usec */
		latency = policy->cpuinfo.transition_latency / 1000;

		if (latency > system_latency_constraint())
			return -EINVAL;

		return 0;
	}

adn then add calls to this from both the "__cpufreq_set_policy()" function 
and the "__cpufreq_driver_target()" one too..

That should disable cpufreq with that broken chip, which is perhaps a big 
draconian, but it's certainly better than having the USB layer know about 
cpufreq internals directly.

In the longer run, I think we can move the "system_latency_constraint()" 
checking from the policy registration into each CPU frequency driver, so 
that it could be more dynamically decide about "can we do it right _now_" 
rather than globally saying "we can't do it with this hardware".

				Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-22 18:41                           ` Linus Torvalds
@ 2007-08-22 20:41                             ` Stuart_Hayes
  0 siblings, 0 replies; 29+ messages in thread
From: Stuart_Hayes @ 2007-08-22 20:41 UTC (permalink / raw)
  To: torvalds
  Cc: david-b, michal.k.k.piotrowski, linux-usb-devel, gregkh,
	linux-kernel, akpm, dex


>> That could definitely cause mouse lock-ups.  Sorry, that should have
>> occurred to me yesterday when you mentioned the problem your kids
>> were seeing, but it didn't for some reason.
> 
> Btw, could it have caused the USB stack to be *really* confused? Some
> of those mouse lockups ended up also locking the machine hard (ie no
> ping, no nothing), and I'm a bit worried that there was something
> else going on too..   
> 

Unfortunately, yes, that sounds exactly like what happened with the
nVidia controller.  The problem was with my patch, but was fixed with
the later patch.  I doubt there was anything else going on.

> That said, if you can actually re-create the MMF problems, could you
> please try the patch that Arjan suggested? Ie add a 
> 
> 	/*
> 	 * Some broadcom chips are buggy and can't take more than 5 usec
as
> DMA 
> 	 * latency; inform the rest of kernel of this.
> 	 */
> 	if (weird_broadcom_chip())
> 		set_acceptable_latency("ehci", 5);
> 
> to the USB driver, and then add something like
> 
> 	static inline int cpufreq_acceptable_latency(struct
cpufreq_policy
> 	*policy) {
> 		unsigned long latency;
> 
> 		/* Policy latency in usec */
> 		latency = policy->cpuinfo.transition_latency / 1000;
> 
> 		if (latency > system_latency_constraint())
> 			return -EINVAL;
> 
> 		return 0;
> 	}
> 
> adn then add calls to this from both the "__cpufreq_set_policy()"
> function and the "__cpufreq_driver_target()" one too.. 
> 
> That should disable cpufreq with that broken chip, which is perhaps a
> big draconian, but it's certainly better than having the USB layer
> know about cpufreq internals directly.  
> 
> In the longer run, I think we can move the
> "system_latency_constraint()" 
> checking from the policy registration into each CPU frequency driver,
> so that it could be more dynamically decide about "can we do it right
> _now_"  
> rather than globally saying "we can't do it with this hardware".
> 
> 				Linus

I will work on that, thank you for the help.

Stuart

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [4/4] 2.6.23-rc3: known regressions
  2007-08-22  3:34                       ` Linus Torvalds
  2007-08-22 14:42                         ` Stuart_Hayes
@ 2007-08-22 23:35                         ` Junio C Hamano
  1 sibling, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2007-08-22 23:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Well, my one remaining child said today that "I got so much time on 
> webkinz today - yesterday the mouse locked up after five minutes".
>
> Apparently it hadn't had the mouse lock up at all today.
>
> So I really do believe that that 196705c9bb commit caused problems on 
> intel-only USB machines too ("ondemand" cpufreq governor, switching 
> between 1.0-1.66 Ghz using acpi-cpufreq: totally bog-standard in all 
> respects, in other words).
>
> 		Linus

Sorry for being way offtopic, but the above message reminded me
of commit 869659a6 from git.git repository, also this message:

    http://www.gelato.unsw.edu.au/archives/git/0607/24208.html

By the way, Linus, please let me know if you get this message
via vger but not via the direct path to you.  I seem to have
been getting bounces for mails to you and andrew from my ISP in
the past few weeks.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2007-08-22 23:35 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <46C098FD.1030601@googlemail.com>
2007-08-13 17:59 ` [2/4] 2.6.23-rc3: known regressions Michal Piotrowski
2007-08-13 23:29   ` Luca Tettamanti
2007-08-14  6:37     ` Michal Piotrowski
2007-08-13 17:59 ` [3/4] " Michal Piotrowski
2007-08-14 22:09   ` Francois Romieu
2007-08-13 17:59 ` [4/4] " Michal Piotrowski
2007-08-21  1:41   ` [linux-usb-devel] " David Brownell
2007-08-21  2:02     ` Linus Torvalds
2007-08-21  4:02       ` David Brownell
2007-08-21  4:15         ` Linus Torvalds
2007-08-21  4:48           ` David Brownell
2007-08-21  5:31             ` Linus Torvalds
2007-08-21  5:51               ` Arjan van de Ven
2007-08-21  6:04                 ` Arjan van de Ven
2007-08-21  6:26                   ` Linus Torvalds
2007-08-21  6:28                     ` Arjan van de Ven
2007-08-21  6:45                       ` Linus Torvalds
2007-08-21  6:25                 ` Linus Torvalds
2007-08-21  6:24                   ` Arjan van de Ven
2007-08-21  6:03               ` Linus Torvalds
2007-08-21  6:34                 ` David Brownell
2007-08-21  6:52                   ` Linus Torvalds
2007-08-21  7:24                     ` Linus Torvalds
2007-08-22  3:34                       ` Linus Torvalds
2007-08-22 14:42                         ` Stuart_Hayes
2007-08-22 18:41                           ` Linus Torvalds
2007-08-22 20:41                             ` Stuart_Hayes
2007-08-22 23:35                         ` Junio C Hamano
2007-08-21  4:27       ` [linux-usb-devel] " David Brownell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox