* Re: [2/4] 2.6.23-rc3: known regressions [not found] <46C098FD.1030601@googlemail.com> @ 2007-08-13 17:59 ` Michal Piotrowski 2007-08-13 23:29 ` Luca Tettamanti 2007-08-13 17:59 ` [3/4] " Michal Piotrowski 2007-08-13 17:59 ` [4/4] " Michal Piotrowski 2 siblings, 1 reply; 29+ messages in thread From: Michal Piotrowski @ 2007-08-13 17:59 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, LKML, Len Brown, linux-acpi, Plamen Petrov, Michael Sedkowski, Kamalesh Babulal, Norbert Preining, dth Hi all, Here is a list of some known regressions in 2.6.23-rc3. Feel free to add new regressions/remove fixed etc. http://kernelnewbies.org/known_regressions List of Aces Name Regressions fixed since 21-Jun-2007 Adrian Bunk 9 Andi Kleen 5 Linus Torvalds 5 Andrew Morton 4 Al Viro 3 Cornelia Huck 3 Jens Axboe 3 Tejun Heo 3 ACPI Subject : regression from 2.6.23-rc2 to -rc3 BAT missing References : http://lkml.org/lkml/2007/8/13/661 Last known good : ? Submitter : Norbert Preining <preining@logic.at> Caused-By : ? Handled-By : ? Status : unknown Subject : FATAL: drivers/acpi/video: sizeof(struct acpi_device_id)=20 is not a modulo of the size of section __mod_acpi_device_table=48 References : http://lkml.org/lkml/2007/8/13/584 Last known good : ? Submitter : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Caused-By : ? Handled-By : ? Status : unknown Subject : MCFG bug on hp nx6310 References : http://lkml.org/lkml/2007/8/8/252 Last known good : ? Submitter : Michael Sedkowski <sedmich@gmail.com> Caused-By : ? Handled-By : ? Status : unknown Subject : 2.6.23-rc1-git10 hangs on boot - needs "acpi=off" References : http://lkml.org/lkml/2007/8/2/24 Last known good : ? Submitter : Plamen Petrov <plamen.petrov@tk.ru.acad.bg> Caused-By : ? 
Handled-By : Len Brown <len.brown@intel.com> Status : problem is being debugged CPUFREQ Subject : ide problems: 2.6.22-git17 working, 2.6.23-rc1* is not References : http://lkml.org/lkml/2007/7/27/298 http://lkml.org/lkml/2007/7/29/371 Last known good : ? Submitter : dth <dth@dth.net> Caused-By : Len Brown <lenb@kernel.org> commit f79e3185dd0f8650022518d7624c876d8929061b Handled-By : Len Brown <lenb@kernel.org> Status : problem is being debugged Regards, Michal -- LOG http://www.stardust.webpages.pl/log/ ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2/4] 2.6.23-rc3: known regressions 2007-08-13 17:59 ` [2/4] 2.6.23-rc3: known regressions Michal Piotrowski @ 2007-08-13 23:29 ` Luca Tettamanti 2007-08-14 6:37 ` Michal Piotrowski 0 siblings, 1 reply; 29+ messages in thread From: Luca Tettamanti @ 2007-08-13 23:29 UTC (permalink / raw) To: Michal Piotrowski; +Cc: LKML, Len Brown, linux-acpi, Kamalesh Babulal Michal Piotrowski wrote: > Hi all, > > Here is a list of some known regressions in 2.6.23-rc3. > > Feel free to add new regressions/remove fixed etc. > http://kernelnewbies.org/known_regressions [...] > > Subject : FATAL: drivers/acpi/video: sizeof(struct acpi_device_id)=20 is not a modulo of the size of section __mod_acpi_device_table=48 > References : http://lkml.org/lkml/2007/8/13/584 > Last known good : ? > Submitter : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> > Caused-By : ? > Handled-By : ? > Status : unknown Hi Michal, this is the same as: Modpost Subject : modpost bug breaks ia64 cross compilation References : http://lkml.org/lkml/2007/7/27/30 http://lkml.org/lkml/2007/7/27/418 Last known good : ? Submitter : Jan Dittmer <jdi@l4x.org> Caused-By : Thomas Renninger <trenn@suse.de> commit 29b71a1ca74491fab9fed09e9d835d840d042690 Handled-By : ? Patch : http://lkml.org/lkml/2007/8/2/211 Status : patch was suggested It affects every cross-compilation from 32 bit host (e.g. x86) to a 64 bit target (e.g. x86_64) and (AFAICS, I haven't tried) vice versa. Very annoying :| Luca -- "New processes are created by other processes, just like new humans. New humans are created by other humans, of course, not by processes." -- Unix System Administration Handbook ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [2/4] 2.6.23-rc3: known regressions 2007-08-13 23:29 ` Luca Tettamanti @ 2007-08-14 6:37 ` Michal Piotrowski 0 siblings, 0 replies; 29+ messages in thread From: Michal Piotrowski @ 2007-08-14 6:37 UTC (permalink / raw) To: Luca Tettamanti; +Cc: LKML, Len Brown, linux-acpi, Kamalesh Babulal On 14/08/07, Luca Tettamanti <kronos.it@gmail.com> wrote: > Michal Piotrowski wrote: > > Hi all, > > > > Here is a list of some known regressions in 2.6.23-rc3. > > > > Feel free to add new regressions/remove fixed etc. > > http://kernelnewbies.org/known_regressions > [...] > > > > Subject : FATAL: drivers/acpi/video: sizeof(struct acpi_device_id)=20 is not a modulo of the size of section __mod_acpi_device_table=48 > > References : http://lkml.org/lkml/2007/8/13/584 > > Last known good : ? > > Submitter : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> > > Caused-By : ? > > Handled-By : ? > > Status : unknown > > Hi Michal, > this is the same as: > > Modpost > > Subject : modpost bug breaks ia64 cross compilation > References : http://lkml.org/lkml/2007/7/27/30 > http://lkml.org/lkml/2007/7/27/418 > Last known good : ? > Submitter : Jan Dittmer <jdi@l4x.org> > Caused-By : Thomas Renninger <trenn@suse.de> > commit 29b71a1ca74491fab9fed09e9d835d840d042690 > Handled-By : ? > Patch : http://lkml.org/lkml/2007/8/2/211 > Status : patch was suggested Indeed, thanks. > > It affects every cross-compilation from 32 bit host (e.g. x86) to a 64 bit > target (e.g. x86_64) and (AFAICS, I haven't tried) vice versa. > Very annoying :| > > Luca Regards, Michal -- LOG http://www.stardust.webpages.pl/log/ ^ permalink raw reply [flat|nested] 29+ messages in thread
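Note on the modpost error quoted above: the FATAL message amounts to a divisibility check, since a device table section should hold a whole number of fixed-size entries. The sketch below is a minimal userspace illustration of that arithmetic, not modpost's actual code; the helper names are hypothetical. The values 20 and 48 come from the report. The value 24 is an assumption (a 16-byte id plus an 8-byte data field on a 64-bit target) and is not stated in the thread.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from modpost: bytes left over when a
 * section is divided into entries of entry_size bytes. */
static size_t table_remainder(size_t section_size, size_t entry_size)
{
    if (entry_size == 0)
        return section_size;
    return section_size % entry_size;
}

/* Hypothetical helper: true when the section holds a whole number of
 * entries, which is the condition the FATAL message says was violated. */
static int table_size_is_consistent(size_t section_size, size_t entry_size)
{
    return entry_size != 0 && table_remainder(section_size, entry_size) == 0;
}
```

With the reported numbers, table_remainder(48, 20) leaves 8 bytes over, so the check fails. If the entry size is measured with the host's layout while the section was emitted with the target's layout, a mismatch like this is what one would expect to see, which fits Luca's description of a cross-compilation problem.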
* Re: [3/4] 2.6.23-rc3: known regressions [not found] <46C098FD.1030601@googlemail.com> 2007-08-13 17:59 ` [2/4] 2.6.23-rc3: known regressions Michal Piotrowski @ 2007-08-13 17:59 ` Michal Piotrowski 2007-08-14 22:09 ` Francois Romieu 2007-08-13 17:59 ` [4/4] " Michal Piotrowski 2 siblings, 1 reply; 29+ messages in thread From: Michal Piotrowski @ 2007-08-13 17:59 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, LKML, Netdev, Stephen Hemminger, Thomas Meyer, Uwe Bugla, Daniel K., Shish, Karl Meyer Hi all, Here is a list of some known regressions in 2.6.23-rc3. Feel free to add new regressions/remove fixed etc. http://kernelnewbies.org/known_regressions List of Aces Name Regressions fixed since 21-Jun-2007 Adrian Bunk 9 Andi Kleen 5 Linus Torvalds 5 Andrew Morton 4 Al Viro 3 Cornelia Huck 3 Jens Axboe 3 Tejun Heo 3 Networking Subject : NETDEV WATCHDOG: eth0: transmit timed out References : http://lkml.org/lkml/2007/8/13/737 Last known good : ? Submitter : Karl Meyer <adhocrocker@gmail.com> Caused-By : ? Handled-By : ? Status : unknown Subject : Weird network problems with 2.6.23-rc2 References : http://lkml.org/lkml/2007/8/11/40 Last known good : ? Submitter : Shish <shish@shishnet.org> Caused-By : ? Handled-By : ? Status : unknown Subject : BUG: when using 'brctl stp' References : http://lkml.org/lkml/2007/8/10/441 Last known good : 2.6.23-rc1 Submitter : Daniel K. <daniel@cluded.net> Caused-By : ? Handled-By : ? Status : unknown Subject : IP v4 routing is broken References : http://www.stardust.webpages.pl/files/tbf/bugs/bug_report01.txt Last known good : 2.6.22-git2 Submitter : Uwe Bugla <uwe.bugla@gmx.de> Caused-By : ? Handled-By : ? Status : unknown Subject : New wake ups from sky2 References : http://lkml.org/lkml/2007/7/20/386 Last known good : ? 
Submitter : Thomas Meyer <thomas@m3y3r.de> Caused-By : Stephen Hemminger <shemminger@osdl.org> commit eb35cf60e462491249166182e3e755d3d5d91a28 Handled-By : Stephen Hemminger <shemminger@osdl.org> Status : unknown Regards, Michal -- LOG http://www.stardust.webpages.pl/log/ ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [3/4] 2.6.23-rc3: known regressions 2007-08-13 17:59 ` [3/4] " Michal Piotrowski @ 2007-08-14 22:09 ` Francois Romieu 0 siblings, 0 replies; 29+ messages in thread From: Francois Romieu @ 2007-08-14 22:09 UTC (permalink / raw) To: Michal Piotrowski Cc: Linus Torvalds, Andrew Morton, LKML, Netdev, Stephen Hemminger, Daniel K., Shish, Karl Meyer Michal Piotrowski <michal.k.k.piotrowski@gmail.com> : [...] > Networking > > Subject : NETDEV WATCHDOG: eth0: transmit timed out > References : http://lkml.org/lkml/2007/8/13/737 > Last known good : ? > Submitter : Karl Meyer <adhocrocker@gmail.com> > Caused-By : ? > Handled-By : ? Handled-By : romieu@fr.zoreil.com > Status : unknown > Subject : Weird network problems with 2.6.23-rc2 > References : http://lkml.org/lkml/2007/8/11/40 > Last known good : ? > Submitter : Shish <shish@shishnet.org> > Caused-By : ? > Handled-By : ? > Status : unknown The PR does not give any driver nor hardware detail. :o/ > Subject : BUG: when using 'brctl stp' > References : http://lkml.org/lkml/2007/8/10/441 > Last known good : 2.6.23-rc1 > Submitter : Daniel K. <daniel@cluded.net> > Caused-By : ? > Handled-By : ? Handled-By : shemminger@linux-foundation.org > Status : unknown Status : fix applied by David Miller -- Ueimor ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [4/4] 2.6.23-rc3: known regressions [not found] <46C098FD.1030601@googlemail.com> 2007-08-13 17:59 ` [2/4] 2.6.23-rc3: known regressions Michal Piotrowski 2007-08-13 17:59 ` [3/4] " Michal Piotrowski @ 2007-08-13 17:59 ` Michal Piotrowski 2007-08-21 1:41 ` [linux-usb-devel] " David Brownell 2 siblings, 1 reply; 29+ messages in thread From: Michal Piotrowski @ 2007-08-13 17:59 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, LKML, linux-usb-devel, Greg KH, Alan Stern, Oliver Neukum, Tino Keitel, Stuart_Hayes@Dell.com, Daniel Exner, kvm-devel, Avi Kivity, Paolo Ornati, linux-alpha, Richard Henderson, Ivan Kokshaysky, Oliver Falk, linux-fsdevel, nfs, Trond Myklebust, Andrew Clayton Hi all, Here is a list of some known regressions in 2.6.23-rc3. Feel free to add new regressions/remove fixed etc. http://kernelnewbies.org/known_regressions List of Aces Name Regressions fixed since 21-Jun-2007 Adrian Bunk 9 Andi Kleen 5 Linus Torvalds 5 Andrew Morton 4 Al Viro 3 Cornelia Huck 3 Jens Axboe 3 Tejun Heo 3 FS Subject : [NFSD OOPS] 2.6.23-rc1-git10 References : http://lkml.org/lkml/2007/8/2/462 Last known good : ? Submitter : Andrew Clayton <andrew@digital-domain.net> Caused-By : ? Handled-By : ? Status : unknown USB Subject : EHCI Regression in 2.6.23-rc2 References : http://lkml.org/lkml/2007/8/10/81 Last known good : ? Submitter : Daniel Exner <dex@dragonslave.de> Caused-By : Stuart_Hayes@Dell.com <Stuart_Hayes@Dell.com> commit 196705c9bbc03540429b0f7cf9ee35c2f928a534 Handled-By : ? Status : unknown Subject : 2.6.23-rc1: USB hard disk broken References : http://lkml.org/lkml/2007/7/25/62 Last known good : ? Submitter : Tino Keitel <tino.keitel@gmx.de> Caused-By : ? Handled-By : Oliver Neukum <oliver@neukum.org> Status : unknown Virtualization Subject : WARNING: at arch/x86_64/kernel/smp.c:379 smp_call_function_single() References : http://lkml.org/lkml/2007/8/10/117 Last known good : ? 
Submitter : Paolo Ornati <ornati@fastwebnet.it> Caused-By : Avi Kivity <avi@qumranet.com> commit cec9ad279b66793bee0b5009b7ca311060061efd Handled-By : Avi Kivity <avi@qumranet.com> Status : problem is being debugged Alpha Subject : -Werror compilation problem - make[1]: *** [arch/alpha/kernel/sys_titan.o] Error 1 References : http://lkml.org/lkml/2007/8/6/137 Last known good : ? Submitter : Oliver Falk <oliver@linux-kernel.at> Caused-By : ? Handled-By : ? Status : unknown Regards, Michal -- LOG http://www.stardust.webpages.pl/log/ ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions 2007-08-13 17:59 ` [4/4] " Michal Piotrowski @ 2007-08-21 1:41 ` David Brownell 2007-08-21 2:02 ` Linus Torvalds 0 siblings, 1 reply; 29+ messages in thread From: David Brownell @ 2007-08-21 1:41 UTC (permalink / raw) To: Michal Piotrowski Cc: linux-usb-devel, Linus Torvalds, Greg KH, LKML, Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner On Monday 13 August 2007, Michal Piotrowski wrote: > Subject : EHCI Regression in 2.6.23-rc2 > References : http://lkml.org/lkml/2007/8/10/81 > Last known good : ? > Submitter : Daniel Exner <dex@dragonslave.de> > Caused-By : Stuart_Hayes@Dell.com <Stuart_Hayes@Dell.com> > commit 196705c9bbc03540429b0f7cf9ee35c2f928a534 > Handled-By : ? > Status : unknown Fixed I believe by Stuart's patch: http://marc.info/?l=linux-usb-devel&m=118765934722610&w=2 ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions 2007-08-21 1:41 ` [linux-usb-devel] " David Brownell @ 2007-08-21 2:02 ` Linus Torvalds 2007-08-21 4:02 ` David Brownell 2007-08-21 4:27 ` [linux-usb-devel] " David Brownell 0 siblings, 2 replies; 29+ messages in thread From: Linus Torvalds @ 2007-08-21 2:02 UTC (permalink / raw) To: David Brownell Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML, Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner, Stuart Hayes On Mon, 20 Aug 2007, David Brownell wrote: > On Monday 13 August 2007, Michal Piotrowski wrote: > > Subject : EHCI Regression in 2.6.23-rc2 > > References : http://lkml.org/lkml/2007/8/10/81 > > Last known good : ? > > Submitter : Daniel Exner <dex@dragonslave.de> > > Caused-By : Stuart_Hayes@Dell.com <Stuart_Hayes@Dell.com> > > commit 196705c9bbc03540429b0f7cf9ee35c2f928a534 > > Handled-By : ? > > Status : unknown > > Fixed I believe by Stuart's patch: > > http://marc.info/?l=linux-usb-devel&m=118765934722610&w=2 Quite frankly, I'd personally prefer to just revert commit 196705c9bbc03540429b0f7cf9ee35c2f928a534 entirely instead. The whole dependency on cpufreq seems totally bogus. Would it not be a lot more natural to handle the *result* of the problem (ie the MMF errors by broken EHCI controllers?) rather than add totally insane workarounds for this case to try to hide them in the first place? There can be *other* delays in reading memory that have nothing to do with cpu frequency shifting, and everything to do with extreme situations on the bus. If the stupid EHCI controller has some tight latency issues, that's a generic problem. That commit 196705c9bbc03540429b0f7cf9ee35c2f928a534 just exemplifies what is wrong with USB, but it does so by adding incredibly ugly code. 
I'd rather not add even *more* ugly code - especially not for a case where we then seem to blame the wrong party (ie a VIA controller that didn't need the ugly code in the first place). Serverworks/Broadcom makes totally crap chips (not just in USB) and then doesn't even document their buggy crap hardware. But that is NOT a reason for then making the kernel have buggy crap software in it. So really - is there any reason why we just don't say "Broadcom chips suck, and get MMF errors under normal circumstances because they are crap". And from *that*, the obvious solution would seem to not be to penalize everybody else, but to just say that "We will try to recover from MMF errors gracefully by retrying the transaction". Hmm? Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions 2007-08-21 2:02 ` Linus Torvalds @ 2007-08-21 4:02 ` David Brownell 2007-08-21 4:15 ` Linus Torvalds 2007-08-21 4:27 ` [linux-usb-devel] " David Brownell 1 sibling, 1 reply; 29+ messages in thread From: David Brownell @ 2007-08-21 4:02 UTC (permalink / raw) To: Linus Torvalds Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML, Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner On Monday 20 August 2007, Linus Torvalds wrote: > > On Mon, 20 Aug 2007, David Brownell wrote: > > > On Monday 13 August 2007, Michal Piotrowski wrote: > > > Subject     : EHCI Regression in 2.6.23-rc2 > > > References    : http://lkml.org/lkml/2007/8/10/81 > > > Last known good : ? > > > Submitter    : Daniel Exner <dex@dragonslave.de> > > > Caused-By    : Stuart_Hayes@Dell.com <Stuart_Hayes@Dell.com> > > >          commit 196705c9bbc03540429b0f7cf9ee35c2f928a534 > > > Handled-By    : ? > > > Status      : unknown > > > > Fixed I believe by Stuart's patch: > > > > http://marc.info/?l=linux-usb-devel&m=118765934722610&w=2 > > Quite frankly, I'd personally prefer to just revert commit > 196705c9bbc03540429b0f7cf9ee35c2f928a534 entirely instead. > > The whole dependency on cpufreq seems totally bogus. Would it not be a lot > more natural to handle the *result* of the problem (ie the MMF errors by > broken EHCI controllers?) rather than add totally insane workarounds for > this case to try to hide them in the first place? MMF basically means the "Transaction Translating" (TT) hub had data for the host, but the host didn't collect it in time ... so that some data was lost. Unfortunately, that's the type of fault that's especially hard to recover from. Plus, very few of the upper layer drivers have even a minor clue about fault recovery strategies. And I don't trust the current hcd/usbcore code that tries to clean up after MMF. On the plus side, MMF errors have been vanishingly rare until this cpufreq interaction came up ... 
which of course implies the downside that those "handle the result" code paths are all but untested. > There can be *other* delays in reading memory that have nothing to do with > cpu frequency shifting, and everything to do with extreme situations on the > bus. If the stupid EHCI controller has some tight latency issues, that's a > generic problem. There could be such problems, yes. But in practice, I don't know that we've ever seen them. (There's a first time for everything, yes. I *just* fetched a webpage where an image got overwritten about > That commit 196705c9bbc03540429b0f7cf9ee35c2f928a534 just exemplifies what > is wrong with USB, but it does so by adding incredibly ugly code. I'd > rather not add even *more* ugly code - especially not for a case where we > then seem to blame the wrong party (ie a VIA controller that didn't need > the ugly code in the first place). > > Serverworks/Broadcom makes totally crap chips (not just in USB) and then > doesn't even document their buggy crap hardware. But that is NOT a reason > for then making the kernel have buggy crap software in it. > > So really - is there any reason why we just don't say "Broadcom chips > suck, and get MMF errors under normal circumstances because they are > crap". And from *that*, the obvious solution would seem to not be to > penalize everybody else, but to just say that "We will try to recover from > MMF errors gracefully by retrying the transaction". Hmm? > > Linus > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions 2007-08-21 4:02 ` David Brownell @ 2007-08-21 4:15 ` Linus Torvalds 2007-08-21 4:48 ` David Brownell 0 siblings, 1 reply; 29+ messages in thread From: Linus Torvalds @ 2007-08-21 4:15 UTC (permalink / raw) To: David Brownell Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML, Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner On Mon, 20 Aug 2007, David Brownell wrote: > > MMF basically means the "Transaction Translating" (TT) hub had data > for the host, but the host didn't collect it in time ... so that some > data was lost. > > Unfortunately, that's the type of fault that's especially hard to > recover from. Fair enough. However, it still seems particularly idiotic to - penalize everybody - mix up two totally unrelated areas (cpufreq and USB) for a bug that is extremely rare and could be handled differently. For example, if it really ends up being practically impossible to recover from split transaction errors, I would still suggest reverting that horrid commit, and then just black-listing the known-broken EHCI controllers and simply not *do* any split transactions on them. That way there's no complexity. As far as I know, split transactions aren't required anyway, they are just a performance optimization. Basically, we not only know that the commit has caused problems, it's fundamentally ugly, fragile, and not very maintainable, and the whole reason for doing it is pretty dubious. Why not just admit that certain hardware is broken (and the vendor isn't worth even bothering to be polite with, since they try to screw us every chance they get) and cannot reliably do split transactions? Problem solved, no real downside, and nobody will even *notice*. Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions 2007-08-21 4:15 ` Linus Torvalds @ 2007-08-21 4:48 ` David Brownell 2007-08-21 5:31 ` Linus Torvalds 0 siblings, 1 reply; 29+ messages in thread From: David Brownell @ 2007-08-21 4:48 UTC (permalink / raw) To: Linus Torvalds Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML, Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner On Monday 20 August 2007, Linus Torvalds wrote: > > On Mon, 20 Aug 2007, David Brownell wrote: > > > > MMF basically means the "Transaction Translating" (TT) hub had data > > for the host, but the host didn't collect it in time ... so that some > > data was lost. > > > > Unfortunately, that's the type of fault that's especially hard to > > recover from. > > Fair enough. However, it still seems particularly idiotic to > - penalize everybody > - mix up two totally unrelated areas (cpufreq and USB) for a bug that is > extremely rare and could be handled differently. Yes on #1, but on #2 ... frequency transitions are a common place for systems to want to hiccup. Maybe less so on PCs, but it's hard to say that re-clocking an I/O or memory bus shouldn't affect the peripherals using it for "realtime" (deadlines) I/O !! My more complete response suggested maybe just vetoing cpufreq transitions if the Broadcom chipset (or maybe it's just specific boards?) finds itself in the awkward configuration ... penalizing only the people we know could have trouble. > For example, if it really ends up being practically impossible to recover > from split transaction errors, I would still suggest reverting that horrid > commit, and then just black-listing the known-broken EHCI controllers and > simply not *do* any split transactions on them. That way there's no > complexity. > > As far as I know, split transactions aren't required anyway, they are just > a performance optimization. Nope. Linus, this is at least the second or third time you've been wrong -- sorry. 
But I wish you were right, since they're such a PITA to cope with. ;) Split transactions are how the full and low speed devices bridge to high speed busses. Think of the TT hub as a speed converter, buffering data and then retransmitting it at the other (slower or faster) speed. Some systems don't even have a full/low speed host adapter ... they just have a high speed root hub and rely on some external TT hubs (maybe on a mainboard) to handle the rest. > Basically, we not only know that the commit has caused problems, it's > fundamentally ugly, fragile, and not very maintainable, and the whole > reason for doing it is pretty dubious. > > Why not just admit that certain hardware is broken (and the vendor isn't > worth even bothering to be polite with, since they try to screw us every > chance they get) and cannot reliably do split transactions? Problem > solved, no real downside, and nobody will even *notice*. Well, I suggested an alternate fix that I hope Stuart will look at. I think it achieves your goals (only impacting Broadcom systems). - Dave > > Linus > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions 2007-08-21 4:48 ` David Brownell @ 2007-08-21 5:31 ` Linus Torvalds 2007-08-21 5:51 ` Arjan van de Ven 2007-08-21 6:03 ` Linus Torvalds 0 siblings, 2 replies; 29+ messages in thread From: Linus Torvalds @ 2007-08-21 5:31 UTC (permalink / raw) To: David Brownell Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML, Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner On Mon, 20 Aug 2007, David Brownell wrote: > On Monday 20 August 2007, Linus Torvalds wrote: > > > > Fair enough. However, it still seems particularly idiotic to > > - penalize everybody > > - mix up two totally unrelated areas (cpufreq and USB) for a bug that is > > extremely rare and could be handled differently. > > Yes on #1, but on #2 ... frequency transitions are a common place > for systems to want to hiccup. I disagree - it's extremely rare. We've probably had more *software* problems with cpufreq than we've had real hardware problems (ie all the locking for cpufreq has been pretty painful). It's also something that probably depends a lot on the particular CPU. Some CPU's have very short PLL relocking, and continue running while the voltage is changed. Others seem to stop for much longer times. This is not something that the USB layer should stick its fingers in - because quite frankly, it really doesn't have a clue. > Maybe less so on PCs, but it's hard to say that re-clocking an I/O or > memory bus shouldn't affect the peripherals using it for "realtime" > (deadlines) I/O !! Normally the memory bus isn't reclocked (it's *possible* to do, but it's complex and can be quite fragile). I think the issue was just that the CPU itself was reclocked, and had long latencies for probing the cache. Not unlike a sleep state: there can be long DMA latencies just from the CPU being in S1! 
So adding this special case for CPU frequency is not at all unlike adding a special case saying that the CPU cannot go into "halt" state, because the DMA latency is too high. Have we done that? Yes. We actually had a "no-hlt" kernel command line flag that literally disabled halting the CPU, because it apparently caused problems for some floppy disk setups (and yes, the main reasonable explanation was some bad DMA interaction, we never figured it out). So it might be much better if we instead re-introduced that kind of "DMA latency requirement", and letting different subsystems react to that as they may. It really can affect more than just cpufreq - I would not be in the *least* surprised if C3 latencies and other things can cause these things too! But even within cpufreq, it's quite likely to hit certain situations more than others. (Of course, if C3 latencies are high on a MB that has known DMA latency issues, you'd hope that the BIOS has simply disabled C3 entirely in the ACPI tables) > transitions if the Broadcom chipset (or maybe it's just specific > boards?) finds itself in the awkward configuration ... penalizing > only the people we know could have trouble. Yes, that would be more acceptable, I think. It is also quite likely that this is not a generic cpufreq issue, but one that happens with just a certain class of CPU's - ie some particular CPU that is just slower than others at re-clocking. Just disabling things blindly on cpufreq events, when what it actually wants to do is say "I need low DMA latency", and then the cpu-freq layer (which can know about these things) may decide internally that it knows that a particular setup is not able to have low-latency DMA during frequency relocking. On other - saner - CPU's, the frequency relock may take a fraction of the time, and the CPU is running perfectly the whole time - and it would _not_ be affected. > > As far as I know, split transactions aren't required anyway, they are just > > a performance optimization. 
> > Nope. Linus, this is at least the second or third time you've > been wrong -- sorry. But I wish you were right, since they're > such a PITA to cope with. ;) > > Split transactions are how the full and low speed devices bridge > to high speed busses. Think of the TT hub as a speed converter, > buffering data and then retransmitting it at the other (slower or > faster) speed. Some systems don't even have a full/low speed host > adapter ... they just have a high speed root hub and rely on some > external TT hubs (maybe on a mainboard) to handle the rest. Ok. But in the meantime, I really think we should just revert the code that causes a known regression. Because, quite frankly, you may not like VIA, but in the bigger picture, VIA has been a hell of a lot better than Broadcom. Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions 2007-08-21 5:31 ` Linus Torvalds @ 2007-08-21 5:51 ` Arjan van de Ven 2007-08-21 6:04 ` Arjan van de Ven 2007-08-21 6:25 ` Linus Torvalds 2007-08-21 6:03 ` Linus Torvalds 1 sibling, 2 replies; 29+ messages in thread From: Arjan van de Ven @ 2007-08-21 5:51 UTC (permalink / raw) To: Linus Torvalds Cc: David Brownell, Michal Piotrowski, linux-usb-devel, Greg KH, LKML, Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner > Have we done that? Yes. We actually had a "no-hlt" kernel command line > flag that literally disabled halting the CPU, because it apparently caused > problems for some floppy disk setups (and yes, the main reasonable > explanation was some bad DMA interaction, we never figured it out). > > So it might be much better if we instead re-introduced that kind of "DMA > latency requirement", and letting different subsystems react to that as > they may. wait.... we HAVE that infrastructure .. see kernel/latency.c ... > It really can affect more than just cpufreq - I would not be in the > *least* surprised if C3 latencies and other things can cause these things > too! But even within cpufreq, it's quite likely to hit certain situations > more than others. and kernel/latency.c was designed EXACTLY for that reason. All the USB layer has to do is to announce its latency requirement like this: /* * Some broadcom chips are buggy and can't take more than 5 usec as DMA * latency; inform the rest of kernel of this. */ if (weird_broadcom_chip()) set_acceptable_latency("ehci", 5); and the C-state code will honor it. CPUFREQ doesn't honor it yet but that's easy to add.. 
(this assumes the ACPI BIOS informs us correctly about the cpu behavior, but that's the best we can do obviously unless you want a table inside the kernel keyed off vendor/model/stepping) -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org ^ permalink raw reply [flat|nested] 29+ messages in thread
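Note on the mechanism Arjan describes: each subsystem registers a named latency requirement, and consumers such as the C-state code look at the tightest one currently in force. The following is a minimal userspace model of that idea, assuming a small fixed table; it is not the code in kernel/latency.c, and every name in it (model_set_latency, model_remove_latency, model_system_constraint, NO_CONSTRAINT_US) is hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CONSTRAINTS 8
/* Stand-in for "nobody has asked for anything"; the value is an assumption. */
#define NO_CONSTRAINT_US 1000000

struct constraint {
    const char *id;   /* name of the requester, e.g. "ehci" */
    int usecs;        /* longest latency the requester can tolerate */
    int used;
};

static struct constraint table[MAX_CONSTRAINTS];

/* Register a named requirement, or update it if the name already exists.
 * Returns 0 on success, -1 if the table is full. */
static int model_set_latency(const char *id, int usecs)
{
    struct constraint *free_slot = NULL;
    size_t i;

    for (i = 0; i < MAX_CONSTRAINTS; i++) {
        if (table[i].used && strcmp(table[i].id, id) == 0) {
            table[i].usecs = usecs;
            return 0;
        }
        if (!table[i].used && free_slot == NULL)
            free_slot = &table[i];
    }
    if (free_slot == NULL)
        return -1;
    free_slot->id = id;
    free_slot->usecs = usecs;
    free_slot->used = 1;
    return 0;
}

/* Drop a named requirement; unknown names are ignored. */
static void model_remove_latency(const char *id)
{
    size_t i;

    for (i = 0; i < MAX_CONSTRAINTS; i++)
        if (table[i].used && strcmp(table[i].id, id) == 0)
            table[i].used = 0;
}

/* The effective system-wide constraint is the tightest registered one. */
static int model_system_constraint(void)
{
    int min = NO_CONSTRAINT_US;
    size_t i;

    for (i = 0; i < MAX_CONSTRAINTS; i++)
        if (table[i].used && table[i].usecs < min)
            min = table[i].usecs;
    return min;
}
```

In this model, registering "ehci" at 5 usec makes 5 the effective constraint even if another requester asks for 50; once "ehci" is removed, the constraint relaxes to 50. The design point is that the USB code only states what it needs, and the code that knows the hardware's actual latencies decides what to do about it.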
* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions 2007-08-21 5:51 ` Arjan van de Ven @ 2007-08-21 6:04 ` Arjan van de Ven 2007-08-21 6:26 ` Linus Torvalds 2007-08-21 6:25 ` Linus Torvalds 1 sibling, 1 reply; 29+ messages in thread From: Arjan van de Ven @ 2007-08-21 6:04 UTC (permalink / raw) To: Linus Torvalds Cc: David Brownell, Michal Piotrowski, linux-usb-devel, Greg KH, LKML, Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner On Mon, 2007-08-20 at 22:51 -0700, Arjan van de Ven wrote: > and the C-state code will honor it. CPUFREQ doesn't honor it yet but > that's easy to add.. untested patch to add this to cpufreq; this is probably a good idea in general even if using the latency framework doesn't end up being used for fixing this regression... --- linux-2.6.23-rc2/drivers/cpufreq/cpufreq.c.org 2007-08-20 22:58:32.000000000 -0700 +++ linux-2.6.23-rc2/drivers/cpufreq/cpufreq.c 2007-08-20 23:02:21.000000000 -0700 @@ -1604,6 +1604,12 @@ static int __cpufreq_set_policy(struct c if (ret) goto error_out; + + if (system_latency_constraint() < policy->cpuinfo.transition_latency) { + ret = -EINVAL; + goto error_out; + } + /* notification of the new policy */ blocking_notifier_call_chain(&cpufreq_policy_notifier_list, CPUFREQ_NOTIFY, policy); ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions 2007-08-21 6:04 ` Arjan van de Ven @ 2007-08-21 6:26 ` Linus Torvalds 2007-08-21 6:28 ` Arjan van de Ven 0 siblings, 1 reply; 29+ messages in thread From: Linus Torvalds @ 2007-08-21 6:26 UTC (permalink / raw) To: Arjan van de Ven Cc: David Brownell, Michal Piotrowski, linux-usb-devel, Greg KH, LKML, Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner On Mon, 20 Aug 2007, Arjan van de Ven wrote: > > untested patch to add this to cpufreq; this is probably a good idea in > general even if using the latency framework doesn't end up being used > for fixing this regression... > > > --- linux-2.6.23-rc2/drivers/cpufreq/cpufreq.c.org 2007-08-20 22:58:32.000000000 -0700 > +++ linux-2.6.23-rc2/drivers/cpufreq/cpufreq.c 2007-08-20 23:02:21.000000000 -0700 > @@ -1604,6 +1604,12 @@ static int __cpufreq_set_policy(struct c > if (ret) > goto error_out; > > + > + if (system_latency_constraint() < policy->cpuinfo.transition_latency) { That looks broken. "system_latency_constraint()" is in us, but transition_latency is in ns, afaik. But adding a "/ 1000" to turn the ns into us, and it might even work. Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
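Note on the unit mismatch Linus points out: comparing a microsecond constraint against a nanosecond transition latency rejects almost every transition. The sketch below is a userspace illustration of the comparison before and after the suggested "/ 1000", assuming (as the message says, "afaik") that the constraint is in us and the transition latency in ns. The function names are hypothetical and this is not the patch that was eventually tested.

```c
/* Comparison as in the untested patch: a microsecond value is compared
 * directly against a nanosecond value. Returns 1 if the transition would
 * be allowed, 0 if it would be vetoed. */
static int allowed_unconverted(unsigned int constraint_us,
                               unsigned int transition_latency_ns)
{
    return !(constraint_us < transition_latency_ns);
}

/* Same comparison with the suggested "/ 1000" applied first. Integer
 * division truncates, so 5999 ns counts as 5 us in this sketch. */
static int allowed_converted(unsigned int constraint_us,
                             unsigned int transition_latency_ns)
{
    unsigned int transition_latency_us = transition_latency_ns / 1000;

    return !(constraint_us < transition_latency_us);
}
```

For example, with a 5 us constraint and a 3000 ns (3 us) transition, the unconverted check sees 5 < 3000 and vetoes the change, while the converted check sees 5 < 3 as false and lets it through. A 10000 ns transition is vetoed either way.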
* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-21  6:26 ` Linus Torvalds
@ 2007-08-21  6:28 ` Arjan van de Ven
  2007-08-21  6:45 ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Arjan van de Ven @ 2007-08-21  6:28 UTC
To: Linus Torvalds
Cc: David Brownell, Michal Piotrowski, linux-usb-devel, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner

On Mon, 2007-08-20 at 23:26 -0700, Linus Torvalds wrote:
>
> On Mon, 20 Aug 2007, Arjan van de Ven wrote:
> >
> > untested patch to add this to cpufreq; this is probably a good idea in
> > general even if using the latency framework doesn't end up being used
> > for fixing this regression...
> >
> >
> > --- linux-2.6.23-rc2/drivers/cpufreq/cpufreq.c.org	2007-08-20 22:58:32.000000000 -0700
> > +++ linux-2.6.23-rc2/drivers/cpufreq/cpufreq.c	2007-08-20 23:02:21.000000000 -0700
> > @@ -1604,6 +1604,12 @@ static int __cpufreq_set_policy(struct c
> >  	if (ret)
> >  		goto error_out;
> >
> > +
> > +	if (system_latency_constraint() < policy->cpuinfo.transition_latency) {
>
> That looks broken. "system_latency_constraint()" is in us, but
> transition_latency is in ns, afaik.
>
> But add a "/ 1000" to turn the ns into us, and it might even work.

eh woops yes indeed.
Shows me for not testing; I'll do that tomorrow when I'm more awake

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-21  6:28 ` Arjan van de Ven
@ 2007-08-21  6:45 ` Linus Torvalds
  0 siblings, 0 replies; 29+ messages in thread
From: Linus Torvalds @ 2007-08-21  6:45 UTC
To: Arjan van de Ven
Cc: David Brownell, Michal Piotrowski, linux-usb-devel, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner

On Mon, 20 Aug 2007, Arjan van de Ven wrote:
>
> On Mon, 2007-08-20 at 23:26 -0700, Linus Torvalds wrote:
> >
> > On Mon, 20 Aug 2007, Arjan van de Ven wrote:
> > >
> > > untested patch to add this to cpufreq; this is probably a good idea in
> > > general even if using the latency framework doesn't end up being used
> > > for fixing this regression...
> > >
> > >
> > > --- linux-2.6.23-rc2/drivers/cpufreq/cpufreq.c.org	2007-08-20 22:58:32.000000000 -0700
> > > +++ linux-2.6.23-rc2/drivers/cpufreq/cpufreq.c	2007-08-20 23:02:21.000000000 -0700
> > > @@ -1604,6 +1604,12 @@ static int __cpufreq_set_policy(struct c
> > >  	if (ret)
> > >  		goto error_out;
> > >
> > > +
> > > +	if (system_latency_constraint() < policy->cpuinfo.transition_latency) {
> >
> > That looks broken. "system_latency_constraint()" is in us, but
> > transition_latency is in ns, afaik.
> >
> > But add a "/ 1000" to turn the ns into us, and it might even work.
>
>
> eh woops yes indeed.
> Shows me for not testing; I'll do that tomorrow when I'm more awake

Side note: I think we might also want to have some way of telling the
user *why* we're not doing frequency changes. Maybe as simple as a
rate-limited printk() or something.

Otherwise, we'll easily be in a situation where some poor sod ends up
running constantly at the lowest frequency, with no way of even seeing
why. Which sounds like a debugging nightmare. If the kernel spits out the
occasional warning about the latency violation, at least we get notified
about there being potential problems.

		Linus
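The "rate-limited printk()" idea above might look roughly like the
following userspace sketch. It is an illustration under assumptions,
not the kernel's own rate limiter: struct warn_limiter,
warn_latency_violation() and the message text are all invented here,
and the caller supplies the current time in seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical limiter state: at most one warning per interval. */
struct warn_limiter {
	long interval_sec;	/* minimum seconds between two warnings */
	long last_sec;		/* time of the most recent warning */
	bool warned_before;	/* false until the first warning is issued */
};

/*
 * Report a refused frequency change, but stay quiet if the previous
 * warning was issued less than interval_sec ago.
 * Returns true when a message was actually printed.
 */
static bool warn_latency_violation(struct warn_limiter *rl, long now_sec,
				   unsigned int latency_usec,
				   unsigned int constraint_usec)
{
	assert(rl != NULL);

	if (rl->warned_before && now_sec - rl->last_sec < rl->interval_sec)
		return false;

	rl->warned_before = true;
	rl->last_sec = now_sec;
	fprintf(stderr,
		"cpufreq: transition latency %u us exceeds constraint %u us, "
		"not changing frequency\n",
		latency_usec, constraint_usec);
	return true;
}
```

The point of the limiter is only that the reason for the refused
transitions becomes visible in the log without flooding it on every
governor tick.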
* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions 2007-08-21 5:51 ` Arjan van de Ven 2007-08-21 6:04 ` Arjan van de Ven @ 2007-08-21 6:25 ` Linus Torvalds 2007-08-21 6:24 ` Arjan van de Ven 1 sibling, 1 reply; 29+ messages in thread From: Linus Torvalds @ 2007-08-21 6:25 UTC (permalink / raw) To: Arjan van de Ven Cc: David Brownell, Michal Piotrowski, linux-usb-devel, Greg KH, LKML, Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner On Mon, 20 Aug 2007, Arjan van de Ven wrote: > > > > So it might be much better if we instead re-introduced that kind of "DMA > > latency requirement", and letting different subsystems react to that as > > they may. > > wait.... we HAVE that infrastructure .. see kernel/latency.c ... Heh. Just shows how wellknown that interface is - it seems like it's only used by the ipw2100 driver and "pcm_native". But yes, that looks like the right thing. > and the C-state code will honor it. CPUFREQ doesn't honor it yet but > that's easy to add.. (this assumes the ACPI BIOS informs us correctly > about the cpu behavior, but that's the best we can do obviously unless > you want a table inside the kernel keyed off vendor/model/stepping) Do we actually have the latency information for these things? Especially since I assume a number of people use the specialized direct-hw-access cpufreq drivers.. I realize that we *have* "transition_latency" at the cpufreq layer, and it is supposed to be in ns, but I wonder how likely it is to bear any relationship to reality, considering that I don't think it's really used for anything.. (yeah, it affects the heuristics, but I don't think it has any _hard_ meaning, so I'd worry that it's not necessarily something that people have tried to make accurate). But I dunno. Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions 2007-08-21 6:25 ` Linus Torvalds @ 2007-08-21 6:24 ` Arjan van de Ven 0 siblings, 0 replies; 29+ messages in thread From: Arjan van de Ven @ 2007-08-21 6:24 UTC (permalink / raw) To: Linus Torvalds Cc: David Brownell, Michal Piotrowski, linux-usb-devel, Greg KH, LKML, Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner On Mon, 2007-08-20 at 23:25 -0700, Linus Torvalds wrote: > Do we actually have the latency information for these things? Especially > since I assume a number of people use the specialized direct-hw-access > cpufreq drivers.. > > I realize that we *have* "transition_latency" at the cpufreq layer, and it > is supposed to be in ns, but I wonder how likely it is to bear any > relationship to reality, considering that I don't think it's really used > for anything.. (yeah, it affects the heuristics, but I don't think it has > any _hard_ meaning, so I'd worry that it's not necessarily something that > people have tried to make accurate). trusting the bios to be accurate for all machines is generally a ... well... it's like trusting politicians in election week. But it's sort of the best we got; at the same time, what are the odds that the time is more than an order of magnitude off? if the latency of the cpu is so large that the requirement ehci puts in is orders of magnitude more strict, a bit inaccurate data from the bios doesn't matter all that much. And worst case we make a table with quirks somehow (probably on cpu vendor/model I suppose) -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions 2007-08-21 5:31 ` Linus Torvalds 2007-08-21 5:51 ` Arjan van de Ven @ 2007-08-21 6:03 ` Linus Torvalds 2007-08-21 6:34 ` David Brownell 1 sibling, 1 reply; 29+ messages in thread From: Linus Torvalds @ 2007-08-21 6:03 UTC (permalink / raw) To: David Brownell Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML, Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner On Mon, 20 Aug 2007, Linus Torvalds wrote: > > Ok. But in the meantime, I really think we should just revert the code > that causes a known regression. Side note: one reason I'm interested in this is that my mac mini (now used by the kids) has had a very flaky USB mouse lately. Is it related? I have no idea, and probably not, but as a result I'm very interested in any USB regressions. There's *something* rotten with that mouse, and while it could be the mouse itself going bad, I think it started happening only after updating that machine to 2.6.23-rc1. Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions 2007-08-21 6:03 ` Linus Torvalds @ 2007-08-21 6:34 ` David Brownell 2007-08-21 6:52 ` Linus Torvalds 0 siblings, 1 reply; 29+ messages in thread From: David Brownell @ 2007-08-21 6:34 UTC (permalink / raw) To: Linus Torvalds Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML, Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner On Monday 20 August 2007, Linus Torvalds wrote: > > On Mon, 20 Aug 2007, Linus Torvalds wrote: > > > > Ok. But in the meantime, I really think we should just revert the code > > that causes a known regression. > > Side note: one reason I'm interested in this is that my mac mini (now used > by the kids) has had a very flaky USB mouse lately. Is it related? I have > no idea, and probably not, but as a result I'm very interested in any USB > regressions. There's *something* rotten with that mouse, and while it > could be the mouse itself going bad, I think it started happening only > after updating that machine to 2.6.23-rc1. Try disabling USB_SUSPEND ... the rather aggressive powersave mechanism (autosuspend defaulting to always ON) has made lots of trouble. I think that default will change... - Dave ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions 2007-08-21 6:34 ` David Brownell @ 2007-08-21 6:52 ` Linus Torvalds 2007-08-21 7:24 ` Linus Torvalds 0 siblings, 1 reply; 29+ messages in thread From: Linus Torvalds @ 2007-08-21 6:52 UTC (permalink / raw) To: David Brownell Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML, Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner On Mon, 20 Aug 2007, David Brownell wrote: > > Try disabling USB_SUSPEND ... the rather aggressive powersave > mechanism (autosuspend defaulting to always ON) has made lots > of trouble. I think that default will change... It's already disabled - so that's not it. Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions 2007-08-21 6:52 ` Linus Torvalds @ 2007-08-21 7:24 ` Linus Torvalds 2007-08-22 3:34 ` Linus Torvalds 0 siblings, 1 reply; 29+ messages in thread From: Linus Torvalds @ 2007-08-21 7:24 UTC (permalink / raw) To: David Brownell Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML, Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner On Mon, 20 Aug 2007, Linus Torvalds wrote: > > On Mon, 20 Aug 2007, David Brownell wrote: > > > > Try disabling USB_SUSPEND ... the rather aggressive powersave > > mechanism (autosuspend defaulting to always ON) has made lots > > of trouble. I think that default will change... > > It's already disabled - so that's not it. Side note: after reverting 196705c9bb I can't get the mouse to skip any more on that mac mini. But since the bad behaviour wasn't 100% reliable to begin with, that's not really a guarantee of anything. Two out of three kids are off on camp this week, so that machine probably won't be getting a lot of testing ;/ That's with an all-intel chipset, no VIA or Broadcom anywhere. I do find it interesting that VIA apparently ignores the INACTIVE bit: it implies that Windows probably doesn't use it, which in turn implies that nobody has ever tested it in any real life situation. So I would not be shocked to hear that others have problems too. But as mentioned, the mouse behaviour was flaky (but very noticeable and irritating when it did happen), so it's hard to be sure my quick testing really was convincing. Which is why it's even harder to bisect ;(. Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions 2007-08-21 7:24 ` Linus Torvalds @ 2007-08-22 3:34 ` Linus Torvalds 2007-08-22 14:42 ` Stuart_Hayes 2007-08-22 23:35 ` Junio C Hamano 0 siblings, 2 replies; 29+ messages in thread From: Linus Torvalds @ 2007-08-22 3:34 UTC (permalink / raw) To: David Brownell Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML, Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner On Tue, 21 Aug 2007, Linus Torvalds wrote: > > Side note: after reverting 196705c9bb I can't get the mouse to skip any > more on that mac mini. But since the bad behaviour wasn't 100% reliable to > begin with, that's not really a guarantee of anything. Two out of three > kids are off on camp this week, so that machine probably won't be getting > a lot of testing ;/ Well, my one remaining child said today that "I got so much time on webkinz today - yesterday the mouse locked up after five minutes". Apparently it hadn't had the mouse lock up at all today. So I really do believe that that 196705c9bb commit caused problems on intel-only USB machines too ("ondemand" cpufreq governor, switching between 1.0-1.66 Ghz using acpi-cpufreq: totally bog-standard in all respects, in other words). Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-22  3:34 ` Linus Torvalds
@ 2007-08-22 14:42 ` Stuart_Hayes
  2007-08-22 18:41 ` Linus Torvalds
  2007-08-22 23:35 ` Junio C Hamano
  1 sibling, 1 reply; 29+ messages in thread
From: Stuart_Hayes @ 2007-08-22 14:42 UTC
To: torvalds, david-b
Cc: michal.k.k.piotrowski, linux-usb-devel, gregkh, linux-kernel, akpm, dex

Linus Torvalds wrote:
> On Tue, 21 Aug 2007, Linus Torvalds wrote:
>>
>> Side note: after reverting 196705c9bb I can't get the mouse to skip
>> any more on that mac mini. But since the bad behaviour wasn't 100%
>> reliable to begin with, that's not really a guarantee of anything.
>> Two out of three kids are off on camp this week, so that machine
>> probably won't be getting a lot of testing ;/
>
> Well, my one remaining child said today that "I got so much time on
> webkinz today - yesterday the mouse locked up after five minutes".
>
> Apparently it hadn't had the mouse lock up at all today.
>
> So I really do believe that that 196705c9bb commit caused problems on
> intel-only USB machines too ("ondemand" cpufreq governor, switching
> between 1.0-1.66 GHz using acpi-cpufreq: totally bog-standard in all
> respects, in other words).
>
> 		Linus

If you were running 2.6.23-rc3, that's quite possibly because you didn't
have the follow-up patch that fixed my original patch... it wasn't in
2.6.23-rc3
(http://www.mail-archive.com/linux-usb-devel@lists.sourceforge.net/msg56523.html).
It fixed a bug in my patch; the fix wasn't necessary with Broadcom, but
was with nVidia (and Intel, I believe).

That could definitely cause mouse lock-ups. Sorry, that should have
occurred to me yesterday when you mentioned the problem your kids were
seeing, but it didn't for some reason.

Stuart
* RE: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-22 14:42 ` Stuart_Hayes
@ 2007-08-22 18:41 ` Linus Torvalds
  2007-08-22 20:41 ` Stuart_Hayes
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2007-08-22 18:41 UTC
To: Stuart_Hayes
Cc: david-b, michal.k.k.piotrowski, linux-usb-devel, gregkh, linux-kernel,
	akpm, dex

On Wed, 22 Aug 2007, Stuart_Hayes@Dell.com wrote:
>
> If you were running 2.6.23-rc3, that's quite possibly because you didn't
> have the follow-up patch that fixed my original patch... it wasn't in
> 2.6.23-rc3

Well, I was running "current git", and it's never been there. So not just
a -rc3 issue.

> That could definitely cause mouse lock-ups. Sorry, that should have
> occurred to me yesterday when you mentioned the problem your kids were
> seeing, but it didn't for some reason.

Btw, could it have caused the USB stack to be *really* confused? Some of
those mouse lockups ended up also locking the machine hard (ie no ping, no
nothing), and I'm a bit worried that there was something else going on
too..

That said, if you can actually re-create the MMF problems, could you
please try the patch that Arjan suggested? Ie add a

	/*
	 * Some broadcom chips are buggy and can't take more than 5 usec as DMA
	 * latency; inform the rest of kernel of this.
	 */
	if (weird_broadcom_chip())
		set_acceptable_latency("ehci", 5);

to the USB driver, and then add something like

	static inline int cpufreq_acceptable_latency(struct cpufreq_policy *policy)
	{
		unsigned long latency;

		/* Policy latency in usec */
		latency = policy->cpuinfo.transition_latency / 1000;

		if (latency > system_latency_constraint())
			return -EINVAL;

		return 0;
	}

and then add calls to this from both the "__cpufreq_set_policy()" function
and the "__cpufreq_driver_target()" one too..

That should disable cpufreq with that broken chip, which is perhaps a bit
draconian, but it's certainly better than having the USB layer know about
cpufreq internals directly.

In the longer run, I think we can move the "system_latency_constraint()"
checking from the policy registration into each CPU frequency driver, so
that it could decide more dynamically about "can we do it right _now_"
rather than globally saying "we can't do it with this hardware".

		Linus
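The interplay suggested above (a driver registers a latency
requirement, cpufreq asks "can we do it right _now_") can be modelled
in userspace as follows. This is a sketch under assumptions: the
model_* functions and the fixed-size table are invented for
illustration and only imitate the idea of a named constraint list
whose minimum is the system-wide limit; they are not the
kernel/latency.c implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define MAX_CONSTRAINTS 8

/* Hypothetical table of named latency requirements, in microseconds. */
struct constraint {
	const char *owner;
	unsigned int usecs;
	int in_use;
};

static struct constraint constraints[MAX_CONSTRAINTS];

/* Add or update the requirement of one owner; -1 if the table is full. */
static int model_set_latency(const char *owner, unsigned int usecs)
{
	size_t i, free_slot = MAX_CONSTRAINTS;

	assert(owner != NULL);
	for (i = 0; i < MAX_CONSTRAINTS; i++) {
		if (constraints[i].in_use &&
		    strcmp(constraints[i].owner, owner) == 0) {
			constraints[i].usecs = usecs;
			return 0;
		}
		if (!constraints[i].in_use && free_slot == MAX_CONSTRAINTS)
			free_slot = i;
	}
	if (free_slot == MAX_CONSTRAINTS)
		return -1;
	constraints[free_slot].owner = owner;
	constraints[free_slot].usecs = usecs;
	constraints[free_slot].in_use = 1;
	return 0;
}

/* Drop the requirement of one owner, if it has one. */
static void model_remove_latency(const char *owner)
{
	size_t i;

	assert(owner != NULL);
	for (i = 0; i < MAX_CONSTRAINTS; i++)
		if (constraints[i].in_use &&
		    strcmp(constraints[i].owner, owner) == 0)
			constraints[i].in_use = 0;
}

/* The strictest registered requirement, or UINT_MAX if there is none. */
static unsigned int model_system_constraint(void)
{
	unsigned int min = UINT_MAX;
	size_t i;

	for (i = 0; i < MAX_CONSTRAINTS; i++)
		if (constraints[i].in_use && constraints[i].usecs < min)
			min = constraints[i].usecs;
	return min;
}

/* Per-transition check: convert ns to usec, then compare. */
static int model_transition_ok_now(unsigned int transition_latency_ns)
{
	return transition_latency_ns / 1000 <= model_system_constraint();
}
```

Because the check runs per transition instead of once at policy
registration, frequency changes become possible again as soon as the
owner drops its requirement, which is the "more dynamic" behaviour
described in the last paragraph above.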
* RE: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions 2007-08-22 18:41 ` Linus Torvalds @ 2007-08-22 20:41 ` Stuart_Hayes 0 siblings, 0 replies; 29+ messages in thread From: Stuart_Hayes @ 2007-08-22 20:41 UTC (permalink / raw) To: torvalds Cc: david-b, michal.k.k.piotrowski, linux-usb-devel, gregkh, linux-kernel, akpm, dex >> That could definitely cause mouse lock-ups. Sorry, that should have >> occurred to me yesterday when you mentioned the problem your kids >> were seeing, but it didn't for some reason. > > Btw, could it have caused the USB stack to be *really* confused? Some > of those mouse lockups ended up also locking the machine hard (ie no > ping, no nothing), and I'm a bit worried that there was something > else going on too.. > Unfortunately, yes, that sounds exactly like what happened with the nVidia controller. The problem was with my patch, but was fixed with the later patch. I doubt there was anything else going on. > That said, if you can actually re-create the MMF problems, could you > please try the patch that Arjan suggested? Ie add a > > /* > * Some broadcom chips are buggy and can't take more than 5 usec as > DMA > * latency; inform the rest of kernel of this. > */ > if (weird_broadcom_chip()) > set_acceptable_latency("ehci", 5); > > to the USB driver, and then add something like > > static inline int cpufreq_acceptable_latency(struct cpufreq_policy > *policy) { > unsigned long latency; > > /* Policy latency in usec */ > latency = policy->cpuinfo.transition_latency / 1000; > > if (latency > system_latency_constraint()) > return -EINVAL; > > return 0; > } > > adn then add calls to this from both the "__cpufreq_set_policy()" > function and the "__cpufreq_driver_target()" one too.. > > That should disable cpufreq with that broken chip, which is perhaps a > big draconian, but it's certainly better than having the USB layer > know about cpufreq internals directly. 
> > In the longer run, I think we can move the > "system_latency_constraint()" > checking from the policy registration into each CPU frequency driver, > so that it could be more dynamically decide about "can we do it right > _now_" > rather than globally saying "we can't do it with this hardware". > > Linus I will work on that, thank you for the help. Stuart ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [4/4] 2.6.23-rc3: known regressions 2007-08-22 3:34 ` Linus Torvalds 2007-08-22 14:42 ` Stuart_Hayes @ 2007-08-22 23:35 ` Junio C Hamano 1 sibling, 0 replies; 29+ messages in thread From: Junio C Hamano @ 2007-08-22 23:35 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel Linus Torvalds <torvalds@linux-foundation.org> writes: > Well, my one remaining child said today that "I got so much time on > webkinz today - yesterday the mouse locked up after five minutes". > > Apparently it hadn't had the mouse lock up at all today. > > So I really do believe that that 196705c9bb commit caused problems on > intel-only USB machines too ("ondemand" cpufreq governor, switching > between 1.0-1.66 Ghz using acpi-cpufreq: totally bog-standard in all > respects, in other words). > > Linus Sorry for being way offtopic, but the above message reminded me of commit 869659a6 from git.git repository, also this message: http://www.gelato.unsw.edu.au/archives/git/0607/24208.html By the way, Linus, please let me know if you get this message via vger but not via the direct path to you. I seem to have been getting bounces for mails to you and andrew from my ISP in the past few weeks. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions
  2007-08-21  2:02 ` Linus Torvalds
  2007-08-21  4:02 ` David Brownell
@ 2007-08-21  4:27 ` David Brownell
  1 sibling, 0 replies; 29+ messages in thread
From: David Brownell @ 2007-08-21  4:27 UTC
To: Linus Torvalds
Cc: Michal Piotrowski, linux-usb-devel, Greg KH, LKML,
	Stuart_Hayes@Dell.com, Andrew Morton, Daniel Exner

[ GRR sorry for the premature "SEND" ... mousepads-r-evil ]

On Monday 20 August 2007, Linus Torvalds wrote:
>
> On Mon, 20 Aug 2007, David Brownell wrote:
>
> > On Monday 13 August 2007, Michal Piotrowski wrote:
> > > Subject         : EHCI Regression in 2.6.23-rc2
> > > References      : http://lkml.org/lkml/2007/8/10/81
> > > Last known good : ?
> > > Submitter       : Daniel Exner <dex@dragonslave.de>
> > > Caused-By       : Stuart_Hayes@Dell.com <Stuart_Hayes@Dell.com>
> > >                   commit 196705c9bbc03540429b0f7cf9ee35c2f928a534
> > > Handled-By      : ?
> > > Status          : unknown
> >
> > Fixed I believe by Stuart's patch:
> >
> >   http://marc.info/?l=linux-usb-devel&m=118765934722610&w=2
>
> Quite frankly, I'd personally prefer to just revert commit
> 196705c9bbc03540429b0f7cf9ee35c2f928a534 entirely instead.
>
> The whole dependency on cpufreq seems totally bogus. Would it not be a lot
> more natural to handle the *result* of the problem (ie the MMF errors by
> broken EHCI controllers?) rather than add totally insane workarounds for
> this case to try to hide them in the first place?

MMF basically means the "Transaction Translating" (TT) Hub had data
for the host, but the host didn't collect it in time ... so that
some data was lost. In this context, it's only for periodic transfers,
meaning interrupt (for HID keyboards, mice, etc) or isochronous (mostly
audio or video, but sometimes ATM).

Unfortunately, that's the type of fault that's especially hard to
recover from. Plus, very few of the upper layer drivers have even
a minor clue about fault recovery strategies during I/O ... it's
not supposed to happen for interrupt transfers; one hopes the drivers
will at least die gracefully. With ISO, faults are expected (since
it's "best effort" delivery, time-priority).

On the plus side, MMF errors have been vanishingly rare until this
cpufreq interaction came up ... which of course implies the downside
that those "handle the result" code paths are all but untested.

> There can be *other* delays in reading memory that have nothing to do with
> cpu frequency shifting, and everything to do with extreme situations on the
> bus. If the stupid EHCI controller has some tight latency issues, that's a
> generic problem.

There could be such problems, yes. But in practice, I don't know that
we've ever seen them.

(There's a first time for everything, yes. I *just* fetched a webpage
where an image got overwritten about half way through fetching it.
Top half was today's, bottom half was tomorrow's, update 12 midnight
EST. Strangest looking JPG ever! ;)

> That commit 196705c9bbc03540429b0f7cf9ee35c2f928a534 just exemplifies what
> is wrong with USB, but it does so by adding incredibly ugly code. I'd
> rather not add even *more* ugly code - especially not for a case where we
> then seem to blame the wrong party (ie a VIA controller that didn't need
> the ugly code in the first place).
>
> Serverworks/Broadcom makes totally crap chips (not just in USB) and then
> doesn't even document their buggy crap hardware.

That's pretty much how I feel about VIA's USB stuff: buggy crap that I
actively steer people away from. And that's why it doesn't seem odd to
me to add even more workarounds for VIA-only bugs.

> But that is NOT a reason
> for then making the kernel have buggy crap software in it.

I don't think we always have the option not to cope with broken
hardware. We may have some options about *how* we cope with it
though ...

> So really - is there any reason why we just don't say "Broadcom chips
> suck, and get MMF errors under normal circumstances because they are
> crap". And from *that*, the obvious solution would seem to not be to
> penalize everybody else, but to just say that "We will try to recover from
> MMF errors gracefully by retrying the transaction". Hmm?

Well, see above about why retrying wouldn't work well. Data lost,
and not recoverable ... although if the events are only USB
keyboards/mice, then the user might be able to recover. (Stuart?)

Alternatively: if Broadcom, then just veto cpufreq changes whenever
there are USB interrupt transfers active. (We *can* veto changes
in notifiers, yes?)

- Dave

>
> 		Linus
>
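The "veto cpufreq changes while interrupt transfers are active" idea
above can be pictured with a small userspace model. Whether real
cpufreq notifiers allow an outright veto is exactly the open question
in the message, so this sketch assumes a weaker mechanism: a policy
callback that pins the allowed frequency range to the current
frequency. All names (struct policy_model, pin_if_periodic_active,
clamp_target) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a cpufreq policy: allowed range plus current speed. */
struct policy_model {
	unsigned int min_khz;
	unsigned int max_khz;
	unsigned int cur_khz;
};

/*
 * Policy callback of the (affected) USB host controller driver:
 * while periodic transfers are scheduled, narrow the allowed range
 * to the current frequency so that no transition can be selected.
 */
static void pin_if_periodic_active(struct policy_model *p, int active_periodic)
{
	assert(p != NULL);
	if (active_periodic > 0) {
		p->min_khz = p->cur_khz;
		p->max_khz = p->cur_khz;
	}
}

/* What a governor would end up with after the policy limits are applied. */
static unsigned int clamp_target(const struct policy_model *p,
				 unsigned int wanted_khz)
{
	assert(p != NULL);
	if (wanted_khz < p->min_khz)
		return p->min_khz;
	if (wanted_khz > p->max_khz)
		return p->max_khz;
	return wanted_khz;
}
```

In this model a governor asking for 1.0 GHz on a CPU running at
1.66 GHz gets its request honoured while no periodic transfers are
scheduled, and is held at 1.66 GHz while they are.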
end of thread
Thread overview: 29+ messages
[not found] <46C098FD.1030601@googlemail.com>
2007-08-13 17:59 ` [2/4] 2.6.23-rc3: known regressions Michal Piotrowski
2007-08-13 23:29 ` Luca Tettamanti
2007-08-14 6:37 ` Michal Piotrowski
2007-08-13 17:59 ` [3/4] " Michal Piotrowski
2007-08-14 22:09 ` Francois Romieu
2007-08-13 17:59 ` [4/4] " Michal Piotrowski
2007-08-21 1:41 ` [linux-usb-devel] " David Brownell
2007-08-21 2:02 ` Linus Torvalds
2007-08-21 4:02 ` David Brownell
2007-08-21 4:15 ` Linus Torvalds
2007-08-21 4:48 ` David Brownell
2007-08-21 5:31 ` Linus Torvalds
2007-08-21 5:51 ` Arjan van de Ven
2007-08-21 6:04 ` Arjan van de Ven
2007-08-21 6:26 ` Linus Torvalds
2007-08-21 6:28 ` Arjan van de Ven
2007-08-21 6:45 ` Linus Torvalds
2007-08-21 6:25 ` Linus Torvalds
2007-08-21 6:24 ` Arjan van de Ven
2007-08-21 6:03 ` Linus Torvalds
2007-08-21 6:34 ` David Brownell
2007-08-21 6:52 ` Linus Torvalds
2007-08-21 7:24 ` Linus Torvalds
2007-08-22 3:34 ` Linus Torvalds
2007-08-22 14:42 ` Stuart_Hayes
2007-08-22 18:41 ` Linus Torvalds
2007-08-22 20:41 ` Stuart_Hayes
2007-08-22 23:35 ` Junio C Hamano
2007-08-21 4:27 ` [linux-usb-devel] " David Brownell