* DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?)
       [not found] ` <4B2A530D.3080606@knaff.lu>
@ 2009-12-17 17:00 ` Alain Knaff
  2009-12-17 17:27   ` Linus Torvalds
  0 siblings, 1 reply; 74+ messages in thread
From: Alain Knaff @ 2009-12-17 17:00 UTC
To: markh; +Cc: fdutils, torvalds, linux-kernel

On 17/12/09 16:49, Alain Knaff wrote:
> On 17/12/09 16:43, Mark Hounschell wrote:
>> On 12/17/2009 10:35 AM, Alain Knaff wrote:
>>
>>>> Should I do more work in between?
>>>
>>> No, but make sure to look at track 0... Other tracks will still have the
>>> error, as there was nothing forcing a memory flush between track 0 and 1...
>>
>> Ok track 0
> [...]
>>   0: 0
>>   1: 0
>>   2: 0
>>   3: 4f <--
>>   4: 0
>>   5: 1
>>   6: 2
>> no disk change
>
> Yeah, that's what I meant... So the memory flusher program didn't manage to
> clear up the inconsistency...
>
> So either my theory is wrong, or the memory flusher program was not
> efficient enough.... hmmm, maybe doing some surfing in between the formats,
> or doing another kernel compilation might be a better test.
>
> Alain

Ok, so I had a look at the differences between 2.6.27.41 and 2.6.28, and
there have indeed been changes to the iommu and DMA handling code. So I
suspect that the problem may lie here.

Cc'ed Linus and kernel list on this.

For Linus and the list, here's the summary of what we are observing:

- A DMA transfer of a memory block transfers the wrong value for the
  first byte of the block. All other bytes of the block are transferred
  correctly.
  The value of the first byte turns out to be the value that this byte
  held during the *previous* transfer. Just as if there was some kind of
  cache, and the transfer started before that cache was refreshed with
  the new values from main memory.

Example:
1. initial contents: 33 44 55 66
2. one DMA transfer is performed
3. program changes buffer to: 77 88 99 aa
4. new DMA transfer is performed => instead it transmits 33 88 99 aa
   (i.e. first byte is from previous contents)

This used to work in 2.6.27.41, but broke in 2.6.28. It doesn't happen on
all hardware though.

It does indeed seem to be related to a DMA-side cache (rather than the
processor's cache not being flushed to main memory), as doing lots of
memory intensive work (kernel compilation) between 2 and 3 doesn't fix the
problem.

In the diff between 2.6.27.41 and 2.6.28, I noticed a lot of changes in
arch/x86/kernel/amd_iommu.c and related files, could any of these have
triggered this behavior?

Any ideas, anybody?

Alain
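The four-step example in the summary can be written down as a tiny model. This is only a sketch of the reported symptom, assuming the stale byte is always byte 0 of the buffer; it does not model any hardware, and the function names are made up for illustration.

```python
def observed_transfer(previous: bytes, current: bytes) -> bytes:
    """Model the reported symptom: byte 0 still holds the value it had
    during the previous transfer, every other byte is up to date."""
    if not previous or not current:
        return current
    return previous[:1] + current[1:]


def has_stale_first_byte(previous: bytes, current: bytes, seen: bytes) -> bool:
    """True if `seen` equals `current` except that byte 0 comes from `previous`."""
    return (
        len(previous) > 0
        and len(current) > 0
        and len(seen) == len(current)
        and seen[1:] == current[1:]
        and seen[0] == previous[0]
        and seen[0] != current[0]
    )


first = bytes.fromhex("33445566")    # step 1: initial buffer contents
second = bytes.fromhex("778899aa")   # step 3: buffer after the program rewrites it
seen = observed_transfer(first, second)

# Prints the bytes from step 4 of the example: 33 88 99 aa
print(" ".join(f"{b:02x}" for b in seen))
```

A check like has_stale_first_byte() is roughly what looking at the first byte of track 0 after a format amounts to: everything matches the new buffer except the first byte, which matches the old one.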
* Re: DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?)
From: Linus Torvalds @ 2009-12-17 17:27 UTC
To: Alain Knaff; +Cc: markh, fdutils, linux-kernel

On Thu, 17 Dec 2009, Alain Knaff wrote:
>
> 1. initial contents: 33 44 55 66
> 2. one DMA transfer is performed
> 3. program changes buffer to: 77 88 99 aa
> 4. new DMA transfer is performed => instead it transmits 33 88 99 aa
>    (i.e. first byte is from previous contents)
>
> This used to work in 2.6.27.41, but broke in 2.6.28 . It doesn't happen on
> all hardware though.

Do you have a list of hardware it works on? Especially chipsets.

On x86, where all caches are supposed to be totally coherent (except for
I$ under very special circumstances), the above should never be able to
happen. At least not unless there is really buggy hardware involved.

> It does indeed seem to be related to a DMA-side cache (rather than the
> processor's cache not being flushed to main memory), as doing lots of
> memory intensive work (kernel compilation) between 2 and 3 doesn't fix the
> problem.

I'm not entirely surprised. Actual CPU bugs are pretty rare in the x86
world. But chipset bugs? Another thing entirely. There are buffers and
caches there, and those are sometimes software-visible. The most obvious
case of that is just the IOMMU's themselves, but from your description I
don't think you actually change the DMA _mappings_ do you? Just the
actual buffer (that was then mapped earlier)?

So I don't think it's the IOMMU code itself necessarily, although an IOMMU
may well be involved (eg I could easily see a few cachelines worth of
actual DMA data caching going on in the whole IOMMU too)

And to some degree the floppy driver might be _more_ likely to see some
kinds of bugs, because it uses that crazy legacy DMA engine. So it's not
going to go through the regular PCI DMA hardware paths, it's going to go
through its own special paths that nobody else uses any more (and thus has
probably not had as much testing).

> In the diff between 2.6.27.41 and 2.6.28, I noticed a lot of changes in
> arch/x86/kernel/amd_iommu.c and related files, could any of these have
> triggered this behavior?

Could it have triggered? Sure. Chipset caches are often flushed by certain
trivial operations (often the caches are small, and operations like "any
PIO access" will make sure they are flushed). Different IOMMU flush
patterns could easily account for it.

But I think we'd like to see a list of hardware where this can be
triggered, and quite frankly, a 'git bisect' would be absolutely wonderful
especially if the list of hardware is not showing any really obvious
patterns (and I assume they aren't all _that_ obvious, or you'd have
mentioned them).

		Linus
* Re: DMA cache consistency bug introduced in 2.6.28
From: Krzysztof Halasa @ 2009-12-17 18:21 UTC
To: Linus Torvalds; +Cc: Alain Knaff, markh, fdutils, linux-kernel

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On x86, where all caches are supposed to be totally coherent (except for
> I$ under very special circumstances),

BTW SWIOTLB is a non-coherent "cache" in some sense, though I'd be
surprised if it's related. Anyway mentioning $CPU and $RAM at the very
least would be a good idea in such cases.
--
Krzysztof Halasa
* Re: DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?)
From: Alain Knaff @ 2009-12-17 20:46 UTC
To: Linus Torvalds; +Cc: markh, fdutils, linux-kernel

Linus Torvalds wrote:
>
> On Thu, 17 Dec 2009, Alain Knaff wrote:
>> 1. initial contents: 33 44 55 66
>> 2. one DMA transfer is performed
>> 3. program changes buffer to: 77 88 99 aa
>> 4. new DMA transfer is performed => instead it transmits 33 88 99 aa
>>    (i.e. first byte is from previous contents)
>>
>> This used to work in 2.6.27.41, but broke in 2.6.28 . It doesn't happen on
>> all hardware though.
>
> Do you have a list of hardware it works on? Especially chipsets.

For the moment, I have a very small sample of hardware:
1. One machine which works (my own): Athlon XP 1800+ processor
2. One which doesn't work (Mark's)

I might get access to a wider sample of boxen in a week or so, in order
to do some stats.

What's the easiest way to find out the chipset?

Here's already the output of lspci from my machine (works):

00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8235 PCI Bridge
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 50)
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)
01:00.0 VGA compatible controller: nVidia Corporation NV17 [GeForce4 MX 440] (rev a3)

[...]
> I'm not entirely surprised. Actual CPU bugs are pretty rare in the x86
> world. But chipset bugs? Another thing entirely. There are buffers and
> caches there, and those are sometimes software-visible. The most obvious
> case of that is just the IOMMU's themselves, but from your description I
> don't think you actually change the DMA _mappings_ do you? Just the
> actual buffer (that was then mapped earlier)?

No, I don't change any DMA mappings. And the buffer is still the same
physical buffer, at the same physical address.

(It happens during formatting the floppy drive: here the first byte
happens to be the trackid of the first physical sector of the track, and
it always ends up being the track of the *previously* formatted track).

> So I don't think it's the IOMMU code itself necessarily, although an IOMMU
> may well be involved (eg I could easily see a few cachelines worth of
> actual DMA data caching going on in the whole IOMMU too)
>
> And to some degree the floppy driver might be _more_ likely to see some
> kinds of bugs, because it uses that crazy legacy DMA engine. So it's not

Indeed, most other drivers use "bus master" DMA, that doesn't use the
legacy DMA controller at all, but use DMA controllers hosted on the device
itself...

> going to go through the regular PCI DMA hardware paths, it's going to go
> through its own special paths that nobody else uses any more (and thus has
> probably not had as much testing).
>
>> In the diff between 2.6.27.41 and 2.6.28, I noticed a lot of changes in
>> arch/x86/kernel/amd_iommu.c and related files, could any of these have
>> triggered this behavior?
>
> Could it have triggered? Sure. Chipset caches are often flushed by certain
> trivial operations (often the caches are small, and operations like "any
> PIO access" will make sure they are flushed). Different IOMMU flush
> patterns could easily account for it.
>
> But I think we'd like to see a list of hardware where this can be
> triggered,

We'll get a list of 2 machines relatively quickly (unless other people
would like to chime in: the test is easy, just fdformat a floppy disk),
and more in a week or so.

> and quite frankly, a 'git bisect' would be absolutely wonderful

How exactly would I use this (command line sample)?

> especially if the list of hardware is not showing any really obvious
> patterns (and I assume they aren't all _that_ obvious, or you'd have
> mentioned them).
>
> 		Linus

Thanks,

Alain
* Re: DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?)
From: Linus Torvalds @ 2009-12-17 21:14 UTC
To: Alain Knaff; +Cc: markh, fdutils, linux-kernel

On Thu, 17 Dec 2009, Alain Knaff wrote:
>
> For the moment, I have a very small sample of hardware:
> 1. One machine which works (my own): Athlon XP 1800+ processor
> 2. One which doesn't work (Mark's)

Ok. I don't think I even have any machines with floppy drives any more
(one external USB drive somewhere gathering dust just in case I ever
encounter a floppy again).

> I might get access to a wider sample of boxen in a week or so, in order
> to do some stats.

Ok, I was more thinking "we have a bugzilla with ten different people
reporting this". If it's just a single machine, that's not going to be
relevant.

> What's the easiest way to find out the chipset?
>
> Here's already the output of lspci from my machine (works):
>
> 00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge
> 00:01.0 PCI bridge: VIA Technologies, Inc. VT8235 PCI Bridge
> 00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge

Yeah, lspci (and generally only the northbridge and southbridge matters,
the "ISA bridge" might technically be relevant, but since it's universally
on the same die as the southbridge, I left it in there just for kicks).

> (It happens during formatting the floppy drive: here the first byte
> happens to be the trackid of the first physical sector of the track, and
> it always ends up being the track of the *previously* formatted track).

I guess it could simply be a floppy controller bug too, triggered by some
random timing difference or innocuous-looking change.

> > But I think we'd like to see a list of hardware where this can be
> > triggered,
>
> We'll get a list of 2 machines relatively quickly (unless other people
> would like to chime in: the test is easy, just fdformat a floppy disk),
> and more in a week or so.

Only the "it doesn't work on xyz" is likely interesting. The machines it
works on are probably uninteresting statistically.

> > and quite frankly, a 'git bisect' would be absolutely wonderful
>
> How exactly would I use this (command line sample)?

You'd need a git tree that contains both the working and non-working
versions, and then literally just do

	git bisect start
	git bisect good <known good version number here>
	git bisect bad <known bad version here>

and it will give you a commit to try. Compile, test, see if it's good or
bad, and do

	git bisect [good|bad]

depending on the result. Rinse and repeat (depending on how tight the
initial good/bad commits were, it will need 10-15 kernel tests).

So in this case, since apparently 2.6.27.41 is good, and 2.6.28 is not, it
would be something like this:

	# clone hpa's tree that has all the stable releases in one place
	git clone git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-allstable.git

	cd linux-2.6-allstable
	git bisect start
	git bisect bad v2.6.28
	git bisect good v2.6.27.41

and off you go.

NOTE! Bisection depends very much on the bug being 100% reproducible. If
you ever mark a good kernel bad (because you messed up) or a bad kernel
good (because the bug wasn't 100% reproducible, so you _thought_ it was
good even though the bug was present and just happened to hide), the end
result of the bisect will be totally unreliable and seriously screwed up.

So after a successful bisect, it is usually a good idea to try to go back
to the original known-bad kernel, and then revert the commit that was
indicated as the bad one (assuming the revert works - it could be that the
bad one ends up being fundamental to other commits after it), and test
that yes, that really fixes the bug.

It gets more complicated if the bisect hits kernels that you can't test
because they have _unrelated_ issues on that machine (compile failures or
just other bugs that hide the actual floppy behavior), but generally
bisection is pretty simple. "man git-bisect" does have some extra
pointers.

So git bisect may be somewhat time-consuming and mindless, but for
reliably triggering bugs where nobody really knows what caused the bug it
is a _really_ convenient thing to do. The only thing you need is a
reliably triggering test-case, and some time.

		Linus
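The warning about wrongly marked kernels can be illustrated with a toy model. The sketch below is not how git works internally (real kernel history is a graph with merges, not a straight line); it assumes a linear history of N commits and a made-up culprit index, and simply binary-searches for the first failing commit.

```python
def bisect(n_commits, is_bad):
    """Binary-search a linear history for the first bad commit.

    Commit 0 is known good and commit n_commits - 1 is known bad.
    Returns (first_bad_index, number_of_tests).
    """
    good, bad = 0, n_commits - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(mid):
            bad = mid      # the bug is at mid or earlier
        else:
            good = mid     # the bug was introduced after mid
    return bad, tests


N = 5000          # hypothetical number of commits between good and bad
CULPRIT = 3210    # hypothetical commit that introduced the bug


def reliable(commit):
    """Every commit from CULPRIT onward fails the test."""
    return commit >= CULPRIT


def flaky(commit):
    """Same as reliable(), except that one bad kernel is reported as good."""
    if commit == 3749:     # the bug "happened to hide" on this test run
        return False
    return commit >= CULPRIT


print(bisect(N, reliable))   # finds commit 3210
print(bisect(N, flaky))      # ends up on a commit unrelated to the bug
```

With a reliable test the search lands on the culprit in about a dozen tests. With a single wrong "good" mark, every later step searches a range that no longer contains the culprit, so the commit it finally reports has nothing to do with the bug. That is why reverting the reported commit on the known-bad kernel is a useful cross-check.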
* Re: DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?)
From: Alain Knaff @ 2009-12-17 22:11 UTC
To: Linus Torvalds; +Cc: markh, fdutils, linux-kernel

Linus Torvalds wrote:
>
> On Thu, 17 Dec 2009, Alain Knaff wrote:
>> For the moment, I have a very small sample of hardware:
>> 1. One machine which works (my own): Athlon XP 1800+ processor
>> 2. One which doesn't work (Mark's)
>
> Ok. I don't think I even have any machines with floppy drives any more
> (one external USB drive somewhere gathering dust just in case I ever
> encounter a floppy again).

Well, on my new box, I have no floppy drive either. The one I mentioned is
an old machine that I kept around just in case I needed to debug
floppy-related problems.

>> I might get access to a wider sample of boxen in a week or so, in order
>> to do some stats.
>
> Ok, I was more thinking "we have a bugzilla with ten different people
> reporting this". If it's just a single machine, that's not going to be
> relevant.

We do have a bugzilla
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=548434 , but
unfortunately it has only 2 people so far having seen the bug, one of
which (ael) turned out to be a false alert (dusty drive).

>
>> What's the easiest way to find out the chipset?
>>
>> Here's already the output of lspci from my machine (works):
>>
>> 00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge
>> 00:01.0 PCI bridge: VIA Technologies, Inc. VT8235 PCI Bridge
>> 00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
>
> Yeah, lspci (and generally only the northbridge and southbridge matters,
> the "ISA bridge" might technically be relevant, but since it's universally
> on the same die as the southbridge, I left it in there just for kicks).

Good. Here's some info about some machines of Mark which do have the
problem (there's more than one, fortunately):

1st one showing the problem (claimed to be AMD 790x chipset):

00:00.0 Host bridge: ATI Technologies Inc RD790 Northbridge only dual slot PCI-e_GFX and HT3 K8 part
00:02.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (external gfx0 port A)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller

2nd one showing the problem (also claimed to be AMD 790x chipset):

00:00.0 Host bridge: Advanced Micro Devices [AMD] RS780 Host Bridge
00:01.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (int gfx)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller

He also has several machines that do work:

1st one that does work:

00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07)

... and a couple more where he didn't get around to test.

[...]
> Only the "it doesn't work on xyz" is likely interesting. The machines it
> works on are probably uninteresting statistically.

I understand... (working machine above just mentioned for completeness'
sake).

[...]
> You'd need a git tree that contains both the working and non-working
> versions, and then literally just do
>
> 	git bisect start
> 	git bisect good <known good version number here>
> 	git bisect bad <known bad version here>
>
> and it will give you a commit to try. Compile, test, see if it's good or
> bad, and do
>
> 	git bisect [good|bad]
>
> depending on the result. Rinse and repeat (depending on how tight the
> initial good/bad commits were, it will need 10-15 kernel tests).

... and how do I check out the most recent good / oldest bad kernel for
compilation?

> So in this case, since apparently 2.6.27.41 is good, and 2.6.28 is not, it
> would be something like this:
>
> 	# clone hpa's tree that has all the stable releases in one place
> 	git clone git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-allstable.git
>
> 	cd linux-2.6-allstable
> 	git bisect start
> 	git bisect bad v2.6.28
> 	git bisect good v2.6.27.41
>
> and off you go.

ok...

> NOTE! Bisection depends very much on the bug being 100% reproducible. If
> you ever mark a good kernel bad (because you messed up) or a bad kernel
> good (because the bug wasn't 100% reproducible, so you _thought_ it was
> good even though the bug was present and just happened to hide), the end
> result of the bisect will be totally unreliable and seriously screwed up.
>
> So after a successful bisect, it is usually a good idea to try to go back
> to the original known-bad kernel, and then revert the commit that was
> indicated as the bad one (assuming the revert works - it could be that the
> bad one ends up being fundamental to other commits after it), and test
> that yes, that really fixes the bug.

What command lines would I use for that revert?

> It gets more complicated if the bisect hits kernels that you can't test
> because they have _unrelated_ issues on that machine (compile failures or
> just other bugs that hide the actual floppy behavior), but generally
> bisection is pretty simple. "man git-bisect" does have some extra
> pointers.
>
> So git bisect may be somewhat time-consuming and mindless, but for
> reliably triggering bugs where nobody really knows what caused the bug it
> is a _really_ convenient thing to do. The only thing you need is a
> reliably triggering test-case, and some time.
>
> 		Linus

Alain
* Re: DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?)
From: Linus Torvalds @ 2009-12-17 22:43 UTC
To: Alain Knaff; +Cc: markh, fdutils, linux-kernel

On Thu, 17 Dec 2009, Alain Knaff wrote:
> [...]
> > You'd need a git tree that contains both the working and non-working
> > versions, and then literally just do
> >
> > 	git bisect start
> > 	git bisect good <known good version number here>
> > 	git bisect bad <known bad version here>
> >
> > and it will give you a commit to try. Compile, test, see if it's good or
> > bad, and do
> >
> > 	git bisect [good|bad]
> >
> > depending on the result. Rinse and repeat (depending on how tight the
> > initial good/bad commits were, it will need 10-15 kernel tests).
>
> ... and how do I check out the most recent good / oldest bad kernel for
> compilation?

'git bisect' does all that for you. You don't need to check out the
kernels you mark good or bad - git will just calculate the commit graphs,
and pick a commit that is in the "middle" between them, and check out that
commit.

> > So after a successful bisect, it is usually a good idea to try to go back
> > to the original known-bad kernel, and then revert the commit that was
> > indicated as the bad one (assuming the revert works - it could be that the
> > bad one ends up being fundamental to other commits after it), and test
> > that yes, that really fixes the bug.
>
> What command lines would I use for that revert?

	git revert <sha1-that-git-bisect-reported>

but even if that revert isn't successful, just the bisection result will
be very interesting (assuming it all looks sane, of course - as mentioned,
sometimes bisect results get screwed up because the bug isn't entirely
reproducible due to timing etc).

		Linus
* Re: DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?)
From: Alain Knaff @ 2009-12-17 23:24 UTC
To: Linus Torvalds, markh; +Cc: fdutils, linux-kernel

Linus Torvalds wrote:
>
> On Thu, 17 Dec 2009, Alain Knaff wrote:
>> [...]
>>> You'd need a git tree that contains both the working and non-working
>>> versions, and then literally just do
>>>
>>> 	git bisect start
>>> 	git bisect good <known good version number here>
>>> 	git bisect bad <known bad version here>
>>>
>>> and it will give you a commit to try. Compile, test, see if it's good or
>>> bad, and do
>>>
>>> 	git bisect [good|bad]
>>>
>>> depending on the result. Rinse and repeat (depending on how tight the
>>> initial good/bad commits were, it will need 10-15 kernel tests).
>> ... and how do I check out the most recent good / oldest bad kernel for
>> compilation?
>
> 'git bisect' does all that for you. You don't need to check out the
> kernels you mark good or bad - git will just calculate the commit graphs,
> and pick a commit that is in the "middle" between them, and check out that
> commit.
>
>>> So after a successful bisect, it is usually a good idea to try to go back
>>> to the original known-bad kernel, and then revert the commit that was
>>> indicated as the bad one (assuming the revert works - it could be that the
>>> bad one ends up being fundamental to other commits after it), and test
>>> that yes, that really fixes the bug.
>> What command lines would I use for that revert?
>
> 	git revert <sha1-that-git-bisect-reported>
>
> but even if that revert isn't successful, just the bisection result will
> be very interesting (assuming it all looks sane, of course - as mentioned,
> sometimes bisect results get screwed up because the bug isn't entirely
> reproducible due to timing etc).
>
> 		Linus

Thanks for these explanations, that makes it clearer indeed.

Now, I only need to find a machine locally to test this on. Or Mark: are
you confident in doing this yourself?

Thanks,

Alain
* Re: DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?)
From: Mark Hounschell @ 2009-12-18 8:59 UTC
To: Alain Knaff; +Cc: Linus Torvalds, markh, fdutils, linux-kernel

On 12/17/2009 06:24 PM, Alain Knaff wrote:
>
> Now, I only need to find a machine locally to test this on. Or Mark: are
> you confident in doing this yourself?
>

I'll give it a shot. Sounds easy enough. If I have problems, I'll yell.

Mark
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
From: Mark Hounschell @ 2009-12-18 10:55 UTC
To: Alain Knaff; +Cc: Mark Hounschell, Linus Torvalds, linux-kernel, fdutils

On 12/18/2009 03:59 AM, Mark Hounschell wrote:
> On 12/17/2009 06:24 PM, Alain Knaff wrote:
>
>>
>> Now, I only need to find a machine locally to test this on. Or Mark: are
>> you confident in doing this yourself?
>>
>
> I'll give it a shot. Sounds easy enough. If I have problems, I'll yell.
>

Ok, I ran into a build issue on the third one.

harley:/usr/src # git clone git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-allstable.git
Initialized empty Git repository in /usr/src/linux-2.6-allstable/.git/
remote: Counting objects: 1486248, done.
remote: Compressing objects: 100% (248092/248092), done.
Receiving objects: 100% (1486248/1486248), 323.35 MiB | 6753 KiB/s, done.
remote: Total 1486248 (delta 1236282), reused 1476516 (delta 1227133)
Resolving deltas: 100% (1236282/1236282), done.
Checking out files: 100% (31502/31502), done.

harley:/usr/src # cd linux-2.6-allstable
harley:/usr/src/linux-2.6-allstable # git bisect start
harley:/usr/src/linux-2.6-allstable # git bisect bad v2.6.28
harley:/usr/src/linux-2.6-allstable # git bisect good v2.6.27.41
Bisecting: a merge base must be tested
[3fa8749e584b55f1180411ab1b51117190bac1e5] Linux 2.6.27

Build and test kernel:
This one worked so:

harley:/usr/src/linux-2.6-allstable # git bisect good
Bisecting: 4879 revisions left to test after this (roughly 12 steps)
[c813b4e16ead3c3df98ac84419d4df2adf33fe01] Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6

Build and test kernel:
This one worked so:

harley:/usr/src/linux-2.6-allstable # git bisect good
Bisecting: 2443 revisions left to test after this (roughly 11 steps)
[db563fc2e80534f98c7f9121a6f7dfe41f177a79] Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6

This one doesn't build:

  CC [M]  fs/ext3/super.o
fs/ext3/super.c: In function ‘ext3_quota_on’:
fs/ext3/super.c:2839: error: ‘nd’ undeclared (first use in this function)
fs/ext3/super.c:2839: error: (Each undeclared identifier is reported only once
fs/ext3/super.c:2839: error: for each function it appears in.)
make[2]: *** [fs/ext3/super.o] Error 1
make[1]: *** [fs/ext3] Error 2
make: *** [fs] Error 2

I haven't yet determined whether I can, but if I were to modify the tree
now to fix this, would that screw up the bisect process?

Regards
Mark
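The "roughly 12 steps" and "roughly 11 steps" figures in the bisect output above are consistent with simple halving: each test discards about half of the remaining revisions, so the number of tests grows with the base-2 logarithm of the revision count. The sketch below only checks that arithmetic; it is not the estimator git itself uses, and halvings() is a made-up helper name.

```python
import math


def halvings(revisions: int) -> int:
    """Whole number of times `revisions` can be halved before reaching 1."""
    return math.floor(math.log2(revisions))


for revisions in (4879, 2443):
    print(f"{revisions} revisions: log2 = {math.log2(revisions):.2f}, "
          f"about {halvings(revisions)} halvings")
```

Both counts from the log sit a little above a power of two (4096 and 2048 respectively), which is why the estimate drops by exactly one step after a single test.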
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
From: Krzysztof Halasa @ 2009-12-18 15:01 UTC
To: dmarkh; +Cc: Alain Knaff, Mark Hounschell, Linus Torvalds, linux-kernel, fdutils

Mark Hounschell <dmarkh@cfl.rr.com> writes:

> harley:/usr/src/linux-2.6-allstable # git bisect good
> Bisecting: 2443 revisions left to test after this (roughly 11 steps)
> [db563fc2e80534f98c7f9121a6f7dfe41f177a79] Merge
> git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
>
> This one doesn't build:
>
>   CC [M]  fs/ext3/super.o
> fs/ext3/super.c: In function ‘ext3_quota_on’:
> fs/ext3/super.c:2839: error: ‘nd’ undeclared (first use in this function)
> fs/ext3/super.c:2839: error: (Each undeclared identifier is reported only once
> fs/ext3/super.c:2839: error: for each function it appears in.)
> make[2]: *** [fs/ext3/super.o] Error 1
> make[1]: *** [fs/ext3] Error 2
> make: *** [fs] Error 2
>
> I haven't yet determined that I can but, if I were to make a modification to the
> tree now to fix this would that screw up the bisect process?

It won't, in such cases. But you can also

	git reset --hard another_commit_id

(while doing git bisect) if it fixes this problem (e.g. some next commit).

And you can skip uninteresting parts of the tree when starting git bisect
(though if the cause is in skipped parts, the results will be meaningless).
--
Krzysztof Halasa
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
From: Linus Torvalds @ 2009-12-18 15:22 UTC
To: Mark Hounschell; +Cc: Alain Knaff, Mark Hounschell, linux-kernel, fdutils

On Fri, 18 Dec 2009, Mark Hounschell wrote:
>
> This one doesn't build:
>
>   CC [M]  fs/ext3/super.o
> fs/ext3/super.c: In function ‘ext3_quota_on’:
> fs/ext3/super.c:2839: error: ‘nd’ undeclared (first use in this function)
> fs/ext3/super.c:2839: error: (Each undeclared identifier is reported only once
> fs/ext3/super.c:2839: error: for each function it appears in.)
> make[2]: *** [fs/ext3/super.o] Error 1
> make[1]: *** [fs/ext3] Error 2
> make: *** [fs] Error 2
>
> I haven't yet determined that I can but, if I were to make a modification to the
> tree now to fix this would that screw up the bisect process?

You can safely fix unrelated problems without screwing up the bisection.
And in this case you can be pretty sure that this is unrelated, so it's
all ok.

The fix for that silly problem is

	-	path_put(&nd.path);
	+	path_put(&path);

(it's due to a silent merge failure - it merged cleanly, but semantics had
changed in a branch and impacted code that was newly introduced in another
branch).

		Linus
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
From: Mark Hounschell @ 2009-12-18 15:28 UTC
To: Linus Torvalds; +Cc: Mark Hounschell, Alain Knaff, linux-kernel, fdutils

On 12/18/2009 10:22 AM, Linus Torvalds wrote:
>
>
> On Fri, 18 Dec 2009, Mark Hounschell wrote:
>>
>> This one doesn't build:
>>
>> CC [M] fs/ext3/super.o
>> fs/ext3/super.c: In function ‘ext3_quota_on’:
>> fs/ext3/super.c:2839: error: ‘nd’ undeclared (first use in this function)
>> fs/ext3/super.c:2839: error: (Each undeclared identifier is reported only once
>> fs/ext3/super.c:2839: error: for each function it appears in.)
>> make[2]: *** [fs/ext3/super.o] Error 1
>> make[1]: *** [fs/ext3] Error 2
>> make: *** [fs] Error 2
>>
>> I haven't yet determined that I can but, if I were to make a modification to the
>> tree now to fix this would that screw up the bisect process?
>
> You can safely fix unrelated problems without screwing up the bisection.
> And in this case you can be pretty sure that this is unrelated, so it's
> all ok.
>
> The fix for that silly problem is
>
>     - path_put(&nd.path);
>     + path_put(&path);
>
> (it's due to a silent merge failure - it merged cleanly, but semantics had
> changed in a branch and impacted code that was newly introduced in another
> branch).

Yep, thanks. I'm past that now. But haven't done a bisect [good|bad] on the
results of that one yet. Did you see Alain's email response to my bisect
progress report to him?

I'm still at a loss as to how to proceed?

Mark
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
From: Linus Torvalds @ 2009-12-18 15:45 UTC
To: Mark Hounschell; +Cc: Mark Hounschell, Alain Knaff, linux-kernel, fdutils

On Fri, 18 Dec 2009, Mark Hounschell wrote:
>
> Yep, thanks. I'm past that now. But haven't done a bisect [good|bad] on the
> results of that one yet. Did you see Alain's email response to my bisect
> progress report to him?
>
> I'm still at a loss as to how to proceed?

Ahh, the HPET issue.

That one is actually very interesting information, because we've had
problems with HPET before. But what I would suggest is to try to continue
to bisect with HPET enabled (to see the problem), and the commit that you
couldn't even boot with HPET enabled you should not count as good or bad
because you just don't know.

You can do "git bisect skip" to make git know that some particular commit
is not a commit you can test, and you can also move away from a whole
problematic region to another area by doing

    git bisect visualize

to bring up a graphical gitk view of what all you have left to bisect,
pick a good point (still _reasonably_ close to the middle) there, and do

    git reset --hard <the-point-you-want-to-test>

and try that kernel instead of the one git bisect suggested.

But this floppy DMA inconsistency being somehow HPET-related is
interesting in itself. One thing that HPET does is to obviously change how
we read the time - and what that can cause (totally indirectly) is that
now we don't touch the southbridge with IO accesses nearly as much,
because we are no longer going to the old 8253 PIT, which touches the same
legacy chip support that implements the floppy controller itself.

So it's entirely possible that the reason a non-HPET setup doesn't show
this is that the accesses to the i8253 PIT part will "synchronize" the old
floppy controller too, and hide some issue.

But still, I assume you had HPET enabled in 2.6.27, so it would be
interesting to see exactly when the problem starts.

        Linus
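The skip workflow described above can be modelled in a few lines. This is only a toy sketch of the idea, not git's actual algorithm: it assumes a linear history (oldest first) in which every commit before the culprit is good and every commit from the culprit onward is bad. The function name bisect_first_bad and the commit labels are made up for illustration.

```python
def bisect_first_bad(commits, test):
    """Toy bisection over a linear history, oldest commit first.

    commits[0] is assumed known-good and commits[-1] known-bad.
    test(commit) returns "good", "bad", or "skip" (commit cannot be tested,
    e.g. it does not build or does not boot).
    Returns the list of candidate first-bad commits; more than one entry
    means skipped commits left the answer ambiguous.
    """
    lo, hi = 0, len(commits) - 1   # lo: newest known good, hi: oldest known bad
    skipped = set()
    while True:
        # Commits between the two known ends that we may still test.
        candidates = [i for i in range(lo + 1, hi) if i not in skipped]
        if not candidates:
            break
        mid = (lo + hi) // 2
        # Test the still-testable commit closest to the middle.
        pick = min(candidates, key=lambda i: abs(i - mid))
        verdict = test(commits[pick])
        if verdict == "good":
            lo = pick
        elif verdict == "bad":
            hi = pick
        else:
            skipped.add(pick)
    # Everything after the last known good, up to the first known bad.
    return commits[lo + 1:hi + 1]


if __name__ == "__main__":
    history = [f"c{i}" for i in range(10)]   # hypothetical commit labels

    def verdict(commit):
        n = int(commit[1:])
        if n == 5:
            return "skip"                    # pretend c5 does not boot
        return "good" if n < 6 else "bad"

    print(bisect_first_bad(history, verdict))
```

When the untestable commit sits right next to the culprit, the result is a range rather than a single commit, which is why it is worth fixing up an unbootable commit by hand when that is cheap.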
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
From: Mark Hounschell @ 2009-12-18 20:04 UTC
To: Linus Torvalds; +Cc: Mark Hounschell, Alain Knaff, linux-kernel, fdutils

On 12/18/2009 10:45 AM, Linus Torvalds wrote:
>
>
> On Fri, 18 Dec 2009, Mark Hounschell wrote:
>>
>> Yep, thanks. I'm past that now. But haven't done a bisect [good|bad] on the
>> results of that one yet. Did you see Alain's email response to my bisect
>> progress report to him?
>>
>> I'm still at a loss as to how to proceed?
>
> Ahh, the HPET issue.
>
> That one is actually very interesting information, because we've had
> problems with HPET before. But what I would suggest is to try to continue
> to bisect with HPET enabled (to see the problem), and the commit that you
> couldn't even boot with HPET enabled you should not count as good or bad
> because you just don't know.
>
> You can do "git bisect skip" to make git know that some particular commit
> is not a commit you can test, and you can also move away from a whole
> problematic region to another area by doing
>
>     git bisect visualize
>
> to bring up a graphical gitk view of what all you have left to bisect,
> pick a good point (still _reasonably_ close to the middle) there, and do
>
>     git reset --hard <the-point-you-want-to-test>
>
> and try that kernel instead of the one git bisect suggested.
>
> But this floppy DMA inconsistency being somehow HPET-related is
> interesting in itself. One thing that HPET does is to obviously change how
> we read the time - and what that can cause (totally indirectly) is that
> now we don't touch the southbridge with IO accesses nearly as much,
> because we are no longer going to the old 8253 PIT, which touches the same
> legacy chip support that implements the floppy controller itself.
>
> So it's entirely possible that the reason a non-HPET setup doesn't show
> this is that the accesses to the i8253 PIT part will "synchronize" the old
> floppy controller too, and hide some issue.
>
> But still, I assume you had HPET enabled in 2.6.27, so it would be
> interesting to see exactly when the problem starts.
>
>         Linus
>

It looks like I may have to back up and first find the points that, let me,
and stop me, booting with the HPET enabled. Before I change direction, can
the git-bisect start sequence use the SHA1 id for the starting 'goods' and
'bads'? I don't see reference to that in the doc.

Thanks
Mark
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
From: Linus Torvalds @ 2009-12-18 20:15 UTC
To: Mark Hounschell; +Cc: Mark Hounschell, Alain Knaff, linux-kernel, fdutils

On Fri, 18 Dec 2009, Mark Hounschell wrote:
>
> It looks like I may have to back up and first find the points that, let me,
> and stop me, booting with the HPET enabled. Before I change direction, can
> the git-bisect start sequence use the SHA1 id for the starting 'goods' and
> 'bads'? I don't see reference to that in the doc.

You can always use a SHA1 id instead of a tag. So when you did

    git bisect good v2.6.17.4

you could always have replaced that "v2.6.17.4" with the SHA1 of the
commit.

In git, the SHA1 ID's are the "real" names - the tags and branch names are
purely for human-readable decoration. Git always turns them into SHA1 id's
internally.

        Linus
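The point that names are only decoration on top of SHA1 IDs can be shown with a toy resolver. This is an illustrative sketch, not how git stores refs: the tag name hpet-boot-fix is hypothetical, the two full IDs are the ones quoted in this thread, and the four-character minimum for abbreviations is an assumption of the sketch.

```python
import re

# Toy ref table. The tag name is hypothetical; the SHA1 is a commit ID
# quoted in this thread.
REFS = {
    "hpet-boot-fix": "5ceb1a04187553e08c6ab60d30cee7c454ee139a",
}

# The only commits this toy "repository" knows about.
KNOWN_COMMITS = [
    "26afe5f2fbf06ea0765aaa316640c4dd472310c0",
    "5ceb1a04187553e08c6ab60d30cee7c454ee139a",
]


def resolve(name):
    """Turn a tag name, a full SHA1, or a unique SHA1 prefix into a full SHA1."""
    if name in REFS:
        return REFS[name]                       # human-readable name -> SHA1
    if re.fullmatch(r"[0-9a-f]{4,40}", name):
        matches = [c for c in KNOWN_COMMITS if c.startswith(name)]
        if len(matches) == 1:
            return matches[0]                   # full ID or unambiguous prefix
        raise ValueError(f"unknown or ambiguous commit id: {name}")
    raise ValueError(f"unknown revision: {name}")


if __name__ == "__main__":
    # All three spellings name the same commit.
    print(resolve("hpet-boot-fix"))
    print(resolve("5ceb1a04"))
    print(resolve("5ceb1a04187553e08c6ab60d30cee7c454ee139a"))
```

Whichever spelling is passed in, everything downstream only ever sees the full ID, which is why a bisect can be started from either form.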
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
From: Mark Hounschell @ 2009-12-22 15:11 UTC
To: Linus Torvalds; +Cc: Mark Hounschell, Alain Knaff, linux-kernel, fdutils

On 12/18/2009 03:15 PM, Linus Torvalds wrote:
>
>
> On Fri, 18 Dec 2009, Mark Hounschell wrote:
>>
>> It looks like I may have to back up and first find the points that, let me,
>> and stop me, booting with the HPET enabled. Before I change direction, can
>> the git-bisect start sequence use the SHA1 id for the starting 'goods' and
>> 'bads'? I don't see reference to that in the doc.
>
> You can always use a SHA1 id instead of a tag. So when you did
>
>     git bisect good v2.6.17.4
>
> you could always have replaced that "v2.6.17.4" with the SHA1 of the
> commit.
>
> In git, the SHA1 ID's are the "real" names - the tags and branch names are
> purely for human-readable decoration. Git always turns them into SHA1 id's
> internally.
>
>         Linus
>

Ok, I may have something that might help.

# git bisect bad
26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
Author: venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com>
Date:   Fri Sep 5 18:02:18 2008 -0700

    x86: HPET_MSI Initialise per-cpu HPET timers

    Initialize a per CPU HPET MSI timer when possible. We retain the HPET
    timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
    setup the remaining HPET timers as per CPU MSI based timers. This per CPU
    timer will eliminate the need for timer broadcasting with IRQ 0 when there
    is non-functional LAPIC timer across CPU deep C-states.

    If there are more CPUs than number of available timers, CPUs that do not
    find any timer to use will continue using LAPIC and IRQ 0 broadcast.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Shaohua Li <shaohua.li@intel.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

:040000 040000 b0a11fa0abdc591427e78236a1f25f26b824140e f2e9b13cf9e2eb7e0fc101660b1e1d499033d78f M arch

And of course this was the first commit that I could not boot if I had hpet
enabled. To get this one to boot (single user mode only) I had to add the
quiet cmdline option and the following patch to arch/x86/kernel/hpet.c, from

commit 5ceb1a04187553e08c6ab60d30cee7c454ee139a

@@ -445,7 +445,7 @@ static int hpet_setup_irq(struct hpet_dev *dev)
 {

        if (request_irq(dev->irq, hpet_interrupt_handler,
-                       IRQF_SHARED|IRQF_NOBALANCING, dev->name, dev))
+                       IRQF_DISABLED|IRQF_NOBALANCING, dev->name, dev))
                return -1;

        disable_irq(dev->irq);

AND add the quiet cmdline option.

Also, of all the machines it does work on with hpets enabled, I don't see
the HPET2 in /proc/interrupts as below.

cat /proc/interrupts
          CPU0      CPU1      CPU2      CPU3
  0:        82         0         3         0  IO-APIC-edge     timer
  1:         0         0      1712         6  IO-APIC-edge     i8042
  3:         0         0         6         0  IO-APIC-edge
  4:         0         0         6         0  IO-APIC-edge
  6:         0         0         4         0  IO-APIC-edge     floppy
  8:         0         0        60         0  IO-APIC-edge     rtc0
  9:         0         0         0         0  IO-APIC-fasteoi  acpi
 12:         0         0     37798       179  IO-APIC-edge     i8042
 14:         0         0     16462        71  IO-APIC-edge     pata_atiixp
 15:         0         0      5713        17  IO-APIC-edge     pata_atiixp
 16:         0         0       904         2  IO-APIC-fasteoi  aic79xx, ohci_hcd:usb2, ohci_hcd:usb4, HDA Intel, ni-pci-gpib
 17:         0         0         2         0  IO-APIC-fasteoi  ehci_hcd:usb1, parport0, ni-pci-gpib
 18:         0         0     49940        90  IO-APIC-fasteoi  ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, nvidia
 19:         0         0       703         2  IO-APIC-fasteoi  aic7xxx, ehci_hcd:usb3, ttySLG0, eth1
 22:         0         0      1303        15  IO-APIC-fasteoi  ahci

 24:    261763         0         0         0  HPET_MSI-edge    hpet2

 29:         0         0       220         5  PCI-MSI-edge     sky2@pci:0000:04:00.0
NMI:         0         0         0         0  Non-maskable interrupts
LOC:       138    271356    264446    261050  Local timer interrupts
SPU:         0         0         0         0  Spurious interrupts
PMI:         0         0         0         0  Performance monitoring interrupts
PND:         0         0         0         0  Performance pending work
RES:      4511      9275      8470      8086  Rescheduling interrupts
CAL:      3624      8666       523      4543  Function call interrupts
TLB:       981      1111      1065      1058  TLB shootdowns
ERR:         0
MIS:         0

Regards
Mark
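Checking several machines for the hpet2 line by eye is error-prone, so a small parser may help. The following is a rough sketch under assumptions: it expects the usual layout (a CPU header row, then "label: counts... description"), the helper names parse_interrupts and hpet_msi_irqs are invented here, and SAMPLE holds only a few rows copied from the listing in this message.

```python
# A few rows copied from the /proc/interrupts listing in this message.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  0:         82          0          3          0   IO-APIC-edge      timer
  6:          0          0          4          0   IO-APIC-edge      floppy
 24:     261763          0          0          0   HPET_MSI-edge     hpet2
LOC:        138     271356     264446     261050   Local timer interrupts
ERR:          0
"""


def parse_interrupts(text):
    """Return {label: (per_cpu_counts, description)} for each row."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())          # header row: one column per CPU
    rows = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        fields = rest.split()
        counts = []
        for field in fields:
            # Leading numeric fields are counters; rows such as ERR have fewer.
            if len(counts) < ncpus and field.isdigit():
                counts.append(int(field))
            else:
                break
        rows[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return rows


def hpet_msi_irqs(rows):
    """Labels of rows whose interrupt type starts with HPET_MSI."""
    return [label for label, (_, desc) in rows.items()
            if desc.startswith("HPET_MSI")]


if __name__ == "__main__":
    rows = parse_interrupts(SAMPLE)
    print(hpet_msi_irqs(rows))
    print(rows["6"])
```

On a live system the text would come from reading /proc/interrupts instead of SAMPLE; an empty result from hpet_msi_irqs would correspond to the machines where no hpet2 line shows up.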
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
From: Linus Torvalds @ 2009-12-22 17:38 UTC
To: Mark Hounschell
Cc: Mark Hounschell, Alain Knaff, Linux Kernel Mailing List, fdutils, Venkatesh Pallipadi, Shaohua Li, Ingo Molnar

[ Ingo, Venki and Shaohua added to cc: see the whole thread on lkml for
  details, but Mark is basically chasing down a situation where the floppy
  driver seems to have trouble formatting floppies, and it happened
  between 2.6.27 and .28. The trouble seems to be that a DMA transfer of a
  memory block transfers the wrong value for the first byte of the block.

  Which should be impossible, but whatever. Some part of the system has a
  cached buffer that isn't flushed.

  What gets _you_ guys involved is that Mark cannot reproduce the bug if
  HPET is disabled in the BIOS or by using 'nohpet'. He found that out by
  pure luck while bisecting, because some time during his bisect, his
  machine wouldn't even boot with HPET.

  So the problem is: with HPET enabled, 2.6.27.4 _used_ to work. But
  2.6.28 (and current -git) does not. Any ideas? ]

On Tue, 22 Dec 2009, Mark Hounschell wrote:
>
> Ok, I may have something that might help.
>
> # git bisect bad
> 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
> commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
> Author: venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com>
> Date:   Fri Sep 5 18:02:18 2008 -0700
>
>     x86: HPET_MSI Initialise per-cpu HPET timers
>
>     Initialize a per CPU HPET MSI timer when possible. We retain the HPET
>     timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
>     setup the remaining HPET timers as per CPU MSI based timers. This per CPU
>     timer will eliminate the need for timer broadcasting with IRQ 0 when there
>     is non-functional LAPIC timer across CPU deep C-states.
>
>     If there are more CPUs than number of available timers, CPUs that do not
>     find any timer to use will continue using LAPIC and IRQ 0 broadcast.
>
>     Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
>     Signed-off-by: Shaohua Li <shaohua.li@intel.com>
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
>
> And of course this was the first commit that I could not boot if I had hpet
> enabled. To get this one to boot (single user mode only) I had to add the
> quiet cmdline option and the following patch to arch/x86/kernel/hpet.c, from
>
> commit 5ceb1a04187553e08c6ab60d30cee7c454ee139a
>
> @@ -445,7 +445,7 @@ static int hpet_setup_irq(struct hpet_dev *dev)
>  {
>
>         if (request_irq(dev->irq, hpet_interrupt_handler,
> -                       IRQF_SHARED|IRQF_NOBALANCING, dev->name, dev))
> +                       IRQF_DISABLED|IRQF_NOBALANCING, dev->name, dev))
>                 return -1;
>
>         disable_irq(dev->irq);
>
> AND add the quiet cmdline option.

Ok, so we know why HPET didn't boot for you, and that was fixed later (by
that 5ceb1a04). But is this also when the floppy started mis-behaving?

IOW, _if_ you boot with that fix from commit 5ceb1a04 (and the quiet
option - I wonder what that is about: do you have any ideas?), is the
per-CPU HPET timer commit also the commit that causes floppy problems, or
is this purely a "bisect when HPET became a boot-up problem"?

        Linus

---
> Also, of all the machines it does work on with hpets enabled, I don't see
> the HPET2 in /proc/interrupts as below.
>
>
> cat /proc/interrupts
>           CPU0      CPU1      CPU2      CPU3
>   0:        82         0         3         0  IO-APIC-edge     timer
>   1:         0         0      1712         6  IO-APIC-edge     i8042
>   3:         0         0         6         0  IO-APIC-edge
>   4:         0         0         6         0  IO-APIC-edge
>   6:         0         0         4         0  IO-APIC-edge     floppy
>   8:         0         0        60         0  IO-APIC-edge     rtc0
>   9:         0         0         0         0  IO-APIC-fasteoi  acpi
>  12:         0         0     37798       179  IO-APIC-edge     i8042
>  14:         0         0     16462        71  IO-APIC-edge     pata_atiixp
>  15:         0         0      5713        17  IO-APIC-edge     pata_atiixp
>  16:         0         0       904         2  IO-APIC-fasteoi  aic79xx, ohci_hcd:usb2, ohci_hcd:usb4, HDA Intel, ni-pci-gpib
>  17:         0         0         2         0  IO-APIC-fasteoi  ehci_hcd:usb1, parport0, ni-pci-gpib
>  18:         0         0     49940        90  IO-APIC-fasteoi  ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, nvidia
>  19:         0         0       703         2  IO-APIC-fasteoi  aic7xxx, ehci_hcd:usb3, ttySLG0, eth1
>  22:         0         0      1303        15  IO-APIC-fasteoi  ahci
>
>  24:    261763         0         0         0  HPET_MSI-edge    hpet2
>
>  29:         0         0       220         5  PCI-MSI-edge     sky2@pci:0000:04:00.0
> NMI:         0         0         0         0  Non-maskable interrupts
> LOC:       138    271356    264446    261050  Local timer interrupts
> SPU:         0         0         0         0  Spurious interrupts
> PMI:         0         0         0         0  Performance monitoring interrupts
> PND:         0         0         0         0  Performance pending work
> RES:      4511      9275      8470      8086  Rescheduling interrupts
> CAL:      3624      8666       523      4543  Function call interrupts
> TLB:       981      1111      1065      1058  TLB shootdowns
> ERR:         0
> MIS:         0
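The "wrong first byte" symptom summarised above can be stated as a simple check. The sketch below is only a model of the symptom, not a reproduction of the bug: the buffers are the four-byte example from Alain's summary at the top of the thread, and the function name stale_positions is invented for illustration.

```python
def stale_positions(previous, current, observed):
    """Offsets where the transferred byte matches the previous buffer
    contents instead of the current ones."""
    return [
        offset
        for offset, (old, new, seen) in enumerate(zip(previous, current, observed))
        if seen != new and seen == old
    ]


if __name__ == "__main__":
    previous = bytes.fromhex("33445566")   # buffer during the first transfer
    current = bytes.fromhex("778899aa")    # buffer as rewritten by the program
    observed = bytes.fromhex("338899aa")   # what the second transfer delivered
    print(stale_positions(previous, current, observed))
```

A result containing only offset 0 is the signature reported in this thread; stale bytes at other offsets would point to a different failure pattern.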
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
From: Mark Hounschell @ 2009-12-22 17:57 UTC
To: Linus Torvalds
Cc: Mark Hounschell, Alain Knaff, Linux Kernel Mailing List, fdutils, Venkatesh Pallipadi, Shaohua Li, Ingo Molnar

On 12/22/2009 12:38 PM, Linus Torvalds wrote:
>
> [ Ingo, Venki and Shaohua added to cc: see the whole thread on lkml for
>   details, but Mark is basically chasing down a situation where the floppy
>   driver seems to have trouble formatting floppies, and it happened
>   between 2.6.27 and .28. The trouble seems to be that a DMA transfer of a
>   memory block transfers the wrong value for the first byte of the block.
>
>   Which should be impossible, but whatever. Some part of the system has a
>   cached buffer that isn't flushed.
>
>   What gets _you_ guys involved is that Mark cannot reproduce the bug if
>   HPET is disabled in the BIOS or by using 'nohpet'. He found that out by
>   pure luck while bisecting, because some time during his bisect, his
>   machine wouldn't even boot with HPET.
>
>   So the problem is: with HPET enabled, 2.6.27.4 _used_ to work. But
>   2.6.28 (and current -git) does not. Any ideas? ]
>
> On Tue, 22 Dec 2009, Mark Hounschell wrote:
>>
>> Ok, I may have something that might help.
>>
>> # git bisect bad
>> 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
>> commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
>> Author: venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com>
>> Date:   Fri Sep 5 18:02:18 2008 -0700
>>
>>     x86: HPET_MSI Initialise per-cpu HPET timers
>>
>>     Initialize a per CPU HPET MSI timer when possible. We retain the HPET
>>     timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
>>     setup the remaining HPET timers as per CPU MSI based timers. This per CPU
>>     timer will eliminate the need for timer broadcasting with IRQ 0 when there
>>     is non-functional LAPIC timer across CPU deep C-states.
>>
>>     If there are more CPUs than number of available timers, CPUs that do not
>>     find any timer to use will continue using LAPIC and IRQ 0 broadcast.
>>
>>     Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
>>     Signed-off-by: Shaohua Li <shaohua.li@intel.com>
>>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
>>
>> And of course this was the first commit that I could not boot if I had hpet
>> enabled. To get this one to boot (single user mode only) I had to add the
>> quiet cmdline option and the following patch to arch/x86/kernel/hpet.c, from
>>
>> commit 5ceb1a04187553e08c6ab60d30cee7c454ee139a
>>
>> @@ -445,7 +445,7 @@ static int hpet_setup_irq(struct hpet_dev *dev)
>>  {
>>
>>         if (request_irq(dev->irq, hpet_interrupt_handler,
>> -                       IRQF_SHARED|IRQF_NOBALANCING, dev->name, dev))
>> +                       IRQF_DISABLED|IRQF_NOBALANCING, dev->name, dev))
>>                 return -1;
>>
>>         disable_irq(dev->irq);
>>
>> AND add the quiet cmdline option.
>
> Ok, so we know why HPET didn't boot for you, and that was fixed later (by
> that 5ceb1a04). But is this also when the floppy started mis-behaving?
>

Commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is when the floppy stops
working and also when I could no longer boot with hpet enabled.

Commit 5ceb1a04 is where I found I could boot again with the hpet enabled.
It was a simple patch, so I backported it to where I was in order to be
able to boot with hpet on.

I did 2 different bisects: the first to find out when I could boot again
with hpet on, then the next to find which commit caused the floppy problem,
using the patch from the first bisect (5ceb1a04) while doing the second
bisect.

> IOW, _if_ you boot with that fix from commit 5ceb1a04 (and the quiet
> option - I wonder what that is about: do you have any ideas?), is the
> per-CPU HPET timer commit also the commit that causes floppy problems, or
> is this purely a "bisect when HPET became a boot-up problem"?
>

The quiet option was only needed because, with that 5ceb1a04 commit applied
to the kernels I was interested in, kernel messages of some kind went on
for hours and I could not get a login prompt. They went by so fast, and I
didn't have a serial console available to see them. They must not have been
too important or critical, because the machine acted as normal as any
machine in single user mode. But once I got to a single user login prompt
it was for sure the same floppy problem.

>
> ---
>> Also, of all the machines it does work on with hpets enabled, I don't see
>> the HPET2 in /proc/interrupts as below.
>>
>>
>> cat /proc/interrupts
>>           CPU0      CPU1      CPU2      CPU3
>>   0:        82         0         3         0  IO-APIC-edge     timer
>>   1:         0         0      1712         6  IO-APIC-edge     i8042
>>   3:         0         0         6         0  IO-APIC-edge
>>   4:         0         0         6         0  IO-APIC-edge
>>   6:         0         0         4         0  IO-APIC-edge     floppy
>>   8:         0         0        60         0  IO-APIC-edge     rtc0
>>   9:         0         0         0         0  IO-APIC-fasteoi  acpi
>>  12:         0         0     37798       179  IO-APIC-edge     i8042
>>  14:         0         0     16462        71  IO-APIC-edge     pata_atiixp
>>  15:         0         0      5713        17  IO-APIC-edge     pata_atiixp
>>  16:         0         0       904         2  IO-APIC-fasteoi  aic79xx, ohci_hcd:usb2, ohci_hcd:usb4, HDA Intel, ni-pci-gpib
>>  17:         0         0         2         0  IO-APIC-fasteoi  ehci_hcd:usb1, parport0, ni-pci-gpib
>>  18:         0         0     49940        90  IO-APIC-fasteoi  ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, nvidia
>>  19:         0         0       703         2  IO-APIC-fasteoi  aic7xxx, ehci_hcd:usb3, ttySLG0, eth1
>>  22:         0         0      1303        15  IO-APIC-fasteoi  ahci
>>
>>  24:    261763         0         0         0  HPET_MSI-edge    hpet2
>>
>>  29:         0         0       220         5  PCI-MSI-edge     sky2@pci:0000:04:00.0
>> NMI:         0         0         0         0  Non-maskable interrupts
>> LOC:       138    271356    264446    261050  Local timer interrupts
>> SPU:         0         0         0         0  Spurious interrupts
>> PMI:         0         0         0         0  Performance monitoring interrupts
>> PND:         0         0         0         0  Performance pending work
>> RES:      4511      9275      8470      8086  Rescheduling interrupts
>> CAL:      3624      8666       523      4543  Function call interrupts
>> TLB:       981      1111      1065      1058  TLB shootdowns
>> ERR:         0
>> MIS:         0
>

Regards
Mark
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
From: Pallipadi, Venkatesh @ 2009-12-22 23:37 UTC
To: markh@compro.net
Cc: Linus Torvalds, Mark Hounschell, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar

On Tue, 2009-12-22 at 09:57 -0800, Mark Hounschell wrote:
> On 12/22/2009 12:38 PM, Linus Torvalds wrote:
> >
> > [ Ingo, Venki and Shaohua added to cc: see the whole thread on lkml for
> >   details, but Mark is basically chasing down a situation where the floppy
> >   driver seems to have trouble formatting floppies, and it happened
> >   between 2.6.27 and .28. The trouble seems to be that a DMA transfer of a
> >   memory block transfers the wrong value for the first byte of the block.
> >
> >   Which should be impossible, but whatever. Some part of the system has a
> >   cached buffer that isn't flushed.
> >
> >   What gets _you_ guys involved is that Mark cannot reproduce the bug if
> >   HPET is disabled in the BIOS or by using 'nohpet'. He found that out by
> >   pure luck while bisecting, because some time during his bisect, his
> >   machine wouldn't even boot with HPET.
> >
> >   So the problem is: with HPET enabled, 2.6.27.4 _used_ to work. But
> >   2.6.28 (and current -git) does not. Any ideas? ]
> >
> > On Tue, 22 Dec 2009, Mark Hounschell wrote:
> >>
> >> Ok, I may have something that might help.
> >>
> >> # git bisect bad
> >> 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
> >> commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
> >> Author: venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com>
> >> Date:   Fri Sep 5 18:02:18 2008 -0700
> >>
> >>     x86: HPET_MSI Initialise per-cpu HPET timers
> >>
> >>     Initialize a per CPU HPET MSI timer when possible. We retain the HPET
> >>     timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
> >>     setup the remaining HPET timers as per CPU MSI based timers. This per CPU
> >>     timer will eliminate the need for timer broadcasting with IRQ 0 when there
> >>     is non-functional LAPIC timer across CPU deep C-states.
> >>
> >>     If there are more CPUs than number of available timers, CPUs that do not
> >>     find any timer to use will continue using LAPIC and IRQ 0 broadcast.
> >>
> >>     Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> >>     Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> >>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> >>
> >> And of course this was the first commit that I could not boot if I had hpet
> >> enabled. To get this one to boot (single user mode only) I had to add the
> >> quiet cmdline option and the following patch to arch/x86/kernel/hpet.c, from
> >>
> >> commit 5ceb1a04187553e08c6ab60d30cee7c454ee139a
> >>
> >> @@ -445,7 +445,7 @@ static int hpet_setup_irq(struct hpet_dev *dev)
> >>  {
> >>
> >>         if (request_irq(dev->irq, hpet_interrupt_handler,
> >> -                       IRQF_SHARED|IRQF_NOBALANCING, dev->name, dev))
> >> +                       IRQF_DISABLED|IRQF_NOBALANCING, dev->name, dev))
> >>                 return -1;
> >>
> >>         disable_irq(dev->irq);
> >>
> >> AND add the quiet cmdline option.
> >
> > Ok, so we know why HPET didn't boot for you, and that was fixed later (by
> > that 5ceb1a04). But is this also when the floppy started mis-behaving?
> >
>
> Commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is when the floppy stops
> working
> and also when I could no longer boot with hpet enabled.

I am missing something here. Is commit 26afe5f2 where the system does not
boot with HPET, or is it where the floppy stops working when you boot
with HPET enabled?

Can you try "idle=halt" with both .27 and .28, with /proc/interrupts
output in each case? With that option, we should be using the local APIC
timer, and PIT, HPET or HPET with MSI should not really matter. Does it
still fail with .28 with that option?

Thanks,
Venki
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
From: Mark Hounschell @ 2009-12-23 0:22 UTC
To: Pallipadi, Venkatesh
Cc: markh@compro.net, Linus Torvalds, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar, Alain Knaff

On 12/22/2009 06:37 PM, Pallipadi, Venkatesh wrote:
> On Tue, 2009-12-22 at 09:57 -0800, Mark Hounschell wrote:
>> On 12/22/2009 12:38 PM, Linus Torvalds wrote:
>>>
>>> [ Ingo, Venki and Shaohua added to cc: see the whole thread on lkml for
>>>   details, but Mark is basically chasing down a situation where the floppy
>>>   driver seems to have trouble formatting floppies, and it happened
>>>   between 2.6.27 and .28. The trouble seems to be that a DMA transfer of a
>>>   memory block transfers the wrong value for the first byte of the block.
>>>
>>>   Which should be impossible, but whatever. Some part of the system has a
>>>   cached buffer that isn't flushed.
>>>
>>>   What gets _you_ guys involved is that Mark cannot reproduce the bug if
>>>   HPET is disabled in the BIOS or by using 'nohpet'. He found that out by
>>>   pure luck while bisecting, because some time during his bisect, his
>>>   machine wouldn't even boot with HPET.
>>>
>>>   So the problem is: with HPET enabled, 2.6.27.4 _used_ to work. But
>>>   2.6.28 (and current -git) does not. Any ideas? ]
>>>
>>> On Tue, 22 Dec 2009, Mark Hounschell wrote:
>>>>
>>>> Ok, I may have something that might help.
>>>>
>>>> # git bisect bad
>>>> 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
>>>> commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
>>>> Author: venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com>
>>>> Date:   Fri Sep 5 18:02:18 2008 -0700
>>>>
>>>>     x86: HPET_MSI Initialise per-cpu HPET timers
>>>>
>>>>     Initialize a per CPU HPET MSI timer when possible. We retain the HPET
>>>>     timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
>>>>     setup the remaining HPET timers as per CPU MSI based timers. This per CPU
>>>>     timer will eliminate the need for timer broadcasting with IRQ 0 when there
>>>>     is non-functional LAPIC timer across CPU deep C-states.
>>>>
>>>>     If there are more CPUs than number of available timers, CPUs that do not
>>>>     find any timer to use will continue using LAPIC and IRQ 0 broadcast.
>>>>
>>>>     Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
>>>>     Signed-off-by: Shaohua Li <shaohua.li@intel.com>
>>>>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
>>>>
>>>> And of course this was the first commit that I could not boot if I had hpet
>>>> enabled. To get this one to boot (single user mode only) I had to add the
>>>> quiet cmdline option and the following patch to arch/x86/kernel/hpet.c, from
>>>>
>>>> commit 5ceb1a04187553e08c6ab60d30cee7c454ee139a
>>>>
>>>> @@ -445,7 +445,7 @@ static int hpet_setup_irq(struct hpet_dev *dev)
>>>>  {
>>>>
>>>>         if (request_irq(dev->irq, hpet_interrupt_handler,
>>>> -                       IRQF_SHARED|IRQF_NOBALANCING, dev->name, dev))
>>>> +                       IRQF_DISABLED|IRQF_NOBALANCING, dev->name, dev))
>>>>                 return -1;
>>>>
>>>>         disable_irq(dev->irq);
>>>>
>>>> AND add the quiet cmdline option.
>>>
>>> Ok, so we know why HPET didn't boot for you, and that was fixed later (by
>>> that 5ceb1a04). But is this also when the floppy started mis-behaving?
>>>
>>
>> Commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is when the floppy stops
>> working
>> and also when I could no longer boot with hpet enabled.
>
>
> I am missing something here. Commit 26afe5f2 is where system does not
> boot with HPET or is it where the floppy stops working when you boot
> with HPET enabled.
>

As it happens, both happen there. Commit 5ceb1a04 is where it starts
booting _again_ with hpet enabled. So I took that patch (5ceb1a04) and
applied it to (26afe5f2f) to be able to boot with hpet enabled. I had to
use the quiet option to get to a login prompt, but there is where the
floppy format first fails, just as it does in 2.6.28 and up.

> Can you try "idle=halt" with both .27 and .28 with /proc/interrupts
> output in each case. With that option, we should be using local APIC
> timer and PIT, HPET or HPET with MSI should not really matter. Does it
> still fail with .28 with that option?
>

Yes, I will try that for you but will have to wait until the morning.
Sorry.

Regards
Mark
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
  2009-12-23  0:22 ` Mark Hounschell
@ 2009-12-23 13:02 ` Mark Hounschell
  2009-12-23 15:10 ` Pallipadi, Venkatesh
  0 siblings, 1 reply; 74+ messages in thread
From: Mark Hounschell @ 2009-12-23 13:02 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: dmarkh, Linus Torvalds, Alain Knaff, Linux Kernel Mailing List,
	fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar

On 12/22/2009 07:22 PM, Mark Hounschell wrote:
> On 12/22/2009 06:37 PM, Pallipadi, Venkatesh wrote:
>> On Tue, 2009-12-22 at 09:57 -0800, Mark Hounschell wrote:
>>> On 12/22/2009 12:38 PM, Linus Torvalds wrote:
>>>>
>>>> [ Ingo, Venki and Shaohua added to cc: see the whole thread on lkml for
>>>> details, but Mark is basically chasing down a situation where the floppy
>>>> driver seems to have trouble formatting floppies, and it happened
>>>> between 2.6.27 and .28. The trouble seems to be that a DMA transfer of a
>>>> memory block transfers the wrong value for the first byte of the block.
>>>>
>>>> Which should be impossible, but whatever. Some part of the system has a
>>>> cached buffer that isn't flushed.
>>>>
>>>> What gets _you_ guys involved is that Mark cannot reproduce the bug if
>>>> HPET is disabled in the BIOS or by using 'nohpet'. He found that out by
>>>> pure luck while bisecting, because some time during his bisect, his
>>>> machine wouldn't even boot with HPET.
>>>>
>>>> So the problem is: with HPET enabled, 2.6.27.4 _used_ to work. But
>>>> 2.6.28 (and current -git) does not. Any ideas? ]
>>>>
>>>> On Tue, 22 Dec 2009, Mark Hounschell wrote:
>>>>>
>>>>> Ok, I may have something that might help.
>>>>>
>>>>> # git bisect bad
>>>>> 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
>>>>> commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
>>>>> Author: venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com>
>>>>> Date:   Fri Sep 5 18:02:18 2008 -0700
>>>>>
>>>>>     x86: HPET_MSI Initialise per-cpu HPET timers
>>>>>
>>>>>     Initialize a per CPU HPET MSI timer when possible. We retain the HPET
>>>>>     timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
>>>>>     setup the remaining HPET timers as per CPU MSI based timers. This per CPU
>>>>>     timer will eliminate the need for timer broadcasting with IRQ 0 when there
>>>>>     is non-functional LAPIC timer across CPU deep C-states.
>>>>>
>>>>>     If there are more CPUs than number of available timers, CPUs that do not
>>>>>     find any timer to use will continue using LAPIC and IRQ 0 broadcast.
>>>>>
>>>>>     Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
>>>>>     Signed-off-by: Shaohua Li <shaohua.li@intel.com>
>>>>>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
>>>>>
>>>>> And of course this was the first commit that I could not boot if I had hpet
>>>>> enabled. To get this one to boot (single user mode only) I had to add the
>>>>> quiet cmdline option and the following patch to arch/x86/kernel/hpet.c, from
>>>>>
>>>>> commit 5ceb1a04187553e08c6ab60d30cee7c454ee139a
>>>>>
>>>>> @@ -445,7 +445,7 @@ static int hpet_setup_irq(struct hpet_dev *dev)
>>>>>  {
>>>>>
>>>>>         if (request_irq(dev->irq, hpet_interrupt_handler,
>>>>> -                       IRQF_SHARED|IRQF_NOBALANCING, dev->name, dev))
>>>>> +                       IRQF_DISABLED|IRQF_NOBALANCING, dev->name, dev))
>>>>>                 return -1;
>>>>>
>>>>>         disable_irq(dev->irq);
>>>>>
>>>>> AND add the quiet cmdline option.
>>>>
>>>> Ok, so we know why HPET didn't boot for you, and that was fixed later (by
>>>> that 5ceb1a04). But is this also when the floppy started mis-behaving?
>>>>
>>>
>>> Commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is when the floppy stops
>>> working
>>> and also when I could no longer boot with hpet enabled.
>>
>>
>> I am missing something here. Commit 26afe5f2 is where system does not
>> boot with HPET or is it where the floppy stops working when you boot
>> with HPET enabled.
>>
>
> As it happens, both happen there. Commit 5ceb1a04 is where it starts
> booting _again_ with hpet enabled. So I took that patch (5ceb1a04) and
> applied it to (26afe5f2f) to be able to boot with hpet enabled. I had to
> use the quiet option to get to a login prompt, but there is where the
> floppy format first fails, just as it does in 2.6.28 and up.
>
>> Can you try "idle=halt" with both .27 and .28 with /proc/interrupts
>> output in each case. With that option, we should be using local APIC
>> timer and PIT, HPET or HPET with MSI should not really matter. Does it
>> still fail with .28 with that option?
>>

2.6.28 still fails with that option.

2.6.27.41 /proc/interrupts with idle=halt

           CPU0       CPU1       CPU2       CPU3
  0:        126          0          0          1   IO-APIC-edge      timer
  1:          0          0          1        157   IO-APIC-edge      i8042
  3:          0          0          0          6   IO-APIC-edge
  4:          0          0          0          6   IO-APIC-edge
  6:          0          0          0          4   IO-APIC-edge      floppy
  8:          0          0          0          1   IO-APIC-edge      rtc0
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          0          0          1        128   IO-APIC-edge      i8042
 14:          0          0         34       4457   IO-APIC-edge      pata_atiixp
 15:          0          0          4        480   IO-APIC-edge      pata_atiixp
 16:          0          0          0        397   IO-APIC-fasteoi   aic79xx, ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel
 17:          0          0          0          2   IO-APIC-fasteoi   ehci_hcd:usb1
 18:          0          0          0          0   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
 19:          0          0          0        142   IO-APIC-fasteoi   aic7xxx, ehci_hcd:usb2, ttySLG0, eth1
 22:          0          0          4       1154   IO-APIC-fasteoi   ahci
219:          0          0          3         63   PCI-MSI-edge      eth0
NMI:          0          0          0          0   Non-maskable interrupts
LOC:      91539      91964      92525      91181   Local timer interrupts
RES:       2888       3873       2434       2721   Rescheduling interrupts
CAL:        240        245        247         84   function call interrupts
TLB:        768        628        526        512   TLB shootdowns
SPU:          0          0          0          0   Spurious interrupts
ERR:          0
MIS:          0

2.6.28 /proc/interrupts with idle=halt

           CPU0       CPU1       CPU2       CPU3
  0:        126          0          2          0   IO-APIC-edge      timer
  1:          0          0        192          0   IO-APIC-edge      i8042
  3:          0          0          6          0   IO-APIC-edge
  4:          0          0          6          0   IO-APIC-edge
  6:          0          0          4          0   IO-APIC-edge      floppy
  8:          0          0          1          0   IO-APIC-edge      rtc0
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          0          0        128          1   IO-APIC-edge      i8042
 14:          0          1     147114        396   IO-APIC-edge      pata_atiixp
 15:          0          0        646          2   IO-APIC-edge      pata_atiixp
 16:          0          0        396          0   IO-APIC-fasteoi   aic79xx, ohci_hcd:usb2, ohci_hcd:usb4, HDA Intel
 17:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
 18:          0          0          0          0   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
 19:          0          0        362          1   IO-APIC-fasteoi   aic7xxx, ehci_hcd:usb3, ttySLG0, eth1
 22:          0          0        874          1   IO-APIC-fasteoi   ahci
1274:         0          0        193          4   PCI-MSI-edge      eth0
1279:    513207          0          0          0   HPET_MSI-edge     hpet2
NMI:          0          0          0          0   Non-maskable interrupts
LOC:        268     513395     513138     522088   Local timer interrupts
RES:       3262       3679       2573       3746   Rescheduling interrupts
CAL:        131        166         57        147   Function call interrupts
TLB:        680        438        450        639   TLB shootdowns
SPU:          0          0          0          0   Spurious interrupts
ERR:          0
MIS:          0

Mark

^ permalink raw reply	[flat|nested] 74+ messages in thread
* RE: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
  2009-12-23 13:02 ` Mark Hounschell
@ 2009-12-23 15:10 ` Pallipadi, Venkatesh
  2009-12-23 15:34 ` Mark Hounschell
  0 siblings, 1 reply; 74+ messages in thread
From: Pallipadi, Venkatesh @ 2009-12-23 15:10 UTC (permalink / raw)
  To: markh@compro.net
  Cc: dmarkh@cfl.rr.com, Linus Torvalds, Alain Knaff,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar

>-----Original Message-----
>From: Mark Hounschell [mailto:markh@compro.net]
>Sent: Wednesday, December 23, 2009 5:03 AM
>To: Pallipadi, Venkatesh
>Cc: dmarkh@cfl.rr.com; Linus Torvalds; Alain Knaff; Linux
>Kernel Mailing List; fdutils@fdutils.linux.lu; Li, Shaohua; Ingo Molnar
>Subject: Re: [Fdutils] DMA cache consistency bug introduced in
>2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
>
>[...]
>
>2.6.28 still fails with that option.
>
>[...]
>
>2.6.28 /proc/interrupts with idle=halt
>
>           CPU0       CPU1       CPU2       CPU3
>[...]
>1279:    513207          0          0          0   HPET_MSI-edge     hpet2
>[...]
>LOC:        268     513395     513138     522088   Local timer interrupts
>[...]
>

Hmm. Looks like hpet2 is still getting used instead of local APIC timer in .28 case.

I was expecting some low number in hpet2 and local timer on all CPU to be around the same value. Above shows CPU 0 is depending on hpet2 for some reason even with idle=halt. Can you send the output of below two in case of .28

/proc/timer_list
grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*

Thanks,
Venki

^ permalink raw reply	[flat|nested] 74+ messages in thread
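The comparison Venki describes (a low hpet2 count, and local timer counts that are roughly equal on every CPU) can be checked mechanically. What follows is only a minimal Python sketch under stated assumptions, not anything posted in the thread: the helper names `parse_interrupts`, `hpet_rows` and `cpus_not_on_local_timer`, and the 10% threshold, are made up for illustration, and the sample text is a short excerpt of the 2.6.28 `/proc/interrupts` output reported earlier.

```python
# Hypothetical helpers for illustration; names and threshold are assumptions.

SAMPLE_2_6_28 = """\
           CPU0       CPU1       CPU2       CPU3
  0:        126          0          2          0   IO-APIC-edge      timer
  6:          0          0          4          0   IO-APIC-edge      floppy
1279:    513207          0          0          0   HPET_MSI-edge     hpet2
LOC:        268     513395     513138     522088   Local timer interrupts
ERR:          0
"""


def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {label: (counts, description)}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpus = len(lines[0].split())          # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")  # split at the first colon only
        fields = rest.split()
        counts = []
        for tok in fields[:ncpus]:
            if not tok.isdigit():
                break
            counts.append(int(tok))
        desc = " ".join(fields[len(counts):])
        table[label.strip()] = (counts, desc)
    return table


def hpet_rows(table):
    """Return {label: counts} for rows whose description mentions hpet."""
    return {label: counts
            for label, (counts, desc) in table.items()
            if "hpet" in desc.lower()}


def cpus_not_on_local_timer(table, ratio=0.1):
    """Return CPU indices whose LOC count is far below the busiest CPU's."""
    loc, _ = table["LOC"]
    peak = max(loc)
    return [cpu for cpu, n in enumerate(loc) if n < ratio * peak]


if __name__ == "__main__":
    t = parse_interrupts(SAMPLE_2_6_28)
    print("hpet rows:", hpet_rows(t))
    print("CPUs with few local timer ticks:", cpus_not_on_local_timer(t))
```

On the excerpt above this singles out CPU 0, which matches the observation that CPU 0 is being driven by hpet2 rather than by its local APIC timer. On a live system the same functions could be fed the contents of `/proc/interrupts` instead of the sample string.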
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
  2009-12-23 15:10 ` Pallipadi, Venkatesh
@ 2009-12-23 15:34 ` Mark Hounschell
  2009-12-23 15:57 ` Mark Hounschell
  2009-12-23 16:31 ` Linus Torvalds
  0 siblings, 2 replies; 74+ messages in thread
From: Mark Hounschell @ 2009-12-23 15:34 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: dmarkh@cfl.rr.com, Linus Torvalds, Alain Knaff,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 9847 bytes --]

On 12/23/2009 10:10 AM, Pallipadi, Venkatesh wrote:
>
> [...]
>
>> 2.6.28 still fails with that option.
>>
>> [...]
>
> Hmm. Looks like hpet2 is still getting used instead of local APIC timer in .28 case.
>
> I was expecting some low number in hpet2 and local timer on all CPU to be around the same value. Above shows CPU 0 is depending on hpet2 for some reason even with idle=halt. Can you send the output of below two in case of .28
> /proc/timer_list

Attached.

> grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*

I have no /sys/devices/system/cpu/cpu0/cpuidle on this machine.
Maybe because of

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set
# CONFIG_CPU_IDLE is not set

Would it be OK if when you ask for 2.6.28 info, I use a 2.6.32.2 kernel?
That kernel also fails fdformat with hpet enabled on these machines.

Thanks
Mark

[-- Attachment #2: timer_list.txt --]
[-- Type: text/plain, Size: 7901 bytes --]

Timer List Version: v0.4
HRTIMER_MAX_CLOCK_BASES: 2
now at 123990857169 nsecs

cpu: 0
 clock 0:
  .base:       c2a13320
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1261581506376548727 nsecs
active timers:
 clock 1:
  .base:       c2a1334c
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <c2a133a4>, tick_sched_timer, S:01
 # expires at 123991000000-123991000000 nsecs [in 142831 to 142831 nsecs]
 #1: <f1987544>, it_real_fn, S:01
 # expires at 124645673184-124645673184 nsecs [in 654816015 to 654816015 nsecs]
 #2: <f2823b4c>, hrtimer_wakeup, S:01
 # expires at 125434022644-125439022643 nsecs [in 1443165475 to 1448165474 nsecs]
 #3: <f1ab3e94>, hrtimer_wakeup, S:01
 # expires at 3668872852847-3668872902847 nsecs [in 3544881995678 to 3544882045678 nsecs]
 #4: <f2269b4c>, hrtimer_wakeup, S:01
 # expires at 4295018153722969-4295018153772969 nsecs [in 4294894162865800 to 4294894162915800 nsecs]
  .expires_next   : 123991000000 nsecs
  .hres_active    : 1
  .nr_events      : 125349
  .nohz_mode      : 0
  .idle_tick      : 0 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 0
  .idle_calls     : 0
  .idle_sleeps    : 0
  .idle_entrytime : 0 nsecs
  .idle_waketime  : 0 nsecs
  .idle_exittime  : 0 nsecs
  .idle_sleeptime : 0 nsecs
  .last_jiffies   : 0
  .next_jiffies   : 0
  .idle_expires   : 0 nsecs
jiffies: 4294791286

cpu: 1
 clock 0:
  .base:       c2a1c320
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1261581506376548727 nsecs
active timers:
 clock 1:
  .base:       c2a1c34c
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <c2a1c3a4>, tick_sched_timer, S:01
 # expires at 123991125000-123991125000 nsecs [in 267831 to 267831 nsecs]
 #1: <c043b230>, sched_rt_period_timer, S:01
 # expires at 124000000000-124000000000 nsecs [in 9142831 to 9142831 nsecs]
 #2: <f1ab5bc4>, hrtimer_wakeup, S:01
 # expires at 129199139399-129219139398 nsecs [in 5208282230 to 5228282229 nsecs]
 #3: <f1a77b4c>, hrtimer_wakeup, S:01
 # expires at 139203140160-139233140159 nsecs [in 15212282991 to 15242282990 nsecs]
 #4: <f1aade94>, hrtimer_wakeup, S:01
 # expires at 28868872949729-28868872999729 nsecs [in 28744882092560 to 28744882142560 nsecs]
  .expires_next   : 123991125000 nsecs
  .hres_active    : 1
  .nr_events      : 123377
  .nohz_mode      : 0
  .idle_tick      : 0 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 0
  .idle_calls     : 0
  .idle_sleeps    : 0
  .idle_entrytime : 0 nsecs
  .idle_waketime  : 0 nsecs
  .idle_exittime  : 0 nsecs
  .idle_sleeptime : 0 nsecs
  .last_jiffies   : 0
  .next_jiffies   : 0
  .idle_expires   : 0 nsecs
jiffies: 4294791286

cpu: 2
 clock 0:
  .base:       c2a25320
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1261581506376548727 nsecs
active timers:
 clock 1:
  .base:       c2a2534c
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <c2a253a4>, tick_sched_timer, S:01
 # expires at 123991250000-123991250000 nsecs [in 392831 to 392831 nsecs]
 #1: <f1eb9bc4>, hrtimer_wakeup, S:01
 # expires at 124623691750-124625680749 nsecs [in 632834581 to 634823580 nsecs]
 #2: <f1f7dbc4>, hrtimer_wakeup, S:01
 # expires at 127624283651-127628265650 nsecs [in 3633426482 to 3637408481 nsecs]
 #3: <f1cf1bc4>, hrtimer_wakeup, S:01
 # expires at 136624366877-136654360876 nsecs [in 12633509708 to 12663503707 nsecs]
 #4: <f1ad7bc4>, hrtimer_wakeup, S:01
 # expires at 153654620007-153692611006 nsecs [in 29663762838 to 29701753837 nsecs]
 #5: <f1b25f58>, hrtimer_wakeup, S:01
 # expires at 155514242261-155514292261 nsecs [in 31523385092 to 31523435092 nsecs]
 #6: <f198de94>, hrtimer_wakeup, S:01
 # expires at 668873371418-668873421418 nsecs [in 544882514249 to 544882564249 nsecs]
 #7: <f1f3fb4c>, hrtimer_wakeup, S:01
 # expires at 86508836731823-86508936731823 nsecs [in 86384845874654 to 86384945874654 nsecs]
  .expires_next   : 123991250000 nsecs
  .hres_active    : 1
  .nr_events      : 123166
  .nohz_mode      : 0
  .idle_tick      : 0 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 0
  .idle_calls     : 0
  .idle_sleeps    : 0
  .idle_entrytime : 0 nsecs
  .idle_waketime  : 0 nsecs
  .idle_exittime  : 0 nsecs
  .idle_sleeptime : 0 nsecs
  .last_jiffies   : 0
  .next_jiffies   : 0
  .idle_expires   : 0 nsecs
jiffies: 4294791286

cpu: 3
 clock 0:
  .base:       c2a2e320
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1261581506376548727 nsecs
active timers:
 clock 1:
  .base:       c2a2e34c
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <c2a2e3a4>, tick_sched_timer, S:01
 # expires at 123991375000-123991375000 nsecs [in 517831 to 517831 nsecs]
 #1: <f1935bc4>, hrtimer_wakeup, S:01
 # expires at 124624395215-124626393214 nsecs [in 633538046 to 635536045 nsecs]
 #2: <f1aafbc4>, hrtimer_wakeup, S:01
 # expires at 169815643582-169875643581 nsecs [in 45824786413 to 45884786412 nsecs]
 #3: <f23cdbc4>, hrtimer_wakeup, S:01
 # expires at 346123697800-346223697800 nsecs [in 222132840631 to 222232840631 nsecs]
 #4: <f1b04204>, it_real_fn, S:01
 # expires at 403383744722-403383744722 nsecs [in 279392887553 to 279392887553 nsecs]
 #5: <f1b09e04>, it_real_fn, S:01
 # expires at 403383795968-403383795968 nsecs [in 279392938799 to 279392938799 nsecs]
 #6: <f19871c4>, it_real_fn, S:01
 # expires at 403383804795-403383804795 nsecs [in 279392947626 to 279392947626 nsecs]
 #7: <f199be94>, hrtimer_wakeup, S:01
 # expires at 668872854209-668872904209 nsecs [in 544881997040 to 544882047040 nsecs]
  .expires_next   : 123991375000 nsecs
  .hres_active    : 1
  .nr_events      : 122962
  .nohz_mode      : 0
  .idle_tick      : 0 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 0
  .idle_calls     : 0
  .idle_sleeps    : 0
  .idle_entrytime : 0 nsecs
  .idle_waketime  : 0 nsecs
  .idle_exittime  : 0 nsecs
  .idle_sleeptime : 0 nsecs
  .last_jiffies   : 0
  .next_jiffies   : 0
  .idle_expires   : 0 nsecs
jiffies: 4294791286


Tick Device: mode:     1
Broadcast device
Clock Event Device: hpet
 max_delta_ns:   2147483647
 min_delta_ns:   5000
 mult:           61510048
 shift:          32
 mode:           3
 next_event:     9223372036854775807 nsecs
 set_next_event: hpet_legacy_next_event
 set_mode:       hpet_legacy_set_mode
 event_handler:  tick_handle_oneshot_broadcast
tick_broadcast_mask: 00000000
tick_broadcast_oneshot_mask: 00000000


Tick Device: mode:     1
Per CPU device: 0
Clock Event Device: hpet2
 max_delta_ns:   2147483647
 min_delta_ns:   5000
 mult:           61510047
 shift:          32
 mode:           3
 next_event:     123991000000 nsecs
 set_next_event: hpet_msi_next_event
 set_mode:       hpet_msi_set_mode
 event_handler:  hrtimer_interrupt

Tick Device: mode:     1
Per CPU device: 1
Clock Event Device: lapic
 max_delta_ns:   670831998
 min_delta_ns:   1199
 mult:           53707624
 shift:          32
 mode:           3
 next_event:     123991125000 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt

Tick Device: mode:     1
Per CPU device: 2
Clock Event Device: lapic
 max_delta_ns:   670831998
 min_delta_ns:   1199
 mult:           53707624
 shift:          32
 mode:           3
 next_event:     123991250000 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt

Tick Device: mode:     1
Per CPU device: 3
Clock Event Device: lapic
 max_delta_ns:   670831998
 min_delta_ns:   1199
 mult:           53707624
 shift:          32
 mode:           3
 next_event:     123991375000 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt

^ permalink raw reply	[flat|nested] 74+ messages in thread
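The part of the timer_list attachment that bears on Venki's question is the "Tick Device" section at the end, which names the clock event device behind each CPU's tick. The following is a minimal Python sketch, assuming the v0.4 layout shown in the attachment (a `Per CPU device: N` or `Broadcast device` line followed by a `Clock Event Device: NAME` line); the function name `tick_devices` and the trimmed sample are assumptions made for illustration.

```python
import re

# Trimmed from the attachment above: only the lines the sketch looks at,
# plus one ignored attribute line to show that extra lines are skipped.
SAMPLE_TICK_DEVICES = """\
Tick Device: mode:     1
Broadcast device
Clock Event Device: hpet
 set_next_event: hpet_legacy_next_event

Tick Device: mode:     1
Per CPU device: 0
Clock Event Device: hpet2

Tick Device: mode:     1
Per CPU device: 1
Clock Event Device: lapic

Tick Device: mode:     1
Per CPU device: 2
Clock Event Device: lapic

Tick Device: mode:     1
Per CPU device: 3
Clock Event Device: lapic
"""


def tick_devices(text):
    """Map each tick device owner (CPU number or 'broadcast') to its
    clock event device name, for timer_list text in the layout above."""
    result = {}
    owner = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Per CPU device:\s*(\d+)$", line)
        if m:
            owner = int(m.group(1))
            continue
        if line == "Broadcast device":
            owner = "broadcast"
            continue
        m = re.match(r"Clock Event Device:\s*(\S+)$", line)
        if m and owner is not None:
            result[owner] = m.group(1)
            owner = None               # wait for the next owner line
    return result


if __name__ == "__main__":
    for owner, dev in tick_devices(SAMPLE_TICK_DEVICES).items():
        print(owner, "->", dev)
```

Applied to the attachment, this reports hpet2 for CPU 0 and lapic for CPUs 1 to 3, which lines up with the interrupt counts: only CPU 0 takes its tick from an HPET MSI timer.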
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?) 2009-12-23 15:34 ` Mark Hounschell @ 2009-12-23 15:57 ` Mark Hounschell 2009-12-23 16:31 ` Linus Torvalds 1 sibling, 0 replies; 74+ messages in thread From: Mark Hounschell @ 2009-12-23 15:57 UTC (permalink / raw) To: markh Cc: Pallipadi, Venkatesh, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar, Linus Torvalds On 12/23/2009 10:34 AM, Mark Hounschell wrote: > On 12/23/2009 10:10 AM, Pallipadi, Venkatesh wrote: >> >> >>> -----Original Message----- >>> From: Mark Hounschell [mailto:markh@compro.net] >>> Sent: Wednesday, December 23, 2009 5:03 AM >>> To: Pallipadi, Venkatesh >>> Cc: dmarkh@cfl.rr.com; Linus Torvalds; Alain Knaff; Linux >>> Kernel Mailing List; fdutils@fdutils.linux.lu; Li, Shaohua; Ingo Molnar >>> Subject: Re: [Fdutils] DMA cache consistency bug introduced in >>> 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?) >>> >>> On 12/22/2009 07:22 PM, Mark Hounschell wrote: >>>> On 12/22/2009 06:37 PM, Pallipadi, Venkatesh wrote: >>>>> On Tue, 2009-12-22 at 09:57 -0800, Mark Hounschell wrote: >>>>>> On 12/22/2009 12:38 PM, Linus Torvalds wrote: >>>>>>> >>>>>>> [ Ingo, Venki and Shaohua added to cc: see the whole >>> thread on lkml for >>>>>>> details, but Mark is basically chasing down a situation >>> where the floppy >>>>>>> driver seems to have trouble formatting floppies, and >>> it happened >>>>>>> between 2.6.27 and .28. The trouble seems to be that a >>> DMA transfer of a >>>>>>> memory block transfers the wrong value for the first >>> byte of the block. >>>>>>> >>>>>>> Which should be impossible, but whatever. Some part of >>> the system has a >>>>>>> cached buffer that isn't flushed. >>>>>>> >>>>>>> What gets _you_ guys involved is that Mark cannot >>> reproduce the bug if >>>>>>> HPET is disabled in the BIOS or by using 'nohpet'. 
He >>> found that out by >>>>>>> pure luck while bisecting, because some time during his >>> bisect, his >>>>>>> machine wouldn't even boot with HPET. >>>>>>> >>>>>>> So the problem is: with HPET enabled, 2.6.27.4 _used_ >>> to work. But >>>>>>> 2.6.28 (and current -git) does not. Any ideas? ] >>>>>>> >>>>>>> On Tue, 22 Dec 2009, Mark Hounschell wrote: >>>>>>>> >>>>>>>> Ok, I may have something that might help. >>>>>>>> >>>>>>>> # git bisect bad >>>>>>>> 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit >>>>>>>> commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0 >>>>>>>> Author: venkatesh.pallipadi@intel.com >>> <venkatesh.pallipadi@intel.com> >>>>>>>> Date: Fri Sep 5 18:02:18 2008 -0700 >>>>>>>> >>>>>>>> x86: HPET_MSI Initialise per-cpu HPET timers >>>>>>>> >>>>>>>> Initialize a per CPU HPET MSI timer when possible. >>> We retain the HPET >>>>>>>> timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when >>> legacy mode is being used. We >>>>>>>> setup the remaining HPET timers as per CPU MSI based >>> timers. This per CPU >>>>>>>> timer will eliminate the need for timer broadcasting >>> with IRQ 0 when there >>>>>>>> is non-functional LAPIC timer across CPU deep C-states. >>>>>>>> >>>>>>>> If there are more CPUs than number of available >>> timers, CPUs that do not >>>>>>>> find any timer to use will continue using LAPIC and >>> IRQ 0 broadcast. >>>>>>>> >>>>>>>> Signed-off-by: Venkatesh Pallipadi >>> <venkatesh.pallipadi@intel.com> >>>>>>>> Signed-off-by: Shaohua Li <shaohua.li@intel.com> >>>>>>>> Signed-off-by: Ingo Molnar <mingo@elte.hu> >>>>>>>> >>>>>>>> And of coarse this was the first commit that I could not >>> boot if I had hpet >>>>>>>> enabled. 
To get this one to boot (single user mode only) >>> I had to add the >>>>>>>> the quiet cmdline option and following patch from to >>> arch/x86/kernel/hpet.c >>>>>>>> >>>>>>>> commit 5ceb1a04187553e08c6ab60d30cee7c454ee139a >>>>>>>> >>>>>>>> @ -445,7 +445,7 @@ static int hpet_setup_irq(struct >>> hpet_dev *dev) >>>>>>>> { >>>>>>>> >>>>>>>> if (request_irq(dev->irq, hpet_interrupt_handler, >>>>>>>> - IRQF_SHARED|IRQF_NOBALANCING, >>> dev->name, dev)) >>>>>>>> + IRQF_DISABLED|IRQF_NOBALANCING, >>> dev->name, dev)) >>>>>>>> return -1; >>>>>>>> >>>>>>>> disable_irq(dev->irq); >>>>>>>> >>>>>>>> AND add the quiet cmdline option. >>>>>>> >>>>>>> Ok, so we know why HPET didn't boot for you, and that was >>> fixed later (by >>>>>>> that 5ceb1a04). But is this also when the floppy started >>> mis-behaving? >>>>>>> >>>>>> >>>>>> Commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is when >>> the floppy stops >>>>>> working >>>>>> and also when I could no longer boot with hpet enabled. >>>>> >>>>> >>>>> I am missing something here. Commit 26afe5f2 is where >>> system does not >>>>> boot with HPET or is it where the floppy stops working when you boot >>>>> with HPET enabled. >>>>> >>>> >>>> As it happens, both happen there. Commit 5ceb1a04 is where it starts >>>> booting _again_ with hpet enabled. So I took that patch >>> (5ceb1a04) and >>>> applied it to (26afe5f2f) to be able to boot with hpet >>> enabled. I had to >>>> use the quiet option to get to a login prompt, but there is where the >>>> floppy format first fails, just as it does in 2.6.28 and up. >>>> >>>>> Can you try "idle=halt" with both .27 and .28 with /proc/interrupts >>>>> output in each case. With that option, we should be using local APIC >>>>> timer and PIT, HPET or HPET with MSI should not really >>> matter. Does it >>>>> still fail with .28 with that option? >>>>> >>> >>> 2.6.28 still fails with that option. 
>>>
>>> 2.6.27.41 /proc/interrupts with idle=halt
>>>
>>>             CPU0       CPU1       CPU2       CPU3
>>>    0:        126          0          0          1   IO-APIC-edge      timer
>>>    1:          0          0          1        157   IO-APIC-edge      i8042
>>>    3:          0          0          0          6   IO-APIC-edge
>>>    4:          0          0          0          6   IO-APIC-edge
>>>    6:          0          0          0          4   IO-APIC-edge      floppy
>>>    8:          0          0          0          1   IO-APIC-edge      rtc0
>>>    9:          0          0          0          0   IO-APIC-fasteoi   acpi
>>>   12:          0          0          1        128   IO-APIC-edge      i8042
>>>   14:          0          0         34       4457   IO-APIC-edge      pata_atiixp
>>>   15:          0          0          4        480   IO-APIC-edge      pata_atiixp
>>>   16:          0          0          0        397   IO-APIC-fasteoi   aic79xx, ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel
>>>   17:          0          0          0          2   IO-APIC-fasteoi   ehci_hcd:usb1
>>>   18:          0          0          0          0   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
>>>   19:          0          0          0        142   IO-APIC-fasteoi   aic7xxx, ehci_hcd:usb2, ttySLG0, eth1
>>>   22:          0          0          4       1154   IO-APIC-fasteoi   ahci
>>>  219:          0          0          3         63   PCI-MSI-edge      eth0
>>>  NMI:          0          0          0          0   Non-maskable interrupts
>>>  LOC:      91539      91964      92525      91181   Local timer interrupts
>>>  RES:       2888       3873       2434       2721   Rescheduling interrupts
>>>  CAL:        240        245        247         84   function call interrupts
>>>  TLB:        768        628        526        512   TLB shootdowns
>>>  SPU:          0          0          0          0   Spurious interrupts
>>>  ERR:          0
>>>  MIS:          0
>>>
>>> 2.6.28 /proc/interrupts with idle=halt
>>>
>>>             CPU0       CPU1       CPU2       CPU3
>>>    0:        126          0          2          0   IO-APIC-edge      timer
>>>    1:          0          0        192          0   IO-APIC-edge      i8042
>>>    3:          0          0          6          0   IO-APIC-edge
>>>    4:          0          0          6          0   IO-APIC-edge
>>>    6:          0          0          4          0   IO-APIC-edge      floppy
>>>    8:          0          0          1          0   IO-APIC-edge      rtc0
>>>    9:          0          0          0          0   IO-APIC-fasteoi   acpi
>>>   12:          0          0        128          1   IO-APIC-edge      i8042
>>>   14:          0          1     147114        396   IO-APIC-edge      pata_atiixp
>>>   15:          0          0        646          2   IO-APIC-edge      pata_atiixp
>>>   16:          0          0        396          0   IO-APIC-fasteoi   aic79xx, ohci_hcd:usb2, ohci_hcd:usb4, HDA Intel
>>>   17:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
>>>   18:          0          0          0          0   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
>>>   19:          0          0        362          1   IO-APIC-fasteoi   aic7xxx, ehci_hcd:usb3, ttySLG0, eth1
>>>   22:          0          0        874          1   IO-APIC-fasteoi   ahci
>>> 1274:          0          0        193          4   PCI-MSI-edge      eth0
>>> 1279:     513207          0          0          0   HPET_MSI-edge     hpet2
>>>  NMI:          0          0          0          0   Non-maskable interrupts
>>>  LOC:        268     513395     513138     522088   Local timer interrupts
>>>  RES:       3262       3679       2573       3746   Rescheduling interrupts
>>>  CAL:        131        166         57        147   Function call interrupts
>>>  TLB:        680        438        450        639   TLB shootdowns
>>>  SPU:          0          0          0          0   Spurious interrupts
>>>  ERR:          0
>>>  MIS:          0
>>>
>>
>> Hmm. Looks like hpet2 is still getting used instead of local APIC timer in .28 case.
>>
>> I was expecting some low number in hpet2 and local timer on all CPU to be around the same value. Above shows CPU 0 is depending on hpet2 for some reason even with idle=halt. Can you send the output of below two in case of .28
>> /proc/timer_list
>
> Attached.
>
>> grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
>
> I have no /sys/devices/system/cpu/cpu0/cpuidle on this machine.
> Maybe because of
>
> #
> # CPU Frequency scaling
> #
> # CONFIG_CPU_FREQ is not set
> # CONFIG_CPU_IDLE is not set
>
> Would it be OK if when you ask for 2.6.28 info, I use a 2.6.32.2 kernel?
> That kernel also fails fdformat with hpet enabled on these machines.
>

I do have this on 2.6.32.2 though.

# grep . /sys/devices/system/cpu/cpuidle/current_*
/sys/devices/system/cpu/cpuidle/current_driver:acpi_idle
/sys/devices/system/cpu/cpuidle/current_governor_ro:ladder

Want me to go back to 2.6.28 and show this?

Mark

^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
  2009-12-23 15:34 ` Mark Hounschell
  2009-12-23 15:57 ` Mark Hounschell
@ 2009-12-23 16:31 ` Linus Torvalds
  2009-12-23 16:38 ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 Andi Kleen
  1 sibling, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2009-12-23 16:31 UTC (permalink / raw)
  To: Mark Hounschell
  Cc: Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar

On Wed, 23 Dec 2009, Mark Hounschell wrote:
> >
> > Hmm. Looks like hpet2 is still getting used instead of local APIC
> > timer in .28 case.
> >
> > I was expecting some low number in hpet2 and local timer on all CPU to
> > be around the same value. Above shows CPU 0 is depending on hpet2 for
> > some reason even with idle=halt. Can you send the output of below two
> > in case of .28 /proc/timer_list
>
> Attached.

Oh wow. That's crazy:

	Tick Device: mode: 1
	Per CPU device: 0
	Clock Event Device: hpet2
	 max_delta_ns: 2147483647
	 min_delta_ns: 5000
	 mult: 61510047
	 shift: 32
	 mode: 3
	 next_event: 123991000000 nsecs
	 set_next_event: hpet_msi_next_event
	 set_mode: hpet_msi_set_mode
	 event_handler: hrtimer_interrupt

	Tick Device: mode: 1
	Per CPU device: 1
	Clock Event Device: lapic
	 max_delta_ns: 670831998
	 min_delta_ns: 1199
	 mult: 53707624
	 shift: 32
	 mode: 3
	 next_event: 123991125000 nsecs
	 set_next_event: lapic_next_event
	 set_mode: lapic_timer_setup
	 event_handler: hrtimer_interrupt
	...

It's not using the lapic for CPU0.

Using the HPET as a per-cpu timer is some crazy sh*t, since it's pretty
expensive to reprogram (compared to the local apic). And having different
timers for different CPU's is just odd.

The fact that the timer subsystem can do this and it all (mostly) works at
all is nice and impressive, but doesn't make it any less crazy ;)

That said, none of this seems to explain why DMA/fdformat doesn't work.
Linus ^ permalink raw reply [flat|nested] 74+ messages in thread
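The per-CPU device names in a /proc/timer_list dump like the one quoted above can be pulled out mechanically, which makes it easier to compare kernels. A minimal sketch in C, assuming the layout shown in that dump ("Per CPU device: N" followed by "Clock Event Device: name"); the function name clockevent_for_cpu is made up for illustration and is not a kernel or libc interface:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan timer_list-style text and copy the clock event device name that
 * follows the "Per CPU device: <cpu>" header into out.
 * Returns 0 on success, -1 if the CPU or the device line is not found.
 * Assumption: every per-CPU section carries a "Clock Event Device:" line;
 * if one were missing, the next section's device would be reported.
 */
int clockevent_for_cpu(const char *text, int cpu, char *out, size_t outlen)
{
	static const char dev_key[] = "Clock Event Device:";
	char cpu_key[48];
	const char *p;
	size_t klen, len;

	snprintf(cpu_key, sizeof(cpu_key), "Per CPU device: %d", cpu);
	klen = strlen(cpu_key);

	/* Find the CPU header; reject prefix matches such as 1 vs 10. */
	for (p = strstr(text, cpu_key); p; p = strstr(p + 1, cpu_key))
		if (!isdigit((unsigned char)p[klen]))
			break;
	if (!p)
		return -1;

	p = strstr(p, dev_key);
	if (!p)
		return -1;
	p += sizeof(dev_key) - 1;
	while (*p == ' ' || *p == '\t')
		p++;

	/* The device name runs up to the next whitespace. */
	len = strcspn(p, " \t\r\n");
	if (len == 0 || len >= outlen)
		return -1;
	memcpy(out, p, len);
	out[len] = '\0';
	return 0;
}
```

Fed the dump above, this would be expected to report hpet2 for CPU 0 and lapic for CPU 1, which is the asymmetry being discussed.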
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 2009-12-23 16:31 ` Linus Torvalds @ 2009-12-23 16:38 ` Andi Kleen 2009-12-23 16:49 ` Linus Torvalds 2009-12-23 17:41 ` Mark Hounschell 0 siblings, 2 replies; 74+ messages in thread From: Andi Kleen @ 2009-12-23 16:38 UTC (permalink / raw) To: Linus Torvalds Cc: Mark Hounschell, Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar Linus Torvalds <torvalds@linux-foundation.org> writes: > It's not using the lapic for CPU0. > > Using the HPET as a per-cpu timer is some crazy sh*t, since it's pretty > expensive to reprogram (compared to the local apic). And having different > timers for different CPU's is just odd. > > The fact that the timer subsystem can do this and it all (mostly) works at > all is nice and impressive, but doesn't make it any less crazy ;) I suspect it's a system where the APIC timer stops in deeper idle states and it supports them. In this case CPU #0 does timer broadcasts when needed to wake the other CPUs up from deep C, but for that it has to run with HPET. At least the other ones can still enjoy the LAPIC timer. This might suggest that Mark's floppy controller doesn't like deep C? Mark, did you try booting with processor.max_cstate=1 and HPET enabled? -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 2009-12-23 16:38 ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 Andi Kleen @ 2009-12-23 16:49 ` Linus Torvalds 2009-12-23 17:08 ` Andi Kleen ` (2 more replies) 2009-12-23 17:41 ` Mark Hounschell 1 sibling, 3 replies; 74+ messages in thread From: Linus Torvalds @ 2009-12-23 16:49 UTC (permalink / raw) To: Andi Kleen Cc: Mark Hounschell, Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar On Wed, 23 Dec 2009, Andi Kleen wrote: > > I suspect it's a system where the APIC timer stops in deeper idle > states and it supports them. In this case CPU #0 does timer broadcasts > when needed to wake the other CPUs up from deep C, but for that it has > to run with HPET. At least the other ones can still enjoy the LAPIC > timer. Ahh, ok, that makes sense. I was assuming the broadcast timer would act in that capacity, but.. > This might suggest that Mark's floppy controller doesn't like > deep C? Mark, did you try booting with processor.max_cstate=1 > and HPET enabled? We have indeed had historical issues with floppy and sleep states before. I do note another issue, though - the floppy driver itself seems totally broken when it comes to using interleaved sectors. Alain, that "place logical sectors" code is simply _broken_ - the "while" kicks in only if the first sector we test is busy _and_ we were at the last sector so that we increment past F_SECT_PER_TRACK. So shouldn't that sector layout be something like the appended? 
		Linus

---
 drivers/block/floppy.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 3266b4f..9c9148c 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -2237,13 +2237,10 @@ static void setup_format_params(int track)
 	for (count = 1; count <= F_SECT_PER_TRACK; ++count) {
 		here[n].sect = count;
 		n = (n + il) % F_SECT_PER_TRACK;
-		if (here[n].sect) {	/* sector busy, find next free sector */
+		while (here[n].sect) {	/* sector busy, find next free sector */
 			++n;
-			if (n >= F_SECT_PER_TRACK) {
+			if (n >= F_SECT_PER_TRACK)
 				n -= F_SECT_PER_TRACK;
-				while (here[n].sect)
-					++n;
-			}
 		}
 	}
 	if (_floppy->stretch & FD_SECTBASEMASK) {

^ permalink raw reply related [flat|nested] 74+ messages in thread
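One property of the "while" form as posted: it keeps searching after the last logical sector has been placed, and at that point every slot is already occupied, so there is no free slot left to find. A user-space sketch of the placement loop in C, assuming zeroed slots and a caller-supplied start slot and interleave; place_sectors, skip_last and the probe limit are hypothetical additions for illustration and are not part of the patch or of floppy.c:

```c
#include <string.h>

/*
 * Place logical sectors 1..nsect into physical slots, stepping by the
 * interleave il from slot start. The search for a free slot uses the
 * "while" form from the patch, plus two guards that the patch lacks:
 *   skip_last   - if non-zero, do not search after the last sector
 *   probe limit - give up once more than nsect slots were probed
 * Returns 0 on success, -1 on bad input or when no free slot exists.
 */
int place_sectors(int *slot, int nsect, int il, int start, int skip_last)
{
	int n, count, probes;

	if (nsect < 1 || il < 1 || start < 0)
		return -1;

	memset(slot, 0, (size_t)nsect * sizeof(*slot));
	n = start % nsect;

	for (count = 1; count <= nsect; ++count) {
		slot[n] = count;
		if (skip_last && count == nsect)
			break;
		n = (n + il) % nsect;
		probes = 0;
		while (slot[n]) {	/* sector busy, find next free sector */
			if (++probes > nsect)
				return -1;	/* track is full; unguarded loop would spin */
			++n;
			if (n >= nsect)
				n -= nsect;
		}
	}
	return 0;
}
```

With 18 sectors, interleave 2 and start slot 0, the unguarded search (skip_last = 0) hits the probe limit right after sector 18 is placed, while skip_last = 1 completes with the layout 1, 10, 2, 11, and so on. Whether the real driver is ever called with such parameters for a given format is a separate question that this sketch does not answer.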
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-23 16:49 ` Linus Torvalds
@ 2009-12-23 17:08 ` Andi Kleen
  2009-12-25 12:21 ` Arjan van de Ven
  2009-12-27 11:09 ` Pavel Machek
  2009-12-23 17:19 ` Pallipadi, Venkatesh
  2009-12-23 20:11 ` alain
  2 siblings, 2 replies; 74+ messages in thread
From: Andi Kleen @ 2009-12-23 17:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Mark Hounschell, Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar

On Wed, Dec 23, 2009 at 08:49:38AM -0800, Linus Torvalds wrote:
>
>
> On Wed, 23 Dec 2009, Andi Kleen wrote:
> >
> > I suspect it's a system where the APIC timer stops in deeper idle
> > states and it supports them. In this case CPU #0 does timer broadcasts
> > when needed to wake the other CPUs up from deep C, but for that it has
> > to run with HPET. At least the other ones can still enjoy the LAPIC
> > timer.
>
> Ahh, ok, that makes sense. I was assuming the broadcast timer would act in
> that capacity, but..

The "broadcasts" are done using IPIs from CPU #0 and only when that target
CPU is deep idle. That's more efficient than letting the hardware always
broadcast.

>
> > This might suggest that Mark's floppy controller doesn't like
> > deep C? Mark, did you try booting with processor.max_cstate=1
> > and HPET enabled?
>
> We have indeed had historical issues with floppy and sleep states before.

I removed that code when moving to 64bit (floppy driver disabling C1),
but perhaps we need some variant of it again (but it's the first such
report in many years). Although it would be sad to have it again on all
systems.

-Andi

^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 2009-12-23 17:08 ` Andi Kleen @ 2009-12-25 12:21 ` Arjan van de Ven 2009-12-25 20:33 ` Andi Kleen 2009-12-27 11:09 ` Pavel Machek 1 sibling, 1 reply; 74+ messages in thread From: Arjan van de Ven @ 2009-12-25 12:21 UTC (permalink / raw) To: Andi Kleen Cc: Linus Torvalds, Andi Kleen, Mark Hounschell, Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar On Wed, 23 Dec 2009 18:08:32 +0100 Andi Kleen <andi@firstfloor.org> wrote: > I removed that code when moving to 64bit (floppy driver disabling C1), > but perhaps we need some variant of it again (but it's the first such > report in many years). Although it would be sad to have it again on > all systems. at least now we have the pmqos infrastructure, driver just needs to ask for 0 latency ;) -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-25 12:21 ` Arjan van de Ven
@ 2009-12-25 20:33 ` Andi Kleen
  2009-12-26 9:38 ` Arjan van de Ven
  0 siblings, 1 reply; 74+ messages in thread
From: Andi Kleen @ 2009-12-25 20:33 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andi Kleen, Linus Torvalds, Mark Hounschell, Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar

On Fri, Dec 25, 2009 at 01:21:16PM +0100, Arjan van de Ven wrote:
> On Wed, 23 Dec 2009 18:08:32 +0100
> Andi Kleen <andi@firstfloor.org> wrote:
>
> > I removed that code when moving to 64bit (floppy driver disabling C1),
> > but perhaps we need some variant of it again (but it's the first such
> > report in many years). Although it would be sad to have it again on
> > all systems.
>
> at least now we have the pmqos infrastructure, driver just needs to ask
> for 0 latency ;)

Does pmqos work with acpi=off etc.? I didn't think it shut down
the classic "HLT" idle, does it?

The old i386 systems needed that apparently, they long predate
any deeper idle states.

Anyways the code is still there for 32bit.

-Andi
--
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-25 20:33 ` Andi Kleen
@ 2009-12-26 9:38 ` Arjan van de Ven
  2009-12-26 16:40 ` Andi Kleen
  0 siblings, 1 reply; 74+ messages in thread
From: Arjan van de Ven @ 2009-12-26 9:38 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andi Kleen, Linus Torvalds, Mark Hounschell, Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar

On Fri, 25 Dec 2009 21:33:04 +0100
Andi Kleen <andi@firstfloor.org> wrote:

> On Fri, Dec 25, 2009 at 01:21:16PM +0100, Arjan van de Ven wrote:
> > On Wed, 23 Dec 2009 18:08:32 +0100
> > Andi Kleen <andi@firstfloor.org> wrote:
> >
> > > I removed that code when moving to 64bit (floppy driver disabling
> > > C1), but perhaps we need some variant of it again (but it's the
> > > first such report in many years). Although it would be sad to
> > > have it again on all systems.
> >
> > at least now we have the pmqos infrastructure, driver just needs to
> > ask for 0 latency ;)
>
> Does pmqos work with acpi=off etc.?

yes

> I didn't think it shut down
> the classic "HLT" idle, does it?

it does if you specify a latency of 0; it will then go into the
spin-only state until you give up your latency requirement

--
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

^ permalink raw reply [flat|nested] 74+ messages in thread
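For testing from user space, a comparable latency request can be held through the PM QoS character device rather than from inside the driver. A minimal sketch in C, assuming the running kernel exposes /dev/cpu_dma_latency and accepts a 32-bit value written to it, as the kernel's pm_qos interface documentation describes; hold_latency_request is a hypothetical helper name, not an existing API:

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Open a PM QoS latency device node and write a 32-bit latency target
 * in microseconds. The request stays registered only while the returned
 * descriptor is open; close() drops it again.
 * Returns the open descriptor, or -1 on failure.
 */
int hold_latency_request(const char *path, int32_t usec)
{
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;
	if (write(fd, &usec, sizeof(usec)) != (ssize_t)sizeof(usec)) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Calling hold_latency_request("/dev/cpu_dma_latency", 0) as root before running fdformat, and closing the descriptor afterwards, might be one way to check whether idle states play a role on a given machine. It is only a test aid and says nothing about how the driver itself should make the request.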
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-26 9:38 ` Arjan van de Ven
@ 2009-12-26 16:40 ` Andi Kleen
  2009-12-27 12:28 ` Alain Knaff
  0 siblings, 1 reply; 74+ messages in thread
From: Andi Kleen @ 2009-12-26 16:40 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andi Kleen, Linus Torvalds, Mark Hounschell, Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar

> > Does pmqos work with acpi=off etc.?
>
> yes
>
> > I didn't think it shut down
> > the classic "HLT" idle, does it?
>
> it does if you specify a latency of 0; it will then go into the
> spin-only state until you give up your latency requirement

I looked at it this evening, but it seems like pm_qos is not
interrupt safe (e.g. calls blocking notifiers) and floppy currently does
enable/disable_hlt from interrupts and bottom halves.

Would need some more infrastructure work or restructuring
of the floppy driver.

-Andi
--
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-26 16:40 ` Andi Kleen
@ 2009-12-27 12:28 ` Alain Knaff
  2009-12-28 1:54 ` Andi Kleen
  0 siblings, 1 reply; 74+ messages in thread
From: Alain Knaff @ 2009-12-27 12:28 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arjan van de Ven, Linus Torvalds, Mark Hounschell, Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar, morgan, JONES

Andi Kleen wrote:
>>> Does pmqos work with acpi=off etc.?
>> yes
>>
>>> I didn't think it shut down
>>> the classic "HLT" idle, does it?
>> it does if you specify a latency of 0; it will then go into the
>> spin-only state until you give up your latency requirement
>
> I looked at it this evening, but it seems like pm_qos is not
> interrupt safe (e.g. calls blocking notifiers) and floppy currently does
> enable/disable_hlt from interrupts and bottom halves.
>
> Would need some more infrastructure work or restructuring
> of the floppy driver.
>
> -Andi

disable_hlt/enable_hlt was only needed to work around a bug on TM4000
(Texas Instruments) laptops which were popular around 1994 / 1995.
Basically, as soon as the CPU went into hlt() state, so did the DMA
controller, either causing a really slow transfer, or (worse) a buffer
over/underrun which failed the operation.

On hardware unaffected by this particular bug (which would be most
hardware around now, 14 years after the fact...), these calls can safely
be removed.

Regards,

Alain

^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-27 12:28 ` Alain Knaff
@ 2009-12-28 1:54 ` Andi Kleen
  2009-12-28 10:27 ` Alain Knaff
  0 siblings, 1 reply; 74+ messages in thread
From: Andi Kleen @ 2009-12-28 1:54 UTC (permalink / raw)
  To: Alain Knaff
  Cc: Andi Kleen, Arjan van de Ven, Linus Torvalds, Mark Hounschell, Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar, morgan, JONES

> disable_hlt/enable_hlt was only needed to work around a bug on TM4000
> (Texas Instruments) laptops which were popular around 1994 / 1995.

I don't think we can fully drop support for these systems.

Did they have a unique PCI ID or something else that could be tested
for?

Perhaps it could be just a white list like dmi_year > 1995 to disable.

Depending on how often floppies are still used this might save
non-trivial amounts of power on newer systems :)

Anyways it would probably be good to convert this to the new infrastructure,
and remove the old hooks, but the interrupt-context issue would
need to be fixed first.

-Andi
--
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 2009-12-28 1:54 ` Andi Kleen @ 2009-12-28 10:27 ` Alain Knaff 2009-12-28 14:54 ` Andi Kleen 0 siblings, 1 reply; 74+ messages in thread From: Alain Knaff @ 2009-12-28 10:27 UTC (permalink / raw) To: Andi Kleen Cc: Arjan van de Ven, Linus Torvalds, Mark Hounschell, Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar, morgan Andi Kleen wrote: >> disable_hlt/enable_hlt was only needed to work around a bug on TM4000 >> (Texas Instrument) Laptops which were popular around 1994 / 1995. > > I don't think we can fully drop support for these systems. > > Did they have an unique PCI ID or something else that could be tested > for? Floppy controllers are not PCI devices and thus have no PCI id unfortunately... :-( > Perhaps it could be just a white list like dmi_year > 1995 to disable. > > Depending on how often floppies are still used this might save > non trivial amounts of power on newer systems :) Removing these calls will indeed save a *tiny* amount of power by allowing the CPU to go into halt during DMA transfer. But the main argument should be simplification. > Anyways it would be probably good to convert this to the new infrastructure, > and remove the old hooks, but the interrupt-context issue would > need to be fixed first. > > -Andi Well, at least for testing whether it fixes the new problem (DMA cache issue), it's useful to know that these calls can be safely removed on almost all of today's machines. That way, we will know whether this refactoring will be worth the effort. Regards, Alain ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 2009-12-28 10:27 ` Alain Knaff @ 2009-12-28 14:54 ` Andi Kleen 0 siblings, 0 replies; 74+ messages in thread From: Andi Kleen @ 2009-12-28 14:54 UTC (permalink / raw) To: Alain Knaff Cc: Andi Kleen, Arjan van de Ven, Linus Torvalds, Mark Hounschell, Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar, morgan On Mon, Dec 28, 2009 at 11:27:56AM +0100, Alain Knaff wrote: > Andi Kleen wrote: > >> disable_hlt/enable_hlt was only needed to work around a bug on TM4000 > >> (Texas Instrument) Laptops which were popular around 1994 / 1995. > > > > I don't think we can fully drop support for these systems. > > > > Did they have an unique PCI ID or something else that could be tested > > for? > > Floppy controllers are not PCI devices and thus have no PCI id > unfortunately... :-( Yes, but it's enough to identify any component in the system. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 2009-12-23 17:08 ` Andi Kleen 2009-12-25 12:21 ` Arjan van de Ven @ 2009-12-27 11:09 ` Pavel Machek 2009-12-28 20:54 ` Mark Hounschell 1 sibling, 1 reply; 74+ messages in thread From: Pavel Machek @ 2009-12-27 11:09 UTC (permalink / raw) To: Andi Kleen Cc: Linus Torvalds, Mark Hounschell, Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar > > > This might suggest that Mark's floppy controller doesn't like > > > deep C? Mark, did you try booting with processor.max_cstate=1 > > > and HPET enabled? > > > > We have indeed had historical issues with floppy and sleep states before. > > I removed that code when moving to 64bit (floppy driver disabling C1), > but perhaps we need some variant of it again (but it's the first such > report in many years). Although it would be sad to have it again on all > systems. C1 is hlt. Are you sure? I could see how C3 could cause problems (DMA latency), but... Can mark simply try with idle=poll? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 2009-12-27 11:09 ` Pavel Machek @ 2009-12-28 20:54 ` Mark Hounschell 0 siblings, 0 replies; 74+ messages in thread From: Mark Hounschell @ 2009-12-28 20:54 UTC (permalink / raw) To: Pavel Machek Cc: Andi Kleen, Linus Torvalds, Mark Hounschell, Pallipadi, Venkatesh, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar On 12/27/2009 06:09 AM, Pavel Machek wrote: > >>>> This might suggest that Mark's floppy controller doesn't like >>>> deep C? Mark, did you try booting with processor.max_cstate=1 >>>> and HPET enabled? >>> >>> We have indeed had historical issues with floppy and sleep states before. >> >> I removed that code when moving to 64bit (floppy driver disabling C1), >> but perhaps we need some variant of it again (but it's the first such >> report in many years). Although it would be sad to have it again on all >> systems. > > C1 is hlt. Are you sure? I could see how C3 could cause problems (DMA > latency), but... > > Can mark simply try with idle=poll? > > Pavel > The floppy still fails with idle=poll Mark ^ permalink raw reply [flat|nested] 74+ messages in thread
* RE: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-23 16:49 ` Linus Torvalds
  2009-12-23 17:08 ` Andi Kleen
@ 2009-12-23 17:19 ` Pallipadi, Venkatesh
  2009-12-23 17:16 ` Andi Kleen
  2009-12-23 20:11 ` alain
  2 siblings, 1 reply; 74+ messages in thread
From: Pallipadi, Venkatesh @ 2009-12-23 17:19 UTC (permalink / raw)
  To: Linus Torvalds, Andi Kleen
  Cc: Mark Hounschell, dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar

>-----Original Message-----
>From: Linus Torvalds [mailto:torvalds@linux-foundation.org]
>Sent: Wednesday, December 23, 2009 8:50 AM
>To: Andi Kleen
>Cc: Mark Hounschell; Pallipadi, Venkatesh; dmarkh@cfl.rr.com;
>Alain Knaff; Linux Kernel Mailing List;
>fdutils@fdutils.linux.lu; Li, Shaohua; Ingo Molnar
>Subject: Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
>
>
>
>On Wed, 23 Dec 2009, Andi Kleen wrote:
>>
>> I suspect it's a system where the APIC timer stops in deeper idle
>> states and it supports them. In this case CPU #0 does timer broadcasts
>> when needed to wake the other CPUs up from deep C, but for that it has
>> to run with HPET. At least the other ones can still enjoy the LAPIC
>> timer.
>
>Ahh, ok, that makes sense. I was assuming the broadcast timer would act in
>that capacity, but..

This is what I was thinking yesterday and asked Mark to try idle=halt.
This /proc/interrupts is with idle=halt when there should not be any
C-states and broadcasts involved.

>>> HPET_MSI-edge hpet2
>>> NMI: 0 0 0 0 Non-maskable interrupts
>>> LOC: 268 513395 513138 522088 Local timer interrupts

Not sure how this is related to floppy problem. But, we surely have
something wrong with percpu HPET usage here.

Thanks,
Venki

^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-23 17:19 ` Pallipadi, Venkatesh
@ 2009-12-23 17:16 ` Andi Kleen
  0 siblings, 0 replies; 74+ messages in thread
From: Andi Kleen @ 2009-12-23 17:16 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: Linus Torvalds, Andi Kleen, Mark Hounschell, dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar

> This is what I was thinking yesterday and asked Mark to try idle=halt.
> This /proc/interrupts is with idle=halt when there should not be any
> C-states and broadcasts involved.

Ah ok, missed that, sorry.

Actually I'm glad that the floppy-idle hack is not needed again.

-Andi
--
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-23 16:49 ` Linus Torvalds
  2009-12-23 17:08 ` Andi Kleen
  2009-12-23 17:19 ` Pallipadi, Venkatesh
@ 2009-12-23 20:11 ` alain
  2 siblings, 0 replies; 74+ messages in thread
From: alain @ 2009-12-23 20:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Mark Hounschell, Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar

Linus Torvalds wrote:
> diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
> index 3266b4f..9c9148c 100644
> --- a/drivers/block/floppy.c
> +++ b/drivers/block/floppy.c
> @@ -2237,13 +2237,10 @@ static void setup_format_params(int track)
>  	for (count = 1; count <= F_SECT_PER_TRACK; ++count) {
>  		here[n].sect = count;
>  		n = (n + il) % F_SECT_PER_TRACK;
> -		if (here[n].sect) {	/* sector busy, find next free sector */
> +		while (here[n].sect) {	/* sector busy, find next free sector */
>  			++n;
> -			if (n >= F_SECT_PER_TRACK) {
> +			if (n >= F_SECT_PER_TRACK)
>  				n -= F_SECT_PER_TRACK;
> -				while (here[n].sect)
> -					++n;
> -			}
>  		}
>  	}
>  	if (_floppy->stretch & FD_SECTBASEMASK) {

The original code does indeed look a little bit strange... and might
break if there is a long run of "busy" sectors near the end of the
physical track. Or maybe there is a mathematical reason why this
situation cannot occur. I'll have to think about it a little bit more to
come up with a test case that will break either the new or old code.

But in any case, if a bug were to occur due to this code, it would depend
only on the format's parameters, and not on the hardware.

Regards,

Alain

^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 2009-12-23 16:38 ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 Andi Kleen 2009-12-23 16:49 ` Linus Torvalds @ 2009-12-23 17:41 ` Mark Hounschell 2009-12-23 18:01 ` Linus Torvalds 2009-12-23 19:18 ` Pallipadi, Venkatesh 1 sibling, 2 replies; 74+ messages in thread From: Mark Hounschell @ 2009-12-23 17:41 UTC (permalink / raw) To: Andi Kleen Cc: Linus Torvalds, Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar On 12/23/2009 11:38 AM, Andi Kleen wrote: > Linus Torvalds <torvalds@linux-foundation.org> writes: > >> It's not using the lapic for CPU0. >> >> Using the HPET as a per-cpu timer is some crazy sh*t, since it's pretty >> expensive to reprogram (compared to the local apic). And having different >> timers for different CPU's is just odd. >> >> The fact that the timer subsystem can do this and it all (mostly) works at >> all is nice and impressive, but doesn't make it any less crazy ;) > > I suspect it's a system where the APIC timer stops in deeper idle > states and it supports them. In this case CPU #0 does timer broadcasts > when needed to wake the other CPUs up from deep C, but for that it has > to run with HPET. At least the other ones can still enjoy the LAPIC > timer. > > This might suggest that Mark's floppy controller doesn't like > deep C? Mark, did you try booting with processor.max_cstate=1 > and HPET enabled? I just did and /proc/interrupts looks the same and the floppy still does not format. I'll try the patch Linus provided now. Mark ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 2009-12-23 17:41 ` Mark Hounschell @ 2009-12-23 18:01 ` Linus Torvalds 2009-12-23 18:11 ` Mark Hounschell 2009-12-23 19:18 ` Pallipadi, Venkatesh 1 sibling, 1 reply; 74+ messages in thread From: Linus Torvalds @ 2009-12-23 18:01 UTC (permalink / raw) To: Mark Hounschell Cc: Andi Kleen, Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar On Wed, 23 Dec 2009, Mark Hounschell wrote: > > I'll try the patch Linus provided now. I doubt it matters - because if it did, it would matter for everybody, and the HPET thing shouldn't make any difference at all. [ Or rather, it should matter for everybody trying to format a specific format (without interleave it won't matter, and not all formats have any interleave - I think it was mainly used on 5.25" floppies and special formats). ] Besides, maybe I was just mis-reading the code. But getting some testing for the patch certainly won't hurt, so I'm not going to argue against it any more ;) Linus ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 2009-12-23 18:01 ` Linus Torvalds @ 2009-12-23 18:11 ` Mark Hounschell 0 siblings, 0 replies; 74+ messages in thread From: Mark Hounschell @ 2009-12-23 18:11 UTC (permalink / raw) To: Linus Torvalds Cc: Andi Kleen, Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar On 12/23/2009 01:01 PM, Linus Torvalds wrote: > > > On Wed, 23 Dec 2009, Mark Hounschell wrote: >> >> I'll try the patch Linus provided now. > > I doubt it matters - because if it did, it would matter for everybody, and > the HPET thing shouldn't make any difference at all. > > [ Or rather, it should matter for everybody trying to format a specific > format (without interleave it won't matter, and not all formats have any > interleave - I think it was mainly used on 5.25" floppies and special > formats). ] > > Besides, maybe I was just mis-reading the code. > > But getting some testing for the patch certainly won't hurt, so I'm not > going to argue against it any more ;) Yea, that hosed it up pretty good. The very first track label sent out caused some sort of timeout. 
Dec 23 13:10:02 harley kernel:
Dec 23 13:10:02 harley kernel: floppy driver state
Dec 23 13:10:02 harley kernel: -------------------
Dec 23 13:10:02 harley kernel: now=9017 last interrupt=8117 diff=900 last called handler=f73ce27d
Dec 23 13:10:02 harley kernel: timeout_message=lock fdc
Dec 23 13:10:02 harley kernel: last output bytes:
Dec 23 13:10:02 harley kernel: 0 90 4294899106
Dec 23 13:10:02 harley kernel: 1a 90 4294899106
Dec 23 13:10:02 harley kernel: 0 90 4294899106
Dec 23 13:10:02 harley kernel: 3 90 4294899106
Dec 23 13:10:02 harley kernel: c1 90 4294899106
Dec 23 13:10:02 harley kernel: 10 90 4294899106
Dec 23 13:10:02 harley kernel: 7 80 4294899106
Dec 23 13:10:02 harley kernel: 0 90 4294899106
Dec 23 13:10:02 harley kernel: 8 81 4294899106
Dec 23 13:10:02 harley kernel: 4 80 4294899106
Dec 23 13:10:02 harley kernel: 0 90 4294899106
Dec 23 13:10:02 harley kernel: e6 80 8007
Dec 23 13:10:02 harley kernel: 0 90 8007
Dec 23 13:10:02 harley syslog-ng[2651]: last message repeated 2 times
Dec 23 13:10:02 harley kernel: 1 90 8007
Dec 23 13:10:02 harley kernel: 2 90 8007
Dec 23 13:10:02 harley kernel: 12 90 8007
Dec 23 13:10:02 harley kernel: 1b 90 8007
Dec 23 13:10:02 harley kernel: ff 90 8007
Dec 23 13:10:02 harley kernel: last result at 8117
Dec 23 13:10:02 harley kernel: last redo_fd_request at 8117
Dec 23 13:10:02 harley kernel:
Dec 23 13:10:02 harley kernel: status=80
Dec 23 13:10:02 harley kernel: fdc_busy=1
Dec 23 13:10:02 harley kernel: cont=f73d58e4
Dec 23 13:10:02 harley kernel: current_req=(null)
Dec 23 13:10:02 harley kernel: command_status=-1
Dec 23 13:10:02 harley kernel:
Dec 23 13:10:02 harley kernel: floppy0: floppy timeout called
Dec 23 13:10:22 harley kernel:
Dec 23 13:10:22 harley kernel: floppy driver state
Dec 23 13:10:22 harley kernel: -------------------
Dec 23 13:10:22 harley kernel: now=15017 last interrupt=8117 diff=6900 last called handler=f73ce27d
Dec 23 13:10:22 harley kernel: timeout_message=do wakeup
Dec 23 13:10:22 harley kernel: last output bytes:
Dec 23 13:10:22 harley kernel: 0 90 4294899106
Dec 23 13:10:22 harley kernel: 1a 90 4294899106
Dec 23 13:10:22 harley kernel: 0 90 4294899106
Dec 23 13:10:22 harley kernel: 3 90 4294899106
Dec 23 13:10:22 harley kernel: c1 90 4294899106
Dec 23 13:10:22 harley kernel: 10 90 4294899106
Dec 23 13:10:22 harley kernel: 7 80 4294899106
Dec 23 13:10:22 harley kernel: 0 90 4294899106
Dec 23 13:10:22 harley kernel: 8 81 4294899106
Dec 23 13:10:22 harley kernel: 4 80 4294899106
Dec 23 13:10:22 harley kernel: 0 90 4294899106
Dec 23 13:10:22 harley kernel: e6 80 8007
Dec 23 13:10:22 harley kernel: 0 90 8007
Dec 23 13:10:22 harley syslog-ng[2651]: last message repeated 2 times
Dec 23 13:10:22 harley kernel: 1 90 8007
Dec 23 13:10:22 harley kernel: 2 90 8007
Dec 23 13:10:22 harley kernel: 12 90 8007
Dec 23 13:10:22 harley kernel: 1b 90 8007
Dec 23 13:10:22 harley kernel: ff 90 8007
Dec 23 13:10:22 harley kernel: last result at 8117
Dec 23 13:10:22 harley kernel: last redo_fd_request at 8117
Dec 23 13:10:22 harley kernel:
Dec 23 13:10:22 harley kernel: status=80
Dec 23 13:10:22 harley kernel: fdc_busy=1
Dec 23 13:10:22 harley kernel: floppy_work.func=f73d03da
Dec 23 13:10:22 harley kernel: cont=f73d5274
Dec 23 13:10:22 harley kernel: current_req=(null)
Dec 23 13:10:22 harley kernel: command_status=-1
Dec 23 13:10:22 harley kernel:
Dec 23 13:10:22 harley kernel: floppy0: floppy timeout called
Dec 23 13:10:22 harley kernel: floppy.c: no request in request_don

Have to reboot now...

Mark

^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 2009-12-23 17:41 ` Mark Hounschell 2009-12-23 18:01 ` Linus Torvalds @ 2009-12-23 19:18 ` Pallipadi, Venkatesh 2009-12-23 19:35 ` Mark Hounschell 1 sibling, 1 reply; 74+ messages in thread From: Pallipadi, Venkatesh @ 2009-12-23 19:18 UTC (permalink / raw) To: Mark Hounschell Cc: Andi Kleen, Linus Torvalds, Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar On Wed, Dec 23, 2009 at 09:41:50AM -0800, Mark Hounschell wrote: > On 12/23/2009 11:38 AM, Andi Kleen wrote: > > Linus Torvalds <torvalds@linux-foundation.org> writes: > > > >> It's not using the lapic for CPU0. > >> > >> Using the HPET as a per-cpu timer is some crazy sh*t, since it's pretty > >> expensive to reprogram (compared to the local apic). And having different > >> timers for different CPU's is just odd. > >> > >> The fact that the timer subsystem can do this and it all (mostly) works at > >> all is nice and impressive, but doesn't make it any less crazy ;) > > > > I suspect it's a system where the APIC timer stops in deeper idle > > states and it supports them. In this case CPU #0 does timer broadcasts > > when needed to wake the other CPUs up from deep C, but for that it has > > to run with HPET. At least the other ones can still enjoy the LAPIC > > timer. > > > > This might suggest that Mark's floppy controller doesn't like > > deep C? Mark, did you try booting with processor.max_cstate=1 > > and HPET enabled? > > I just did and /proc/interrupts looks the same and the floppy still does > not format. > Can you try this one line patch either on .28 or .32 (with /proc/interrupts output). This disables hpet2 and lapic timer should then be used on CPU 0. If things work with this test patch, we will know that the failure is somehow related to HPET usage in MSI mode. 
Thanks,
Venki

Reduce the rating of percpu hpet timer

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
---
 arch/x86/kernel/hpet.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index cafb1c6..f89d17a 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -480,7 +480,7 @@ static void init_one_hpet_msi_clockevent(struct hpet_dev *hdev, int cpu)
 	hpet_setup_irq(hdev);
 	evt->irq = hdev->irq;
 
-	evt->rating = 110;
+	evt->rating = 40;
 	evt->features = CLOCK_EVT_FEAT_ONESHOT;
 	if (hdev->flags & HPET_DEV_PERI_CAP)
 		evt->features |= CLOCK_EVT_FEAT_PERIODIC;
-- 
1.6.0.6

^ permalink raw reply related [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 2009-12-23 19:18 ` Pallipadi, Venkatesh @ 2009-12-23 19:35 ` Mark Hounschell 2009-12-23 20:30 ` Pallipadi, Venkatesh 0 siblings, 1 reply; 74+ messages in thread From: Mark Hounschell @ 2009-12-23 19:35 UTC (permalink / raw) To: Pallipadi, Venkatesh Cc: Andi Kleen, Linus Torvalds, dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar On 12/23/2009 02:18 PM, Pallipadi, Venkatesh wrote: > On Wed, Dec 23, 2009 at 09:41:50AM -0800, Mark Hounschell wrote: >> On 12/23/2009 11:38 AM, Andi Kleen wrote: >>> Linus Torvalds <torvalds@linux-foundation.org> writes: >>> >>>> It's not using the lapic for CPU0. >>>> >>>> Using the HPET as a per-cpu timer is some crazy sh*t, since it's pretty >>>> expensive to reprogram (compared to the local apic). And having different >>>> timers for different CPU's is just odd. >>>> >>>> The fact that the timer subsystem can do this and it all (mostly) works at >>>> all is nice and impressive, but doesn't make it any less crazy ;) >>> >>> I suspect it's a system where the APIC timer stops in deeper idle >>> states and it supports them. In this case CPU #0 does timer broadcasts >>> when needed to wake the other CPUs up from deep C, but for that it has >>> to run with HPET. At least the other ones can still enjoy the LAPIC >>> timer. >>> >>> This might suggest that Mark's floppy controller doesn't like >>> deep C? Mark, did you try booting with processor.max_cstate=1 >>> and HPET enabled? >> >> I just did and /proc/interrupts looks the same and the floppy still does >> not format. >> > > Can you try this one line patch either on .28 or .32 (with /proc/interrupts > output). > This disables hpet2 and lapic timer should then be used on CPU 0. If things > work with this test patch, we will know that the failure is somehow related > to HPET usage in MSI mode. 
> > Thanks, > Venki > > Reduce the rating of percpu hpet timer > > Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> > --- > arch/x86/kernel/hpet.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c > index cafb1c6..f89d17a 100644 > --- a/arch/x86/kernel/hpet.c > +++ b/arch/x86/kernel/hpet.c > @@ -480,7 +480,7 @@ static void init_one_hpet_msi_clockevent(struct hpet_dev *hdev, int cpu) > hpet_setup_irq(hdev); > evt->irq = hdev->irq; > > - evt->rating = 110; > + evt->rating = 40; > evt->features = CLOCK_EVT_FEAT_ONESHOT; > if (hdev->flags & HPET_DEV_PERI_CAP) > evt->features |= CLOCK_EVT_FEAT_PERIODIC; That made it work. Used 2.6.32.2 cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 0: 82 0 0 1 IO-APIC-edge timer 1: 0 0 0 67 IO-APIC-edge i8042 3: 0 0 0 6 IO-APIC-edge 4: 0 0 0 4 IO-APIC-edge 6: 0 0 0 4 IO-APIC-edge floppy 8: 0 0 0 8 IO-APIC-edge rtc0 9: 0 0 0 0 IO-APIC-fasteoi acpi 12: 0 0 10 1519 IO-APIC-edge i8042 14: 0 0 39 10995 IO-APIC-edge pata_atiixp 15: 0 0 3 391 IO-APIC-edge pata_atiixp 16: 0 0 2 606 IO-APIC-fasteoi aic79xx, ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel, Digi DBX2, ni-pci-gpib 17: 0 0 0 3 IO-APIC-fasteoi ehci_hcd:usb1, parport0, ni-pci-gpib 18: 0 0 10 2168 IO-APIC-fasteoi ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, Digi DBX2, nvidia 19: 0 0 0 130 IO-APIC-fasteoi aic7xxx, ehci_hcd:usb2, ttySLG0, eth1 22: 0 0 8 1151 IO-APIC-fasteoi ahci 24: 0 0 0 0 HPET_MSI-edge hpet2 29: 0 0 0 48 PCI-MSI-edge sky2@pci:0000:04:00.0 NMI: 0 0 0 0 Non-maskable interrupts LOC: 34842 30177 29672 29632 Local timer interrupts SPU: 0 0 0 0 Spurious interrupts PMI: 0 0 0 0 Performance monitoring interrupts PND: 0 0 0 0 Performance pending work RES: 17501 20449 16670 11224 Rescheduling interrupts CAL: 10554 2336 1102 1071 Function call interrupts TLB: 364 562 753 468 TLB shootdowns ERR: 0 MIS: 0 # fdformat /dev/fd0u1440 Double-sided, 80 tracks, 18 sec/track. Total capacity 1440 kB. Formatting ... 
done Verifying ... done ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 2009-12-23 19:35 ` Mark Hounschell @ 2009-12-23 20:30 ` Pallipadi, Venkatesh 2009-12-23 20:34 ` alain 2010-01-08 17:42 ` Mark Hounschell 0 siblings, 2 replies; 74+ messages in thread From: Pallipadi, Venkatesh @ 2009-12-23 20:30 UTC (permalink / raw) To: markh@compro.net Cc: Andi Kleen, Linus Torvalds, dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar On Wed, 2009-12-23 at 11:35 -0800, Mark Hounschell wrote: > On 12/23/2009 02:18 PM, Pallipadi, Venkatesh wrote: > > On Wed, Dec 23, 2009 at 09:41:50AM -0800, Mark Hounschell wrote: > >> On 12/23/2009 11:38 AM, Andi Kleen wrote: > >>> Linus Torvalds <torvalds@linux-foundation.org> writes: > >>> > >>>> It's not using the lapic for CPU0. > >>>> > >>>> Using the HPET as a per-cpu timer is some crazy sh*t, since it's pretty > >>>> expensive to reprogram (compared to the local apic). And having different > >>>> timers for different CPU's is just odd. > >>>> > >>>> The fact that the timer subsystem can do this and it all (mostly) works at > >>>> all is nice and impressive, but doesn't make it any less crazy ;) > >>> > >>> I suspect it's a system where the APIC timer stops in deeper idle > >>> states and it supports them. In this case CPU #0 does timer broadcasts > >>> when needed to wake the other CPUs up from deep C, but for that it has > >>> to run with HPET. At least the other ones can still enjoy the LAPIC > >>> timer. > >>> > >>> This might suggest that Mark's floppy controller doesn't like > >>> deep C? Mark, did you try booting with processor.max_cstate=1 > >>> and HPET enabled? > >> > >> I just did and /proc/interrupts looks the same and the floppy still does > >> not format. > >> > > > > Can you try this one line patch either on .28 or .32 (with /proc/interrupts > > output). > > This disables hpet2 and lapic timer should then be used on CPU 0. 
If things > > work with this test patch, we will know that the failure is somehow related > > to HPET usage in MSI mode. > > > > Thanks, > > Venki > > > > Reduce the rating of percpu hpet timer > > > > Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> > > --- > > arch/x86/kernel/hpet.c | 2 +- > > 1 files changed, 1 insertions(+), 1 deletions(-) > > > > diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c > > index cafb1c6..f89d17a 100644 > > --- a/arch/x86/kernel/hpet.c > > +++ b/arch/x86/kernel/hpet.c > > @@ -480,7 +480,7 @@ static void init_one_hpet_msi_clockevent(struct hpet_dev *hdev, int cpu) > > hpet_setup_irq(hdev); > > evt->irq = hdev->irq; > > > > - evt->rating = 110; > > + evt->rating = 40; > > evt->features = CLOCK_EVT_FEAT_ONESHOT; > > if (hdev->flags & HPET_DEV_PERI_CAP) > > evt->features |= CLOCK_EVT_FEAT_PERIODIC; > > That made it work. Used 2.6.32.2 > > cat /proc/interrupts > CPU0 CPU1 CPU2 CPU3 > 0: 82 0 0 1 IO-APIC-edge timer > 1: 0 0 0 67 IO-APIC-edge i8042 > 3: 0 0 0 6 IO-APIC-edge > 4: 0 0 0 4 IO-APIC-edge > 6: 0 0 0 4 IO-APIC-edge floppy > 8: 0 0 0 8 IO-APIC-edge rtc0 > 9: 0 0 0 0 IO-APIC-fasteoi acpi > 12: 0 0 10 1519 IO-APIC-edge i8042 > 14: 0 0 39 10995 IO-APIC-edge > pata_atiixp > 15: 0 0 3 391 IO-APIC-edge > pata_atiixp > 16: 0 0 2 606 IO-APIC-fasteoi > aic79xx, ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel, Digi DBX2, ni-pci-gpib > 17: 0 0 0 3 IO-APIC-fasteoi > ehci_hcd:usb1, parport0, ni-pci-gpib > 18: 0 0 10 2168 IO-APIC-fasteoi > ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, Digi DBX2, nvidia > 19: 0 0 0 130 IO-APIC-fasteoi > aic7xxx, ehci_hcd:usb2, ttySLG0, eth1 > 22: 0 0 8 1151 IO-APIC-fasteoi ahci > 24: 0 0 0 0 HPET_MSI-edge hpet2 > 29: 0 0 0 48 PCI-MSI-edge > sky2@pci:0000:04:00.0 > NMI: 0 0 0 0 Non-maskable interrupts > LOC: 34842 30177 29672 29632 Local timer interrupts > SPU: 0 0 0 0 Spurious interrupts > PMI: 0 0 0 0 Performance monitoring > interrupts > PND: 0 0 0 0 Performance pending work > RES: 17501 
20449 16670 11224 Rescheduling interrupts
> CAL: 10554 2336 1102 1071 Function call interrupts
> TLB: 364 562 753 468 TLB shootdowns
> ERR: 0
> MIS: 0
>
>
> # fdformat /dev/fd0u1440
> Double-sided, 80 tracks, 18 sec/track. Total capacity 1440 kB.
> Formatting ... done
> Verifying ... done

Hmmm.. That's very interesting indeed.

That clearly says that HPET MSI interrupts are somehow causing some
caching side effect in the chipset that results in this floppy DMA
failure.

Here is what we have until now.
IRQ 0 is based on the HPET legacy interrupt, and the HPET device is also
capable of MSI on this platform. So we also have a percpu hpet (hpet2
tied to CPU0). The percpu hpet was added to avoid the usage of
IRQ0+LAPIC broadcast in cases where the LAPIC timer stops working in
deep C-state. As we have only one HPET channel free for percpu HPET, we
only have hpet2 tied to CPU 0, and the other CPUs still have to go
through IRQ0+LAPIC broadcast with deep C-state.

One problem here is that the percpu hpet should only get used when the
LAPIC cannot be used (that is, when the CPU enters deep C-state). Using
hpet2 in place of the LAPIC timer even when deep C-state is not
supported is not right in terms of performance. We need some changes
here to fix that [Problem 1].

But that still does not explain why we are seeing this problem in the
first place. I mean, using hpet2 is not optimal, but it should not cause
functionality issues like this. Even after fixing [Problem 1] above, we
may see this problem on some other platform that supports deep C-state
and so has hpet2 enabled for a valid reason.

Also, I am not sure whether the problem also happens if legacy HPET
interrupts are used during run time in place of the LAPIC timer (it may
be worth trying this with a simple test patch, let me think about it).
In this case, the legacy HPET interrupt rightly goes quiet after boot,
giving priority to the LAPIC timer.

With hpet MSI interrupts, we do a write followed by a read of an HPET
memory-mapped register to set an HPET channel timeout, plus a read of
the global HPET timer. This happens on every timer interrupt on CPU 0.
And we also have MSI interrupt being delivered to CPU 0. I cannot think
of any reason why this can break dma. We can probably try adding some
dummy HPET read after dma write, to see if that flushes things properly.

Thanks,
Venki

^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 2009-12-23 20:30 ` Pallipadi, Venkatesh @ 2009-12-23 20:34 ` alain 2009-12-23 21:34 ` Pallipadi, Venkatesh 2010-01-08 17:42 ` Mark Hounschell 1 sibling, 1 reply; 74+ messages in thread From: alain @ 2009-12-23 20:34 UTC (permalink / raw) To: Pallipadi, Venkatesh Cc: markh@compro.net, Andi Kleen, Linus Torvalds, dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar Pallipadi, Venkatesh wrote: > MSI interrupt being delivered to CPU 0. I cannot think of any reason why > this can break dma. We can probably try adding some dummy HPET read > after dma write, to see if that flushes things properly. Shouldn't that be "... some dummy HPET read _before_ dma write...". In order to ensure that DMA cache is consistent _before_ dma controller reads it? Regards, Alain ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 2009-12-23 20:34 ` alain @ 2009-12-23 21:34 ` Pallipadi, Venkatesh 0 siblings, 0 replies; 74+ messages in thread From: Pallipadi, Venkatesh @ 2009-12-23 21:34 UTC (permalink / raw) To: alain Cc: markh@compro.net, Andi Kleen, Linus Torvalds, dmarkh@cfl.rr.com, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar On Wed, 2009-12-23 at 12:34 -0800, alain wrote: > Pallipadi, Venkatesh wrote: > > MSI interrupt being delivered to CPU 0. I cannot think of any reason why > > this can break dma. We can probably try adding some dummy HPET read > > after dma write, to see if that flushes things properly. > > Shouldn't that be "... some dummy HPET read _before_ dma write...". In > order to ensure that DMA cache is consistent _before_ dma controller > reads it? > Yes. I meant after the contents of the buffer is changed and before the DMA transfer and the controller reading it. Thanks, Venki ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 2009-12-23 20:30 ` Pallipadi, Venkatesh 2009-12-23 20:34 ` alain @ 2010-01-08 17:42 ` Mark Hounschell 2010-01-12 0:19 ` Pallipadi, Venkatesh 1 sibling, 1 reply; 74+ messages in thread From: Mark Hounschell @ 2010-01-08 17:42 UTC (permalink / raw) To: Pallipadi, Venkatesh Cc: Andi Kleen, Linus Torvalds, dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar On 12/23/2009 03:30 PM, Pallipadi, Venkatesh wrote: >>> Can you try this one line patch either on .28 or .32 (with /proc/interrupts >>> output). >>> This disables hpet2 and lapic timer should then be used on CPU 0. If things >>> work with this test patch, we will know that the failure is somehow related >>> to HPET usage in MSI mode. >>> >>> Thanks, >>> Venki >>> >>> Reduce the rating of percpu hpet timer >>> >>> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> >>> --- >>> arch/x86/kernel/hpet.c | 2 +- >>> 1 files changed, 1 insertions(+), 1 deletions(-) >>> >>> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c >>> index cafb1c6..f89d17a 100644 >>> --- a/arch/x86/kernel/hpet.c >>> +++ b/arch/x86/kernel/hpet.c >>> @@ -480,7 +480,7 @@ static void init_one_hpet_msi_clockevent(struct hpet_dev *hdev, int cpu) >>> hpet_setup_irq(hdev); >>> evt->irq = hdev->irq; >>> >>> - evt->rating = 110; >>> + evt->rating = 40; >>> evt->features = CLOCK_EVT_FEAT_ONESHOT; >>> if (hdev->flags & HPET_DEV_PERI_CAP) >>> evt->features |= CLOCK_EVT_FEAT_PERIODIC; >> >> That made it work. 
Used 2.6.32.2 >> >> cat /proc/interrupts >> CPU0 CPU1 CPU2 CPU3 >> 0: 82 0 0 1 IO-APIC-edge timer >> 1: 0 0 0 67 IO-APIC-edge i8042 >> 3: 0 0 0 6 IO-APIC-edge >> 4: 0 0 0 4 IO-APIC-edge >> 6: 0 0 0 4 IO-APIC-edge floppy >> 8: 0 0 0 8 IO-APIC-edge rtc0 >> 9: 0 0 0 0 IO-APIC-fasteoi acpi >> 12: 0 0 10 1519 IO-APIC-edge i8042 >> 14: 0 0 39 10995 IO-APIC-edge >> pata_atiixp >> 15: 0 0 3 391 IO-APIC-edge >> pata_atiixp >> 16: 0 0 2 606 IO-APIC-fasteoi >> aic79xx, ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel, Digi DBX2, ni-pci-gpib >> 17: 0 0 0 3 IO-APIC-fasteoi >> ehci_hcd:usb1, parport0, ni-pci-gpib >> 18: 0 0 10 2168 IO-APIC-fasteoi >> ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, Digi DBX2, nvidia >> 19: 0 0 0 130 IO-APIC-fasteoi >> aic7xxx, ehci_hcd:usb2, ttySLG0, eth1 >> 22: 0 0 8 1151 IO-APIC-fasteoi ahci >> 24: 0 0 0 0 HPET_MSI-edge hpet2 >> 29: 0 0 0 48 PCI-MSI-edge >> sky2@pci:0000:04:00.0 >> NMI: 0 0 0 0 Non-maskable interrupts >> LOC: 34842 30177 29672 29632 Local timer interrupts >> SPU: 0 0 0 0 Spurious interrupts >> PMI: 0 0 0 0 Performance monitoring >> interrupts >> PND: 0 0 0 0 Performance pending work >> RES: 17501 20449 16670 11224 Rescheduling interrupts >> CAL: 10554 2336 1102 1071 Function call interrupts >> TLB: 364 562 753 468 TLB shootdowns >> ERR: 0 >> MIS: 0 >> >> >> # fdformat /dev/fd0u1440 >> Double-sided, 80 tracks, 18 sec/track. Total capacity 1440 kB. >> Formatting ... done >> Verifying ... done > > Hmmm.. Thats very interesting indeed. > > That clearly says that HPET MSI interrupts somehow is causing some > caching side effect in the chipset that results in this floppy dma > failure. > > Here's is what we have until now. > IRQ 0 is based on HPET legacy interrupt and HPET device is also capable > of MSI on this platform. So we also have a percpu hpet (hpet2 tied to > CPU0). percpu hpet was added to avoid the usage of IRQ0+LAPIC broadcast > in cases where LAPIC timer will stop working in deep C-state. 
As we have > only one HPET channel free for percpu HPET, we only have hpet2 tied to > CPU 0 and other CPUs still have to go through IRQ0+LAPIC broadcast with > deep C-state. > > One problem here is that percpu hpet should only get used when LAPIC > cannot be used (that is when CPU enters deep C-state). Using hpet2 in > place of LAPIC timer even when deep C-state is not supported is not > right in terms of performance. We need some changes here to fix that > [Problem 1]. > > But, that still does not explain why we are seeing this problem in the > first place. I mean, using hpet2 is not optimal, but should not have > functionality issues like this. Even fixing [Problem 1] above, we may > see this problem on some other platform that supports deep C-state and > so has hpet2 enabled for a valid reason. > > Also, I am not sure whether the problem also happens if legacy HPET > interrupts are used during run time in place of LAPIC timer (May be > worth to try this with a simple test patch, let me think about it). In > this case, legacy HPET interrupt rightly goes quiet after boot, giving > priority to LAPIC timer. > > With hpet MSI interrupts, we do a write followed by read of HPET > memmapped register to set a HPET channel timeout + read of global HPET > timer. This happens on every timer interrupt on CPU 0. And we also have > MSI interrupt being delivered to CPU 0. I cannot think of any reason why > this can break dma. We can probably try adding some dummy HPET read > after dma write, to see if that flushes things properly. > Haven't seen any activity on this thread in a while. Just curious, are we still working this? Is there anything else I can do to help? Thanks Mark ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 2010-01-08 17:42 ` Mark Hounschell @ 2010-01-12 0:19 ` Pallipadi, Venkatesh 2010-01-12 9:04 ` Mark Hounschell 0 siblings, 1 reply; 74+ messages in thread From: Pallipadi, Venkatesh @ 2010-01-12 0:19 UTC (permalink / raw) To: markh@compro.net Cc: Andi Kleen, Linus Torvalds, dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar On Fri, 2010-01-08 at 09:42 -0800, Mark Hounschell wrote: > On 12/23/2009 03:30 PM, Pallipadi, Venkatesh wrote: > > >>> Can you try this one line patch either on .28 or .32 (with /proc/interrupts > >>> output). > >>> This disables hpet2 and lapic timer should then be used on CPU 0. If things > >>> work with this test patch, we will know that the failure is somehow related > >>> to HPET usage in MSI mode. > >>> > >>> Thanks, > >>> Venki > >>> > >>> Reduce the rating of percpu hpet timer > >>> > >>> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> > >>> --- > >>> arch/x86/kernel/hpet.c | 2 +- > >>> 1 files changed, 1 insertions(+), 1 deletions(-) > >>> > >>> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c > >>> index cafb1c6..f89d17a 100644 > >>> --- a/arch/x86/kernel/hpet.c > >>> +++ b/arch/x86/kernel/hpet.c > >>> @@ -480,7 +480,7 @@ static void init_one_hpet_msi_clockevent(struct hpet_dev *hdev, int cpu) > >>> hpet_setup_irq(hdev); > >>> evt->irq = hdev->irq; > >>> > >>> - evt->rating = 110; > >>> + evt->rating = 40; > >>> evt->features = CLOCK_EVT_FEAT_ONESHOT; > >>> if (hdev->flags & HPET_DEV_PERI_CAP) > >>> evt->features |= CLOCK_EVT_FEAT_PERIODIC; > >> > >> That made it work. 
Used 2.6.32.2 > >> > >> cat /proc/interrupts > >> CPU0 CPU1 CPU2 CPU3 > >> 0: 82 0 0 1 IO-APIC-edge timer > >> 1: 0 0 0 67 IO-APIC-edge i8042 > >> 3: 0 0 0 6 IO-APIC-edge > >> 4: 0 0 0 4 IO-APIC-edge > >> 6: 0 0 0 4 IO-APIC-edge floppy > >> 8: 0 0 0 8 IO-APIC-edge rtc0 > >> 9: 0 0 0 0 IO-APIC-fasteoi acpi > >> 12: 0 0 10 1519 IO-APIC-edge i8042 > >> 14: 0 0 39 10995 IO-APIC-edge > >> pata_atiixp > >> 15: 0 0 3 391 IO-APIC-edge > >> pata_atiixp > >> 16: 0 0 2 606 IO-APIC-fasteoi > >> aic79xx, ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel, Digi DBX2, ni-pci-gpib > >> 17: 0 0 0 3 IO-APIC-fasteoi > >> ehci_hcd:usb1, parport0, ni-pci-gpib > >> 18: 0 0 10 2168 IO-APIC-fasteoi > >> ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, Digi DBX2, nvidia > >> 19: 0 0 0 130 IO-APIC-fasteoi > >> aic7xxx, ehci_hcd:usb2, ttySLG0, eth1 > >> 22: 0 0 8 1151 IO-APIC-fasteoi ahci > >> 24: 0 0 0 0 HPET_MSI-edge hpet2 > >> 29: 0 0 0 48 PCI-MSI-edge > >> sky2@pci:0000:04:00.0 > >> NMI: 0 0 0 0 Non-maskable interrupts > >> LOC: 34842 30177 29672 29632 Local timer interrupts > >> SPU: 0 0 0 0 Spurious interrupts > >> PMI: 0 0 0 0 Performance monitoring > >> interrupts > >> PND: 0 0 0 0 Performance pending work > >> RES: 17501 20449 16670 11224 Rescheduling interrupts > >> CAL: 10554 2336 1102 1071 Function call interrupts > >> TLB: 364 562 753 468 TLB shootdowns > >> ERR: 0 > >> MIS: 0 > >> > >> > >> # fdformat /dev/fd0u1440 > >> Double-sided, 80 tracks, 18 sec/track. Total capacity 1440 kB. > >> Formatting ... done > >> Verifying ... done > > > > Hmmm.. Thats very interesting indeed. > > > > That clearly says that HPET MSI interrupts somehow is causing some > > caching side effect in the chipset that results in this floppy dma > > failure. > > > > Here's is what we have until now. > > IRQ 0 is based on HPET legacy interrupt and HPET device is also capable > > of MSI on this platform. So we also have a percpu hpet (hpet2 tied to > > CPU0). 
percpu hpet was added to avoid the usage of IRQ0+LAPIC broadcast > > in cases where LAPIC timer will stop working in deep C-state. As we have > > only one HPET channel free for percpu HPET, we only have hpet2 tied to > > CPU 0 and other CPUs still have to go through IRQ0+LAPIC broadcast with > > deep C-state. > > > > One problem here is that percpu hpet should only get used when LAPIC > > cannot be used (that is when CPU enters deep C-state). Using hpet2 in > > place of LAPIC timer even when deep C-state is not supported is not > > right in terms of performance. We need some changes here to fix that > > [Problem 1]. > > > > But, that still does not explain why we are seeing this problem in the > > first place. I mean, using hpet2 is not optimal, but should not have > > functionality issues like this. Even fixing [Problem 1] above, we may > > see this problem on some other platform that supports deep C-state and > > so has hpet2 enabled for a valid reason. > > > > Also, I am not sure whether the problem also happens if legacy HPET > > interrupts are used during run time in place of LAPIC timer (May be > > worth to try this with a simple test patch, let me think about it). In > > this case, legacy HPET interrupt rightly goes quiet after boot, giving > > priority to LAPIC timer. > > > > With hpet MSI interrupts, we do a write followed by read of HPET > > memmapped register to set a HPET channel timeout + read of global HPET > > timer. This happens on every timer interrupt on CPU 0. And we also have > > MSI interrupt being delivered to CPU 0. I cannot think of any reason why > > this can break dma. We can probably try adding some dummy HPET read > > after dma write, to see if that flushes things properly. > > > > Haven't seen any activity on this thread in a while. Just curious, are we > still working this? > Is there anything else I can do to help? Sorry for not following up on this. We have narrowed this down to HPET MSI and floppy DMA. 
I still don't know how HPET MSI interrupts are breaking floppy DMA. You
are seeing the problem on two different systems, correct? Do you have
any system where this works with HPET MSI enabled?

There are a couple of options for how we can go about this one:

1) Change the HPET-MSI code so that it does not get activated when there
are no C-states with LAPIC stoppage involved. This will resolve the
problem on the systems you reported, as they have no deep C-states. But
I fear that with the actual problem unresolved, we may hit it in the
future with this or some other platform having the same issue with CPUs
that support deep C-state.

2) Try this testcase on a few other platforms that support HPET-MSI and
deep C-states, check how widespread the problem is, and then add a
whitelist-blacklist for HPET MSI usage.

I think option 1 is better for 2.6.33. I will work on that and send in
patches for you to test.

Thanks,
Venki

^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 2010-01-12 0:19 ` Pallipadi, Venkatesh @ 2010-01-12 9:04 ` Mark Hounschell 2010-01-15 2:01 ` Pallipadi, Venkatesh 0 siblings, 1 reply; 74+ messages in thread From: Mark Hounschell @ 2010-01-12 9:04 UTC (permalink / raw) To: Pallipadi, Venkatesh Cc: markh@compro.net, Andi Kleen, Linus Torvalds, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar On 01/11/2010 07:19 PM, Pallipadi, Venkatesh wrote: > On Fri, 2010-01-08 at 09:42 -0800, Mark Hounschell wrote: >> On 12/23/2009 03:30 PM, Pallipadi, Venkatesh wrote: >> >>>>> Can you try this one line patch either on .28 or .32 (with /proc/interrupts >>>>> output). >>>>> This disables hpet2 and lapic timer should then be used on CPU 0. If things >>>>> work with this test patch, we will know that the failure is somehow related >>>>> to HPET usage in MSI mode. >>>>> >>>>> Thanks, >>>>> Venki >>>>> >>>>> Reduce the rating of percpu hpet timer >>>>> >>>>> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> >>>>> --- >>>>> arch/x86/kernel/hpet.c | 2 +- >>>>> 1 files changed, 1 insertions(+), 1 deletions(-) >>>>> >>>>> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c >>>>> index cafb1c6..f89d17a 100644 >>>>> --- a/arch/x86/kernel/hpet.c >>>>> +++ b/arch/x86/kernel/hpet.c >>>>> @@ -480,7 +480,7 @@ static void init_one_hpet_msi_clockevent(struct hpet_dev *hdev, int cpu) >>>>> hpet_setup_irq(hdev); >>>>> evt->irq = hdev->irq; >>>>> >>>>> - evt->rating = 110; >>>>> + evt->rating = 40; >>>>> evt->features = CLOCK_EVT_FEAT_ONESHOT; >>>>> if (hdev->flags & HPET_DEV_PERI_CAP) >>>>> evt->features |= CLOCK_EVT_FEAT_PERIODIC; >>>> >>>> That made it work. 
Used 2.6.32.2 >>>> >>>> cat /proc/interrupts >>>> CPU0 CPU1 CPU2 CPU3 >>>> 0: 82 0 0 1 IO-APIC-edge timer >>>> 1: 0 0 0 67 IO-APIC-edge i8042 >>>> 3: 0 0 0 6 IO-APIC-edge >>>> 4: 0 0 0 4 IO-APIC-edge >>>> 6: 0 0 0 4 IO-APIC-edge floppy >>>> 8: 0 0 0 8 IO-APIC-edge rtc0 >>>> 9: 0 0 0 0 IO-APIC-fasteoi acpi >>>> 12: 0 0 10 1519 IO-APIC-edge i8042 >>>> 14: 0 0 39 10995 IO-APIC-edge >>>> pata_atiixp >>>> 15: 0 0 3 391 IO-APIC-edge >>>> pata_atiixp >>>> 16: 0 0 2 606 IO-APIC-fasteoi >>>> aic79xx, ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel, Digi DBX2, ni-pci-gpib >>>> 17: 0 0 0 3 IO-APIC-fasteoi >>>> ehci_hcd:usb1, parport0, ni-pci-gpib >>>> 18: 0 0 10 2168 IO-APIC-fasteoi >>>> ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, Digi DBX2, nvidia >>>> 19: 0 0 0 130 IO-APIC-fasteoi >>>> aic7xxx, ehci_hcd:usb2, ttySLG0, eth1 >>>> 22: 0 0 8 1151 IO-APIC-fasteoi ahci >>>> 24: 0 0 0 0 HPET_MSI-edge hpet2 >>>> 29: 0 0 0 48 PCI-MSI-edge >>>> sky2@pci:0000:04:00.0 >>>> NMI: 0 0 0 0 Non-maskable interrupts >>>> LOC: 34842 30177 29672 29632 Local timer interrupts >>>> SPU: 0 0 0 0 Spurious interrupts >>>> PMI: 0 0 0 0 Performance monitoring >>>> interrupts >>>> PND: 0 0 0 0 Performance pending work >>>> RES: 17501 20449 16670 11224 Rescheduling interrupts >>>> CAL: 10554 2336 1102 1071 Function call interrupts >>>> TLB: 364 562 753 468 TLB shootdowns >>>> ERR: 0 >>>> MIS: 0 >>>> >>>> >>>> # fdformat /dev/fd0u1440 >>>> Double-sided, 80 tracks, 18 sec/track. Total capacity 1440 kB. >>>> Formatting ... done >>>> Verifying ... done >>> >>> Hmmm.. Thats very interesting indeed. >>> >>> That clearly says that HPET MSI interrupts somehow is causing some >>> caching side effect in the chipset that results in this floppy dma >>> failure. >>> >>> Here's is what we have until now. >>> IRQ 0 is based on HPET legacy interrupt and HPET device is also capable >>> of MSI on this platform. So we also have a percpu hpet (hpet2 tied to >>> CPU0). 
percpu hpet was added to avoid the usage of IRQ0+LAPIC broadcast >>> in cases where LAPIC timer will stop working in deep C-state. As we have >>> only one HPET channel free for percpu HPET, we only have hpet2 tied to >>> CPU 0 and other CPUs still have to go through IRQ0+LAPIC broadcast with >>> deep C-state. >>> >>> One problem here is that percpu hpet should only get used when LAPIC >>> cannot be used (that is when CPU enters deep C-state). Using hpet2 in >>> place of LAPIC timer even when deep C-state is not supported is not >>> right in terms of performance. We need some changes here to fix that >>> [Problem 1]. >>> >>> But, that still does not explain why we are seeing this problem in the >>> first place. I mean, using hpet2 is not optimal, but should not have >>> functionality issues like this. Even fixing [Problem 1] above, we may >>> see this problem on some other platform that supports deep C-state and >>> so has hpet2 enabled for a valid reason. >>> >>> Also, I am not sure whether the problem also happens if legacy HPET >>> interrupts are used during run time in place of LAPIC timer (May be >>> worth to try this with a simple test patch, let me think about it). In >>> this case, legacy HPET interrupt rightly goes quiet after boot, giving >>> priority to LAPIC timer. >>> >>> With hpet MSI interrupts, we do a write followed by read of HPET >>> memmapped register to set a HPET channel timeout + read of global HPET >>> timer. This happens on every timer interrupt on CPU 0. And we also have >>> MSI interrupt being delivered to CPU 0. I cannot think of any reason why >>> this can break dma. We can probably try adding some dummy HPET read >>> after dma write, to see if that flushes things properly. >>> >> >> Haven't seen any activity on this thread in a while. Just curious, are we >> still working this? >> Is there anything else I can do to help? > > Sorry for not following up on this. We have narrowed this down to HPET > MSI and floppy DMA. 
I still don't know how HPET MSI interrupts are > breaking floppy DMA. > > You are seeing the problem on two different systems. Correct? Do you > have any system where this works with HPET MSI enabled? > I see the problem on every system in which the HPET2 shows up in /proc/interrupts. The machines that work with HPET enabled don't show HPET at all in /proc/interrupts. I have some of each. All the boxes that fail here use the (AMD) 790x series chip sets. > Couple of options on how we can go about this one: > 1) Change the HPET-MSI change to not get activated when there are no > C-states with LAPIC stoppage involved. This will resolve the problem on > the systems you reported as there are no deep C-states. But, I fear that > with the actual problem unresolved, we may hit it in future with this or > some other platform having same issue with CPUs that support deep > C-state. > 2) Try this testcase on few other platforms that support HPET-MSI and > deep C-states and check how widespread the problem is and then add a > whitelist-blacklist for HPET MSI usage. > > I think, for 2.6.33 option 1 is better. Will work on that and send in > patches for you test. > OK, thanks Mark ^ permalink raw reply [flat|nested] 74+ messages in thread
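The effect of the one-line test patch quoted above can be pictured with a small model of timer selection. This is only a sketch under stated assumptions: it assumes the clockevents core simply prefers the candidate with the highest rating (the real selection also weighs other properties), and that the LAPIC timer is rated 100. The names `struct clockevent_sketch`, `pick_clockevent()` and `cpu0_uses_hpet2()` are hypothetical and do not exist in the kernel; only the two hpet2 ratings, 110 and 40, come from the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's clock event device descriptor. */
struct clockevent_sketch {
    const char *name;
    int rating;               /* assumption: higher rating is preferred */
};

/* Return the candidate with the highest rating, or NULL if n == 0. */
const struct clockevent_sketch *
pick_clockevent(const struct clockevent_sketch *devs, size_t n)
{
    const struct clockevent_sketch *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (best == NULL || devs[i].rating > best->rating)
            best = &devs[i];
    }
    return best;
}

/*
 * Return 1 if the per-CPU HPET MSI timer (hpet2) would win on CPU 0 for
 * the given hpet2 rating, 0 if the LAPIC timer would be used instead.
 */
int cpu0_uses_hpet2(int hpet2_rating)
{
    const struct clockevent_sketch devs[] = {
        { "lapic", 100 },            /* assumed LAPIC timer rating */
        { "hpet2", hpet2_rating },   /* 110 before the test patch, 40 after */
    };

    return pick_clockevent(devs, sizeof(devs) / sizeof(devs[0])) == &devs[1];
}
```

With these assumed numbers, a rating of 110 lets hpet2 take over CPU 0, while 40 leaves the LAPIC timer in charge. That would be consistent with the /proc/interrupts output above, where the hpet2 line stays at zero and the LOC counters advance on every CPU.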
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 2010-01-12 9:04 ` Mark Hounschell @ 2010-01-15 2:01 ` Pallipadi, Venkatesh 2010-01-15 9:39 ` Mark Hounschell 2010-01-15 18:02 ` Mark Hounschell 0 siblings, 2 replies; 74+ messages in thread From: Pallipadi, Venkatesh @ 2010-01-15 2:01 UTC (permalink / raw) To: dmarkh@cfl.rr.com Cc: markh@compro.net, Andi Kleen, Linus Torvalds, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar On Tue, 2010-01-12 at 01:04 -0800, Mark Hounschell wrote: > On 01/11/2010 07:19 PM, Pallipadi, Venkatesh wrote: > > > > Sorry for not following up on this. We have narrowed this down to HPET > > MSI and floppy DMA. I still don't know how HPET MSI interrupts are > > breaking floppy DMA. > > > > You are seeing the problem on two different systems. Correct? Do you > > have any system where this works with HPET MSI enabled? > > > > I see the problem on every system in which the HPET2 shows up in > /proc/interrupts. The machines that work with HPET enabled don't show HPET > at all in /proc/interrupts. I have some of each. All the boxes that fail > here use the (AMD) 790x series chip sets. > > > Couple of options on how we can go about this one: > > 1) Change the HPET-MSI change to not get activated when there are no > > C-states with LAPIC stoppage involved. This will resolve the problem on > > the systems you reported as there are no deep C-states. But, I fear that > > with the actual problem unresolved, we may hit it in future with this or > > some other platform having same issue with CPUs that support deep > > C-state. > > 2) Try this testcase on few other platforms that support HPET-MSI and > > deep C-states and check how widespread the problem is and then add a > > whitelist-blacklist for HPET MSI usage. > > > > I think, for 2.6.33 option 1 is better. Will work on that and send in > > patches for you test. > > > Mark, I just sent out a patchset that should workaround the problem here. 
Can you check and let me know whether that's the case.

We will still need a simpler/smaller workaround for .33. Will send a
patch for that soon.

Also, are you testing this with a USB floppy controller? I tried to test
it on my end, but fdformat doesn't seem to like my USB floppy drive. I
tried 'ufiformat -f 1440 <dev>', with which I am not able to reproduce
the failure on any of my boxes. Not sure whether that really means I
don't hit this bug or whether that goes through a totally different
code path.

Thanks,
Venki
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 2010-01-15 2:01 ` Pallipadi, Venkatesh @ 2010-01-15 9:39 ` Mark Hounschell 2010-01-15 18:02 ` Mark Hounschell 1 sibling, 0 replies; 74+ messages in thread From: Mark Hounschell @ 2010-01-15 9:39 UTC (permalink / raw) To: Pallipadi, Venkatesh Cc: markh@compro.net, Andi Kleen, Linus Torvalds, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar On 01/14/2010 09:01 PM, Pallipadi, Venkatesh wrote: > On Tue, 2010-01-12 at 01:04 -0800, Mark Hounschell wrote: >> On 01/11/2010 07:19 PM, Pallipadi, Venkatesh wrote: >>> >>> Sorry for not following up on this. We have narrowed this down to HPET >>> MSI and floppy DMA. I still don't know how HPET MSI interrupts are >>> breaking floppy DMA. >>> >>> You are seeing the problem on two different systems. Correct? Do you >>> have any system where this works with HPET MSI enabled? >>> >> >> I see the problem on every system in which the HPET2 shows up in >> /proc/interrupts. The machines that work with HPET enabled don't show HPET >> at all in /proc/interrupts. I have some of each. All the boxes that fail >> here use the (AMD) 790x series chip sets. >> >>> Couple of options on how we can go about this one: >>> 1) Change the HPET-MSI change to not get activated when there are no >>> C-states with LAPIC stoppage involved. This will resolve the problem on >>> the systems you reported as there are no deep C-states. But, I fear that >>> with the actual problem unresolved, we may hit it in future with this or >>> some other platform having same issue with CPUs that support deep >>> C-state. >>> 2) Try this testcase on few other platforms that support HPET-MSI and >>> deep C-states and check how widespread the problem is and then add a >>> whitelist-blacklist for HPET MSI usage. >>> >>> I think, for 2.6.33 option 1 is better. Will work on that and send in >>> patches for you test. 
>>>
>>
>
> Mark,
>
> I just sent out a patchset that should workaround the problem here. Can
> you check and let me know whether thats the case.
>

Yes, I'll try that today. I assume I'll find them on LKML.

> We will still need a simpler/smaller workaround for .33. Will send a
> patch for that soon.
>
> Also, are you testing this with usb floppy controller? I tried to test
> it on my end, but fdformat doesn't seem to like my usb floppy drive. I
> tried, 'ufiformat -f 1440 <dev>', with which I am not able to reproduce
> the failure on any of my boxes. Not sure whether that really means I
> don't hit this bug or that is going through totally different code path.
>

No, I've never even seen a USB floppy controller.

Mark
* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 2010-01-15 2:01 ` Pallipadi, Venkatesh 2010-01-15 9:39 ` Mark Hounschell @ 2010-01-15 18:02 ` Mark Hounschell 2010-01-21 19:09 ` [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 Pallipadi, Venkatesh 1 sibling, 1 reply; 74+ messages in thread From: Mark Hounschell @ 2010-01-15 18:02 UTC (permalink / raw) To: Pallipadi, Venkatesh Cc: dmarkh@cfl.rr.com, Andi Kleen, Linus Torvalds, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar On 01/14/2010 09:01 PM, Pallipadi, Venkatesh wrote: > On Tue, 2010-01-12 at 01:04 -0800, Mark Hounschell wrote: >> On 01/11/2010 07:19 PM, Pallipadi, Venkatesh wrote: >>> >>> Sorry for not following up on this. We have narrowed this down to HPET >>> MSI and floppy DMA. I still don't know how HPET MSI interrupts are >>> breaking floppy DMA. >>> >>> You are seeing the problem on two different systems. Correct? Do you >>> have any system where this works with HPET MSI enabled? >>> >> >> I see the problem on every system in which the HPET2 shows up in >> /proc/interrupts. The machines that work with HPET enabled don't show HPET >> at all in /proc/interrupts. I have some of each. All the boxes that fail >> here use the (AMD) 790x series chip sets. >> >>> Couple of options on how we can go about this one: >>> 1) Change the HPET-MSI change to not get activated when there are no >>> C-states with LAPIC stoppage involved. This will resolve the problem on >>> the systems you reported as there are no deep C-states. But, I fear that >>> with the actual problem unresolved, we may hit it in future with this or >>> some other platform having same issue with CPUs that support deep >>> C-state. >>> 2) Try this testcase on few other platforms that support HPET-MSI and >>> deep C-states and check how widespread the problem is and then add a >>> whitelist-blacklist for HPET MSI usage. >>> >>> I think, for 2.6.33 option 1 is better. 
Will work on that and send in >>> patches for you test. >>> >> > > Mark, > > I just sent out a patchset that should workaround the problem here. Can > you check and let me know whether thats the case. > Yes, it does seem to fix the issue. The floppy formats and /proc/interrupts look normal with nothing going on with the hpet2 msi. Regards Mark ^ permalink raw reply [flat|nested] 74+ messages in thread
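The check Mark describes, looking at whether the hpet2 MSI line in /proc/interrupts shows any activity, could be automated along these lines. This is a minimal sketch that assumes the line format seen in the output quoted earlier in the thread (an IRQ label, a colon, one count per CPU, then a type string containing "HPET_MSI"). `hpet_msi_count()` is a hypothetical helper, not part of any existing tool.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counts on the first line of /proc/interrupts-style
 * text that mentions "HPET_MSI". Returns -1 if there is no such line.
 */
long hpet_msi_count(const char *text)
{
    const char *hit = strstr(text, "HPET_MSI");
    if (hit == NULL)
        return -1;

    /* Walk back to the start of the line that contains the match. */
    const char *p = hit;
    while (p > text && p[-1] != '\n')
        p--;

    /* Skip the IRQ label, e.g. " 24:". */
    const char *colon = memchr(p, ':', (size_t)(hit - p));
    if (colon == NULL)
        return -1;
    p = colon + 1;

    /* Add up the numeric columns that precede the type string. */
    long total = 0;
    while (p < hit) {
        char *end;
        long value = strtol(p, &end, 10);
        if (end == p)
            break;              /* no more numeric columns */
        total += value;
        p = end;
    }
    return total;
}
```

A caller would read /proc/interrupts into a buffer and pass it in. A result of -1 means no HPET MSI line is present, 0 means the line exists but the timer is idle, and a positive value means the per-CPU HPET MSI timer is actually firing.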
* [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 2010-01-15 18:02 ` Mark Hounschell @ 2010-01-21 19:09 ` Pallipadi, Venkatesh 2010-01-22 22:00 ` [tip:x86/urgent] " tip-bot for Pallipadi, Venkatesh ` (2 more replies) 0 siblings, 3 replies; 74+ messages in thread From: Pallipadi, Venkatesh @ 2010-01-21 19:09 UTC (permalink / raw) To: Mark Hounschell Cc: Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Andi Kleen, Linus Torvalds, Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar, H Peter Anvin, Thomas Gleixner, stable HPET MSI on platforms with ATI SB700/SB800 as they seem to have some side-effects on floppy DMA. Do not use HPET MSI on such platforms. Original problem report from Mark Hounschell http://lkml.indiana.edu/hypermail/linux/kernel/0912.2/01118.html Tested-by: Mark Hounschell <markh@compro.net> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> --- This patch needs to go to stable as well. But, there are some conflicts that prevents the patch from going as is. I can rebase/resubmit to stable once the patch goes upstream. 
arch/x86/include/asm/hpet.h | 1 + arch/x86/kernel/hpet.c | 8 ++++++++ arch/x86/kernel/quirks.c | 13 +++++++++++++ 3 files changed, 22 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h index 5d89fd2..1d5c08a 100644 --- a/arch/x86/include/asm/hpet.h +++ b/arch/x86/include/asm/hpet.h @@ -67,6 +67,7 @@ extern unsigned long hpet_address; extern unsigned long force_hpet_address; extern u8 hpet_blockid; extern int hpet_force_user; +extern u8 hpet_msi_disable; extern int is_hpet_enabled(void); extern int hpet_enable(void); extern void hpet_disable(void); diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c index ba6e658..ad80a1c 100644 --- a/arch/x86/kernel/hpet.c +++ b/arch/x86/kernel/hpet.c @@ -34,6 +34,8 @@ */ unsigned long hpet_address; u8 hpet_blockid; /* OS timer block num */ +u8 hpet_msi_disable; + #ifdef CONFIG_PCI_MSI static unsigned long hpet_num_timers; #endif @@ -596,6 +598,9 @@ static void hpet_msi_capability_lookup(unsigned int start_timer) unsigned int num_timers_used = 0; int i; + if (hpet_msi_disable) + return; + if (boot_cpu_has(X86_FEATURE_ARAT)) return; id = hpet_readl(HPET_ID); @@ -928,6 +933,9 @@ static __init int hpet_late_init(void) hpet_reserve_platform_timers(hpet_readl(HPET_ID)); hpet_print_config(); + if (hpet_msi_disable) + return 0; + if (boot_cpu_has(X86_FEATURE_ARAT)) return 0; diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c index 18093d7..12e9fea 100644 --- a/arch/x86/kernel/quirks.c +++ b/arch/x86/kernel/quirks.c @@ -491,6 +491,19 @@ void force_hpet_resume(void) break; } } + +/* + * HPET MSI on some boards (ATI SB700/SB800) has side effect on + * floppy DMA. Disable HPET MSI on such platforms. 
+ */ +static void force_disable_hpet_msi(struct pci_dev *unused) +{ + hpet_msi_disable = 1; +} + +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS, + force_disable_hpet_msi); + #endif #if defined(CONFIG_PCI) && defined(CONFIG_NUMA) -- 1.6.0.6 ^ permalink raw reply related [flat|nested] 74+ messages in thread
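The shape of this quirk (a PCI fixup that sets a global flag, which the HPET setup code then checks before touching MSI) can be modelled outside the kernel as follows. This is a user-space sketch, not the kernel implementation: the fixup table, `run_header_fixups()` and `hpet_msi_would_be_used()` are hypothetical stand-ins for `DECLARE_PCI_FIXUP_HEADER` and the early returns in the patch, and the numeric IDs 0x1002 and 0x4385 are assumed values for `PCI_VENDOR_ID_ATI` and `PCI_DEVICE_ID_ATI_SBX00_SMBUS`.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_VENDOR_ATI       0x1002  /* assumed PCI_VENDOR_ID_ATI */
#define SKETCH_DEV_SBX00_SMBUS  0x4385  /* assumed PCI_DEVICE_ID_ATI_SBX00_SMBUS */

uint8_t hpet_msi_disable;               /* mirrors the flag added by the patch */

struct pci_id_sketch {
    uint16_t vendor;
    uint16_t device;
};

struct fixup_sketch {
    uint16_t vendor;
    uint16_t device;
    void (*hook)(void);
};

static void force_disable_hpet_msi(void)
{
    hpet_msi_disable = 1;
}

/* Hypothetical fixup table standing in for DECLARE_PCI_FIXUP_HEADER. */
static const struct fixup_sketch fixups[] = {
    { SKETCH_VENDOR_ATI, SKETCH_DEV_SBX00_SMBUS, force_disable_hpet_msi },
};

/* Run every fixup whose IDs match a device seen during enumeration. */
void run_header_fixups(const struct pci_id_sketch *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < sizeof(fixups) / sizeof(fixups[0]); j++) {
            if (devs[i].vendor == fixups[j].vendor &&
                devs[i].device == fixups[j].device)
                fixups[j].hook();
        }
    }
}

/* Mirrors the early returns in the patch: 1 if HPET MSI would be set up. */
int hpet_msi_would_be_used(int cpu_has_arat)
{
    if (hpet_msi_disable)
        return 0;
    if (cpu_has_arat)
        return 0;
    return 1;
}
```

Keying the fixup on the SMBus function of the southbridge means the flag is set as soon as that device is enumerated, so the HPET code only needs a cheap flag test rather than its own chipset detection.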
* [tip:x86/urgent] x86: Disable HPET MSI on ATI SB700/SB800 2010-01-21 19:09 ` [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 Pallipadi, Venkatesh @ 2010-01-22 22:00 ` tip-bot for Pallipadi, Venkatesh 2010-01-23 6:51 ` tip-bot for Pallipadi, Venkatesh 2010-01-23 7:21 ` [PATCH] " Yuhong Bao 2 siblings, 0 replies; 74+ messages in thread From: tip-bot for Pallipadi, Venkatesh @ 2010-01-22 22:00 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, hpa, mingo, markh, stable, venkatesh.pallipadi, tglx Commit-ID: 9f0b0ce525f19ef408e877b1c7662b60424c7cdc Gitweb: http://git.kernel.org/tip/9f0b0ce525f19ef408e877b1c7662b60424c7cdc Author: Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com> AuthorDate: Thu, 21 Jan 2010 11:09:52 -0800 Committer: H. Peter Anvin <hpa@zytor.com> CommitDate: Fri, 22 Jan 2010 13:47:01 -0800 x86: Disable HPET MSI on ATI SB700/SB800 HPET MSI on platforms with ATI SB700/SB800 as they seem to have some side-effects on floppy DMA. Do not use HPET MSI on such platforms. Original problem report from Mark Hounschell http://lkml.indiana.edu/hypermail/linux/kernel/0912.2/01118.html [ This patch needs to go to stable as well. But, there are some conflicts that prevents the patch from going as is. I can rebase/resubmit to stable once the patch goes upstream. hpa: still Cc:'ing stable@ as an FYI. ] Tested-by: Mark Hounschell <markh@compro.net> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: <stable@kernel.org> LKML-Reference: <20100121190952.GA32523@linux-os.sc.intel.com> Signed-off-by: H. 
Peter Anvin <hpa@zytor.com> --- arch/x86/include/asm/hpet.h | 1 + arch/x86/kernel/hpet.c | 8 ++++++++ arch/x86/kernel/quirks.c | 13 +++++++++++++ 3 files changed, 22 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h index 5d89fd2..1d5c08a 100644 --- a/arch/x86/include/asm/hpet.h +++ b/arch/x86/include/asm/hpet.h @@ -67,6 +67,7 @@ extern unsigned long hpet_address; extern unsigned long force_hpet_address; extern u8 hpet_blockid; extern int hpet_force_user; +extern u8 hpet_msi_disable; extern int is_hpet_enabled(void); extern int hpet_enable(void); extern void hpet_disable(void); diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c index ba6e658..ad80a1c 100644 --- a/arch/x86/kernel/hpet.c +++ b/arch/x86/kernel/hpet.c @@ -34,6 +34,8 @@ */ unsigned long hpet_address; u8 hpet_blockid; /* OS timer block num */ +u8 hpet_msi_disable; + #ifdef CONFIG_PCI_MSI static unsigned long hpet_num_timers; #endif @@ -596,6 +598,9 @@ static void hpet_msi_capability_lookup(unsigned int start_timer) unsigned int num_timers_used = 0; int i; + if (hpet_msi_disable) + return; + if (boot_cpu_has(X86_FEATURE_ARAT)) return; id = hpet_readl(HPET_ID); @@ -928,6 +933,9 @@ static __init int hpet_late_init(void) hpet_reserve_platform_timers(hpet_readl(HPET_ID)); hpet_print_config(); + if (hpet_msi_disable) + return 0; + if (boot_cpu_has(X86_FEATURE_ARAT)) return 0; diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c index 18093d7..12e9fea 100644 --- a/arch/x86/kernel/quirks.c +++ b/arch/x86/kernel/quirks.c @@ -491,6 +491,19 @@ void force_hpet_resume(void) break; } } + +/* + * HPET MSI on some boards (ATI SB700/SB800) has side effect on + * floppy DMA. Disable HPET MSI on such platforms. 
+ */ +static void force_disable_hpet_msi(struct pci_dev *unused) +{ + hpet_msi_disable = 1; +} + +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS, + force_disable_hpet_msi); + #endif #if defined(CONFIG_PCI) && defined(CONFIG_NUMA) ^ permalink raw reply related [flat|nested] 74+ messages in thread
* [tip:x86/urgent] x86: Disable HPET MSI on ATI SB700/SB800 2010-01-21 19:09 ` [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 Pallipadi, Venkatesh 2010-01-22 22:00 ` [tip:x86/urgent] " tip-bot for Pallipadi, Venkatesh @ 2010-01-23 6:51 ` tip-bot for Pallipadi, Venkatesh 2010-01-23 7:21 ` [PATCH] " Yuhong Bao 2 siblings, 0 replies; 74+ messages in thread From: tip-bot for Pallipadi, Venkatesh @ 2010-01-23 6:51 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, hpa, mingo, markh, stable, venkatesh.pallipadi, tglx Commit-ID: 73472a46b5b28116b145fb5fc05242c1aa8e1461 Gitweb: http://git.kernel.org/tip/73472a46b5b28116b145fb5fc05242c1aa8e1461 Author: Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com> AuthorDate: Thu, 21 Jan 2010 11:09:52 -0800 Committer: Ingo Molnar <mingo@elte.hu> CommitDate: Sat, 23 Jan 2010 06:21:58 +0100 x86: Disable HPET MSI on ATI SB700/SB800 HPET MSI on platforms with ATI SB700/SB800 as they seem to have some side-effects on floppy DMA. Do not use HPET MSI on such platforms. Original problem report from Mark Hounschell http://lkml.indiana.edu/hypermail/linux/kernel/0912.2/01118.html [ This patch needs to go to stable as well. But, there are some conflicts that prevents the patch from going as is. I can rebase/resubmit to stable once the patch goes upstream. hpa: still Cc:'ing stable@ as an FYI. ] Tested-by: Mark Hounschell <markh@compro.net> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: <stable@kernel.org> LKML-Reference: <20100121190952.GA32523@linux-os.sc.intel.com> Signed-off-by: H. 
Peter Anvin <hpa@zytor.com> --- arch/x86/include/asm/hpet.h | 1 + arch/x86/kernel/hpet.c | 8 ++++++++ arch/x86/kernel/quirks.c | 13 +++++++++++++ 3 files changed, 22 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h index 5d89fd2..1d5c08a 100644 --- a/arch/x86/include/asm/hpet.h +++ b/arch/x86/include/asm/hpet.h @@ -67,6 +67,7 @@ extern unsigned long hpet_address; extern unsigned long force_hpet_address; extern u8 hpet_blockid; extern int hpet_force_user; +extern u8 hpet_msi_disable; extern int is_hpet_enabled(void); extern int hpet_enable(void); extern void hpet_disable(void); diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c index ba6e658..ad80a1c 100644 --- a/arch/x86/kernel/hpet.c +++ b/arch/x86/kernel/hpet.c @@ -34,6 +34,8 @@ */ unsigned long hpet_address; u8 hpet_blockid; /* OS timer block num */ +u8 hpet_msi_disable; + #ifdef CONFIG_PCI_MSI static unsigned long hpet_num_timers; #endif @@ -596,6 +598,9 @@ static void hpet_msi_capability_lookup(unsigned int start_timer) unsigned int num_timers_used = 0; int i; + if (hpet_msi_disable) + return; + if (boot_cpu_has(X86_FEATURE_ARAT)) return; id = hpet_readl(HPET_ID); @@ -928,6 +933,9 @@ static __init int hpet_late_init(void) hpet_reserve_platform_timers(hpet_readl(HPET_ID)); hpet_print_config(); + if (hpet_msi_disable) + return 0; + if (boot_cpu_has(X86_FEATURE_ARAT)) return 0; diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c index 18093d7..12e9fea 100644 --- a/arch/x86/kernel/quirks.c +++ b/arch/x86/kernel/quirks.c @@ -491,6 +491,19 @@ void force_hpet_resume(void) break; } } + +/* + * HPET MSI on some boards (ATI SB700/SB800) has side effect on + * floppy DMA. Disable HPET MSI on such platforms. 
+ */ +static void force_disable_hpet_msi(struct pci_dev *unused) +{ + hpet_msi_disable = 1; +} + +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS, + force_disable_hpet_msi); + #endif #if defined(CONFIG_PCI) && defined(CONFIG_NUMA) ^ permalink raw reply related [flat|nested] 74+ messages in thread
* RE: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 2010-01-21 19:09 ` [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 Pallipadi, Venkatesh 2010-01-22 22:00 ` [tip:x86/urgent] " tip-bot for Pallipadi, Venkatesh 2010-01-23 6:51 ` tip-bot for Pallipadi, Venkatesh @ 2010-01-23 7:21 ` Yuhong Bao 2010-01-25 17:10 ` Andreas Herrmann 2 siblings, 1 reply; 74+ messages in thread From: Yuhong Bao @ 2010-01-23 7:21 UTC (permalink / raw) To: venkatesh.pallipadi, markh Cc: dmarkh, andi, Linus Torvalds, alain, linux-kernel, andreas.herrmann3

> HPET MSI on platforms with ATI SB700/SB800 as they seem to have some
> side-effects on floppy DMA. Do not use HPET MSI on such platforms.

I think somebody from AMD should review the situation. Clearly something
is happening inside their southbridge. CCing Andreas Herrmann from AMD.

Yuhong Bao
* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 2010-01-23 7:21 ` [PATCH] " Yuhong Bao @ 2010-01-25 17:10 ` Andreas Herrmann 2010-01-28 9:17 ` Mark Hounschell 2010-05-17 14:59 ` Andreas Herrmann 0 siblings, 2 replies; 74+ messages in thread From: Andreas Herrmann @ 2010-01-25 17:10 UTC (permalink / raw) To: Yuhong Bao Cc: venkatesh.pallipadi, markh, dmarkh, andi, Linus Torvalds, alain, linux-kernel

On Fri, Jan 22, 2010 at 11:21:06PM -0800, Yuhong Bao wrote:
>
> > HPET MSI on platforms with ATI SB700/SB800 as they seem to have some
> > side-effects on floppy DMA. Do not use HPET MSI on such platforms.

Argh, will see what information I can find about this problem ...

> I think somebody from AMD should review the situation. Clearly
> something is happening inside their southbridge. CCing Andreas
> Herrmann from AMD.

I have the feeling that this problem should rather be fixed with a DMI
quirk instead of disabling HPET MSI for the entire chipset.

Was the latest available BIOS installed on the affected system?

Thanks,

Andreas

--
Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632
* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 2010-01-25 17:10 ` Andreas Herrmann @ 2010-01-28 9:17 ` Mark Hounschell 2010-01-28 13:25 ` Mark Hounschell 2010-05-17 14:59 ` Andreas Herrmann 1 sibling, 1 reply; 74+ messages in thread From: Mark Hounschell @ 2010-01-28 9:17 UTC (permalink / raw) To: Andreas Herrmann Cc: Yuhong Bao, venkatesh.pallipadi, markh, andi, Linus Torvalds, alain, linux-kernel

On 01/25/2010 12:10 PM, Andreas Herrmann wrote:
> On Fri, Jan 22, 2010 at 11:21:06PM -0800, Yuhong Bao wrote:
>>
>>> HPET MSI on platforms with ATI SB700/SB800 as they seem to have some
>>> side-effects on floppy DMA. Do not use HPET MSI on such platforms.
>
> Argh, will see what information I can find about this problem ...
>
>> I think somebody from AMD should review the situation. Clearly
>> something is happening inside their southbridge. CCing Andreas
>> Herrmann from AMD.
>
> I have the feeling that this problem should rather be fixed with a DMI
> quirk instead of disabling HPET MSI for the entire chipset.
>
> Was the latest available BIOS installed on the affected system?
>

You mean "systems" from different manufacturers? I will check today. Due
to misconfigured filters I didn't see this until today. Sorry.

Mark
* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 2010-01-28 9:17 ` Mark Hounschell @ 2010-01-28 13:25 ` Mark Hounschell 2010-01-28 13:41 ` Borislav Petkov 0 siblings, 1 reply; 74+ messages in thread From: Mark Hounschell @ 2010-01-28 13:25 UTC (permalink / raw) To: Andreas Herrmann Cc: dmarkh, Yuhong Bao, venkatesh.pallipadi, andi, Linus Torvalds, alain, linux-kernel On 01/28/2010 04:17 AM, Mark Hounschell wrote: > On 01/25/2010 12:10 PM, Andreas Herrmann wrote: >> On Fri, Jan 22, 2010 at 11:21:06PM -0800, Yuhong Bao wrote: >>> >>>> HPET MSI on platforms with ATI SB700/SB800 as they seem to have some >>>> side-effects on floppy DMA. Do not use HPET MSI on such platforms. >> >> Argh, will see what information I can find about this problem ... >> >>> I think somebody from AMD should review the situation.Clearly >> something is happening inside their southbridge.CCing Andreas >> Herrmann from AMD. >> >> I have the feeling that this problem should rather be fixed with a DMI >> quirk instead of disabling HPET MSI for the entire chipset. >> >> Was the latest available BIOS installed on the affected system? >> > > You mean "systems" of different manufactures? I will check today. Due to > mis configured filters I didn't see this until today. Sorry. > > Mark > My BIOS were below rev on all my affected boards but updating did not help with the problem. Andreas, while I have your ear, I am also having another issue with this chip set doing peer to peer bus transfers between pci buses and pci-e buses and from pci-e to pci-e buses. I've read the chip set specs and they _seem_ to imply that it may not be allowed due to "Trusted Computing" something or another. I've posed the issue to the AMD forums with no luck, and I can't figure out why this doesn't work using these chip sets. Sorry to change the subject. I just figured I'd ask someone from AMD while I had the chance. Thanks and Regards Mark ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 2010-01-28 13:25 ` Mark Hounschell @ 2010-01-28 13:41 ` Borislav Petkov 2010-01-28 14:45 ` Mark Hounschell 0 siblings, 1 reply; 74+ messages in thread From: Borislav Petkov @ 2010-01-28 13:41 UTC (permalink / raw) To: Mark Hounschell Cc: Andreas Herrmann, dmarkh, Yuhong Bao, venkatesh.pallipadi, andi, Linus Torvalds, alain, linux-kernel On Thu, Jan 28, 2010 at 08:25:23AM -0500, Mark Hounschell wrote: > On 01/28/2010 04:17 AM, Mark Hounschell wrote: > > On 01/25/2010 12:10 PM, Andreas Herrmann wrote: > >> On Fri, Jan 22, 2010 at 11:21:06PM -0800, Yuhong Bao wrote: > >>> > >>>> HPET MSI on platforms with ATI SB700/SB800 as they seem to have some > >>>> side-effects on floppy DMA. Do not use HPET MSI on such platforms. > >> > >> Argh, will see what information I can find about this problem ... > >> > >>> I think somebody from AMD should review the situation.Clearly > >> something is happening inside their southbridge.CCing Andreas > >> Herrmann from AMD. > >> > >> I have the feeling that this problem should rather be fixed with a DMI > >> quirk instead of disabling HPET MSI for the entire chipset. > >> > >> Was the latest available BIOS installed on the affected system? > >> > > > > You mean "systems" of different manufactures? I will check today. Due to > > mis configured filters I didn't see this until today. Sorry. > > > > Mark > > > > My BIOS were below rev on all my affected boards but updating did not help > with the problem. Hi, can you post the BIOS vendors of the boards along with the respective BIOS versions? Thanks. -- Regards/Gruss, Boris. -- Advanced Micro Devices, Inc. Operating Systems Research Center ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 2010-01-28 13:41 ` Borislav Petkov @ 2010-01-28 14:45 ` Mark Hounschell 0 siblings, 0 replies; 74+ messages in thread From: Mark Hounschell @ 2010-01-28 14:45 UTC (permalink / raw) To: Borislav Petkov Cc: Andreas Herrmann, dmarkh, Yuhong Bao, venkatesh.pallipadi, andi, Linus Torvalds, alain, linux-kernel On 01/28/2010 08:41 AM, Borislav Petkov wrote: > On Thu, Jan 28, 2010 at 08:25:23AM -0500, Mark Hounschell wrote: >> On 01/28/2010 04:17 AM, Mark Hounschell wrote: >>> On 01/25/2010 12:10 PM, Andreas Herrmann wrote: >>>> On Fri, Jan 22, 2010 at 11:21:06PM -0800, Yuhong Bao wrote: >>>>> >>>>>> HPET MSI on platforms with ATI SB700/SB800 as they seem to have some >>>>>> side-effects on floppy DMA. Do not use HPET MSI on such platforms. >>>> >>>> Argh, will see what information I can find about this problem ... >>>> >>>>> I think somebody from AMD should review the situation.Clearly >>>> something is happening inside their southbridge.CCing Andreas >>>> Herrmann from AMD. >>>> >>>> I have the feeling that this problem should rather be fixed with a DMI >>>> quirk instead of disabling HPET MSI for the entire chipset. >>>> >>>> Was the latest available BIOS installed on the affected system? >>>> >>> >>> You mean "systems" of different manufactures? I will check today. Due to >>> mis configured filters I didn't see this until today. Sorry. >>> >>> Mark >>> >> >> My BIOS were below rev on all my affected boards but updating did not help >> with the problem. > > Hi, > > can you post the BIOS vendors of the boards along with the respective > BIOS versions? > > Thanks. > DFI DK-790FXB-M3H5 MB using AWARD bios D7SDA09.BIN (10/09/2009) BIOSTAR TA790GXB A2+ using AMI bios 78DDA928.BST (09/28/09) Regards Mark ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 2010-01-25 17:10 ` Andreas Herrmann 2010-01-28 9:17 ` Mark Hounschell @ 2010-05-17 14:59 ` Andreas Herrmann 2010-05-17 15:10 ` Yuhong Bao ` (2 more replies) 1 sibling, 3 replies; 74+ messages in thread From: Andreas Herrmann @ 2010-05-17 14:59 UTC (permalink / raw) To: Yuhong Bao Cc: venkatesh.pallipadi, markh, dmarkh, andi, Linus Torvalds, alain, linux-kernel On Mon, Jan 25, 2010 at 06:10:59PM +0100, Andreas Herrmann wrote: > On Fri, Jan 22, 2010 at 11:21:06PM -0800, Yuhong Bao wrote: > > > > > HPET MSI on platforms with ATI SB700/SB800 as they seem to have some > > > side-effects on floppy DMA. Do not use HPET MSI on such platforms. > > Argh, will see what information I can find about this problem ... FYI. I've tried to trigger the publication of errata information for that chipset. Finally this has happened. The discussed problem is indeed due to an erratum. See erratum #27 in http://support.amd.com/us/Embedded_TechDocs/46837.pdf The suggested workaround for this is to disable HPET MSI if LPC devices are used. I doubt that there is a convenient way for Linux to find out whether LPC devices are used. Thus I think the only solution to safely avoid the problem is the currently implemented quirk to disable HPET MSI on this chipset. Regards, Andreas -- Operating | Advanced Micro Devices GmbH System | Einsteinring 24, 85609 Dornach b. München, Germany Research | Geschäftsführer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis München (OSRC) | Registergericht München, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 74+ messages in thread
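[Editorial sketch] The quirk Andreas refers to can be pictured as an ID-matched hook that flips a global "do not use HPET MSI" flag during device enumeration. The following is a minimal user-space model in C, not the kernel's code: the names `struct pci_id`, `struct quirk`, `apply_quirks` and `hpet_msi_disabled` are invented for illustration, and the vendor/device values are assumed to correspond to the ATI SBx00 SMBus function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed IDs for the ATI SBx00 SMBus function (illustrative only). */
#define VENDOR_ATI         0x1002u
#define DEVICE_SBX00_SMBUS 0x4385u

/* Hypothetical stand-in for a PCI vendor/device pair. */
struct pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* Global policy flag: once set, the HPET must not be programmed for MSI. */
static bool hpet_msi_disabled = false;

/* Quirk hook: unconditionally forbid HPET MSI on the matched chipset. */
static void force_disable_hpet_msi(void)
{
    hpet_msi_disabled = true;
}

/* Hypothetical quirk table entry: an ID to match and a hook to run. */
struct quirk {
    struct pci_id id;
    void (*hook)(void);
};

static const struct quirk quirks[] = {
    { { VENDOR_ATI, DEVICE_SBX00_SMBUS }, force_disable_hpet_msi },
};

/* Run every quirk whose ID matches the device; return how many ran. */
static int apply_quirks(const struct pci_id *dev)
{
    int hits = 0;

    assert(dev != NULL);
    for (size_t i = 0; i < sizeof quirks / sizeof quirks[0]; i++) {
        if (quirks[i].id.vendor == dev->vendor &&
            quirks[i].id.device == dev->device) {
            quirks[i].hook();
            hits++;
        }
    }
    return hits;
}
```

Because the flag is set once at enumeration time and never cleared, nothing has to be undone later, which is the property the thread ends up preferring over a per-device or runtime decision.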
* RE: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 2010-05-17 14:59 ` Andreas Herrmann @ 2010-05-17 15:10 ` Yuhong Bao 2010-05-17 15:12 ` Linus Torvalds 2010-05-18 0:56 ` Robert Hancock 2 siblings, 0 replies; 74+ messages in thread From: Yuhong Bao @ 2010-05-17 15:10 UTC (permalink / raw) To: andreas.herrmann3 Cc: venkatesh.pallipadi, markh, dmarkh, andi, torvalds, alain, linux-kernel > The suggested workaround for this is to disable HPET MSI if LPC > devices are used. I doubt that there is a convenient way for Linux to > find out whether LPC devices are used. And don't forget that the Super I/O chip in most motherboards is an LPC device! (In fact, that was what LPC was invented for.) > Thus I think the only solution > to safely avoid the problem is the currently implemented quirk to > disable HPET MSI on this chipset. Yuhong Bao ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 2010-05-17 14:59 ` Andreas Herrmann 2010-05-17 15:10 ` Yuhong Bao @ 2010-05-17 15:12 ` Linus Torvalds 2010-05-17 16:46 ` Andreas Herrmann 2010-05-18 0:56 ` Robert Hancock 2 siblings, 1 reply; 74+ messages in thread From: Linus Torvalds @ 2010-05-17 15:12 UTC (permalink / raw) To: Andreas Herrmann Cc: Yuhong Bao, venkatesh.pallipadi, markh, dmarkh, andi, alain, linux-kernel On Mon, 17 May 2010, Andreas Herrmann wrote: > > FYI. I've tried to trigger the publication of errata information for that > chipset. Finally this has happened. > > The discussed problem is indeed due to an erratum. See erratum #27 in > http://support.amd.com/us/Embedded_TechDocs/46837.pdf > > The suggested workaround for this is to disable HPET MSI if LPC > devices are used. I doubt that there is a convenient way for Linux to > find out whether LPC devices are used. Thus I think the only solution > to safely avoid the problem is the currently implemented quirk to > disable HPET MSI on this chipset. Goodie. It would be good to point this out in the source too. Would you be willing to send in a patch that documents this quirk as a result of that erratum #27, so that we don't lose sight of why we're doing that odd MSI disable? Linus ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 2010-05-17 15:12 ` Linus Torvalds @ 2010-05-17 16:46 ` Andreas Herrmann 0 siblings, 0 replies; 74+ messages in thread From: Andreas Herrmann @ 2010-05-17 16:46 UTC (permalink / raw) To: Linus Torvalds Cc: Yuhong Bao, venkatesh.pallipadi@intel.com, markh@compro.net, dmarkh@cfl.rr.com, andi@firstfloor.org, alain@knaff.lu, linux-kernel@vger.kernel.org On Mon, May 17, 2010 at 11:12:59AM -0400, Linus Torvalds wrote: > > > On Mon, 17 May 2010, Andreas Herrmann wrote: > > > > FYI. I've tried to trigger the publication of errata information for that > > chipset. Finally this has happened. > > > > The discussed problem is indeed due to an erratum. See erratum #27 in > > http://support.amd.com/us/Embedded_TechDocs/46837.pdf > > > > The suggested workaround for this is to disable HPET MSI if LPC > > devices are used. I doubt that there is a convenient way for Linux to > > find out whether LPC devices are used. Thus I think the only solution > > to safely avoid the problem is the currently implemented quirk to > > disable HPET MSI on this chipset. > > Goodie. It would be good to point this out in the source too. Would you be > willing to send in a patch that documents this quirk as a result of that > erratum #27, so that we don't lose sight of why we're doing that odd MSI > disable? Done that. See http://marc.info/?l=linux-kernel&m=127411462230838 Andreas -- Operating | Advanced Micro Devices GmbH System | Einsteinring 24, 85609 Dornach b. München, Germany Research | Geschäftsführer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis München (OSRC) | Registergericht München, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 2010-05-17 14:59 ` Andreas Herrmann 2010-05-17 15:10 ` Yuhong Bao 2010-05-17 15:12 ` Linus Torvalds @ 2010-05-18 0:56 ` Robert Hancock 2010-05-18 1:02 ` Linus Torvalds 2010-05-18 8:45 ` Andi Kleen 2 siblings, 2 replies; 74+ messages in thread From: Robert Hancock @ 2010-05-18 0:56 UTC (permalink / raw) To: Andreas Herrmann Cc: Yuhong Bao, venkatesh.pallipadi, markh, dmarkh, andi, Linus Torvalds, alain, linux-kernel On 05/17/2010 08:59 AM, Andreas Herrmann wrote: > On Mon, Jan 25, 2010 at 06:10:59PM +0100, Andreas Herrmann wrote: >> On Fri, Jan 22, 2010 at 11:21:06PM -0800, Yuhong Bao wrote: >>> >>>> HPET MSI on platforms with ATI SB700/SB800 as they seem to have some >>>> side-effects on floppy DMA. Do not use HPET MSI on such platforms. >> >> Argh, will see what information I can find about this problem ... > > FYI. I've tried to trigger the publication of errata information for that > chipset. Finally this has happened. > > The discussed problem is indeed due to an erratum. See erratum #27 in > http://support.amd.com/us/Embedded_TechDocs/46837.pdf > > The suggested workaround for this is to disable HPET MSI if LPC > devices are used. I doubt that there is a convenient way for Linux to > find out whether LPC devices are used. Thus I think the only solution > to safely avoid the problem is the currently implemented quirk to > disable HPET MSI on this chipset. If one wanted, you could disable HPET MSI on this chipset only when a driver requests an ISA DMA channel. Then if there's no floppy or other LPC DMA device present, it can stay enabled. I don't know if it's worth the trouble, though. ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 2010-05-18 0:56 ` Robert Hancock @ 2010-05-18 1:02 ` Linus Torvalds 2010-05-18 1:06 ` Robert Hancock 2010-05-18 8:45 ` Andi Kleen 1 sibling, 1 reply; 74+ messages in thread From: Linus Torvalds @ 2010-05-18 1:02 UTC (permalink / raw) To: Robert Hancock Cc: Andreas Herrmann, Yuhong Bao, venkatesh.pallipadi, markh, dmarkh, andi, alain, linux-kernel On Mon, 17 May 2010, Robert Hancock wrote: > > If one wanted, you could disable HPET MSI on this chipset only when a driver > requests an ISA DMA channel. Then if there's no floppy or other LPC DMA device > present, it can stay enabled. I don't know if it's worth the trouble, though. Nope, that wouldn't work. Imagine a driver that is already loaded and already using MSI (say, a network device). What happens now if you want to access the floppy and load the floppy module? Oh, you can't? Need to bring down the network interface, unload that module first? Not practical. Sure, in theory we can do some crazy callback for "you now need to re-do your interrupt registration" for all devices. In practice, I can only say "not going to happen". Linus ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 2010-05-18 1:02 ` Linus Torvalds @ 2010-05-18 1:06 ` Robert Hancock 0 siblings, 0 replies; 74+ messages in thread From: Robert Hancock @ 2010-05-18 1:06 UTC (permalink / raw) To: Linus Torvalds Cc: Andreas Herrmann, Yuhong Bao, venkatesh.pallipadi, markh, dmarkh, andi, alain, linux-kernel On Mon, May 17, 2010 at 7:02 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Mon, 17 May 2010, Robert Hancock wrote: >> >> If one wanted, you could disable HPET MSI on this chipset only when a driver >> requests an ISA DMA channel. Then if there's no floppy or other LPC DMA device >> present, it can stay enabled. I don't know if it's worth the trouble, though. > > Nope, that wouldn't work. > > Imagine a driver that is already loaded and already using MSI (say, a > network device). What happens now if you want to access the floppy and > load the floppy module? Oh, you can't? Need to bring down the network > interface, unload that module first? Not practical. > > Sure, in theory we can do some crazy callback for "you now need to re-do > your interrupt registration" for all devices. In practice, I can only say > "not going to happen". It sounds like this bug only affects HPET MSI requests (presumably the only ones that the southbridge can concern itself with), not any others. It would require the HPET code to support having its MSI support yanked away at runtime, though.
^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 2010-05-18 0:56 ` Robert Hancock 2010-05-18 1:02 ` Linus Torvalds @ 2010-05-18 8:45 ` Andi Kleen 2010-05-18 23:22 ` Robert Hancock 1 sibling, 1 reply; 74+ messages in thread From: Andi Kleen @ 2010-05-18 8:45 UTC (permalink / raw) To: Robert Hancock Cc: Andreas Herrmann, Yuhong Bao, venkatesh.pallipadi, markh, dmarkh, andi, Linus Torvalds, alain, linux-kernel > If one wanted, you could disable HPET MSI on this chipset only when a > driver requests an ISA DMA channel. Then if there's no floppy or other LPC > DMA device present, it can stay enabled. I don't know if it's worth the > trouble, though. There can be LPC devices which are not visible to the kernel, but only used through ACPI or the BIOS. Think of fancy fan controllers and similar. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 2010-05-18 8:45 ` Andi Kleen @ 2010-05-18 23:22 ` Robert Hancock 0 siblings, 0 replies; 74+ messages in thread From: Robert Hancock @ 2010-05-18 23:22 UTC (permalink / raw) To: Andi Kleen Cc: Andreas Herrmann, Yuhong Bao, venkatesh.pallipadi, markh, dmarkh, Linus Torvalds, alain, linux-kernel On Tue, May 18, 2010 at 2:45 AM, Andi Kleen <andi@firstfloor.org> wrote: >> If one wanted, you could disable HPET MSI on this chipset only when a >> driver requests an ISA DMA channel. Then if there's no floppy or other LPC >> DMA device present, it can stay enabled. I don't know if it's worth the >> trouble, though. > > There can be LPC devices which are not visible to the kernel, > but only used through ACPI or the BIOS. Think of fancy fan > controllers and similar. I would hope they wouldn't use DMA without kernel knowledge, otherwise that really would be an abomination.. ^ permalink raw reply [flat|nested] 74+ messages in thread
Thread overview: 74+ messages
[not found] <4AFB3962.2020106@ntlworld.com>
[not found] ` <4B2610F8.7050609@cfl.rr.com>
[not found] ` <4B2618EF.9020709@knaff.lu>
[not found] ` <4B264448.5040604@compro.net>
[not found] ` <4B26884C.8000306@knaff.lu>
[not found] ` <4B2697C4.2040204@compro.net>
[not found] ` <4B26A82E.5040902@knaff.lu>
[not found] ` <4B26B031.4060301@compro.net>
[not found] ` <4B26BAE3.2090408@knaff.lu>
[not found] ` <4B275975.8040509@cfl.rr.com>
[not found] ` <4B275B18.80704@knaff.lu>
[not found] ` <4B275D37.4090807@cfl.rr.com>
[not found] ` <4B2761E9.2030301@knaff.lu>
[not found] ` <4B276513.6030509@cfl.rr.com>
[not found] ` <4B276753.80807@knaff.lu>
[not found] ` <4B27983F.5090600@compro.net>
[not found] ` <4B27EF18.7050101@knaff.lu>
[not found] ` <4B28FDEB.3030800@compro.net>
[not found] ` <4B290029.90602@knaff.lu>
[not found] ` <4B2901DB.8040403@compro.net>
[not found] ` <4B29052B.9070406@knaff.lu>
[not found] ` <4B292D84.5040306@compro.net>
[not found] ` <4B29624F.2080109@knaff.lu>
[not found] ` <4B2A3805.8040707@compro.net>
[not found] ` <4B2A3E3E.8060405@knaff.lu>
[not found] ` <4B2A4975.8020809@compro.net>
[not found] ` <4B2A49F4.6070402@compro.net>
[not found] ` <4B2A4B86.8060307@knaff.lu>
[not found] ` <4B2A4C78.10107@compro.net>
[not found] ` <4B2A4CF7.6040000@knaff.lu>
[not found] ` <4B2A4EC9.2030902@compro.net>
[not found] ` <4B2A4FA5.5000701@knaff.lu>
[not found] ` <4B2A5192.6090602@compro.net>
[not found] ` <4B2A530D.3080606@knaff.lu>
2009-12-17 17:00 ` DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?) Alain Knaff
2009-12-17 17:27 ` Linus Torvalds
2009-12-17 18:21 ` DMA cache consistency bug introduced in 2.6.28 Krzysztof Halasa
2009-12-17 20:46 ` DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?) Alain Knaff
2009-12-17 21:14 ` Linus Torvalds
2009-12-17 22:11 ` Alain Knaff
2009-12-17 22:43 ` Linus Torvalds
2009-12-17 23:24 ` Alain Knaff
2009-12-18 8:59 ` Mark Hounschell
2009-12-18 10:55 ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: " Mark Hounschell
2009-12-18 15:01 ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 Krzysztof Halasa
2009-12-18 15:22 ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?) Linus Torvalds
2009-12-18 15:28 ` Mark Hounschell
2009-12-18 15:45 ` Linus Torvalds
2009-12-18 20:04 ` Mark Hounschell
2009-12-18 20:15 ` Linus Torvalds
2009-12-22 15:11 ` Mark Hounschell
2009-12-22 17:38 ` Linus Torvalds
2009-12-22 17:57 ` Mark Hounschell
2009-12-22 23:37 ` Pallipadi, Venkatesh
2009-12-23 0:22 ` Mark Hounschell
2009-12-23 13:02 ` Mark Hounschell
2009-12-23 15:10 ` Pallipadi, Venkatesh
2009-12-23 15:34 ` Mark Hounschell
2009-12-23 15:57 ` Mark Hounschell
2009-12-23 16:31 ` Linus Torvalds
2009-12-23 16:38 ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 Andi Kleen
2009-12-23 16:49 ` Linus Torvalds
2009-12-23 17:08 ` Andi Kleen
2009-12-25 12:21 ` Arjan van de Ven
2009-12-25 20:33 ` Andi Kleen
2009-12-26 9:38 ` Arjan van de Ven
2009-12-26 16:40 ` Andi Kleen
2009-12-27 12:28 ` Alain Knaff
2009-12-28 1:54 ` Andi Kleen
2009-12-28 10:27 ` Alain Knaff
2009-12-28 14:54 ` Andi Kleen
2009-12-27 11:09 ` Pavel Machek
2009-12-28 20:54 ` Mark Hounschell
2009-12-23 17:19 ` Pallipadi, Venkatesh
2009-12-23 17:16 ` Andi Kleen
2009-12-23 20:11 ` alain
2009-12-23 17:41 ` Mark Hounschell
2009-12-23 18:01 ` Linus Torvalds
2009-12-23 18:11 ` Mark Hounschell
2009-12-23 19:18 ` Pallipadi, Venkatesh
2009-12-23 19:35 ` Mark Hounschell
2009-12-23 20:30 ` Pallipadi, Venkatesh
2009-12-23 20:34 ` alain
2009-12-23 21:34 ` Pallipadi, Venkatesh
2010-01-08 17:42 ` Mark Hounschell
2010-01-12 0:19 ` Pallipadi, Venkatesh
2010-01-12 9:04 ` Mark Hounschell
2010-01-15 2:01 ` Pallipadi, Venkatesh
2010-01-15 9:39 ` Mark Hounschell
2010-01-15 18:02 ` Mark Hounschell
2010-01-21 19:09 ` [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 Pallipadi, Venkatesh
2010-01-22 22:00 ` [tip:x86/urgent] " tip-bot for Pallipadi, Venkatesh
2010-01-23 6:51 ` tip-bot for Pallipadi, Venkatesh
2010-01-23 7:21 ` [PATCH] " Yuhong Bao
2010-01-25 17:10 ` Andreas Herrmann
2010-01-28 9:17 ` Mark Hounschell
2010-01-28 13:25 ` Mark Hounschell
2010-01-28 13:41 ` Borislav Petkov
2010-01-28 14:45 ` Mark Hounschell
2010-05-17 14:59 ` Andreas Herrmann
2010-05-17 15:10 ` Yuhong Bao
2010-05-17 15:12 ` Linus Torvalds
2010-05-17 16:46 ` Andreas Herrmann
2010-05-18 0:56 ` Robert Hancock
2010-05-18 1:02 ` Linus Torvalds
2010-05-18 1:06 ` Robert Hancock
2010-05-18 8:45 ` Andi Kleen
2010-05-18 23:22 ` Robert Hancock