DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?)

linux-kernel.vger.kernel.org archive mirror

* DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?)
       [not found]                                                                   ` <4B2A530D.3080606@knaff.lu>
@ 2009-12-17 17:00                                                                     ` Alain Knaff
  2009-12-17 17:27                                                                       ` Linus Torvalds
  0 siblings, 1 reply; 74+ messages in thread
From: Alain Knaff @ 2009-12-17 17:00 UTC (permalink / raw)
  To: markh; +Cc: fdutils, torvalds, linux-kernel

On 17/12/09 16:49, Alain Knaff wrote:
> On 17/12/09 16:43, Mark Hounschell wrote:
>> On 12/17/2009 10:35 AM, Alain Knaff wrote:
>>
>>>> Should I do more work in between?
>>>
>>> No, but make sure to look at track 0... Other tracks will still have the
>>> error, as there was nothing forcing a memory flush between track 0 and 1...
>>
>> Ok track 0
> [...]
>> 0: 0
>> 1: 0
>> 2: 0
>> 3: 4f  <--
>> 4: 0 
>> 5: 1 
>> 6: 2 
>> no disk change
> 
> Yeah, that's what I meant... So the memory flusher program didn't manage to
> clear up the inconsistency...
> 
> So either my theory is wrong, or the memory flusher program was not
> efficient enough.... hmmm, maybe doing some surfing in between the formats,
> or doing another kernel compilation might be a better test.
> 
> Alain

Ok, so I had a look at the differences between 2.6.27.41 and 2.6.28, and
there have indeed been changes to the iommu and DMA handling code.

So I suspect that the problem may lie here.

Cc'ed Linus and the kernel list on this. For Linus and the list, here's the
summary of what we are observing:

- A DMA transfer of a memory block transfers the wrong value for the first
byte of the block. All other bytes of the block are transferred correctly.
The value of the first byte turns out to be the value that this byte held
during the *previous* transfer. Just as if there was some kind of cache,
and the transfer started before that cache was refreshed with the new
values from main memory.

Example:

1. initial contents:  33 44 55 66
2. one DMA transfer is performed
3. program changes buffer to: 77 88 99 aa
4. new DMA transfer is performed => instead it transmits 33 88 99 aa
   (i.e. first byte is from previous contents)
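
The four steps above can be mimicked with a toy model. This is only a sketch
of the symptom, not of its cause: the class name StaleFirstByteDma and its
one-byte latch are hypothetical and do not correspond to any real chipset or
kernel structure.

```python
class StaleFirstByteDma:
    """Toy model of the symptom: byte 0 is sent as it was in the previous transfer."""

    def __init__(self):
        # Value that byte 0 of the buffer held during the previous transfer.
        self._previous_first = None

    def transfer(self, buf: bytes) -> bytes:
        """Return the bytes that would reach the device for this transfer."""
        if not buf:
            return b""
        if self._previous_first is None:
            first = buf[0]  # nothing stale yet: the first transfer is correct
        else:
            first = self._previous_first  # stale byte from the previous transfer
        self._previous_first = buf[0]  # remember what memory really held this time
        return bytes([first]) + buf[1:]


if __name__ == "__main__":
    dma = StaleFirstByteDma()
    print(dma.transfer(bytes.fromhex("33445566")).hex(" "))  # 33 44 55 66
    print(dma.transfer(bytes.fromhex("778899aa")).hex(" "))  # 33 88 99 aa
```

Only the first byte is ever wrong in this model, and it is always one
transfer behind, which matches what we see on the affected machine.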

This used to work in 2.6.27.41, but broke in 2.6.28 . It doesn't happen on
all hardware though.

It does indeed seem to be related to a DMA-side cache (rather than the
processor's cache not being flushed to main memory), as doing lots of
memory intensive work (kernel compilation) between 2 and 3 doesn't fix the
problem.

In the diff between 2.6.27.41 and 2.6.28, I noticed a lot of changes in
arch/x86/kernel/amd_iommu.c and related files, could any of these have
triggered this behavior?

Any ideas, anybody?

Alain

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?)
  2009-12-17 17:00                                                                     ` DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?) Alain Knaff
@ 2009-12-17 17:27                                                                       ` Linus Torvalds
  2009-12-17 18:21                                                                         ` DMA cache consistency bug introduced in 2.6.28 Krzysztof Halasa
  2009-12-17 20:46                                                                         ` DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?) Alain Knaff
  0 siblings, 2 replies; 74+ messages in thread
From: Linus Torvalds @ 2009-12-17 17:27 UTC (permalink / raw)
  To: Alain Knaff; +Cc: markh, fdutils, linux-kernel

On Thu, 17 Dec 2009, Alain Knaff wrote:
> 
> 1. initial contents:  33 44 55 66
> 2. one DMA transfer is performed
> 3. program changes buffer to: 77 88 99 aa
> 4. new DMA transfer is performed => instead it transmits 33 88 99 aa
>    (i.e. first byte is from previous contents)
> 
> This used to work in 2.6.27.41, but broke in 2.6.28 . It doesn't happen on
> all hardware though.

Do you have a list of hardware it works on? Especially chipsets.

On x86, where all caches are supposed to be totally coherent (except for 
I$ under very special circumstances), the above should never be able to 
happen. At least not unless there is really buggy hardware involved.

> It does indeed seem to be related to a DMA-side cache (rather than the
> processor's cache not being flushed to main memory), as doing lots of
> memory intensive work (kernel compilation) between 2 and 3 doesn't fix the
> problem.

I'm not entirely surprised. Actual CPU bugs are pretty rare in the x86 
world. But chipset bugs? Another thing entirely. There are buffers and 
caches there, and those are sometimes software-visible. The most obvious 
case of that is just the IOMMU's themselves, but from your description I 
don't think you actually change the DMA _mappings_ do you? Just the 
actual buffer (that was then mapped earlier)?

So I don't think it's the IOMMU code itself necessarily, although an IOMMU 
may well be involved (eg I could easily see a few cachelines worth of 
actual DMA data caching going on in the whole IOMMU too)

And to some degree the floppy driver might be _more_ likely to see some 
kinds of bugs, because it uses that crazy legacy DMA engine. So it's not 
going to go through the regular PCI DMA hardware paths, it's going to go 
through its own special paths that nobody else uses any more (and thus has 
probably not had as much testing).

> In the diff between 2.6.27.41 and 2.6.28, I noticed a lot of changes in
> arch/x86/kernel/amd_iommu.c and related files, could any of these have
> triggered this behavior?

Could it have triggered? Sure. Chipset caches are often flushed by certain 
trivial operations (often the caches are small, and operations like "any 
PIO access" will make sure they are flushed). Different IOMMU flush 
patterns could easily account for it.

But I think we'd like to see a list of hardware where this can be 
triggered, and quite frankly, a 'git bisect' would be absolutely wonderful 
especially if the list of hardware is not showing any really obvious 
patterns (and I assume they aren't all _that_ obvious, or you'd have 
mentioned them).

			Linus


* Re: DMA cache consistency bug introduced in 2.6.28
  2009-12-17 17:27                                                                       ` Linus Torvalds
@ 2009-12-17 18:21                                                                         ` Krzysztof Halasa
  2009-12-17 20:46                                                                         ` DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?) Alain Knaff
  1 sibling, 0 replies; 74+ messages in thread
From: Krzysztof Halasa @ 2009-12-17 18:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alain Knaff, markh, fdutils, linux-kernel

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On x86, where all caches are supposed to be totally coherent (except for 
> I$ under very special circumstances),

BTW SWIOTLB is a non-coherent "cache" in some sense, though I'd be
surprised if it's related. Anyway mentioning $CPU and $RAM at the very
least would be a good idea in such cases.
-- 
Krzysztof Halasa


* Re: DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?)
  2009-12-17 17:27                                                                       ` Linus Torvalds
  2009-12-17 18:21                                                                         ` DMA cache consistency bug introduced in 2.6.28 Krzysztof Halasa
@ 2009-12-17 20:46                                                                         ` Alain Knaff
  2009-12-17 21:14                                                                           ` Linus Torvalds
  1 sibling, 1 reply; 74+ messages in thread
From: Alain Knaff @ 2009-12-17 20:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: markh, fdutils, linux-kernel

Linus Torvalds wrote:
> 
> On Thu, 17 Dec 2009, Alain Knaff wrote:
>> 1. initial contents:  33 44 55 66
>> 2. one DMA transfer is performed
>> 3. program changes buffer to: 77 88 99 aa
>> 4. new DMA transfer is performed => instead it transmits 33 88 99 aa
>>    (i.e. first byte is from previous contents)
>>
>> This used to work in 2.6.27.41, but broke in 2.6.28 . It doesn't happen on
>> all hardware though.
> 
> Do you have a list of hardware it works on? Especially chipsets.

For the moment, I have a very small sample of hardware:
1. One machine which works (my own): Athlon XP 1800+ processor
2. One which doesn't work (Mark's)

I might get access to a wider sample of boxen in a week or so, in order
to do some stats.

What's the easiest way to find out the chipset?

Here's already the output of lspci from my machine (works):

00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8235 PCI Bridge
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 50)
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)
01:00.0 VGA compatible controller: nVidia Corporation NV17 [GeForce4 MX 440] (rev a3)

[...]
> I'm not entirely surprised. Actual CPU bugs are pretty rare in the x86 
> world. But chipset bugs? Another thing entirely. There are buffers and 
> caches there, and those are sometimes software-visible. The most obvious 
> case of that is just the IOMMU's themselves, but from your description I 
> don't think you actually change the DMA _mappings_ do you? Just the 
> actual buffer (that was then mapped earlier)?

No, I don't change any DMA mappings. And the buffer is still the same
physical buffer, at the same physical address.

(It happens while formatting a floppy disk: here the first byte is the
track id of the first physical sector of the track, and it always ends
up being the track id of the *previously* formatted track).

> So I don't think it's the IOMMU code itself necessarily, although an IOMMU 
> may well be involved (eg I could easily see a few cachelines worth of 
> actual DMA data caching going on in the whole IOMMU too)
> 
> And to some degree the floppy driver might be _more_ likely to see some 
> kinds of bugs, because it uses that crazy legacy DMA engine. So it's not 

Indeed, most other drivers use "bus master" DMA, which doesn't use the
legacy DMA controller at all, but instead uses a DMA controller hosted
on the device itself...

> going to go through the regular PCI DMA hardware paths, it's going to go 
> through its own special paths that nobody else uses any more (and thus has 
> probably not had as much testing).
> 
>> In the diff between 2.6.27.41 and 2.6.28, I noticed a lot of changes in
>> arch/x86/kernel/amd_iommu.c and related files, could any of these have
>> triggered this behavior?
> 
> Could it have triggered? Sure. Chipset caches are often flushed by certain 
> trivial operations (often the caches are small, and operations like "any 
> PIO access" will make sure they are flushed). Different IOMMU flush 
> patterns could easily account for it.
> 
> But I think we'd like to see a list of hardware where this can be 
> triggered,

We'll get a list of 2 machines relatively quickly (unless other people
would like to chime in: the test is easy, just fdformat a floppy disk),
and more in a week or so.

> and quite frankly, a 'git bisect' would be absolutely wonderful 

How exactly would I use this (command line sample)?

> especially if the list of hardware is not showing any really obvious 
> patterns (and I assume they aren't all _that_ obvious, or you'd have 
> mentioned them).
> 
> 			Linus

Thanks,

Alain


* Re: DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?)
  2009-12-17 20:46                                                                         ` DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?) Alain Knaff
@ 2009-12-17 21:14                                                                           ` Linus Torvalds
  2009-12-17 22:11                                                                             ` Alain Knaff
  0 siblings, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2009-12-17 21:14 UTC (permalink / raw)
  To: Alain Knaff; +Cc: markh, fdutils, linux-kernel

On Thu, 17 Dec 2009, Alain Knaff wrote:
> 
> For the moment, I have a very small sample of hardware:
> 1. One machine which works (my own): Athlon XP 1800+ processor
> 2. One which doesn't work (Mark's)

Ok. I don't think I even have any machines with floppy drives any more 
(one external USB drive somewhere gathering dust just in case I ever 
encounter a floppy again).

> I might get access to a wider sample of boxen in a week or so, in order
> to do some stats.

Ok, I was more thinking "we have a bugzilla with ten different people 
reporting this". If it's just a single machine, that's not going to be 
relevant.

> What's the easiest way to find out the chipset?
> 
> Here's already the output of lspci from my machine (works):
> 
> 00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge
> 00:01.0 PCI bridge: VIA Technologies, Inc. VT8235 PCI Bridge
> 00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge

Yeah, lspci (and generally only the northbridge and southbridge matters, 
the "ISA bridge" might technically be relevant, but since it's universally 
on the same die as the southbridge, I left it in there just for kicks).

> (It happens during formatting the floppy drive: here the first byte
> happens to be the trackid of the first physical sector of the track, and
> it always ends up being the track of the *previously* formatted track).

I guess it could simply be a floppy controller bug too, triggered by some 
random timing difference or innocuous-looking change.

> > But I think we'd like to see a list of hardware where this can be 
> > triggered,
> 
> We'll get a list of 2 machines relatively quickly (unless other people
> would like to chime in: the test is easy, just fdformat a floppy disk),
> and more in a week or so.

Only the "it doesn't work on xyz" is likely interesting. The machines it 
works on are probably uninteresting statistically.

> > and quite frankly, a 'git bisect' would be absolutely wonderful 
> 
> How exactly would I use this (command line sample)?

You'd need a git tree that contains both the working and non-working 
versions, and then literally just do

	git bisect start
	git bisect good <known good version number here>
	git bisect bad <known bad version here>

and it will give you a commit to try. Compile, test, see if it's good or 
bad, and do

	git bisect [good|bad]

depending on the result. Rinse and repeat (depending on how tight the 
initial good/bad commits were, it will need 10-15 kernel tests).
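
The "10-15 kernel tests" figure comes from halving the candidate range on 
each test. Here is a minimal sketch of that arithmetic, assuming a strictly 
linear history (real kernel history has merges, so git's own estimate can 
differ); the function name count_bisect_tests and the commit numbering are 
made up for illustration and are not part of git.

```python
def count_bisect_tests(n_candidates, is_bad):
    """Find the first bad commit among candidates 1..n_candidates.

    Assumes a linear history in which every commit from the first bad one
    onwards is bad, and that the last candidate is already known to be bad.
    Returns (first_bad, number_of_tests).
    """
    lo, hi = 1, n_candidates
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2  # commit "in the middle" of what is left
        tests += 1            # one build-and-test cycle
        if is_bad(mid):
            hi = mid          # first bad commit is mid or earlier
        else:
            lo = mid + 1      # first bad commit is after mid
    return lo, tests


if __name__ == "__main__":
    for n in (1000, 5000, 30000):
        first_bad, tests = count_bisect_tests(n, lambda c, n=n: c >= n // 3)
        print(f"{n} candidates: first bad = {first_bad}, tests = {tests}")
```

In this model the number of tests is always within one of log2 of the 
number of candidate commits, which is why even thousands of commits only 
need a dozen or so builds.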

So in this case, since apparently 2.6.27.41 is good, and 2.6.28 is not, it 
would be something like this:

	# clone hpa's tree that has all the stable releases in one place
	git clone git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-allstable.git

	cd linux-2.6-allstable
	git bisect start
	git bisect bad v2.6.28
	git bisect good v2.6.27.41

and off you go.

NOTE! Bisection depends very much on the bug being 100% reproducible. If 
you ever mark a good kernel bad (because you messed up) or a bad kernel 
good (because the bug wasn't 100% reproducible, so you _thought_ it was 
good even though the bug was present and just happened to hide), the end 
result of the bisect will be totally unreliable and seriously screwed up.
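
To see why a single wrong mark is so damaging, here is a toy sketch, 
assuming a linear history and made-up commit numbers (this is an 
illustration of the search logic, not of git internals). Once a bad commit 
is marked good, the real culprit falls outside the remaining range and can 
never be reached again.

```python
def bisect_first_bad(n_candidates, report_bad):
    """Binary search over commits 1..n_candidates using the tester's reports."""
    lo, hi = 1, n_candidates
    while lo < hi:
        mid = (lo + hi) // 2
        if report_bad(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


TRUE_FIRST_BAD = 3000  # hypothetical commit that really introduced the bug


def honest(commit):
    """A tester who always reports the true state of the commit."""
    return commit >= TRUE_FIRST_BAD


def one_mistake(commit):
    """Same tester, except the bug 'hides' once on commit 3660 (really bad)."""
    if commit == 3660:
        return False  # wrongly marked good
    return honest(commit)


if __name__ == "__main__":
    print(bisect_first_bad(4879, honest))       # 3000
    print(bisect_first_bad(4879, one_mistake))  # 3661
```

The mistaken run still ends confidently on one commit, it is just the 
wrong one, and nothing in the output tells you so.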

So after a successful bisect, it is usually a good idea to try to go back 
to the original known-bad kernel, and then revert the commit that was 
indicated as the bad one (assuming the revert works - it could be that the 
bad one ends up being fundamental to other commits after it), and test 
that yes, that really fixes the bug.

It gets more complicated if the bisect hits kernels that you can't test 
because they have _unrelated_ issues on that machine (compile failures or 
just other bugs that hide the actual floppy behavior), but generally 
bisection is pretty simple. "man git-bisect" does have some extra 
pointers.

So git bisect may be somewhat time-consuming and mindless, but for 
reliably triggering bugs where nobody really knows what caused the bug it 
is a _really_ convenient thing to do. The only thing you need is a 
reliably triggering test-case, and some time.

			Linus


* Re: DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?)
  2009-12-17 21:14                                                                           ` Linus Torvalds
@ 2009-12-17 22:11                                                                             ` Alain Knaff
  2009-12-17 22:43                                                                               ` Linus Torvalds
  0 siblings, 1 reply; 74+ messages in thread
From: Alain Knaff @ 2009-12-17 22:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: markh, fdutils, linux-kernel

Linus Torvalds wrote:
> 
> On Thu, 17 Dec 2009, Alain Knaff wrote:
>> For the moment, I have a very small sample of hardware:
>> 1. One machine which works (my own): Athlon XP 1800+ processor
>> 2. One which doesn't work (Mark's)
> 
> Ok. I don't think I even have any machines with floppy drives any more 
> (one external USB drive somewhere gathering dust just in case I ever 
> encounter a floppy again).

Well, on my new box, I have no floppy drive either. The one I mentioned
is an old machine that I kept around just in case I needed to debug
floppy-related problems.

>> I might get access to a wider sample of boxen in a week or so, in order
>> to do some stats.
> 
> Ok, I was more thinking "we have a bugzilla with ten different people 
> reporting this". If it's just a single machine, that's not going to be 
> relevant.

We do have a bugzilla
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=548434 , but
unfortunately only 2 people have seen the bug there so far, one of
whom (ael) turned out to be a false alarm (dusty drive).

> 
>> What's the easiest way to find out the chipset?
>>
>> Here's already the output of lspci from my machine (works):
>>
>> 00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge
>> 00:01.0 PCI bridge: VIA Technologies, Inc. VT8235 PCI Bridge
>> 00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
> 
> Yeah, lspci (and generally only the northbridge and southbridge matters, 
> the "ISA bridge" might technically be relevant, but since it's universally 
> on the same die as the southbridge, I left it in there just for kicks).

Good. Here's some info about some machines of Mark which do have the
problem (there's more than one, fortunately):

1st one showing the problem (claimed to be AMD 790x chipset):

00:00.0 Host bridge: ATI Technologies Inc RD790 Northbridge only dual slot PCI-e_GFX and HT3 K8 part
00:02.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (external gfx0 port A)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller

2nd one showing the problem (also claimed to be AMD 790x chipset):

00:00.0 Host bridge: Advanced Micro Devices [AMD] RS780 Host Bridge
00:01.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (int gfx)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller

He also has several machines that do work:

1st one that does work:
00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07)

... and a couple more that he hasn't gotten around to testing.

[...]
> Only the "it doesn't work on xyz" is likely interesting. The machines it 
> works on are probably uninteresting statistically.

I understand... (working machine above just mentioned for completeness'
sake).

[...]
> You'd need a git tree that contains both the working and non-working 
> versions, and then literally just do
> 
> 	git bisect start
> 	git bisect good <known good version number here>
> 	git bisect bad <known bad version here>
> 
> and it will give you a commit to try. Compile, test, see if it's good or 
> bad, and do
> 
> 	git bisect [good|bad]
> 
> depending on the result. Rinse and repeat (depending on how tight the 
> initial good/bad commits were, it will need 10-15 kernel tests).

... and how do I check out the most recent good / oldest bad kernel for
compilation?


> So in this case, since apparently 2.6.27.41 is good, and 2.6.28 is not, it 
> would be something like this:
> 
> 	# clone hpa's tree that has all the stable releases in one place
> 	git clone git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-allstable.git
> 
> 	cd linux-2.6-allstable
> 	git bisect start
> 	git bisect bad v2.6.28
> 	git bisect good v2.6.27.41
> 
> and off you go.

ok...

> NOTE! Bisection depends very much on the bug being 100% reproducible. If 
> you ever mark a good kernel bad (because you messed up) or a bad kernel 
> good (because the bug wasn't 100% reproducible, so you _thought_ it was 
> good even though the bug was present and just happened to hide), the end 
> result of the bisect will be totally unreliable and seriously screwed up.
> 
> So after a successful bisect, it is usually a good idea to try to go back 
> to the original known-bad kernel, and then revert the commit that was 
> indicated as the bad one (assuming the revert works - it could be that the 
> bad one ends up being fundamental to other commits after it), and test 
> that yes, that really fixes the bug.

What command lines would I use for that revert?

> It gets more complicated if the bisect hits kernels that you can't test 
> because they have _unrelated_ issues on that machine (compile failures or 
> just other bugs that hide the actual floppy behavior), but generally 
> bisection is pretty simple. "man git-bisect" does have some extra 
> pointers.
> 
> So git bisect may be somewhat time-consuming and mindless, but for 
> reliably triggering bugs where nobody really knows what caused the bug it 
> is a _really_ convenient thing to do. The only thing you need is a 
> reliably triggering test-case, and some time.
> 
> 			Linus

Alain


* Re: DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?)
  2009-12-17 22:11                                                                             ` Alain Knaff
@ 2009-12-17 22:43                                                                               ` Linus Torvalds
  2009-12-17 23:24                                                                                 ` Alain Knaff
  0 siblings, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2009-12-17 22:43 UTC (permalink / raw)
  To: Alain Knaff; +Cc: markh, fdutils, linux-kernel



On Thu, 17 Dec 2009, Alain Knaff wrote:
> [...]
> > You'd need a git tree that contains both the working and non-working 
> > versions, and then literally just do
> > 
> > 	git bisect start
> > 	git bisect good <known good version number here>
> > 	git bisect bad <known bad version here>
> > 
> > and it will give you a commit to try. Compile, test, see if it's good or 
> > bad, and do
> > 
> > 	git bisect [good|bad]
> > 
> > depending on the result. Rinse and repeat (depending on how tight the 
> > initial good/bad commits were, it will need 10-15 kernel tests).
> 
> ... and how do I check out the most recent good / oldest bad kernel for
> compilation?

'git bisect' does all that for you. You don't need to check out the 
kernels you mark good or bad - git will just calculate the commit graphs, 
and pick a commit that is in the "middle" between them, and check out that 
commit.

> > So after a successful bisect, it is usually a good idea to try to go back 
> > to the original known-bad kernel, and then revert the commit that was 
> > indicated as the bad one (assuming the revert works - it could be that the 
> > bad one ends up being fundamental to other commits after it), and test 
> > that yes, that really fixes the bug.
> 
> What command lines would I use for that revert?

	git revert <sha1-that-git-bisect-reported>

but even if that revert isn't successful, just the bisection result will 
be very interesting (assuming it all looks sane, of course - as mentioned, 
sometimes bisect results get screwed up because the bug isn't entirely 
reproducible due to timing etc).

			Linus


* Re: DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?)
  2009-12-17 22:43                                                                               ` Linus Torvalds
@ 2009-12-17 23:24                                                                                 ` Alain Knaff
  2009-12-18  8:59                                                                                   ` Mark Hounschell
  0 siblings, 1 reply; 74+ messages in thread
From: Alain Knaff @ 2009-12-17 23:24 UTC (permalink / raw)
  To: Linus Torvalds, markh; +Cc: fdutils, linux-kernel

Linus Torvalds wrote:
> 
> On Thu, 17 Dec 2009, Alain Knaff wrote:
>> [...]
>>> You'd need a git tree that contains both the working and non-working 
>>> versions, and then literally just do
>>>
>>> 	git bisect start
>>> 	git bisect good <known good version number here>
>>> 	git bisect bad <known bad version here>
>>>
>>> and it will give you a commit to try. Compile, test, see if it's good or 
>>> bad, and do
>>>
>>> 	git bisect [good|bad]
>>>
>>> depending on the result. Rinse and repeat (depending on how tight the 
>>> initial good/bad commits were, it will need 10-15 kernel tests).
>> ... and how do I check out the most recent good / oldest bad kernel for
>> compilation?
> 
> 'git bisect' does all that for you. You don't need to check out the 
> kernels you mark good or bad - git will just calculate the commit graphs, 
> and pick a commit that is in the "middle" between them, and check out that 
> commit.
> 
>>> So after a successful bisect, it is usually a good idea to try to go back 
>>> to the original known-bad kernel, and then revert the commit that was 
>>> indicated as the bad one (assuming the revert works - it could be that the 
>>> bad one ends up being fundamental to other commits after it), and test 
>>> that yes, that really fixes the bug.
>> What command lines would I use for that revert?
> 
> 	git revert <sha1-that-git-bisect-reported>
> 
> but even if that revert isn't successful, just the bisection result will 
> be very interesting (assuming it all looks sane, of course - as mentioned, 
> sometimes bisect results get screwed up because the bug isn't entirely 
> reproducible due to timing etc).
> 
> 			Linus

Thanks for these explanations; that makes it clearer indeed.

Now, I only need to find a machine locally to test this on. Or Mark: are
you confident in doing this yourself?

Thanks,

Alain



* Re: DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?)
  2009-12-17 23:24                                                                                 ` Alain Knaff
@ 2009-12-18  8:59                                                                                   ` Mark Hounschell
  2009-12-18 10:55                                                                                     ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: " Mark Hounschell
  0 siblings, 1 reply; 74+ messages in thread
From: Mark Hounschell @ 2009-12-18  8:59 UTC (permalink / raw)
  To: Alain Knaff; +Cc: Linus Torvalds, markh, fdutils, linux-kernel

On 12/17/2009 06:24 PM, Alain Knaff wrote:

> 
> Now, I only need to find a machine locally to test this on. Or Mark: are
> you confident in doing this yourself?
> 

I'll give it a shot. Sounds easy enough. If I have problems, I'll yell.

Mark



* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
  2009-12-18  8:59                                                                                   ` Mark Hounschell
@ 2009-12-18 10:55                                                                                     ` Mark Hounschell
  2009-12-18 15:01                                                                                       ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 Krzysztof Halasa
  2009-12-18 15:22                                                                                       ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?) Linus Torvalds
  0 siblings, 2 replies; 74+ messages in thread
From: Mark Hounschell @ 2009-12-18 10:55 UTC (permalink / raw)
  To: Alain Knaff; +Cc: Mark Hounschell, Linus Torvalds, linux-kernel, fdutils

On 12/18/2009 03:59 AM, Mark Hounschell wrote:
> On 12/17/2009 06:24 PM, Alain Knaff wrote:
> 
>>
>> Now, I only need to find a machine locally to test this on. Or Mark: are
>> you confident in doing this yourself?
>>
> 
> I'll give it a shot. Sounds easy enough. If I have problems, I'll yell.
> 

Ok, I ran into a build issue on the third one.

harley:/usr/src # git clone git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-allstable.git
Initialized empty Git repository in /usr/src/linux-2.6-allstable/.git/
remote: Counting objects: 1486248, done.
remote: Compressing objects: 100% (248092/248092), done.
Receiving objects: 100% (1486248/1486248), 323.35 MiB | 6753 KiB/s, done.
remote: Total 1486248 (delta 1236282), reused 1476516 (delta 1227133)
Resolving deltas: 100% (1236282/1236282), done.
Checking out files: 100% (31502/31502), done.


harley:/usr/src # cd linux-2.6-allstable
harley:/usr/src/linux-2.6-allstable # git bisect start
harley:/usr/src/linux-2.6-allstable # git bisect bad v2.6.28
harley:/usr/src/linux-2.6-allstable # git bisect good v2.6.27.41
Bisecting: a merge base must be tested
[3fa8749e584b55f1180411ab1b51117190bac1e5] Linux 2.6.27

Build and test kernel: This one worked so:

harley:/usr/src/linux-2.6-allstable # git bisect good
Bisecting: 4879 revisions left to test after this (roughly 12 steps)
[c813b4e16ead3c3df98ac84419d4df2adf33fe01] Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6

Build and test kernel: This one worked so:

harley:/usr/src/linux-2.6-allstable # git bisect good
Bisecting: 2443 revisions left to test after this (roughly 11 steps)
[db563fc2e80534f98c7f9121a6f7dfe41f177a79] Merge
git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
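The "roughly N steps" figures in the bisect output above are consistent with halving the remaining revisions until one is left. A minimal sketch of that arithmetic follows; `bisect_steps` is a hypothetical helper written for illustration, and git's own estimate may be computed differently:

```shell
#!/bin/sh
# bisect_steps N: count how many times N revisions can be halved before
# a single candidate remains (integer floor of log2 N).
# Hypothetical helper, not part of git.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$((n / 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

bisect_steps 4879   # prints 12
bisect_steps 2443   # prints 11
```

Each build-and-test cycle removes about half of the candidates, which is why nearly 5000 revisions need only about a dozen kernels to be built.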

This one doesn't build:

CC [M]  fs/ext3/super.o
fs/ext3/super.c: In function ‘ext3_quota_on’:
fs/ext3/super.c:2839: error: ‘nd’ undeclared (first use in this function)
fs/ext3/super.c:2839: error: (Each undeclared identifier is reported only once
fs/ext3/super.c:2839: error: for each function it appears in.)
make[2]: *** [fs/ext3/super.o] Error 1
make[1]: *** [fs/ext3] Error 2
make: *** [fs] Error 2

I haven't yet determined whether I can, but if I were to modify the tree now
to fix this, would that screw up the bisect process?

Regards
Mark



* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-18 10:55                                                                                     ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: " Mark Hounschell
@ 2009-12-18 15:01                                                                                       ` Krzysztof Halasa
  2009-12-18 15:22                                                                                       ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?) Linus Torvalds
  1 sibling, 0 replies; 74+ messages in thread
From: Krzysztof Halasa @ 2009-12-18 15:01 UTC (permalink / raw)
  To: dmarkh; +Cc: Alain Knaff, Mark Hounschell, Linus Torvalds, linux-kernel,
	fdutils

Mark Hounschell <dmarkh@cfl.rr.com> writes:

> harley:/usr/src/linux-2.6-allstable # git bisect good
> Bisecting: 2443 revisions left to test after this (roughly 11 steps)
> [db563fc2e80534f98c7f9121a6f7dfe41f177a79] Merge
> git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
>
> This one doesn't build:
>
> CC [M]  fs/ext3/super.o
> fs/ext3/super.c: In function ‘ext3_quota_on’:
> fs/ext3/super.c:2839: error: ‘nd’ undeclared (first use in this function)
> fs/ext3/super.c:2839: error: (Each undeclared identifier is reported only once
> fs/ext3/super.c:2839: error: for each function it appears in.)
> make[2]: *** [fs/ext3/super.o] Error 1
> make[1]: *** [fs/ext3] Error 2
> make: *** [fs] Error 2
>
> I haven't yet determined that I can but, if I were to make a modification to the
> tree now to fix this would that screw up the bisect process?

It won't, in such cases.
But you can also git reset --hard another_commit_id (while doing git
bisect) if it fixes this problem (e.g. some next commit).

And you can skip uninteresting parts of the tree when starting git
bisect (though if the cause is in skipped parts, the results will be
meaningless).
-- 
Krzysztof Halasa


* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
  2009-12-18 10:55                                                                                     ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: " Mark Hounschell
  2009-12-18 15:01                                                                                       ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 Krzysztof Halasa
@ 2009-12-18 15:22                                                                                       ` Linus Torvalds
  2009-12-18 15:28                                                                                         ` Mark Hounschell
  1 sibling, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2009-12-18 15:22 UTC (permalink / raw)
  To: Mark Hounschell; +Cc: Alain Knaff, Mark Hounschell, linux-kernel, fdutils



On Fri, 18 Dec 2009, Mark Hounschell wrote:
> 
> This one doesn't build:
> 
> CC [M]  fs/ext3/super.o
> fs/ext3/super.c: In function ‘ext3_quota_on’:
> fs/ext3/super.c:2839: error: ‘nd’ undeclared (first use in this function)
> fs/ext3/super.c:2839: error: (Each undeclared identifier is reported only once
> fs/ext3/super.c:2839: error: for each function it appears in.)
> make[2]: *** [fs/ext3/super.o] Error 1
> make[1]: *** [fs/ext3] Error 2
> make: *** [fs] Error 2
> 
> I haven't yet determined that I can but, if I were to make a modification to the
> tree now to fix this would that screw up the bisect process?

You can safely fix unrelated problems without screwing up the bisection. 
And in this case you can be pretty sure that this is unrelated, so it's 
all ok.

The fix for that silly problem is

-                       path_put(&nd.path);
+                       path_put(&path);

(it's due to a silent merge failure - it merged cleanly, but semantics had 
changed in a branch and impacted code that was newly introduced in another 
branch).


		Linus


* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
  2009-12-18 15:22                                                                                       ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?) Linus Torvalds
@ 2009-12-18 15:28                                                                                         ` Mark Hounschell
  2009-12-18 15:45                                                                                           ` Linus Torvalds
  0 siblings, 1 reply; 74+ messages in thread
From: Mark Hounschell @ 2009-12-18 15:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Mark Hounschell, Alain Knaff, linux-kernel, fdutils

On 12/18/2009 10:22 AM, Linus Torvalds wrote:
> 
> 
> On Fri, 18 Dec 2009, Mark Hounschell wrote:
>>
>> This one doesn't build:
>>
>> CC [M]  fs/ext3/super.o
>> fs/ext3/super.c: In function ‘ext3_quota_on’:
>> fs/ext3/super.c:2839: error: ‘nd’ undeclared (first use in this function)
>> fs/ext3/super.c:2839: error: (Each undeclared identifier is reported only once
>> fs/ext3/super.c:2839: error: for each function it appears in.)
>> make[2]: *** [fs/ext3/super.o] Error 1
>> make[1]: *** [fs/ext3] Error 2
>> make: *** [fs] Error 2
>>
>> I haven't yet determined that I can but, if I were to make a modification to the
>> tree now to fix this would that screw up the bisect process?
> 
> You can safely fix unrelated problems without screwing up the bisection. 
> And in this case you can be pretty sure that this is unrelated, so it's 
> all ok.
> 
> The fix for that silly problem is
> 
> -                       path_put(&nd.path);
> +                       path_put(&path);
> 
> (it's due to a silent merge failure - it merged cleanly, but semantics had 
> changed in a branch and impacted code that was newly introduced in another 
> branch).

Yep, thanks. I'm past that now. But haven't done a bisect [good|bad] on the
results of that one yet. Did you see Alain's email response to my bisect
progress report to him?

I'm still at a loss as to how to proceed?

Mark


* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
  2009-12-18 15:28                                                                                         ` Mark Hounschell
@ 2009-12-18 15:45                                                                                           ` Linus Torvalds
  2009-12-18 20:04                                                                                             ` Mark Hounschell
  0 siblings, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2009-12-18 15:45 UTC (permalink / raw)
  To: Mark Hounschell; +Cc: Mark Hounschell, Alain Knaff, linux-kernel, fdutils

On Fri, 18 Dec 2009, Mark Hounschell wrote:
> 
> Yep, thanks. I'm past that now. But haven't done a bisect [good|bad] on the
> results of that one yet. Did you see Alain's email response to my bisect
> progress report to him?
> 
> I'm still at a loss as to how to proceed?

Ahh, the HPET issue.

That one is actually very interesting information, because we've had 
problems with HPET before. But what I would suggest is to try to continue 
to bisect with HPET enabled (to see the problem), and the commit that you 
couldn't even boot with HPET enabled you should not count as good or bad 
because you just don't know.

You can do "git bisect skip" to make git know that some particular commit 
is not a commit you can test, and you can also move away from a whole 
problematic region to another area by doing

	git bisect visualize

to bring up a graphical gitk view of what all you have left to bisect, 
pick a good point (still _reasonably_ close to the middle) there, and do

	git reset --hard <the-point-you-want-to-test>

and try that kernel instead of the one git bisect suggested.
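The skip workflow described above can be tried on a throwaway repository. This is only a sketch under stated assumptions: the repository, the commit names `c1`..`c10`, and the `BROKEN`/`BUG` marker files are all hypothetical stand-ins for "kernel does not build or boot" and "floppy format fails". It uses `git bisect run`, where a test script exiting with status 125 has the same effect as typing `git bisect skip` by hand:

```shell
#!/bin/sh
# Hypothetical throwaway repository (not the kernel tree): ten commits,
# c5 and c6 cannot be tested, and the bug first appears in c8.
set -e
repo=$(mktemp -d)
check=$(mktemp)
cd "$repo"
git init -q
git config user.name "Bisect Demo"
git config user.email "demo@example.invalid"

for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "$i" > version
    rm -f BROKEN
    # BROKEN stands in for a commit that does not build or boot.
    if [ "$i" -eq 5 ] || [ "$i" -eq 6 ]; then touch BROKEN; fi
    # BUG stands in for the regression being hunted.
    if [ "$i" -ge 8 ]; then touch BUG; fi
    git add -A
    git commit -q -m "c$i"
    git tag "c$i"
done

# Test script: 125 = skip this commit, 1 = bad, 0 = good.
cat > "$check" <<'EOF'
[ -e BROKEN ] && exit 125
[ -e BUG ] && exit 1
exit 0
EOF

git bisect start c10 c1          # bad revision first, then good
git bisect run sh "$check"
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $first_bad"
```

Because the untestable commits sit away from the good/bad boundary, the bisection still ends on a single commit. If the skipped commits were adjacent to the first bad one, git could only report a range of candidates.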

But this floppy DMA inconsistency being somehow HPET-related is 
interesting in itself. One thing that HPET does is to obviously change how 
we read the time - and what that can cause (totally indirectly) is that 
now we don't touch the southbridge with IO accesses nearly as much, 
because we no longer go to the old 8253 PIT, which touches the same legacy 
chip support that implements the floppy controller itself.

So it's entirely possible that the reason a non-HPET setup doesn't show 
this is that the accesses to the i8253 PIT part will "synchronize" the old 
floppy controller too, and hide some issue.

But still, I assume you had HPET enabled in 2.6.27, so it would be 
interesting to see exactly when the problem starts. 

		Linus


* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
  2009-12-18 15:45                                                                                           ` Linus Torvalds
@ 2009-12-18 20:04                                                                                             ` Mark Hounschell
  2009-12-18 20:15                                                                                               ` Linus Torvalds
  0 siblings, 1 reply; 74+ messages in thread
From: Mark Hounschell @ 2009-12-18 20:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Mark Hounschell, Alain Knaff, linux-kernel, fdutils

On 12/18/2009 10:45 AM, Linus Torvalds wrote:
> 
> 
> On Fri, 18 Dec 2009, Mark Hounschell wrote:
>>
>> Yep, thanks. I'm past that now. But haven't done a bisect [good|bad] on the
>> results of that one yet. Did you see Alain's email response to my bisect
>> progress report to him?
>>
>> I'm still at a loss as to how to proceed?
> 
> Ahh, the HPET issue.
> 
> That one is actually very interesting information, because we've had 
> problems with HPET before. But what I would suggest is to try to continue 
> to bisect with HPET enabled (to see the problem), and the commit that you 
> couldn't even boot with HPET enabled you should not count as good or bad 
> because you just don't know.
> 
> You can do "git bisect skip" to make git know that some particular commit 
> is not a commit you can test, and you can also move away from a whole 
> problematic region to another area by doing
> 
> 	git bisect visualize
> 
> to bring up a graphical gitk view of what all you have left to bisect, 
> pick a good point (still _reasonably_ close to the middle) there, and do
> 
> 	git reset --hard <the-point-you-want-to-test>
> 
> and try that kernel instead of the one git bisect suggested.
> 
> But this floppy DMA inconsistency being somehow HPET-related is 
> interesting in itself. One thing that HPET does is to obviously change how 
> we read the time - and what that can cause (totally indirectly) is that 
> now we don't touch the southbridge with IO accesses nearly as much, 
> because we no longer go to the old 8253 PIT, which touches the same legacy 
> chip support that implements the floppy controller itself.
> 
> So it's entirely possible that the reason a non-HPET setup doesn't show 
> this is that the accesses to the i8253 PIT part will "synchronize" the old 
> floppy controller too, and hide some issue.
> 
> But still, I assume you had HPET enabled in 2.6.27, so it would be 
> interesting to see exactly when the problem starts. 
> 
> 		Linus
> 

It looks like I may have to back up and first find the points that let me,
and stop me from, booting with the HPET enabled. Before I change direction,
can the git-bisect start sequence use the SHA1 id for the starting 'goods'
and 'bads'? I don't see a reference to that in the doc.

Thanks
Mark


* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
  2009-12-18 20:04                                                                                             ` Mark Hounschell
@ 2009-12-18 20:15                                                                                               ` Linus Torvalds
  2009-12-22 15:11                                                                                                 ` Mark Hounschell
  0 siblings, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2009-12-18 20:15 UTC (permalink / raw)
  To: Mark Hounschell; +Cc: Mark Hounschell, Alain Knaff, linux-kernel, fdutils



On Fri, 18 Dec 2009, Mark Hounschell wrote:
> 
> It looks like I may have to back up and first find the points that, let me,
> and stop me,  booting with the HPET enabled. Before I change direction, can
> the git-bisect start sequence use the SHA1 id for the starting 'goods' and
> 'bads'? I don't see reference to that in the doc.

You can always use a SHA1 id instead of a tag. So when you did

	git bisect good v2.6.27.41

you could always have replaced that "v2.6.27.41" with the SHA1 of the 
commit.

In git, the SHA1 ID's are the "real" names - the tags and branch names are 
purely for human-readable decoration. Git always turns them into SHA1 id's 
internally.
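To see which SHA1 a tag stands for, `git rev-parse` can be asked directly. The following is a small sketch on a scratch repository; the repository and the tag name `v-demo` are hypothetical and only there to make the example self-contained:

```shell
#!/bin/sh
# Scratch repository with a single commit and one lightweight tag.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Tag Demo"
git config user.email "demo@example.invalid"
echo one > file
git add file
git commit -q -m "first commit"
git tag v-demo                    # hypothetical tag name

# Resolve the tag and HEAD to the commit IDs they name.
tag_sha=$(git rev-parse 'v-demo^{commit}')
head_sha=$(git rev-parse HEAD)
echo "v-demo resolves to $tag_sha"
```

Either spelling, the tag or the SHA1 printed here, can then be handed to `git bisect good` or `git bisect bad`.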

		Linus


* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
  2009-12-18 20:15                                                                                               ` Linus Torvalds
@ 2009-12-22 15:11                                                                                                 ` Mark Hounschell
  2009-12-22 17:38                                                                                                   ` Linus Torvalds
  0 siblings, 1 reply; 74+ messages in thread
From: Mark Hounschell @ 2009-12-22 15:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Mark Hounschell, Alain Knaff, linux-kernel, fdutils

On 12/18/2009 03:15 PM, Linus Torvalds wrote:
> 
> 
> On Fri, 18 Dec 2009, Mark Hounschell wrote:
>>
>> It looks like I may have to back up and first find the points that, let me,
>> and stop me,  booting with the HPET enabled. Before I change direction, can
>> the git-bisect start sequence use the SHA1 id for the starting 'goods' and
>> 'bads'? I don't see reference to that in the doc.
> 
> You can always use a SHA1 id instead of a tag. So when you did
> 
> 	git bisect good v2.6.27.41
> 
> you could always have replaced that "v2.6.27.41" with the SHA1 of the 
> commit.
> 
> In git, the SHA1 ID's are the "real" names - the tags and branch names are 
> purely for human-readable decoration. Git always turns them into SHA1 id's 
> internally.
> 
> 		Linus
> 

Ok, I may have something that might help.

# git bisect bad
26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
Author: venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com>
Date:   Fri Sep 5 18:02:18 2008 -0700

    x86: HPET_MSI Initialise per-cpu HPET timers

    Initialize a per CPU HPET MSI timer when possible. We retain the HPET
    timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
    setup the remaining HPET timers as per CPU MSI based timers. This per CPU
    timer will eliminate the need for timer broadcasting with IRQ 0 when there
    is non-functional LAPIC timer across CPU deep C-states.

    If there are more CPUs than number of available timers, CPUs that do not
    find any timer to use will continue using LAPIC and IRQ 0 broadcast.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Shaohua Li <shaohua.li@intel.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

:040000 040000 b0a11fa0abdc591427e78236a1f25f26b824140e f2e9b13cf9e2eb7e0fc101660b1e1d499033d78f M      arch


And of course this was the first commit that I could not boot if I had hpet
enabled. To get this one to boot (single user mode only) I had to add the
quiet cmdline option and the following patch to arch/x86/kernel/hpet.c, from

commit  5ceb1a04187553e08c6ab60d30cee7c454ee139a

@@ -445,7 +445,7 @@ static int hpet_setup_irq(struct hpet_dev *dev)
 {

        if (request_irq(dev->irq, hpet_interrupt_handler,
-                       IRQF_SHARED|IRQF_NOBALANCING, dev->name, dev))
+                       IRQF_DISABLED|IRQF_NOBALANCING, dev->name, dev))
                return -1;

        disable_irq(dev->irq);


AND add the quiet cmdline option.

Also, on all of the machines where it does work with hpet enabled, I don't
see the HPET2 entry in /proc/interrupts that appears below.


cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:         82          0          3          0   IO-APIC-edge      timer
  1:          0          0       1712          6   IO-APIC-edge      i8042
  3:          0          0          6          0   IO-APIC-edge
  4:          0          0          6          0   IO-APIC-edge
  6:          0          0          4          0   IO-APIC-edge      floppy
  8:          0          0         60          0   IO-APIC-edge      rtc0
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          0          0      37798        179   IO-APIC-edge      i8042
 14:          0          0      16462         71   IO-APIC-edge      pata_atiixp
 15:          0          0       5713         17   IO-APIC-edge      pata_atiixp
 16:          0          0        904          2   IO-APIC-fasteoi   aic79xx, ohci_hcd:usb2, ohci_hcd:usb4, HDA Intel, ni-pci-gpib
 17:          0          0          2          0   IO-APIC-fasteoi   ehci_hcd:usb1, parport0, ni-pci-gpib
 18:          0          0      49940         90   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, nvidia
 19:          0          0        703          2   IO-APIC-fasteoi   aic7xxx, ehci_hcd:usb3, ttySLG0, eth1
 22:          0          0       1303         15   IO-APIC-fasteoi   ahci


 24:     261763          0          0          0  HPET_MSI-edge      hpet2


 29:          0          0        220          5   PCI-MSI-edge      sky2@pci:0000:04:00.0
NMI:          0          0          0          0   Non-maskable interrupts
LOC:        138     271356     264446     261050   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0   Performance monitoring interrupts
PND:          0          0          0          0   Performance pending work
RES:       4511       9275       8470       8086   Rescheduling interrupts
CAL:       3624       8666        523       4543   Function call interrupts
TLB:        981       1111       1065       1058   TLB shootdowns
ERR:          0
MIS:          0
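Checking other machines for the same per-CPU HPET MSI entry can be scripted. A minimal sketch follows, assuming the interrupt type is labelled `HPET_MSI` as in the listing above (other kernel versions may label it differently); `list_hpet_msi` is a hypothetical helper name:

```shell
#!/bin/sh
# list_hpet_msi [FILE]: print the last column (the device name) of every
# line that mentions HPET_MSI. Reads /proc/interrupts when no file is given.
# Hypothetical helper for illustration only.
list_hpet_msi() {
    awk '/HPET_MSI/ { print $NF }' "${1:-/proc/interrupts}"
}

# Two lines copied from the listing above, used as sample input.
sample=$(mktemp)
cat > "$sample" <<'EOF'
  6:          0          0          4          0   IO-APIC-edge      floppy
 24:     261763          0          0          0  HPET_MSI-edge      hpet2
EOF

list_hpet_msi "$sample"   # prints hpet2
```

Running `list_hpet_msi` with no argument on a live system prints nothing when no HPET MSI timers are in use, which is what the working machines appear to show.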


Regards
Mark


* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
  2009-12-22 15:11                                                                                                 ` Mark Hounschell
@ 2009-12-22 17:38                                                                                                   ` Linus Torvalds
  2009-12-22 17:57                                                                                                     ` Mark Hounschell
  0 siblings, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2009-12-22 17:38 UTC (permalink / raw)
  To: Mark Hounschell
  Cc: Mark Hounschell, Alain Knaff, Linux Kernel Mailing List, fdutils,
	Venkatesh Pallipadi, Shaohua Li, Ingo Molnar


[ Ingo, Venki and Shaohua added to cc: see the whole thread on lkml for 
  details, but Mark is basically chasing down a situation where the floppy 
  driver seems to have trouble formatting floppies, and it happened 
  between 2.6.27 and .28. The trouble seems to be that a DMA transfer of a 
  memory block transfers the wrong value for the first byte of the block.

  Which should be impossible, but whatever. Some part of the system has a 
  cached buffer that isn't flushed.

  What gets _you_ guys involved is that Mark cannot reproduce the bug if 
  HPET is disabled in the BIOS or by using 'nohpet'. He found that out by 
  pure luck while bisecting, because some time during his bisect, his 
  machine wouldn't even boot with HPET.

  So the problem is: with HPET enabled, 2.6.27.4 _used_ to work. But 
  2.6.28 (and current -git) does not.  Any ideas? ]

On Tue, 22 Dec 2009, Mark Hounschell wrote:
> 
> Ok, I may have something that might help.
> 
> # git bisect bad
> 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
> commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
> Author: venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com>
> Date:   Fri Sep 5 18:02:18 2008 -0700
> 
>     x86: HPET_MSI Initialise per-cpu HPET timers
> 
>     Initialize a per CPU HPET MSI timer when possible. We retain the HPET
>     timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
>     setup the remaining HPET timers as per CPU MSI based timers. This per CPU
>     timer will eliminate the need for timer broadcasting with IRQ 0 when there
>     is non-functional LAPIC timer across CPU deep C-states.
> 
>     If there are more CPUs than number of available timers, CPUs that do not
>     find any timer to use will continue using LAPIC and IRQ 0 broadcast.
> 
>     Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
>     Signed-off-by: Shaohua Li <shaohua.li@intel.com>
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> 
> And of course this was the first commit that I could not boot if I had hpet
> enabled. To get this one to boot (single user mode only) I had to add the
> quiet cmdline option and the following patch to arch/x86/kernel/hpet.c, from
> 
> commit  5ceb1a04187553e08c6ab60d30cee7c454ee139a
> 
> @@ -445,7 +445,7 @@ static int hpet_setup_irq(struct hpet_dev *dev)
>  {
> 
>         if (request_irq(dev->irq, hpet_interrupt_handler,
> -                       IRQF_SHARED|IRQF_NOBALANCING, dev->name, dev))
> +                       IRQF_DISABLED|IRQF_NOBALANCING, dev->name, dev))
>                 return -1;
> 
>         disable_irq(dev->irq);
> 
> AND add the quiet cmdline option.

Ok, so we know why HPET didn't boot for you, and that was fixed later (by 
that 5ceb1a04). But is this also when the floppy started mis-behaving?

IOW, _if_ you boot with that fix from commit 5ceb1a04 (and the quiet 
option - I wonder what that is about: do you have any ideas?), is the 
per-CPU HPET timer commit also the commit that causes floppy problems, or 
is this purely a "bisect when HPET became a boot-up problem"?

			Linus

---
> Also, on all of the machines where it does work with hpet enabled, I don't
> see the HPET2 entry in /proc/interrupts that appears below.
> 
> 
> cat /proc/interrupts
>            CPU0       CPU1       CPU2       CPU3
>   0:         82          0          3          0   IO-APIC-edge      timer
>   1:          0          0       1712          6   IO-APIC-edge      i8042
>   3:          0          0          6          0   IO-APIC-edge
>   4:          0          0          6          0   IO-APIC-edge
>   6:          0          0          4          0   IO-APIC-edge      floppy
>   8:          0          0         60          0   IO-APIC-edge      rtc0
>   9:          0          0          0          0   IO-APIC-fasteoi   acpi
>  12:          0          0      37798        179   IO-APIC-edge      i8042
>  14:          0          0      16462         71   IO-APIC-edge      pata_atiixp
>  15:          0          0       5713         17   IO-APIC-edge      pata_atiixp
>  16:          0          0        904          2   IO-APIC-fasteoi   aic79xx, ohci_hcd:usb2, ohci_hcd:usb4, HDA Intel, ni-pci-gpib
>  17:          0          0          2          0   IO-APIC-fasteoi   ehci_hcd:usb1, parport0, ni-pci-gpib
>  18:          0          0      49940         90   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, nvidia
>  19:          0          0        703          2   IO-APIC-fasteoi   aic7xxx, ehci_hcd:usb3, ttySLG0, eth1
>  22:          0          0       1303         15   IO-APIC-fasteoi   ahci
> 
>  24:     261763          0          0          0  HPET_MSI-edge      hpet2
> 
>  29:          0          0        220          5   PCI-MSI-edge      sky2@pci:0000:04:00.0
> NMI:          0          0          0          0   Non-maskable interrupts
> LOC:        138     271356     264446     261050   Local timer interrupts
> SPU:          0          0          0          0   Spurious interrupts
> PMI:          0          0          0          0   Performance monitoring interrupts
> PND:          0          0          0          0   Performance pending work
> RES:       4511       9275       8470       8086   Rescheduling interrupts
> CAL:       3624       8666        523       4543   Function call interrupts
> TLB:        981       1111       1065       1058   TLB shootdowns
> ERR:          0
> MIS:          0


* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
  2009-12-22 17:38                                                                                                   ` Linus Torvalds
@ 2009-12-22 17:57                                                                                                     ` Mark Hounschell
  2009-12-22 23:37                                                                                                       ` Pallipadi, Venkatesh
  0 siblings, 1 reply; 74+ messages in thread
From: Mark Hounschell @ 2009-12-22 17:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mark Hounschell, Alain Knaff, Linux Kernel Mailing List, fdutils,
	Venkatesh Pallipadi, Shaohua Li, Ingo Molnar

On 12/22/2009 12:38 PM, Linus Torvalds wrote:
> 
> [ Ingo, Venki and Shaohua added to cc: see the whole thread on lkml for 
>   details, but Mark is basically chasing down a situation where the floppy 
>   driver seems to have trouble formatting floppies, and it happened 
>   between 2.6.27 and .28. The trouble seems to be that a DMA transfer of a 
>   memory block transfers the wrong value for the first byte of the block.
> 
>   Which should be impossible, but whatever. Some part of the system has a 
>   cached buffer that isn't flushed.
> 
>   What gets _you_ guys involved is that Mark cannot reproduce the bug if 
>   HPET is disabled in the BIOS or by using 'nohpet'. He found that out by 
>   pure luck while bisecting, because some time during his bisect, his 
>   machine wouldn't even boot with HPET.
> 
>   So the problem is: with HPET enabled, 2.6.27.4 _used_ to work. But 
>   2.6.28 (and current -git) does not.  Any ideas? ]
> 
> On Tue, 22 Dec 2009, Mark Hounschell wrote:
>>
>> Ok, I may have something that might help.
>>
>> # git bisect bad
>> 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
>> commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
>> Author: venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com>
>> Date:   Fri Sep 5 18:02:18 2008 -0700
>>
>>     x86: HPET_MSI Initialise per-cpu HPET timers
>>
>>     Initialize a per CPU HPET MSI timer when possible. We retain the HPET
>>     timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
>>     setup the remaining HPET timers as per CPU MSI based timers. This per CPU
>>     timer will eliminate the need for timer broadcasting with IRQ 0 when there
>>     is non-functional LAPIC timer across CPU deep C-states.
>>
>>     If there are more CPUs than number of available timers, CPUs that do not
>>     find any timer to use will continue using LAPIC and IRQ 0 broadcast.
>>
>>     Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
>>     Signed-off-by: Shaohua Li <shaohua.li@intel.com>
>>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
>>
>> And of course this was the first commit that I could not boot if I had hpet
>> enabled. To get this one to boot (single user mode only) I had to add the
>> quiet cmdline option and the following patch to arch/x86/kernel/hpet.c, from
>>
>> commit  5ceb1a04187553e08c6ab60d30cee7c454ee139a
>>
>> @@ -445,7 +445,7 @@ static int hpet_setup_irq(struct hpet_dev *dev)
>>  {
>>
>>         if (request_irq(dev->irq, hpet_interrupt_handler,
>> -                       IRQF_SHARED|IRQF_NOBALANCING, dev->name, dev))
>> +                       IRQF_DISABLED|IRQF_NOBALANCING, dev->name, dev))
>>                 return -1;
>>
>>         disable_irq(dev->irq);
>>
>> AND add the quiet cmdline option.
> 
> Ok, so we know why HPET didn't boot for you, and that was fixed later (by 
> that 5ceb1a04). But is this also when the floppy started mis-behaving?
> 

Commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is when the floppy stops
working and also when I could no longer boot with hpet enabled. Commit
5ceb1a04 is where I found I could boot again with hpet enabled. It was a
simple patch, so I backported it to where I was in order to be able to boot
with hpet on. I did 2 different bisects: the first to find out when I could
boot again with hpet on, then the second to find which commit caused the
floppy problem, using the patch from the first bisect (5ceb1a04) while doing
the second.

> IOW, _if_ you boot with that fix from commit 5ceb1a04 (and the quiet 
> option - I wonder what that is about: do you have any ideas?), is the 
> per-CPU HPET timer commit also the commit that causes floppy problems, or 
> is this purely a "bisect when HPET became a boot-up problem"?
> 

The quiet option was only needed because, with that 5ceb1a04 commit applied
to the kernels I was interested in, kernel messages of some kind went on
for hours and I could not get a login prompt. They went by too fast to read,
and I didn't have a serial console available to capture them.
They must not have been too important or critical, because the machine acted
as normal as any machine in single user mode.

But once I got to a single user login prompt it was for sure the same
floppy problem.

> 
> ---
>> Also, on all of the machines where it does work with hpet enabled, I don't
>> see the HPET2 entry in /proc/interrupts that appears below.
>>
>>
>> cat /proc/interrupts
>>            CPU0       CPU1       CPU2       CPU3
>>   0:         82          0          3          0   IO-APIC-edge      timer
>>   1:          0          0       1712          6   IO-APIC-edge      i8042
>>   3:          0          0          6          0   IO-APIC-edge
>>   4:          0          0          6          0   IO-APIC-edge
>>   6:          0          0          4          0   IO-APIC-edge      floppy
>>   8:          0          0         60          0   IO-APIC-edge      rtc0
>>   9:          0          0          0          0   IO-APIC-fasteoi   acpi
>>  12:          0          0      37798        179   IO-APIC-edge      i8042
>>  14:          0          0      16462         71   IO-APIC-edge      pata_atiixp
>>  15:          0          0       5713         17   IO-APIC-edge      pata_atiixp
>>  16:          0          0        904          2   IO-APIC-fasteoi   aic79xx, ohci_hcd:usb2, ohci_hcd:usb4, HDA Intel, ni-pci-gpib
>>  17:          0          0          2          0   IO-APIC-fasteoi   ehci_hcd:usb1, parport0, ni-pci-gpib
>>  18:          0          0      49940         90   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, nvidia
>>  19:          0          0        703          2   IO-APIC-fasteoi   aic7xxx, ehci_hcd:usb3, ttySLG0, eth1
>>  22:          0          0       1303         15   IO-APIC-fasteoi   ahci
>>
>>  24:     261763          0          0          0  HPET_MSI-edge      hpet2
>>
>>  29:          0          0        220          5   PCI-MSI-edge      sky2@pci:0000:04:00.0
>> NMI:          0          0          0          0   Non-maskable interrupts
>> LOC:        138     271356     264446     261050   Local timer interrupts
>> SPU:          0          0          0          0   Spurious interrupts
>> PMI:          0          0          0          0   Performance monitoring interrupts
>> PND:          0          0          0          0   Performance pending work
>> RES:       4511       9275       8470       8086   Rescheduling interrupts
>> CAL:       3624       8666        523       4543   Function call interrupts
>> TLB:        981       1111       1065       1058   TLB shootdowns
>> ERR:          0
>> MIS:          0
> 


Regards
Mark

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
  2009-12-22 17:57                                                                                                     ` Mark Hounschell
@ 2009-12-22 23:37                                                                                                       ` Pallipadi, Venkatesh
  2009-12-23  0:22                                                                                                         ` Mark Hounschell
  0 siblings, 1 reply; 74+ messages in thread
From: Pallipadi, Venkatesh @ 2009-12-22 23:37 UTC (permalink / raw)
  To: markh@compro.net
  Cc: Linus Torvalds, Mark Hounschell, Alain Knaff,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar

On Tue, 2009-12-22 at 09:57 -0800, Mark Hounschell wrote:
> On 12/22/2009 12:38 PM, Linus Torvalds wrote:
> > 
> > [ Ingo, Venki and Shaohua added to cc: see the whole thread on lkml for 
> >   details, but Mark is basically chasing down a situation where the floppy 
> >   driver seems to have trouble formatting floppies, and it happened 
> >   between 2.6.27 and .28. The trouble seems to be that a DMA transfer of a 
> >   memory block transfers the wrong value for the first byte of the block.
> > 
> >   Which should be impossible, but whatever. Some part of the system has a 
> >   cached buffer that isn't flushed.
> > 
> >   What gets _you_ guys involved is that Mark cannot reproduce the bug if 
> >   HPET is disabled in the BIOS or by using 'nohpet'. He found that out by 
> >   pure luck while bisecting, because some time during his bisect, his 
> >   machine wouldn't even boot with HPET.
> > 
> >   So the problem is: with HPET enabled, 2.6.27.4 _used_ to work. But 
> >   2.6.28 (and current -git) does not.  Any ideas? ]
> > 
> > On Tue, 22 Dec 2009, Mark Hounschell wrote:
> >>
> >> Ok, I may have something that might help.
> >>
> >> # git bisect bad
> >> 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
> >> commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
> >> Author: venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com>
> >> Date:   Fri Sep 5 18:02:18 2008 -0700
> >>
> >>     x86: HPET_MSI Initialise per-cpu HPET timers
> >>
> >>     Initialize a per CPU HPET MSI timer when possible. We retain the HPET
> >>     timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
> >>     setup the remaining HPET timers as per CPU MSI based timers. This per CPU
> >>     timer will eliminate the need for timer broadcasting with IRQ 0 when there
> >>     is non-functional LAPIC timer across CPU deep C-states.
> >>
> >>     If there are more CPUs than number of available timers, CPUs that do not
> >>     find any timer to use will continue using LAPIC and IRQ 0 broadcast.
> >>
> >>     Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> >>     Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> >>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> >>
> >> And of course this was the first commit that I could not boot if I had hpet
> >> enabled. To get this one to boot (single user mode only) I had to add the
> >> quiet cmdline option and the following patch to arch/x86/kernel/hpet.c
> >>
> >> commit  5ceb1a04187553e08c6ab60d30cee7c454ee139a
> >>
> >> @ -445,7 +445,7 @@ static int hpet_setup_irq(struct hpet_dev *dev)
> >>  {
> >>
> >>         if (request_irq(dev->irq, hpet_interrupt_handler,
> >> -                       IRQF_SHARED|IRQF_NOBALANCING, dev->name, dev))
> >> +                       IRQF_DISABLED|IRQF_NOBALANCING, dev->name, dev))
> >>                 return -1;
> >>
> >>         disable_irq(dev->irq);
> >>
> >> AND add the quiet cmdline option.
> > 
> > Ok, so we know why HPET didn't boot for you, and that was fixed later (by 
> > that 5ceb1a04). But is this also when the floppy started mis-behaving?
> > 
> 
> Commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is when the floppy stops
> working and also when I could no longer boot with hpet enabled.


I am missing something here. Is commit 26afe5f2 where the system does not
boot with HPET, or is it where the floppy stops working when you boot
with HPET enabled?

Can you try "idle=halt" with both .27 and .28, and send the /proc/interrupts
output in each case? With that option we should be using the local APIC
timer, so PIT, HPET or HPET with MSI should not really matter. Does it
still fail with .28 with that option?

Thanks,
Venki


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
  2009-12-22 23:37                                                                                                       ` Pallipadi, Venkatesh
@ 2009-12-23  0:22                                                                                                         ` Mark Hounschell
  2009-12-23 13:02                                                                                                           ` Mark Hounschell
  0 siblings, 1 reply; 74+ messages in thread
From: Mark Hounschell @ 2009-12-23  0:22 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: markh@compro.net, Linus Torvalds, Alain Knaff,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar, Alain Knaff

On 12/22/2009 06:37 PM, Pallipadi, Venkatesh wrote:
> On Tue, 2009-12-22 at 09:57 -0800, Mark Hounschell wrote:
>> On 12/22/2009 12:38 PM, Linus Torvalds wrote:
>>>
>>> [ Ingo, Venki and Shaohua added to cc: see the whole thread on lkml for 
>>>   details, but Mark is basically chasing down a situation where the floppy 
>>>   driver seems to have trouble formatting floppies, and it happened 
>>>   between 2.6.27 and .28. The trouble seems to be that a DMA transfer of a 
>>>   memory block transfers the wrong value for the first byte of the block.
>>>
>>>   Which should be impossible, but whatever. Some part of the system has a 
>>>   cached buffer that isn't flushed.
>>>
>>>   What gets _you_ guys involved is that Mark cannot reproduce the bug if 
>>>   HPET is disabled in the BIOS or by using 'nohpet'. He found that out by 
>>>   pure luck while bisecting, because some time during his bisect, his 
>>>   machine wouldn't even boot with HPET.
>>>
>>>   So the problem is: with HPET enabled, 2.6.27.4 _used_ to work. But 
>>>   2.6.28 (and current -git) does not.  Any ideas? ]
>>>
>>> On Tue, 22 Dec 2009, Mark Hounschell wrote:
>>>>
>>>> Ok, I may have something that might help.
>>>>
>>>> # git bisect bad
>>>> 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
>>>> commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
>>>> Author: venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com>
>>>> Date:   Fri Sep 5 18:02:18 2008 -0700
>>>>
>>>>     x86: HPET_MSI Initialise per-cpu HPET timers
>>>>
>>>>     Initialize a per CPU HPET MSI timer when possible. We retain the HPET
>>>>     timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
>>>>     setup the remaining HPET timers as per CPU MSI based timers. This per CPU
>>>>     timer will eliminate the need for timer broadcasting with IRQ 0 when there
>>>>     is non-functional LAPIC timer across CPU deep C-states.
>>>>
>>>>     If there are more CPUs than number of available timers, CPUs that do not
>>>>     find any timer to use will continue using LAPIC and IRQ 0 broadcast.
>>>>
>>>>     Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
>>>>     Signed-off-by: Shaohua Li <shaohua.li@intel.com>
>>>>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
>>>>
>>>> And of course this was the first commit that I could not boot if I had hpet
>>>> enabled. To get this one to boot (single user mode only) I had to add the
>>>> quiet cmdline option and the following patch to arch/x86/kernel/hpet.c
>>>>
>>>> commit  5ceb1a04187553e08c6ab60d30cee7c454ee139a
>>>>
>>>> @ -445,7 +445,7 @@ static int hpet_setup_irq(struct hpet_dev *dev)
>>>>  {
>>>>
>>>>         if (request_irq(dev->irq, hpet_interrupt_handler,
>>>> -                       IRQF_SHARED|IRQF_NOBALANCING, dev->name, dev))
>>>> +                       IRQF_DISABLED|IRQF_NOBALANCING, dev->name, dev))
>>>>                 return -1;
>>>>
>>>>         disable_irq(dev->irq);
>>>>
>>>> AND add the quiet cmdline option.
>>>
>>> Ok, so we know why HPET didn't boot for you, and that was fixed later (by 
>>> that 5ceb1a04). But is this also when the floppy started mis-behaving?
>>>
>>
>> Commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is when the floppy stops
>> working and also when I could no longer boot with hpet enabled.
> 
> 
> I am missing something here. Commit 26afe5f2 is where system does not
> boot with HPET or is it where the floppy stops working when you boot
> with HPET enabled.
> 

As it happens, both happen there. Commit 5ceb1a04 is where it starts
booting _again_ with hpet enabled. So I took that patch (5ceb1a04) and
applied it to (26afe5f2f) to be able to boot with hpet enabled.  I had to
use the quiet option to get to a login prompt, but there is where the
floppy format first fails, just as it does in 2.6.28 and up.

> Can you try "idle=halt" with both .27 and .28 with /proc/interrupts
> output in each case. With that option, we should be using local APIC
> timer and PIT, HPET or HPET with MSI should not really matter. Does it
> still fail with .28 with that option?
> 

Yes, I will try that for you but will have to wait until the morning. Sorry.

Regards
Mark



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
  2009-12-23  0:22                                                                                                         ` Mark Hounschell
@ 2009-12-23 13:02                                                                                                           ` Mark Hounschell
  2009-12-23 15:10                                                                                                             ` Pallipadi, Venkatesh
  0 siblings, 1 reply; 74+ messages in thread
From: Mark Hounschell @ 2009-12-23 13:02 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: dmarkh, Linus Torvalds, Alain Knaff, Linux Kernel Mailing List,
	fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar

On 12/22/2009 07:22 PM, Mark Hounschell wrote:
> On 12/22/2009 06:37 PM, Pallipadi, Venkatesh wrote:
>> On Tue, 2009-12-22 at 09:57 -0800, Mark Hounschell wrote:
>>> On 12/22/2009 12:38 PM, Linus Torvalds wrote:
>>>>
>>>> [ Ingo, Venki and Shaohua added to cc: see the whole thread on lkml for 
>>>>   details, but Mark is basically chasing down a situation where the floppy 
>>>>   driver seems to have trouble formatting floppies, and it happened 
>>>>   between 2.6.27 and .28. The trouble seems to be that a DMA transfer of a 
>>>>   memory block transfers the wrong value for the first byte of the block.
>>>>
>>>>   Which should be impossible, but whatever. Some part of the system has a 
>>>>   cached buffer that isn't flushed.
>>>>
>>>>   What gets _you_ guys involved is that Mark cannot reproduce the bug if 
>>>>   HPET is disabled in the BIOS or by using 'nohpet'. He found that out by 
>>>>   pure luck while bisecting, because some time during his bisect, his 
>>>>   machine wouldn't even boot with HPET.
>>>>
>>>>   So the problem is: with HPET enabled, 2.6.27.4 _used_ to work. But 
>>>>   2.6.28 (and current -git) does not.  Any ideas? ]
>>>>
>>>> On Tue, 22 Dec 2009, Mark Hounschell wrote:
>>>>>
>>>>> Ok, I may have something that might help.
>>>>>
>>>>> # git bisect bad
>>>>> 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
>>>>> commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
>>>>> Author: venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com>
>>>>> Date:   Fri Sep 5 18:02:18 2008 -0700
>>>>>
>>>>>     x86: HPET_MSI Initialise per-cpu HPET timers
>>>>>
>>>>>     Initialize a per CPU HPET MSI timer when possible. We retain the HPET
>>>>>     timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
>>>>>     setup the remaining HPET timers as per CPU MSI based timers. This per CPU
>>>>>     timer will eliminate the need for timer broadcasting with IRQ 0 when there
>>>>>     is non-functional LAPIC timer across CPU deep C-states.
>>>>>
>>>>>     If there are more CPUs than number of available timers, CPUs that do not
>>>>>     find any timer to use will continue using LAPIC and IRQ 0 broadcast.
>>>>>
>>>>>     Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
>>>>>     Signed-off-by: Shaohua Li <shaohua.li@intel.com>
>>>>>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
>>>>>
>>>>> And of course this was the first commit that I could not boot if I had hpet
>>>>> enabled. To get this one to boot (single user mode only) I had to add the
>>>>> quiet cmdline option and the following patch to arch/x86/kernel/hpet.c
>>>>>
>>>>> commit  5ceb1a04187553e08c6ab60d30cee7c454ee139a
>>>>>
>>>>> @ -445,7 +445,7 @@ static int hpet_setup_irq(struct hpet_dev *dev)
>>>>>  {
>>>>>
>>>>>         if (request_irq(dev->irq, hpet_interrupt_handler,
>>>>> -                       IRQF_SHARED|IRQF_NOBALANCING, dev->name, dev))
>>>>> +                       IRQF_DISABLED|IRQF_NOBALANCING, dev->name, dev))
>>>>>                 return -1;
>>>>>
>>>>>         disable_irq(dev->irq);
>>>>>
>>>>> AND add the quiet cmdline option.
>>>>
>>>> Ok, so we know why HPET didn't boot for you, and that was fixed later (by 
>>>> that 5ceb1a04). But is this also when the floppy started mis-behaving?
>>>>
>>>
>>> Commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is when the floppy stops
>>> working and also when I could no longer boot with hpet enabled.
>>
>>
>> I am missing something here. Commit 26afe5f2 is where system does not
>> boot with HPET or is it where the floppy stops working when you boot
>> with HPET enabled.
>>
> 
> As it happens, both happen there. Commit 5ceb1a04 is where it starts
> booting _again_ with hpet enabled. So I took that patch (5ceb1a04) and
> applied it to (26afe5f2f) to be able to boot with hpet enabled.  I had to
> use the quiet option to get to a login prompt, but there is where the
> floppy format first fails, just as it does in 2.6.28 and up.
> 
>> Can you try "idle=halt" with both .27 and .28 with /proc/interrupts
>> output in each case. With that option, we should be using local APIC
>> timer and PIT, HPET or HPET with MSI should not really matter. Does it
>> still fail with .28 with that option?
>>

2.6.28 still fails with that option.

2.6.27.41 /proc/interrupts with idle=halt

           CPU0       CPU1       CPU2       CPU3
  0:        126          0          0          1   IO-APIC-edge      timer
  1:          0          0          1        157   IO-APIC-edge      i8042
  3:          0          0          0          6   IO-APIC-edge
  4:          0          0          0          6   IO-APIC-edge
  6:          0          0          0          4   IO-APIC-edge      floppy
  8:          0          0          0          1   IO-APIC-edge      rtc0
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          0          0          1        128   IO-APIC-edge      i8042
 14:          0          0         34       4457   IO-APIC-edge      pata_atiixp
 15:          0          0          4        480   IO-APIC-edge      pata_atiixp
 16:          0          0          0        397   IO-APIC-fasteoi   aic79xx, ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel
 17:          0          0          0          2   IO-APIC-fasteoi   ehci_hcd:usb1
 18:          0          0          0          0   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
 19:          0          0          0        142   IO-APIC-fasteoi   aic7xxx, ehci_hcd:usb2, ttySLG0, eth1
 22:          0          0          4       1154   IO-APIC-fasteoi   ahci
219:          0          0          3         63   PCI-MSI-edge      eth0
NMI:          0          0          0          0   Non-maskable interrupts
LOC:      91539      91964      92525      91181   Local timer interrupts
RES:       2888       3873       2434       2721   Rescheduling interrupts
CAL:        240        245        247         84   function call interrupts
TLB:        768        628        526        512   TLB shootdowns
SPU:          0          0          0          0   Spurious interrupts
ERR:          0
MIS:          0

2.6.28 /proc/interrupts with idle=halt

           CPU0       CPU1       CPU2       CPU3
  0:        126          0          2          0   IO-APIC-edge      timer
  1:          0          0        192          0   IO-APIC-edge      i8042
  3:          0          0          6          0   IO-APIC-edge
  4:          0          0          6          0   IO-APIC-edge
  6:          0          0          4          0   IO-APIC-edge      floppy
  8:          0          0          1          0   IO-APIC-edge      rtc0
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          0          0        128          1   IO-APIC-edge      i8042
 14:          0          1     147114        396   IO-APIC-edge      pata_atiixp
 15:          0          0        646          2   IO-APIC-edge      pata_atiixp
 16:          0          0        396          0   IO-APIC-fasteoi   aic79xx, ohci_hcd:usb2, ohci_hcd:usb4, HDA Intel
 17:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
 18:          0          0          0          0   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
 19:          0          0        362          1   IO-APIC-fasteoi   aic7xxx, ehci_hcd:usb3, ttySLG0, eth1
 22:          0          0        874          1   IO-APIC-fasteoi   ahci
1274:          0          0        193          4   PCI-MSI-edge      eth0
1279:     513207          0          0          0  HPET_MSI-edge      hpet2
NMI:          0          0          0          0   Non-maskable interrupts
LOC:        268     513395     513138     522088   Local timer interrupts
RES:       3262       3679       2573       3746   Rescheduling interrupts
CAL:        131        166         57        147   Function call interrupts
TLB:        680        438        450        639   TLB shootdowns
SPU:          0          0          0          0   Spurious interrupts
ERR:          0
MIS:          0
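
For anyone comparing the two dumps by hand, a minimal Python sketch along
these lines would list which interrupt sources appear in only one of them.
It is an illustration only: the helper names parse_interrupts and sources are
made up for this note, and the sample rows are a shortened copy of the two
tables above, not the full output.

```python
def parse_interrupts(text):
    """Map each row label ('0', '1279', 'LOC', ...) to (per-CPU counts, description)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())            # header row: CPU0 CPU1 ...
    rows = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")  # split on the first colon only
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        rows[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return rows


def sources(text):
    """Set of non-empty 'chip handler' descriptions found in a dump."""
    return {desc for _, desc in parse_interrupts(text).values() if desc}


# Shortened samples copied from the 2.6.27.41 and 2.6.28 tables above.
OLD = """\
           CPU0       CPU1       CPU2       CPU3
  0:        126          0          0          1   IO-APIC-edge      timer
  6:          0          0          0          4   IO-APIC-edge      floppy
219:          0          0          3         63   PCI-MSI-edge      eth0
LOC:      91539      91964      92525      91181   Local timer interrupts
"""
NEW = """\
           CPU0       CPU1       CPU2       CPU3
  0:        126          0          2          0   IO-APIC-edge      timer
  6:          0          0          4          0   IO-APIC-edge      floppy
1274:          0          0        193          4   PCI-MSI-edge      eth0
1279:     513207          0          0          0  HPET_MSI-edge      hpet2
LOC:        268     513395     513138     522088   Local timer interrupts
"""

print(sorted(sources(NEW) - sources(OLD)))
```

On these shortened samples the only source present in 2.6.28 but not in
2.6.27.41 is the HPET_MSI-edge hpet2 line. Comparing descriptions rather than
IRQ numbers avoids false differences such as eth0 moving from 219 to 1274.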


Mark

^ permalink raw reply	[flat|nested] 74+ messages in thread

* RE: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
  2009-12-23 13:02                                                                                                           ` Mark Hounschell
@ 2009-12-23 15:10                                                                                                             ` Pallipadi, Venkatesh
  2009-12-23 15:34                                                                                                               ` Mark Hounschell
  0 siblings, 1 reply; 74+ messages in thread
From: Pallipadi, Venkatesh @ 2009-12-23 15:10 UTC (permalink / raw)
  To: markh@compro.net
  Cc: dmarkh@cfl.rr.com, Linus Torvalds, Alain Knaff,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar

 

>-----Original Message-----
>From: Mark Hounschell [mailto:markh@compro.net] 
>Sent: Wednesday, December 23, 2009 5:03 AM
>To: Pallipadi, Venkatesh
>Cc: dmarkh@cfl.rr.com; Linus Torvalds; Alain Knaff; Linux 
>Kernel Mailing List; fdutils@fdutils.linux.lu; Li, Shaohua; Ingo Molnar
>Subject: Re: [Fdutils] DMA cache consistency bug introduced in 
>2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
>
>On 12/22/2009 07:22 PM, Mark Hounschell wrote:
>> On 12/22/2009 06:37 PM, Pallipadi, Venkatesh wrote:
>>> On Tue, 2009-12-22 at 09:57 -0800, Mark Hounschell wrote:
>>>> On 12/22/2009 12:38 PM, Linus Torvalds wrote:
>>>>>
>>>>> [ Ingo, Venki and Shaohua added to cc: see the whole 
>thread on lkml for 
>>>>>   details, but Mark is basically chasing down a situation 
>where the floppy 
>>>>>   driver seems to have trouble formatting floppies, and 
>it happened 
>>>>>   between 2.6.27 and .28. The trouble seems to be that a 
>DMA transfer of a 
>>>>>   memory block transfers the wrong value for the first 
>byte of the block.
>>>>>
>>>>>   Which should be impossible, but whatever. Some part of 
>the system has a 
>>>>>   cached buffer that isn't flushed.
>>>>>
>>>>>   What gets _you_ guys involved is that Mark cannot 
>reproduce the bug if 
>>>>>   HPET is disabled in the BIOS or by using 'nohpet'. He 
>found that out by 
>>>>>   pure luck while bisecting, because some time during his 
>bisect, his 
>>>>>   machine wouldn't even boot with HPET.
>>>>>
>>>>>   So the problem is: with HPET enabled, 2.6.27.4 _used_ 
>to work. But 
>>>>>   2.6.28 (and current -git) does not.  Any ideas? ]
>>>>>
>>>>> On Tue, 22 Dec 2009, Mark Hounschell wrote:
>>>>>>
>>>>>> Ok, I may have something that might help.
>>>>>>
>>>>>> # git bisect bad
>>>>>> 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
>>>>>> commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
>>>>>> Author: venkatesh.pallipadi@intel.com 
><venkatesh.pallipadi@intel.com>
>>>>>> Date:   Fri Sep 5 18:02:18 2008 -0700
>>>>>>
>>>>>>     x86: HPET_MSI Initialise per-cpu HPET timers
>>>>>>
>>>>>>     Initialize a per CPU HPET MSI timer when possible. 
>We retain the HPET
>>>>>>     timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when 
>legacy mode is being used. We
>>>>>>     setup the remaining HPET timers as per CPU MSI based 
>timers. This per CPU
>>>>>>     timer will eliminate the need for timer broadcasting 
>with IRQ 0 when there
>>>>>>     is non-functional LAPIC timer across CPU deep C-states.
>>>>>>
>>>>>>     If there are more CPUs than number of available 
>timers, CPUs that do not
>>>>>>     find any timer to use will continue using LAPIC and 
>IRQ 0 broadcast.
>>>>>>
>>>>>>     Signed-off-by: Venkatesh Pallipadi 
><venkatesh.pallipadi@intel.com>
>>>>>>     Signed-off-by: Shaohua Li <shaohua.li@intel.com>
>>>>>>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
>>>>>>
>>>>>> And of course this was the first commit that I could not boot if I had hpet
>>>>>> enabled. To get this one to boot (single user mode only) I had to add the
>>>>>> quiet cmdline option and the following patch to arch/x86/kernel/hpet.c
>>>>>>
>>>>>> commit  5ceb1a04187553e08c6ab60d30cee7c454ee139a
>>>>>>
>>>>>> @ -445,7 +445,7 @@ static int hpet_setup_irq(struct 
>hpet_dev *dev)
>>>>>>  {
>>>>>>
>>>>>>         if (request_irq(dev->irq, hpet_interrupt_handler,
>>>>>> -                       IRQF_SHARED|IRQF_NOBALANCING, 
>dev->name, dev))
>>>>>> +                       IRQF_DISABLED|IRQF_NOBALANCING, 
>dev->name, dev))
>>>>>>                 return -1;
>>>>>>
>>>>>>         disable_irq(dev->irq);
>>>>>>
>>>>>> AND add the quiet cmdline option.
>>>>>
>>>>> Ok, so we know why HPET didn't boot for you, and that was 
>fixed later (by 
>>>>> that 5ceb1a04). But is this also when the floppy started 
>mis-behaving?
>>>>>
>>>>
>>>> Commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is when 
>the floppy stops
>>>> working
>>>> and also when I could no longer boot with hpet enabled.
>>>
>>>
>>> I am missing something here. Commit 26afe5f2 is where 
>system does not
>>> boot with HPET or is it where the floppy stops working when you boot
>>> with HPET enabled.
>>>
>> 
>> As it happens, both happen there. Commit 5ceb1a04 is where it starts
>> booting _again_ with hpet enabled. So I took that patch 
>(5ceb1a04) and
>> applied it to (26afe5f2f) to be able to boot with hpet 
>enabled.  I had to
>> use the quiet option to get to a login prompt, but there is where the
>> floppy format first fails, just as it does in 2.6.28 and up.
>> 
>>> Can you try "idle=halt" with both .27 and .28 with /proc/interrupts
>>> output in each case. With that option, we should be using local APIC
>>> timer and PIT, HPET or HPET with MSI should not really 
>matter. Does it
>>> still fail with .28 with that option?
>>>
>
>2.6.28 still fails with that option.
>
>2.6.27.41 /proc/interrupts with idle=halt
>
>           CPU0       CPU1       CPU2       CPU3
>  0:        126          0          0          1   IO-APIC-edge      timer
>  1:          0          0          1        157   IO-APIC-edge      i8042
>  3:          0          0          0          6   IO-APIC-edge
>  4:          0          0          0          6   IO-APIC-edge
>  6:          0          0          0          4   IO-APIC-edge      floppy
>  8:          0          0          0          1   IO-APIC-edge      rtc0
>  9:          0          0          0          0   IO-APIC-fasteoi   acpi
> 12:          0          0          1        128   IO-APIC-edge      i8042
> 14:          0          0         34       4457   IO-APIC-edge      pata_atiixp
> 15:          0          0          4        480   IO-APIC-edge      pata_atiixp
> 16:          0          0          0        397   IO-APIC-fasteoi   aic79xx, ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel
> 17:          0          0          0          2   IO-APIC-fasteoi   ehci_hcd:usb1
> 18:          0          0          0          0   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
> 19:          0          0          0        142   IO-APIC-fasteoi   aic7xxx, ehci_hcd:usb2, ttySLG0, eth1
> 22:          0          0          4       1154   IO-APIC-fasteoi   ahci
>219:          0          0          3         63   PCI-MSI-edge      eth0
>NMI:          0          0          0          0   Non-maskable interrupts
>LOC:      91539      91964      92525      91181   Local timer interrupts
>RES:       2888       3873       2434       2721   Rescheduling interrupts
>CAL:        240        245        247         84   function call interrupts
>TLB:        768        628        526        512   TLB shootdowns
>SPU:          0          0          0          0   Spurious interrupts
>ERR:          0
>MIS:          0
>
>2.6.28 /proc/interrupts with idle=halt
>
>           CPU0       CPU1       CPU2       CPU3
>  0:        126          0          2          0   IO-APIC-edge      timer
>  1:          0          0        192          0   IO-APIC-edge      i8042
>  3:          0          0          6          0   IO-APIC-edge
>  4:          0          0          6          0   IO-APIC-edge
>  6:          0          0          4          0   IO-APIC-edge      floppy
>  8:          0          0          1          0   IO-APIC-edge      rtc0
>  9:          0          0          0          0   IO-APIC-fasteoi   acpi
> 12:          0          0        128          1   IO-APIC-edge      i8042
> 14:          0          1     147114        396   IO-APIC-edge      pata_atiixp
> 15:          0          0        646          2   IO-APIC-edge      pata_atiixp
> 16:          0          0        396          0   IO-APIC-fasteoi   aic79xx, ohci_hcd:usb2, ohci_hcd:usb4, HDA Intel
> 17:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
> 18:          0          0          0          0   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
> 19:          0          0        362          1   IO-APIC-fasteoi   aic7xxx, ehci_hcd:usb3, ttySLG0, eth1
> 22:          0          0        874          1   IO-APIC-fasteoi   ahci
>1274:          0          0        193          4   PCI-MSI-edge      eth0
>1279:     513207          0          0          0  HPET_MSI-edge      hpet2
>NMI:          0          0          0          0   Non-maskable interrupts
>LOC:        268     513395     513138     522088   Local timer interrupts
>RES:       3262       3679       2573       3746   Rescheduling interrupts
>CAL:        131        166         57        147   Function call interrupts
>TLB:        680        438        450        639   TLB shootdowns
>SPU:          0          0          0          0   Spurious interrupts
>ERR:          0
>MIS:          0
>

Hmm. Looks like hpet2 is still getting used instead of the local APIC timer in the .28 case.

I was expecting a low count for hpet2, and local timer counts of around the same value on all CPUs. The output above shows CPU 0 depending on hpet2 for some reason, even with idle=halt. Can you send the output of the two commands below for .28?
cat /proc/timer_list
grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
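
To make the comparison concrete, here is a rough Python sketch of the check described above. It is only an illustration: the function name cpus_not_on_lapic and the 10% threshold are arbitrary choices for this note, and the counts are copied from the 2.6.28 and 2.6.27.41 idle=halt output quoted above.

```python
# Per-CPU counts copied from the idle=halt /proc/interrupts output quoted above.
LOC_28 = [268, 513395, 513138, 522088]     # 2.6.28 local timer interrupts
HPET2_28 = [513207, 0, 0, 0]               # 2.6.28 hpet2 (HPET_MSI-edge)
LOC_27 = [91539, 91964, 92525, 91181]      # 2.6.27.41 local timer interrupts


def cpus_not_on_lapic(loc, ratio=0.10):
    """Return CPUs whose local timer count is below `ratio` of the busiest CPU's.

    The threshold is an assumption made for illustration, not a kernel rule.
    """
    busiest = max(loc)
    return [cpu for cpu, count in enumerate(loc) if count < ratio * busiest]


for cpu in cpus_not_on_lapic(LOC_28):
    print(f"CPU{cpu}: LOC={LOC_28[cpu]} hpet2={HPET2_28[cpu]}")
```

With the .28 numbers this flags only CPU 0, whose hpet2 count is in the same range as the local timer counts on the other CPUs. With the .27 numbers (LOC_27) it flags nothing.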

Thanks,
Venki


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
  2009-12-23 15:10                                                                                                             ` Pallipadi, Venkatesh
@ 2009-12-23 15:34                                                                                                               ` Mark Hounschell
  2009-12-23 15:57                                                                                                                 ` Mark Hounschell
  2009-12-23 16:31                                                                                                                 ` Linus Torvalds
  0 siblings, 2 replies; 74+ messages in thread
From: Mark Hounschell @ 2009-12-23 15:34 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: dmarkh@cfl.rr.com, Linus Torvalds, Alain Knaff,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 9847 bytes --]

On 12/23/2009 10:10 AM, Pallipadi, Venkatesh wrote:
>  
> 
>> -----Original Message-----
>> From: Mark Hounschell [mailto:markh@compro.net] 
>> Sent: Wednesday, December 23, 2009 5:03 AM
>> To: Pallipadi, Venkatesh
>> Cc: dmarkh@cfl.rr.com; Linus Torvalds; Alain Knaff; Linux 
>> Kernel Mailing List; fdutils@fdutils.linux.lu; Li, Shaohua; Ingo Molnar
>> Subject: Re: [Fdutils] DMA cache consistency bug introduced in 
>> 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
>>
>> On 12/22/2009 07:22 PM, Mark Hounschell wrote:
>>> On 12/22/2009 06:37 PM, Pallipadi, Venkatesh wrote:
>>>> On Tue, 2009-12-22 at 09:57 -0800, Mark Hounschell wrote:
>>>>> On 12/22/2009 12:38 PM, Linus Torvalds wrote:
>>>>>>
>>>>>> [ Ingo, Venki and Shaohua added to cc: see the whole thread on lkml for
>>>>>>   details, but Mark is basically chasing down a situation where the floppy
>>>>>>   driver seems to have trouble formatting floppies, and it happened
>>>>>>   between 2.6.27 and .28. The trouble seems to be that a DMA transfer of a
>>>>>>   memory block transfers the wrong value for the first byte of the block.
>>>>>>
>>>>>>   Which should be impossible, but whatever. Some part of the system has a
>>>>>>   cached buffer that isn't flushed.
>>>>>>
>>>>>>   What gets _you_ guys involved is that Mark cannot reproduce the bug if
>>>>>>   HPET is disabled in the BIOS or by using 'nohpet'. He found that out by
>>>>>>   pure luck while bisecting, because some time during his bisect, his
>>>>>>   machine wouldn't even boot with HPET.
>>>>>>
>>>>>>   So the problem is: with HPET enabled, 2.6.27.4 _used_ to work. But
>>>>>>   2.6.28 (and current -git) does not.  Any ideas? ]
>>>>>>
>>>>>> On Tue, 22 Dec 2009, Mark Hounschell wrote:
>>>>>>>
>>>>>>> Ok, I may have something that might help.
>>>>>>>
>>>>>>> # git bisect bad
>>>>>>> 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
>>>>>>> commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
>>>>>>> Author: venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com>
>>>>>>> Date:   Fri Sep 5 18:02:18 2008 -0700
>>>>>>>
>>>>>>>     x86: HPET_MSI Initialise per-cpu HPET timers
>>>>>>>
>>>>>>>     Initialize a per CPU HPET MSI timer when possible. We retain the HPET
>>>>>>>     timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
>>>>>>>     setup the remaining HPET timers as per CPU MSI based timers. This per CPU
>>>>>>>     timer will eliminate the need for timer broadcasting with IRQ 0 when there
>>>>>>>     is non-functional LAPIC timer across CPU deep C-states.
>>>>>>>
>>>>>>>     If there are more CPUs than number of available timers, CPUs that do not
>>>>>>>     find any timer to use will continue using LAPIC and IRQ 0 broadcast.
>>>>>>>
>>>>>>>     Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
>>>>>>>     Signed-off-by: Shaohua Li <shaohua.li@intel.com>
>>>>>>>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
>>>>>>>
>>>>>>> And of course this was the first commit that I could not boot if I had hpet
>>>>>>> enabled. To get this one to boot (single user mode only) I had to add the
>>>>>>> quiet cmdline option and the following patch to arch/x86/kernel/hpet.c
>>>>>>>
>>>>>>> commit  5ceb1a04187553e08c6ab60d30cee7c454ee139a
>>>>>>>
>>>>>>> @@ -445,7 +445,7 @@ static int hpet_setup_irq(struct hpet_dev *dev)
>>>>>>>  {
>>>>>>>
>>>>>>>         if (request_irq(dev->irq, hpet_interrupt_handler,
>>>>>>> -                       IRQF_SHARED|IRQF_NOBALANCING, dev->name, dev))
>>>>>>> +                       IRQF_DISABLED|IRQF_NOBALANCING, dev->name, dev))
>>>>>>>                 return -1;
>>>>>>>
>>>>>>>         disable_irq(dev->irq);
>>>>>>>
>>>>>>> AND add the quiet cmdline option.
>>>>>>
>>>>>> Ok, so we know why HPET didn't boot for you, and that was fixed later (by
>>>>>> that 5ceb1a04). But is this also when the floppy started mis-behaving?
>>>>>>
>>>>>
>>>>> Commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is when the floppy stops
>>>>> working and also when I could no longer boot with hpet enabled.
>>>>
>>>>
>>>> I am missing something here. Is commit 26afe5f2 where the system does not
>>>> boot with HPET, or is it where the floppy stops working when you boot
>>>> with HPET enabled?
>>>>
>>>
>>> As it happens, both happen there. Commit 5ceb1a04 is where it starts
>>> booting _again_ with hpet enabled. So I took that patch (5ceb1a04) and
>>> applied it to (26afe5f2f) to be able to boot with hpet enabled.  I had to
>>> use the quiet option to get to a login prompt, but that is where the
>>> floppy format first fails, just as it does in 2.6.28 and up.
>>>
>>>> Can you try "idle=halt" with both .27 and .28 with /proc/interrupts
>>>> output in each case. With that option, we should be using local APIC
>>>> timer and PIT, HPET or HPET with MSI should not really matter. Does it
>>>> still fail with .28 with that option?
>>>>
>>
>> 2.6.28 still fails with that option.
>>
>> 2.6.27.41 /proc/interrupts with idle=halt
>>
>>           CPU0       CPU1       CPU2       CPU3
>>  0:        126          0          0          1   IO-APIC-edge      timer
>>  1:          0          0          1        157   IO-APIC-edge      i8042
>>  3:          0          0          0          6   IO-APIC-edge
>>  4:          0          0          0          6   IO-APIC-edge
>>  6:          0          0          0          4   IO-APIC-edge      floppy
>>  8:          0          0          0          1   IO-APIC-edge      rtc0
>>  9:          0          0          0          0   IO-APIC-fasteoi   acpi
>> 12:          0          0          1        128   IO-APIC-edge      i8042
>> 14:          0          0         34       4457   IO-APIC-edge      pata_atiixp
>> 15:          0          0          4        480   IO-APIC-edge      pata_atiixp
>> 16:          0          0          0        397   IO-APIC-fasteoi   aic79xx, ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel
>> 17:          0          0          0          2   IO-APIC-fasteoi   ehci_hcd:usb1
>> 18:          0          0          0          0   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
>> 19:          0          0          0        142   IO-APIC-fasteoi   aic7xxx, ehci_hcd:usb2, ttySLG0, eth1
>> 22:          0          0          4       1154   IO-APIC-fasteoi   ahci
>> 219:          0          0          3         63   PCI-MSI-edge      eth0
>> NMI:          0          0          0          0   Non-maskable interrupts
>> LOC:      91539      91964      92525      91181   Local timer interrupts
>> RES:       2888       3873       2434       2721   Rescheduling interrupts
>> CAL:        240        245        247         84   function call interrupts
>> TLB:        768        628        526        512   TLB shootdowns
>> SPU:          0          0          0          0   Spurious interrupts
>> ERR:          0
>> MIS:          0
>>
>> 2.6.28 /proc/interrupts with idle=halt
>>
>>           CPU0       CPU1       CPU2       CPU3
>>  0:        126          0          2          0   IO-APIC-edge      timer
>>  1:          0          0        192          0   IO-APIC-edge      i8042
>>  3:          0          0          6          0   IO-APIC-edge
>>  4:          0          0          6          0   IO-APIC-edge
>>  6:          0          0          4          0   IO-APIC-edge      floppy
>>  8:          0          0          1          0   IO-APIC-edge      rtc0
>>  9:          0          0          0          0   IO-APIC-fasteoi   acpi
>> 12:          0          0        128          1   IO-APIC-edge      i8042
>> 14:          0          1     147114        396   IO-APIC-edge      pata_atiixp
>> 15:          0          0        646          2   IO-APIC-edge      pata_atiixp
>> 16:          0          0        396          0   IO-APIC-fasteoi   aic79xx, ohci_hcd:usb2, ohci_hcd:usb4, HDA Intel
>> 17:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
>> 18:          0          0          0          0   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
>> 19:          0          0        362          1   IO-APIC-fasteoi   aic7xxx, ehci_hcd:usb3, ttySLG0, eth1
>> 22:          0          0        874          1   IO-APIC-fasteoi   ahci
>> 1274:          0          0        193          4   PCI-MSI-edge      eth0
>> 1279:     513207          0          0          0   HPET_MSI-edge     hpet2
>> NMI:          0          0          0          0   Non-maskable interrupts
>> LOC:        268     513395     513138     522088   Local timer interrupts
>> RES:       3262       3679       2573       3746   Rescheduling interrupts
>> CAL:        131        166         57        147   Function call interrupts
>> TLB:        680        438        450        639   TLB shootdowns
>> SPU:          0          0          0          0   Spurious interrupts
>> ERR:          0
>> MIS:          0
>>
> 
> Hmm. Looks like hpet2 is still getting used instead of local APIC timer in .28 case.
> 
> I was expecting some low number in hpet2 and local timer on all CPU to be around the same value. Above shows CPU 0 is depending on hpet2 for some reason even with idle=halt. Can you send the output of below two in case of .28
> /proc/timer_list

Attached.
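
For anyone who wants to repeat the hpet2-versus-LOC comparison described
above without reading the table by eye, here is a rough Python sketch. It
only parses text in the /proc/interrupts layout; the helper name
parse_interrupts and the trimmed SAMPLE are made up for illustration (the
numbers are copied from the 2.6.28 output quoted above), and the "larger
count wins" rule is just a heuristic, not something the kernel reports.

```python
def parse_interrupts(text):
    """Return {label: [count_per_cpu, ...]} from /proc/interrupts-style text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpus = len(lines[0].split())            # header row: CPU0 CPU1 ...
    counts = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")   # split on the first colon only
        nums = []
        for field in rest.split()[:ncpus]:
            if not field.isdigit():          # reached the chip/handler names
                break
            nums.append(int(field))
        counts[label.strip()] = nums
    return counts


# Trimmed sample taken from the 2.6.28 idle=halt output quoted above.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 1279:     513207          0          0          0   HPET_MSI-edge     hpet2
 LOC:        268     513395     513138     522088   Local timer interrupts
 ERR:          0
"""

counts = parse_interrupts(SAMPLE)
for cpu in range(4):
    hpet, loc = counts["1279"][cpu], counts["LOC"][cpu]
    # Heuristic only: whichever counter is larger is taken as the tick source.
    source = "hpet2" if hpet > loc else "lapic"
    print(f"CPU{cpu}: hpet2={hpet} LOC={loc} -> {source}")
```

On a live machine the text would come from reading /proc/interrupts, and
the HPET MSI vector number (1279 here) will differ from boot to boot.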

> grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*

I have no /sys/devices/system/cpu/cpu0/cpuidle on this machine.
Maybe because of

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set
# CONFIG_CPU_IDLE is not set
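
A quick way to check a config symbol like this in a script, sketched in
Python. The helper name config_state is hypothetical, and SAMPLE is just
the fragment pasted above; where the real config text lives (for example
/boot/config-<version>, or /proc/config.gz if the kernel was built with
that support) is an assumption about the local setup.

```python
def config_state(config_text, symbol):
    """Return the value of a Kconfig symbol, 'not set', or 'absent'."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {symbol} is not set":
            return "not set"
        if line.startswith(symbol + "="):
            return line.split("=", 1)[1]     # e.g. 'y', 'm', or a literal
    return "absent"


# The config fragment from this message.
SAMPLE = """\
#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set
# CONFIG_CPU_IDLE is not set
"""

print(config_state(SAMPLE, "CONFIG_CPU_IDLE"))
```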

Would it be OK if when you ask for 2.6.28 info, I use a 2.6.32.2 kernel?
That kernel also fails fdformat with hpet enabled on these machines.

Thanks
Mark

[-- Attachment #2: timer_list.txt --]
[-- Type: text/plain, Size: 7901 bytes --]

Timer List Version: v0.4
HRTIMER_MAX_CLOCK_BASES: 2
now at 123990857169 nsecs

cpu: 0
 clock 0:
  .base:       c2a13320
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1261581506376548727 nsecs
active timers:
 clock 1:
  .base:       c2a1334c
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <c2a133a4>, tick_sched_timer, S:01
 # expires at 123991000000-123991000000 nsecs [in 142831 to 142831 nsecs]
 #1: <f1987544>, it_real_fn, S:01
 # expires at 124645673184-124645673184 nsecs [in 654816015 to 654816015 nsecs]
 #2: <f2823b4c>, hrtimer_wakeup, S:01
 # expires at 125434022644-125439022643 nsecs [in 1443165475 to 1448165474 nsecs]
 #3: <f1ab3e94>, hrtimer_wakeup, S:01
 # expires at 3668872852847-3668872902847 nsecs [in 3544881995678 to 3544882045678 nsecs]
 #4: <f2269b4c>, hrtimer_wakeup, S:01
 # expires at 4295018153722969-4295018153772969 nsecs [in 4294894162865800 to 4294894162915800 nsecs]
  .expires_next   : 123991000000 nsecs
  .hres_active    : 1
  .nr_events      : 125349
  .nohz_mode      : 0
  .idle_tick      : 0 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 0
  .idle_calls     : 0
  .idle_sleeps    : 0
  .idle_entrytime : 0 nsecs
  .idle_waketime  : 0 nsecs
  .idle_exittime  : 0 nsecs
  .idle_sleeptime : 0 nsecs
  .last_jiffies   : 0
  .next_jiffies   : 0
  .idle_expires   : 0 nsecs
jiffies: 4294791286

cpu: 1
 clock 0:
  .base:       c2a1c320
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1261581506376548727 nsecs
active timers:
 clock 1:
  .base:       c2a1c34c
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <c2a1c3a4>, tick_sched_timer, S:01
 # expires at 123991125000-123991125000 nsecs [in 267831 to 267831 nsecs]
 #1: <c043b230>, sched_rt_period_timer, S:01
 # expires at 124000000000-124000000000 nsecs [in 9142831 to 9142831 nsecs]
 #2: <f1ab5bc4>, hrtimer_wakeup, S:01
 # expires at 129199139399-129219139398 nsecs [in 5208282230 to 5228282229 nsecs]
 #3: <f1a77b4c>, hrtimer_wakeup, S:01
 # expires at 139203140160-139233140159 nsecs [in 15212282991 to 15242282990 nsecs]
 #4: <f1aade94>, hrtimer_wakeup, S:01
 # expires at 28868872949729-28868872999729 nsecs [in 28744882092560 to 28744882142560 nsecs]
  .expires_next   : 123991125000 nsecs
  .hres_active    : 1
  .nr_events      : 123377
  .nohz_mode      : 0
  .idle_tick      : 0 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 0
  .idle_calls     : 0
  .idle_sleeps    : 0
  .idle_entrytime : 0 nsecs
  .idle_waketime  : 0 nsecs
  .idle_exittime  : 0 nsecs
  .idle_sleeptime : 0 nsecs
  .last_jiffies   : 0
  .next_jiffies   : 0
  .idle_expires   : 0 nsecs
jiffies: 4294791286

cpu: 2
 clock 0:
  .base:       c2a25320
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1261581506376548727 nsecs
active timers:
 clock 1:
  .base:       c2a2534c
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <c2a253a4>, tick_sched_timer, S:01
 # expires at 123991250000-123991250000 nsecs [in 392831 to 392831 nsecs]
 #1: <f1eb9bc4>, hrtimer_wakeup, S:01
 # expires at 124623691750-124625680749 nsecs [in 632834581 to 634823580 nsecs]
 #2: <f1f7dbc4>, hrtimer_wakeup, S:01
 # expires at 127624283651-127628265650 nsecs [in 3633426482 to 3637408481 nsecs]
 #3: <f1cf1bc4>, hrtimer_wakeup, S:01
 # expires at 136624366877-136654360876 nsecs [in 12633509708 to 12663503707 nsecs]
 #4: <f1ad7bc4>, hrtimer_wakeup, S:01
 # expires at 153654620007-153692611006 nsecs [in 29663762838 to 29701753837 nsecs]
 #5: <f1b25f58>, hrtimer_wakeup, S:01
 # expires at 155514242261-155514292261 nsecs [in 31523385092 to 31523435092 nsecs]
 #6: <f198de94>, hrtimer_wakeup, S:01
 # expires at 668873371418-668873421418 nsecs [in 544882514249 to 544882564249 nsecs]
 #7: <f1f3fb4c>, hrtimer_wakeup, S:01
 # expires at 86508836731823-86508936731823 nsecs [in 86384845874654 to 86384945874654 nsecs]
  .expires_next   : 123991250000 nsecs
  .hres_active    : 1
  .nr_events      : 123166
  .nohz_mode      : 0
  .idle_tick      : 0 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 0
  .idle_calls     : 0
  .idle_sleeps    : 0
  .idle_entrytime : 0 nsecs
  .idle_waketime  : 0 nsecs
  .idle_exittime  : 0 nsecs
  .idle_sleeptime : 0 nsecs
  .last_jiffies   : 0
  .next_jiffies   : 0
  .idle_expires   : 0 nsecs
jiffies: 4294791286

cpu: 3
 clock 0:
  .base:       c2a2e320
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1261581506376548727 nsecs
active timers:
 clock 1:
  .base:       c2a2e34c
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <c2a2e3a4>, tick_sched_timer, S:01
 # expires at 123991375000-123991375000 nsecs [in 517831 to 517831 nsecs]
 #1: <f1935bc4>, hrtimer_wakeup, S:01
 # expires at 124624395215-124626393214 nsecs [in 633538046 to 635536045 nsecs]
 #2: <f1aafbc4>, hrtimer_wakeup, S:01
 # expires at 169815643582-169875643581 nsecs [in 45824786413 to 45884786412 nsecs]
 #3: <f23cdbc4>, hrtimer_wakeup, S:01
 # expires at 346123697800-346223697800 nsecs [in 222132840631 to 222232840631 nsecs]
 #4: <f1b04204>, it_real_fn, S:01
 # expires at 403383744722-403383744722 nsecs [in 279392887553 to 279392887553 nsecs]
 #5: <f1b09e04>, it_real_fn, S:01
 # expires at 403383795968-403383795968 nsecs [in 279392938799 to 279392938799 nsecs]
 #6: <f19871c4>, it_real_fn, S:01
 # expires at 403383804795-403383804795 nsecs [in 279392947626 to 279392947626 nsecs]
 #7: <f199be94>, hrtimer_wakeup, S:01
 # expires at 668872854209-668872904209 nsecs [in 544881997040 to 544882047040 nsecs]
  .expires_next   : 123991375000 nsecs
  .hres_active    : 1
  .nr_events      : 122962
  .nohz_mode      : 0
  .idle_tick      : 0 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 0
  .idle_calls     : 0
  .idle_sleeps    : 0
  .idle_entrytime : 0 nsecs
  .idle_waketime  : 0 nsecs
  .idle_exittime  : 0 nsecs
  .idle_sleeptime : 0 nsecs
  .last_jiffies   : 0
  .next_jiffies   : 0
  .idle_expires   : 0 nsecs
jiffies: 4294791286


Tick Device: mode:     1
Broadcast device
Clock Event Device: hpet
 max_delta_ns:   2147483647
 min_delta_ns:   5000
 mult:           61510048
 shift:          32
 mode:           3
 next_event:     9223372036854775807 nsecs
 set_next_event: hpet_legacy_next_event
 set_mode:       hpet_legacy_set_mode
 event_handler:  tick_handle_oneshot_broadcast
tick_broadcast_mask: 00000000
tick_broadcast_oneshot_mask: 00000000


Tick Device: mode:     1
Per CPU device: 0
Clock Event Device: hpet2
 max_delta_ns:   2147483647
 min_delta_ns:   5000
 mult:           61510047
 shift:          32
 mode:           3
 next_event:     123991000000 nsecs
 set_next_event: hpet_msi_next_event
 set_mode:       hpet_msi_set_mode
 event_handler:  hrtimer_interrupt

Tick Device: mode:     1
Per CPU device: 1
Clock Event Device: lapic
 max_delta_ns:   670831998
 min_delta_ns:   1199
 mult:           53707624
 shift:          32
 mode:           3
 next_event:     123991125000 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt

Tick Device: mode:     1
Per CPU device: 2
Clock Event Device: lapic
 max_delta_ns:   670831998
 min_delta_ns:   1199
 mult:           53707624
 shift:          32
 mode:           3
 next_event:     123991250000 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt

Tick Device: mode:     1
Per CPU device: 3
Clock Event Device: lapic
 max_delta_ns:   670831998
 min_delta_ns:   1199
 mult:           53707624
 shift:          32
 mode:           3
 next_event:     123991375000 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt



* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
  2009-12-23 15:34                                                                                                               ` Mark Hounschell
@ 2009-12-23 15:57                                                                                                                 ` Mark Hounschell
  2009-12-23 16:31                                                                                                                 ` Linus Torvalds
  1 sibling, 0 replies; 74+ messages in thread
From: Mark Hounschell @ 2009-12-23 15:57 UTC (permalink / raw)
  To: markh
  Cc: Pallipadi, Venkatesh, Linux Kernel Mailing List,
	fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar,
	Linus Torvalds

On 12/23/2009 10:34 AM, Mark Hounschell wrote:
> On 12/23/2009 10:10 AM, Pallipadi, Venkatesh wrote:
>> Hmm. Looks like hpet2 is still getting used instead of local APIC timer in .28 case.
>>
>> I was expecting some low number in hpet2 and local timer on all CPU to be around the same value. Above shows CPU 0 is depending on hpet2 for some reason even with idle=halt. Can you send the output of below two in case of .28
>> /proc/timer_list
> 
> Attached.
> 
>> grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
> 
> I have no /sys/devices/system/cpu/cpu0/cpuidle on this machine.
> Maybe because of
> 
> #
> # CPU Frequency scaling
> #
> # CONFIG_CPU_FREQ is not set
> # CONFIG_CPU_IDLE is not set
> 
> Would it be OK if when you ask for 2.6.28 info, I use a 2.6.32.2 kernel?
> That kernel also fails fdformat with hpet enabled on these machines.
> 

I do have this on 2.6.32.2 though.

# grep . /sys/devices/system/cpu/cpuidle/current_*
/sys/devices/system/cpu/cpuidle/current_driver:acpi_idle
/sys/devices/system/cpu/cpuidle/current_governor_ro:ladder

Want me to go back to 2.6.28 and show this?

Mark



* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
  2009-12-23 15:34                                                                                                               ` Mark Hounschell
  2009-12-23 15:57                                                                                                                 ` Mark Hounschell
@ 2009-12-23 16:31                                                                                                                 ` Linus Torvalds
  2009-12-23 16:38                                                                                                                   ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 Andi Kleen
  1 sibling, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2009-12-23 16:31 UTC (permalink / raw)
  To: Mark Hounschell
  Cc: Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Alain Knaff,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar



On Wed, 23 Dec 2009, Mark Hounschell wrote:
> > 
> > Hmm. Looks like hpet2 is still getting used instead of local APIC 
> > timer in .28 case.
> > 
> > I was expecting some low number in hpet2 and local timer on all CPU to 
> > be around the same value. Above shows CPU 0 is depending on hpet2 for 
> > some reason even with idle=halt. Can you send the output of below two 
> > in case of .28 /proc/timer_list
> 
> Attached.

Oh wow.

That's crazy:

	Tick Device: mode:     1
	Per CPU device: 0
	Clock Event Device: hpet2
	 max_delta_ns:   2147483647
	 min_delta_ns:   5000
	 mult:           61510047
	 shift:          32
	 mode:           3
	 next_event:     123991000000 nsecs
	 set_next_event: hpet_msi_next_event
	 set_mode:       hpet_msi_set_mode
	 event_handler:  hrtimer_interrupt
	
	Tick Device: mode:     1
	Per CPU device: 1
	Clock Event Device: lapic
	 max_delta_ns:   670831998
	 min_delta_ns:   1199
	 mult:           53707624
	 shift:          32
	 mode:           3
	 next_event:     123991125000 nsecs
	 set_next_event: lapic_next_event
	 set_mode:       lapic_timer_setup
	 event_handler:  hrtimer_interrupt

	...

It's not using the lapic for CPU0. 

Using the HPET as a per-cpu timer is some crazy sh*t, since it's pretty 
expensive to reprogram (compared to the local apic). And having different 
timers for different CPU's is just odd.

The fact that the timer subsystem can do this and it all (mostly) works at 
all is nice and impressive, but doesn't make it any less crazy ;)

That said, none of this seems to explain why DMA/fdformat doesn't work.
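
The per-CPU device assignment quoted above can be pulled out of a
timer_list dump mechanically. The following is a minimal Python sketch,
assuming the "Timer List Version: v0.4" text layout shown in the
attachment; other kernel versions may format the file differently. The
function name per_cpu_clock_devices and the shortened SAMPLE are
illustrative only.

```python
import re


def per_cpu_clock_devices(timer_list_text):
    """Map CPU number -> clock event device name from timer_list-style text."""
    devices = {}
    cpu = None
    for line in timer_list_text.splitlines():
        line = line.strip()
        m = re.match(r"Per CPU device:\s*(\d+)", line)
        if m:
            cpu = int(m.group(1))
            continue
        if line.startswith("Broadcast device"):
            cpu = None                       # broadcast stanza is not per-CPU
            continue
        m = re.match(r"Clock Event Device:\s*(\S+)", line)
        if m and cpu is not None:
            devices[cpu] = m.group(1)
            cpu = None
    return devices


# Shortened from the attached timer_list.txt.
SAMPLE = """\
Tick Device: mode:     1
Broadcast device
Clock Event Device: hpet

Tick Device: mode:     1
Per CPU device: 0
Clock Event Device: hpet2

Tick Device: mode:     1
Per CPU device: 1
Clock Event Device: lapic
"""

for cpu, dev in sorted(per_cpu_clock_devices(SAMPLE).items()):
    print(f"CPU{cpu}: {dev}")
```

Run against the full attachment, this should report hpet2 for CPU 0 and
lapic for CPUs 1 to 3, matching what is quoted above.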

			Linus


* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-23 16:31                                                                                                                 ` Linus Torvalds
@ 2009-12-23 16:38                                                                                                                   ` Andi Kleen
  2009-12-23 16:49                                                                                                                     ` Linus Torvalds
  2009-12-23 17:41                                                                                                                     ` Mark Hounschell
  0 siblings, 2 replies; 74+ messages in thread
From: Andi Kleen @ 2009-12-23 16:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mark Hounschell, Pallipadi, Venkatesh, dmarkh@cfl.rr.com,
	Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu,
	Li, Shaohua, Ingo Molnar

Linus Torvalds <torvalds@linux-foundation.org> writes:

> It's not using the lapic for CPU0. 
>
> Using the HPET as a per-cpu timer is some crazy sh*t, since it's pretty 
> expensive to reprogram (compared to the local apic). And having different 
> timers for different CPU's is just odd.
>
> The fact that the timer subsystem can do this and it all (mostly) works at 
> all is nice and impressive, but doesn't make it any less crazy ;)

I suspect it's a system where the APIC timer stops in deeper idle
states and it supports them. In this case CPU #0 does timer broadcasts
when needed to wake the other CPUs up from deep C, but for that it has
to run with HPET. At least the other ones can still enjoy the LAPIC
timer.

This might suggest that Mark's floppy controller doesn't like
deep C? Mark, did you try booting with processor.max_cstate=1
and HPET enabled?

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-23 16:38                                                                                                                   ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 Andi Kleen
@ 2009-12-23 16:49                                                                                                                     ` Linus Torvalds
  2009-12-23 17:08                                                                                                                       ` Andi Kleen
                                                                                                                                         ` (2 more replies)
  2009-12-23 17:41                                                                                                                     ` Mark Hounschell
  1 sibling, 3 replies; 74+ messages in thread
From: Linus Torvalds @ 2009-12-23 16:49 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Mark Hounschell, Pallipadi, Venkatesh, dmarkh@cfl.rr.com,
	Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu,
	Li, Shaohua, Ingo Molnar



On Wed, 23 Dec 2009, Andi Kleen wrote:
> 
> I suspect it's a system where the APIC timer stops in deeper idle
> states and it supports them. In this case CPU #0 does timer broadcasts
> when needed to wake the other CPUs up from deep C, but for that it has
> to run with HPET. At least the other ones can still enjoy the LAPIC
> timer.

Ahh, ok, that makes sense. I was assuming the broadcast timer would act in 
that capacity, but..

> This might suggest that Mark's floppy controller doesn't like
> deep C? Mark, did you try booting with processor.max_cstate=1
> and HPET enabled?

We have indeed had historical issues with floppy and sleep states before. 

I do note another issue, though - the floppy driver itself seems totally 
broken when it comes to using interleaved sectors. Alain, that "place 
logical sectors" code is simply _broken_ - the "while" kicks in only if 
the first sector we test is busy _and_ we were at the last sector so that 
we increment past F_SECT_PER_TRACK.

So shouldn't that sector layout be something like the appended?

		Linus
---
 drivers/block/floppy.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 3266b4f..9c9148c 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -2237,13 +2237,10 @@ static void setup_format_params(int track)
 	for (count = 1; count <= F_SECT_PER_TRACK; ++count) {
 		here[n].sect = count;
 		n = (n + il) % F_SECT_PER_TRACK;
-		if (here[n].sect) {	/* sector busy, find next free sector */
+		while (here[n].sect) {	/* sector busy, find next free sector */
 			++n;
-			if (n >= F_SECT_PER_TRACK) {
+			if (n >= F_SECT_PER_TRACK)
 				n -= F_SECT_PER_TRACK;
-				while (here[n].sect)
-					++n;
-			}
 		}
 	}
 	if (_floppy->stretch & FD_SECTBASEMASK) {

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-23 16:49                                                                                                                     ` Linus Torvalds
@ 2009-12-23 17:08                                                                                                                       ` Andi Kleen
  2009-12-25 12:21                                                                                                                         ` Arjan van de Ven
  2009-12-27 11:09                                                                                                                         ` Pavel Machek
  2009-12-23 17:19                                                                                                                       ` Pallipadi, Venkatesh
  2009-12-23 20:11                                                                                                                       ` alain
  2 siblings, 2 replies; 74+ messages in thread
From: Andi Kleen @ 2009-12-23 17:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Mark Hounschell, Pallipadi, Venkatesh,
	dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List,
	fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar

On Wed, Dec 23, 2009 at 08:49:38AM -0800, Linus Torvalds wrote:
> 
> 
> On Wed, 23 Dec 2009, Andi Kleen wrote:
> > 
> > I suspect it's a system where the APIC timer stops in deeper idle
> > states and it supports them. In this case CPU #0 does timer broadcasts
> > when needed to wake the other CPUs up from deep C, but for that it has
> > to run with HPET. At least the other ones can still enjoy the LAPIC
> > timer.
> 
> Ahh, ok, that makes sense. I was assuming the broadcast timer would act in 
> that capacity, but..

The "broadcasts" are done using IPIs from CPU #0, and only when the target
CPU is in deep idle.  That's more efficient than letting the hardware
always broadcast.

> 
> > This might suggest that Mark's floppy controller doesn't like
> > deep C? Mark, did you try booting with processor.max_cstate=1
> > and HPET enabled?
> 
> We have indeed had historical issues with floppy and sleep states before. 

I removed that code when moving to 64bit (floppy driver disabling C1),
but perhaps we need some variant of it again (but it's the first such
report in many years). Although it would be sad to have it again on all 
systems.

-Andi

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-23 17:08                                                                                                                       ` Andi Kleen
@ 2009-12-25 12:21                                                                                                                         ` Arjan van de Ven
  2009-12-25 20:33                                                                                                                           ` Andi Kleen
  2009-12-27 11:09                                                                                                                         ` Pavel Machek
  1 sibling, 1 reply; 74+ messages in thread
From: Arjan van de Ven @ 2009-12-25 12:21 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, Andi Kleen, Mark Hounschell, Pallipadi, Venkatesh,
	dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List,
	fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar

On Wed, 23 Dec 2009 18:08:32 +0100
Andi Kleen <andi@firstfloor.org> wrote:

> I removed that code when moving to 64bit (floppy driver disabling C1),
> but perhaps we need some variant of it again (but it's the first such
> report in many years). Although it would be sad to have it again on
> all systems.

at least now we have the pmqos infrastructure, driver just needs to ask
for 0 latency ;)


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-25 12:21                                                                                                                         ` Arjan van de Ven
@ 2009-12-25 20:33                                                                                                                           ` Andi Kleen
  2009-12-26  9:38                                                                                                                             ` Arjan van de Ven
  0 siblings, 1 reply; 74+ messages in thread
From: Andi Kleen @ 2009-12-25 20:33 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andi Kleen, Linus Torvalds, Mark Hounschell, Pallipadi, Venkatesh,
	dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List,
	fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar

On Fri, Dec 25, 2009 at 01:21:16PM +0100, Arjan van de Ven wrote:
> On Wed, 23 Dec 2009 18:08:32 +0100
> Andi Kleen <andi@firstfloor.org> wrote:
> 
> > I removed that code when moving to 64bit (floppy driver disabling C1),
> > but perhaps we need some variant of it again (but it's the first such
> > report in many years). Although it would be sad to have it again on
> > all systems.
> 
> at least now we have the pmqos infrastructure, driver just needs to ask
> for 0 latency ;)

Does pmqos work with acpi=off etc.? I didn't think it shut down
the classic "HLT" idle, does it? The old i386 systems needed that
apparently, they long predate any deeper idle states.

Anyways the code is still there for 32bit.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-25 20:33                                                                                                                           ` Andi Kleen
@ 2009-12-26  9:38                                                                                                                             ` Arjan van de Ven
  2009-12-26 16:40                                                                                                                               ` Andi Kleen
  0 siblings, 1 reply; 74+ messages in thread
From: Arjan van de Ven @ 2009-12-26  9:38 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andi Kleen, Linus Torvalds, Mark Hounschell, Pallipadi, Venkatesh,
	dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List,
	fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar

On Fri, 25 Dec 2009 21:33:04 +0100
Andi Kleen <andi@firstfloor.org> wrote:

> On Fri, Dec 25, 2009 at 01:21:16PM +0100, Arjan van de Ven wrote:
> > On Wed, 23 Dec 2009 18:08:32 +0100
> > Andi Kleen <andi@firstfloor.org> wrote:
> > 
> > > I removed that code when moving to 64bit (floppy driver disabling
> > > C1), but perhaps we need some variant of it again (but it's the
> > > first such report in many years). Although it would be sad to
> > > have it again on all systems.
> > 
> > at least now we have the pmqos infrastructure, driver just needs to
> > ask for 0 latency ;)
> 
> Does pmqos work with acpi=off etc.? 

yes

> I didn't think it shut down
> the classic "HLT" idle, does it? 

it does if you specify a latency of 0; it will then go into the
spin-only state until you give up your latency requirement


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-26  9:38                                                                                                                             ` Arjan van de Ven
@ 2009-12-26 16:40                                                                                                                               ` Andi Kleen
  2009-12-27 12:28                                                                                                                                 ` Alain Knaff
  0 siblings, 1 reply; 74+ messages in thread
From: Andi Kleen @ 2009-12-26 16:40 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andi Kleen, Linus Torvalds, Mark Hounschell, Pallipadi, Venkatesh,
	dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List,
	fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar

> > Does pmqos work with acpi=off etc.? 
> 
> yes
> 
> > I didn't think it shut down
> > the classic "HLT" idle, does it? 
> 
> it does if you specify a latency of 0; it will then go into the
> spin-only state until you give up your latency requirement

I looked at it this evening, but it seems like pm_qos is not
interrupt safe (e.g. calls blocking notifiers) and floppy currently does 
enable/disable_hlt from interrupts and bottom halves.  

Would need some more infrastructure work or restructuring 
of the floppy driver.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-26 16:40                                                                                                                               ` Andi Kleen
@ 2009-12-27 12:28                                                                                                                                 ` Alain Knaff
  2009-12-28  1:54                                                                                                                                   ` Andi Kleen
  0 siblings, 1 reply; 74+ messages in thread
From: Alain Knaff @ 2009-12-27 12:28 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arjan van de Ven, Linus Torvalds, Mark Hounschell,
	Pallipadi, Venkatesh, dmarkh@cfl.rr.com,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar, morgan, JONES

Andi Kleen wrote:
>>> Does pmqos work with acpi=off etc.? 
>> yes
>>
>>> I didn't think it shut down
>>> the classic "HLT" idle, does it? 
>> it does if you specify a latency of 0; it will then go into the
>> spin-only state until you give up your latency requirement
> 
> I looked at it this evening, but it seems like pm_qos is not
> interrupt safe (e.g. calls blocking notifiers) and floppy currently does 
> enable/disable_hlt from interrupts and bottom halves.  
> 
> Would need some more infrastructure work or restructuring 
> of the floppy driver.
> 
> -Andi

disable_hlt/enable_hlt was only needed to work around a bug on TM4000
(Texas Instruments) laptops, which were popular around 1994 / 1995.
Basically, as soon as the CPU went into hlt() state, so did the DMA
controller, either causing a really slow transfer, or (worse) a buffer
over/underrun which failed the operation.

On hardware unaffected by this particular bug (which would be most
hardware around now, 14 years after the fact...), these calls can safely
be removed.

Regards,

Alain


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-27 12:28                                                                                                                                 ` Alain Knaff
@ 2009-12-28  1:54                                                                                                                                   ` Andi Kleen
  2009-12-28 10:27                                                                                                                                     ` Alain Knaff
  0 siblings, 1 reply; 74+ messages in thread
From: Andi Kleen @ 2009-12-28  1:54 UTC (permalink / raw)
  To: Alain Knaff
  Cc: Andi Kleen, Arjan van de Ven, Linus Torvalds, Mark Hounschell,
	Pallipadi, Venkatesh, dmarkh@cfl.rr.com,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar, morgan, JONES

> disable_hlt/enable_hlt was only needed to work around a bug on TM4000
> (Texas Instrument) Laptops which were popular around 1994 / 1995.

I don't think we can fully drop support for these systems.

Did they have a unique PCI ID or something else that could be tested
for?

Perhaps it could be just a white list like dmi_year > 1995 to disable.

Depending on how often floppies are still used this might save
non trivial amounts of power on newer systems :)

Anyways it would be probably good to convert this to the new infrastructure,
and remove the old hooks, but the interrupt-context issue would
need to be fixed first.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-28  1:54                                                                                                                                   ` Andi Kleen
@ 2009-12-28 10:27                                                                                                                                     ` Alain Knaff
  2009-12-28 14:54                                                                                                                                       ` Andi Kleen
  0 siblings, 1 reply; 74+ messages in thread
From: Alain Knaff @ 2009-12-28 10:27 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arjan van de Ven, Linus Torvalds, Mark Hounschell,
	Pallipadi, Venkatesh, dmarkh@cfl.rr.com,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar, morgan

Andi Kleen wrote:
>> disable_hlt/enable_hlt was only needed to work around a bug on TM4000
>> (Texas Instrument) Laptops which were popular around 1994 / 1995.
> 
> I don't think we can fully drop support for these systems.
> 
> Did they have an unique PCI ID or something else that could be tested
> for?

Floppy controllers are not PCI devices and thus have no PCI id
unfortunately... :-(

> Perhaps it could be just a white list like dmi_year > 1995 to disable.
> 
> Depending on how often floppies are still used this might save
> non trivial amounts of power on newer systems :)

Removing these calls will indeed save a *tiny* amount of power by
allowing the CPU to go into halt during DMA transfer. But the main
argument should be simplification.

> Anyways it would be probably good to convert this to the new infrastructure,
> and remove the old hooks, but the interrupt-context issue would
> need to be fixed first.
> 
> -Andi

Well, at least for testing whether it fixes the new problem (DMA cache
issue), it's useful to know that these calls can be safely removed on
almost all of today's machines. That way, we will know whether this
refactoring will be worth the effort.

Regards,

Alain

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-28 10:27                                                                                                                                     ` Alain Knaff
@ 2009-12-28 14:54                                                                                                                                       ` Andi Kleen
  0 siblings, 0 replies; 74+ messages in thread
From: Andi Kleen @ 2009-12-28 14:54 UTC (permalink / raw)
  To: Alain Knaff
  Cc: Andi Kleen, Arjan van de Ven, Linus Torvalds, Mark Hounschell,
	Pallipadi, Venkatesh, dmarkh@cfl.rr.com,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar, morgan

On Mon, Dec 28, 2009 at 11:27:56AM +0100, Alain Knaff wrote:
> Andi Kleen wrote:
> >> disable_hlt/enable_hlt was only needed to work around a bug on TM4000
> >> (Texas Instrument) Laptops which were popular around 1994 / 1995.
> > 
> > I don't think we can fully drop support for these systems.
> > 
> > Did they have an unique PCI ID or something else that could be tested
> > for?
> 
> Floppy controllers are not PCI devices and thus have no PCI id
> unfortunately... :-(

Yes, but it's enough to identify any component in the system.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-23 17:08                                                                                                                       ` Andi Kleen
  2009-12-25 12:21                                                                                                                         ` Arjan van de Ven
@ 2009-12-27 11:09                                                                                                                         ` Pavel Machek
  2009-12-28 20:54                                                                                                                           ` Mark Hounschell
  1 sibling, 1 reply; 74+ messages in thread
From: Pavel Machek @ 2009-12-27 11:09 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, Mark Hounschell, Pallipadi, Venkatesh,
	dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List,
	fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar


> > > This might suggest that Mark's floppy controller doesn't like
> > > deep C? Mark, did you try booting with processor.max_cstate=1
> > > and HPET enabled?
> > 
> > We have indeed had historical issues with floppy and sleep states before. 
> 
> I removed that code when moving to 64bit (floppy driver disabling C1),
> but perhaps we need some variant of it again (but it's the first such
> report in many years). Although it would be sad to have it again on all 
> systems.

C1 is hlt. Are you sure? I could see how C3 could cause problems (DMA
 latency), but...

Can Mark simply try with idle=poll?

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-27 11:09                                                                                                                         ` Pavel Machek
@ 2009-12-28 20:54                                                                                                                           ` Mark Hounschell
  0 siblings, 0 replies; 74+ messages in thread
From: Mark Hounschell @ 2009-12-28 20:54 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Andi Kleen, Linus Torvalds, Mark Hounschell, Pallipadi, Venkatesh,
	Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu,
	Li, Shaohua, Ingo Molnar

On 12/27/2009 06:09 AM, Pavel Machek wrote:
> 
>>>> This might suggest that Mark's floppy controller doesn't like
>>>> deep C? Mark, did you try booting with processor.max_cstate=1
>>>> and HPET enabled?
>>>
>>> We have indeed had historical issues with floppy and sleep states before. 
>>
>> I removed that code when moving to 64bit (floppy driver disabling C1),
>> but perhaps we need some variant of it again (but it's the first such
>> report in many years). Although it would be sad to have it again on all 
>> systems.
> 
> C1 is hlt. Are you sure? I could see how C3 could cause problems (DMA
>  latency), but...
> 
> Can mark simply try with idle=poll?
> 
> 									Pavel
> 

The floppy still fails with idle=poll

Mark

^ permalink raw reply	[flat|nested] 74+ messages in thread

* RE: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-23 16:49                                                                                                                     ` Linus Torvalds
  2009-12-23 17:08                                                                                                                       ` Andi Kleen
@ 2009-12-23 17:19                                                                                                                       ` Pallipadi, Venkatesh
  2009-12-23 17:16                                                                                                                         ` Andi Kleen
  2009-12-23 20:11                                                                                                                       ` alain
  2 siblings, 1 reply; 74+ messages in thread
From: Pallipadi, Venkatesh @ 2009-12-23 17:19 UTC (permalink / raw)
  To: Linus Torvalds, Andi Kleen
  Cc: Mark Hounschell, dmarkh@cfl.rr.com, Alain Knaff,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar

 

>-----Original Message-----
>From: Linus Torvalds [mailto:torvalds@linux-foundation.org] 
>Sent: Wednesday, December 23, 2009 8:50 AM
>To: Andi Kleen
>Cc: Mark Hounschell; Pallipadi, Venkatesh; dmarkh@cfl.rr.com; 
>Alain Knaff; Linux Kernel Mailing List; 
>fdutils@fdutils.linux.lu; Li, Shaohua; Ingo Molnar
>Subject: Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
>
>
>
>On Wed, 23 Dec 2009, Andi Kleen wrote:
>> 
>> I suspect it's a system where the APIC timer stops in deeper idle
>> states and it supports them. In this case CPU #0 does timer broadcasts
>> when needed to wake the other CPUs up from deep C, but for that it has
>> to run with HPET. At least the other ones can still enjoy the LAPIC
>> timer.
>
>Ahh, ok, that makes sense. I was assuming the broadcast timer would act in 
>that capacity, but..

This is what I was thinking yesterday, and why I asked Mark to try idle=halt.
This /proc/interrupts is with idle=halt, when there should not be any
C-states or broadcasts involved.
>>> HPET_MSI-edge      hpet2
>>> NMI:          0          0          0          0   Non-maskable interrupts
>>> LOC:        268     513395     513138     522088   Local timer interrupts

Not sure how this is related to the floppy problem. But we surely
have something wrong with percpu HPET usage here.

Thanks,
Venki

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-23 17:19                                                                                                                       ` Pallipadi, Venkatesh
@ 2009-12-23 17:16                                                                                                                         ` Andi Kleen
  0 siblings, 0 replies; 74+ messages in thread
From: Andi Kleen @ 2009-12-23 17:16 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: Linus Torvalds, Andi Kleen, Mark Hounschell, dmarkh@cfl.rr.com,
	Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu,
	Li, Shaohua, Ingo Molnar

> This is what I was thinking yesterday, and why I asked Mark to try idle=halt.
> This /proc/interrupts is with idle=halt when there should not be any
> C-states and broadcasts involved.

Ah ok, missed that sorry.

Actually I'm glad that the floppy-idle hack is not needed again.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-23 16:49                                                                                                                     ` Linus Torvalds
  2009-12-23 17:08                                                                                                                       ` Andi Kleen
  2009-12-23 17:19                                                                                                                       ` Pallipadi, Venkatesh
@ 2009-12-23 20:11                                                                                                                       ` alain
  2 siblings, 0 replies; 74+ messages in thread
From: alain @ 2009-12-23 20:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Mark Hounschell, Pallipadi, Venkatesh,
	dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List,
	fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar

Linus Torvalds wrote:

> diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
> index 3266b4f..9c9148c 100644
> --- a/drivers/block/floppy.c
> +++ b/drivers/block/floppy.c
> @@ -2237,13 +2237,10 @@ static void setup_format_params(int track)
>  	for (count = 1; count <= F_SECT_PER_TRACK; ++count) {
>  		here[n].sect = count;
>  		n = (n + il) % F_SECT_PER_TRACK;
> -		if (here[n].sect) {	/* sector busy, find next free sector */
> +		while (here[n].sect) {	/* sector busy, find next free sector */
>  			++n;
> -			if (n >= F_SECT_PER_TRACK) {
> +			if (n >= F_SECT_PER_TRACK)
>  				n -= F_SECT_PER_TRACK;
> -				while (here[n].sect)
> -					++n;
> -			}
>  		}
>  	}
>  	if (_floppy->stretch & FD_SECTBASEMASK) {

The original code does indeed look a little bit strange... and might
break if there is a long run of "busy" sectors near the end of the
physical track. Or maybe there is a mathematical reason why this
situation cannot occur. I'll have to think about it a little bit more to
come up with a test case that will break either the new or old code.

But in any case, if a bug did occur due to this code, it would depend
only on the format's parameters, and not on the hardware.

Regards,

Alain

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-23 16:38                                                                                                                   ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 Andi Kleen
  2009-12-23 16:49                                                                                                                     ` Linus Torvalds
@ 2009-12-23 17:41                                                                                                                     ` Mark Hounschell
  2009-12-23 18:01                                                                                                                       ` Linus Torvalds
  2009-12-23 19:18                                                                                                                       ` Pallipadi, Venkatesh
  1 sibling, 2 replies; 74+ messages in thread
From: Mark Hounschell @ 2009-12-23 17:41 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, Pallipadi, Venkatesh, dmarkh@cfl.rr.com,
	Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu,
	Li, Shaohua, Ingo Molnar

On 12/23/2009 11:38 AM, Andi Kleen wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
>> It's not using the lapic for CPU0. 
>>
>> Using the HPET as a per-cpu timer is some crazy sh*t, since it's pretty 
>> expensive to reprogram (compared to the local apic). And having different 
>> timers for different CPU's is just odd.
>>
>> The fact that the timer subsystem can do this and it all (mostly) works at 
>> all is nice and impressive, but doesn't make it any less crazy ;)
> 
> I suspect it's a system where the APIC timer stops in deeper idle
> states and it supports them. In this case CPU #0 does timer broadcasts
> when needed to wake the other CPUs up from deep C, but for that it has
> to run with HPET. At least the other ones can still enjoy the LAPIC
> timer.
> 
> This might suggest that Mark's floppy controller doesn't like
> deep C? Mark, did you try booting with processor.max_cstate=1
> and HPET enabled?

I just did and /proc/interrupts looks the same and the floppy still does
not format.

I'll try the patch Linus provided now.

Mark


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-23 17:41                                                                                                                     ` Mark Hounschell
@ 2009-12-23 18:01                                                                                                                       ` Linus Torvalds
  2009-12-23 18:11                                                                                                                         ` Mark Hounschell
  2009-12-23 19:18                                                                                                                       ` Pallipadi, Venkatesh
  1 sibling, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2009-12-23 18:01 UTC (permalink / raw)
  To: Mark Hounschell
  Cc: Andi Kleen, Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Alain Knaff,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar



On Wed, 23 Dec 2009, Mark Hounschell wrote:
> 
> I'll try the patch Linus provided now.

I doubt it matters - because if it did, it would matter for everybody, and 
the HPET thing shouldn't make any difference at all.

[ Or rather, it should matter for everybody trying to format a specific 
  format (without interleave it won't matter, and not all formats have any 
  interleave - I think it was mainly used on 5.25" floppies and special 
  formats). ]

Besides, maybe I was just mis-reading the code.

But getting some testing for the patch certainly won't hurt, so I'm not 
going to argue against it any more ;)

		Linus

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-23 18:01                                                                                                                       ` Linus Torvalds
@ 2009-12-23 18:11                                                                                                                         ` Mark Hounschell
  0 siblings, 0 replies; 74+ messages in thread
From: Mark Hounschell @ 2009-12-23 18:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Alain Knaff,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar

On 12/23/2009 01:01 PM, Linus Torvalds wrote:
> 
> 
> On Wed, 23 Dec 2009, Mark Hounschell wrote:
>>
>> I'll try the patch Linus provided now.
> 
> I doubt it matters - because if it did, it would matter for everybody, and 
> the HPET thing shouldn't make any difference at all.
> 
> [ Or rather, it should matter for everybody trying to format a specific 
>   format (without interleave it won't matter, and not all formats have any 
>   interleave - I think it was mainly used on 5.25" floppies and special 
>   formats). ]
> 
> Besides, maybe I was just mis-reading the code.
> 
> But getting some testing for the patch certainly won't hurt, so I'm not 
> going to argue against it any more ;)

Yea, that hosed it up pretty good. The very first track label sent out
caused some sort of timeout.

Dec 23 13:10:02 harley kernel:
Dec 23 13:10:02 harley kernel: floppy driver state
Dec 23 13:10:02 harley kernel: -------------------
Dec 23 13:10:02 harley kernel: now=9017 last interrupt=8117 diff=900 last called handler=f73ce27d
Dec 23 13:10:02 harley kernel: timeout_message=lock fdc
Dec 23 13:10:02 harley kernel: last output bytes:
Dec 23 13:10:02 harley kernel:  0 90 4294899106
Dec 23 13:10:02 harley kernel: 1a 90 4294899106
Dec 23 13:10:02 harley kernel:  0 90 4294899106
Dec 23 13:10:02 harley kernel:  3 90 4294899106
Dec 23 13:10:02 harley kernel: c1 90 4294899106
Dec 23 13:10:02 harley kernel: 10 90 4294899106
Dec 23 13:10:02 harley kernel:  7 80 4294899106
Dec 23 13:10:02 harley kernel:  0 90 4294899106
Dec 23 13:10:02 harley kernel:  8 81 4294899106
Dec 23 13:10:02 harley kernel:  4 80 4294899106
Dec 23 13:10:02 harley kernel:  0 90 4294899106
Dec 23 13:10:02 harley kernel: e6 80 8007
Dec 23 13:10:02 harley kernel:  0 90 8007
Dec 23 13:10:02 harley syslog-ng[2651]: last message repeated 2 times
Dec 23 13:10:02 harley kernel:  1 90 8007
Dec 23 13:10:02 harley kernel:  2 90 8007
Dec 23 13:10:02 harley kernel: 12 90 8007
Dec 23 13:10:02 harley kernel: 1b 90 8007
Dec 23 13:10:02 harley kernel: ff 90 8007
Dec 23 13:10:02 harley kernel: last result at 8117
Dec 23 13:10:02 harley kernel: last redo_fd_request at 8117
Dec 23 13:10:02 harley kernel:
Dec 23 13:10:02 harley kernel: status=80
Dec 23 13:10:02 harley kernel: fdc_busy=1
Dec 23 13:10:02 harley kernel: cont=f73d58e4
Dec 23 13:10:02 harley kernel: current_req=(null)
Dec 23 13:10:02 harley kernel: command_status=-1
Dec 23 13:10:02 harley kernel:
Dec 23 13:10:02 harley kernel: floppy0: floppy timeout called
Dec 23 13:10:22 harley kernel:
Dec 23 13:10:22 harley kernel: floppy driver state
Dec 23 13:10:22 harley kernel: -------------------
Dec 23 13:10:22 harley kernel: now=15017 last interrupt=8117 diff=6900 last called handler=f73ce27d
Dec 23 13:10:22 harley kernel: timeout_message=do wakeup
Dec 23 13:10:22 harley kernel: last output bytes:
Dec 23 13:10:22 harley kernel:  0 90 4294899106
Dec 23 13:10:22 harley kernel: 1a 90 4294899106
Dec 23 13:10:22 harley kernel:  0 90 4294899106
Dec 23 13:10:22 harley kernel:  3 90 4294899106
Dec 23 13:10:22 harley kernel: c1 90 4294899106
Dec 23 13:10:22 harley kernel: 10 90 4294899106
Dec 23 13:10:22 harley kernel:  7 80 4294899106
Dec 23 13:10:22 harley kernel:  0 90 4294899106
Dec 23 13:10:22 harley kernel:  8 81 4294899106
Dec 23 13:10:22 harley kernel:  4 80 4294899106
Dec 23 13:10:22 harley kernel:  0 90 4294899106
Dec 23 13:10:22 harley kernel: e6 80 8007
Dec 23 13:10:22 harley kernel:  0 90 8007
Dec 23 13:10:22 harley syslog-ng[2651]: last message repeated 2 times
Dec 23 13:10:22 harley kernel:  1 90 8007
Dec 23 13:10:22 harley kernel:  2 90 8007
Dec 23 13:10:22 harley kernel: 12 90 8007
Dec 23 13:10:22 harley kernel: 1b 90 8007
Dec 23 13:10:22 harley kernel: ff 90 8007
Dec 23 13:10:22 harley kernel: last result at 8117
Dec 23 13:10:22 harley kernel: last redo_fd_request at 8117
Dec 23 13:10:22 harley kernel:
Dec 23 13:10:22 harley kernel: status=80
Dec 23 13:10:22 harley kernel: fdc_busy=1
Dec 23 13:10:22 harley kernel: floppy_work.func=f73d03da
Dec 23 13:10:22 harley kernel: cont=f73d5274
Dec 23 13:10:22 harley kernel: current_req=(null)
Dec 23 13:10:22 harley kernel: command_status=-1
Dec 23 13:10:22 harley kernel:
Dec 23 13:10:22 harley kernel: floppy0: floppy timeout called
Dec 23 13:10:22 harley kernel: floppy.c: no request in request_don

Have to reboot now...

Mark

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-23 17:41                                                                                                                     ` Mark Hounschell
  2009-12-23 18:01                                                                                                                       ` Linus Torvalds
@ 2009-12-23 19:18                                                                                                                       ` Pallipadi, Venkatesh
  2009-12-23 19:35                                                                                                                         ` Mark Hounschell
  1 sibling, 1 reply; 74+ messages in thread
From: Pallipadi, Venkatesh @ 2009-12-23 19:18 UTC (permalink / raw)
  To: Mark Hounschell
  Cc: Andi Kleen, Linus Torvalds, Pallipadi, Venkatesh,
	dmarkh@cfl.rr.com, Alain Knaff, Linux Kernel Mailing List,
	fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar

On Wed, Dec 23, 2009 at 09:41:50AM -0800, Mark Hounschell wrote:
> On 12/23/2009 11:38 AM, Andi Kleen wrote:
> > Linus Torvalds <torvalds@linux-foundation.org> writes:
> > 
> >> It's not using the lapic for CPU0. 
> >>
> >> Using the HPET as a per-cpu timer is some crazy sh*t, since it's pretty 
> >> expensive to reprogram (compared to the local apic). And having different 
> >> timers for different CPU's is just odd.
> >>
> >> The fact that the timer subsystem can do this and it all (mostly) works at 
> >> all is nice and impressive, but doesn't make it any less crazy ;)
> > 
> > I suspect it's a system where the APIC timer stops in deeper idle
> > states and it supports them. In this case CPU #0 does timer broadcasts
> > when needed to wake the other CPUs up from deep C, but for that it has
> > to run with HPET. At least the other ones can still enjoy the LAPIC
> > timer.
> > 
> > This might suggest that Mark's floppy controller doesn't like
> > deep C? Mark, did you try booting with processor.max_cstate=1
> > and HPET enabled?
> 
> I just did and /proc/interrupts looks the same and the floppy still does
> not format.
> 

Can you try this one-line patch on either .28 or .32, and send the
/proc/interrupts output with it applied?
It lowers the rating of hpet2 so that it is effectively disabled, and the
LAPIC timer should then be used on CPU 0. If things work with this test
patch, we will know that the failure is somehow related to HPET usage in
MSI mode.

Thanks,
Venki

Reduce the rating of percpu hpet timer

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
---
 arch/x86/kernel/hpet.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index cafb1c6..f89d17a 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -480,7 +480,7 @@ static void init_one_hpet_msi_clockevent(struct hpet_dev *hdev, int cpu)
 	hpet_setup_irq(hdev);
 	evt->irq = hdev->irq;
 
-	evt->rating = 110;
+	evt->rating = 40;
 	evt->features = CLOCK_EVT_FEAT_ONESHOT;
 	if (hdev->flags & HPET_DEV_PERI_CAP)
 		evt->features |= CLOCK_EVT_FEAT_PERIODIC;
-- 
1.6.0.6
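
A minimal sketch of why lowering the rating has this effect, assuming the
timer core simply prefers the candidate with the highest rating. The struct
and function names here are hypothetical stand-ins, not the kernel's own
API, and the LAPIC rating of 100 used in the usage note is an assumption
that is not stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a per-CPU clock event device. */
struct clock_dev {
	const char *name;
	int rating;		/* higher rating = preferred */
};

/*
 * Return the candidate with the highest rating, or NULL if there are none.
 * On a tie the earlier entry wins.
 */
static const struct clock_dev *pick_clock_dev(const struct clock_dev *devs,
					      size_t n)
{
	const struct clock_dev *best = NULL;
	size_t i;

	assert(devs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (best == NULL || devs[i].rating > best->rating)
			best = &devs[i];
	}
	return best;
}
```

With a LAPIC entry assumed at rating 100, pick_clock_dev() returns the hpet2
entry while its rating is 110 and the LAPIC entry once the rating is 40,
which matches what the test patch is meant to achieve on CPU 0.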


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-23 19:18                                                                                                                       ` Pallipadi, Venkatesh
@ 2009-12-23 19:35                                                                                                                         ` Mark Hounschell
  2009-12-23 20:30                                                                                                                           ` Pallipadi, Venkatesh
  0 siblings, 1 reply; 74+ messages in thread
From: Mark Hounschell @ 2009-12-23 19:35 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: Andi Kleen, Linus Torvalds, dmarkh@cfl.rr.com, Alain Knaff,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar

On 12/23/2009 02:18 PM, Pallipadi, Venkatesh wrote:
> On Wed, Dec 23, 2009 at 09:41:50AM -0800, Mark Hounschell wrote:
>> On 12/23/2009 11:38 AM, Andi Kleen wrote:
>>> Linus Torvalds <torvalds@linux-foundation.org> writes:
>>>
>>>> It's not using the lapic for CPU0. 
>>>>
>>>> Using the HPET as a per-cpu timer is some crazy sh*t, since it's pretty 
>>>> expensive to reprogram (compared to the local apic). And having different 
>>>> timers for different CPU's is just odd.
>>>>
>>>> The fact that the timer subsystem can do this and it all (mostly) works at 
>>>> all is nice and impressive, but doesn't make it any less crazy ;)
>>>
>>> I suspect it's a system where the APIC timer stops in deeper idle
>>> states and it supports them. In this case CPU #0 does timer broadcasts
>>> when needed to wake the other CPUs up from deep C, but for that it has
>>> to run with HPET. At least the other ones can still enjoy the LAPIC
>>> timer.
>>>
>>> This might suggest that Mark's floppy controller doesn't like
>>> deep C? Mark, did you try booting with processor.max_cstate=1
>>> and HPET enabled?
>>
>> I just did and /proc/interrupts looks the same and the floppy still does
>> not format.
>>
> 
> Can you try this one line patch either on .28 or .32 (with /proc/interrupts
> output).
> This disables hpet2 and lapic timer should then be used on CPU 0. If things
> work with this test patch, we will know that the failure is somehow related
> to HPET usage in MSI mode.
> 
> Thanks,
> Venki
> 
> Reduce the rating of percpu hpet timer
> 
> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> ---
>  arch/x86/kernel/hpet.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
> index cafb1c6..f89d17a 100644
> --- a/arch/x86/kernel/hpet.c
> +++ b/arch/x86/kernel/hpet.c
> @@ -480,7 +480,7 @@ static void init_one_hpet_msi_clockevent(struct hpet_dev *hdev, int cpu)
>  	hpet_setup_irq(hdev);
>  	evt->irq = hdev->irq;
>  
> -	evt->rating = 110;
> +	evt->rating = 40;
>  	evt->features = CLOCK_EVT_FEAT_ONESHOT;
>  	if (hdev->flags & HPET_DEV_PERI_CAP)
>  		evt->features |= CLOCK_EVT_FEAT_PERIODIC;

That made it work. Used 2.6.32.2

cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:         82          0          0          1   IO-APIC-edge      timer
  1:          0          0          0         67   IO-APIC-edge      i8042
  3:          0          0          0          6   IO-APIC-edge
  4:          0          0          0          4   IO-APIC-edge
  6:          0          0          0          4   IO-APIC-edge      floppy
  8:          0          0          0          8   IO-APIC-edge      rtc0
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          0          0         10       1519   IO-APIC-edge      i8042
 14:          0          0         39      10995   IO-APIC-edge      pata_atiixp
 15:          0          0          3        391   IO-APIC-edge      pata_atiixp
 16:          0          0          2        606   IO-APIC-fasteoi   aic79xx, ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel, Digi DBX2, ni-pci-gpib
 17:          0          0          0          3   IO-APIC-fasteoi   ehci_hcd:usb1, parport0, ni-pci-gpib
 18:          0          0         10       2168   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, Digi DBX2, nvidia
 19:          0          0          0        130   IO-APIC-fasteoi   aic7xxx, ehci_hcd:usb2, ttySLG0, eth1
 22:          0          0          8       1151   IO-APIC-fasteoi   ahci
 24:          0          0          0          0  HPET_MSI-edge      hpet2
 29:          0          0          0         48   PCI-MSI-edge      sky2@pci:0000:04:00.0
NMI:          0          0          0          0   Non-maskable interrupts
LOC:      34842      30177      29672      29632   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0   Performance monitoring interrupts
PND:          0          0          0          0   Performance pending work
RES:      17501      20449      16670      11224   Rescheduling interrupts
CAL:      10554       2336       1102       1071   Function call interrupts
TLB:        364        562        753        468   TLB shootdowns
ERR:          0
MIS:          0


# fdformat /dev/fd0u1440
Double-sided, 80 tracks, 18 sec/track. Total capacity 1440 kB.
Formatting ... done
Verifying ... done
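
The interesting rows in the output above are 24 (hpet2, all zero) and LOC
(non-zero on every CPU, including CPU 0). A minimal sketch of checking that
mechanically, assuming the usual "label: count count ... description" row
layout; sum_irq_row() is a hypothetical helper, and it simply stops at the
first field that does not start with a digit.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * If `line` is the /proc/interrupts row whose label is `label` (for example
 * "24" or "LOC"), add up its per-CPU counters into *total and return the
 * number of counters read. Return -1 if the row has a different label.
 */
static int sum_irq_row(const char *line, const char *label,
		       unsigned long *total)
{
	size_t len = strlen(label);
	int ncpus = 0;

	assert(total != NULL);
	*total = 0;

	/* Row labels are right-aligned, so skip the leading padding. */
	while (*line == ' ')
		line++;
	if (strncmp(line, label, len) != 0 || line[len] != ':')
		return -1;
	line += len + 1;

	for (;;) {
		char *end;

		while (*line == ' ')
			line++;
		/* The first non-numeric field starts the description. */
		if (!isdigit((unsigned char)*line))
			break;
		*total += strtoul(line, &end, 10);
		ncpus++;
		line = end;
	}
	return ncpus;
}
```

Fed the "24:" row above it reports four counters with a total of 0, and fed
the "LOC:" row it reports four counters with a non-zero total, i.e. hpet2 is
idle and the local APIC timer is doing the work.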

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-23 19:35                                                                                                                         ` Mark Hounschell
@ 2009-12-23 20:30                                                                                                                           ` Pallipadi, Venkatesh
  2009-12-23 20:34                                                                                                                             ` alain
  2010-01-08 17:42                                                                                                                             ` Mark Hounschell
  0 siblings, 2 replies; 74+ messages in thread
From: Pallipadi, Venkatesh @ 2009-12-23 20:30 UTC (permalink / raw)
  To: markh@compro.net
  Cc: Andi Kleen, Linus Torvalds, dmarkh@cfl.rr.com, Alain Knaff,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar

On Wed, 2009-12-23 at 11:35 -0800, Mark Hounschell wrote:
> On 12/23/2009 02:18 PM, Pallipadi, Venkatesh wrote:
> > On Wed, Dec 23, 2009 at 09:41:50AM -0800, Mark Hounschell wrote:
> >> On 12/23/2009 11:38 AM, Andi Kleen wrote:
> >>> Linus Torvalds <torvalds@linux-foundation.org> writes:
> >>>
> >>>> It's not using the lapic for CPU0. 
> >>>>
> >>>> Using the HPET as a per-cpu timer is some crazy sh*t, since it's pretty 
> >>>> expensive to reprogram (compared to the local apic). And having different 
> >>>> timers for different CPU's is just odd.
> >>>>
> >>>> The fact that the timer subsystem can do this and it all (mostly) works at 
> >>>> all is nice and impressive, but doesn't make it any less crazy ;)
> >>>
> >>> I suspect it's a system where the APIC timer stops in deeper idle
> >>> states and it supports them. In this case CPU #0 does timer broadcasts
> >>> when needed to wake the other CPUs up from deep C, but for that it has
> >>> to run with HPET. At least the other ones can still enjoy the LAPIC
> >>> timer.
> >>>
> >>> This might suggest that Mark's floppy controller doesn't like
> >>> deep C? Mark, did you try booting with processor.max_cstate=1
> >>> and HPET enabled?
> >>
> >> I just did and /proc/interrupts looks the same and the floppy still does
> >> not format.
> >>
> > 
> > Can you try this one line patch either on .28 or .32 (with /proc/interrupts
> > output).
> > This disables hpet2 and lapic timer should then be used on CPU 0. If things
> > work with this test patch, we will know that the failure is somehow related
> > to HPET usage in MSI mode.
> > 
> > Thanks,
> > Venki
> > 
> > Reduce the rating of percpu hpet timer
> > 
> > Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> > ---
> >  arch/x86/kernel/hpet.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
> > index cafb1c6..f89d17a 100644
> > --- a/arch/x86/kernel/hpet.c
> > +++ b/arch/x86/kernel/hpet.c
> > @@ -480,7 +480,7 @@ static void init_one_hpet_msi_clockevent(struct hpet_dev *hdev, int cpu)
> >  	hpet_setup_irq(hdev);
> >  	evt->irq = hdev->irq;
> >  
> > -	evt->rating = 110;
> > +	evt->rating = 40;
> >  	evt->features = CLOCK_EVT_FEAT_ONESHOT;
> >  	if (hdev->flags & HPET_DEV_PERI_CAP)
> >  		evt->features |= CLOCK_EVT_FEAT_PERIODIC;
> 
> That made it work. Used 2.6.32.2
> 
> cat /proc/interrupts
>            CPU0       CPU1       CPU2       CPU3
>   0:         82          0          0          1   IO-APIC-edge      timer
>   1:          0          0          0         67   IO-APIC-edge      i8042
>   3:          0          0          0          6   IO-APIC-edge
>   4:          0          0          0          4   IO-APIC-edge
>   6:          0          0          0          4   IO-APIC-edge      floppy
>   8:          0          0          0          8   IO-APIC-edge      rtc0
>   9:          0          0          0          0   IO-APIC-fasteoi   acpi
>  12:          0          0         10       1519   IO-APIC-edge      i8042
>  14:          0          0         39      10995   IO-APIC-edge      pata_atiixp
>  15:          0          0          3        391   IO-APIC-edge      pata_atiixp
>  16:          0          0          2        606   IO-APIC-fasteoi   aic79xx, ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel, Digi DBX2, ni-pci-gpib
>  17:          0          0          0          3   IO-APIC-fasteoi   ehci_hcd:usb1, parport0, ni-pci-gpib
>  18:          0          0         10       2168   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, Digi DBX2, nvidia
>  19:          0          0          0        130   IO-APIC-fasteoi   aic7xxx, ehci_hcd:usb2, ttySLG0, eth1
>  22:          0          0          8       1151   IO-APIC-fasteoi   ahci
>  24:          0          0          0          0  HPET_MSI-edge      hpet2
>  29:          0          0          0         48   PCI-MSI-edge      sky2@pci:0000:04:00.0
> NMI:          0          0          0          0   Non-maskable interrupts
> LOC:      34842      30177      29672      29632   Local timer interrupts
> SPU:          0          0          0          0   Spurious interrupts
> PMI:          0          0          0          0   Performance monitoring interrupts
> PND:          0          0          0          0   Performance pending work
> RES:      17501      20449      16670      11224   Rescheduling interrupts
> CAL:      10554       2336       1102       1071   Function call interrupts
> TLB:        364        562        753        468   TLB shootdowns
> ERR:          0
> MIS:          0
> 
> 
> # fdformat /dev/fd0u1440
> Double-sided, 80 tracks, 18 sec/track. Total capacity 1440 kB.
> Formatting ... done
> Verifying ... done

Hmmm.. That's very interesting indeed.

That clearly says that HPET MSI interrupts are somehow causing some
caching side effect in the chipset that results in this floppy DMA
failure.

Here is what we have so far.
IRQ 0 is based on the HPET legacy interrupt, and the HPET device is also
capable of MSI on this platform. So we also have a percpu hpet (hpet2 tied
to CPU 0). The percpu hpet was added to avoid using IRQ0+LAPIC broadcast
in cases where the LAPIC timer stops working in deep C-states. As we have
only one HPET channel free for the percpu HPET, we only have hpet2 tied to
CPU 0, and the other CPUs still have to go through IRQ0+LAPIC broadcast
in deep C-states.

One problem here is that the percpu hpet should only get used when the
LAPIC timer cannot be used (that is, when the CPU enters a deep C-state).
Using hpet2 in place of the LAPIC timer even when deep C-states are not
supported is not right in terms of performance. We need some changes here
to fix that [Problem 1].

But that still does not explain why we are seeing this problem in the
first place. I mean, using hpet2 is not optimal, but it should not cause
functional issues like this. Even after fixing [Problem 1] above, we may
see this problem on some other platform that supports deep C-states and
so has hpet2 enabled for a valid reason.

Also, I am not sure whether the problem also happens if legacy HPET
interrupts are used at run time in place of the LAPIC timer (it may be
worth trying this with a simple test patch, let me think about it). In
this case, the legacy HPET interrupt rightly goes quiet after boot,
giving priority to the LAPIC timer.

With HPET MSI interrupts, we do a write followed by a read of an HPET
memory-mapped register to set an HPET channel timeout, plus a read of the
global HPET timer. This happens on every timer interrupt on CPU 0. And we
also have the MSI interrupt being delivered to CPU 0. I cannot think of
any reason why this can break DMA. We can probably try adding some dummy
HPET read after dma write, to see if that flushes things properly.

Thanks,
Venki


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-23 20:30                                                                                                                           ` Pallipadi, Venkatesh
@ 2009-12-23 20:34                                                                                                                             ` alain
  2009-12-23 21:34                                                                                                                               ` Pallipadi, Venkatesh
  2010-01-08 17:42                                                                                                                             ` Mark Hounschell
  1 sibling, 1 reply; 74+ messages in thread
From: alain @ 2009-12-23 20:34 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: markh@compro.net, Andi Kleen, Linus Torvalds, dmarkh@cfl.rr.com,
	Alain Knaff, Linux Kernel Mailing List, fdutils@fdutils.linux.lu,
	Li, Shaohua, Ingo Molnar

Pallipadi, Venkatesh wrote:
> MSI interrupt being delivered to CPU 0. I cannot think of any reason why
> this can break dma. We can probably try adding some dummy HPET read
> after dma write, to see if that flushes things properly.

Shouldn't that be "... some dummy HPET read _before_ dma write...", in
order to ensure that the DMA cache is consistent _before_ the DMA
controller reads it?

Regards,

Alain

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-23 20:34                                                                                                                             ` alain
@ 2009-12-23 21:34                                                                                                                               ` Pallipadi, Venkatesh
  0 siblings, 0 replies; 74+ messages in thread
From: Pallipadi, Venkatesh @ 2009-12-23 21:34 UTC (permalink / raw)
  To: alain
  Cc: markh@compro.net, Andi Kleen, Linus Torvalds, dmarkh@cfl.rr.com,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar

On Wed, 2009-12-23 at 12:34 -0800, alain wrote:
> Pallipadi, Venkatesh wrote:
> > MSI interrupt being delivered to CPU 0. I cannot think of any reason why
> > this can break dma. We can probably try adding some dummy HPET read
> > after dma write, to see if that flushes things properly.
> 
> Shouldn't that be "... some dummy HPET read _before_ dma write...". In
> order to ensure that DMA cache is consistent _before_ dma controller
> reads it?
> 

Yes. I meant after the contents of the buffer are changed and before the
DMA transfer starts and the controller reads it.
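
The ordering being discussed can be sketched with a toy model. This is only
an illustration of the hypothesis, not a description of how this chipset is
known to behave: the struct and all three functions are hypothetical, and
whether a dummy HPET read really acts like flush_pending() is exactly the
open question here.

```c
#include <assert.h>
#include <string.h>

#define BUF_LEN 8

/* Toy model only: hypothetical, not any real chipset's behaviour. */
struct toy_system {
	unsigned char memory[BUF_LEN];	/* what the DMA engine reads */
	unsigned char pending[BUF_LEN];	/* CPU stores not yet visible to DMA */
	int dirty[BUF_LEN];		/* 1 if pending[i] awaits write-back */
};

/* CPU store: lands in the pending area only. */
static void cpu_write(struct toy_system *s, size_t i, unsigned char v)
{
	assert(i < BUF_LEN);
	s->pending[i] = v;
	s->dirty[i] = 1;
}

/* Stand-in for the proposed dummy read: makes pending stores visible. */
static void flush_pending(struct toy_system *s)
{
	size_t i;

	for (i = 0; i < BUF_LEN; i++) {
		if (s->dirty[i]) {
			s->memory[i] = s->pending[i];
			s->dirty[i] = 0;
		}
	}
}

/* DMA engine: copies what is in memory right now, ignoring pending stores. */
static void dma_read(const struct toy_system *s, unsigned char *out)
{
	memcpy(out, s->memory, BUF_LEN);
}
```

In this model, cpu_write() followed directly by dma_read() hands the device
the old byte, while cpu_write(), flush_pending(), dma_read() hands it the new
one. A flush issued after dma_read() is too late to help that transfer.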

Thanks,
Venki



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2009-12-23 20:30                                                                                                                           ` Pallipadi, Venkatesh
  2009-12-23 20:34                                                                                                                             ` alain
@ 2010-01-08 17:42                                                                                                                             ` Mark Hounschell
  2010-01-12  0:19                                                                                                                               ` Pallipadi, Venkatesh
  1 sibling, 1 reply; 74+ messages in thread
From: Mark Hounschell @ 2010-01-08 17:42 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: Andi Kleen, Linus Torvalds, dmarkh@cfl.rr.com, Alain Knaff,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar

On 12/23/2009 03:30 PM, Pallipadi, Venkatesh wrote:

>>> Can you try this one line patch either on .28 or .32 (with /proc/interrupts
>>> output).
>>> This disables hpet2 and lapic timer should then be used on CPU 0. If things
>>> work with this test patch, we will know that the failure is somehow related
>>> to HPET usage in MSI mode.
>>>
>>> Thanks,
>>> Venki
>>>
>>> Reduce the rating of percpu hpet timer
>>>
>>> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
>>> ---
>>>  arch/x86/kernel/hpet.c |    2 +-
>>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
>>> index cafb1c6..f89d17a 100644
>>> --- a/arch/x86/kernel/hpet.c
>>> +++ b/arch/x86/kernel/hpet.c
>>> @@ -480,7 +480,7 @@ static void init_one_hpet_msi_clockevent(struct hpet_dev *hdev, int cpu)
>>>  	hpet_setup_irq(hdev);
>>>  	evt->irq = hdev->irq;
>>>  
>>> -	evt->rating = 110;
>>> +	evt->rating = 40;
>>>  	evt->features = CLOCK_EVT_FEAT_ONESHOT;
>>>  	if (hdev->flags & HPET_DEV_PERI_CAP)
>>>  		evt->features |= CLOCK_EVT_FEAT_PERIODIC;
>>
>> That made it work. Used 2.6.32.2
>>
>> cat /proc/interrupts
>>            CPU0       CPU1       CPU2       CPU3
>>   0:         82          0          0          1   IO-APIC-edge      timer
>>   1:          0          0          0         67   IO-APIC-edge      i8042
>>   3:          0          0          0          6   IO-APIC-edge
>>   4:          0          0          0          4   IO-APIC-edge
>>   6:          0          0          0          4   IO-APIC-edge      floppy
>>   8:          0          0          0          8   IO-APIC-edge      rtc0
>>   9:          0          0          0          0   IO-APIC-fasteoi   acpi
>>  12:          0          0         10       1519   IO-APIC-edge      i8042
>>  14:          0          0         39      10995   IO-APIC-edge      pata_atiixp
>>  15:          0          0          3        391   IO-APIC-edge      pata_atiixp
>>  16:          0          0          2        606   IO-APIC-fasteoi   aic79xx, ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel, Digi DBX2, ni-pci-gpib
>>  17:          0          0          0          3   IO-APIC-fasteoi   ehci_hcd:usb1, parport0, ni-pci-gpib
>>  18:          0          0         10       2168   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, Digi DBX2, nvidia
>>  19:          0          0          0        130   IO-APIC-fasteoi   aic7xxx, ehci_hcd:usb2, ttySLG0, eth1
>>  22:          0          0          8       1151   IO-APIC-fasteoi   ahci
>>  24:          0          0          0          0  HPET_MSI-edge      hpet2
>>  29:          0          0          0         48   PCI-MSI-edge      sky2@pci:0000:04:00.0
>> NMI:          0          0          0          0   Non-maskable interrupts
>> LOC:      34842      30177      29672      29632   Local timer interrupts
>> SPU:          0          0          0          0   Spurious interrupts
>> PMI:          0          0          0          0   Performance monitoring interrupts
>> PND:          0          0          0          0   Performance pending work
>> RES:      17501      20449      16670      11224   Rescheduling interrupts
>> CAL:      10554       2336       1102       1071   Function call interrupts
>> TLB:        364        562        753        468   TLB shootdowns
>> ERR:          0
>> MIS:          0
>>
>>
>> # fdformat /dev/fd0u1440
>> Double-sided, 80 tracks, 18 sec/track. Total capacity 1440 kB.
>> Formatting ... done
>> Verifying ... done
> 
> Hmmm.. Thats very interesting indeed.
> 
> That clearly says that HPET MSI interrupts somehow is causing some
> caching side effect in the chipset that results in this floppy dma
> failure.
> 
> Here's is what we have until now.
> IRQ 0 is based on HPET legacy interrupt and HPET device is also capable
> of MSI on this platform. So we also have a percpu hpet (hpet2 tied to
> CPU0). percpu hpet was added to avoid the usage of IRQ0+LAPIC broadcast
> in cases where LAPIC timer will stop working in deep C-state. As we have
> only one HPET channel free for percpu HPET, we only have hpet2 tied to
> CPU 0 and other CPUs still have to go through IRQ0+LAPIC broadcast with
> deep C-state.
> 
> One problem here is that percpu hpet should only get used when LAPIC
> cannot be used (that is when CPU enters deep C-state). Using hpet2 in
> place of LAPIC timer even when deep C-state is not supported is not
> right in terms of performance. We need some changes here to fix that
> [Problem 1].
> 
> But, that still does not explain why we are seeing this problem in the
> first place. I mean, using hpet2 is not optimal, but should not have
> functionality issues like this. Even fixing [Problem 1] above, we may
> see this problem on some other platform that supports deep C-state and
> so has hpet2 enabled for a valid reason.
> 
> Also, I am not sure whether the problem also happens if legacy HPET
> interrupts are used during run time in place of LAPIC timer (May be
> worth to try this with a simple test patch, let me think about it). In
> this case, legacy HPET interrupt rightly goes quiet after boot, giving
> priority to LAPIC timer.
> 
> With hpet MSI interrupts, we do a write followed by read of HPET
> memmapped register to set a HPET channel timeout + read of global HPET
> timer. This happens on every timer interrupt on CPU 0. And we also have
> MSI interrupt being delivered to CPU 0. I cannot think of any reason why
> this can break dma. We can probably try adding some dummy HPET read
> after dma write, to see if that flushes things properly.
> 

I haven't seen any activity on this thread in a while. Just curious, are
we still working on this?
Is there anything else I can do to help?

Thanks
Mark

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2010-01-08 17:42                                                                                                                             ` Mark Hounschell
@ 2010-01-12  0:19                                                                                                                               ` Pallipadi, Venkatesh
  2010-01-12  9:04                                                                                                                                 ` Mark Hounschell
  0 siblings, 1 reply; 74+ messages in thread
From: Pallipadi, Venkatesh @ 2010-01-12  0:19 UTC (permalink / raw)
  To: markh@compro.net
  Cc: Andi Kleen, Linus Torvalds, dmarkh@cfl.rr.com, Alain Knaff,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar

On Fri, 2010-01-08 at 09:42 -0800, Mark Hounschell wrote:
> On 12/23/2009 03:30 PM, Pallipadi, Venkatesh wrote:
> 
> >>> Can you try this one line patch either on .28 or .32 (with /proc/interrupts
> >>> output).
> >>> This disables hpet2 and lapic timer should then be used on CPU 0. If things
> >>> work with this test patch, we will know that the failure is somehow related
> >>> to HPET usage in MSI mode.
> >>>
> >>> Thanks,
> >>> Venki
> >>>
> >>> Reduce the rating of percpu hpet timer
> >>>
> >>> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> >>> ---
> >>>  arch/x86/kernel/hpet.c |    2 +-
> >>>  1 files changed, 1 insertions(+), 1 deletions(-)
> >>>
> >>> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
> >>> index cafb1c6..f89d17a 100644
> >>> --- a/arch/x86/kernel/hpet.c
> >>> +++ b/arch/x86/kernel/hpet.c
> >>> @@ -480,7 +480,7 @@ static void init_one_hpet_msi_clockevent(struct hpet_dev *hdev, int cpu)
> >>>  	hpet_setup_irq(hdev);
> >>>  	evt->irq = hdev->irq;
> >>>  
> >>> -	evt->rating = 110;
> >>> +	evt->rating = 40;
> >>>  	evt->features = CLOCK_EVT_FEAT_ONESHOT;
> >>>  	if (hdev->flags & HPET_DEV_PERI_CAP)
> >>>  		evt->features |= CLOCK_EVT_FEAT_PERIODIC;
> >>
> >> That made it work. Used 2.6.32.2
> >>
> >> cat /proc/interrupts
> >>            CPU0       CPU1       CPU2       CPU3
> >>   0:         82          0          0          1   IO-APIC-edge      timer
> >>   1:          0          0          0         67   IO-APIC-edge      i8042
> >>   3:          0          0          0          6   IO-APIC-edge
> >>   4:          0          0          0          4   IO-APIC-edge
> >>   6:          0          0          0          4   IO-APIC-edge      floppy
> >>   8:          0          0          0          8   IO-APIC-edge      rtc0
> >>   9:          0          0          0          0   IO-APIC-fasteoi   acpi
> >>  12:          0          0         10       1519   IO-APIC-edge      i8042
> >>  14:          0          0         39      10995   IO-APIC-edge
> >> pata_atiixp
> >>  15:          0          0          3        391   IO-APIC-edge
> >> pata_atiixp
> >>  16:          0          0          2        606   IO-APIC-fasteoi
> >> aic79xx, ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel, Digi DBX2, ni-pci-gpib
> >>  17:          0          0          0          3   IO-APIC-fasteoi
> >> ehci_hcd:usb1, parport0, ni-pci-gpib
> >>  18:          0          0         10       2168   IO-APIC-fasteoi
> >> ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, Digi DBX2, nvidia
> >>  19:          0          0          0        130   IO-APIC-fasteoi
> >> aic7xxx, ehci_hcd:usb2, ttySLG0, eth1
> >>  22:          0          0          8       1151   IO-APIC-fasteoi   ahci
> >>  24:          0          0          0          0  HPET_MSI-edge      hpet2
> >>  29:          0          0          0         48   PCI-MSI-edge
> >> sky2@pci:0000:04:00.0
> >> NMI:          0          0          0          0   Non-maskable interrupts
> >> LOC:      34842      30177      29672      29632   Local timer interrupts
> >> SPU:          0          0          0          0   Spurious interrupts
> >> PMI:          0          0          0          0   Performance monitoring
> >> interrupts
> >> PND:          0          0          0          0   Performance pending work
> >> RES:      17501      20449      16670      11224   Rescheduling interrupts
> >> CAL:      10554       2336       1102       1071   Function call interrupts
> >> TLB:        364        562        753        468   TLB shootdowns
> >> ERR:          0
> >> MIS:          0
> >>
> >>
> >> # fdformat /dev/fd0u1440
> >> Double-sided, 80 tracks, 18 sec/track. Total capacity 1440 kB.
> >> Formatting ... done
> >> Verifying ... done
> > 
> > Hmmm.. Thats very interesting indeed.
> > 
> > That clearly says that HPET MSI interrupts somehow is causing some
> > caching side effect in the chipset that results in this floppy dma
> > failure.
> > 
> > Here's is what we have until now.
> > IRQ 0 is based on HPET legacy interrupt and HPET device is also capable
> > of MSI on this platform. So we also have a percpu hpet (hpet2 tied to
> > CPU0). percpu hpet was added to avoid the usage of IRQ0+LAPIC broadcast
> > in cases where LAPIC timer will stop working in deep C-state. As we have
> > only one HPET channel free for percpu HPET, we only have hpet2 tied to
> > CPU 0 and other CPUs still have to go through IRQ0+LAPIC broadcast with
> > deep C-state.
> > 
> > One problem here is that percpu hpet should only get used when LAPIC
> > cannot be used (that is when CPU enters deep C-state). Using hpet2 in
> > place of LAPIC timer even when deep C-state is not supported is not
> > right in terms of performance. We need some changes here to fix that
> > [Problem 1].
> > 
> > But, that still does not explain why we are seeing this problem in the
> > first place. I mean, using hpet2 is not optimal, but should not have
> > functionality issues like this. Even fixing [Problem 1] above, we may
> > see this problem on some other platform that supports deep C-state and
> > so has hpet2 enabled for a valid reason.
> > 
> > Also, I am not sure whether the problem also happens if legacy HPET
> > interrupts are used during run time in place of LAPIC timer (May be
> > worth to try this with a simple test patch, let me think about it). In
> > this case, legacy HPET interrupt rightly goes quiet after boot, giving
> > priority to LAPIC timer.
> > 
> > With hpet MSI interrupts, we do a write followed by read of HPET
> > memmapped register to set a HPET channel timeout + read of global HPET
> > timer. This happens on every timer interrupt on CPU 0. And we also have
> > MSI interrupt being delivered to CPU 0. I cannot think of any reason why
> > this can break dma. We can probably try adding some dummy HPET read
> > after dma write, to see if that flushes things properly.
> > 
> 
> Haven't seen any activity on this thread in a while. Just curious, are we
> still working this?
> Is there anything else I can do to help?

Sorry for not following up on this. We have narrowed this down to HPET
MSI and floppy DMA. I still don't know how HPET MSI interrupts are
breaking floppy DMA.

You are seeing the problem on two different systems. Correct? Do you
have any system where this works with HPET MSI enabled?

There are a couple of options for how we can go about this one:
1) Change the HPET MSI code so that it is not activated when there are no
C-states that involve LAPIC stoppage. This will resolve the problem on
the systems you reported, as they have no deep C-states. But I fear that
with the actual problem unresolved, we may hit it in the future on this or
some other platform that has the same issue and CPUs that support deep
C-states.
2) Try this testcase on a few other platforms that support HPET MSI and
deep C-states, check how widespread the problem is, and then add a
whitelist/blacklist for HPET MSI usage.
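
To make the two options concrete, here is a minimal sketch in plain C of
how they could combine into a single decision. Everything in it (struct
platform_info, its field names, the placeholder chipset ids) is
hypothetical and only illustrates the logic; it is not the eventual
kernel patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical platform description; none of these names are kernel APIs. */
struct platform_info {
	bool hpet_msi_capable;           /* HPET can deliver MSI interrupts */
	bool lapic_stops_in_deep_cstate; /* a supported C-state stops the LAPIC timer */
	unsigned int chipset_id;         /* opaque, illustrative identifier */
};

/* Placeholder ids standing in for chipsets found to misbehave (option 2). */
static const unsigned int hpet_msi_blacklist[] = { 0x0001, 0x0002 };

static bool chipset_blacklisted(unsigned int id)
{
	size_t i;

	for (i = 0; i < sizeof(hpet_msi_blacklist) / sizeof(hpet_msi_blacklist[0]); i++)
		if (hpet_msi_blacklist[i] == id)
			return true;
	return false;
}

/* Decide whether a per-CPU HPET MSI timer should be activated. */
static bool use_percpu_hpet_msi(const struct platform_info *p)
{
	assert(p != NULL);

	if (!p->hpet_msi_capable)
		return false;
	/* Option 2: never use HPET MSI on known-bad chipsets. */
	if (chipset_blacklisted(p->chipset_id))
		return false;
	/* Option 1: only needed when the LAPIC timer can stop. */
	return p->lapic_stops_in_deep_cstate;
}
```

With this shape, a box without deep C-states never activates the per-CPU
HPET timer, and a blacklisted chipset is refused even when deep C-states
are present.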

I think option 1 is better for 2.6.33. I will work on that and send
patches for you to test.

Thanks,
Venki 
 



* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2010-01-12  0:19                                                                                                                               ` Pallipadi, Venkatesh
@ 2010-01-12  9:04                                                                                                                                 ` Mark Hounschell
  2010-01-15  2:01                                                                                                                                   ` Pallipadi, Venkatesh
  0 siblings, 1 reply; 74+ messages in thread
From: Mark Hounschell @ 2010-01-12  9:04 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: markh@compro.net, Andi Kleen, Linus Torvalds, Alain Knaff,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar

On 01/11/2010 07:19 PM, Pallipadi, Venkatesh wrote:
> On Fri, 2010-01-08 at 09:42 -0800, Mark Hounschell wrote:
>> On 12/23/2009 03:30 PM, Pallipadi, Venkatesh wrote:
>>
>>>>> Can you try this one line patch either on .28 or .32 (with /proc/interrupts
>>>>> output).
>>>>> This disables hpet2 and lapic timer should then be used on CPU 0. If things
>>>>> work with this test patch, we will know that the failure is somehow related
>>>>> to HPET usage in MSI mode.
>>>>>
>>>>> Thanks,
>>>>> Venki
>>>>>
>>>>> Reduce the rating of percpu hpet timer
>>>>>
>>>>> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
>>>>> ---
>>>>>  arch/x86/kernel/hpet.c |    2 +-
>>>>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
>>>>> index cafb1c6..f89d17a 100644
>>>>> --- a/arch/x86/kernel/hpet.c
>>>>> +++ b/arch/x86/kernel/hpet.c
>>>>> @@ -480,7 +480,7 @@ static void init_one_hpet_msi_clockevent(struct hpet_dev *hdev, int cpu)
>>>>>  	hpet_setup_irq(hdev);
>>>>>  	evt->irq = hdev->irq;
>>>>>  
>>>>> -	evt->rating = 110;
>>>>> +	evt->rating = 40;
>>>>>  	evt->features = CLOCK_EVT_FEAT_ONESHOT;
>>>>>  	if (hdev->flags & HPET_DEV_PERI_CAP)
>>>>>  		evt->features |= CLOCK_EVT_FEAT_PERIODIC;
>>>>
>>>> That made it work. Used 2.6.32.2
>>>>
>>>> cat /proc/interrupts
>>>>            CPU0       CPU1       CPU2       CPU3
>>>>   0:         82          0          0          1   IO-APIC-edge      timer
>>>>   1:          0          0          0         67   IO-APIC-edge      i8042
>>>>   3:          0          0          0          6   IO-APIC-edge
>>>>   4:          0          0          0          4   IO-APIC-edge
>>>>   6:          0          0          0          4   IO-APIC-edge      floppy
>>>>   8:          0          0          0          8   IO-APIC-edge      rtc0
>>>>   9:          0          0          0          0   IO-APIC-fasteoi   acpi
>>>>  12:          0          0         10       1519   IO-APIC-edge      i8042
>>>>  14:          0          0         39      10995   IO-APIC-edge
>>>> pata_atiixp
>>>>  15:          0          0          3        391   IO-APIC-edge
>>>> pata_atiixp
>>>>  16:          0          0          2        606   IO-APIC-fasteoi
>>>> aic79xx, ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel, Digi DBX2, ni-pci-gpib
>>>>  17:          0          0          0          3   IO-APIC-fasteoi
>>>> ehci_hcd:usb1, parport0, ni-pci-gpib
>>>>  18:          0          0         10       2168   IO-APIC-fasteoi
>>>> ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, Digi DBX2, nvidia
>>>>  19:          0          0          0        130   IO-APIC-fasteoi
>>>> aic7xxx, ehci_hcd:usb2, ttySLG0, eth1
>>>>  22:          0          0          8       1151   IO-APIC-fasteoi   ahci
>>>>  24:          0          0          0          0  HPET_MSI-edge      hpet2
>>>>  29:          0          0          0         48   PCI-MSI-edge
>>>> sky2@pci:0000:04:00.0
>>>> NMI:          0          0          0          0   Non-maskable interrupts
>>>> LOC:      34842      30177      29672      29632   Local timer interrupts
>>>> SPU:          0          0          0          0   Spurious interrupts
>>>> PMI:          0          0          0          0   Performance monitoring
>>>> interrupts
>>>> PND:          0          0          0          0   Performance pending work
>>>> RES:      17501      20449      16670      11224   Rescheduling interrupts
>>>> CAL:      10554       2336       1102       1071   Function call interrupts
>>>> TLB:        364        562        753        468   TLB shootdowns
>>>> ERR:          0
>>>> MIS:          0
>>>>
>>>>
>>>> # fdformat /dev/fd0u1440
>>>> Double-sided, 80 tracks, 18 sec/track. Total capacity 1440 kB.
>>>> Formatting ... done
>>>> Verifying ... done
>>>
>>> Hmmm.. Thats very interesting indeed.
>>>
>>> That clearly says that HPET MSI interrupts somehow is causing some
>>> caching side effect in the chipset that results in this floppy dma
>>> failure.
>>>
>>> Here's is what we have until now.
>>> IRQ 0 is based on HPET legacy interrupt and HPET device is also capable
>>> of MSI on this platform. So we also have a percpu hpet (hpet2 tied to
>>> CPU0). percpu hpet was added to avoid the usage of IRQ0+LAPIC broadcast
>>> in cases where LAPIC timer will stop working in deep C-state. As we have
>>> only one HPET channel free for percpu HPET, we only have hpet2 tied to
>>> CPU 0 and other CPUs still have to go through IRQ0+LAPIC broadcast with
>>> deep C-state.
>>>
>>> One problem here is that percpu hpet should only get used when LAPIC
>>> cannot be used (that is when CPU enters deep C-state). Using hpet2 in
>>> place of LAPIC timer even when deep C-state is not supported is not
>>> right in terms of performance. We need some changes here to fix that
>>> [Problem 1].
>>>
>>> But, that still does not explain why we are seeing this problem in the
>>> first place. I mean, using hpet2 is not optimal, but should not have
>>> functionality issues like this. Even fixing [Problem 1] above, we may
>>> see this problem on some other platform that supports deep C-state and
>>> so has hpet2 enabled for a valid reason.
>>>
>>> Also, I am not sure whether the problem also happens if legacy HPET
>>> interrupts are used during run time in place of LAPIC timer (May be
>>> worth to try this with a simple test patch, let me think about it). In
>>> this case, legacy HPET interrupt rightly goes quiet after boot, giving
>>> priority to LAPIC timer.
>>>
>>> With hpet MSI interrupts, we do a write followed by read of HPET
>>> memmapped register to set a HPET channel timeout + read of global HPET
>>> timer. This happens on every timer interrupt on CPU 0. And we also have
>>> MSI interrupt being delivered to CPU 0. I cannot think of any reason why
>>> this can break dma. We can probably try adding some dummy HPET read
>>> after dma write, to see if that flushes things properly.
>>>
>>
>> Haven't seen any activity on this thread in a while. Just curious, are we
>> still working this?
>> Is there anything else I can do to help?
> 
> Sorry for not following up on this. We have narrowed this down to HPET
> MSI and floppy DMA. I still don't know how HPET MSI interrupts are
> breaking floppy DMA.
> 
> You are seeing the problem on two different systems. Correct? Do you
> have any system where this works with HPET MSI enabled?
> 

I see the problem on every system on which hpet2 shows up in
/proc/interrupts. The machines that work with HPET enabled don't show HPET
at all in /proc/interrupts. I have some of each. All the boxes that fail
here use (AMD) 790x series chipsets.
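
A quick way to check a box for this, as a minimal sketch: the C helper
below scans /proc/interrupts-style text for lines naming an HPET_MSI
vector and sums their per-CPU counts. The function names are made up for
illustration, and it assumes the column layout of the listing quoted
above; it is not taken from any kernel or fdutils source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Find needle within the first len bytes of line; NULL if absent. */
static const char *find_in_line(const char *line, size_t len, const char *needle)
{
	size_t n = strlen(needle);
	size_t i;

	for (i = 0; i + n <= len; i++)
		if (memcmp(line + i, needle, n) == 0)
			return line + i;
	return NULL;
}

/*
 * Sum the per-CPU counts on every line that names an HPET_MSI vector.
 * Returns -1 if no such line exists in the text.
 */
static long hpet_msi_interrupt_total(const char *text)
{
	long total = -1;
	const char *line = text;

	assert(text != NULL);
	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);
		const char *tag = find_in_line(line, len, "HPET_MSI");
		const char *colon = memchr(line, ':', len);

		if (tag && colon && colon < tag) {
			const char *p = colon + 1;

			if (total < 0)
				total = 0;
			/* Counts sit between "NN:" and the chip name. */
			while (p < tag) {
				char *end;
				long n = strtol(p, &end, 10);

				if (end == p)
					break;	/* reached the chip name */
				total += n;
				p = end;
			}
		}
		line += len;
		if (*line == '\n')
			line++;
	}
	return total;
}
```

A result of -1 means no HPET MSI vector is listed at all; 0 means the
vector is registered but idle, as in the listing taken with the rating
patch applied; anything larger means the per-CPU HPET timer is in use.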

> Couple of options on how we can go about this one:
> 1) Change the HPET-MSI change to not get activated when there are no
> C-states with LAPIC stoppage involved. This will resolve the problem on
> the systems you reported as there are no deep C-states. But, I fear that
> with the actual problem unresolved, we may hit it in future with this or
> some other platform having same issue with CPUs that support deep
> C-state.
> 2) Try this testcase on few other platforms that support HPET-MSI and
> deep C-states and check how widespread the problem is and then add a
> whitelist-blacklist for HPET MSI usage.
> 
> I think, for 2.6.33 option 1 is better. Will work on that and send in
> patches for you test.
> 

OK, thanks
Mark



* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2010-01-12  9:04                                                                                                                                 ` Mark Hounschell
@ 2010-01-15  2:01                                                                                                                                   ` Pallipadi, Venkatesh
  2010-01-15  9:39                                                                                                                                     ` Mark Hounschell
  2010-01-15 18:02                                                                                                                                     ` Mark Hounschell
  0 siblings, 2 replies; 74+ messages in thread
From: Pallipadi, Venkatesh @ 2010-01-15  2:01 UTC (permalink / raw)
  To: dmarkh@cfl.rr.com
  Cc: markh@compro.net, Andi Kleen, Linus Torvalds, Alain Knaff,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar

On Tue, 2010-01-12 at 01:04 -0800, Mark Hounschell wrote:
> On 01/11/2010 07:19 PM, Pallipadi, Venkatesh wrote:
> > 
> > Sorry for not following up on this. We have narrowed this down to HPET
> > MSI and floppy DMA. I still don't know how HPET MSI interrupts are
> > breaking floppy DMA.
> > 
> > You are seeing the problem on two different systems. Correct? Do you
> > have any system where this works with HPET MSI enabled?
> > 
> 
> I see the problem on every system in which the HPET2 shows up in
> /proc/interrupts. The machines that work with HPET enabled don't show HPET
> at all in /proc/interrupts.  I have some of each. All the boxes that fail
> here use the (AMD) 790x series chip sets.
> 
> > Couple of options on how we can go about this one:
> > 1) Change the HPET-MSI change to not get activated when there are no
> > C-states with LAPIC stoppage involved. This will resolve the problem on
> > the systems you reported as there are no deep C-states. But, I fear that
> > with the actual problem unresolved, we may hit it in future with this or
> > some other platform having same issue with CPUs that support deep
> > C-state.
> > 2) Try this testcase on few other platforms that support HPET-MSI and
> > deep C-states and check how widespread the problem is and then add a
> > whitelist-blacklist for HPET MSI usage.
> > 
> > I think, for 2.6.33 option 1 is better. Will work on that and send in
> > patches for you test.
> > 
> 

Mark,

I just sent out a patchset that should work around the problem here. Can
you check and let me know whether that's the case?

We will still need a simpler/smaller workaround for .33. I will send a
patch for that soon.

Also, are you testing this with a USB floppy controller? I tried to test
it on my end, but fdformat doesn't seem to like my USB floppy drive. I
tried 'ufiformat -f 1440 <dev>', with which I am not able to reproduce
the failure on any of my boxes. I am not sure whether that really means I
don't hit this bug or that it is going through a totally different code path.

Thanks,
Venki
 



* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2010-01-15  2:01                                                                                                                                   ` Pallipadi, Venkatesh
@ 2010-01-15  9:39                                                                                                                                     ` Mark Hounschell
  2010-01-15 18:02                                                                                                                                     ` Mark Hounschell
  1 sibling, 0 replies; 74+ messages in thread
From: Mark Hounschell @ 2010-01-15  9:39 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: markh@compro.net, Andi Kleen, Linus Torvalds, Alain Knaff,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar

On 01/14/2010 09:01 PM, Pallipadi, Venkatesh wrote:
> On Tue, 2010-01-12 at 01:04 -0800, Mark Hounschell wrote:
>> On 01/11/2010 07:19 PM, Pallipadi, Venkatesh wrote:
>>>
>>> Sorry for not following up on this. We have narrowed this down to HPET
>>> MSI and floppy DMA. I still don't know how HPET MSI interrupts are
>>> breaking floppy DMA.
>>>
>>> You are seeing the problem on two different systems. Correct? Do you
>>> have any system where this works with HPET MSI enabled?
>>>
>>
>> I see the problem on every system in which the HPET2 shows up in
>> /proc/interrupts. The machines that work with HPET enabled don't show HPET
>> at all in /proc/interrupts.  I have some of each. All the boxes that fail
>> here use the (AMD) 790x series chip sets.
>>
>>> Couple of options on how we can go about this one:
>>> 1) Change the HPET-MSI change to not get activated when there are no
>>> C-states with LAPIC stoppage involved. This will resolve the problem on
>>> the systems you reported as there are no deep C-states. But, I fear that
>>> with the actual problem unresolved, we may hit it in future with this or
>>> some other platform having same issue with CPUs that support deep
>>> C-state.
>>> 2) Try this testcase on few other platforms that support HPET-MSI and
>>> deep C-states and check how widespread the problem is and then add a
>>> whitelist-blacklist for HPET MSI usage.
>>>
>>> I think, for 2.6.33 option 1 is better. Will work on that and send in
>>> patches for you test.
>>>
>>
> 
> Mark,
> 
> I just sent out a patchset that should workaround the problem here. Can
> you check and let me know whether thats the case.
> 

Yes, I'll try that today. I assume I'll find them on LKML.

> We will still need a simpler/smaller workaround for .33. Will send a
> patch for that soon.
> 
> Also, are you testing this with usb floppy controller? I tried to test
> it on my end, but fdformat doesn't seem to like my usb floppy drive. I
> tried, 'ufiformat -f 1440 <dev>', with which I am not able to reproduce
> the failure on any of my boxes. Not sure whether that really means I
> don't hit this bug or that is going through totally different code path.
> 

No, I've never even seen a USB floppy controller.

Mark


* Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28
  2010-01-15  2:01                                                                                                                                   ` Pallipadi, Venkatesh
  2010-01-15  9:39                                                                                                                                     ` Mark Hounschell
@ 2010-01-15 18:02                                                                                                                                     ` Mark Hounschell
  2010-01-21 19:09                                                                                                                                       ` [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 Pallipadi, Venkatesh
  1 sibling, 1 reply; 74+ messages in thread
From: Mark Hounschell @ 2010-01-15 18:02 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: dmarkh@cfl.rr.com, Andi Kleen, Linus Torvalds, Alain Knaff,
	Linux Kernel Mailing List, fdutils@fdutils.linux.lu, Li, Shaohua,
	Ingo Molnar

On 01/14/2010 09:01 PM, Pallipadi, Venkatesh wrote:
> On Tue, 2010-01-12 at 01:04 -0800, Mark Hounschell wrote:
>> On 01/11/2010 07:19 PM, Pallipadi, Venkatesh wrote:
>>>
>>> Sorry for not following up on this. We have narrowed this down to HPET
>>> MSI and floppy DMA. I still don't know how HPET MSI interrupts are
>>> breaking floppy DMA.
>>>
>>> You are seeing the problem on two different systems. Correct? Do you
>>> have any system where this works with HPET MSI enabled?
>>>
>>
>> I see the problem on every system in which the HPET2 shows up in
>> /proc/interrupts. The machines that work with HPET enabled don't show HPET
>> at all in /proc/interrupts.  I have some of each. All the boxes that fail
>> here use the (AMD) 790x series chip sets.
>>
>>> Couple of options on how we can go about this one:
>>> 1) Change the HPET-MSI change to not get activated when there are no
>>> C-states with LAPIC stoppage involved. This will resolve the problem on
>>> the systems you reported as there are no deep C-states. But, I fear that
>>> with the actual problem unresolved, we may hit it in future with this or
>>> some other platform having same issue with CPUs that support deep
>>> C-state.
>>> 2) Try this testcase on few other platforms that support HPET-MSI and
>>> deep C-states and check how widespread the problem is and then add a
>>> whitelist-blacklist for HPET MSI usage.
>>>
>>> I think, for 2.6.33 option 1 is better. Will work on that and send in
>>> patches for you test.
>>>
>>
> 
> Mark,
> 
> I just sent out a patchset that should workaround the problem here. Can
> you check and let me know whether thats the case.
> 

Yes, it does seem to fix the issue. The floppy formats, and /proc/interrupts
looks normal, with no activity on the hpet2 MSI.

Regards
Mark



* [PATCH] x86: Disable HPET MSI on ATI SB700/SB800
  2010-01-15 18:02                                                                                                                                     ` Mark Hounschell
@ 2010-01-21 19:09                                                                                                                                       ` Pallipadi, Venkatesh
  2010-01-22 22:00                                                                                                                                         ` [tip:x86/urgent] " tip-bot for Pallipadi, Venkatesh
                                                                                                                                                           ` (2 more replies)
  0 siblings, 3 replies; 74+ messages in thread
From: Pallipadi, Venkatesh @ 2010-01-21 19:09 UTC (permalink / raw)
  To: Mark Hounschell
  Cc: Pallipadi, Venkatesh, dmarkh@cfl.rr.com, Andi Kleen,
	Linus Torvalds, Alain Knaff, Linux Kernel Mailing List,
	fdutils@fdutils.linux.lu, Li, Shaohua, Ingo Molnar, H Peter Anvin,
	Thomas Gleixner, stable


HPET MSI on platforms with ATI SB700/SB800 seems to have some
side effects on floppy DMA. Do not use HPET MSI on such platforms.

Original problem report from Mark Hounschell
http://lkml.indiana.edu/hypermail/linux/kernel/0912.2/01118.html

Tested-by: Mark Hounschell <markh@compro.net>

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
---

This patch needs to go to stable as well. But there are some conflicts that prevent
the patch from going in as is. I can rebase/resubmit to stable once the patch goes upstream.

 arch/x86/include/asm/hpet.h |    1 +
 arch/x86/kernel/hpet.c      |    8 ++++++++
 arch/x86/kernel/quirks.c    |   13 +++++++++++++
 3 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 5d89fd2..1d5c08a 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -67,6 +67,7 @@ extern unsigned long hpet_address;
 extern unsigned long force_hpet_address;
 extern u8 hpet_blockid;
 extern int hpet_force_user;
+extern u8 hpet_msi_disable;
 extern int is_hpet_enabled(void);
 extern int hpet_enable(void);
 extern void hpet_disable(void);
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index ba6e658..ad80a1c 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -34,6 +34,8 @@
  */
 unsigned long				hpet_address;
 u8					hpet_blockid; /* OS timer block num */
+u8					hpet_msi_disable;
+
 #ifdef CONFIG_PCI_MSI
 static unsigned long			hpet_num_timers;
 #endif
@@ -596,6 +598,9 @@ static void hpet_msi_capability_lookup(unsigned int start_timer)
 	unsigned int num_timers_used = 0;
 	int i;
 
+	if (hpet_msi_disable)
+		return;
+
 	if (boot_cpu_has(X86_FEATURE_ARAT))
 		return;
 	id = hpet_readl(HPET_ID);
@@ -928,6 +933,9 @@ static __init int hpet_late_init(void)
 	hpet_reserve_platform_timers(hpet_readl(HPET_ID));
 	hpet_print_config();
 
+	if (hpet_msi_disable)
+		return 0;
+
 	if (boot_cpu_has(X86_FEATURE_ARAT))
 		return 0;
 
diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
index 18093d7..12e9fea 100644
--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -491,6 +491,19 @@ void force_hpet_resume(void)
 		break;
 	}
 }
+
+/*
+ * HPET MSI on some boards (ATI SB700/SB800) has side effect on
+ * floppy DMA. Disable HPET MSI on such platforms.
+ */
+static void force_disable_hpet_msi(struct pci_dev *unused)
+{
+	hpet_msi_disable = 1;
+}
+
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS,
+			 force_disable_hpet_msi);
+
 #endif
 
 #if defined(CONFIG_PCI) && defined(CONFIG_NUMA)
-- 
1.6.0.6
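
For readers unfamiliar with PCI quirks, the mechanism of the patch above
can be modelled in userspace roughly as follows. This is a sketch, not
kernel code: the *_model names are invented, the fixup table only
imitates what DECLARE_PCI_FIXUP_HEADER arranges, and the numeric ids
(0x1002 for ATI, 0x4385 for the SBx00 SMBus device) are quoted from
memory of include/linux/pci_ids.h and should be verified there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for struct pci_dev; only the ids matter here. */
struct pci_dev_model {
	uint16_t vendor;
	uint16_t device;
};

/* Models the global flag the patch adds to arch/x86/kernel/hpet.c. */
static uint8_t hpet_msi_disable_model;

/* Models force_disable_hpet_msi(): the device itself is not inspected. */
static void force_disable_hpet_msi_model(const struct pci_dev_model *unused)
{
	(void)unused;
	hpet_msi_disable_model = 1;
}

struct fixup_model {
	uint16_t vendor;
	uint16_t device;
	void (*hook)(const struct pci_dev_model *dev);
};

/* Ids assumed to match PCI_VENDOR_ID_ATI / PCI_DEVICE_ID_ATI_SBX00_SMBUS. */
static const struct fixup_model header_fixups[] = {
	{ 0x1002, 0x4385, force_disable_hpet_msi_model },
};

/* Run every fixup whose vendor/device pair matches the probed device. */
static void run_header_fixups_model(const struct pci_dev_model *dev)
{
	size_t i;

	assert(dev != NULL);
	for (i = 0; i < sizeof(header_fixups) / sizeof(header_fixups[0]); i++)
		if (header_fixups[i].vendor == dev->vendor &&
		    header_fixups[i].device == dev->device)
			header_fixups[i].hook(dev);
}

/*
 * Models the early return added to hpet_msi_capability_lookup():
 * returns how many MSI-capable timers would be put to use.
 */
static int hpet_msi_timers_used_model(int msi_capable_timers)
{
	if (hpet_msi_disable_model)
		return 0;
	return msi_capable_timers;
}
```

The point of the design is that the quirk keys on the presence of a
southbridge device rather than on the HPET itself, so the flag is set
during PCI enumeration, before the HPET MSI setup code runs.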



* [tip:x86/urgent] x86: Disable HPET MSI on ATI SB700/SB800
  2010-01-21 19:09                                                                                                                                       ` [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 Pallipadi, Venkatesh
@ 2010-01-22 22:00                                                                                                                                         ` tip-bot for Pallipadi, Venkatesh
  2010-01-23  6:51                                                                                                                                         ` tip-bot for Pallipadi, Venkatesh
  2010-01-23  7:21                                                                                                                                         ` [PATCH] " Yuhong Bao
  2 siblings, 0 replies; 74+ messages in thread
From: tip-bot for Pallipadi, Venkatesh @ 2010-01-22 22:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, markh, stable, venkatesh.pallipadi,
	tglx

Commit-ID:  9f0b0ce525f19ef408e877b1c7662b60424c7cdc
Gitweb:     http://git.kernel.org/tip/9f0b0ce525f19ef408e877b1c7662b60424c7cdc
Author:     Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com>
AuthorDate: Thu, 21 Jan 2010 11:09:52 -0800
Committer:  H. Peter Anvin <hpa@zytor.com>
CommitDate: Fri, 22 Jan 2010 13:47:01 -0800

x86: Disable HPET MSI on ATI SB700/SB800

HPET MSI on platforms with ATI SB700/SB800 seems to have some
side effects on floppy DMA. Do not use HPET MSI on such platforms.

Original problem report from Mark Hounschell
http://lkml.indiana.edu/hypermail/linux/kernel/0912.2/01118.html

[ This patch needs to go to stable as well. But there are some
  conflicts that prevent the patch from going in as is. I can
  rebase/resubmit to stable once the patch goes upstream.
  hpa: still Cc:'ing stable@ as an FYI. ]

Tested-by: Mark Hounschell <markh@compro.net>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: <stable@kernel.org>
LKML-Reference: <20100121190952.GA32523@linux-os.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/include/asm/hpet.h |    1 +
 arch/x86/kernel/hpet.c      |    8 ++++++++
 arch/x86/kernel/quirks.c    |   13 +++++++++++++
 3 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 5d89fd2..1d5c08a 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -67,6 +67,7 @@ extern unsigned long hpet_address;
 extern unsigned long force_hpet_address;
 extern u8 hpet_blockid;
 extern int hpet_force_user;
+extern u8 hpet_msi_disable;
 extern int is_hpet_enabled(void);
 extern int hpet_enable(void);
 extern void hpet_disable(void);
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index ba6e658..ad80a1c 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -34,6 +34,8 @@
  */
 unsigned long				hpet_address;
 u8					hpet_blockid; /* OS timer block num */
+u8					hpet_msi_disable;
+
 #ifdef CONFIG_PCI_MSI
 static unsigned long			hpet_num_timers;
 #endif
@@ -596,6 +598,9 @@ static void hpet_msi_capability_lookup(unsigned int start_timer)
 	unsigned int num_timers_used = 0;
 	int i;
 
+	if (hpet_msi_disable)
+		return;
+
 	if (boot_cpu_has(X86_FEATURE_ARAT))
 		return;
 	id = hpet_readl(HPET_ID);
@@ -928,6 +933,9 @@ static __init int hpet_late_init(void)
 	hpet_reserve_platform_timers(hpet_readl(HPET_ID));
 	hpet_print_config();
 
+	if (hpet_msi_disable)
+		return 0;
+
 	if (boot_cpu_has(X86_FEATURE_ARAT))
 		return 0;
 
diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
index 18093d7..12e9fea 100644
--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -491,6 +491,19 @@ void force_hpet_resume(void)
 		break;
 	}
 }
+
+/*
+ * HPET MSI on some boards (ATI SB700/SB800) has side effect on
+ * floppy DMA. Disable HPET MSI on such platforms.
+ */
+static void force_disable_hpet_msi(struct pci_dev *unused)
+{
+	hpet_msi_disable = 1;
+}
+
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS,
+			 force_disable_hpet_msi);
+
 #endif
 
 #if defined(CONFIG_PCI) && defined(CONFIG_NUMA)


* [tip:x86/urgent] x86: Disable HPET MSI on ATI SB700/SB800
  2010-01-21 19:09                                                                                                                                       ` [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 Pallipadi, Venkatesh
  2010-01-22 22:00                                                                                                                                         ` [tip:x86/urgent] " tip-bot for Pallipadi, Venkatesh
@ 2010-01-23  6:51                                                                                                                                         ` tip-bot for Pallipadi, Venkatesh
  2010-01-23  7:21                                                                                                                                         ` [PATCH] " Yuhong Bao
  2 siblings, 0 replies; 74+ messages in thread
From: tip-bot for Pallipadi, Venkatesh @ 2010-01-23  6:51 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, markh, stable, venkatesh.pallipadi,
	tglx

Commit-ID:  73472a46b5b28116b145fb5fc05242c1aa8e1461
Gitweb:     http://git.kernel.org/tip/73472a46b5b28116b145fb5fc05242c1aa8e1461
Author:     Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com>
AuthorDate: Thu, 21 Jan 2010 11:09:52 -0800
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Sat, 23 Jan 2010 06:21:58 +0100

x86: Disable HPET MSI on ATI SB700/SB800

HPET MSI on platforms with ATI SB700/SB800 seems to have some
side-effects on floppy DMA. Do not use HPET MSI on such platforms.

Original problem report from Mark Hounschell
http://lkml.indiana.edu/hypermail/linux/kernel/0912.2/01118.html

[ This patch needs to go to stable as well. But, there are some
  conflicts that prevent the patch from going in as is. I can
  rebase/resubmit to stable once the patch goes upstream.
  hpa: still Cc:'ing stable@ as an FYI. ]

Tested-by: Mark Hounschell <markh@compro.net>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: <stable@kernel.org>
LKML-Reference: <20100121190952.GA32523@linux-os.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/include/asm/hpet.h |    1 +
 arch/x86/kernel/hpet.c      |    8 ++++++++
 arch/x86/kernel/quirks.c    |   13 +++++++++++++
 3 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 5d89fd2..1d5c08a 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -67,6 +67,7 @@ extern unsigned long hpet_address;
 extern unsigned long force_hpet_address;
 extern u8 hpet_blockid;
 extern int hpet_force_user;
+extern u8 hpet_msi_disable;
 extern int is_hpet_enabled(void);
 extern int hpet_enable(void);
 extern void hpet_disable(void);
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index ba6e658..ad80a1c 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -34,6 +34,8 @@
  */
 unsigned long				hpet_address;
 u8					hpet_blockid; /* OS timer block num */
+u8					hpet_msi_disable;
+
 #ifdef CONFIG_PCI_MSI
 static unsigned long			hpet_num_timers;
 #endif
@@ -596,6 +598,9 @@ static void hpet_msi_capability_lookup(unsigned int start_timer)
 	unsigned int num_timers_used = 0;
 	int i;
 
+	if (hpet_msi_disable)
+		return;
+
 	if (boot_cpu_has(X86_FEATURE_ARAT))
 		return;
 	id = hpet_readl(HPET_ID);
@@ -928,6 +933,9 @@ static __init int hpet_late_init(void)
 	hpet_reserve_platform_timers(hpet_readl(HPET_ID));
 	hpet_print_config();
 
+	if (hpet_msi_disable)
+		return 0;
+
 	if (boot_cpu_has(X86_FEATURE_ARAT))
 		return 0;
 
diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
index 18093d7..12e9fea 100644
--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -491,6 +491,19 @@ void force_hpet_resume(void)
 		break;
 	}
 }
+
+/*
+ * HPET MSI on some boards (ATI SB700/SB800) has side effect on
+ * floppy DMA. Disable HPET MSI on such platforms.
+ */
+static void force_disable_hpet_msi(struct pci_dev *unused)
+{
+	hpet_msi_disable = 1;
+}
+
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS,
+			 force_disable_hpet_msi);
+
 #endif
 
 #if defined(CONFIG_PCI) && defined(CONFIG_NUMA)


* RE: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800
  2010-01-21 19:09                                                                                                                                       ` [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 Pallipadi, Venkatesh
  2010-01-22 22:00                                                                                                                                         ` [tip:x86/urgent] " tip-bot for Pallipadi, Venkatesh
  2010-01-23  6:51                                                                                                                                         ` tip-bot for Pallipadi, Venkatesh
@ 2010-01-23  7:21                                                                                                                                         ` Yuhong Bao
  2010-01-25 17:10                                                                                                                                           ` Andreas Herrmann
  2 siblings, 1 reply; 74+ messages in thread
From: Yuhong Bao @ 2010-01-23  7:21 UTC (permalink / raw)
  To: venkatesh.pallipadi, markh
  Cc: dmarkh, andi, Linus Torvalds, alain, linux-kernel,
	andreas.herrmann3


> HPET MSI on platforms with ATI SB700/SB800 as they seem to have some
> side-effects on floppy DMA. Do not use HPET MSI on such platforms.

I think somebody from AMD should review the situation. Clearly something is
happening inside their southbridge. CCing Andreas Herrmann from AMD.

Yuhong Bao


* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800
  2010-01-23  7:21                                                                                                                                         ` [PATCH] " Yuhong Bao
@ 2010-01-25 17:10                                                                                                                                           ` Andreas Herrmann
  2010-01-28  9:17                                                                                                                                             ` Mark Hounschell
  2010-05-17 14:59                                                                                                                                             ` Andreas Herrmann
  0 siblings, 2 replies; 74+ messages in thread
From: Andreas Herrmann @ 2010-01-25 17:10 UTC (permalink / raw)
  To: Yuhong Bao
  Cc: venkatesh.pallipadi, markh, dmarkh, andi, Linus Torvalds, alain,
	linux-kernel

On Fri, Jan 22, 2010 at 11:21:06PM -0800, Yuhong Bao wrote:
> 
> > HPET MSI on platforms with ATI SB700/SB800 as they seem to have some
> > side-effects on floppy DMA. Do not use HPET MSI on such platforms.

Argh, will see what information I can find about this problem ...

> I think somebody from AMD should review the situation. Clearly
> something is happening inside their southbridge. CCing Andreas
> Herrmann from AMD.

I have the feeling that this problem should rather be fixed with a DMI
quirk instead of disabling HPET MSI for the entire chipset.

Was the latest available BIOS installed on the affected system?


Thanks,
Andreas

-- 
Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632




* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800
  2010-01-25 17:10                                                                                                                                           ` Andreas Herrmann
@ 2010-01-28  9:17                                                                                                                                             ` Mark Hounschell
  2010-01-28 13:25                                                                                                                                               ` Mark Hounschell
  2010-05-17 14:59                                                                                                                                             ` Andreas Herrmann
  1 sibling, 1 reply; 74+ messages in thread
From: Mark Hounschell @ 2010-01-28  9:17 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Yuhong Bao, venkatesh.pallipadi, markh, andi, Linus Torvalds,
	alain, linux-kernel

On 01/25/2010 12:10 PM, Andreas Herrmann wrote:
> On Fri, Jan 22, 2010 at 11:21:06PM -0800, Yuhong Bao wrote:
>>
>>> HPET MSI on platforms with ATI SB700/SB800 as they seem to have some
>>> side-effects on floppy DMA. Do not use HPET MSI on such platforms.
> 
> Argh, will see what information I can find about this problem ...
> 
>> I think somebody from AMD should review the situation. Clearly
>> something is happening inside their southbridge. CCing Andreas
>> Herrmann from AMD.
> 
> I have the feeling that this problem should rather be fixed with a DMI
> quirk instead of disabling HPET MSI for the entire chipset.
> 
> Was the latest available BIOS installed on the affected system?
> 

You mean "systems" from different manufacturers? I will check today. Due to
misconfigured filters I didn't see this until today. Sorry.

Mark


* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800
  2010-01-28  9:17                                                                                                                                             ` Mark Hounschell
@ 2010-01-28 13:25                                                                                                                                               ` Mark Hounschell
  2010-01-28 13:41                                                                                                                                                 ` Borislav Petkov
  0 siblings, 1 reply; 74+ messages in thread
From: Mark Hounschell @ 2010-01-28 13:25 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: dmarkh, Yuhong Bao, venkatesh.pallipadi, andi, Linus Torvalds,
	alain, linux-kernel

On 01/28/2010 04:17 AM, Mark Hounschell wrote:
> On 01/25/2010 12:10 PM, Andreas Herrmann wrote:
>> On Fri, Jan 22, 2010 at 11:21:06PM -0800, Yuhong Bao wrote:
>>>
>>>> HPET MSI on platforms with ATI SB700/SB800 as they seem to have some
>>>> side-effects on floppy DMA. Do not use HPET MSI on such platforms.
>>
>> Argh, will see what information I can find about this problem ...
>>
>>> I think somebody from AMD should review the situation. Clearly
>>> something is happening inside their southbridge. CCing Andreas
>>> Herrmann from AMD.
>>
>> I have the feeling that this problem should rather be fixed with a DMI
>> quirk instead of disabling HPET MSI for the entire chipset.
>>
>> Was the latest available BIOS installed on the affected system?
>>
> 
> You mean "systems" from different manufacturers? I will check today. Due to
> misconfigured filters I didn't see this until today. Sorry.
> 
> Mark
> 

The BIOS was out of date on all of my affected boards, but updating it did
not help with the problem.

Andreas, while I have your ear, I am also having another issue with this
chipset: peer-to-peer bus transfers between PCI and PCIe buses, and between
PCIe buses, do not work. I've read the chipset specs and they _seem_
to imply that this may not be allowed because of something related to
"Trusted Computing". I've posted the issue to the AMD forums with no luck, and
I can't figure out why this doesn't work with these chipsets.

Sorry to change the subject. I just figured I'd ask someone from AMD while
I had the chance.

Thanks and Regards
Mark


* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800
  2010-01-28 13:25                                                                                                                                               ` Mark Hounschell
@ 2010-01-28 13:41                                                                                                                                                 ` Borislav Petkov
  2010-01-28 14:45                                                                                                                                                   ` Mark Hounschell
  0 siblings, 1 reply; 74+ messages in thread
From: Borislav Petkov @ 2010-01-28 13:41 UTC (permalink / raw)
  To: Mark Hounschell
  Cc: Andreas Herrmann, dmarkh, Yuhong Bao, venkatesh.pallipadi, andi,
	Linus Torvalds, alain, linux-kernel

On Thu, Jan 28, 2010 at 08:25:23AM -0500, Mark Hounschell wrote:
> On 01/28/2010 04:17 AM, Mark Hounschell wrote:
> > On 01/25/2010 12:10 PM, Andreas Herrmann wrote:
> >> On Fri, Jan 22, 2010 at 11:21:06PM -0800, Yuhong Bao wrote:
> >>>
> >>>> HPET MSI on platforms with ATI SB700/SB800 as they seem to have some
> >>>> side-effects on floppy DMA. Do not use HPET MSI on such platforms.
> >>
> >> Argh, will see what information I can find about this problem ...
> >>
> >>> I think somebody from AMD should review the situation. Clearly
> >>> something is happening inside their southbridge. CCing Andreas
> >>> Herrmann from AMD.
> >>
> >> I have the feeling that this problem should rather be fixed with a DMI
> >> quirk instead of disabling HPET MSI for the entire chipset.
> >>
> >> Was the latest available BIOS installed on the affected system?
> >>
> > 
> > You mean "systems" from different manufacturers? I will check today. Due to
> > misconfigured filters I didn't see this until today. Sorry.
> > 
> > Mark
> > 
> 
> The BIOS was out of date on all of my affected boards, but updating it did
> not help with the problem.

Hi,

can you post the BIOS vendors of the boards along with the respective
BIOS versions?

Thanks.

-- 
Regards/Gruss,
Boris.

--
Advanced Micro Devices, Inc.
Operating Systems Research Center



* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800
  2010-01-28 13:41                                                                                                                                                 ` Borislav Petkov
@ 2010-01-28 14:45                                                                                                                                                   ` Mark Hounschell
  0 siblings, 0 replies; 74+ messages in thread
From: Mark Hounschell @ 2010-01-28 14:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andreas Herrmann, dmarkh, Yuhong Bao, venkatesh.pallipadi, andi,
	Linus Torvalds, alain, linux-kernel

On 01/28/2010 08:41 AM, Borislav Petkov wrote:
> On Thu, Jan 28, 2010 at 08:25:23AM -0500, Mark Hounschell wrote:
>> On 01/28/2010 04:17 AM, Mark Hounschell wrote:
>>> On 01/25/2010 12:10 PM, Andreas Herrmann wrote:
>>>> On Fri, Jan 22, 2010 at 11:21:06PM -0800, Yuhong Bao wrote:
>>>>>
>>>>>> HPET MSI on platforms with ATI SB700/SB800 as they seem to have some
>>>>>> side-effects on floppy DMA. Do not use HPET MSI on such platforms.
>>>>
>>>> Argh, will see what information I can find about this problem ...
>>>>
>>>>> I think somebody from AMD should review the situation. Clearly
>>>>> something is happening inside their southbridge. CCing Andreas
>>>>> Herrmann from AMD.
>>>>
>>>> I have the feeling that this problem should rather be fixed with a DMI
>>>> quirk instead of disabling HPET MSI for the entire chipset.
>>>>
>>>> Was the latest available BIOS installed on the affected system?
>>>>
>>>
>>> You mean "systems" from different manufacturers? I will check today. Due to
>>> misconfigured filters I didn't see this until today. Sorry.
>>>
>>> Mark
>>>
>>
>> The BIOS was out of date on all of my affected boards, but updating it did
>> not help with the problem.
> 
> Hi,
> 
> can you post the BIOS vendors of the boards along with the respective
> BIOS versions?
> 
> Thanks.
> 

DFI  DK-790FXB-M3H5 MB using AWARD bios D7SDA09.BIN (10/09/2009)
BIOSTAR TA790GXB A2+ using AMI bios 78DDA928.BST  (09/28/09)

Regards
Mark


* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800
  2010-01-25 17:10                                                                                                                                           ` Andreas Herrmann
  2010-01-28  9:17                                                                                                                                             ` Mark Hounschell
@ 2010-05-17 14:59                                                                                                                                             ` Andreas Herrmann
  2010-05-17 15:10                                                                                                                                               ` Yuhong Bao
                                                                                                                                                                 ` (2 more replies)
  1 sibling, 3 replies; 74+ messages in thread
From: Andreas Herrmann @ 2010-05-17 14:59 UTC (permalink / raw)
  To: Yuhong Bao
  Cc: venkatesh.pallipadi, markh, dmarkh, andi, Linus Torvalds, alain,
	linux-kernel

On Mon, Jan 25, 2010 at 06:10:59PM +0100, Andreas Herrmann wrote:
> On Fri, Jan 22, 2010 at 11:21:06PM -0800, Yuhong Bao wrote:
> > 
> > > HPET MSI on platforms with ATI SB700/SB800 as they seem to have some
> > > side-effects on floppy DMA. Do not use HPET MSI on such platforms.
> 
> Argh, will see what information I can find about this problem ...

FYI. I've tried to trigger the publication of errata information for that
chipset. Finally this has happened.

The discussed problem is indeed due to an erratum. See erratum #27 in
http://support.amd.com/us/Embedded_TechDocs/46837.pdf

The suggested workaround for this is to disable HPET MSI if LPC
devices are used. I doubt that there is a convenient way for Linux to
find out whether LPC devices are used. Thus I think the only solution
to safely avoid the problem is the currently implemented quirk to
disable HPET MSI on this chipset.
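
To make the shape of the currently implemented quirk concrete, here is a
minimal user-space sketch in C. It is not the kernel code: struct pci_id,
run_header_fixups() and hpet_msi_timers_to_use() are hypothetical stand-ins
for the PCI fixup machinery and for the early return in
hpet_msi_capability_lookup(). The numeric IDs are what I believe
include/linux/pci_ids.h defines for PCI_VENDOR_ID_ATI and
PCI_DEVICE_ID_ATI_SBX00_SMBUS, so please verify them before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values of PCI_VENDOR_ID_ATI / PCI_DEVICE_ID_ATI_SBX00_SMBUS. */
#define VENDOR_ATI          0x1002
#define DEVICE_SBX00_SMBUS  0x4385

/* Hypothetical stand-in for the fields of struct pci_dev a fixup matches on. */
struct pci_id {
	uint16_t vendor;
	uint16_t device;
};

/* Mirrors the flag added by the patch: set once, never cleared. */
static uint8_t hpet_msi_disable;

/* Model of the header fixup pass: seeing the SBx00 SMBus function sets the flag. */
static void run_header_fixups(const struct pci_id *devs, size_t n)
{
	assert(devs != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (devs[i].vendor == VENDOR_ATI &&
		    devs[i].device == DEVICE_SBX00_SMBUS)
			hpet_msi_disable = 1;
}

/*
 * Model of the early return in the MSI capability lookup: with the flag
 * set, no HPET timer is put into MSI mode, however many are capable.
 */
static unsigned int hpet_msi_timers_to_use(unsigned int msi_capable_timers)
{
	if (hpet_msi_disable)
		return 0;
	return msi_capable_timers;
}
```

The point of the sketch is that the decision keys on the chipset alone, which
is why it does not need to know whether any LPC device is actually in use.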


Regards,

Andreas

-- 
Operating | Advanced Micro Devices GmbH
  System  | Einsteinring 24, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632




* RE: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800
  2010-05-17 14:59                                                                                                                                             ` Andreas Herrmann
@ 2010-05-17 15:10                                                                                                                                               ` Yuhong Bao
  2010-05-17 15:12                                                                                                                                               ` Linus Torvalds
  2010-05-18  0:56                                                                                                                                               ` Robert Hancock
  2 siblings, 0 replies; 74+ messages in thread
From: Yuhong Bao @ 2010-05-17 15:10 UTC (permalink / raw)
  To: andreas.herrmann3
  Cc: venkatesh.pallipadi, markh, dmarkh, andi, torvalds, alain,
	linux-kernel


> The suggested workaround for this is to disable HPET MSI if LPC
> devices are used. I doubt that there is a convenient way for Linux to
> find out whether LPC devices are used.

And don't forget that the Super I/O chip in most motherboards is an LPC
device! (In fact, that is what LPC was invented for.)

> Thus I think the only solution
> to safely avoid the problem is the currently implemented quirk to
> disable HPET MSI on this chipset.
Yuhong Bao

 		 	   		  


* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800
  2010-05-17 14:59                                                                                                                                             ` Andreas Herrmann
  2010-05-17 15:10                                                                                                                                               ` Yuhong Bao
@ 2010-05-17 15:12                                                                                                                                               ` Linus Torvalds
  2010-05-17 16:46                                                                                                                                                 ` Andreas Herrmann
  2010-05-18  0:56                                                                                                                                               ` Robert Hancock
  2 siblings, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2010-05-17 15:12 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Yuhong Bao, venkatesh.pallipadi, markh, dmarkh, andi, alain,
	linux-kernel



On Mon, 17 May 2010, Andreas Herrmann wrote:
> 
> FYI. I've tried to trigger the publication of errata information for that
> chipset. Finally this has happened.
> 
> The discussed problem is indeed due to an erratum. See erratum #27 in
> http://support.amd.com/us/Embedded_TechDocs/46837.pdf
> 
> The suggested workaround for this is to disable HPET MSI if LPC
> devices are used. I doubt that there is a convenient way for Linux to
> find out whether LPC devices are used. Thus I think the only solution
> to safely avoid the problem is the currently implemented quirk to
> disable HPET MSI on this chipset.

Goodie. It would be good to point this out in the source too. Would you be 
willing to send in a patch that documents this quirk as a result of that 
erratum #27, so that we don't lose sight of why we're doing that odd MSI 
disable?

		Linus


* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800
  2010-05-17 15:12                                                                                                                                               ` Linus Torvalds
@ 2010-05-17 16:46                                                                                                                                                 ` Andreas Herrmann
  0 siblings, 0 replies; 74+ messages in thread
From: Andreas Herrmann @ 2010-05-17 16:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Yuhong Bao, venkatesh.pallipadi@intel.com, markh@compro.net,
	dmarkh@cfl.rr.com, andi@firstfloor.org, alain@knaff.lu,
	linux-kernel@vger.kernel.org

On Mon, May 17, 2010 at 11:12:59AM -0400, Linus Torvalds wrote:
> 
> 
> On Mon, 17 May 2010, Andreas Herrmann wrote:
> > 
> > FYI. I've tried to trigger the publication of errata information for that
> > chipset. Finally this has happened.
> > 
> > The discussed problem is indeed due to an erratum. See erratum #27 in
> > http://support.amd.com/us/Embedded_TechDocs/46837.pdf
> > 
> > The suggested workaround for this is to disable HPET MSI if LPC
> > devices are used. I doubt that there is a convenient way for Linux to
> > find out whether LPC devices are used. Thus I think the only solution
> > to safely avoid the problem is the currently implemented quirk to
> > disable HPET MSI on this chipset.
> 
> Goodie. It would be good to point this out in the source too. Would you be 
> willing to send in a patch that documents this quirk as a result of that 
> erratum #27, so that we don't lose sight of why we're doing that odd MSI 
> disable?

Done that.
See http://marc.info/?l=linux-kernel&m=127411462230838


Andreas

-- 
Operating | Advanced Micro Devices GmbH
  System  | Einsteinring 24, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632




* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800
  2010-05-17 14:59                                                                                                                                             ` Andreas Herrmann
  2010-05-17 15:10                                                                                                                                               ` Yuhong Bao
  2010-05-17 15:12                                                                                                                                               ` Linus Torvalds
@ 2010-05-18  0:56                                                                                                                                               ` Robert Hancock
  2010-05-18  1:02                                                                                                                                                 ` Linus Torvalds
  2010-05-18  8:45                                                                                                                                                 ` Andi Kleen
  2 siblings, 2 replies; 74+ messages in thread
From: Robert Hancock @ 2010-05-18  0:56 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Yuhong Bao, venkatesh.pallipadi, markh, dmarkh, andi,
	Linus Torvalds, alain, linux-kernel

On 05/17/2010 08:59 AM, Andreas Herrmann wrote:
> On Mon, Jan 25, 2010 at 06:10:59PM +0100, Andreas Herrmann wrote:
>> On Fri, Jan 22, 2010 at 11:21:06PM -0800, Yuhong Bao wrote:
>>>
>>>> HPET MSI on platforms with ATI SB700/SB800 as they seem to have some
>>>> side-effects on floppy DMA. Do not use HPET MSI on such platforms.
>>
>> Argh, will see what information I can find about this problem ...
>
> FYI. I've tried to trigger the publication of errata information for that
> chipset. Finally this has happened.
>
> The discussed problem is indeed due to an erratum. See erratum #27 in
> http://support.amd.com/us/Embedded_TechDocs/46837.pdf
>
> The suggested workaround for this is to disable HPET MSI if LPC
> devices are used. I doubt that there is a convenient way for Linux to
> find out whether LPC devices are used. Thus I think the only solution
> to safely avoid the problem is the currently implemented quirk to
> disable HPET MSI on this chipset.

If one wanted, you could disable HPET MSI on this chipset only when a 
driver requests an ISA DMA channel. Then if there's no floppy or other 
LPC DMA device present, it can stay enabled. I don't know if it's worth 
the trouble, though.


* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800
  2010-05-18  0:56                                                                                                                                               ` Robert Hancock
@ 2010-05-18  1:02                                                                                                                                                 ` Linus Torvalds
  2010-05-18  1:06                                                                                                                                                   ` Robert Hancock
  2010-05-18  8:45                                                                                                                                                 ` Andi Kleen
  1 sibling, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2010-05-18  1:02 UTC (permalink / raw)
  To: Robert Hancock
  Cc: Andreas Herrmann, Yuhong Bao, venkatesh.pallipadi, markh, dmarkh,
	andi, alain, linux-kernel



On Mon, 17 May 2010, Robert Hancock wrote:
> 
> If one wanted, you could disable HPET MSI on this chipset only when a driver
> requests an ISA DMA channel. Then if there's no floppy or other LPC DMA device
> present, it can stay enabled. I don't know if it's worth the trouble, though.

Nope, that wouldn't work.

Imagine a driver that already loaded, and is already using MSI (say, 
network device). What happens now if you want to access the floppy and 
load the floppy module? Oh, you can't? Need to bring down the network 
interface, unload that module first? Not practical.

Sure, in theory we can do some crazy callback for "you now need to re-do 
your interrupt registration" for all devices. In practice, I can only say 
"not going to happen".

		Linus


* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800
  2010-05-18  1:02                                                                                                                                                 ` Linus Torvalds
@ 2010-05-18  1:06                                                                                                                                                   ` Robert Hancock
  0 siblings, 0 replies; 74+ messages in thread
From: Robert Hancock @ 2010-05-18  1:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andreas Herrmann, Yuhong Bao, venkatesh.pallipadi, markh, dmarkh,
	andi, alain, linux-kernel

On Mon, May 17, 2010 at 7:02 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Mon, 17 May 2010, Robert Hancock wrote:
>>
>> If one wanted, you could disable HPET MSI on this chipset only when a driver
>> requests an ISA DMA channel. Then if there's no floppy or other LPC DMA device
>> present, it can stay enabled. I don't know if it's worth the trouble, though.
>
> Nope, that wouldn't work.
>
> Imagine a driver that is already loaded, and is already using MSI (say, a
> network device). What happens now if you want to access the floppy and
> load the floppy module? Oh, you can't? Need to bring down the network
> interface, unload that module first? Not practical.
>
> Sure, in theory we can do some crazy callback for "you now need to re-do
> your interrupt registration" for all devices. In practice, I can only say
> "not going to happen".

It sounds like this bug only affects HPET MSI requests (presumably the
only ones that the southbridge can concern itself with), not any
others. It would require the HPET code to support having its MSI
support yanked away at runtime, though.

* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800
  2010-05-18  0:56                                                                                                                                               ` Robert Hancock
  2010-05-18  1:02                                                                                                                                                 ` Linus Torvalds
@ 2010-05-18  8:45                                                                                                                                                 ` Andi Kleen
  2010-05-18 23:22                                                                                                                                                   ` Robert Hancock
  1 sibling, 1 reply; 74+ messages in thread
From: Andi Kleen @ 2010-05-18  8:45 UTC (permalink / raw)
  To: Robert Hancock
  Cc: Andreas Herrmann, Yuhong Bao, venkatesh.pallipadi, markh, dmarkh,
	andi, Linus Torvalds, alain, linux-kernel

> If one wanted, you could disable HPET MSI on this chipset only when a 
> driver requests an ISA DMA channel. Then if there's no floppy or other LPC 
> DMA device present, it can stay enabled. I don't know if it's worth the 
> trouble, though.

There can be LPC devices which are not visible to the kernel,
but only used through ACPI or the BIOS. Think of fancy fan
controllers and similar.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: [PATCH] x86: Disable HPET MSI on ATI SB700/SB800
  2010-05-18  8:45                                                                                                                                                 ` Andi Kleen
@ 2010-05-18 23:22                                                                                                                                                   ` Robert Hancock
  0 siblings, 0 replies; 74+ messages in thread
From: Robert Hancock @ 2010-05-18 23:22 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andreas Herrmann, Yuhong Bao, venkatesh.pallipadi, markh, dmarkh,
	Linus Torvalds, alain, linux-kernel

On Tue, May 18, 2010 at 2:45 AM, Andi Kleen <andi@firstfloor.org> wrote:
>> If one wanted, you could disable HPET MSI on this chipset only when a
>> driver requests an ISA DMA channel. Then if there's no floppy or other LPC
>> DMA device present, it can stay enabled. I don't know if it's worth the
>> trouble, though.
>
> There can be LPC devices which are not visible to the kernel,
> but only used through ACPI or the BIOS. Think of fancy fan
> controllers and similar.

I would hope they wouldn't use DMA without kernel knowledge, otherwise
that really would be an abomination..

end of thread, other threads:[~2010-05-18 23:22 UTC | newest]

Thread overview: 74+ messages
     [not found] <4AFB3962.2020106@ntlworld.com>
     [not found] ` <4B2610F8.7050609@cfl.rr.com>
     [not found]   ` <4B2618EF.9020709@knaff.lu>
     [not found]     ` <4B264448.5040604@compro.net>
     [not found]       ` <4B26884C.8000306@knaff.lu>
     [not found]         ` <4B2697C4.2040204@compro.net>
     [not found]           ` <4B26A82E.5040902@knaff.lu>
     [not found]             ` <4B26B031.4060301@compro.net>
     [not found]               ` <4B26BAE3.2090408@knaff.lu>
     [not found]                 ` <4B275975.8040509@cfl.rr.com>
     [not found]                   ` <4B275B18.80704@knaff.lu>
     [not found]                     ` <4B275D37.4090807@cfl.rr.com>
     [not found]                       ` <4B2761E9.2030301@knaff.lu>
     [not found]                         ` <4B276513.6030509@cfl.rr.com>
     [not found]                           ` <4B276753.80807@knaff.lu>
     [not found]                             ` <4B27983F.5090600@compro.net>
     [not found]                               ` <4B27EF18.7050101@knaff.lu>
     [not found]                                 ` <4B28FDEB.3030800@compro.net>
     [not found]                                   ` <4B290029.90602@knaff.lu>
     [not found]                                     ` <4B2901DB.8040403@compro.net>
     [not found]                                       ` <4B29052B.9070406@knaff.lu>
     [not found]                                         ` <4B292D84.5040306@compro.net>
     [not found]                                           ` <4B29624F.2080109@knaff.lu>
     [not found]                                             ` <4B2A3805.8040707@compro.net>
     [not found]                                               ` <4B2A3E3E.8060405@knaff.lu>
     [not found]                                                 ` <4B2A4975.8020809@compro.net>
     [not found]                                                   ` <4B2A49F4.6070402@compro.net>
     [not found]                                                     ` <4B2A4B86.8060307@knaff.lu>
     [not found]                                                       ` <4B2A4C78.10107@compro.net>
     [not found]                                                         ` <4B2A4CF7.6040000@knaff.lu>
     [not found]                                                           ` <4B2A4EC9.2030902@compro.net>
     [not found]                                                             ` <4B2A4FA5.5000701@knaff.lu>
     [not found]                                                               ` <4B2A5192.6090602@compro.net>
     [not found]                                                                 ` <4B2A530D.3080606@knaff.lu>
     [not found]                                                                   ` <4B2A530D.3080606@knaff.lu>
2009-12-17 17:00                                                                     ` DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?) Alain Knaff
2009-12-17 17:27                                                                       ` Linus Torvalds
2009-12-17 18:21                                                                         ` DMA cache consistency bug introduced in 2.6.28 Krzysztof Halasa
2009-12-17 20:46                                                                         ` DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils] Cannot format floppies under kernel 2.6.*?) Alain Knaff
2009-12-17 21:14                                                                           ` Linus Torvalds
2009-12-17 22:11                                                                             ` Alain Knaff
2009-12-17 22:43                                                                               ` Linus Torvalds
2009-12-17 23:24                                                                                 ` Alain Knaff
2009-12-18  8:59                                                                                   ` Mark Hounschell
2009-12-18 10:55                                                                                     ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: " Mark Hounschell
2009-12-18 15:01                                                                                       ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 Krzysztof Halasa
2009-12-18 15:22                                                                                       ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?) Linus Torvalds
2009-12-18 15:28                                                                                         ` Mark Hounschell
2009-12-18 15:45                                                                                           ` Linus Torvalds
2009-12-18 20:04                                                                                             ` Mark Hounschell
2009-12-18 20:15                                                                                               ` Linus Torvalds
2009-12-22 15:11                                                                                                 ` Mark Hounschell
2009-12-22 17:38                                                                                                   ` Linus Torvalds
2009-12-22 17:57                                                                                                     ` Mark Hounschell
2009-12-22 23:37                                                                                                       ` Pallipadi, Venkatesh
2009-12-23  0:22                                                                                                         ` Mark Hounschell
2009-12-23 13:02                                                                                                           ` Mark Hounschell
2009-12-23 15:10                                                                                                             ` Pallipadi, Venkatesh
2009-12-23 15:34                                                                                                               ` Mark Hounschell
2009-12-23 15:57                                                                                                                 ` Mark Hounschell
2009-12-23 16:31                                                                                                                 ` Linus Torvalds
2009-12-23 16:38                                                                                                                   ` [Fdutils] DMA cache consistency bug introduced in 2.6.28 Andi Kleen
2009-12-23 16:49                                                                                                                     ` Linus Torvalds
2009-12-23 17:08                                                                                                                       ` Andi Kleen
2009-12-25 12:21                                                                                                                         ` Arjan van de Ven
2009-12-25 20:33                                                                                                                           ` Andi Kleen
2009-12-26  9:38                                                                                                                             ` Arjan van de Ven
2009-12-26 16:40                                                                                                                               ` Andi Kleen
2009-12-27 12:28                                                                                                                                 ` Alain Knaff
2009-12-28  1:54                                                                                                                                   ` Andi Kleen
2009-12-28 10:27                                                                                                                                     ` Alain Knaff
2009-12-28 14:54                                                                                                                                       ` Andi Kleen
2009-12-27 11:09                                                                                                                         ` Pavel Machek
2009-12-28 20:54                                                                                                                           ` Mark Hounschell
2009-12-23 17:19                                                                                                                       ` Pallipadi, Venkatesh
2009-12-23 17:16                                                                                                                         ` Andi Kleen
2009-12-23 20:11                                                                                                                       ` alain
2009-12-23 17:41                                                                                                                     ` Mark Hounschell
2009-12-23 18:01                                                                                                                       ` Linus Torvalds
2009-12-23 18:11                                                                                                                         ` Mark Hounschell
2009-12-23 19:18                                                                                                                       ` Pallipadi, Venkatesh
2009-12-23 19:35                                                                                                                         ` Mark Hounschell
2009-12-23 20:30                                                                                                                           ` Pallipadi, Venkatesh
2009-12-23 20:34                                                                                                                             ` alain
2009-12-23 21:34                                                                                                                               ` Pallipadi, Venkatesh
2010-01-08 17:42                                                                                                                             ` Mark Hounschell
2010-01-12  0:19                                                                                                                               ` Pallipadi, Venkatesh
2010-01-12  9:04                                                                                                                                 ` Mark Hounschell
2010-01-15  2:01                                                                                                                                   ` Pallipadi, Venkatesh
2010-01-15  9:39                                                                                                                                     ` Mark Hounschell
2010-01-15 18:02                                                                                                                                     ` Mark Hounschell
2010-01-21 19:09                                                                                                                                       ` [PATCH] x86: Disable HPET MSI on ATI SB700/SB800 Pallipadi, Venkatesh
2010-01-22 22:00                                                                                                                                         ` [tip:x86/urgent] " tip-bot for Pallipadi, Venkatesh
2010-01-23  6:51                                                                                                                                         ` tip-bot for Pallipadi, Venkatesh
2010-01-23  7:21                                                                                                                                         ` [PATCH] " Yuhong Bao
2010-01-25 17:10                                                                                                                                           ` Andreas Herrmann
2010-01-28  9:17                                                                                                                                             ` Mark Hounschell
2010-01-28 13:25                                                                                                                                               ` Mark Hounschell
2010-01-28 13:41                                                                                                                                                 ` Borislav Petkov
2010-01-28 14:45                                                                                                                                                   ` Mark Hounschell
2010-05-17 14:59                                                                                                                                             ` Andreas Herrmann
2010-05-17 15:10                                                                                                                                               ` Yuhong Bao
2010-05-17 15:12                                                                                                                                               ` Linus Torvalds
2010-05-17 16:46                                                                                                                                                 ` Andreas Herrmann
2010-05-18  0:56                                                                                                                                               ` Robert Hancock
2010-05-18  1:02                                                                                                                                                 ` Linus Torvalds
2010-05-18  1:06                                                                                                                                                   ` Robert Hancock
2010-05-18  8:45                                                                                                                                                 ` Andi Kleen
2010-05-18 23:22                                                                                                                                                   ` Robert Hancock
