* Re: 2.6.18-rc1-mm1
From: Reuben Farrelly @ 2006-07-09 11:22 UTC
To: Andrew Morton; +Cc: linux-kernel, Alan Cox, linux-acpi, Randy Dunlap, Greg KH
On 9/07/2006 9:11 p.m., Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18-rc1/2.6.18-rc1-mm1/
>
> - We're getting a relatively large number of crash reports coming out of the
> core sysfs/kobject/driver/bus code, and they're all really hard to diagnose.
>
> I am suspecting that what's happening is that some registration functions
> are failing and the caller is ignoring that failure. The code proceeds and
> crashes much later, in obscure ways.
>
> All these functions return error codes, and we're not checking them. We
> should. So there's a patch which marks all these things as __must_check,
> which causes around 1,500 new warnings.
>
> These are all bugs and they all need to be fixed.
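For readers unfamiliar with the annotation: with GCC, the kernel's compiler header defines __must_check as __attribute__((warn_unused_result)), so any caller that discards the return value gets a compile-time warning. The minimal user-space sketch below mirrors that definition. register_widget() and widget_init() are hypothetical stand-ins for the registration calls Andrew describes, not real kernel functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Mirrors the kernel's GCC definition: ignoring the result draws a warning. */
#define __must_check __attribute__((warn_unused_result))

static int registered;	/* set once the hypothetical object is registered */

/* Hypothetical stand-in for a registration call that can fail. */
static int __must_check register_widget(int fail)
{
	if (fail)
		return -ENOMEM;	/* e.g. a sysfs entry could not be allocated */
	registered = 1;
	return 0;
}

/* A caller that propagates the error instead of carrying on regardless. */
static int widget_init(int fail)
{
	int err = register_widget(fail);

	if (err) {
		fprintf(stderr, "widget: registration failed (%d)\n", err);
		return err;
	}
	assert(registered);	/* success means the object really is live */
	return 0;
}

/*
 * Writing "register_widget(0);" as a bare statement would trigger
 * -Wunused-result: the class of warning the __must_check patch adds.
 */
```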
Works. Well, it boots without crashing here and has been up for 30 or so
minutes without incident or so much as a log entry.
I assume that the bulk of those warnings about the return error codes will be
largely dealt with by individual maintainers as there are far too many to post here?
Some minor problems noted - possibly PCI/ACPI related (read on past the IDE bit
if that's not your cup of tea).
1. I've disabled the old IDE stuff and enabled Alan's IDE support
(CONFIG_SCSI_ATA_GENERIC=y). But it seems to be a bit unhappy with my IDE CD
burner:
ata_piix 0000:00:1f.1: version 2.00ac5
ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 18
PCI: Setting latency timer of device 0000:00:1f.1 to 64
ata5: PATA max UDMA/133 cmd 0x1F0 ctl 0x3F6 bmdma 0x30B0 irq 14
scsi4 : ata_piix
ata5.00: ATAPI, max UDMA/66
ata5.00: configured for UDMA/66
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata5.00: (BMDMA stat 0x24)
ata5.00: tag 0 cmd 0xa0 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata5: soft resetting port
ata5.00: configured for UDMA/66
ata5: EH complete
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata5.00: (BMDMA stat 0x24)
ata5.00: tag 0 cmd 0xa0 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata5: soft resetting port
ata5.00: configured for UDMA/66
Losing some ticks... checking if CPU frequency changed.
ata5: EH complete
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata5.00: (BMDMA stat 0x24)
ata5.00: tag 0 cmd 0xa0 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata5: soft resetting port
ata5.00: configured for UDMA/66
ata5: EH complete
ata5.00: limiting speed to UDMA/44
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata5.00: (BMDMA stat 0x24)
ata5.00: tag 0 cmd 0xa0 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata5: soft resetting port
ata5.00: configured for UDMA/44
ata5: EH complete
ata6: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0x30B8 irq 15
scsi5 : ata_piix
ata6: port disabled. ignoring.
ATA: abnormal status 0xFF on port 0x177
SCSI device sda: 586072368 512-byte hdwr sectors (300069 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
Note also the message midway through about losing some ticks, which if I recall
correctly is not new to this -mm release. I'm not sure who to cc about this.
The IDE device obviously ended up not being detected by the system. Usually
this device comes up as:
Jul 2 12:03:28 tornado kernel: hda: ATAPI 40X DVD-ROM DVD-R CD-R/RW drive,
2000kB Cache, UDMA(66)
2. Onto some more minor warnings:
ACPI: bus type pci registered
PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
PCI: Not using MMCONFIG.
PCI: Using configuration type 1
ACPI: Interpreter enabled
Is there any way to verify that there really is a BIOS bug there? If it is, is
there anyone within Intel or are there any known contacts who can push and poke
to get this looked at/fixed? (It's a new Intel board, I'd hope they could get
it right..).
Plus we're not using MMCONFIG - even though I have it enabled.
Based on previous postings to lkml, I believe Randy Dunlap may have one of these
boards too - Randy are you seeing this and the next bunch of warnings I am seeing?
3. Power Management warnings, been there ages, but I've had bigger things to
worry about (like fatal oopses) so haven't bothered asking:
Device `[PEX0]' is not power manageable
ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:00:1c.0 to 64
Device `[PEX2]' is not power manageable
ACPI: PCI Interrupt 0000:00:1c.2[C] -> GSI 18 (level, low) -> IRQ 18
PCI: Setting latency timer of device 0000:00:1c.2 to 64
Device `[PEX3]' is not power manageable
ACPI: PCI Interrupt 0000:00:1c.3[D] -> GSI 19 (level, low) -> IRQ 19
PCI: Setting latency timer of device 0000:00:1c.3 to 64
Device `[PEX4]' is not power manageable
ACPI: PCI Interrupt 0000:00:1c.4[A] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:00:1c.4 to 64
Device `[PEX5]' is not power manageable
ACPI: PCI Interrupt 0000:00:1c.5[B] -> GSI 16 (level, low) -> IRQ 16
and
Device `[IDES]' is not power manageable
[root@tornado ~]# cat /proc/interrupts
           CPU0       CPU1
  0:     258266          0    IO-APIC-edge  timer
  4:        355          0    IO-APIC-edge  serial
  6:          5          0    IO-APIC-edge  floppy
  8:          1          0    IO-APIC-edge  rtc
  9:          0          0 IO-APIC-fasteoi  acpi
 14:         28          0    IO-APIC-edge  libata
 15:          0          0    IO-APIC-edge  libata
 16:          0          0 IO-APIC-fasteoi  uhci_hcd:usb5
 18:          0          0 IO-APIC-fasteoi  uhci_hcd:usb4
 19:        980          0 IO-APIC-fasteoi  uhci_hcd:usb3, serial
 23:        105          0 IO-APIC-fasteoi  ehci_hcd:usb1, uhci_hcd:usb2
313:      82513          0  PCI-MSI-<NULL>  eth0
314:      57370          0  PCI-MSI-<NULL>  libata
NMI:        217        188
LOC:     258118     257890
ERR:          0
MIS:          0
[root@tornado ~]#
The full dmesg is up at http://www.reub.net/files/kernel/2.6.18-rc1-mm1.dmesg
and config is up at http://www.reub.net/files/kernel/2.6.18-rc1-mm1.config
Minor issues and possibly most if not all are not of concern, but occasionally
supposedly minor things show up much bigger problems when questions are asked
and people start poking around :)
Reuben
* Re: 2.6.18-rc1-mm1
From: Andrew Morton @ 2006-07-09 12:22 UTC
To: Reuben Farrelly
Cc: linux-kernel, alan, linux-acpi, rdunlap, greg, john stultz,
Andi Kleen
On Sun, 09 Jul 2006 23:22:14 +1200
Reuben Farrelly <reuben-lkml@reub.net> wrote:
>
>
> On 9/07/2006 9:11 p.m., Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18-rc1/2.6.18-rc1-mm1/
> >
> > - We're getting a relatively large number of crash reports coming out of the
> > core sysfs/kobject/driver/bus code, and they're all really hard to diagnose.
> >
> > I am suspecting that what's happening is that some registration functions
> > are failing and the caller is ignoring that failure. The code proceeds and
> > crashes much later, in obscure ways.
> >
> > All these functions return error codes, and we're not checking them. We
> > should. So there's a patch which marks all these things as __must_check,
> > which causes around 1,500 new warnings.
> >
> > These are all bugs and they all need to be fixed.
>
> Works. Well, it boots without crashing here and has been up for 30 or so
> minutes without incident or so much as a log entry.
Shock. Have you tested suspend-to-ram and suspend-to-disk?
> I assume that the bulk of those warnings about the return error codes will be
> largely dealt with by individual maintainers as there are far too many to post here?
I admire your faith in your fellow man. I'll see what can be done to
reduce the warnings by changing some deregistration/removal API
functions so they return void. That should remove maybe half of them.
As for the rest I guess we just need to slam that patch into mainline and
start bitching at people.
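A rough sketch of the split Andrew describes, assuming the usual kernel goto-unwind style: registration keeps its __must_check error return, while the matching removal function returns void, so cleanup paths have nothing to check and only setup paths produce warnings. thing_register(), thing_unregister() and probe() are hypothetical names used purely for illustration.

```c
#include <assert.h>
#include <errno.h>

#define __must_check __attribute__((warn_unused_result))

static int nr_registered;	/* how many hypothetical objects are live */

/* Registration can fail, so callers must look at the result. */
static int __must_check thing_register(int fail)
{
	if (fail)
		return -EBUSY;
	nr_registered++;
	return 0;
}

/* Removal returns void: there is nothing a caller could do about a failure. */
static void thing_unregister(void)
{
	assert(nr_registered > 0);	/* only undo what was actually done */
	nr_registered--;
}

/* Register two objects; if the second fails, unwind the first. */
static int probe(int fail_second)
{
	int err;

	err = thing_register(0);
	if (err)
		goto out;

	err = thing_register(fail_second);
	if (err)
		goto undo_first;

	return 0;

undo_first:
	thing_unregister();
out:
	return err;
}
```

Because thing_unregister() returns nothing, the error path needs no extra checks, which is where the halving of warnings would come from.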
> Some minor problems noted - possibly PCI/ACPI related (read on past the IDE bit
> if that's not your cup of tea).
>
> 1. I've disabled the old IDE stuff and enabled Alan's IDE support
> (CONFIG_SCSI_ATA_GENERIC=y). But it seems to be a bit unhappy with my IDE CD
> burner:
>
> ata_piix 0000:00:1f.1: version 2.00ac5
> ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 18
> PCI: Setting latency timer of device 0000:00:1f.1 to 64
> ata5: PATA max UDMA/133 cmd 0x1F0 ctl 0x3F6 bmdma 0x30B0 irq 14
> scsi4 : ata_piix
> ata5.00: ATAPI, max UDMA/66
> ata5.00: configured for UDMA/66
> ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata5.00: (BMDMA stat 0x24)
> ata5.00: tag 0 cmd 0xa0 Emask 0x4 stat 0x40 err 0x0 (timeout)
> ata5: soft resetting port
> ata5.00: configured for UDMA/66
> ata5: EH complete
> ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata5.00: (BMDMA stat 0x24)
> ata5.00: tag 0 cmd 0xa0 Emask 0x4 stat 0x40 err 0x0 (timeout)
> ata5: soft resetting port
> ata5.00: configured for UDMA/66
> Losing some ticks... checking if CPU frequency changed.
> ata5: EH complete
> ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata5.00: (BMDMA stat 0x24)
> ata5.00: tag 0 cmd 0xa0 Emask 0x4 stat 0x40 err 0x0 (timeout)
> ata5: soft resetting port
> ata5.00: configured for UDMA/66
> ata5: EH complete
> ata5.00: limiting speed to UDMA/44
> ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata5.00: (BMDMA stat 0x24)
> ata5.00: tag 0 cmd 0xa0 Emask 0x4 stat 0x40 err 0x0 (timeout)
> ata5: soft resetting port
> ata5.00: configured for UDMA/44
> ata5: EH complete
> ata6: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0x30B8 irq 15
> scsi5 : ata_piix
> ata6: port disabled. ignoring.
> ATA: abnormal status 0xFF on port 0x177
> SCSI device sda: 586072368 512-byte hdwr sectors (300069 MB)
> sda: Write Protect is off
> sda: Mode Sense: 00 3a 00 00
> SCSI device sda: drive cache: write back
Alan stuff.
> Note also the message midway through about losing some ticks, which if I recall
> correctly is not new to this -mm release. I'm not sure who to cc about this.
John stuff. I suspect it's natural and normal, if the IDE error handling
did something rude with interrupt holdoff.
> The IDE device obviously ended up not being detected by the system. Usually
> this device comes up as:
>
> Jul 2 12:03:28 tornado kernel: hda: ATAPI 40X DVD-ROM DVD-R CD-R/RW drive,
> 2000kB Cache, UDMA(66)
>
>
> 2. Onto some more minor warnings:
>
> ACPI: bus type pci registered
> PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
> PCI: Not using MMCONFIG.
> PCI: Using configuration type 1
> ACPI: Interpreter enabled
>
> Is there any way to verify that there really is a BIOS bug there? If it is, is
> there anyone within Intel or are there any known contacts who can push and poke
> to get this looked at/fixed? (It's a new Intel board, I'd hope they could get
> it right..).
>
> Plus we're not using MMCONFIG - even though I have it enabled.
Andi stuff.
> Based on previous postings to lkml, I believe Randy Dunlap may have one of these
> boards too - Randy are you seeing this and the next bunch of warnings I am seeing?
>
> 3. Power Management warnings, been there ages, but I've had bigger things to
> worry about (like fatal oopses) so haven't bothered asking:
>
> Device `[PEX0]' is not power manageable
> ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 17 (level, low) -> IRQ 17
> PCI: Setting latency timer of device 0000:00:1c.0 to 64
> Device `[PEX2]' is not power manageable
> ACPI: PCI Interrupt 0000:00:1c.2[C] -> GSI 18 (level, low) -> IRQ 18
> PCI: Setting latency timer of device 0000:00:1c.2 to 64
> Device `[PEX3]' is not power manageable
> ACPI: PCI Interrupt 0000:00:1c.3[D] -> GSI 19 (level, low) -> IRQ 19
> PCI: Setting latency timer of device 0000:00:1c.3 to 64
> Device `[PEX4]' is not power manageable
> ACPI: PCI Interrupt 0000:00:1c.4[A] -> GSI 17 (level, low) -> IRQ 17
> PCI: Setting latency timer of device 0000:00:1c.4 to 64
> Device `[PEX5]' is not power manageable
> ACPI: PCI Interrupt 0000:00:1c.5[B] -> GSI 16 (level, low) -> IRQ 16
ACPI stuff. I suspect the kernel isn't doing anything wrong here.
> and
>
> Device `[IDES]' is not power manageable
I don't know what device that is.
> [root@tornado ~]# cat /proc/interrupts
>            CPU0       CPU1
>   0:     258266          0    IO-APIC-edge  timer
>   4:        355          0    IO-APIC-edge  serial
>   6:          5          0    IO-APIC-edge  floppy
>   8:          1          0    IO-APIC-edge  rtc
>   9:          0          0 IO-APIC-fasteoi  acpi
>  14:         28          0    IO-APIC-edge  libata
>  15:          0          0    IO-APIC-edge  libata
>  16:          0          0 IO-APIC-fasteoi  uhci_hcd:usb5
>  18:          0          0 IO-APIC-fasteoi  uhci_hcd:usb4
>  19:        980          0 IO-APIC-fasteoi  uhci_hcd:usb3, serial
>  23:        105          0 IO-APIC-fasteoi  ehci_hcd:usb1, uhci_hcd:usb2
> 313:      82513          0  PCI-MSI-<NULL>  eth0
> 314:      57370          0  PCI-MSI-<NULL>  libata
> NMI:        217        188
> LOC:     258118     257890
> ERR:          0
> MIS:          0
> [root@tornado ~]#
>
> The full dmesg is up at http://www.reub.net/files/kernel/2.6.18-rc1-mm1.dmesg
> and config is up at http://www.reub.net/files/kernel/2.6.18-rc1-mm1.config
>
> Minor issues and possibly most if not all are not of concern, but occasionally
> supposedly minor things show up much bigger problems when questions are asked
> and people start poking around :)
>
Thanks.
* Re: 2.6.18-rc1-mm1
From: Alan Cox @ 2006-07-09 12:56 UTC
To: Andrew Morton
Cc: Reuben Farrelly, linux-kernel, linux-acpi, rdunlap, greg,
john stultz, Andi Kleen
On Sun, 2006-07-09 at 05:22 -0700, Andrew Morton wrote:
> > ata5: PATA max UDMA/133 cmd 0x1F0 ctl 0x3F6 bmdma 0x30B0 irq 14
> > scsi4 : ata_piix
> > ata5.00: ATAPI, max UDMA/66
> > ata5.00: configured for UDMA/66
More ATAPI devices getting uppity about mode setting.
> John stuff. I suspect it's natural and normal, if the IDE error handling
> did something rude with interrupt holdoff.
The new libata should be more polite than that, although since the ATA
drive can stall the CPU indefinitely you lose anyway 8(
> > Jul 2 12:03:28 tornado kernel: hda: ATAPI 40X DVD-ROM DVD-R CD-R/RW drive,
> > 2000kB Cache, UDMA(66)
Can you send me the full hdparm identify stuff for this ?
The old drivers/ide code uses much longer delays than the spec for some
ATAPI commands, and it looks as if there is a good reason for doing
so ...
That or I've got a mistuning case I've missed.
Alan
* Re: 2.6.18-rc1-mm1
From: Reuben Farrelly @ 2006-07-09 14:21 UTC
To: Alan Cox
Cc: Andrew Morton, linux-kernel, linux-acpi, rdunlap, greg,
john stultz, Andi Kleen
On 10/07/2006 12:56 a.m., Alan Cox wrote:
> On Sun, 2006-07-09 at 05:22 -0700, Andrew Morton wrote:
>>> ata5: PATA max UDMA/133 cmd 0x1F0 ctl 0x3F6 bmdma 0x30B0 irq 14
>>> scsi4 : ata_piix
>>> ata5.00: ATAPI, max UDMA/66
>>> ata5.00: configured for UDMA/66
>
> More ATAPI devices getting uppity about mode setting.
>
>> John stuff. I suspect it's natural and normal, if the IDE error handling
>> did something rude with interrupt holdoff.
>
> The new libata should be more polite than that, although since the ATA
> drive can stall the CPU indefinitely you lose anyway 8(
It may not be related to ATA. I just reloaded into 2.6.17-mm6 to get the info
for Alan and saw it when booting up on that too:
PCI: Setting latency timer of device 0000:00:1d.7 to 64
ehci_hcd 0000:00:1d.7: EHCI Host Controller
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 1
Losing some ticks... checking if CPU frequency changed.
ehci_hcd 0000:00:1d.7: debug port 1
PCI: cache line size of 128 is not supported by device 0000:00:1d.7
Note it was in a different place in here than in -rc1-mm1 (slightly later in the
bootup).
I'm fairly sure it's not new to this release.
>>> Jul 2 12:03:28 tornado kernel: hda: ATAPI 40X DVD-ROM DVD-R CD-R/RW drive,
>>> 2000kB Cache, UDMA(66)
>
> Can you send me the full hdparm identify stuff for this ?
sh-3.1# hdparm -I /dev/hda
/dev/hda:
ATAPI CD-ROM, with removable media
	Model Number:       PIONEER DVD-RW DVR-111D
	Serial Number:      FADC005671WL
	Firmware Revision:  1.23
Standards:
	Likely used CD-ROM ATAPI-1
Configuration:
	DRQ response: 50us.
	Packet size: 12 bytes
Capabilities:
	LBA, IORDY(can be disabled)
	Buffer size: 64.0kB
	DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 *udma4
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4
	     Cycle time: no flow control=240ns IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	Power Management feature set
	   *	PACKET command feature set
	   *	DEVICE_RESET command
HW reset results:
	CBLID- above Vih
	Device num = 0 determined by the jumper
sh-3.1#
sh-3.1# hdparm -i /dev/hda
/dev/hda:
Model=PIONEER DVD-RW DVR-111D, FwRev=1.23, SerialNo=
Config={ Fixed Removeable DTR<=5Mbs DTR>10Mbs nonMagnetic }
RawCHS=0/0/0, TrkSize=0, SectSize=0, ECCbytes=0
BuffType=13395, BuffSize=64kB, MaxMultSect=0
(maybe): CurCHS=0/0/0, CurSects=0, LBA=yes, LBAsects=0
IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 *udma4
AdvancedPM=no
Drive conforms to: Unspecified: ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5
* signifies the current active mode
sh-3.1#
Reuben
* Re: 2.6.18-rc1-mm1
From: Jeff Garzik @ 2006-07-09 16:29 UTC
To: Alan Cox
Cc: Andrew Morton, Reuben Farrelly, linux-kernel, linux-acpi, rdunlap,
greg, john stultz, Andi Kleen, linux-ide@vger.kernel.org
Alan Cox wrote:
> The old drivers/ide code uses much longer delays than the spec for some
> ATAPI commands, and it looks as if there is a good reason for doing
> so ...
FWIW, the code that ATADRVR (http://www.ata-atapi.com/) uses to issue
commands does something like
	write Command register to start command
	if (device == ATAPI)	# i.e. not ATA
		delay(150 msec)
	pound Status / AltStatus, kick DMA engine, whatever else
ATADRVR is open code (for an MS-DOS-level driver), and really worth a
read. Between ATADRVR and drivers/ide, you get a pretty good idea about
what __field experience__ has shown is needed for ATAPI devices.
Jeff
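The ATADRVR sequence Jeff describes can be sketched in C as below. This is an illustration against a simulated device, not ATADRVR's or libata's actual code: struct fake_dev, read_status() and issue_command() are hypothetical, the 150 ms figure is the one quoted above, and BSY (0x80) is the standard ATA status busy bit. 0xa0 is the ATA PACKET command that shows up in the timeouts in Reuben's log.

```c
#define _POSIX_C_SOURCE 199309L	/* for nanosleep() */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define ATA_BUSY		0x80	/* BSY bit in the ATA status register */
#define ATAPI_CMD_DELAY_MS	150	/* post-command delay quoted above */

/* Hypothetical simulated device: reports BSY for a fixed number of reads. */
struct fake_dev {
	bool atapi;
	unsigned char command;
	int busy_reads;
};

static void write_command(struct fake_dev *d, unsigned char cmd)
{
	d->command = cmd;
}

static unsigned char read_status(struct fake_dev *d)
{
	if (d->busy_reads > 0) {
		d->busy_reads--;
		return ATA_BUSY;
	}
	return 0x08;	/* DRQ: ready to transfer */
}

static void delay_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

	nanosleep(&ts, NULL);
}

/*
 * Start a command, wait first if the device is ATAPI, then poll status.
 * Returns how many polls it took for BSY to clear, or -1 on timeout.
 */
static int issue_command(struct fake_dev *d, unsigned char cmd, int max_polls)
{
	assert(max_polls > 0);

	write_command(d, cmd);
	if (d->atapi)
		delay_ms(ATAPI_CMD_DELAY_MS);

	for (int i = 1; i <= max_polls; i++) {
		if (!(read_status(d) & ATA_BUSY))
			return i;
	}
	return -1;
}
```

The only ATAPI-specific step is the fixed sleep before the first status read; everything after it is the same polling loop used for ATA devices.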
* Re: 2.6.18-rc1-mm1
From: Randy.Dunlap @ 2006-07-09 17:33 UTC
To: Reuben Farrelly, mingo; +Cc: akpm, linux-kernel, alan, linux-acpi, greg
On Sun, 09 Jul 2006 23:22:14 +1200 Reuben Farrelly wrote:
> On 9/07/2006 9:11 p.m., Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18-rc1/2.6.18-rc1-mm1/
> >
> > - We're getting a relatively large number of crash reports coming out of the
> > core sysfs/kobject/driver/bus code, and they're all really hard to diagnose.
> >
> > I am suspecting that what's happening is that some registration functions
> > are failing and the caller is ignoring that failure. The code proceeds and
> > crashes much later, in obscure ways.
> >
> > All these functions return error codes, and we're not checking them. We
> > should. So there's a patch which marks all these things as __must_check,
> > which causes around 1,500 new warnings.
> >
> > These are all bugs and they all need to be fixed.
>
> Works. Well, it boots without crashing here and has been up for 30 or so
> minutes without incident or so much as a log entry.
>
> I assume that the bulk of those warnings about the return error codes will be
> largely dealt with by individual maintainers as there are far too many to post here?
Yeah, right. (quoting Kathy Mallory with her usual sarcasm)
> Some minor problems noted - possibly PCI/ACPI related (read on past the IDE bit
> if that's not your cup of tea).
>
> 2. Onto some more minor warnings:
>
> ACPI: bus type pci registered
> PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
> PCI: Not using MMCONFIG.
> PCI: Using configuration type 1
> ACPI: Interpreter enabled
>
> Is there any way to verify that there really is a BIOS bug there? If it is, is
> there anyone within Intel or are there any known contacts who can push and poke
> to get this looked at/fixed? (It's a new Intel board, I'd hope they could get
> it right..).
>
> Plus we're not using MMCONFIG - even though I have it enabled.
>
> Based on previous postings to lkml, I believe Randy Dunlap may have one of these
> boards too - Randy are you seeing this and the next bunch of warnings I am seeing?
I just found 2.6.18-rc1-mm1. I'll build + check.
I have an Intel ICH7 motherboard with SATA.
Is that close to what you have?
> 3. Power Management warnings, been there ages, but I've had bigger things to
> worry about (like fatal oopses) so haven't bothered asking:
>
> Device `[PEX0]' is not power manageable
> ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 17 (level, low) -> IRQ 17
> PCI: Setting latency timer of device 0000:00:1c.0 to 64
> Device `[PEX2]' is not power manageable
> ACPI: PCI Interrupt 0000:00:1c.2[C] -> GSI 18 (level, low) -> IRQ 18
> PCI: Setting latency timer of device 0000:00:1c.2 to 64
> Device `[PEX3]' is not power manageable
> ACPI: PCI Interrupt 0000:00:1c.3[D] -> GSI 19 (level, low) -> IRQ 19
> PCI: Setting latency timer of device 0000:00:1c.3 to 64
> Device `[PEX4]' is not power manageable
> ACPI: PCI Interrupt 0000:00:1c.4[A] -> GSI 17 (level, low) -> IRQ 17
> PCI: Setting latency timer of device 0000:00:1c.4 to 64
> Device `[PEX5]' is not power manageable
> ACPI: PCI Interrupt 0000:00:1c.5[B] -> GSI 16 (level, low) -> IRQ 16
>
> and
>
> Device `[IDES]' is not power manageable
I guess that's from here:
/sys/firmware/acpi/namespace/ACPI/_SB/PCI0/IDES
which contains 2 directories: PRID and SECD.
Apparently ATA/IDE primary and secondary controllers,
but I'm not sure. Those empty directory structures
don't tell me much.
> [root@tornado ~]# cat /proc/interrupts
>            CPU0       CPU1
>   0:     258266          0    IO-APIC-edge  timer
>   4:        355          0    IO-APIC-edge  serial
>   6:          5          0    IO-APIC-edge  floppy
>   8:          1          0    IO-APIC-edge  rtc
>   9:          0          0 IO-APIC-fasteoi  acpi
>  14:         28          0    IO-APIC-edge  libata
>  15:          0          0    IO-APIC-edge  libata
>  16:          0          0 IO-APIC-fasteoi  uhci_hcd:usb5
>  18:          0          0 IO-APIC-fasteoi  uhci_hcd:usb4
>  19:        980          0 IO-APIC-fasteoi  uhci_hcd:usb3, serial
>  23:        105          0 IO-APIC-fasteoi  ehci_hcd:usb1, uhci_hcd:usb2
> 313:      82513          0  PCI-MSI-<NULL>  eth0
> 314:      57370          0  PCI-MSI-<NULL>  libata
"We" need to fix that <NULL> there.
> NMI:        217        188
> LOC:     258118     257890
> ERR:          0
> MIS:          0
> [root@tornado ~]#
>
> The full dmesg is up at http://www.reub.net/files/kernel/2.6.18-rc1-mm1.dmesg
> and config is up at http://www.reub.net/files/kernel/2.6.18-rc1-mm1.config
---
~Randy
* RE: 2.6.18-rc1-mm1
From: Brown, Len @ 2006-07-09 17:47 UTC
To: Randy.Dunlap, Reuben Farrelly, mingo, Van De Ven, Adriaan
Cc: akpm, linux-kernel, alan, linux-acpi, greg
>> 2. Onto some more minor warnings:
>>
>> ACPI: bus type pci registered
>> PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
>> PCI: Not using MMCONFIG.
>> PCI: Using configuration type 1
>> ACPI: Interpreter enabled
>>
>> Is there any way to verify that there really is a BIOS bug there? If it is, is
>> there anyone within Intel or are there any known contacts who can push and poke
>> to get this looked at/fixed? (It's a new Intel board, I'd hope they could get
>> it right..).
Arjan should probably comment on that one.
>> 3. Power Management warnings, been there ages, but I've had bigger things to
>> worry about (like fatal oopses) so haven't bothered asking:
>>
>> Device `[PEX0]' is not power manageable
>> ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 17 (level, low) -> IRQ 17
>> PCI: Setting latency timer of device 0000:00:1c.0 to 64
>> Device `[PEX2]' is not power manageable
I'll revert this message to CONFIG_ACPI_DEBUG=y like it used to be.
>I guess that's from here:
>
>/sys/firmware/acpi/namespace/ACPI/_SB/PCI0/IDES
>
>which contains 2 directories: PRID and SECD.
>Apparently ATA/IDE primary and secondary controllers,
>but I'm not sure. Those empty directory structures
>don't tell me much.
/sys/firmware/acpi/namespace should not exist at all
and is waiting to die. You can ignore it.
cheers,
-Len
* Re: 2.6.18-rc1-mm1
From: Andi Kleen @ 2006-07-09 18:35 UTC
To: Andrew Morton
Cc: Reuben Farrelly, linux-kernel, alan, linux-acpi, rdunlap, greg,
john stultz, gregkh
On Sun, Jul 09, 2006 at 05:22:52AM -0700, Andrew Morton wrote:
> > ACPI: bus type pci registered
> > PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
> > PCI: Not using MMCONFIG.
> > PCI: Using configuration type 1
> > ACPI: Interpreter enabled
> >
> > Is there any way to verify that there really is a BIOS bug there? If it is, is
> > there anyone within Intel or are there any known contacts who can push and poke
> > to get this looked at/fixed? (It's a new Intel board, I'd hope they could get
> > it right..).
> >
> > Plus we're not using MMCONFIG - even though I have it enabled.
>
> Andi stuff.
Greg has patches to relax the checking a bit.
I don't know when he intends to merge them.
Anyways it should be completely harmless - mmconfig does nothing
essential right now.
-Andi
* Re: 2.6.18-rc1-mm1
From: john stultz @ 2006-07-09 21:10 UTC
To: Andrew Morton
Cc: Reuben Farrelly, linux-kernel, alan, linux-acpi, rdunlap, greg,
Andi Kleen
On Sun, 2006-07-09 at 05:22 -0700, Andrew Morton wrote:
> On Sun, 09 Jul 2006 23:22:14 +1200
> Reuben Farrelly <reuben-lkml@reub.net> wrote:
> > Note also the message midway through about losing some ticks, which if I recall
> > correctly is not new to this -mm release. I'm not sure who to cc about this.
>
> John stuff. I suspect it's natural and normal, if the IDE error handling
> did something rude with interrupt holdoff.
Actually that's an x86-64 system, so the timekeeping changes shouldn't
really be affecting it.
thanks
-john
* Re: 2.6.18-rc1-mm1
From: Andrew Morton @ 2006-07-09 21:40 UTC
To: Randy.Dunlap
Cc: reuben-lkml, mingo, linux-kernel, alan, linux-acpi, greg,
Thomas Gleixner
On Sun, 9 Jul 2006 10:33:12 -0700
"Randy.Dunlap" <rdunlap@xenotime.net> wrote:
> > [root@tornado ~]# cat /proc/interrupts
> >            CPU0       CPU1
> >   0:     258266          0    IO-APIC-edge  timer
> >   4:        355          0    IO-APIC-edge  serial
> >   6:          5          0    IO-APIC-edge  floppy
> >   8:          1          0    IO-APIC-edge  rtc
> >   9:          0          0 IO-APIC-fasteoi  acpi
> >  14:         28          0    IO-APIC-edge  libata
> >  15:          0          0    IO-APIC-edge  libata
> >  16:          0          0 IO-APIC-fasteoi  uhci_hcd:usb5
> >  18:          0          0 IO-APIC-fasteoi  uhci_hcd:usb4
> >  19:        980          0 IO-APIC-fasteoi  uhci_hcd:usb3, serial
> >  23:        105          0 IO-APIC-fasteoi  ehci_hcd:usb1, uhci_hcd:usb2
> > 313:      82513          0  PCI-MSI-<NULL>  eth0
> > 314:      57370          0  PCI-MSI-<NULL>  libata
>
> "We" need to fix that <NULL> there.
Seems that irq_desc[i].handle_irq is msi_irq_wo_maskbit_type or
msi_irq_w_maskbit_type and kernel/irq/chip.c:handle_irq_name() doesn't know
about that.
handle_irq_name() is a bit of a crock - this info should be in the irq_desc
struct or somewhere like that.
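To make the <NULL> concrete, here is a hypothetical user-space sketch: a name lookup keyed on handler function pointers returns NULL for any handler it was not taught about, while a descriptor that carries its own flow name sidesteps the lookup, along the lines Andrew suggests. None of these types or functions are the real kernel/irq code; in particular msi_flow_handler() merely stands in for whatever handler the MSI code installs, and the sketch substitutes the literal "<NULL>" that the kernel's vsnprintf() prints for a NULL string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef void (*flow_handler_t)(unsigned int irq);

/* Hypothetical flow handlers; only the first two are known to the lookup. */
static void handle_edge_irq(unsigned int irq) { (void)irq; }
static void handle_fasteoi_irq(unsigned int irq) { (void)irq; }
static void msi_flow_handler(unsigned int irq) { (void)irq; }

/* Pointer-comparison lookup: any handler missing from the list yields NULL. */
static const char *lookup_flow_name(flow_handler_t handle)
{
	if (handle == handle_edge_irq)
		return "edge";
	if (handle == handle_fasteoi_irq)
		return "fasteoi";
	return NULL;
}

/* Descriptor that can carry its own flow name, set when the handler is installed. */
struct irq_desc_sketch {
	flow_handler_t handle_irq;
	const char *chip_name;
	const char *flow_name;	/* NULL: fall back to the lookup */
};

/* Build the "chip-flow" type string shown in /proc/interrupts. */
static void format_type(const struct irq_desc_sketch *desc, char *buf, size_t len)
{
	const char *flow = desc->flow_name;

	assert(buf != NULL && len > 0);
	if (!flow)
		flow = lookup_flow_name(desc->handle_irq);
	snprintf(buf, len, "%s-%s", desc->chip_name, flow ? flow : "<NULL>");
}
```

Storing the name next to the handler means a new handler cannot be forgotten in a central table, which is the weakness of the pointer-comparison approach.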
* Re: 2.6.18-rc1-mm1
From: Randy.Dunlap @ 2006-07-10 5:35 UTC
To: Reuben Farrelly; +Cc: akpm, linux-kernel, alan, linux-acpi, greg
On Sun, 09 Jul 2006 23:22:14 +1200 Reuben Farrelly wrote:
> 2. Onto some more minor warnings:
>
> ACPI: bus type pci registered
> PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
> PCI: Not using MMCONFIG.
> PCI: Using configuration type 1
Yes, I have all of those.
> ACPI: Interpreter enabled
>
> Is there any way to verify that there really is a BIOS bug there? If it is, is
> there anyone within Intel or are there any known contacts who can push and poke
> to get this looked at/fixed? (It's a new Intel board, I'd hope they could get
> it right..).
>
> Plus we're not using MMCONFIG - even though I have it enabled.
>
> Based on previous postings to lkml, I believe Randy Dunlap may have one of these
> boards too - Randy are you seeing this and the next bunch of warnings I am seeing?
Yes, but Len says that they (*/namespace/*) will be going away.
---
~Randy
* RE: 2.6.18-rc1-mm1
From: Arjan van de Ven @ 2006-07-10 8:48 UTC
To: Brown, Len
Cc: greg, linux-acpi, alan, linux-kernel, akpm, mingo,
Reuben Farrelly, Randy.Dunlap
On Sun, 2006-07-09 at 13:47 -0400, Brown, Len wrote:
> >> 2. Onto some more minor warnings:
> >>
> >> ACPI: bus type pci registered
> >> PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
> >> PCI: Not using MMCONFIG.
> >> PCI: Using configuration type 1
> >> ACPI: Interpreter enabled
> >>
> >> Is there any way to verify that there really is a BIOS bug there? If it is, is
> >> there anyone within Intel or are there any known contacts who can push and poke
> >> to get this looked at/fixed? (It's a new Intel board, I'd hope they could get
> >> it right..).
>
> Arjan should probably comment on that one.
I could.. but please next time if you want to CC me use an email address
I actually read ;)
Greg has a patch to relax this check, and Rajesh has a further patch to
relax it more. However, to a large degree we cannot relax it too much
without breaking the reason this check is there: detect a buggy MCFG
table and not crash and burn later on, but rather just not use it.
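The idea behind the check Arjan describes can be sketched as follows: before trusting the MCFG table, confirm that the whole ECAM window it describes (1 MiB per bus: 32 devices x 8 functions x 4 KiB of config space) lies inside a region the BIOS marked reserved in the E820 map, and fall back to type 1 config access otherwise. This is a simplified illustration, not the arch/x86 mmconfig code: struct e820_entry_sketch, e820_covers() and mmconfig_usable() are hypothetical, and unlike a real implementation the sketch does not merge adjacent E820 ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define E820_RESERVED 2	/* E820 type code for reserved memory */

/* Hypothetical, simplified E820 map entry. */
struct e820_entry_sketch {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

/* True if [start, end) fits inside a single entry of the given type. */
static bool e820_covers(const struct e820_entry_sketch *map, size_t n,
			uint64_t start, uint64_t end, uint32_t type)
{
	for (size_t i = 0; i < n; i++) {
		if (map[i].type == type &&
		    map[i].addr <= start && end <= map[i].addr + map[i].size)
			return true;
	}
	return false;
}

/*
 * Each bus takes 1 MiB of ECAM space (32 devices x 8 functions x 4 KiB).
 * Use MMCONFIG only if the whole window is E820-reserved; otherwise the
 * caller falls back to type 1 configuration access.
 */
static bool mmconfig_usable(const struct e820_entry_sketch *map, size_t n,
			    uint64_t base, unsigned int nr_buses)
{
	uint64_t end = base + ((uint64_t)nr_buses << 20);

	assert(nr_buses >= 1 && nr_buses <= 256);
	return e820_covers(map, n, base, end, E820_RESERVED);
}
```

Relaxing the check amounts to accepting more partial-coverage cases, which is exactly the trade-off Arjan mentions: the looser it gets, the less protection it gives against a bogus MCFG table.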
* Re: 2.6.18-rc1-mm1
From: Greg KH @ 2006-07-11 19:37 UTC
To: Andi Kleen
Cc: Andrew Morton, Reuben Farrelly, linux-kernel, alan, linux-acpi,
rdunlap, greg, john stultz
On Sun, Jul 09, 2006 at 08:35:38PM +0200, Andi Kleen wrote:
> On Sun, Jul 09, 2006 at 05:22:52AM -0700, Andrew Morton wrote:
> > > ACPI: bus type pci registered
> > > PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
> > > PCI: Not using MMCONFIG.
> > > PCI: Using configuration type 1
> > > ACPI: Interpreter enabled
> > >
> > > Is there any way to verify that there really is a BIOS bug there? If it is, is
> > > there anyone within Intel or are there any known contacts who can push and poke
> > > to get this looked at/fixed? (It's a new Intel board, I'd hope they could get
> > > it right..).
> > >
> > > Plus we're not using MMCONFIG - even though I have it enabled.
> >
> > Andi stuff.
>
> Greg has patches to relax the checking a bit.
>
> I don't know when he intends to merge them.
I think they are all merged into 2.6.18-rc1, as I don't see any MMCONFIG
patches in my tree right now, nor in my queue.
thanks,
greg k-h
* Re: 2.6.18-rc1-mm1
2006-07-10 8:48 ` 2.6.18-rc1-mm1 Arjan van de Ven
@ 2006-07-11 23:00 ` Greg KH
0 siblings, 0 replies; 15+ messages in thread
From: Greg KH @ 2006-07-11 23:00 UTC
To: Arjan van de Ven
Cc: Brown, Len, linux-acpi, alan, linux-kernel, akpm, mingo,
Reuben Farrelly, Randy.Dunlap
On Mon, Jul 10, 2006 at 10:48:49AM +0200, Arjan van de Ven wrote:
> On Sun, 2006-07-09 at 13:47 -0400, Brown, Len wrote:
> > >> 2. Onto some more minor warnings:
> > >>
> > >> ACPI: bus type pci registered
> > >> PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
> > >> PCI: Not using MMCONFIG.
> > >> PCI: Using configuration type 1
> > >> ACPI: Interpreter enabled
> > >>
> > >> Is there any way to verify that there really is a BIOS bug there? If it is, is
> > >> there anyone within Intel or are there any known contacts who can push and poke
> > >> to get this looked at/fixed? (It's a new Intel board, I'd hope they could get
> > >> it right..).
> >
> > Arjan should probably comment on that one.
>
> I could.. but please next time if you want to CC me use an email address
> I actually read ;)
>
> Greg has a patch to relax this check, and Rajesh has a further patch to
> relax it more.
Hm, no, my patch should already be in 2.6.18-rc1, I don't have any
pending MMCONFIG patches in my queue or tree.
So if you think I'm missing one, please resend it to me.
thanks,
greg k-h
* Re: 2.6.18-rc1-mm1
@ 2006-07-12 18:12 Chuck Ebbert
0 siblings, 0 replies; 15+ messages in thread
From: Chuck Ebbert @ 2006-07-12 18:12 UTC
To: Alan Cox
Cc: Arjan van de Ven, Brown, Len, linux-acpi, Andrew Morton,
Rajesh Shah, Reuben Farrelly, Randy Dunlap, linux-kernel,
Ingo Molnar
In-Reply-To: <20060711230055.GL18838@kroah.com>
On Tue, 11 Jul 2006 16:00:55 -0700, Greg KH wrote:
> > > >> PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
> > > >> PCI: Not using MMCONFIG.
> > > >> PCI: Using configuration type 1
> > > >> ACPI: Interpreter enabled
> > > >>
> > > >> Is there any way to verify that there really is a BIOS bug there? If it is, is
> > > >> there anyone within Intel or are there any known contacts who can push and poke
> > > >> to get this looked at/fixed? (It's a new Intel board, I'd hope they could get
> > > >> it right..).
> > >
> > > Arjan should probably comment on that one.
> >
> > I could.. but please next time if you want to CC me use an email address
> > I actually read ;)
> >
> > Greg has a patch to relax this check, and Rajesh has a further patch to
> > relax it more.
>
> Hm, no, my patch should already be in 2.6.18-rc1, I don't have any
> pending MMCONFIG patches in my queue or tree.
>
> So if you think I'm missing one, please resend it to me.
What happened to:
http://lkml.org/2006/6/26/640
[patch 0/2] PCI: improve extended config space verification - take #2
I tested the first round of this patchset on i386 and it worked for me (TM).
--
Chuck
"You can't read a newspaper if you can't read." --George W. Bush
end of thread
Thread overview: 15+ messages
[not found] <20060709021106.9310d4d1.akpm@osdl.org>
2006-07-09 11:22 ` 2.6.18-rc1-mm1 Reuben Farrelly
2006-07-09 12:22 ` 2.6.18-rc1-mm1 Andrew Morton
2006-07-09 12:56 ` 2.6.18-rc1-mm1 Alan Cox
2006-07-09 14:21 ` 2.6.18-rc1-mm1 Reuben Farrelly
2006-07-09 16:29 ` 2.6.18-rc1-mm1 Jeff Garzik
2006-07-09 18:35 ` 2.6.18-rc1-mm1 Andi Kleen
2006-07-11 19:37 ` 2.6.18-rc1-mm1 Greg KH
2006-07-09 21:10 ` 2.6.18-rc1-mm1 john stultz
2006-07-09 17:33 ` 2.6.18-rc1-mm1 Randy.Dunlap
2006-07-09 21:40 ` 2.6.18-rc1-mm1 Andrew Morton
2006-07-10 5:35 ` 2.6.18-rc1-mm1 Randy.Dunlap
2006-07-09 17:47 2.6.18-rc1-mm1 Brown, Len
2006-07-10 8:48 ` 2.6.18-rc1-mm1 Arjan van de Ven
2006-07-11 23:00 ` 2.6.18-rc1-mm1 Greg KH
-- strict thread matches above, loose matches on Subject: below --
2006-07-12 18:12 2.6.18-rc1-mm1 Chuck Ebbert