* Re: Oops in 2.6.10-rc1 (almost solved)
@ 2004-11-13 3:45 Chuck Ebbert
2004-11-13 14:28 ` Matt Domsch
0 siblings, 1 reply; 9+ messages in thread
From: Chuck Ebbert @ 2004-11-13 3:45 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel, Matt Domsch
On Tue, 9 Nov 2004 at 17:01:10 -0800 Linus Torvalds <torvalds@osdl.org> wrote:
> > PS: do you have *any* idea how this could be related to the snd-es1371
> > driver (which is producing the oops then)?
>
> I bet it's overwriting some array, and just corrupting memory after it.
> For example, the edd_info[] array only has 6 entries,
That's almost certainly the problem. There can be up to 16 EDD devices
as of the Jun 30 update to the EDD code.
And sound_class is the next item after edd_info[] in my System.map...
--Chuck Ebbert 12-Nov-04 22:21:27
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: Oops in 2.6.10-rc1 (almost solved) 2004-11-13 3:45 Oops in 2.6.10-rc1 (almost solved) Chuck Ebbert @ 2004-11-13 14:28 ` Matt Domsch 2004-11-13 18:55 ` Matt Domsch 2004-11-14 2:58 ` Matt Domsch 0 siblings, 2 replies; 9+ messages in thread From: Matt Domsch @ 2004-11-13 14:28 UTC (permalink / raw) To: Chuck Ebbert, Christian Kujau; +Cc: Linus Torvalds, linux-kernel On Fri, Nov 12, 2004 at 10:45:12PM -0500, Chuck Ebbert wrote: > On Tue, 9 Nov 2004 at 17:01:10 -0800 Linus Torvalds <torvalds@osdl.org> wrote: > > > > PS: do you have *any* idea how this could be related to the snd-es1371 > > > driver (which is producing the oops then)? > > > > I bet it's overwriting some array, and just corrupting memory after it. > > For example, the edd_info[] array only has 6 entries, > > That's almost certainly the problem. There can be up to 16 EDD devices > as of the Jun 30 update to the EDD code. Bingo... edd_devices[] was too short. When we keep more than 6 signatures, it overruns the end. Also, I rewrote edd_num_devices to be clearer about its goal. This patch is necessary even after the last edd.S patch was reverted. It still doesn't explain why Christian's BIOS reports more devices than he has, that's still UI, so don't re-apply the edd.S patch just reverted. Signed-off-by: Matt Domsch -- Matt Domsch Sr. Software Engineer, Lead Engineer Dell Linux Solutions linux.dell.com & www.dell.com/linux Linux on Dell mailing lists @ http://lists.us.dell.com ===== drivers/firmware/edd.c 1.30 vs edited ===== --- 1.30/drivers/firmware/edd.c 2004-06-29 09:44:48 -05:00 +++ edited/drivers/firmware/edd.c 2004-11-13 07:56:00 -06:00 @@ -70,7 +70,7 @@ static int edd_dev_is_type(struct edd_device *edev, const char *type); static struct pci_dev *edd_get_pci_dev(struct edd_device *edev); -static struct edd_device *edd_devices[EDDMAXNR]; +static struct edd_device *edd_devices[EDD_MBR_SIG_MAX]; #define EDD_DEVICE_ATTR(_name,_mode,_show,_test) \ struct edd_attribute edd_attr_##_name = { \ @@ -728,9 +728,9 @@ static inline int edd_num_devices(void) { - return min_t(unsigned char, - max_t(unsigned char, edd.edd_info_nr, edd.mbr_signature_nr), - max_t(unsigned char, EDD_MBR_SIG_MAX, EDDMAXNR)); + return max_t(unsigned char, + min_t(unsigned char, EDD_MBR_SIG_MAX, edd.mbr_signature_nr), + min_t(unsigned char, EDDMAXNR, edd.edd_info_nr)); } /** ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Oops in 2.6.10-rc1 (almost solved) 2004-11-13 14:28 ` Matt Domsch @ 2004-11-13 18:55 ` Matt Domsch 2004-11-14 2:58 ` Matt Domsch 1 sibling, 0 replies; 9+ messages in thread From: Matt Domsch @ 2004-11-13 18:55 UTC (permalink / raw) To: Chuck Ebbert, Christian Kujau; +Cc: Linus Torvalds, linux-kernel On Sat, Nov 13, 2004 at 08:28:35AM -0600, Matt Domsch wrote: > On Fri, Nov 12, 2004 at 10:45:12PM -0500, Chuck Ebbert wrote: > > On Tue, 9 Nov 2004 at 17:01:10 -0800 Linus Torvalds <torvalds@osdl.org> wrote: > > > > > > PS: do you have *any* idea how this could be related to the snd-es1371 > > > > driver (which is producing the oops then)? > > > > > > I bet it's overwriting some array, and just corrupting memory after it. > > > For example, the edd_info[] array only has 6 entries, > > > > That's almost certainly the problem. There can be up to 16 EDD devices > > as of the Jun 30 update to the EDD code. > > Bingo... edd_devices[] was too short. When we keep more > than 6 signatures, it overruns the end. In particular, depending on your .config, with EDD=y it overwrites 40 bytes past the end of edd_devices (here I've already extended it by the necessary amount, but the 40 bytes past its end are all subject to be overwritten): c043a880 b edd_devices c043a8c0 b pci_bios_present c043a8c4 B pci_mmcfg_base_addr c043a8c8 b mmcfg_last_accessed_device c043a8cc b called.0 c043a8d0 B pcibios_enable_irq c043a8d4 b eisa_irq_mask.0 c043a8d8 b broken_hp_bios_irq9 c043a8dc b acer_tm360_irqrouting c043a8e0 b pirq_table c043a8e4 b pirq_router hence the failure Christian saw and attributed to the sound drivers: EIP is at 0xc15d5820 eax: 00000000 ebx: dff20400 ecx: c15d5820 edx: dff205c4 esi: ffffffed edi: dff20400 ebp: dff20400 esp: c17a3e58 ds: 007b es: 007b ss: 0068 Process modprobe (pid: 178, threadinfo=c17a2000 task=dfcf05a0) Stack: c01fa5c8 dff20400 000007ff dff20400 c01fa5ff dff20400 000007ff c15ea400 e082729d dff20400 c15ea400 00000000 e08469df c15ea400 000001f8 000000d0 000000d0 df45ed14 00000000 c018e14e c15ea400 ffffffed dff20400 dff20400 Call Trace: [<c01fa5c8>] pci_enable_device_bars+0x28/0x40 [<c01fa5ff>] pci_enable_device+0x1f/0x40 [<e082729d>] snd_ensoniq_create+0x1d/0x480 [snd_ens1371] [<e08469df>] snd_card_new+0x1cf/0x2c0 [snd] [<c018e14e>] sysfs_new_dirent+0x2e/0x90 [<e0827867>] snd_audiopci_probe+0x87/0x1e0 [snd_ens1371] [<c01fb012>] pci_device_probe_static+0x52/0x70 [<c01fb05c>] __pci_device_probe+0x2c/0x30 [<c01fb08c>] pci_device_probe+0x2c/0x60 [<c0258f4f>] driver_probe_device+0x2f/0x80 [<c02590b2>] driver_attach+0x52/0xa0 [<c02595f8>] bus_add_driver+0x98/0xe0 [<c0259c5f>] driver_register+0x2f/0x40 [<c01fb340>] pci_register_driver+0x40/0x50 [<e08279cf>] alsa_card_ens137x_init+0xf/0x13 [snd_ens1371] [<c0134279>] sys_init_module+0x169/0x240 [<c01041eb>] syscall_call+0x7/0xb With CONFIG_EDD=m, there just wasn't anything interesting in memory following edd_devices[] (thanks module loader for using whole pages I believe). -Matt -- Matt Domsch Sr. Software Engineer, Lead Engineer Dell Linux Solutions linux.dell.com & www.dell.com/linux Linux on Dell mailing lists @ http://lists.us.dell.com ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Oops in 2.6.10-rc1 (almost solved) 2004-11-13 14:28 ` Matt Domsch 2004-11-13 18:55 ` Matt Domsch @ 2004-11-14 2:58 ` Matt Domsch 2004-11-14 4:43 ` Linus Torvalds ` (2 more replies) 1 sibling, 3 replies; 9+ messages in thread From: Matt Domsch @ 2004-11-14 2:58 UTC (permalink / raw) To: Christian Kujau; +Cc: Linus Torvalds, linux-kernel, Chuck Ebbert On Sat, Nov 13, 2004 at 08:28:35AM -0600, Matt Domsch wrote: > It still doesn't explain why Christian's BIOS reports more devices > than he has, that's still UI, so don't re-apply the edd.S patch just reverted. Alexander van Heukelum noted to me that addw here modifies CF, so I think something like should fix that. Christian, if you're in a position to test this, I'd really appreciate it. You've been a fantastic bug reporter / tester! Not ready for Linus yet, and you'll need to re-apply the previous edd.S patch which is now reverted in Linus's tree. As your BIOS reports via CHECK EXTENSIONS PRESENT that you've got more devices than you actually have, hopefully the int13 EXTENDED READ won't succeed for non-existant devices anymore, and then neither will the READ SECTORS call. -- Matt Domsch Sr. Software Engineer, Lead Engineer Dell Linux Solutions linux.dell.com & www.dell.com/linux Linux on Dell mailing lists @ http://lists.us.dell.com ===== arch/i386/boot/edd.S 1.3 vs edited ===== --- 1.3/arch/i386/boot/edd.S 2004-10-20 03:37:11 -05:00 +++ edited/arch/i386/boot/edd.S 2004-11-13 20:31:58 -06:00 @@ -58,8 +58,12 @@ sti # work around buggy BIOSes popw %dx popw %si - addw $EDD_DEV_ADDR_PACKET_LEN, %sp # remove packet from stack - jnc edd_mbr_store_sig + pushfl # save EFLAGS into ebx + popl %ebx # because addw modifies CF + addw $EDD_DEV_ADDR_PACKET_LEN, %sp # remove packet from stack + pushl %ebx # get back right CF + popfl + jnc edd_mbr_store_sig # otherwise, fall through to the legacy read function edd_mbr_read_sectors: ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Oops in 2.6.10-rc1 (almost solved) 2004-11-14 2:58 ` Matt Domsch @ 2004-11-14 4:43 ` Linus Torvalds 2004-11-14 11:45 ` Christian 2004-11-14 20:02 ` Christian Kujau 2 siblings, 0 replies; 9+ messages in thread From: Linus Torvalds @ 2004-11-14 4:43 UTC (permalink / raw) To: Matt Domsch; +Cc: Christian Kujau, linux-kernel, Chuck Ebbert On Sat, 13 Nov 2004, Matt Domsch wrote: > > Not ready for Linus yet Indeed. Please don't use pushfl/popfl to save the carry flag. There are tons of better ways. For example, use "lea" instead of "add" to not write the flags (and add a comment). Or save the carry flag in a register with sbb %bx,%bx ant test %bx later. Or any of a million other _standard_ ways to handle this problem. Linus ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Oops in 2.6.10-rc1 (almost solved) 2004-11-14 2:58 ` Matt Domsch 2004-11-14 4:43 ` Linus Torvalds @ 2004-11-14 11:45 ` Christian 2004-11-14 20:02 ` Christian Kujau 2 siblings, 0 replies; 9+ messages in thread From: Christian @ 2004-11-14 11:45 UTC (permalink / raw) To: Matt Domsch; +Cc: Linus Torvalds, linux-kernel, Chuck Ebbert Matt Domsch wrote: > > Alexander van Heukelum noted to me that addw here modifies CF, so I > think something like should fix that. Christian, if you're in a > position to test this, I'd really appreciate it. You've been a yes, i'll do so. right now i am off (and late) to sth. else, but i'll test this in the evening. thank you, Christian. -- BOFH excuse #318: Your EMAIL is now being delivered by the USPS. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Oops in 2.6.10-rc1 (almost solved) 2004-11-14 2:58 ` Matt Domsch 2004-11-14 4:43 ` Linus Torvalds 2004-11-14 11:45 ` Christian @ 2004-11-14 20:02 ` Christian Kujau 2004-11-14 21:55 ` Matt Domsch 2 siblings, 1 reply; 9+ messages in thread From: Christian Kujau @ 2004-11-14 20:02 UTC (permalink / raw) To: Matt Domsch; +Cc: Linus Torvalds, linux-kernel, Chuck Ebbert -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 sorry, took me a bit longer to get to the testing. Matt Domsch schrieb: > > Not ready for Linus yet, and you'll need to re-apply the previous > edd.S patch which is now reverted in Linus's tree. As your BIOS i've applied the patch to a pristine 2.6.10-rc1, so the (currently reverted) EDD change is still there. tell me, if the patch had to be applied to sth. else. but for now i have to say, that it still oopses: http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-2.6.10-rc1_edd-2.txt ... BIOS EDD facility v0.16 2004-Jun-25, 16 devices found ... (oh, i've added an ide-disk yesterday, so hde will show up in dmesg.) sorry, Christian. - -- BOFH excuse #401: Sales staff sold a product we don't offer. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFBl7nZ+A7rjkF8z0wRAvuHAKCX8TWiDt5DP25OqBEWKecfM6x3HwCeNRoM 1IzHqKpcbWOABXWJ4vC4d1w= =FiKX -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Oops in 2.6.10-rc1 (almost solved) 2004-11-14 20:02 ` Christian Kujau @ 2004-11-14 21:55 ` Matt Domsch 2004-11-15 12:41 ` Oops in 2.6.10-rc1 (solved) Christian Kujau 0 siblings, 1 reply; 9+ messages in thread From: Matt Domsch @ 2004-11-14 21:55 UTC (permalink / raw) To: Christian Kujau; +Cc: Linus Torvalds, linux-kernel, Chuck Ebbert On Sun, Nov 14, 2004 at 09:02:33PM +0100, Christian Kujau wrote: > > Not ready for Linus yet, and you'll need to re-apply the previous > > edd.S patch which is now reverted in Linus's tree. As your BIOS > > i've applied the patch to a pristine 2.6.10-rc1, so the (currently > reverted) EDD change is still there. tell me, if the patch had to be > applied to sth. else. > > but for now i have to say, that it still oopses: > > http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-2.6.10-rc1_edd-2.txt OK, the patch below (which Linus applied to his tree yesterday) should fix the oopses. > BIOS EDD facility v0.16 2004-Jun-25, 16 devices found but the patch to edd.S doesn't resolve that EDD believes you've got 16 devices (I would expect it to report 2, as you have only 2 disks). Thanks for the quick testing. Back to the drawing board though for this second part. Thanks, Matt -- Matt Domsch Sr. Software Engineer, Lead Engineer Dell Linux Solutions linux.dell.com & www.dell.com/linux Linux on Dell mailing lists @ http://lists.us.dell.com ===== drivers/firmware/edd.c 1.30 vs edited ===== --- 1.30/drivers/firmware/edd.c 2004-06-29 09:44:48 -05:00 +++ edited/drivers/firmware/edd.c 2004-11-13 07:56:00 -06:00 @@ -70,7 +70,7 @@ static int edd_dev_is_type(struct edd_device *edev, const char *type); static struct pci_dev *edd_get_pci_dev(struct edd_device *edev); -static struct edd_device *edd_devices[EDDMAXNR]; +static struct edd_device *edd_devices[EDD_MBR_SIG_MAX]; #define EDD_DEVICE_ATTR(_name,_mode,_show,_test) \ struct edd_attribute edd_attr_##_name = { \ @@ -728,9 +728,9 @@ static inline int edd_num_devices(void) { - return min_t(unsigned char, - max_t(unsigned char, edd.edd_info_nr, edd.mbr_signature_nr), - max_t(unsigned char, EDD_MBR_SIG_MAX, EDDMAXNR)); + return max_t(unsigned char, + min_t(unsigned char, EDD_MBR_SIG_MAX, edd.mbr_signature_nr), + min_t(unsigned char, EDDMAXNR, edd.edd_info_nr)); } /** ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Oops in 2.6.10-rc1 (solved) 2004-11-14 21:55 ` Matt Domsch @ 2004-11-15 12:41 ` Christian Kujau 0 siblings, 0 replies; 9+ messages in thread From: Christian Kujau @ 2004-11-15 12:41 UTC (permalink / raw) To: linux-kernel; +Cc: Matt Domsch, Linus Torvalds, Chuck Ebbert -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Matt Domsch schrieb: > > OK, the patch below (which Linus applied to his tree yesterday) should > fix the oopses. > so i've compiled a pristine 2.6.10-rc1-bk24 as your patch should be included there (i've tried to apply your patch with --dry-run -> it did not succeed, -R *would* have been successful) and finally it works! http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-2.6.10-rc1-bk24.txt http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/config-2.6.10-rc1-bk24 snd_ens1371 is working fine, no oops, i can load/unload the drivers, no problems ;-) > >>BIOS EDD facility v0.16 2004-Jun-25, 16 devices found > > but the patch to edd.S doesn't resolve that EDD believes you've got 16 > devices (I would expect it to report 2, as you have only 2 disks). but still: BIOS EDD facility v0.16 2004-Jun-25, 6 devices found i have 2 disks now (1 ide, 1 scsi), 2 cdrom drives (ide). as you can see from the dmesg, i have an additional ide-controller onboard: PDC20265: chipset revision 2 PDC20265: ROM enabled at 0xdffe0000 PDC20265: 100% native mode on irq 10 PDC20265: (U)DMA Burst Bit ENABLED Primary PCI Mode Secondary PCI Mode. ide2: BM-DMA at 0xb400-0xb407, BIOS settings: hde:DMA, hdf:DMA ide3: BM-DMA at 0xb408-0xb40f, BIOS settings: hdg:pio, hdh:pio Probing IDE interface ide2... hde: ST320413A, ATA DISK drive ide2 at 0xbc00-0xbc07,0xb802 on irq 10 Probing IDE interface ide3... Probing IDE interface ide1... Probing IDE interface ide3... Probing IDE interface ide4... ide4: Wait for ready failed before probe ! Probing IDE interface ide5... ide5: Wait for ready failed before probe ! but there are only 4 ide channels on my board (Gigabyte GA7ZXR): ide0 - with hda+hdb connected (2x cdrom) ide1 - none ide2 - with hde connected (ST320413A) ide3 - none so it's probing for a non-existent ide4+ide5! but it did that even in -bk4 times, so it's not "new behaviour", i guess. http://www.nerdbynature.de/bits/prinz/2.6.10-rc1/dmesg-2.6.9-bk4.txt anyway, it's working now, the oops is gone, but i can do further testing regarding this EDD issue of course. Thanks to all involved, Christian. - -- BOFH excuse #195: We only support a 28000 bps connection. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFBmKP4+A7rjkF8z0wRAooxAJ9dD5QEXsEPUJjlBNvtfhtPteGoNwCfdfCA tsYq86N5Y/bpegSXYWS+nkw= =kFOh -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2004-11-15 12:41 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2004-11-13 3:45 Oops in 2.6.10-rc1 (almost solved) Chuck Ebbert 2004-11-13 14:28 ` Matt Domsch 2004-11-13 18:55 ` Matt Domsch 2004-11-14 2:58 ` Matt Domsch 2004-11-14 4:43 ` Linus Torvalds 2004-11-14 11:45 ` Christian 2004-11-14 20:02 ` Christian Kujau 2004-11-14 21:55 ` Matt Domsch 2004-11-15 12:41 ` Oops in 2.6.10-rc1 (solved) Christian Kujau
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox