* 2.6.25 md oops during boot.
@ 2008-06-04 15:41 Dave Jones
2008-06-04 23:12 ` Neil Brown
0 siblings, 1 reply; 3+ messages in thread
From: Dave Jones @ 2008-06-04 15:41 UTC (permalink / raw)
To: Neil Brown; +Cc: Linux Kernel
Hi Neil,
Here's an odd one.
https://bugzilla.redhat.com/show_bug.cgi?id=442204
Slightly old (.25-rc8-git7), but I don't recall anything changing between
then and .25 final that could explain this.
Dave
BUG: unable to handle kernel NULL pointer dereference at 00000020
IP: [<c04bbb65>] sysfs_addrm_start+0x21/0x7d
Oops: 0000 [#1] SMP
Modules linked in: e1000(+) i2c_piix4 i2c_core sg ac button sr_mod cdrom
ata_piix libata dm_snapshot dm_zero dm_mirror dm_mod mptspi mptscsih mptbase
scsi_transport_spi sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
[last unloaded: scsi_wait_scan]
Pid: 742, comm: mdadm Not tainted (2.6.25-0.218.rc8.git7.fc9.i686 #1)
EIP: 0060:[<c04bbb65>] EFLAGS: 00010246 CPU: 0
EIP is at sysfs_addrm_start+0x21/0x7d
EAX: c04bbc23 EBX: 00000000 ECX: 00000000 EDX: 00000062
ESI: f6ecdc84 EDI: f6ecdc94 EBP: f6ecdc78 ESP: f6ecdc6c
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process mdadm (pid: 742, ti=f6ecd000 task=f6e22e80 task.ti=f6ecd000)
Stack: f6ecdc84 f6e75bd0 fffffff4 f6ecdca0 c04bbfb7 00000000 00000000 00000000
00000000 00000000 f7853dc4 fffffffe f7af6824 f6ecdcb4 c04bc01c f6ecdcac
c04f08f2 f7853dc4 f6ecdcd0 c04f09ec f7af6824 f6ecdcd0 00000000 f7853dc4
Call Trace:
[<c04bbfb7>] ? create_dir+0x3a/0x72
[<c04bc01c>] ? sysfs_create_dir+0x2d/0x41
[<c04f08f2>] ? kobject_get+0x12/0x17
[<c04f09ec>] ? kobject_add_internal+0xa4/0x145
[<c04f0b21>] ? kobject_add_varg+0x35/0x41
[<c04f0b92>] ? kobject_add+0x43/0x49
[<c05948aa>] ? bind_rdev_to_array+0x124/0x1ab
[<c04cf900>] ? task_has_capability+0x47/0x76
[<c04f4f94>] ? copy_from_user+0x39/0x121
[<c05994d1>] ? md_ioctl+0xf75/0x183b
[<c04cfde2>] ? inode_has_perm+0x5b/0x65
[<c0490ce5>] ? d_free+0x3b/0x4d
[<c0492094>] ? dput+0x34/0xee
[<c04efb51>] ? _atomic_dec_and_lock+0x29/0x44
[<c04958b8>] ? mntput_no_expire+0x16/0x69
[<c0488796>] ? path_put+0x20/0x23
[<c048ab1b>] ? __link_path_walk+0xce7/0xcfc
[<c04e831d>] ? blkdev_driver_ioctl+0x49/0x5b
[<c04e8ab0>] ? blkdev_ioctl+0x781/0x79d
[<c04d06a8>] ? selinux_file_free_security+0x14/0x16
[<c04cfde2>] ? inode_has_perm+0x5b/0x65
[<c04d01b9>] ? file_has_perm+0x7c/0x85
[<c04a241b>] ? block_ioctl+0x16/0x1b
[<c04a2405>] ? block_ioctl+0x0/0x1b
[<c048c99e>] ? vfs_ioctl+0x22/0x69
[<c048cc1e>] ? do_vfs_ioctl+0x239/0x24c
[<c04d034f>] ? selinux_file_ioctl+0xa8/0xab
[<c048cc71>] ? sys_ioctl+0x40/0x5b
[<c0405cf6>] ? syscall_call+0x7/0xb
=======================
Code: 89 31 8d 65 f4 5b 5e 5f 5d c3 55 b9 04 00 00 00 89 e5 57 89 c7 56 89 c6 53
31 c0 89 d3 f3 ab b8 34 fc 71 c0 89 16 e8 1f e2 16 00 <8b> 53 20 b9 f0 b7 4b c0
a1 88 f4 83 c0 53 e8 43 71 fd ff 5f 85
EIP: [<c04bbb65>] sysfs_addrm_start+0x21/0x7d SS:ESP 0068:f6ecdc6c
---[ end trace ca143223eefdc828 ]---
--
http://www.codemonkey.org.uk
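
Note on reading the trace: in the Code: line the faulting bytes are `<8b> 53 20`, which decode as `mov 0x20(%ebx),%edx`. With EBX: 00000000 in the register dump, that load targets address 0x20, matching the "NULL pointer dereference at 00000020" line. The following is a minimal user-space sketch of that arithmetic. The structure `fake_dirent` and the helper `fault_address` are hypothetical names for illustration only; the real structure layout is not shown in the report, and only the 0x20 offset is taken from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the structure being read; only the 0x20
 * offset of the member comes from the oops, the rest is made up. */
struct fake_dirent {
	uint32_t pad[8];        /* 8 * 4 = 0x20 bytes of preceding members */
	uint32_t field_at_0x20; /* member the faulting load would read */
};

/* Address the CPU would access when evaluating base->field_at_0x20. */
static uintptr_t fault_address(uintptr_t base)
{
	size_t off = offsetof(struct fake_dirent, field_at_0x20);

	assert(base <= UINTPTR_MAX - off); /* no wrap-around */
	return base + off;
}
```

Calling `fault_address(0)` models the NULL base pointer held in EBX and yields the member offset itself, which is why a small non-zero fault address usually points at a NULL structure pointer rather than a wild one.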
* Re: 2.6.25 md oops during boot.
2008-06-04 15:41 2.6.25 md oops during boot Dave Jones
@ 2008-06-04 23:12 ` Neil Brown
2008-06-04 23:29 ` Neil Brown
0 siblings, 1 reply; 3+ messages in thread
From: Neil Brown @ 2008-06-04 23:12 UTC (permalink / raw)
To: Dave Jones; +Cc: Linux Kernel
On Wednesday June 4, davej@redhat.com wrote:
> Hi Neil,
> Here's an odd one.
> https://bugzilla.redhat.com/show_bug.cgi?id=442204
>
> Slightly old (.25-rc8-git7), but I don't recall anything changing between
> then and .25 final that could explain this.
>
> Dave

Hi Dave.

Yes, odd.

It appears that sysfs_addrm_start is being called with parent_sd == NULL.
That implies that sysfs_create_dir is being given a kobj where
->parent is non-NULL, and ->parent->sd is NULL.
So kobject_add is being given a parent with a NULL ->sd.
So in bind_rdev_to_array, mddev->kobj.sd is NULL.
So in md_probe, either kobject_init_and_add is failing to set up ->sd
properly (which should result in an error message
"md: cannot register md0/md - name in use"
) or alloc_disk is failing.

The most likely scenario is that alloc_disk is failing, so the
md_probe call in autorun_devices (line 3804 of md.c) fails. The
following mddev_find creates a new mddev which is not properly
initialised and gets used.

I wouldn't say this is a likely scenario as it requires (I think)
kmalloc failure very early in boot. But I cannot see any other
possible cause.

I'll see about getting the error paths handled better.

NeilBrown
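
The pointer chain described in this reply (a kobj whose ->parent is non-NULL while ->parent->sd is NULL) can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: `mock_kobj`, `mock_sd` and `mock_create_dir` are hypothetical names, not kernel structures or functions, and the explicit check shown here is an illustration of the failing condition, not something the 2.6.25 code is claimed to do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel objects. */
struct mock_sd { int unused; };

struct mock_kobj {
	struct mock_kobj *parent; /* may be NULL for a top-level object */
	struct mock_sd *sd;       /* NULL until the object is registered */
};

/* Attach new_sd to kobj, refusing a parent that was never registered.
 * Without this check, code that goes on to read parent->sd->... would
 * dereference NULL, which is the situation the oops shows. */
static int mock_create_dir(struct mock_kobj *kobj, struct mock_sd *new_sd)
{
	assert(kobj != NULL);

	if (kobj->parent && !kobj->parent->sd)
		return -ENOENT; /* parent exists but has no directory */

	kobj->sd = new_sd;
	return 0;
}
```

In this model the child of a half-initialised parent is rejected with an error instead of faulting; once the parent's `sd` is set, the same call succeeds.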
* Re: 2.6.25 md oops during boot.
2008-06-04 23:12 ` Neil Brown
@ 2008-06-04 23:29 ` Neil Brown
0 siblings, 0 replies; 3+ messages in thread
From: Neil Brown @ 2008-06-04 23:29 UTC (permalink / raw)
To: Dave Jones, Linux Kernel
On Thursday June 5, neilb@suse.de wrote:
>
> I wouldn't say this is a likely scenario as it requires (I think)
> kmalloc failure very early in boot. But I cannot see any other
> possible cause.

On closer inspection, I can see another possible cause. I don't think
it is likely (yet) but it might be possible.

If two threads enter md_probe for the same mddev, then the second one
to get disks_mutex could exit before the first had called
kobject_init_and_add, so it could make available an mddev where
kobj.sd was NULL.

I cannot imagine how two threads could be doing that so early in boot,
but I cannot rule it out.

This (untested) patch should close both these possible problems.

NeilBrown

-----------------
Fix error paths if md_probe fails.

md_probe can fail (e.g. alloc_disk could fail) without
returning an error (as it always returns NULL).
So when we call mddev_find immediately afterwards, we need
to check that md_probe actually succeeded. This means checking
that mddev->gendisk is non-NULL.

Also there is a possible race - if two threads call md_probe
for the same device, then one could exit (having checked that
->gendisk exists) before the other has called
kobject_init_and_add, thus returning an incomplete kobj which
will cause problems when we try to add children to it.
So extend the range of protection of disks_mutex slightly to
avoid this possibility.

Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/md.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2008-06-03 16:35:41.000000000 +1000
+++ ./drivers/md/md.c	2008-06-05 09:19:56.000000000 +1000
@@ -3363,9 +3363,9 @@ static struct kobject *md_probe(dev_t de
 	disk->queue = mddev->queue;
 	add_disk(disk);
 	mddev->gendisk = disk;
-	mutex_unlock(&disks_mutex);
 	error = kobject_init_and_add(&mddev->kobj, &md_ktype, &disk->dev.kobj,
 				     "%s", "md");
+	mutex_unlock(&disks_mutex);
 	if (error)
 		printk(KERN_WARNING "md: cannot register %s/md - name in use\n",
 		       disk->disk_name);
@@ -3935,8 +3935,10 @@ static void autorun_devices(int part)
 
 		md_probe(dev, NULL, NULL);
 		mddev = mddev_find(dev);
-		if (!mddev) {
-			printk(KERN_ERR
+		if (!mddev || !mddev->gendisk) {
+			if (mddev)
+				mddev_put(mddev);
+			printk(KERN_ERR
 				"md: cannot allocate memory for md drive.\n");
 			break;
 		}
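
The second hunk above follows a common pattern: a lookup that takes a reference may hand back an object that exists but was never fully set up, so the caller has to test for that and drop the reference before bailing out. A minimal user-space sketch of the pattern is given here. All names (`mock_mddev`, `mock_find`, `mock_put`, `find_usable`) are hypothetical and the plain integer reference count is a simplification; this is not the md driver's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the driver objects. */
struct mock_disk { int unused; };

struct mock_mddev {
	int refs;                  /* references held by callers */
	struct mock_disk *gendisk; /* NULL if the probe step failed */
};

/* Mock lookup: hands out the object and takes a reference on it. */
static struct mock_mddev *mock_find(struct mock_mddev *m)
{
	if (m)
		m->refs++;
	return m;
}

/* Drop one reference previously taken by mock_find(). */
static void mock_put(struct mock_mddev *m)
{
	assert(m->refs > 0);
	m->refs--;
}

/* Return the device only if it is fully set up. A half-initialised
 * device has its reference dropped again so nothing is leaked. */
static struct mock_mddev *find_usable(struct mock_mddev *m)
{
	struct mock_mddev *found = mock_find(m);

	if (!found || !found->gendisk) {
		if (found)
			mock_put(found);
		return NULL;
	}
	return found;
}
```

The inner `if (found)` mirrors the `if (mddev) mddev_put(mddev);` lines in the patch: the failure branch is reached both when nothing was found and when something incomplete was found, and only the latter holds a reference that needs releasing.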