* What protection does sysfs_readdir have with SMP/Preemption?
@ 2005-11-22 21:33 Steven Rostedt
2005-11-22 21:39 ` Greg KH
0 siblings, 1 reply; 21+ messages in thread
From: Steven Rostedt @ 2005-11-22 21:33 UTC (permalink / raw)
To: LKML; +Cc: Ingo Molnar, Greg KH
Hi,
I'm developing a custom kernel on top of Ingo's -rt patch. My kernel
makes race conditions in the vanilla kernel show up very well :-)
I just hit a bug, actually a page fault in fs/sysfs/dir.c in
sysfs_readdir:
for (p=q->next; p!= &parent_sd->s_children; p=p->next) {
struct sysfs_dirent *next;
const char * name;
int len;
next = list_entry(p, struct sysfs_dirent,
s_sibling);
if (!next->s_element)
continue;
name = sysfs_get_name(next);
len = strlen(name);
if (next->s_dentry)
ino = next->s_dentry->d_inode->i_ino;
^^^^
This is where I had a bad pointer reference.
else
ino = iunique(sysfs_sb, 2);
if (filldir(dirent, name, len, filp->f_pos, ino,
dt_type(next)) < 0)
return 0;
Looking at this code, I don't see anything protecting the s_dentry. For
example, couldn't the following happen:
sysfs_create_dir is called, which calls create_dir. Now we create a
dentry with no d_inode. In sysfs_make_dirent which calls
sysfs_new_dirent which adds to the parents s_children. Then
sysfs_make_dirent sets s_dentry = dentry (the one that was just made
with no d_inode assigned yet). Then create_dir calls sysfs_create which
finally assigns the d_inode.
So, either there is some hidden protection and my modification to the
kernel has caused this to bug, or we have just been lucky the whole time
in the vanilla kernel.
-- Steve
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: What protection does sysfs_readdir have with SMP/Preemption? 2005-11-22 21:33 What protection does sysfs_readdir have with SMP/Preemption? Steven Rostedt @ 2005-11-22 21:39 ` Greg KH 2005-11-23 4:50 ` Maneesh Soni 0 siblings, 1 reply; 21+ messages in thread From: Greg KH @ 2005-11-22 21:39 UTC (permalink / raw) To: Steven Rostedt, Maneesh Soni; +Cc: LKML, Ingo Molnar On Tue, Nov 22, 2005 at 04:33:22PM -0500, Steven Rostedt wrote: > Hi, > > I'm developing a custom kernel on top of Ingo's -rt patch. My kernel > makes race conditions in the vanilla kernel show up very well :-) > > I just hit a bug, actually a page fault in fs/sysfs/dir.c in > sysfs_readdir: > > > > for (p=q->next; p!= &parent_sd->s_children; p=p->next) { > struct sysfs_dirent *next; > const char * name; > int len; > > next = list_entry(p, struct sysfs_dirent, > s_sibling); > if (!next->s_element) > continue; > > name = sysfs_get_name(next); > len = strlen(name); > if (next->s_dentry) > ino = next->s_dentry->d_inode->i_ino; > > ^^^^ > This is where I had a bad pointer reference. > > else > ino = iunique(sysfs_sb, 2); > > if (filldir(dirent, name, len, filp->f_pos, ino, > dt_type(next)) < 0) > return 0; > > > Looking at this code, I don't see anything protecting the s_dentry. For > example, couldn't the following happen: > > sysfs_create_dir is called, which calls create_dir. Now we create a > dentry with no d_inode. In sysfs_make_dirent which calls > sysfs_new_dirent which adds to the parents s_children. Then > sysfs_make_dirent sets s_dentry = dentry (the one that was just made > with no d_inode assigned yet). Then create_dir calls sysfs_create which > finally assigns the d_inode. > > So, either there is some hidden protection and my modification to the > kernel has caused this to bug, or we have just been lucky the whole time > in the vanilla kernel. I think we've been lucky :( Maneesh, any ideas? thanks, greg k-h ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: What protection does sysfs_readdir have with SMP/Preemption? 2005-11-22 21:39 ` Greg KH @ 2005-11-23 4:50 ` Maneesh Soni 2005-11-23 8:18 ` Ingo Molnar 2005-11-23 12:56 ` What protection does sysfs_readdir have with SMP/Preemption? Steven Rostedt 0 siblings, 2 replies; 21+ messages in thread From: Maneesh Soni @ 2005-11-23 4:50 UTC (permalink / raw) To: Greg KH; +Cc: Steven Rostedt, LKML, Ingo Molnar On Tue, Nov 22, 2005 at 01:39:47PM -0800, Greg KH wrote: > On Tue, Nov 22, 2005 at 04:33:22PM -0500, Steven Rostedt wrote: > > Hi, > > > > I'm developing a custom kernel on top of Ingo's -rt patch. My kernel > > makes race conditions in the vanilla kernel show up very well :-) > > > > I just hit a bug, actually a page fault in fs/sysfs/dir.c in > > sysfs_readdir: > > > > > > > > for (p=q->next; p!= &parent_sd->s_children; p=p->next) { > > struct sysfs_dirent *next; > > const char * name; > > int len; > > > > next = list_entry(p, struct sysfs_dirent, > > s_sibling); > > if (!next->s_element) > > continue; > > > > name = sysfs_get_name(next); > > len = strlen(name); > > if (next->s_dentry) > > ino = next->s_dentry->d_inode->i_ino; > > > > ^^^^ > > This is where I had a bad pointer reference. > > > > else > > ino = iunique(sysfs_sb, 2); > > > > if (filldir(dirent, name, len, filp->f_pos, ino, > > dt_type(next)) < 0) > > return 0; > > > > > > Looking at this code, I don't see anything protecting the s_dentry. For > > example, couldn't the following happen: > > > > sysfs_create_dir is called, which calls create_dir. Now we create a > > dentry with no d_inode. In sysfs_make_dirent which calls > > sysfs_new_dirent which adds to the parents s_children. Then > > sysfs_make_dirent sets s_dentry = dentry (the one that was just made > > with no d_inode assigned yet). Then create_dir calls sysfs_create which > > finally assigns the d_inode. > > > > So, either there is some hidden protection and my modification to the > > kernel has caused this to bug, or we have just been lucky the whole time > > in the vanilla kernel. > > I think we've been lucky :( > > Maneesh, any ideas? > The dir operation sysfs_readdir() is called under directory inode's i_sem taken in vfs_readdir() and create_dir() also takes parent directory inode's i_sem. So in this case I think following are the relevant steps happening which look safe to me. cpu 0 vfs_readdir() down(dir inode i_sem) sysfs_readdir(dir) parse through dir->s_dirent s_children list up(dir inode i_sem) cpu 1 sysfs_create_dir() create_dir() down(parent dir inode i_sem) lookup_one_len (allocates & makes the new directory dentry visible) sysfs_make_diret() sysfs_new_dirent() attach the new directory s_dirent to parent's s_children list) up(parent dir inode i_sem) Basically, sysfs_readdir for a directory is protected against any addition/deletion in the directory by directory inode's i_sem. But the bad pointer reference seen in sysfs_readdir() has to be debugged. Assumption here is that if there is a dentry attached to s_dirent, there has to be a inode associated becuase negative dentries are not created in sysfs. Is it possible to get some more information about the recreation scenario. Could you enable DEBUG printks for lib/kobject.c and drivers/base/class.c to see the events happening. Thanks Maneesh -- Maneesh Soni Linux Technology Center, IBM India Software Labs, Bangalore, India email: maneesh@in.ibm.com Phone: 91-80-25044990 ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: What protection does sysfs_readdir have with SMP/Preemption? 2005-11-23 4:50 ` Maneesh Soni @ 2005-11-23 8:18 ` Ingo Molnar 2005-11-23 12:35 ` Steven Rostedt ` (2 more replies) 2005-11-23 12:56 ` What protection does sysfs_readdir have with SMP/Preemption? Steven Rostedt 1 sibling, 3 replies; 21+ messages in thread From: Ingo Molnar @ 2005-11-23 8:18 UTC (permalink / raw) To: Maneesh Soni; +Cc: Greg KH, Steven Rostedt, LKML * Maneesh Soni <maneesh@in.ibm.com> wrote: > But the bad pointer reference seen in sysfs_readdir() has to be > debugged. Assumption here is that if there is a dentry attached to > s_dirent, there has to be a inode associated becuase negative dentries > are not created in sysfs. Is it possible to get some more information > about the recreation scenario. Could you enable DEBUG printks for > lib/kobject.c and drivers/base/class.c to see the events happening. on a related note - i've been carrying the patch below in -rt for 2 months (i.e. Steven's kernel has it too), as a workaround against the crash described below. so it appears that the -rt kernel is triggering some genuine sysfs race. [note that it only happens on an SMP kernel, booting an UP kernel or with maxcpus=1 makes the bug go away.] I have done full kobject debugging but no conclusive results. Also, that particular crash happens earliest with PAGEALLOC enabled. [i have packed up the email discussion related to that crash, and i'm sending it to Maneesh separately. Maneesh, any ideas or suggestions?] note that Steven has a dual-core Athlon64 X2 system. Steven, do you get the crash even with maxcpus=1? Ingo ----- i'm occasionally getting the crash below on a PREEMPT_RT kernel. Might be a PREEMPT_RT bug, or might be some sysfs race only visible under PREEMPT_RT. Any ideas? The crash is at: (gdb) list *0xc01a2095 0xc01a2095 is in sysfs_hash_and_remove (fs/sysfs/inode.c:229). 224 } 225 226 void sysfs_hash_and_remove(struct dentry * dir, const char * name) 227 { 228 struct sysfs_dirent * sd; 229 struct sysfs_dirent * parent_sd = dir->d_fsdata; 230 231 if (dir->d_inode == NULL) 232 /* no inode means this hasn't been made visible yet */ 233 return; (gdb) [...] Calling initcall 0xc05ba6e0: spi_transport_init+0x0/0x30() Calling initcall 0xc05ba710: ahc_linux_init+0x0/0xf0() ACPI: PCI Interrupt 0000:03:04.0[A] -> GSI 18 (level, low) -> IRQ 20 scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0 <Adaptec aic7899 Ultra160 SCSI adapter> aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs Vendor: SEAGATE Model: ST336706LC Rev: 010A Type: Direct-Access ANSI SCSI revision: 03 scsi0:A:0:0: Tagged Queuing enabled. Depth 32 target0:0:0: Beginning Domain Validation target0:0:0: wide asynchronous. target0:0:0: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 63) target0:0:0: Ending Domain Validation Vendor: SEAGATE Model: ST336706LC Rev: 010A Type: Direct-Access ANSI SCSI revision: 03 scsi0:A:1:0: Tagged Queuing enabled. Depth 32 target0:0:1: Beginning Domain Validation target0:0:1: wide asynchronous. target0:0:1: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 63) target0:0:1: Ending Domain Validation BUG: Unable to handle kernel paging request at virtual address f6c47fc0 printing eip: c01a2095 *pde = 006cc067 *pte = 36c47000 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU: 1 EIP: 0060:[<c01a2095>] Not tainted VLI EFLAGS: 00010282 (2.6.14-rc2-rt2) EIP is at sysfs_hash_and_remove+0x15/0x110 eax: f6c47f2c ebx: f6c42f64 ecx: c013edb4 edx: f6c3e5b8 esi: f6c42f5c edi: c04fd880 ebp: c277ec88 esp: c277ec70 ds: 007b es: 007b ss: 0068 preempt: 00000001 Process swapper (pid: 1, threadinfo=c277e000 task=c277d900 stack_left=3132 worst_left=-1) Stack: f6c4a3b8 f6c3e5b8 f6c47f2c f6c42f64 f6c42f5c c04fd880 c277ec90 c01a3ddb c277ecb0 c02a8051 c04fd888 f6c3e5b8 c04fd7a0 f6c42f5c f7ad6c68 c04fd7a0 c277ecbc c02a9d52 00000000 c277ecd4 c02a9f88 f6c42f5c f7ad6c68 f79ac80c Call Trace: [<c010405a>] show_stack+0x7a/0x90 (32) [<c010421e>] show_registers+0x18e/0x1f0 (56) [<c0104420>] die+0x100/0x180 (68) [<c0442ea8>] do_page_fault+0x368/0x556 (92) [<c0103d0b>] error_code+0x4f/0x54 (84) [<c01a3ddb>] sysfs_remove_link+0xb/0x10 (8) [<c02a8051>] class_device_del+0xf1/0x100 (32) [<c02a9d52>] attribute_container_class_device_del+0x12/0x20 (12) [<c02a9f88>] transport_remove_classdev+0x38/0x70 (24) [<c02a9bfd>] attribute_container_device_trigger+0x8d/0xc0 (40) [<c02a9fcd>] transport_remove_device+0xd/0x10 (8) [<c032f01b>] scsi_target_reap+0x9b/0xb0 (20) [<c032feb4>] __scsi_scan_target+0x94/0x130 (44) [<c0330068>] scsi_scan_channel+0x78/0x90 (32) [<c0330109>] scsi_scan_host_selected+0x89/0xf0 (32) [<c0330192>] scsi_scan_host+0x22/0x30 (16) [<c0349a85>] ahc_linux_register_host+0x1b5/0x1c0 (132) [<c034d53d>] ahc_linux_pci_dev_probe+0xed/0x140 (132) [<c022a02d>] pci_call_probe+0xd/0x10 (12) [<c022a081>] __pci_device_probe+0x51/0x60 (20) [<c022a0b9>] pci_device_probe+0x29/0x60 (16) [<c02a6d76>] driver_probe_device+0x36/0xb0 (36) [<c02a6ebd>] __driver_attach+0x4d/0x70 (20) [<c02a6419>] bus_for_each_dev+0x49/0x70 (40) [<c02a6ef9>] driver_attach+0x19/0x20 (12) [<c02a68e1>] bus_add_driver+0x81/0xd0 (36) [<c02a72a1>] driver_register+0x51/0x60 (20) [<c022a34b>] pci_register_driver+0x8b/0xa0 (16) [<c034d59d>] ahc_linux_pci_init+0xd/0x20 (8) [<c05ba799>] ahc_linux_init+0x89/0xf0 (24) [<c059ba62>] do_initcalls+0x32/0xe0 (36) [<c059bb35>] do_basic_setup+0x25/0x30 (8) [<c01003de>] init+0xae/0x2d0 (24) [<c0101359>] kernel_thread_helper+0x5/0xc (1032327196) --------------------------- | preempt count: 00000001 ] | 1-level deep critical section nesting: ---------------------------------------- .. [<c013fc41>] .... add_preempt_count+0x11/0x20 .....[<c01169a0>] .. ( <= try_to_wake_up+0x50/0x440) ------------------------------ | showing all locks held by: | (swapper/1 [c277d900, 116]): ------------------------------ #001: [c067618c] {(struct semaphore *)(&hwif->gendev_rel_sem)} ... acquired at: init_hwif_data+0x8d/0x180 #002: [c0676d0c] {(struct semaphore *)(&hwif->gendev_rel_sem)} ... acquired at: init_hwif_data+0x8d/0x180 #003: [c067788c] {(struct semaphore *)(&hwif->gendev_rel_sem)} ... acquired at: init_hwif_data+0x8d/0x180 #004: [c067840c] {(struct semaphore *)(&hwif->gendev_rel_sem)} ... acquired at: init_hwif_data+0x8d/0x180 #005: [c0678f8c] {(struct semaphore *)(&hwif->gendev_rel_sem)} ... acquired at: init_hwif_data+0x8d/0x180 #006: [c0679b0c] {(struct semaphore *)(&hwif->gendev_rel_sem)} ... acquired at: init_hwif_data+0x8d/0x180 #007: [c067a68c] {(struct semaphore *)(&hwif->gendev_rel_sem)} ... acquired at: init_hwif_data+0x8d/0x180 #008: [c067b20c] {(struct semaphore *)(&hwif->gendev_rel_sem)} ... acquired at: init_hwif_data+0x8d/0x180 #009: [c067bd8c] {(struct semaphore *)(&hwif->gendev_rel_sem)} ... acquired at: init_hwif_data+0x8d/0x180 #010: [c067c90c] {(struct semaphore *)(&hwif->gendev_rel_sem)} ... acquired at: init_hwif_data+0x8d/0x180 #011: [f79a7a00] {(struct semaphore *)(&dev->sem)} ... acquired at: __driver_attach+0x22/0x70 #012: [f7aee8ac] {(struct semaphore *)(&shost->scan_mutex)} ... acquired at: scsi_scan_host_selected+0x56/0xf0 #013: [c04dc604] {kernel_sem.lock} ... acquired at: __reacquire_kernel_lock+0x33/0x70 #014: [c04eed24] {attribute_container_mutex.lock} ... acquired at: attribute_container_device_trigger+0x18/0xc0 Code: 8b 7c 24 08 89 ec 5d c3 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 83 ec 18 89 5d f4 89 75 f8 89 7d fc 89 45 f0 89 55 ec <8b> b0 94 00 00 00 8b 40 4c 85 c0 75 0e 8b 5d f4 8b 75 f8 8b 7d <0>Kernel panic - not syncing: Attempted to kill init! Signed-off-by: Ingo Molnar <mingo@elte.hu> drivers/base/class.c | 4 ++++ 1 files changed, 4 insertions(+) Index: linux/drivers/base/class.c =================================================================== --- linux.orig/drivers/base/class.c +++ linux/drivers/base/class.c @@ -520,8 +520,10 @@ int class_device_add(struct class_device class_name = make_class_name(class_dev); sysfs_create_link(&class_dev->kobj, &class_dev->dev->kobj, "device"); + /* sysfs_create_link(&class_dev->dev->kobj, &class_dev->kobj, class_name); + */ } /* notify any interfaces this device is now here */ @@ -618,7 +620,9 @@ void class_device_del(struct class_devic if (class_dev->dev) { class_name = make_class_name(class_dev); sysfs_remove_link(&class_dev->kobj, "device"); + /* sysfs_remove_link(&class_dev->dev->kobj, class_name); + */ } if (class_dev->devt_attr) class_device_remove_file(class_dev, class_dev->devt_attr); ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: What protection does sysfs_readdir have with SMP/Preemption? 2005-11-23 8:18 ` Ingo Molnar @ 2005-11-23 12:35 ` Steven Rostedt 2005-11-23 12:54 ` Maneesh Soni 2005-11-23 12:50 ` Maneesh Soni 2005-11-23 12:52 ` [OOPS] sysfs_hash_and_remove (was Re: What protection ....) Maneesh Soni 2 siblings, 1 reply; 21+ messages in thread From: Steven Rostedt @ 2005-11-23 12:35 UTC (permalink / raw) To: Ingo Molnar; +Cc: Maneesh Soni, Greg KH, LKML On Wed, 23 Nov 2005, Ingo Molnar wrote: > > note that Steven has a dual-core Athlon64 X2 system. Steven, do you get > the crash even with maxcpus=1? > Actually Ingo, this happened on my UP test machine, a 368MHz Pentium. But unfortunately, it so far only happened once, and I've been trying to recreate it, with no success. The test that crashed it was running 10 tasks that would read the entire filesystem. I was debugging another bug (something specific to my kernel, or maybe -rt) when I hit this bug. Looking at it, it seemed to not be related to the changes I made. Perhaps it could be related to your changes? -- Steve PS. I only got my AMD64 x2 to debug your kernel ;-) I have no requirement to have my kernel run on it. I just needed a faster machine, also to move off my SMP box to make that a test box too. Since I saw lots of people having problems with -rt and AMD64 I chose to get that one. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: What protection does sysfs_readdir have with SMP/Preemption? 2005-11-23 12:35 ` Steven Rostedt @ 2005-11-23 12:54 ` Maneesh Soni 0 siblings, 0 replies; 21+ messages in thread From: Maneesh Soni @ 2005-11-23 12:54 UTC (permalink / raw) To: Steven Rostedt; +Cc: Ingo Molnar, Greg KH, LKML On Wed, Nov 23, 2005 at 07:35:04AM -0500, Steven Rostedt wrote: > > On Wed, 23 Nov 2005, Ingo Molnar wrote: > > > > > note that Steven has a dual-core Athlon64 X2 system. Steven, do you get > > the crash even with maxcpus=1? > > > > Actually Ingo, this happened on my UP test machine, a 368MHz Pentium. > > But unfortunately, it so far only happened once, and I've been trying to > recreate it, with no success. The test that crashed it was running 10 > tasks that would read the entire filesystem. I was debugging another bug > (something specific to my kernel, or maybe -rt) when I hit this bug. > Looking at it, it seemed to not be related to the changes I made. Perhaps > it could be related to your changes? > I normally test the sysfs races by running these two loops simultenously on a SMP box. Basically running these will create/delete sysfs files and directories and also do readdir. while true; do insmod drivers/net/dummy.ko; rmmod dummy; done while true; do find /sys/class/net/dummy0/ | xargs cat > /dev/null; done -- Maneesh Soni Linux Technology Center, IBM India Software Labs, Bangalore, India email: maneesh@in.ibm.com Phone: 91-80-25044990 ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: What protection does sysfs_readdir have with SMP/Preemption? 2005-11-23 8:18 ` Ingo Molnar 2005-11-23 12:35 ` Steven Rostedt @ 2005-11-23 12:50 ` Maneesh Soni 2005-11-23 12:52 ` [OOPS] sysfs_hash_and_remove (was Re: What protection ....) Maneesh Soni 2 siblings, 0 replies; 21+ messages in thread From: Maneesh Soni @ 2005-11-23 12:50 UTC (permalink / raw) To: Ingo Molnar; +Cc: Greg KH, Steven Rostedt, LKML On Wed, Nov 23, 2005 at 09:18:45AM +0100, Ingo Molnar wrote: > > * Maneesh Soni <maneesh@in.ibm.com> wrote: > > > But the bad pointer reference seen in sysfs_readdir() has to be > > debugged. Assumption here is that if there is a dentry attached to > > s_dirent, there has to be a inode associated becuase negative dentries > > are not created in sysfs. Is it possible to get some more information > > about the recreation scenario. Could you enable DEBUG printks for > > lib/kobject.c and drivers/base/class.c to see the events happening. > > on a related note - i've been carrying the patch below in -rt for 2 > months (i.e. Steven's kernel has it too), as a workaround against the > crash described below. [ replied in the separate thread ] > so it appears that the -rt kernel is triggering some genuine sysfs race. > [note that it only happens on an SMP kernel, booting an UP kernel or > with maxcpus=1 makes the bug go away.] I have done full kobject > debugging but no conclusive results. Also, that particular crash happens > earliest with PAGEALLOC enabled. [i have packed up the email discussion > related to that crash, and i'm sending it to Maneesh separately. > Maneesh, any ideas or suggestions?] Still waiting for that mail to show up. Looks like this discussion is not on lkml. The kdobject or driver core debugging messages can possibly narrow the problem down to some particular sysfs user like some driver or module and throw some light on how the sysfs calls are being made. > note that Steven has a dual-core Athlon64 X2 system. Steven, do you get > the crash even with maxcpus=1? > > Ingo > -- Maneesh Soni Linux Technology Center, IBM India Software Labs, Bangalore, India email: maneesh@in.ibm.com Phone: 91-80-25044990 ^ permalink raw reply [flat|nested] 21+ messages in thread
* [OOPS] sysfs_hash_and_remove (was Re: What protection ....) 2005-11-23 8:18 ` Ingo Molnar 2005-11-23 12:35 ` Steven Rostedt 2005-11-23 12:50 ` Maneesh Soni @ 2005-11-23 12:52 ` Maneesh Soni 2005-11-24 12:26 ` Maneesh Soni 2006-02-11 0:33 ` Greg KH 2 siblings, 2 replies; 21+ messages in thread From: Maneesh Soni @ 2005-11-23 12:52 UTC (permalink / raw) To: Ingo Molnar; +Cc: Greg KH, Steven Rostedt, LKML On Wed, Nov 23, 2005 at 09:18:45AM +0100, Ingo Molnar wrote: > [..] > on a related note - i've been carrying the patch below in -rt for 2 > months (i.e. Steven's kernel has it too), as a workaround against the > crash described below. > [..] > i'm occasionally getting the crash below on a PREEMPT_RT kernel. Might > be a PREEMPT_RT bug, or might be some sysfs race only visible under > PREEMPT_RT. Any ideas? The crash is at: > > (gdb) list *0xc01a2095 > 0xc01a2095 is in sysfs_hash_and_remove (fs/sysfs/inode.c:229). > 224 } > 225 > 226 void sysfs_hash_and_remove(struct dentry * dir, const char * name) > 227 { > 228 struct sysfs_dirent * sd; > 229 struct sysfs_dirent * parent_sd = dir->d_fsdata; > 230 > 231 if (dir->d_inode == NULL) > 232 /* no inode means this hasn't been made visible yet */ > 233 return; > (gdb) > Looks like here it is crashing due to bogus dentry pointer in the kobject kobj->dentry. Could be some stale pointer? Just got the mbox.. will go thru it more closely.. -- Maneesh Soni Linux Technology Center, IBM India Software Labs, Bangalore, India email: maneesh@in.ibm.com Phone: 91-80-25044990 ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [OOPS] sysfs_hash_and_remove (was Re: What protection ....) 2005-11-23 12:52 ` [OOPS] sysfs_hash_and_remove (was Re: What protection ....) Maneesh Soni @ 2005-11-24 12:26 ` Maneesh Soni 2005-11-24 14:34 ` Ingo Molnar 2006-02-11 0:33 ` Greg KH 1 sibling, 1 reply; 21+ messages in thread From: Maneesh Soni @ 2005-11-24 12:26 UTC (permalink / raw) To: Ingo Molnar; +Cc: Greg KH, Steven Rostedt, LKML, James Bottomley On Wed, Nov 23, 2005 at 06:22:13PM +0530, Maneesh Soni wrote: > On Wed, Nov 23, 2005 at 09:18:45AM +0100, Ingo Molnar wrote: > > > [..] > > on a related note - i've been carrying the patch below in -rt for 2 > > months (i.e. Steven's kernel has it too), as a workaround against the > > crash described below. > > > [..] > > > i'm occasionally getting the crash below on a PREEMPT_RT kernel. Might > > be a PREEMPT_RT bug, or might be some sysfs race only visible under > > PREEMPT_RT. Any ideas? The crash is at: > > > > (gdb) list *0xc01a2095 > > 0xc01a2095 is in sysfs_hash_and_remove (fs/sysfs/inode.c:229). > > 224 } > > 225 > > 226 void sysfs_hash_and_remove(struct dentry * dir, const char * name) > > 227 { > > 228 struct sysfs_dirent * sd; > > 229 struct sysfs_dirent * parent_sd = dir->d_fsdata; > > 230 > > 231 if (dir->d_inode == NULL) > > 232 /* no inode means this hasn't been made visible yet */ > > 233 return; > > (gdb) > > > Looks like here it is crashing due to bogus dentry pointer in the kobject > kobj->dentry. Could be some stale pointer? > [From the mbox sent by Ingo] On Sat, Sep 24, 2005 at 09:06:27AM -0500, James Bottomley wrote: > On Sat, 2005-09-24 at 14:36 +0200, Ingo Molnar wrote: > > so to me this seems like that in certain cases the starget->dev gets > > released once too often? Full log attached. > > But that's not what your traces show. The trace shows the target0:0:2 > object going through the last two phases of its lifecycle; remove and > then release (which your trace doesn't show it as having got to). > > The trace also seems to show that the object that is already gone is the > dentry. > > So, based on this, I think it's a dev<->classdev linkage problem. This > is what I think it is: > > Previously, classdevs being interfaces on devs used to (2.6.13 and > before) simply point to their various devs via links. The remove (not > release) order was always dev then classdevs. However, remove is making > the object invisible, so this is actually physically removing pieces of > the sysfs infrastructure. The problem seems to be that the dev sysfs > directory is removed first, followed by the classdevs sysfs dir. Even > though the classdevs sysfs dir contains symlinks into dev, this is fine > because symlinks don't ping the dentries of targets. However, post sysfs symlinks do pin dentry though not directly. The target kobject is pinned (kobject_get) when a symlink pointing to it is created. And as long as the target kobject is alive, the corresponding dentry will remain. The target kobject is un-pinned when the symlink is released. > 2.6.13 greg introduced a backlink from dev->classdev so you can tell > what interfaces exist on a device. In this case, that's the > spi_transport:target0:0:2 dentry in the dev dir. However, the dev dir > removal appears to have released this, and we just don't notice. > > To test the theory, try this patch. However, I have no clue what to do > about this if it truly is the problem ... we have bidirectional cross > object links, so remove order can't be altered to fix it. It looks like > we need to understand why we have to remove the links in the first > place ... if the kobject dir removal does that for us, then they're > unnecessary. yes, sysfs_remove_directory() will remove the contents (including symlinks) first and then remove the directory. But there is some complication in case of bidirectional links. Lets say we have [maneesh@bigbang ingo]$ ls -la a b a: total 8 drwxrwxr-x 2 maneesh maneesh 4096 Nov 23 23:13 . drwxrwxr-x 4 maneesh maneesh 4096 Nov 23 23:12 .. lrwxrwxrwx 1 maneesh maneesh 4 Nov 23 23:13 link-to-b-in-a -> ../b b: total 8 drwxrwxr-x 2 maneesh maneesh 4096 Nov 23 23:13 . drwxrwxr-x 4 maneesh maneesh 4096 Nov 23 23:12 .. lrwxrwxrwx 1 maneesh maneesh 4 Nov 23 23:13 link-to-a-in-b -> ../a so when "link-to-b-in-a" is created a ref is taken for "b" and vice versa. Assuming kobject "a" is deleted first (without removing the symlink "link-to-b-in-a"). Because of extra ref taken while creating "link-to-a-in-b", kobject "a" is not removed and hence sysfs_remove_dir() is also not called. So, IMO, it is necessary to explicitly remove links before unregistering the kobject in case of bidirectional cross symlinks. The patch from James, is working, because it is not creating the cross symlink itself. > James > > diff --git a/drivers/base/class.c b/drivers/base/class.c > --- a/drivers/base/class.c > +++ b/drivers/base/class.c > @@ -520,8 +520,10 @@ int class_device_add(struct class_device > class_name = make_class_name(class_dev); > sysfs_create_link(&class_dev->kobj, > &class_dev->dev->kobj, "device"); > + /* > sysfs_create_link(&class_dev->dev->kobj, &class_dev->kobj, > class_name); > + */ > } > > /* notify any interfaces this device is now here */ > @@ -618,7 +620,9 @@ void class_device_del(struct class_devic > if (class_dev->dev) { > class_name = make_class_name(class_dev); > sysfs_remove_link(&class_dev->kobj, "device"); > + /* > sysfs_remove_link(&class_dev->dev->kobj, class_name); > + */ > } > if (class_dev->devt_attr) > class_device_remove_file(class_dev, class_dev->devt_attr); > -- Maneesh Soni Linux Technology Center, IBM India Software Labs, Bangalore, India email: maneesh@in.ibm.com Phone: 91-80-25044990 ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [OOPS] sysfs_hash_and_remove (was Re: What protection ....) 2005-11-24 12:26 ` Maneesh Soni @ 2005-11-24 14:34 ` Ingo Molnar 2005-11-26 22:26 ` James Bottomley 0 siblings, 1 reply; 21+ messages in thread From: Ingo Molnar @ 2005-11-24 14:34 UTC (permalink / raw) To: Maneesh Soni; +Cc: Greg KH, Steven Rostedt, LKML, James Bottomley * Maneesh Soni <maneesh@in.ibm.com> wrote: > So, IMO, it is necessary to explicitly remove links before > unregistering the kobject in case of bidirectional cross symlinks. > > The patch from James, is working, because it is not creating the cross > symlink itself. so, what is your suggestion, what should be done to fix the problem? The patch below: > > + /* > > sysfs_create_link(&class_dev->dev->kobj, &class_dev->kobj, > > class_name); > > + */ > > + /* > > sysfs_remove_link(&class_dev->dev->kobj, class_name); > > + */ isnt fit for upstream inclusion :-) Ingo ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [OOPS] sysfs_hash_and_remove (was Re: What protection ....) 2005-11-24 14:34 ` Ingo Molnar @ 2005-11-26 22:26 ` James Bottomley 0 siblings, 0 replies; 21+ messages in thread From: James Bottomley @ 2005-11-26 22:26 UTC (permalink / raw) To: Ingo Molnar; +Cc: Maneesh Soni, Greg KH, Steven Rostedt, LKML On Thu, 2005-11-24 at 15:34 +0100, Ingo Molnar wrote: > * Maneesh Soni <maneesh@in.ibm.com> wrote: > > > So, IMO, it is necessary to explicitly remove links before > > unregistering the kobject in case of bidirectional cross symlinks. > > > > The patch from James, is working, because it is not creating the cross > > symlink itself. > > so, what is your suggestion, what should be done to fix the problem? The > patch below: > isnt fit for upstream inclusion :-) Well, the patch was just intended to confirm the problem diagnosis. The solution Maneesh appears to be advocating to the issue is imposing del ordering, the issue being that device_del is the call that actually removes the directories and symlinks, so all callers have to make sure they've called class_device_del for every class on the device before calling device_del. Since this trigger point happens regardless of references, we can't expect references to get us out of this one, so we'll have to audit the failing code manually. I did find and fix one of these issues in SCSI, but there may be more lurking around ... James ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [OOPS] sysfs_hash_and_remove (was Re: What protection ....) 2005-11-23 12:52 ` [OOPS] sysfs_hash_and_remove (was Re: What protection ....) Maneesh Soni 2005-11-24 12:26 ` Maneesh Soni @ 2006-02-11 0:33 ` Greg KH 2006-02-11 15:46 ` Steven Rostedt 1 sibling, 1 reply; 21+ messages in thread From: Greg KH @ 2006-02-11 0:33 UTC (permalink / raw) To: Maneesh Soni; +Cc: Ingo Molnar, Steven Rostedt, LKML On Wed, Nov 23, 2005 at 06:22:13PM +0530, Maneesh Soni wrote: > On Wed, Nov 23, 2005 at 09:18:45AM +0100, Ingo Molnar wrote: > > > [..] > > on a related note - i've been carrying the patch below in -rt for 2 > > months (i.e. Steven's kernel has it too), as a workaround against the > > crash described below. > > > [..] > > > i'm occasionally getting the crash below on a PREEMPT_RT kernel. Might > > be a PREEMPT_RT bug, or might be some sysfs race only visible under > > PREEMPT_RT. Any ideas? The crash is at: > > > > (gdb) list *0xc01a2095 > > 0xc01a2095 is in sysfs_hash_and_remove (fs/sysfs/inode.c:229). > > 224 } > > 225 > > 226 void sysfs_hash_and_remove(struct dentry * dir, const char * name) > > 227 { > > 228 struct sysfs_dirent * sd; > > 229 struct sysfs_dirent * parent_sd = dir->d_fsdata; > > 230 > > 231 if (dir->d_inode == NULL) > > 232 /* no inode means this hasn't been made visible yet */ > > 233 return; > > (gdb) > > > Looks like here it is crashing due to bogus dentry pointer in the kobject > kobj->dentry. Could be some stale pointer? Did you ever figure anything out here? I'm seeing a lot more reports of this problem lately, especially if you enable slab debugging. For example: http://bugzilla.kernel.org/show_bug.cgi?id=5876 thanks, greg k-h ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [OOPS] sysfs_hash_and_remove (was Re: What protection ....) 2006-02-11 0:33 ` Greg KH @ 2006-02-11 15:46 ` Steven Rostedt 2006-02-24 1:04 ` Greg KH 0 siblings, 1 reply; 21+ messages in thread From: Steven Rostedt @ 2006-02-11 15:46 UTC (permalink / raw) To: Greg KH; +Cc: Maneesh Soni, Ingo Molnar, LKML On Fri, 10 Feb 2006, Greg KH wrote: > On Wed, Nov 23, 2005 at 06:22:13PM +0530, Maneesh Soni wrote: > > Looks like here it is crashing due to bogus dentry pointer in the kobject > > kobj->dentry. Could be some stale pointer? > > Did you ever figure anything out here? I'm seeing a lot more reports of > this problem lately, especially if you enable slab debugging. For > example: > http://bugzilla.kernel.org/show_bug.cgi?id=5876 > The patch has just been added to the 2.6.16 series so if this does fix the bug, we wont know until someone tries one of the 2.6.16-rc kernels (or later). Have you seen this bug in one of those kernels? -- Steve ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [OOPS] sysfs_hash_and_remove (was Re: What protection ....) 2006-02-11 15:46 ` Steven Rostedt @ 2006-02-24 1:04 ` Greg KH 0 siblings, 0 replies; 21+ messages in thread From: Greg KH @ 2006-02-24 1:04 UTC (permalink / raw) To: Steven Rostedt; +Cc: Maneesh Soni, Ingo Molnar, LKML On Sat, Feb 11, 2006 at 10:46:45AM -0500, Steven Rostedt wrote: > On Fri, 10 Feb 2006, Greg KH wrote: > > > On Wed, Nov 23, 2005 at 06:22:13PM +0530, Maneesh Soni wrote: > > > Looks like here it is crashing due to bogus dentry pointer in the kobject > > > kobj->dentry. Could be some stale pointer? > > > > Did you ever figure anything out here? I'm seeing a lot more reports of > > this problem lately, especially if you enable slab debugging. For > > example: > > http://bugzilla.kernel.org/show_bug.cgi?id=5876 > > > > The patch has just been added to the 2.6.16 series so if this does fix the > bug, we wont know until someone tries one of the 2.6.16-rc kernels (or > later). Ugh, you're right, sorry for the noise. Too many patches floating around here :) > Have you seen this bug in one of those kernels? No I haven't. thanks, greg k-h ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: What protection does sysfs_readdir have with SMP/Preemption? 2005-11-23 4:50 ` Maneesh Soni 2005-11-23 8:18 ` Ingo Molnar @ 2005-11-23 12:56 ` Steven Rostedt 2005-11-23 13:58 ` Maneesh Soni 1 sibling, 1 reply; 21+ messages in thread From: Steven Rostedt @ 2005-11-23 12:56 UTC (permalink / raw) To: Maneesh Soni; +Cc: Greg KH, LKML, Ingo Molnar On Wed, 23 Nov 2005, Maneesh Soni wrote: > > The dir operation sysfs_readdir() is called under directory inode's i_sem > taken in vfs_readdir() and create_dir() also takes parent directory inode's > i_sem. So in this case I think following are the relevant steps happening > which look safe to me. > > cpu 0 > vfs_readdir() > down(dir inode i_sem) > sysfs_readdir(dir) > parse through dir->s_dirent s_children list > up(dir inode i_sem) > > > cpu 1 > sysfs_create_dir() > create_dir() > down(parent dir inode i_sem) > lookup_one_len (allocates & makes the new directory dentry visible) > sysfs_make_diret() > sysfs_new_dirent() > attach the new directory s_dirent to parent's s_children list) > up(parent dir inode i_sem) > > > Basically, sysfs_readdir for a directory is protected against any > addition/deletion in the directory by directory inode's i_sem. OK, I missed that, thanks. > > But the bad pointer reference seen in sysfs_readdir() has to be debugged. > Assumption here is that if there is a dentry attached to s_dirent, there > has to be a inode associated becuase negative dentries are not created > in sysfs. Is it possible to get some more information about the recreation > scenario. Could you enable DEBUG printks for lib/kobject.c and > drivers/base/class.c to see the events happening. The bug that I've been fighting in my own kernel is a memory leak. So I started looking into this at what would happen in verious places if an allocation didn't work. In create_dir: error = sysfs_make_dirent(p->d_fsdata, *d, k, mode, SYSFS_DIR); // Above is where the entry is added to the parent link list. if (!error) { error = sysfs_create(*d, mode, init_dir); // If sysfs_create fails to allocate an inode, when below // does the element get removed from the parent? if (!error) { p->d_inode->i_nlink++; (*d)->d_op = &sysfs_dentry_ops; d_rehash(*d); } } if (error && (error != -EEXIST)) { sysfs_put((*d)->d_fsdata); // sysfs_put only seems to handle the kobject portion d_drop(*d); // d_drop handles the unhash } dput(*d); So I'm not sure an error from sysfs_create will remove the object from the link list. In fact, it might be worst since now there's an object on the link list that may no long even be an object. I'll test this by forcing a failure at sysfs_create. -- Steve ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: What protection does sysfs_readdir have with SMP/Preemption? 2005-11-23 12:56 ` What protection does sysfs_readdir have with SMP/Preemption? Steven Rostedt @ 2005-11-23 13:58 ` Maneesh Soni 2005-11-23 14:15 ` Steven Rostedt 0 siblings, 1 reply; 21+ messages in thread From: Maneesh Soni @ 2005-11-23 13:58 UTC (permalink / raw) To: Steven Rostedt; +Cc: Greg KH, LKML, Ingo Molnar On Wed, Nov 23, 2005 at 07:56:39AM -0500, Steven Rostedt wrote: [..] > > The bug that I've been fighting in my own kernel is a memory leak. So I > started looking into this at what would happen in verious places if an > allocation didn't work. > > In create_dir: > > error = sysfs_make_dirent(p->d_fsdata, *d, k, mode, SYSFS_DIR); > // Above is where the entry is added to the parent link list. > > if (!error) { > error = sysfs_create(*d, mode, init_dir); > // If sysfs_create fails to allocate an inode, when below > // does the element get removed from the parent? > if (!error) { > p->d_inode->i_nlink++; > (*d)->d_op = &sysfs_dentry_ops; > d_rehash(*d); > } > } > if (error && (error != -EEXIST)) { > sysfs_put((*d)->d_fsdata); > // sysfs_put only seems to handle the kobject portion > > d_drop(*d); > // d_drop handles the unhash > } > dput(*d); > > So I'm not sure an error from sysfs_create will remove the object from the > link list. In fact, it might be worst since now there's an object on the > link list that may no long even be an object. > > I'll test this by forcing a failure at sysfs_create. > hmm looks like we got some situation which is not desirable and could lead to bogus sysfs_dirent in the parent list. It may not be the exact problem in this case though, but needs fixing IMO. After sysfs_make_dirent(), the ref count for sysfs dirent will be 2. (one from allocation, and after linking the new dentry to it). On error from sysfs_create(), we do sysfs_put() once, decrementing the ref count to 1. And again when the new dentry for which we couldn't allocate the d_inode, is d_drop()'ed. In sysfs_d_iput() we again sysfs_put(), and decrement the sysfs dirent's ref count to 0, which will be the final sysfs_put(), and it will free the sysfs_dirent but never unlinks it from the parent list. So, parent list could still will having links to the freed sysfs_dirent in its s_children list. so basically list_del_init(&sd->s_sibling) should be done in error path in create_dir(). Could you also put the appended patch in your trial runs.. Thanks Maneesh o Following patch corrects the buggy create_dir() error path. This bug could end up in having a bogus sysfs_dirent in the parent list. Now the newly allocated and linked sysfs_dirent is also un-linked in case of error resulting from sysfs_create() o Many thanks to Steven Rostedt and Ingo Molnar for pointing this out. Signed-off-by: Maneesh Soni <maneesh@in.ibm.com> --- linux-2.6.15-rc2-mm1-maneesh/fs/sysfs/dir.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletion(-) diff -puN fs/sysfs/dir.c~fix-create_dir-error-path fs/sysfs/dir.c --- linux-2.6.15-rc2-mm1/fs/sysfs/dir.c~fix-create_dir-error-path 2005-11-23 18:59:36.072449992 +0530 +++ linux-2.6.15-rc2-mm1-maneesh/fs/sysfs/dir.c 2005-11-23 19:07:53.475833184 +0530 @@ -112,7 +112,9 @@ static int create_dir(struct kobject * k } } if (error && (error != -EEXIST)) { - sysfs_put((*d)->d_fsdata); + struct sysfs_dirent *sd = (*d)->d_fsdata; + list_del_init(&sd->s_sibling); + sysfs_put(sd); d_drop(*d); } dput(*d); _ -- Maneesh Soni Linux Technology Center, IBM India Software Labs, Bangalore, India email: maneesh@in.ibm.com Phone: 91-80-25044990 ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: What protection does sysfs_readdir have with SMP/Preemption? 2005-11-23 13:58 ` Maneesh Soni @ 2005-11-23 14:15 ` Steven Rostedt 2005-11-23 14:20 ` Steven Rostedt 2005-11-24 4:16 ` What protection does sysfs_readdir have with SMP/Preemption? Maneesh Soni 0 siblings, 2 replies; 21+ messages in thread From: Steven Rostedt @ 2005-11-23 14:15 UTC (permalink / raw) To: maneesh; +Cc: Greg KH, LKML, Ingo Molnar On Wed, 2005-11-23 at 19:28 +0530, Maneesh Soni wrote: > > hmm looks like we got some situation which is not desirable and could lead > to bogus sysfs_dirent in the parent list. It may not be the exact problem > in this case though, but needs fixing IMO. > > After sysfs_make_dirent(), the ref count for sysfs dirent will be 2. > (one from allocation, and after linking the new dentry to it). On > error from sysfs_create(), we do sysfs_put() once, decrementing the > ref count to 1. And again when the new dentry for which we couldn't > allocate the d_inode, is d_drop()'ed. In sysfs_d_iput() we again > sysfs_put(), and decrement the sysfs dirent's ref count to 0, which will > be the final sysfs_put(), and it will free the sysfs_dirent but never > unlinks it from the parent list. So, parent list could still will having > links to the freed sysfs_dirent in its s_children list. > > so basically list_del_init(&sd->s_sibling) should be done in error path > in create_dir(). > > Could you also put the appended patch in your trial runs.. > I'm already playing around with this. You might want this patch instead. I noticed that if sysfs_make_dirent fails to allocate the sd, then a null will be passed to sysfs_put. But this is not the end of the problems. I'll follow up on that comment right after this. -- Steve Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Index: linux-2.6.15-rc2-git2/fs/sysfs/dir.c =================================================================== --- linux-2.6.15-rc2-git2.orig/fs/sysfs/dir.c 2005-11-23 08:40:33.000000000 -0500 +++ linux-2.6.15-rc2-git2/fs/sysfs/dir.c 2005-11-23 08:52:57.000000000 -0500 @@ -112,7 +112,11 @@ } } if (error && (error != -EEXIST)) { - sysfs_put((*d)->d_fsdata); + struct sysfs_dirent *sd = (*d)->d_fsdata; + if (sd) { + list_del_init(&sd->s_sibling); + sysfs_put(sd); + } d_drop(*d); } dput(*d); ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: What protection does sysfs_readdir have with SMP/Preemption? 2005-11-23 14:15 ` Steven Rostedt @ 2005-11-23 14:20 ` Steven Rostedt 2005-11-23 15:24 ` kobject_register needs return value checks (was: What protection does sysfs_readdir have with SMP/Preemption?) Steven Rostedt 2005-11-24 4:16 ` What protection does sysfs_readdir have with SMP/Preemption? Maneesh Soni 1 sibling, 1 reply; 21+ messages in thread From: Steven Rostedt @ 2005-11-23 14:20 UTC (permalink / raw) To: maneesh; +Cc: Greg KH, LKML, Ingo Molnar On Wed, 2005-11-23 at 09:15 -0500, Steven Rostedt wrote: > > But this is not the end of the problems. I'll follow up on that comment > right after this. Here's what I mean. I'm using the below patch to see what happens on the error cases, and things are still bombing. The patch returns a failure after 30 calls. Even with my previous patch, I'm crashing. PCI: Probing PCI hardware (bus 00) ACPI: Assume root bridge [\_SB_.PCI0] bus is 0 kobject_register failed for CHN1 (-12) [<c01041ee>] dump_stack+0x1e/0x20 [<c02206ab>] kobject_register+0x6b/0x80 [<c0255e33>] acpi_device_register+0x105/0x11b [<c0256b8e>] acpi_add_single_object+0xf6/0x146 [<c0256cee>] acpi_bus_scan+0x110/0x17b [<c03d6297>] acpi_scan_init+0x6b/0x89 [<c03be9e7>] do_initcalls+0x57/0xd0 [<c03bea85>] do_basic_setup+0x25/0x30 [<c01002e5>] init+0x35/0x170 [<c01013e5>] kernel_thread_helper+0x5/0x10 Unable to handle kernel NULL pointer dereference at virtual address 00000008 printing eip: c01ab6ef *pde = 00000000 Oops: 0000 [#1] PREEMPT Modules linked in: CPU: 0 EIP: 0060:[<c01ab6ef>] Not tainted VLI EFLAGS: 00010296 (2.6.15-rc2-git2) EIP is at create_dir+0xf/0x250 eax: 00000000 ebx: cfe92a0c ecx: cfe8b224 edx: 00000000 esi: cfe92a08 edi: cfe92e08 ebp: cffc1ec0 esp: cffc1e9c ds: 007b es: 007b ss: 0068 Process swapper (pid: 1, threadinfo=cffc0000 task=c127aa50) Stack: 00000000 00000010 cffc1eb8 c023b56b cfff2a40 000000d0 cfe92a08 cfe92a08 cfe92e08 cffc1ee0 c01ab998 cfe92a08 00000000 cfe92a0c cffc1ed8 00000000 00000000 cffc1ef4 c022035f cfe92a08 cfe92a08 fffffffe cffc1f0c c02205db Call Trace: [<c010418b>] show_stack+0xab/0xf0 [<c010437f>] show_registers+0x18f/0x230 [<c01045bd>] die+0xed/0x190 [<c032731a>] do_page_fault+0x33a/0x670 [<c0103df7>] error_code+0x4f/0x54 [<c01ab998>] sysfs_create_dir+0x38/0x80 [<c022035f>] create_dir+0x1f/0x60 [<c02205db>] kobject_add+0x8b/0xf0 [<c0220668>] kobject_register+0x28/0x80 [<c0255e33>] acpi_device_register+0x105/0x11b [<c0256b8e>] acpi_add_single_object+0xf6/0x146 [<c0256cee>] acpi_bus_scan+0x110/0x17b [<c03d6297>] acpi_scan_init+0x6b/0x89 [<c03be9e7>] do_initcalls+0x57/0xd0 [<c03bea85>] do_basic_setup+0x25/0x30 [<c01002e5>] init+0x35/0x170 [<c01013e5>] kernel_thread_helper+0x5/0x10 Code: 37 c0 89 e5 8b 45 08 89 88 8c 00 00 00 31 c0 5d c3 8d 74 26 00 8d bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec 18 8b 45 0c 8b 5d 10 <8b> 50 08 ff 4a 70 0f 88 8c 0e 00 00 31 c0 b9 ff ff ff ff 89 df <0>Kernel panic - not syncing: Attempted to kill init! I'm still looking into this, to see who can't handle the -ENOMEM. -- Steve Index: linux-2.6.15-rc2-git2/fs/sysfs/dir.c =================================================================== --- linux-2.6.15-rc2-git2.orig/fs/sysfs/dir.c 2005-10-27 20:02:08.000000000 -0400 +++ linux-2.6.15-rc2-git2/fs/sysfs/dir.c 2005-11-23 08:04:42.000000000 -0500 @@ -98,13 +98,17 @@ { int error; umode_t mode = S_IFDIR| S_IRWXU | S_IRUGO | S_IXUGO; + static int breakme = 0; down(&p->d_inode->i_sem); *d = lookup_one_len(n, p, strlen(n)); if (!IS_ERR(*d)) { error = sysfs_make_dirent(p->d_fsdata, *d, k, mode, SYSFS_DIR); if (!error) { - error = sysfs_create(*d, mode, init_dir); + if ((++breakme % 30)) + error = sysfs_create(*d, mode, init_dir); + else + error = -ENOMEM; if (!error) { p->d_inode->i_nlink++; (*d)->d_op = &sysfs_dentry_ops; ^ permalink raw reply [flat|nested] 21+ messages in thread
* kobject_register needs return value checks (was: What protection does sysfs_readdir have with SMP/Preemption?) 2005-11-23 14:20 ` Steven Rostedt @ 2005-11-23 15:24 ` Steven Rostedt 0 siblings, 0 replies; 21+ messages in thread From: Steven Rostedt @ 2005-11-23 15:24 UTC (permalink / raw) To: maneesh Cc: Thibaut VARENE, linux-scsi, linuxraid, Matt Tolentino, len.brown, Ingo Molnar, LKML, Greg KH I'm doing some tests to see what happens on a failure of setting up something in the sysfs, and I've discovered a few areas that don't test the return value of kobject_register. The test was due to a memory problem in a custom kernel that showed that sysfs didn't quite handle the error cases well. I did the following command: find . -name "*.c" ! -type d | xargs grep "kobject_register" I found 27 hits. Of these: 14 - checked the return value. 1 - reference in .mod.c file /* ignore it */ 3 - in comments /* ignore it */ 1 - declaration of actual function /* ignore it */ 1 - EXPORT_SYMBOL /* ignore it */ 1 - inside a printk quote. /* ignore it */ and ... 6 - calls without checking return values. Here are the culprits: ./drivers/acpi/scan.c: kobject_register(&device->kobj); ./drivers/firmware/efivars.c: kobject_register(&new_efivar->kobj); ./drivers/md/md.c: kobject_register(&mddev->kobj); ./drivers/parisc/pdc_stable.c: kobject_register(&entry->kobj); ./fs/partitions/check.c: kobject_register(&p->kobj); ./kernel/params.c: kobject_register(&mk->kobj); Normally, these would not return errors, but in case they do, the kernel should be robust enough to handle it. I tried to CC all the maintainers of the above files. -- Steve ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: What protection does sysfs_readdir have with SMP/Preemption? 2005-11-23 14:15 ` Steven Rostedt 2005-11-23 14:20 ` Steven Rostedt @ 2005-11-24 4:16 ` Maneesh Soni 2005-11-24 14:32 ` Ingo Molnar 1 sibling, 1 reply; 21+ messages in thread From: Maneesh Soni @ 2005-11-24 4:16 UTC (permalink / raw) To: Steven Rostedt; +Cc: Greg KH, LKML, Ingo Molnar On Wed, Nov 23, 2005 at 09:15:44AM -0500, Steven Rostedt wrote: > On Wed, 2005-11-23 at 19:28 +0530, Maneesh Soni wrote: > > > > > hmm looks like we got some situation which is not desirable and could lead > > to bogus sysfs_dirent in the parent list. It may not be the exact problem > > in this case though, but needs fixing IMO. > > > > After sysfs_make_dirent(), the ref count for sysfs dirent will be 2. > > (one from allocation, and after linking the new dentry to it). On > > error from sysfs_create(), we do sysfs_put() once, decrementing the > > ref count to 1. And again when the new dentry for which we couldn't > > allocate the d_inode, is d_drop()'ed. In sysfs_d_iput() we again > > sysfs_put(), and decrement the sysfs dirent's ref count to 0, which will > > be the final sysfs_put(), and it will free the sysfs_dirent but never > > unlinks it from the parent list. So, parent list could still will having > > links to the freed sysfs_dirent in its s_children list. > > > > so basically list_del_init(&sd->s_sibling) should be done in error path > > in create_dir(). > > > > Could you also put the appended patch in your trial runs.. > > > > I'm already playing around with this. You might want this patch instead. > I noticed that if sysfs_make_dirent fails to allocate the sd, then a > null will be passed to sysfs_put. > > But this is not the end of the problems. I'll follow up on that comment > right after this. > > -- Steve > > Signed-off-by: Steven Rostedt <rostedt@goodmis.org> > > Index: linux-2.6.15-rc2-git2/fs/sysfs/dir.c > =================================================================== > --- linux-2.6.15-rc2-git2.orig/fs/sysfs/dir.c 2005-11-23 08:40:33.000000000 -0500 > +++ linux-2.6.15-rc2-git2/fs/sysfs/dir.c 2005-11-23 08:52:57.000000000 -0500 > @@ -112,7 +112,11 @@ > } > } > if (error && (error != -EEXIST)) { > - sysfs_put((*d)->d_fsdata); > + struct sysfs_dirent *sd = (*d)->d_fsdata; > + if (sd) { > + list_del_init(&sd->s_sibling); > + sysfs_put(sd); > + } > d_drop(*d); > } > dput(*d); > > Agreed. This makes more sense. Thanks Maneesh -- Maneesh Soni Linux Technology Center, IBM India Software Labs, Bangalore, India email: maneesh@in.ibm.com Phone: 91-80-25044990 ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: What protection does sysfs_readdir have with SMP/Preemption? 2005-11-24 4:16 ` What protection does sysfs_readdir have with SMP/Preemption? Maneesh Soni @ 2005-11-24 14:32 ` Ingo Molnar 0 siblings, 0 replies; 21+ messages in thread From: Ingo Molnar @ 2005-11-24 14:32 UTC (permalink / raw) To: Maneesh Soni; +Cc: Steven Rostedt, Greg KH, LKML * Maneesh Soni <maneesh@in.ibm.com> wrote: > > I'm already playing around with this. You might want this patch instead. > > I noticed that if sysfs_make_dirent fails to allocate the sd, then a > > null will be passed to sysfs_put. > Agreed. This makes more sense. ok, i've applied Steven's patch. Ingo ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2006-02-24 1:05 UTC | newest] Thread overview: 21+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-11-22 21:33 What protection does sysfs_readdir have with SMP/Preemption? Steven Rostedt 2005-11-22 21:39 ` Greg KH 2005-11-23 4:50 ` Maneesh Soni 2005-11-23 8:18 ` Ingo Molnar 2005-11-23 12:35 ` Steven Rostedt 2005-11-23 12:54 ` Maneesh Soni 2005-11-23 12:50 ` Maneesh Soni 2005-11-23 12:52 ` [OOPS] sysfs_hash_and_remove (was Re: What protection ....) Maneesh Soni 2005-11-24 12:26 ` Maneesh Soni 2005-11-24 14:34 ` Ingo Molnar 2005-11-26 22:26 ` James Bottomley 2006-02-11 0:33 ` Greg KH 2006-02-11 15:46 ` Steven Rostedt 2006-02-24 1:04 ` Greg KH 2005-11-23 12:56 ` What protection does sysfs_readdir have with SMP/Preemption? Steven Rostedt 2005-11-23 13:58 ` Maneesh Soni 2005-11-23 14:15 ` Steven Rostedt 2005-11-23 14:20 ` Steven Rostedt 2005-11-23 15:24 ` kobject_register needs return value checks (was: What protection does sysfs_readdir have with SMP/Preemption?) Steven Rostedt 2005-11-24 4:16 ` What protection does sysfs_readdir have with SMP/Preemption? Maneesh Soni 2005-11-24 14:32 ` Ingo Molnar
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.