* [SYSFS] Kernel Null pointer dereference in sysfs_readdir()
@ 2006-07-12 11:35 Duetsch, Thomas LDE1
2006-07-12 12:06 ` Steven Rostedt
2006-07-12 19:58 ` Maneesh Soni
0 siblings, 2 replies; 9+ messages in thread
From: Duetsch, Thomas LDE1 @ 2006-07-12 11:35 UTC (permalink / raw)
To: linux-kernel; +Cc: maneesh, Steven Rostedt, mingo
Hi,
I'm currently working on a custom kernel based on Ingo's -rt patch
(2.6.16-rt29).
While rebooting my machine, I came across a kernel null pointer
dereference in this code segment in fs/sysfs/dir.c, function
sysfs_readdir():
for (p=q->next; p!= &parent_sd->s_children; p=p->next) {
struct sysfs_dirent *next;
const char * name;
int len;
next = list_entry(p, struct sysfs_dirent,
s_sibling);
if (!next->s_element)
continue;
name = sysfs_get_name(next);
len = strlen(name);
if (next->s_dentry)
PROBLEM -> ino = next->s_dentry->d_inode->i_ino;
else
ino = iunique(sysfs_sb, 2);
Checking the mailing list, I came across this thread:
"What protection does sysfs_readdir have with SMP/Preemption?"
http://lkml.org/lkml/2005/11/22/293
Which handels the exact same problem (And I'm working on the kernel
Steve was working back then).
Reading through your suggestions and solutions, I was wondering, what
would happen if a sysfs file would be deleted instead of created, while
a sysfs_readdir were in progress.
Looking through the code, I don't see, where the parents inode mutex is
taken, to prevent a race condition.
Unfortunately, I can't reproduce the behaviour, nor do I know, which
file was accessed, when this happens.
Like Steve said back then, this might well be a problem in our code, but
since we didn't change the sysfs, maybe it's a vanilla problem as well.
Thomas
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [SYSFS] Kernel Null pointer dereference in sysfs_readdir()
2006-07-12 11:35 [SYSFS] Kernel Null pointer dereference in sysfs_readdir() Duetsch, Thomas LDE1
@ 2006-07-12 12:06 ` Steven Rostedt
2006-07-12 12:39 ` AW: " Duetsch, Thomas LDE1
2006-07-12 19:58 ` Maneesh Soni
1 sibling, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2006-07-12 12:06 UTC (permalink / raw)
To: Duetsch, Thomas LDE1; +Cc: linux-kernel, maneesh, mingo
On Wed, 2006-07-12 at 13:35 +0200, Duetsch, Thomas LDE1 wrote:
> Hi,
>
> I'm currently working on a custom kernel based on Ingo's -rt patch
> (2.6.16-rt29).
>
> While rebooting my machine, I came across a kernel null pointer
> dereference in this code segment in fs/sysfs/dir.c, function
> sysfs_readdir():
>
> for (p=q->next; p!= &parent_sd->s_children; p=p->next) {
> struct sysfs_dirent *next;
> const char * name;
> int len;
>
> next = list_entry(p, struct sysfs_dirent,
> s_sibling);
> if (!next->s_element)
> continue;
>
> name = sysfs_get_name(next);
> len = strlen(name);
> if (next->s_dentry)
> PROBLEM -> ino = next->s_dentry->d_inode->i_ino;
> else
> ino = iunique(sysfs_sb, 2);
>
Hi Thomas,
Do you have a backtrace to look at? It might be helpful to see what
functions brought us to this point. Also it might help to determine if
the problem is vanilla, -rt, or the custom kernel.
Thanks,
-- Steve
^ permalink raw reply [flat|nested] 9+ messages in thread
* AW: [SYSFS] Kernel Null pointer dereference in sysfs_readdir()
2006-07-12 12:06 ` Steven Rostedt
@ 2006-07-12 12:39 ` Duetsch, Thomas LDE1
2006-07-12 14:06 ` Steven Rostedt
0 siblings, 1 reply; 9+ messages in thread
From: Duetsch, Thomas LDE1 @ 2006-07-12 12:39 UTC (permalink / raw)
To: Steven Rostedt; +Cc: linux-kernel, maneesh, mingo
Steven Rostedt wrote:
> On Wed, 2006-07-12 at 13:35 +0200, Duetsch, Thomas LDE1 wrote:
>> Hi,
>>
>> I'm currently working on a custom kernel based on Ingo's -rt patch
>> (2.6.16-rt29).
>>
>> While rebooting my machine, I came across a kernel null pointer
>> dereference in this code segment in fs/sysfs/dir.c, function
>> sysfs_readdir():
>>
>> for (p=q->next; p!= &parent_sd->s_children; p=p->next) {
struct
>> sysfs_dirent *next; const char * name;
>> int len;
>>
>> next = list_entry(p, struct sysfs_dirent,
>> s_sibling);
>> if (!next->s_element)
>> continue;
>>
>> name = sysfs_get_name(next);
>> len = strlen(name);
>> if (next->s_dentry)
>> PROBLEM -> ino = next->s_dentry->d_inode->i_ino;
>> else
>> ino = iunique(sysfs_sb, 2);
>>
>
> Hi Thomas,
>
> Do you have a backtrace to look at? It might be helpful to see what
> functions brought us to this point. Also it might help to determine if
> the problem is vanilla, -rt, or the custom kernel.
>
> Thanks,
>
> -- Steve
Of course, here you go:
<1>BUG: Unable to handle kernel NULL pointer dereference at virtual
address 00000020
<1> printing eip:
<4>c01af187
<1>*pde = 00000000
<0>Oops: 0000 [#1]
<0>PREEMPT
<4>cpu=0 current find:2667
<4>current resource_set: 0
<4> [<c0103fb1>] show_trace+0x21/0x30 (20)
<4> [<c01040ee>] dump_stack+0x1e/0x30 (20)
<4> [<c01044ca>] die+0xba/0x1b0 (68)
<4> [<c01146ed>] do_page_fault+0x37d/0x6d0 (96)
<4> [<c0103b0b>] error_code+0x4f/0x54 (120)
<4> [<c0182a60>] vfs_readdir+0xa0/0xc0 (36)
<4> [<c0182e85>] sys_getdents64+0x75/0xd0 (64)
<4> [<c0102fd9>] syscall_call+0x7/0xb (-8116)
<4>Modules linked in: ide_dump diskdump ipv6 eeprom adm1021 hwmon
i2c_piix4 i2c_core NPCI pcnet32 e100 unix
<0>CPU: 0
<4>EIP: 0060:[<c01af187>] Tainted: P VLI
<4>EFLAGS: 00010286 (2.6.16-rt29-rs_1.0 #1)
<0>EIP is at sysfs_readdir+0xd7/0x240
<0>eax: 00000000 ebx: cfd095a0 ecx: 00000005 edx: cf02be88
<0>esi: cfd0959c edi: ce150c16 ebp: c35b5f50 esp: c35b5f14
<0>ds: 007b es: 007b ss: 0068 preempt: 00000001
<0>Process find (pid: 2667, threadinfo=c35b4000 task=c3686730
wchan=00000000) stack_left=7900 worst_left=-1)
<0>Stack: <0>cfd0959c c3618e60 00000004 00000003 00000000 00000d74
00000004 cfe46a58
<0> 00000005 ce150c10 cf02bb68 cfe46a4c ca844f20 fffffffe cfe47e40
c35b5f74
<0> c0182a60 ca844f20 c35b5f98 c0182d20 cfe47ec8 00001000 ca844f20
c0182d20
<0>Call Trace:
<4>cpu=0 current find:2667
<4>current resource_set: 0
<0> [<c010406a>] show_stack_log_lvl+0xaa/0xe0 (32)
<0> [<c01042ca>] show_registers+0x1ca/0x250 (64)
<0> [<c0104508>] die+0xf8/0x1b0 (68)
<0> [<c01146ed>] do_page_fault+0x37d/0x6d0 (96)
<0> [<c0103b0b>] error_code+0x4f/0x54 (120)
<0> [<c0182a60>] vfs_readdir+0xa0/0xc0 (36)
<0> [<c0182e85>] sys_getdents64+0x75/0xd0 (64)
<0> [<c0102fd9>] syscall_call+0x7/0xb (-8116)
<0>Code: 8d 74 26 00 89 34 24 e8 68 e4 ff ff 89 45 e8 b9 ff ff ff ff 31
c0 8b 7d e8 f2 ae f7 d1 49 89 4d e4 8b 46 20 85 c0 74 74 8b 40 20 <8b>
50 20
0f b7 46 1c 89 54 24 14 8b 4d 08 c1 e8 0c 89 44 24 18
I hope that helps,
Thomas
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [SYSFS] Kernel Null pointer dereference in sysfs_readdir()
2006-07-12 12:39 ` AW: " Duetsch, Thomas LDE1
@ 2006-07-12 14:06 ` Steven Rostedt
2006-07-12 19:57 ` Maneesh Soni
0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2006-07-12 14:06 UTC (permalink / raw)
To: Duetsch, Thomas LDE1
Cc: linux-kernel, maneesh, mingo, Patrick Mochel, Andrew Morton
[ added Patrick since his name is on the sysfs documentation ]
[ added Andrew since I am including a patch for 2.6.18 ]
On Wed, 2006-07-12 at 14:39 +0200, Duetsch, Thomas LDE1 wrote:
> Steven Rostedt wrote:
> > On Wed, 2006-07-12 at 13:35 +0200, Duetsch, Thomas LDE1 wrote:
> >> Hi,
> >>
> >> I'm currently working on a custom kernel based on Ingo's -rt patch
> >> (2.6.16-rt29).
> >>
> >> While rebooting my machine, I came across a kernel null pointer
> >> dereference in this code segment in fs/sysfs/dir.c, function
> >> sysfs_readdir():
> >>
> >> for (p=q->next; p!= &parent_sd->s_children; p=p->next) {
> struct
> >> sysfs_dirent *next; const char * name;
> >> int len;
> >>
> >> next = list_entry(p, struct sysfs_dirent,
> >> s_sibling);
> >> if (!next->s_element)
> >> continue;
> >>
> >> name = sysfs_get_name(next);
> >> len = strlen(name);
> >> if (next->s_dentry)
> >> PROBLEM -> ino = next->s_dentry->d_inode->i_ino;
> >> else
> >> ino = iunique(sysfs_sb, 2);
> >>
> >
> > Hi Thomas,
> >
> > Do you have a backtrace to look at? It might be helpful to see what
> > functions brought us to this point. Also it might help to determine if
> > the problem is vanilla, -rt, or the custom kernel.
> >
> > Thanks,
> >
> > -- Steve
>
> Of course, here you go:
>
> <1>BUG: Unable to handle kernel NULL pointer dereference at virtual
> address 00000020
> <1> printing eip:
> <4>c01af187
> <1>*pde = 00000000
> <0>Oops: 0000 [#1]
> <0>PREEMPT
> <4>cpu=0 current find:2667
> <4>current resource_set: 0
> <4> [<c0103fb1>] show_trace+0x21/0x30 (20)
> <4> [<c01040ee>] dump_stack+0x1e/0x30 (20)
> <4> [<c01044ca>] die+0xba/0x1b0 (68)
> <4> [<c01146ed>] do_page_fault+0x37d/0x6d0 (96)
> <4> [<c0103b0b>] error_code+0x4f/0x54 (120)
> <4> [<c0182a60>] vfs_readdir+0xa0/0xc0 (36)
> <4> [<c0182e85>] sys_getdents64+0x75/0xd0 (64)
> <4> [<c0102fd9>] syscall_call+0x7/0xb (-8116)
> <4>Modules linked in: ide_dump diskdump ipv6 eeprom adm1021 hwmon
> i2c_piix4 i2c_core NPCI pcnet32 e100 unix
> <0>CPU: 0
> <4>EIP: 0060:[<c01af187>] Tainted: P VLI
> <4>EFLAGS: 00010286 (2.6.16-rt29-rs_1.0 #1)
> <0>EIP is at sysfs_readdir+0xd7/0x240
> <0>eax: 00000000 ebx: cfd095a0 ecx: 00000005 edx: cf02be88
> <0>esi: cfd0959c edi: ce150c16 ebp: c35b5f50 esp: c35b5f14
> <0>ds: 007b es: 007b ss: 0068 preempt: 00000001
> <0>Process find (pid: 2667, threadinfo=c35b4000 task=c3686730
> wchan=00000000) stack_left=7900 worst_left=-1)
> <0>Stack: <0>cfd0959c c3618e60 00000004 00000003 00000000 00000d74
> 00000004 cfe46a58
> <0> 00000005 ce150c10 cf02bb68 cfe46a4c ca844f20 fffffffe cfe47e40
> c35b5f74
> <0> c0182a60 ca844f20 c35b5f98 c0182d20 cfe47ec8 00001000 ca844f20
> c0182d20
> <0>Call Trace:
> <4>cpu=0 current find:2667
> <4>current resource_set: 0
> <0> [<c010406a>] show_stack_log_lvl+0xaa/0xe0 (32)
> <0> [<c01042ca>] show_registers+0x1ca/0x250 (64)
> <0> [<c0104508>] die+0xf8/0x1b0 (68)
> <0> [<c01146ed>] do_page_fault+0x37d/0x6d0 (96)
> <0> [<c0103b0b>] error_code+0x4f/0x54 (120)
> <0> [<c0182a60>] vfs_readdir+0xa0/0xc0 (36)
> <0> [<c0182e85>] sys_getdents64+0x75/0xd0 (64)
> <0> [<c0102fd9>] syscall_call+0x7/0xb (-8116)
> <0>Code: 8d 74 26 00 89 34 24 e8 68 e4 ff ff 89 45 e8 b9 ff ff ff ff 31
> c0 8b 7d e8 f2 ae f7 d1 49 89 4d e4 8b 46 20 85 c0 74 74 8b 40 20 <8b>
> 50 20
> 0f b7 46 1c 89 54 24 14 8b 4d 08 c1 e8 0c 89 44 24 18
>
> I hope that helps,
>
I believe I found the race. Here's the problem:
There's no real protection in the sysfs_readdir regarding that for loop.
So the if statement
if (next->s_dentry)
ino = next->s_dentry->d_inode->i_ino;
has the race.
We _can_ have a s_dentry without a d_inode! Reason is that in attaching an
attribute we have: in fs/sysfs/dir.c sysfs_attach_attr()
sd->s_dentry = dentry;
error = sysfs_create(dentry, (attr->mode & S_IALLUGO) | S_IFREG, init);
Where the dentry->d_inode can be NULL. I don't see any protection in
this code to prevent this from happening.
Now the question is, is it safe to add the test for s_dentry->d_inode too.
I ask this because the s_dentry is in the process of being filled, and
I don't know what effect this will have on what readdir wants. I guess
it may be safe, so I'm giving this patch:
-- Steve
Description:
In the process of creating a sysfs attribute, we can have a state
where the sysfs descriptor can have a dentry with a NULL inode.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Index: linux-2.6.18-rc1/fs/sysfs/dir.c
===================================================================
--- linux-2.6.18-rc1.orig/fs/sysfs/dir.c 2006-07-12 09:43:10.000000000 -0400
+++ linux-2.6.18-rc1/fs/sysfs/dir.c 2006-07-12 10:01:18.000000000 -0400
@@ -445,7 +445,7 @@ static int sysfs_readdir(struct file * f
name = sysfs_get_name(next);
len = strlen(name);
- if (next->s_dentry)
+ if (next->s_dentry && next->s_dentry->d_inode)
ino = next->s_dentry->d_inode->i_ino;
else
ino = iunique(sysfs_sb, 2);
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [SYSFS] Kernel Null pointer dereference in sysfs_readdir()
2006-07-12 14:06 ` Steven Rostedt
@ 2006-07-12 19:57 ` Maneesh Soni
2006-07-12 20:28 ` Steven Rostedt
0 siblings, 1 reply; 9+ messages in thread
From: Maneesh Soni @ 2006-07-12 19:57 UTC (permalink / raw)
To: Steven Rostedt
Cc: Duetsch, Thomas LDE1, linux-kernel, mingo, Patrick Mochel,
Andrew Morton
On Wed, Jul 12, 2006 at 10:06:19AM -0400, Steven Rostedt wrote:
> [ added Patrick since his name is on the sysfs documentation ]
> [ added Andrew since I am including a patch for 2.6.18 ]
>
> On Wed, 2006-07-12 at 14:39 +0200, Duetsch, Thomas LDE1 wrote:
> > Steven Rostedt wrote:
> > > On Wed, 2006-07-12 at 13:35 +0200, Duetsch, Thomas LDE1 wrote:
> > >> Hi,
> > >>
> > >> I'm currently working on a custom kernel based on Ingo's -rt patch
> > >> (2.6.16-rt29).
> > >>
> > >> While rebooting my machine, I came across a kernel null pointer
> > >> dereference in this code segment in fs/sysfs/dir.c, function
> > >> sysfs_readdir():
> > >>
> > >> for (p=q->next; p!= &parent_sd->s_children; p=p->next) {
> > struct
> > >> sysfs_dirent *next; const char * name;
> > >> int len;
> > >>
> > >> next = list_entry(p, struct sysfs_dirent,
> > >> s_sibling);
> > >> if (!next->s_element)
> > >> continue;
> > >>
> > >> name = sysfs_get_name(next);
> > >> len = strlen(name);
> > >> if (next->s_dentry)
> > >> PROBLEM -> ino = next->s_dentry->d_inode->i_ino;
> > >> else
> > >> ino = iunique(sysfs_sb, 2);
> > >>
> > >
> > > Hi Thomas,
> > >
> > > Do you have a backtrace to look at? It might be helpful to see what
> > > functions brought us to this point. Also it might help to determine if
> > > the problem is vanilla, -rt, or the custom kernel.
> > >
> > > Thanks,
> > >
> > > -- Steve
> >
> > Of course, here you go:
> >
> > <1>BUG: Unable to handle kernel NULL pointer dereference at virtual
> > address 00000020
> > <1> printing eip:
> > <4>c01af187
> > <1>*pde = 00000000
> > <0>Oops: 0000 [#1]
> > <0>PREEMPT
> > <4>cpu=0 current find:2667
> > <4>current resource_set: 0
> > <4> [<c0103fb1>] show_trace+0x21/0x30 (20)
> > <4> [<c01040ee>] dump_stack+0x1e/0x30 (20)
> > <4> [<c01044ca>] die+0xba/0x1b0 (68)
> > <4> [<c01146ed>] do_page_fault+0x37d/0x6d0 (96)
> > <4> [<c0103b0b>] error_code+0x4f/0x54 (120)
> > <4> [<c0182a60>] vfs_readdir+0xa0/0xc0 (36)
> > <4> [<c0182e85>] sys_getdents64+0x75/0xd0 (64)
> > <4> [<c0102fd9>] syscall_call+0x7/0xb (-8116)
> > <4>Modules linked in: ide_dump diskdump ipv6 eeprom adm1021 hwmon
> > i2c_piix4 i2c_core NPCI pcnet32 e100 unix
> > <0>CPU: 0
> > <4>EIP: 0060:[<c01af187>] Tainted: P VLI
> > <4>EFLAGS: 00010286 (2.6.16-rt29-rs_1.0 #1)
> > <0>EIP is at sysfs_readdir+0xd7/0x240
> > <0>eax: 00000000 ebx: cfd095a0 ecx: 00000005 edx: cf02be88
> > <0>esi: cfd0959c edi: ce150c16 ebp: c35b5f50 esp: c35b5f14
> > <0>ds: 007b es: 007b ss: 0068 preempt: 00000001
> > <0>Process find (pid: 2667, threadinfo=c35b4000 task=c3686730
> > wchan=00000000) stack_left=7900 worst_left=-1)
> > <0>Stack: <0>cfd0959c c3618e60 00000004 00000003 00000000 00000d74
> > 00000004 cfe46a58
> > <0> 00000005 ce150c10 cf02bb68 cfe46a4c ca844f20 fffffffe cfe47e40
> > c35b5f74
> > <0> c0182a60 ca844f20 c35b5f98 c0182d20 cfe47ec8 00001000 ca844f20
> > c0182d20
> > <0>Call Trace:
> > <4>cpu=0 current find:2667
> > <4>current resource_set: 0
> > <0> [<c010406a>] show_stack_log_lvl+0xaa/0xe0 (32)
> > <0> [<c01042ca>] show_registers+0x1ca/0x250 (64)
> > <0> [<c0104508>] die+0xf8/0x1b0 (68)
> > <0> [<c01146ed>] do_page_fault+0x37d/0x6d0 (96)
> > <0> [<c0103b0b>] error_code+0x4f/0x54 (120)
> > <0> [<c0182a60>] vfs_readdir+0xa0/0xc0 (36)
> > <0> [<c0182e85>] sys_getdents64+0x75/0xd0 (64)
> > <0> [<c0102fd9>] syscall_call+0x7/0xb (-8116)
> > <0>Code: 8d 74 26 00 89 34 24 e8 68 e4 ff ff 89 45 e8 b9 ff ff ff ff 31
> > c0 8b 7d e8 f2 ae f7 d1 49 89 4d e4 8b 46 20 85 c0 74 74 8b 40 20 <8b>
> > 50 20
> > 0f b7 46 1c 89 54 24 14 8b 4d 08 c1 e8 0c 89 44 24 18
> >
> > I hope that helps,
> >
>
> I believe I found the race. Here's the problem:
>
> There's no real protection in the sysfs_readdir regarding that for loop.
> So the if statement
>
> if (next->s_dentry)
> ino = next->s_dentry->d_inode->i_ino;
>
> has the race.
>
> We _can_ have a s_dentry without a d_inode! Reason is that in attaching an
> attribute we have: in fs/sysfs/dir.c sysfs_attach_attr()
>
> sd->s_dentry = dentry;
> error = sysfs_create(dentry, (attr->mode & S_IALLUGO) | S_IFREG, init);
>
> Where the dentry->d_inode can be NULL. I don't see any protection in
> this code to prevent this from happening.
>
sysfs_attach_attr() is called from sysfs_lookup() only, and which in turn
is called under parent inode's i_mutex from VFS layer.
But you did help in spotting a bug which could happen like this
i_mutext held
sysfs_lookup()
-->sysfs_attach_attr()
--> sysfs_create() fails
--> sd->s_dentry has a NULL d_inode
--> sysfs_put() frees the sysfs_dirent
--> error returned to lookup
i_mutex released
But the sysfs_dirent with NULL d_inode is never unlinked from
the parent sysfs_dirent. And later on this happens
vfs_readdir()
i_mutex held
-->sysfs_readdir()
--> trips on the freed sysfs_dirent with NULL inode.
I am not sure if it is possible for other thread to see the freed
sysfs_dirent and trip at sd->s_dentry->d_inode but the sysfs_dirent
should have been unlinked from the parent sysfs_dirent's s_children list.
> Now the question is, is it safe to add the test for s_dentry->d_inode too.
> I ask this because the s_dentry is in the process of being filled, and
> I don't know what effect this will have on what readdir wants. I guess
> it may be safe, so I'm giving this patch:
>
> -- Steve
>
>
> Description:
>
> In the process of creating a sysfs attribute, we can have a state
> where the sysfs descriptor can have a dentry with a NULL inode.
>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
>
> Index: linux-2.6.18-rc1/fs/sysfs/dir.c
> ===================================================================
> --- linux-2.6.18-rc1.orig/fs/sysfs/dir.c 2006-07-12 09:43:10.000000000 -0400
> +++ linux-2.6.18-rc1/fs/sysfs/dir.c 2006-07-12 10:01:18.000000000 -0400
> @@ -445,7 +445,7 @@ static int sysfs_readdir(struct file * f
>
> name = sysfs_get_name(next);
> len = strlen(name);
> - if (next->s_dentry)
> + if (next->s_dentry && next->s_dentry->d_inode)
> ino = next->s_dentry->d_inode->i_ino;
> else
> ino = iunique(sysfs_sb, 2);
>
I think this patch only fixes the sympton. I have tried to keep the
assumption of no negative dentries (dentries with NULL d_inode) valid
in sysfs. So, this does indicate a bug.
Could you please try the following patch instead
o Unlink the half baked sysfs_dirent if sysfs_create() fails.
Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
---
linux-2.6.18-rc1-git5-maneesh/fs/sysfs/dir.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletion(-)
diff -puN fs/sysfs/dir.c~fix-sysfs_create-errors fs/sysfs/dir.c
--- linux-2.6.18-rc1-git5/fs/sysfs/dir.c~fix-sysfs_create-errors 2006-07-13 01:21:25.000000000 +0530
+++ linux-2.6.18-rc1-git5-maneesh/fs/sysfs/dir.c 2006-07-13 01:21:25.000000000 +0530
@@ -212,6 +212,7 @@ static int sysfs_attach_attr(struct sysf
sd->s_dentry = dentry;
error = sysfs_create(dentry, (attr->mode & S_IALLUGO) | S_IFREG, init);
if (error) {
+ list_del_init(&sd->s_sibling);
sysfs_put(sd);
return error;
}
@@ -236,8 +237,10 @@ static int sysfs_attach_link(struct sysf
if (!err) {
dentry->d_op = &sysfs_dentry_ops;
d_rehash(dentry);
- } else
+ } else {
+ list_del_init(&sd->s_sibling);
sysfs_put(sd);
+ }
return err;
}
_
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [SYSFS] Kernel Null pointer dereference in sysfs_readdir()
2006-07-12 11:35 [SYSFS] Kernel Null pointer dereference in sysfs_readdir() Duetsch, Thomas LDE1
2006-07-12 12:06 ` Steven Rostedt
@ 2006-07-12 19:58 ` Maneesh Soni
1 sibling, 0 replies; 9+ messages in thread
From: Maneesh Soni @ 2006-07-12 19:58 UTC (permalink / raw)
To: Duetsch, Thomas LDE1; +Cc: linux-kernel, Steven Rostedt, mingo
On Wed, Jul 12, 2006 at 01:35:50PM +0200, Duetsch, Thomas LDE1 wrote:
> Hi,
>
> I'm currently working on a custom kernel based on Ingo's -rt patch
> (2.6.16-rt29).
>
> While rebooting my machine, I came across a kernel null pointer
> dereference in this code segment in fs/sysfs/dir.c, function
> sysfs_readdir():
>
> for (p=q->next; p!= &parent_sd->s_children; p=p->next) {
> struct sysfs_dirent *next;
> const char * name;
> int len;
>
> next = list_entry(p, struct sysfs_dirent,
> s_sibling);
> if (!next->s_element)
> continue;
>
> name = sysfs_get_name(next);
> len = strlen(name);
> if (next->s_dentry)
> PROBLEM -> ino = next->s_dentry->d_inode->i_ino;
> else
> ino = iunique(sysfs_sb, 2);
>
> Checking the mailing list, I came across this thread:
> "What protection does sysfs_readdir have with SMP/Preemption?"
> http://lkml.org/lkml/2005/11/22/293
> Which handels the exact same problem (And I'm working on the kernel
> Steve was working back then).
> Reading through your suggestions and solutions, I was wondering, what
> would happen if a sysfs file would be deleted instead of created, while
> a sysfs_readdir were in progress.
> Looking through the code, I don't see, where the parents inode mutex is
> taken, to prevent a race condition.
>
parent's inode mutex is taken in vfs_readdir() in case of sysfs_readdir() call
and in sysfs_hash_and_remove() in case of delete path.
> Unfortunately, I can't reproduce the behaviour, nor do I know, which
> file was accessed, when this happens.
>
> Like Steve said back then, this might well be a problem in our code, but
>
> since we didn't change the sysfs, maybe it's a vanilla problem as well.
>
I thought at that time the issue was due to bug in error path which got
fixed by Steve's patch.
Thanks
Maneesh
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [SYSFS] Kernel Null pointer dereference in sysfs_readdir()
2006-07-12 19:57 ` Maneesh Soni
@ 2006-07-12 20:28 ` Steven Rostedt
2006-07-12 20:39 ` Maneesh Soni
0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2006-07-12 20:28 UTC (permalink / raw)
To: maneesh
Cc: Duetsch, Thomas LDE1, linux-kernel, mingo, Patrick Mochel,
Andrew Morton
On Thu, 2006-07-13 at 01:27 +0530, Maneesh Soni wrote:
>
> sysfs_attach_attr() is called from sysfs_lookup() only, and which in turn
> is called under parent inode's i_mutex from VFS layer.
Ah, I didn't see the parent mutex lock.
Does sysfs support hard links? Where an inode may belong to two
different dentrys?
>
> But you did help in spotting a bug which could happen like this
>
> i_mutext held
> sysfs_lookup()
> -->sysfs_attach_attr()
> --> sysfs_create() fails
> --> sd->s_dentry has a NULL d_inode
> --> sysfs_put() frees the sysfs_dirent
> --> error returned to lookup
> i_mutex released
>
> But the sysfs_dirent with NULL d_inode is never unlinked from
> the parent sysfs_dirent. And later on this happens
But doesn't this only happen in case of no memory?
Thomas, is the system running low on memory?
>
> vfs_readdir()
> i_mutex held
> -->sysfs_readdir()
> --> trips on the freed sysfs_dirent with NULL inode.
>
> I am not sure if it is possible for other thread to see the freed
> sysfs_dirent and trip at sd->s_dentry->d_inode but the sysfs_dirent
> should have been unlinked from the parent sysfs_dirent's s_children list.
>
> > Now the question is, is it safe to add the test for s_dentry->d_inode too.
> > I ask this because the s_dentry is in the process of being filled, and
> > I don't know what effect this will have on what readdir wants. I guess
> > it may be safe, so I'm giving this patch:
> >
> > -- Steve
> >
> >
> > Description:
> >
> > In the process of creating a sysfs attribute, we can have a state
> > where the sysfs descriptor can have a dentry with a NULL inode.
> >
> > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> >
> > Index: linux-2.6.18-rc1/fs/sysfs/dir.c
> > ===================================================================
> > --- linux-2.6.18-rc1.orig/fs/sysfs/dir.c 2006-07-12 09:43:10.000000000 -0400
> > +++ linux-2.6.18-rc1/fs/sysfs/dir.c 2006-07-12 10:01:18.000000000 -0400
> > @@ -445,7 +445,7 @@ static int sysfs_readdir(struct file * f
> >
> > name = sysfs_get_name(next);
> > len = strlen(name);
> > - if (next->s_dentry)
> > + if (next->s_dentry && next->s_dentry->d_inode)
> > ino = next->s_dentry->d_inode->i_ino;
> > else
> > ino = iunique(sysfs_sb, 2);
> >
>
> I think this patch only fixes the sympton. I have tried to keep the
> assumption of no negative dentries (dentries with NULL d_inode) valid
> in sysfs. So, this does indicate a bug.
Something else that might help is knowing what the other tasks where
doing at the time. Thomas, do you also have the task dump? You can
send that to me offline if you like.
-- Steve
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [SYSFS] Kernel Null pointer dereference in sysfs_readdir()
2006-07-12 20:28 ` Steven Rostedt
@ 2006-07-12 20:39 ` Maneesh Soni
2006-07-13 2:59 ` Steven Rostedt
0 siblings, 1 reply; 9+ messages in thread
From: Maneesh Soni @ 2006-07-12 20:39 UTC (permalink / raw)
To: Steven Rostedt
Cc: Duetsch, Thomas LDE1, linux-kernel, mingo, Patrick Mochel,
Andrew Morton
On Wed, Jul 12, 2006 at 04:28:38PM -0400, Steven Rostedt wrote:
> On Thu, 2006-07-13 at 01:27 +0530, Maneesh Soni wrote:
>
> >
> > sysfs_attach_attr() is called from sysfs_lookup() only, and which in turn
> > is called under parent inode's i_mutex from VFS layer.
>
> Ah, I didn't see the parent mutex lock.
>
> Does sysfs support hard links? Where an inode may belong to two
> different dentrys?
>
No, only symbolic links are supported.
> >
> > But you did help in spotting a bug which could happen like this
> >
> > i_mutext held
> > sysfs_lookup()
> > -->sysfs_attach_attr()
> > --> sysfs_create() fails
> > --> sd->s_dentry has a NULL d_inode
> > --> sysfs_put() frees the sysfs_dirent
> > --> error returned to lookup
> > i_mutex released
> >
> > But the sysfs_dirent with NULL d_inode is never unlinked from
> > the parent sysfs_dirent. And later on this happens
>
> But doesn't this only happen in case of no memory?
>
Right, but this is a bug anyway.
> Thomas, is the system running low on memory?
>
> >
> > vfs_readdir()
> > i_mutex held
> > -->sysfs_readdir()
> > --> trips on the freed sysfs_dirent with NULL inode.
> >
> > I am not sure if it is possible for other thread to see the freed
> > sysfs_dirent and trip at sd->s_dentry->d_inode but the sysfs_dirent
> > should have been unlinked from the parent sysfs_dirent's s_children list.
> >
> > > Now the question is, is it safe to add the test for s_dentry->d_inode too.
> > > I ask this because the s_dentry is in the process of being filled, and
> > > I don't know what effect this will have on what readdir wants. I guess
> > > it may be safe, so I'm giving this patch:
> > >
> > > -- Steve
> > >
> > >
> > > Description:
> > >
> > > In the process of creating a sysfs attribute, we can have a state
> > > where the sysfs descriptor can have a dentry with a NULL inode.
> > >
> > > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > >
> > > Index: linux-2.6.18-rc1/fs/sysfs/dir.c
> > > ===================================================================
> > > --- linux-2.6.18-rc1.orig/fs/sysfs/dir.c 2006-07-12 09:43:10.000000000 -0400
> > > +++ linux-2.6.18-rc1/fs/sysfs/dir.c 2006-07-12 10:01:18.000000000 -0400
> > > @@ -445,7 +445,7 @@ static int sysfs_readdir(struct file * f
> > >
> > > name = sysfs_get_name(next);
> > > len = strlen(name);
> > > - if (next->s_dentry)
> > > + if (next->s_dentry && next->s_dentry->d_inode)
> > > ino = next->s_dentry->d_inode->i_ino;
> > > else
> > > ino = iunique(sysfs_sb, 2);
> > >
> >
> > I think this patch only fixes the sympton. I have tried to keep the
> > assumption of no negative dentries (dentries with NULL d_inode) valid
> > in sysfs. So, this does indicate a bug.
>
> Something else that might help is knowing what the other tasks where
> doing at the time. Thomas, do you also have the task dump? You can
> send that to me offline if you like.
>
yup... please cc me also.
thanks
Maneesh
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [SYSFS] Kernel Null pointer dereference in sysfs_readdir()
2006-07-12 20:39 ` Maneesh Soni
@ 2006-07-13 2:59 ` Steven Rostedt
0 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2006-07-13 2:59 UTC (permalink / raw)
To: maneesh; +Cc: Duetsch, Thomas LDE1, linux-kernel, mingo, Andrew Morton,
Greg KH
[ removed Patrick, added Greg KH since I ignorantly looked at the file
for maintainer instead of the MAINTAINERS file. Well, I actually did
look at MAINTAINERS but I stupidly did a case sensitive search for
"sysfs" instead of SYSFS ]
On Thu, 2006-07-13 at 02:09 +0530, Maneesh Soni wrote:
> On Wed, Jul 12, 2006 at 04:28:38PM -0400, Steven Rostedt wrote:
> > On Thu, 2006-07-13 at 01:27 +0530, Maneesh Soni wrote:
> >
> > >
> > > sysfs_attach_attr() is called from sysfs_lookup() only, and which in turn
> > > is called under parent inode's i_mutex from VFS layer.
> >
> > Ah, I didn't see the parent mutex lock.
> >
> > Does sysfs support hard links? Where an inode may belong to two
> > different dentrys?
> >
> No, only symbolic links are supported.
>
> > >
> > > But you did help in spotting a bug which could happen like this
> > >
> > > i_mutext held
> > > sysfs_lookup()
> > > -->sysfs_attach_attr()
> > > --> sysfs_create() fails
> > > --> sd->s_dentry has a NULL d_inode
> > > --> sysfs_put() frees the sysfs_dirent
> > > --> error returned to lookup
> > > i_mutex released
> > >
> > > But the sysfs_dirent with NULL d_inode is never unlinked from
> > > the parent sysfs_dirent. And later on this happens
> >
> > But doesn't this only happen in case of no memory?
> >
> Right, but this is a bug anyway.
>
> > Thomas, is the system running low on memory?
> >
> > >
> > > vfs_readdir()
> > > i_mutex held
> > > -->sysfs_readdir()
> > > --> trips on the freed sysfs_dirent with NULL inode.
> > >
> > > I am not sure if it is possible for other thread to see the freed
> > > sysfs_dirent and trip at sd->s_dentry->d_inode but the sysfs_dirent
> > > should have been unlinked from the parent sysfs_dirent's s_children list.
> > >
> > > > Now the question is, is it safe to add the test for s_dentry->d_inode too.
> > > > I ask this because the s_dentry is in the process of being filled, and
> > > > I don't know what effect this will have on what readdir wants. I guess
> > > > it may be safe, so I'm giving this patch:
> > > >
> > > > -- Steve
> > > >
> > > >
> > > > Description:
> > > >
> > > > In the process of creating a sysfs attribute, we can have a state
> > > > where the sysfs descriptor can have a dentry with a NULL inode.
> > > >
> > > > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > > >
> > > > Index: linux-2.6.18-rc1/fs/sysfs/dir.c
> > > > ===================================================================
> > > > --- linux-2.6.18-rc1.orig/fs/sysfs/dir.c 2006-07-12 09:43:10.000000000 -0400
> > > > +++ linux-2.6.18-rc1/fs/sysfs/dir.c 2006-07-12 10:01:18.000000000 -0400
> > > > @@ -445,7 +445,7 @@ static int sysfs_readdir(struct file * f
> > > >
> > > > name = sysfs_get_name(next);
> > > > len = strlen(name);
> > > > - if (next->s_dentry)
> > > > + if (next->s_dentry && next->s_dentry->d_inode)
> > > > ino = next->s_dentry->d_inode->i_ino;
> > > > else
> > > > ino = iunique(sysfs_sb, 2);
> > > >
> > >
> > > I think this patch only fixes the sympton. I have tried to keep the
> > > assumption of no negative dentries (dentries with NULL d_inode) valid
> > > in sysfs. So, this does indicate a bug.
> >
> > Something else that might help is knowing what the other tasks where
> > doing at the time. Thomas, do you also have the task dump? You can
> > send that to me offline if you like.
> >
>
> yup... please cc me also.
Thomas, perhaps CC Greg KH on this too, since he _is_ the maintainer for
sysfs.
-- Steve
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2006-07-13 2:59 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-07-12 11:35 [SYSFS] Kernel Null pointer dereference in sysfs_readdir() Duetsch, Thomas LDE1
2006-07-12 12:06 ` Steven Rostedt
2006-07-12 12:39 ` AW: " Duetsch, Thomas LDE1
2006-07-12 14:06 ` Steven Rostedt
2006-07-12 19:57 ` Maneesh Soni
2006-07-12 20:28 ` Steven Rostedt
2006-07-12 20:39 ` Maneesh Soni
2006-07-13 2:59 ` Steven Rostedt
2006-07-12 19:58 ` Maneesh Soni
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox