public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC]: Possible race condition on an SMP between proc_lookupfd and tasks on other cpus
@ 2006-10-11  9:16 Chandru
  2006-10-12 19:29 ` Eric W. Biederman
  0 siblings, 1 reply; 4+ messages in thread
From: Chandru @ 2006-10-11  9:16 UTC (permalink / raw)
  To: linux-kernel

Hi All,
I am running a RHEL5  distro kernel ( which seems to be quite close to 
Vannilla kernel ) and am having a problem on one of my system 
(PPC64).    The system crashes ( goes in to xmon ) every now and then 
while running TCP stress tests on the system.   The following is the 
backtrace and exception information ( from distro kernel, which might be 
of very little help).

f:mon> e
cpu 0xf: Vector: 300 (Data Access) at [c0000000eaa1b490]
    pc: c0000000001351e0: .tid_fd_revalidate+0x64/0x220
    lr: c0000000001351cc: .tid_fd_revalidate+0x50/0x220
    sp: c0000000eaa1b710
   msr: 8000000000009032
   dar: 6b6b6b6b6b6b6b6b
 dsisr: 40000000
  current = 0xc0000001182864f0
  paca    = 0xc000000000456300
    pid   = 24558, comm = netstat
f:mon> t
[c0000000eaa1b7b0] c000000000138118 .proc_lookupfd+0x17c/0x21c
[c0000000eaa1b860] c0000000000f359c .do_lookup+0x108/0x268
[c0000000eaa1b920] c0000000000f65f8 .__link_path_walk+0xc58/0x1364
[c0000000eaa1ba00] c0000000000f6da0 .link_path_walk+0x9c/0x184
[c0000000eaa1bb40] c0000000000f7364 .do_path_lookup+0x304/0x398
[c0000000eaa1bbf0] c0000000000f7db8 .__user_walk_fd+0x58/0x88
[c0000000eaa1bc90] c0000000000edcdc .sys_readlinkat+0x44/0x130
[c0000000eaa1bdc0] c000000000016784 .compat_sys_readlink+0x14/0x28
[c0000000eaa1be30] c00000000000871c syscall_exit+0x0/0x40


 From code analysis ( vannilla and distro kernel), it looks like there 
can exist a small time window between

 spin_unlock(&files->file_lock) in proc_fd_instantiate()

and fcheck_files() in tid_fd_revalidate()   during which the contents 
of  'struct files_struct files' of a task could be released/cleared by 
that task ( during an exec probably ).

i.e between....
static struct dentry *proc_fd_instantiate(struct inode *dir,....)
{
...
...
        if (file->f_mode & 2)
                inode->i_mode |= S_IWUSR | S_IXUSR;
        spin_unlock(&files->file_lock);                    <------- 1.
        put_files_struct(files);

        inode->i_op = &proc_pid_link_inode_operations;
        inode->i_size = 64;
        ei->op.proc_get_link = proc_fd_link;
        dentry->d_op = &tid_fd_dentry_operations;
        d_add(dentry, inode);
        /* Close the race of the process dying before we return the 
dentry */
        if (tid_fd_revalidate(dentry, NULL))         <-------- 2.
...
...
}


static int tid_fd_revalidate(struct dentry *dentry, struct nameidata *nd)
{
...
...
        if (task) {
                files = get_files_struct(task);
                if (files) {
                        rcu_read_lock();
                        if (fcheck_files(files, fd)) {           
<---------- 2
...
...
}

Could this code analysis be right? and can this race condition be fixed?.

Thanks,
Chandru

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC]: Possible race condition on an SMP between proc_lookupfd and tasks on other cpus
  2006-10-11  9:16 [RFC]: Possible race condition on an SMP between proc_lookupfd and tasks on other cpus Chandru
@ 2006-10-12 19:29 ` Eric W. Biederman
  2006-10-13 12:15   ` Chandru
  0 siblings, 1 reply; 4+ messages in thread
From: Eric W. Biederman @ 2006-10-12 19:29 UTC (permalink / raw)
  To: Chandru; +Cc: linux-kernel, Andrew Morton

Chandru <chandru@in.ibm.com> writes:

> Hi All,
> I am running a RHEL5  distro kernel ( which seems to be quite close to Vannilla
> kernel ) and am having a problem on one of my system (PPC64).    The system
> crashes ( goes in to xmon ) every now and then while running TCP stress tests on
> the system.   The following is the backtrace and exception information ( from
> distro kernel, which might be of very little help).
>
> f:mon> e
> cpu 0xf: Vector: 300 (Data Access) at [c0000000eaa1b490]
>    pc: c0000000001351e0: .tid_fd_revalidate+0x64/0x220
>    lr: c0000000001351cc: .tid_fd_revalidate+0x50/0x220
>    sp: c0000000eaa1b710
>   msr: 8000000000009032
>   dar: 6b6b6b6b6b6b6b6b
> dsisr: 40000000
>  current = 0xc0000001182864f0
>  paca    = 0xc000000000456300
>    pid   = 24558, comm = netstat
> f:mon> t
> [c0000000eaa1b7b0] c000000000138118 .proc_lookupfd+0x17c/0x21c
> [c0000000eaa1b860] c0000000000f359c .do_lookup+0x108/0x268
> [c0000000eaa1b920] c0000000000f65f8 .__link_path_walk+0xc58/0x1364
> [c0000000eaa1ba00] c0000000000f6da0 .link_path_walk+0x9c/0x184
> [c0000000eaa1bb40] c0000000000f7364 .do_path_lookup+0x304/0x398
> [c0000000eaa1bbf0] c0000000000f7db8 .__user_walk_fd+0x58/0x88
> [c0000000eaa1bc90] c0000000000edcdc .sys_readlinkat+0x44/0x130
> [c0000000eaa1bdc0] c000000000016784 .compat_sys_readlink+0x14/0x28
> [c0000000eaa1be30] c00000000000871c syscall_exit+0x0/0x40
>
>
> From code analysis ( vannilla and distro kernel), it looks like there can exist
> a small time window between
>
> spin_unlock(&files->file_lock) in proc_fd_instantiate()
>
> and fcheck_files() in tid_fd_revalidate()   during which the contents of
> 'struct files_struct files' of a task could be released/cleared by that task (
> during an exec probably ).


> Could this code analysis be right? and can this race condition be fixed?.

The window you see exists, but it is there by design.

tid_fd_revalidate is designed to be called any time after prod_fd_instantiate()
runs.  So it requires all of the locks it needs.  It's purpose in life
is to verify the permissions.

We have earlier increased the reference count of everything when we grabbed
the dentry.

The final tid_fd_revalidate in proc_fd_instantiate was added recently
to ensure we have a consistent set of checks before returning a dentry
to a user. 

So I think you have a legitimate problem but it isn't because we drop
and reaquire the locks.

There was a little recent work making some of the fdtable access
no-rcu.   See commit ca99c1da080345e227cfb083c330a184d42e27f3.  But
I don't think that applies here.

You certain seem to be in one of the proc stress conditions so this
may not be a unique bug.

Digging through the disassembly and figuring out which access you died
on would be interesting, so we could know with precision which part
of tid_fd_revalidate we are dying in.  My ppc64 isn't good enough 
especially without the matching binaries to figure that out though.
All I know is that you are about 25 instructions into
tid_fd_revalidate.

I don't have a clue where to start to dig into this.

Eric

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC]: Possible race condition on an SMP between proc_lookupfd and tasks on other cpus
  2006-10-12 19:29 ` Eric W. Biederman
@ 2006-10-13 12:15   ` Chandru
  2006-10-13 16:29     ` Eric W. Biederman
  0 siblings, 1 reply; 4+ messages in thread
From: Chandru @ 2006-10-13 12:15 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, Andrew Morton

Eric W. Biederman wrote:
> You certain seem to be in one of the proc stress conditions so this
> may not be a unique bug.
>
> Digging through the disassembly and figuring out which access you died
> on would be interesting, so we could know with precision which part
> of tid_fd_revalidate we are dying in.  My ppc64 isn't good enough 
> especially without the matching binaries to figure that out though.
> All I know is that you are about 25 instructions into
> tid_fd_revalidate.
>   
 From the disassembly in xmon, the code that failed while running was in 
fcheck_files() called from tid_fd_revalidate()

f:mon> di c0000000001351e0
c0000000001351e0  800b0000      lwz     r0,0(r11)    <<<<<------ PC
c0000000001351e4  7f804840      cmplw   cr7,r0,r9
c0000000001351e8  409d00e0      ble     cr7,c0000000001352c8    #
.tid_fd_revalidate+0x14c/0x220
c0000000001351ec  e96b0008      ld      r11,8(r11)
c0000000001351f0  79291f24      rldicr  r9,r9,3,60
c0000000001351f4  7c09582a      ldx     r0,r9,r11
c0000000001351f8  2fa00000      cmpdi   cr7,r0,0
c0000000001351fc  419e00cc      beq     cr7,c0000000001352c8    #


r11 gets loaded with 'fdt'  as in  struct fdtable *fdt = 
files_fdtable(files);

include/linux/file.h:97
             b3c:       e9 6a 00 08     ld      r11,8(r10)

Following are the register contents...

f:mon> r
R00 = c0000000001351cc   R16 = 00000000fae9f35a
R01 = c0000000eaa1b710   R17 = 0000000010020000
R02 = c00000000056cd20   R18 = 00000000fae9f15a
R03 = c00000018b5f89a0   R19 = 00000000fae9e966
R04 = c0000001e172cb80   R20 = 00000000fae9e95a
R05 = c0000001349e88f8   R21 = 0000000010030000
R06 = c00000018b5f8ca0   R22 = 00000000fae9e968
R07 = 000000007fffffff   R23 = c0000000105e3920
R08 = 0000000000135075   R24 = c0000001349e88a8
R09 = 000000000000028c   R25 = c00000023dca8628
R10 = c00000018b5f89a0   R26 = c0000001e172ca30
R11 = 6b6b6b6b6b6b6b6b   R27 = c00000023dca8628
R12 = d0000000001bb3f8   R28 = 000000000000028c
R13 = c000000000456300   R29 = c0000001e172ca30
R14 = 00000000ffffffff   R30 = c0000000004a9250
R15 = 0000000000000000   R31 = c0000001349e88c8
pc  = c0000000001351e0 .tid_fd_revalidate+0x64/0x220
lr  = c0000000001351cc .tid_fd_revalidate+0x50/0x220
msr = 8000000000009032   cr  = 84000482
ctr = c00000000016a770   xer = 0000000000000000   trap =  300
dar = 6b6b6b6b6b6b6b6b   dsisr = 40000000


Now r11=6b6b6b6b6b6b6b6b = 'fdt'
and an attempt is made to access fdt->max_fds as in 'if (fd < fdt->max_fds)'  which fails

include/linux/file.h:99
             b44:       80 0b 00 00     lwz     r0,0(r11)   <<<<<<------PC 
             b48:       7f 80 48 40     cmplw   cr7,r0,r9

Further the contents of 'struct files_struct *files' were all filled 
with '0x6b's.  I tried to look in to for which task the lookup was being 
done and the task seemed to be 'execve04'.  Also noticed that another 
cpu ( cpu #1)  was running on an execve code path ( do not have the 
state of the system right now to copy it's backtrace  :(  ).

Thanks,
Chandru


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC]: Possible race condition on an SMP between proc_lookupfd and tasks on other cpus
  2006-10-13 12:15   ` Chandru
@ 2006-10-13 16:29     ` Eric W. Biederman
  0 siblings, 0 replies; 4+ messages in thread
From: Eric W. Biederman @ 2006-10-13 16:29 UTC (permalink / raw)
  To: Chandru; +Cc: linux-kernel, Andrew Morton

Chandru <chandru@in.ibm.com> writes:

> Eric W. Biederman wrote:
>> You certain seem to be in one of the proc stress conditions so this
>> may not be a unique bug.
>>
>> Digging through the disassembly and figuring out which access you died
>> on would be interesting, so we could know with precision which part
>> of tid_fd_revalidate we are dying in.  My ppc64 isn't good enough especially
>> without the matching binaries to figure that out though.
>> All I know is that you are about 25 instructions into
>> tid_fd_revalidate.
>>
> From the disassembly in xmon, the code that failed while running was in
> fcheck_files() called from tid_fd_revalidate()

> Further the contents of 'struct files_struct *files' were all filled with
> '0x6b's.  I tried to look in to for which task the lookup was being done and the
> task seemed to be 'execve04'.  Also noticed that another cpu ( cpu #1)  was
> running on an execve code path ( do not have the state of the system right now
> to copy it's backtrace  :(  ).

Which kernel version did you say you were running.

This appears to be a case of calling put_files_struct without changing
the pointer in the task_struct.  So we are reading a stale pointer
in get_files_struct.

It looks like the patch below added fixed a bug that has the same symptoms
as what you are describing.  So I suspect this is fixed in linus's tree
already.

>commit 3b9b8ab65d8eed784b9164d03807cb2bda7b5cd6
>Author: Kirill Korotaev <dev@sw.ru>
>Date:   Fri Sep 29 02:00:05 2006 -0700
>
>    [PATCH] Fix unserialized task->files changing
>    
>    Fixed race on put_files_struct on exec with proc.  Restoring files on
>    current on error path may lead to proc having a pointer to already kfree-d
>    files_struct.
>    
>    ->files changing at exit.c and khtread.c are safe as exit_files() makes all
>    things under lock.
>    
>    Found during OpenVZ stress testing.
>    
>    [akpm@osdl.org: add export]
>    Signed-off-by: Pavel Emelianov <xemul@openvz.org>
>    Signed-off-by: Kirill Korotaev <dev@openvz.org>
>    Signed-off-by: Andrew Morton <akpm@osdl.org>
>    Signed-off-by: Linus Torvalds <torvalds@osdl.org>


Eric

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2006-10-13 16:30 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-10-11  9:16 [RFC]: Possible race condition on an SMP between proc_lookupfd and tasks on other cpus Chandru
2006-10-12 19:29 ` Eric W. Biederman
2006-10-13 12:15   ` Chandru
2006-10-13 16:29     ` Eric W. Biederman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox