public inbox for linux-kernel@vger.kernel.org
* ptrace patch fails stress testing
@ 2003-04-01 18:22 linas
  2003-04-01 21:25 ` John M Flinchbaugh
  2003-04-02 11:49 ` Alan Cox
  0 siblings, 2 replies; 6+ messages in thread
From: linas @ 2003-04-01 18:22 UTC
  To: alan; +Cc: linas, ppc, linux-kernel

Hi,

I've got a number of machines here that crash after installing
the recent ptrace fix.  The crash only occurs when machines 
are highly stressed.

The problem appears to be that task->mm is dereferenced without 
looking to see if mm is NULL.  E.g., in sched.h, the
is_dumpable() macro uses task->mm->dumpable.  I'm sitting
in front of a KDB session and I'm clearly looking at task->mm
which is NULL. 
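To make the hazard concrete, here is a minimal userspace model, assuming heavily simplified stand-ins for struct task_struct and struct mm_struct (these structs are hypothetical; only the field names task_dumpable, mm and dumpable come from this thread). The unguarded macro is presumably the pre-fix shape of is_dumpable(); the guarded one adds a (tsk)->mm test before the dereference and relies on && evaluating left to right and stopping at the first zero, so mm->dumpable is never read through a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct mm_struct {
    unsigned dumpable : 1;
};

struct task_struct {
    unsigned task_dumpable : 1;
    struct mm_struct *mm;       /* NULL once the task has released its mm */
};

/* Unguarded form: faults if task_dumpable is set and tsk->mm is NULL. */
#define is_dumpable_unguarded(tsk) \
    ((tsk)->task_dumpable && (tsk)->mm->dumpable)

/* Guarded form: && short-circuits, so (tsk)->mm->dumpable is only
 * evaluated when (tsk)->mm is non-NULL. */
#define is_dumpable_guarded(tsk) \
    ((tsk)->task_dumpable && (tsk)->mm && (tsk)->mm->dumpable)

/* Function wrapper; the task itself must exist, only its mm may be NULL. */
static int task_is_dumpable(const struct task_struct *tsk)
{
    assert(tsk != NULL);
    return is_dumpable_guarded(tsk);
}
```

Note that in this model the unguarded form only faults for a task with task_dumpable set and mm == NULL; when task_dumpable is 0 the && stops before touching mm at all.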

In my particular case, the crash is *always* in kernel/ptrace.c
in access_process_vm() (which is called when something tries
to read /proc/<pid>/cmdline).  There seem to be a few other places
in the kernel where task->mm is dereferenced without checking mm,
but these are rare (?)  Most (?) places seem to make a point of
checking for NULL before using mm.

Why, how and under what conditions this race condition occurs, 
I don't know.  What the best fix is, I don't know.

I can try to just add a check for NULL, but I'd like someone 
to tell me that 'yes this is the right way to fix this.' 
(As opposed to trying to get some lock or trying to force 
the process to get paged in or whatever.)
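For comparison, the "check for NULL" and "take a lock" options can be combined. A guarded macro of the form (tsk)->mm && (tsk)->mm->dumpable reads tsk->mm twice, so in principle another CPU could clear it between the test and the dereference. A common pattern is to read tsk->mm once under the per-task lock, bail out if it is NULL, and pin it with a reference count before using it. The sketch below is a userspace model under stated assumptions: the atomic_flag spinlock stands in for the kernel's task lock, exit is assumed to clear ->mm while holding that same lock, and every *_sketch name is hypothetical, not a kernel function.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified structures; not the kernel's definitions. */
struct mm_sketch {
    atomic_int users;           /* reference count: while > 0 the mm stays alive */
    int dumpable;
};

struct task_sketch {
    atomic_flag lock;           /* spinlock standing in for the per-task lock */
    struct mm_sketch *mm;       /* assumed cleared to NULL, under lock, at exit */
};

static void task_lock_sketch(struct task_sketch *t)
{
    while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
        ;                       /* spin until the holder releases it */
}

static void task_unlock_sketch(struct task_sketch *t)
{
    atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

static void mm_init_sketch(struct mm_sketch *mm, int dumpable)
{
    atomic_init(&mm->users, 1); /* the owning task's own reference */
    mm->dumpable = dumpable;
}

static void task_init_sketch(struct task_sketch *t, struct mm_sketch *mm)
{
    atomic_flag_clear(&t->lock);
    t->mm = mm;
}

/* Read t->mm exactly once under the lock; return it pinned, or NULL if gone. */
static struct mm_sketch *get_task_mm_sketch(struct task_sketch *t)
{
    struct mm_sketch *mm;

    assert(t != NULL);
    task_lock_sketch(t);
    mm = t->mm;
    if (mm != NULL)
        atomic_fetch_add(&mm->users, 1);  /* pin before dropping the lock */
    task_unlock_sketch(t);
    return mm;
}

/* Drop the reference taken above (actual freeing is omitted in this model). */
static void mmput_sketch(struct mm_sketch *mm)
{
    atomic_fetch_sub(&mm->users, 1);
}

/* Caller in the style of access_process_vm(): no mm means nothing to read. */
static int read_dumpable_sketch(struct task_sketch *t)
{
    struct mm_sketch *mm = get_task_mm_sketch(t);
    int ret;

    if (mm == NULL)
        return 0;
    ret = mm->dumpable;
    mmput_sketch(mm);
    return ret;
}
```

The reference count is what lets the caller use the mm after dropping the lock; the NULL test alone only covers the case where the mm is already gone when it is first read.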

BTW, this is an SMP machine; I don't know if that matters.

Comments? Suggestions?

--linas


* Re: ptrace patch fails stress testing
  2003-04-01 18:22 linas
@ 2003-04-01 21:25 ` John M Flinchbaugh
  2003-04-02 11:49 ` Alan Cox
  1 sibling, 0 replies; 6+ messages in thread
From: John M Flinchbaugh @ 2003-04-01 21:25 UTC
  To: linas; +Cc: alan, linas, ppc, linux-kernel

On Tue, Apr 01, 2003 at 12:22:56PM -0600, linas@austin.ibm.com wrote:
> I've got a number of machines here that crash after installing
> the recent ptrace fix.  The crash only occurs when machines 
> are highly stressed.

i've seen suspicious behaviour since installing the ptrace patch as
well, but i couldn't really narrow it down to anything, as i have
absolutely nothing of interest in logs.  i just log in during the day
and notice that it rebooted itself (uncleanly) overnight at
around 3-5am.  this is when the disk-intensive nightly stuff happens,
like some backups, updatedb, etc.

i was starting to suspect power issues, since i never see oopses in
logs.

this is a dual athlon box.
-- 
____________________}John Flinchbaugh{______________________
| glynis@hjsoft.com         http://www.hjsoft.com/~glynis/ |
~~Powered by Linux: Reboots are for hardware upgrades only~~

* Re: ptrace patch fails stress testing
  2003-04-01 18:22 linas
  2003-04-01 21:25 ` John M Flinchbaugh
@ 2003-04-02 11:49 ` Alan Cox
  2003-04-02 14:45   ` Keith Owens
  1 sibling, 1 reply; 6+ messages in thread
From: Alan Cox @ 2003-04-02 11:49 UTC
  To: linas; +Cc: linas, ppc, Linux Kernel Mailing List

On Tue, 2003-04-01 at 19:22, linas@austin.ibm.com wrote:
> The problem appears to be that task->mm is dereferenced without 
> looking to see if mm is NULL.  e.g. in the sched.h in the 
> is_dumpable() macro, we have task->mm->dumpable .  I'm sitting
> in front of a KDB session and I'm clearly looking at task->mm
> which is NULL. 
> Why, how and under what conditions this race condition occurs, 
> I don't know.  What the best fix is, I don't know.

Zombie process. The patch checks ->mm but must also check ->mm != NULL
first.


> I can try to just add a check for NULL, but I'd like someone 
> to tell me that 'yes this is the right way to fix this.' 

It is, I think.


Alan

* Re: ptrace patch fails stress testing
  2003-04-02 11:49 ` Alan Cox
@ 2003-04-02 14:45   ` Keith Owens
  0 siblings, 0 replies; 6+ messages in thread
From: Keith Owens @ 2003-04-02 14:45 UTC
  To: linas, Linux Kernel Mailing List

On Tue, 2003-04-01 at 19:22, linas@austin.ibm.com wrote:
> The problem appears to be that task->mm is dereferenced without 
> looking to see if mm is NULL.  e.g. in the sched.h in the 
> is_dumpable() macro, we have task->mm->dumpable .  I'm sitting
> in front of a KDB session and I'm clearly looking at task->mm
> which is NULL. 

Sorry, KDB is an illegal kernel patch.  Linus has spoken, the kernel
does not need debuggers.

All right, assume a smiley there.


* Re: ptrace patch fails stress testing
@ 2003-04-03 15:22 James Cownie
  2003-04-03 19:53 ` Chris Wright
  0 siblings, 1 reply; 6+ messages in thread
From: James Cownie @ 2003-04-03 15:22 UTC
  To: linux-kernel

Alan wrote :-

> On Tue, 2003-04-01 at 19:22, linas@austin.ibm.com wrote:
> > The problem appears to be that task->mm is dereferenced without
> > looking to see if mm is NULL. e.g. in the sched.h in the
> > is_dumpable() macro, we have task->mm->dumpable . I'm sitting
> > in front of a KDB session and I'm clearly looking at task->mm
> > which is NULL.
> > Why, how and under what conditions this race condition occurs,
> > I don't know. What the best fix is, I don't know.
> 
> Zombie process. The patch checks ->mm but must also check ->mm != NULL
> first.

We're seeing this 100% reliably with our TotalView debugger, and as
Alan suggests it happens when trying to make a ptrace call on a zombie
process.
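A minimal userspace reproducer for this scenario might look like the sketch below. It rests on assumptions: the thread does not say which ptrace request TotalView issued, so PTRACE_PEEKUSER is an arbitrary choice of a request other than PTRACE_ATTACH, which presumably goes through the sys_ptrace() to ptrace_check_attach() path seen in the oops. The child asks to be traced and exits at once; the parent observes the exit with waitid(..., WEXITED | WNOWAIT), which leaves the child unreaped as a zombie, and then issues the request. On a fixed kernel the request should simply fail (the exact errno varies between kernel versions); on an affected kernel it may oops, so try it only on a test machine.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that requests tracing and exits immediately, then wait for
 * its exit with WNOWAIT so it remains an unreaped zombie.
 * Returns the zombie's pid, or -1 on failure. */
static pid_t make_traced_zombie(void)
{
    siginfo_t info;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);  /* may fail; harmless here */
        _exit(0);
    }
    /* WNOWAIT: observe the exit but leave the child in the zombie state. */
    if (waitid(P_PID, pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;
    return pid;
}

/* Issue a non-ATTACH ptrace request against the zombie.  PEEKUSER can
 * legitimately return -1, so errno decides the outcome.
 * Returns 0 if the request succeeded, otherwise the errno value. */
static int peek_zombie(pid_t pid)
{
    errno = 0;
    ptrace(PTRACE_PEEKUSER, pid, NULL, NULL);
    return errno;
}

/* Reap the zombie so nothing is left behind. */
static void reap_zombie(pid_t pid)
{
    waitpid(pid, NULL, 0);
}
```

Typical use: call make_traced_zombie(), check that peek_zombie() on the returned pid reports a non-zero errno, then call reap_zombie().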

FWIW, the oops looks like this:

  >>EIP; c01197f3 <ptrace_check_attach+13/50>   <=====
  Trace; c0109bc6 <sys_ptrace+ba/580>
  Trace; c0106cb8 <error_code+34/3c>
  Trace; c0106bc7 <system_call+33/38>
  Code;  c01197f3 <ptrace_check_attach+13/50>
  00000000 <_EIP>:
  Code;  c01197f3 <ptrace_check_attach+13/50>   <=====
     0:   f6 40 7c 01               testb  $0x1,0x7c(%eax)   <=====
  Code;  c01197f7 <ptrace_check_attach+17/50>
     4:   75 07                     jne    d <_EIP+0xd> c0119800 <ptrace_check_attach+20/50>
  Code;  c01197f9 <ptrace_check_attach+19/50>
     6:   b8 ff ff ff ff            mov    $0xffffffff,%eax
  Code;  c01197fe <ptrace_check_attach+1e/50>
     b:   c3                        ret    
  Code;  c01197ff <ptrace_check_attach+1f/50>
     c:   90                        nop    
  Code;  c0119800 <ptrace_check_attach+20/50>
     d:   f6 42 18 01               testb  $0x1,0x18(%edx)
  Code;  c0119804 <ptrace_check_attach+24/50>
    11:   75 0a                     jne    1d <_EIP+0x1d> c0119810 <ptrace_check_attach+30/50>
  Code;  c0119806 <ptrace_check_attach+26/50>
    13:   b8 00 00 00 00            mov    $0x0,%eax

which corresponds to dereferencing a NULL mm.

Following Alan, the fix, then, is to have is_dumpable look like this :-

#define is_dumpable(tsk)	((tsk)->task_dumpable && (tsk)->mm && (tsk)->mm->dumpable)

(and be prepared in user space to get EPERM back from some ptrace
calls which previously "worked" ok.)
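On the user-space side, one way a debugger might tell "the target has become a zombie" apart from a genuine permission problem is to look at the state field of /proc/<pid>/stat, where Z means zombie. This is a sketch assuming Linux procfs; the helper names stat_state() and pid_is_zombie() are hypothetical. The command name in the second field is wrapped in parentheses and may itself contain spaces or ')', so the parser looks for the last ')' rather than the first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Extract the one-letter state ('R', 'S', 'Z', ...) from a /proc/<pid>/stat
 * line of the form "pid (comm) state ...".  comm may contain spaces or ')',
 * so search for the LAST ')'.  Returns '\0' if the line is malformed. */
static char stat_state(const char *line)
{
    const char *close = strrchr(line, ')');

    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '\0';
    return close[2];
}

/* Read /proc/<pid>/stat and report whether the process is a zombie.
 * Returns 1 for a zombie, 0 for any other state, -1 if it can't be read. */
static int pid_is_zombie(pid_t pid)
{
    char path[64], line[512];
    FILE *f;

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(line, sizeof line, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return stat_state(line) == 'Z';
}
```

A caller that gets EPERM from ptrace() could then check pid_is_zombie() and, if it returns 1, treat the target as having exited and reap it, instead of reporting a permission error to the user.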

-- Jim 

James Cownie	<jcownie@etnus.com>
Etnus, LLC.     +44 117 9071438
http://www.etnus.com

* Re: ptrace patch fails stress testing
  2003-04-03 15:22 ptrace patch fails stress testing James Cownie
@ 2003-04-03 19:53 ` Chris Wright
  0 siblings, 0 replies; 6+ messages in thread
From: Chris Wright @ 2003-04-03 19:53 UTC
  To: James Cownie; +Cc: linux-kernel

* James Cownie (jcownie@etnus.com) wrote:
> 
> We're seeing this 100% reliably with our TotalView debugger, and as
> Alan suggests it happens when trying to make a ptrace call on a zombie
> process.

Yup, I can reliably reproduce this as well.  I'm also using the same patch
in is_dumpable().

thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net
