public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Dinakar Guniguntala <dino@in.ibm.com>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Sripathi Kodi <sripathik@in.ibm.com>,
	linux-kernel@vger.kernel.org, Roland McGrath <roland@redhat.com>,
	Ingo Molnar <mingo@elte.hu>, Andrew Morton <akpm@osdl.org>
Subject: Re: [PATCH] do_wait fix for 2.6.10-rc1
Date: Tue, 9 Nov 2004 20:01:18 +0530	[thread overview]
Message-ID: <20041109143118.GA8961@in.ibm.com> (raw)
In-Reply-To: <Pine.LNX.4.58.0411081708000.2301@ppc970.osdl.org>

Linus,

All of your patches address the TASK_STOPPED case, whereas the current
problem that Sripathi reported does not have a task in the stopped state.
So the problem still remains.  Let me restate the problem


Thread A (PID 1)                Thread B (PID 1)        PID 2

                                   fork()
                                                        do something

   wait()                          wait()               exit()


   do_wait()                       do_wait()


   switch(p->state)              switch(p->state)
     -> default                    -> default
        (EXIT_ZOMBIE)                 (EXIT_ZOMBIE)


   wait_task_zombie()            wait_task_zombie()


   state = EXIT_DEAD
    (wins race)                     (loses race)
                                    return 0
    reap child

                                 since flag is set,
                              this thread will now hang



Hence the patches Sripathi sent which did fix the problem, but they end up 
busy looping because of schedule. The only clean solution seems to be to 
wake up the siblings if there are any in the case we reap a process from 
wait_task_zombie. What do you think?

Regards,

Dinakar


On Mon, Nov 08, 2004 at 05:09:59PM -0800, Linus Torvalds wrote:
> 
> 
> Ok, I haven't gotten any response on this one, but I'm running it in my 
> tree, and look as I might, it still seems right to me. It should fix not 
> only the problem Sripathi Kodi saw (can we verify that?), but also a more 
> fundamental problem with the access of stale memory.
> 
> I'll commit it. I would still prefer to have somebody else check out my 
> logic (or lack there-of).
> 
> 		Linus
> 
> On Mon, 8 Nov 2004, Linus Torvalds wrote:
> > 
> > Anyway, if I'm right, the suggested fix would be something like this (this 
> > replaces the earlier patches, since it also makes the zero return case go 
> > away - we don't need to mark anything runnable, since we restart the whole 
> > loop).
> > 
> > NOTE! -EAGAIN should be safe, because the other routines involved can only
> > return -EFAULT as an error, so this is all unique to the "try again"  
> > case.
> > 
> > Ok, three patches for the same piece of code withing minutes. Please tell 
> > me this one is not also broken..
> > 
> > 			Linus
> > 
> > ----
> > ===== kernel/exit.c 1.166 vs edited =====
> > --- 1.166/kernel/exit.c	2004-11-04 11:13:19 -08:00
> > +++ edited/kernel/exit.c	2004-11-08 08:34:37 -08:00
> > @@ -1201,8 +1201,15 @@
> >  		write_unlock_irq(&tasklist_lock);
> >  bail_ref:
> >  		put_task_struct(p);
> > -		read_lock(&tasklist_lock);
> > -		return 0;
> > +		/*
> > +		 * We are returning to the wait loop without having successfully
> > +		 * removed the process and having released the lock. We cannot
> > +		 * continue, since the "p" task pointer is potentially stale.
> > +		 *
> > +		 * Return -EAGAIN, and do_wait() will restart the loop from the
> > +		 * beginning. Do _not_ re-acquire the lock.
> > +		 */
> > +		return -EAGAIN;
> >  	}
> >  
> >  	/* move to end of parent's list to avoid starvation */
> > @@ -1343,6 +1350,8 @@
> >  							   (options & WNOWAIT),
> >  							   infop,
> >  							   stat_addr, ru);
> > +				if (retval == -EAGAIN)
> > +					goto repeat;
> >  				if (retval != 0) /* He released the lock.  */
> >  					goto end;
> >  				break;
> > 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

  reply	other threads:[~2004-11-09 14:14 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-11-05  9:57 [PATCH] do_wait fix for 2.6.10-rc1 Sripathi Kodi
2004-11-05 10:03 ` Ingo Molnar
     [not found] ` <Pine.LNX.4.58.0411051101500.30457@ppc970.osdl.org>
2004-11-08 14:27   ` Sripathi Kodi
2004-11-08 16:01     ` Linus Torvalds
2004-11-08 16:13       ` Linus Torvalds
2004-11-08 16:37         ` Linus Torvalds
2004-11-09  1:09           ` Linus Torvalds
2004-11-09 14:31             ` Dinakar Guniguntala [this message]
2004-11-09 16:11               ` Linus Torvalds
2004-11-10 14:35                 ` Dinakar Guniguntala
2004-11-10 17:04                   ` Linus Torvalds
2004-11-11 11:20                     ` Sripathi Kodi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20041109143118.GA8961@in.ibm.com \
    --to=dino@in.ibm.com \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=roland@redhat.com \
    --cc=sripathik@in.ibm.com \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox