public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: george anzinger <george@mvista.com>
To: Bruce Janson <bruce@cs.usyd.edu.au>, linux-kernel@vger.kernel.org
Subject: How should nano_sleep be fixed (was: ptrace(), fork(), sleep(), exit(),  SIGCHLD)
Date: Wed, 15 Aug 2001 17:59:51 -0700	[thread overview]
Message-ID: <3B7B1B07.A9FB293B@mvista.com> (raw)
In-Reply-To: <20010813093116Z270036-761+611@vger.kernel.org> <20010814092849.E13892@pc8.lineo.fr> <20010814201825Z270798-760+1687@vger.kernel.org> <3B7A9953.744977A5@mvista.com> <3B7AB93D.F8B5B455@mvista.com>

The problem is that nano_sleep needs to continue sleeping if it wakes up
on a signal which is not delivered to the task.  This happens when
"strace" or "ptrace" cause otherwise blocked signals to be delivered. 
Do_signal() returns 0 if it does not deliver a signal, a 1 if it does so
I propose the following changes to nano_sleep:

asmlinkage long sys_nanosleep(struct timespec *rqtp, struct timespec
*rmtp)
{
	struct timespec t;
	unsigned long expire;
+	struct pt_regs * regs = (struct pt_regs *) &rqtp;

	if(copy_from_user(&t, rqtp, sizeof(struct timespec)))
		return -EFAULT;

	if (t.tv_nsec >= 1000000000L || t.tv_nsec < 0 || t.tv_sec < 0)
		return -EINVAL;


	if (t.tv_sec == 0 && t.tv_nsec <= 2000000L &&
	    current->policy != SCHED_OTHER)
	{
		/*
		 * Short delay requests up to 2 ms will be handled with
		 * high precision by a busy wait for all real-time processes.
		 *
		 * Its important on SMP not to do this holding locks.
		 */
		udelay((t.tv_nsec + 999) / 1000);
		return 0;
	}

	expire = timespec_to_jiffies(&t) + (t.tv_sec || t.tv_nsec);

-	current->state = TASK_INTERRUPTIBLE;
-	expire = schedule_timeout(expire);
+	regs->eax = -EINTR;
+	do {
+               current->state = TASK_INTERRUPTIBLE;
+       } while((expire = schedule_timeout(expire)) &&
!do_signal(regs,NULL));

	if (expire) {
		if (rmtp) {
			jiffies_to_timespec(expire, &t);
			if (copy_to_user(rmtp, &t, sizeof(struct timespec)))
				return -EFAULT;
		}
		return -EINTR;
	}
	return 0;
}

BUT, it turns out that do_signal() is in the "arch" code, and further
that different arch's have different calling sequences (and, of course,
pt_regs is different also).  This is the ONLY place in the kernel where
platform independent code needs to call do_signal() :(  There does not
seem to be a clean answer for this issue.  I suppose we could put
something like this in timer.c:

#ifndef _do_signal
#define _do_signal(a,b) 1
#endif

and then leave it to the platform code to define a uniform
_do_signal(a,b) interface.  This way each platform will work as it does
now and better once they define the macro.  If this is the path to take,
what should the parameters "a" & "b" be?  We need to cover all platforms
needs.

Another way to solve this issue is to move nano_sleep into the "arch"
signal.c file, but then each would have to change and things would be
broken (i.e. nano_sleep would not work) until the platform made the
move.  I suppose the current nano_sleep could stay in the kernel and
each platform could implement one in their area with a different name. 
When all were done, the current code could be deleted.

How should this be approached?

George


george anzinger wrote:
> 
> george anzinger wrote:
> >
> > Bruce Janson wrote:
> > >
> > > In article <20010814092849.E13892@pc8.lineo.fr>,
> > > christophe =?iso-8859-1?Q?barb=E9?=  <christophe.barbe@lineo.fr> wrote:
> > > ..
> > > >Le lun, 13 aoû 2001 10:29:32, Bruce Janson a écrit :
> > > ..
> > > >>     The following program behaves incorrectly when traced:
> > > ..
> > > >Have you receive off-line answers?
> > > ..
> > >
> > > No, though I did receive an offline reply from someone who appeared
> > > to have misunderstood the post.  In case it wasn't clear, the problem
> > > is that the above program behaves differently when traced to how it
> > > behaves when not traced.  (I do realise that in general, under newer
> > > Unices, when not ignored, a SIGCHLD signal may accompany the death of
> > > a child.)
> > >
> > > >I guess that it's certainly more a strace issue and that it's perhaps
> > > ..
> > >
> > > It's not clear to me whether it is a kernel, glibc or strace bug, but
> > > it does appear to be a bug.
> >
> > I don't have the code for usleep() handy and the man page is not much
> > help, but here goes:
> >
> > I think strace is using ptrace() which causes signals to be redirected
> > to wake up the parent (strace in this case).  In particular, blocked
> > signals are no longer blocked.  What this means is that a.) SIG CHILD is
> > posted, b.) the signal, not being blocked, the child is wakened, c.)
> > ptrace returns to the parent, d) the parent does what ever and tells the
> > kernel (ptrace) to continue the child with the original mask, e.) the
> > signal code returns 0 with out delivering the signal to the child.
> > Looks good, right?  Wrong!  The wake up at (b) pulls the child out of
> > the timer queue so when signal returns, the sleep (I assume nano sleep
> > is actually used here) call returns with the remaining sleep time as a
> > value.
> >
> > This is an issue for debugging also (same ptrace...).  The fix is to fix
> > nano_sleep to match the standard which says it should only return on a
> > signal if the signal is delivered to the program (i.e. not on internal
> > "do nothing" signals).  Signal in the kernel returns 1 if it calls the
> > task and 0 otherwise, thus nano sleep might be changed as follows:
> >
> >         expire = timespec_to_jiffies(&t) + (t.tv_sec || t.tv_nsec);
> >
> >         current->state = TASK_INTERRUPTIBLE;
> > -       expire = schedule_timeout(expire);
> > +       while (expire = schedule_timeout(expire) && !signal());
> Still not quite right.  regs needs to be dummied up (see sys_sigpause)
> and then:
> +       while (expire = schedule_timeout(expire) && !do_signal(regs,
> NULL));
> >
> >         if (expire) {
> >                 if (rmtp) {
> >                         jiffies_to_timespec(expire, &t);
> >                         if (copy_to_user(rmtp, &t, sizeof(struct timespec)))
> >                                 return -EFAULT;
> >                 }
> >                 return -EINTR;
> >         }
> >         return 0;
> >
> > This code is in ../kernel/timer.c
> >
> > Note that this assumes that nano_sleep() underlies usleep().  If
> > setitimer (via sleep() or otherwise) is used, the problem and fix is in
> > the library.  In that case, the code needs to notice that it was
> > awakened but the alarm handler was not called.  Still, with out the full
> > spec on usleep() it is not clear what it should do.
> >
> > In any case, this is a bug in nano_sleep(), where the spec is clear on
> > this point.
> >
> > George
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

  reply	other threads:[~2001-08-16  1:00 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2001-08-13  8:29 ptrace(), fork(), sleep(), exit(), SIGCHLD Bruce Janson
2001-08-14  7:28 ` christophe barbé
2001-08-14 15:06   ` Bruce Janson
2001-08-15 15:46     ` george anzinger
2001-08-15 17:53       ` george anzinger
2001-08-15 18:02       ` george anzinger
2001-08-16  0:59         ` george anzinger [this message]
2001-08-16 10:17           ` How should nano_sleep be fixed (was: ptrace(), fork(), sleep(), exit(), SIGCHLD) christophe barbé
2001-08-16 10:29             ` Russell King
2001-08-16 14:16               ` george anzinger
2001-08-16 16:00               ` christophe barbé
2001-08-16 16:12                 ` Russell King
2001-08-16 18:17                   ` george anzinger
2001-08-17 18:25                     ` george anzinger
2001-08-17 18:57                       ` Victor Yodaiken
2001-08-17 19:56                         ` george anzinger
2001-08-22 18:40                           ` Russell King
2001-08-23 20:04                             ` george anzinger
2001-08-23 20:11                               ` Russell King
2001-08-23 21:13                                 ` george anzinger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3B7B1B07.A9FB293B@mvista.com \
    --to=george@mvista.com \
    --cc=bruce@cs.usyd.edu.au \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox