Re: [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Oleg Nesterov <oleg@redhat.com>
To: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check
Date: Fri, 11 Jun 2010 18:57:51 +0200	[thread overview]
Message-ID: <20100611165750.GA7019@redhat.com> (raw)
In-Reply-To: <20100611164050.GA19325@dhcp-lab-161.englab.brq.redhat.com>

On 06/11, Stanislaw Gruszka wrote:
>
> On Fri, Jun 11, 2010 at 05:15:33PM +0200, Oleg Nesterov wrote:
> > On 06/11, Stanislaw Gruszka wrote:
> > >
> > > On Fri, Jun 11, 2010 at 01:09:56AM +0200, Oleg Nesterov wrote:
> > > > thread_group_cputime() looks as if it is rcu-safe, but in fact this
> > > > was wrong until ea6d290c which pins task->signal to task_struct.
> > > > It checks ->sighand != NULL under rcu, but this can't help if ->signal
> > > > can go away. Fortunately the caller either holds ->siglock, or it is
> > > > fastpath_timer_check() which uses current and checks exit_state == 0.
> > >
> > > Hmm, I thought we avoided calling thread_group_cputime() from
> > > fastpatch_timer_check(), but seems it is still possible when we
> > > call run_posix_cpu_timers() on two different cpus simultaneously ...
> >
> > No, we can't. thread_group_cputimer() does test-and-set ->running
> > under cputimer->lock.
> >
> > But when I sent these patches, I realized we have another race here
> > (with or without these patches). I am already doing the fix.
>
> Don't know what you catch, I was thinking about:
>
> cpu0							cpu1
>
> fastpath_timer_check():
>
> if (sig->cputimer.running) {
>   struct task_cputime group_sample;
> 							stop_process_timers():	
>         						
> 							spin_lock_irqsave(&cputimer->lock, flags);
>         						cputimer->running = 0;
>         						spin_unlock_irqrestore(&cputimer->lock, flags);
> 				
>   thread_group_cputimer(tsk, &group_sample);

Yes, I was thinking about this race too. Please wait a bit, I'll send
the patch.

In short: it is safe to call thread_group_cputime() lockless, but
thread_group_cputimer() must not be called without siglock/tasklist
(oh, and imho we should rename them somehow, their names are almost
 identical). And in fact fastpath_timer_check() does not need
thread_group_cputimer().

> > > > - Since ea6d290c commit tsk->signal is stable, we can read it first
> > > >   and avoid the initialization from INIT_CPUTIME.
> > > >
> > > > - Even if tsk->signal is always valid, we still have to check it
> > > >   is safe to use next_thread() under rcu_read_lock(). Currently
> > > >   the code checks ->sighand != NULL, change it to use pid_alive()
> > > >   which is commonly used to ensure the task wasn't unhashed before
> > > >   we take rcu_read_lock().
> > >
> > > I'm not sure how important are values of almost dead task, but
> > > perhaps would be better to return times form all threads
> > > using as base sig->curr_target in loop.
> >
> > Could you clarify?
>
> Avoid pid_alive check and loop starting from sig->curr_target:
>
>        t = tsk = sig->curr_target;
>        do {
>                 times->utime = cputime_add(times->utime, t->utime);
>                 times->stime = cputime_add(times->stime, t->stime);
>                 times->sum_exec_runtime += t->se.sum_exec_runtime;
>        } while_each_thread(tsk, t);
>
> I don't know what are rules regarding accessing sig->curr_target, but
> if this is done under sighand->siglock we should be safe. Question
> if if we always have lock taken, we tried to assure that in the past,
> but if we really do?

Ah, you are talking about thread_group_cputime().

Without ->siglock this is not safe. We can change __exit_signal() to
nullify ->curr_target in the group_dead case, then the code above
could check sig->curr_target != NULL.

But this is too subtle imho, and not needed. Instead we should move
group_leader into ->signal (and kill signal->leader_pid). I am going
to do more cleanups in this area "later".


Anyway. This all has nothing to do with this patch. The 4/5 change
in thread_group_cputime() is cleanup, and it ccan help to make
/proc/pid/stat /proc/pid/status lockless.

With or without 5/5 thread_group_cputime() can be called lockless
and race with exit/fork. This is fine by itself, but this is wrong
because the caller sets ->running.

Oleg.

next prev parent reply	other threads:[~2010-06-11 16:59 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-10 23:09 [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check Oleg Nesterov
2010-06-11 14:35 ` Stanislaw Gruszka
2010-06-11 15:15   ` Oleg Nesterov
2010-06-11 16:40     ` Stanislaw Gruszka
2010-06-11 16:57       ` Oleg Nesterov [this message]
2010-06-18 10:20 ` [tip:sched/core] sched: thread_group_cputime: Simplify, " tip-bot for Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100611165750.GA7019@redhat.com \
    --to=oleg@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=sgruszka@redhat.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.