From: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
To: Peter Zijlstra <peterz@infradead.org>, <bsegall@google.com>,
	<rostedt@goodmis.org>
Cc: <fweisbec@gmail.com>, <mingo@redhat.com>,
	<acme@ghostprotocols.net>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 4/8] sched/core: Skip wakeup when task is already running.
Date: Mon, 5 May 2014 15:32:20 +0900
Message-ID: <53673074.1040406@cn.fujitsu.com>
In-Reply-To: <20140422181820.GT26782@laptop.programming.kicks-ass.net>

Hi all,
Thanks for your replies, and sorry for the late response.

On 04/23/2014 03:18 AM, Peter Zijlstra wrote:
> On Tue, Apr 22, 2014 at 10:10:52AM -0700, bsegall@google.com wrote:
>> This is all expected behavior, and the somewhat less than useful trace
>> events are expected. A task setting p->state to TASK_RUNNING without
>> locks is fine if and only if p == current. The standard deschedule loop is
>> basically:
>>
>> while (1) {
>>    set_current_state(TASK_(UN)INTERRUPTIBLE);
>>    if (should_still_sleep)
>>      schedule();
>> }
>> set_current_state(TASK_RUNNING);
>>
>> Which can produce this in a race.
>>
>> The only problem this causes is a wasted check_preempt_curr call in the
>> racing case, and a somewhat inaccurate sched:sched_wakeup trace event.
>> Note that even if you did recheck in ttwu_do_wakeup you could still race
>> and get an "inaccurate" trace event.

Yes, even if I recheck it in ttwu_do_wakeup(), it could still race.

I can not come up with a good way to avoid this race right now,
but I think the race itself is not too expensive.

As for the sched:sched_wakeup event, it is what produces the bug report
I mentioned in the first mail of this thread:

[root@yds-PC linux]# perf sched latency|tail
bash:25186 | 0.410 ms | 1 | avg: 0.000 ms | max: 0.000 ms | max at: 0.000000 s
bash:25174 | 0.407 ms | 1 | avg: 0.000 ms | max: 0.000 ms | max at: 0.000000 s
bash:25268 | 0.367 ms | 1 | avg: 0.000 ms | max: 0.000 ms | max at: 0.000000 s
bash:25279 | 0.358 ms | 1 | avg: 0.000 ms | max: 0.000 ms | max at: 0.000000 s
bash:25238 | 0.286 ms | 1 | avg: 0.000 ms | max: 0.000 ms | max at: 0.000000 s
-----------------------------------------------------------------------------------------
TOTAL: | 1344.539 ms | 9604 |
---------------------------------------------------
*INFO: 0.557% state machine bugs (51 out of 9161)*

It is caused by the issue we discussed in this thread: ttwu_do_wakeup()
can be called to wake up a task which is still on the run queue.
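
To make the "state machine bugs" line concrete, here is a minimal sketch
of the kind of bookkeeping an analysis tool could do. It is a simplified
model, not the perf implementation: the state names, event names and the
replay() and bug_percent() helpers are all hypothetical. The assumption
is that a wakeup event is counted as a bug whenever the tool does not
have the task recorded as sleeping, which is what happens when the
wakeup hits a task that never scheduled out.

```c
#include <stddef.h>

/* Hypothetical per-task states kept by the analysis tool (not kernel states). */
enum atom_state { ATOM_SLEEPING, ATOM_WAIT_CPU, ATOM_RUNNING };

/* Hypothetical event kinds, in the order they are read from the trace. */
enum event { EV_SWITCH_OUT_SLEEP, EV_WAKEUP, EV_SWITCH_IN };

struct stats {
    unsigned long wakeups; /* every wakeup event seen */
    unsigned long bugs;    /* wakeups for a task not recorded as sleeping */
};

/* Replay the events of one task, starting from the given state. */
static void replay(const enum event *ev, size_t n, enum atom_state start,
                   struct stats *st)
{
    enum atom_state s = start;

    for (size_t i = 0; i < n; i++) {
        switch (ev[i]) {
        case EV_SWITCH_OUT_SLEEP:
            s = ATOM_SLEEPING;
            break;
        case EV_WAKEUP:
            st->wakeups++;
            if (s != ATOM_SLEEPING)
                st->bugs++;        /* task never scheduled out */
            else
                s = ATOM_WAIT_CPU; /* normal wakeup: now waiting for a cpu */
            break;
        case EV_SWITCH_IN:
            s = ATOM_RUNNING;
            break;
        }
    }
}

/* Share of flagged wakeups, in percent. */
static double bug_percent(unsigned long bugs, unsigned long total)
{
    return total ? 100.0 * (double)bugs / (double)total : 0.0;
}
```

With the numbers from the output above, bug_percent(51, 9161) evaluates
to roughly 0.557.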

I have two solutions in mind:
* Add a new trace event, such as sched:sched_enqueue. Tracing it gives us
the timestamp at which a task is enqueued and starts waiting for a cpu.
`perf sched latency` can then calculate the latency as
(sched_in_time - sched_enqueue_time).

* Move the current tracepoint from ttwu_do_wakeup() to ttwu_activate().
Currently sched:sched_wakeup tells the user very little.
When we get a sched:sched_wakeup:
a) We can not say that the task was inserted into a run queue; the event
is also emitted for a task which is already on_rq and only has its
task->state changed to TASK_RUNNING.
b) We can not say that task->state changed from TASK_{UN}INTERRUPTIBLE to
TASK_RUNNING; sometimes task->state has already been changed to
TASK_RUNNING by another cpu.
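
The latency calculation in the first option can be sketched as follows.
This is only an illustration under stated assumptions: sched:sched_enqueue
does not exist today, and struct wait_record, on_enqueue() and
on_sched_in() are hypothetical names for what a tool might keep per
task. Timestamps are assumed to be in nanoseconds.

```c
#include <stdint.h>

/* Hypothetical per-task record; all times in nanoseconds. */
struct wait_record {
    uint64_t enqueue_ns;    /* time of the proposed sched_enqueue event */
    int waiting;            /* 1 between enqueue and the next sched in */
    uint64_t total_ns;      /* sum of all measured latencies */
    uint64_t max_ns;        /* largest single latency */
    unsigned long samples;  /* number of measured latencies */
};

/* Called for the (proposed) sched_enqueue event. */
static void on_enqueue(struct wait_record *r, uint64_t now_ns)
{
    if (r->waiting)
        return;             /* already queued: keep the earlier timestamp */
    r->enqueue_ns = now_ns;
    r->waiting = 1;
}

/* Called when the task is switched in on a cpu. */
static void on_sched_in(struct wait_record *r, uint64_t now_ns)
{
    uint64_t lat;

    if (!r->waiting || now_ns < r->enqueue_ns)
        return;             /* no matching enqueue: nothing to measure */

    lat = now_ns - r->enqueue_ns; /* sched_in_time - sched_enqueue_time */
    r->total_ns += lat;
    if (lat > r->max_ns)
        r->max_ns = lat;
    r->samples++;
    r->waiting = 0;
}
```

A sched in without a preceding enqueue is simply ignored here, so a
wakeup of a task that never left the run queue can not produce a bogus
sample.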

I prefer the second one; in any case, the current sched_wakeup tells the
user nothing useful.
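
To illustrate the difference the second option makes, here is a toy
userspace model. It is not kernel code: it only borrows the names
try_to_wake_up(), ttwu_activate() and ttwu_do_wakeup() for readability,
and struct task, enum tp_site and events_for() are hypothetical. The
assumption is that a task which has set its state to sleeping but has
not yet called schedule() is still on_rq when the waker runs.

```c
/* Toy task states; the real kernel has many more. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

/* Where the modelled sched_wakeup tracepoint lives. */
enum tp_site { TP_IN_DO_WAKEUP, TP_IN_ACTIVATE };

struct task {
    enum task_state state;
    int on_rq;              /* 1 while the task is on a run queue */
};

static int wakeup_events;   /* number of modelled sched_wakeup events */

/* Put the task on the run queue. */
static void ttwu_activate(struct task *p, enum tp_site site)
{
    p->on_rq = 1;
    if (site == TP_IN_ACTIVATE)
        wakeup_events++;
}

/* Mark the task runnable again. */
static void ttwu_do_wakeup(struct task *p, enum tp_site site)
{
    if (site == TP_IN_DO_WAKEUP)
        wakeup_events++;
    p->state = TASK_RUNNING;
}

static void try_to_wake_up(struct task *p, enum tp_site site)
{
    if (p->state == TASK_RUNNING)
        return;                 /* nothing to wake */
    if (!p->on_rq)
        ttwu_activate(p, site); /* task really scheduled out */
    ttwu_do_wakeup(p, site);    /* always reached, even if still on_rq */
}

/* Events emitted when waking a sleeping-state task with the given on_rq. */
static int events_for(enum tp_site site, int on_rq)
{
    struct task p = { TASK_INTERRUPTIBLE, on_rq };

    wakeup_events = 0;
    try_to_wake_up(&p, site);
    return wakeup_events;
}
```

In this model a task that really scheduled out (on_rq == 0) produces one
event with either placement, while a task caught between
set_current_state() and schedule() (on_rq == 1) produces an event only
when the tracepoint sits in ttwu_do_wakeup().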

>> Heck, even if the ttwu is
>> _necessary_ because p is currently trying to take rq->lock to
>> deschedule, you won't get a matching sched_switch event, because the
>> ttwu is running before schedule is.
>>
>> You could sorta fix this I guess by tracking every write to p->state
>> with trace events, but that would be a somewhat different change, and
>> might be considered too expensive for all I know (and the trace events
>> could /still/ be resolved in a different order across cpus compared to
>> p->state's memory).
> Ah, you're saying that a second task could try a spurious wakeup between
> set_current_state() and schedule(). Yes, that'll trigger this indeed.

