From: Peter Zijlstra <peterz@infradead.org>
To: "Ni, BaoleX" <baolex.ni@intel.com>
Cc: "mingo@redhat.com" <mingo@redhat.com>,
"acme@kernel.org" <acme@kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"alexander.shishkin@linux.intel.com"
<alexander.shishkin@linux.intel.com>,
"Liu, Chuansheng" <chuansheng.liu@intel.com>,
Oleg Nesterov <oleg@redhat.com>
Subject: Re: hit a KASan bug related to Perf during stress test
Date: Mon, 24 Oct 2016 11:53:41 +0200
Message-ID: <20161024095341.GF3102@twins.programming.kicks-ass.net>
In-Reply-To: <318B87A793BE164187D8851D6CE09D64371C8811@shsmsx102.ccr.corp.intel.com>
On Mon, Oct 24, 2016 at 09:35:46AM +0000, Ni, BaoleX wrote:
>
> [32736.018823] BUG: KASan: use after free in task_tgid_nr_ns+0x35/0xb0 at addr ffff8800265568c0
> [32736.028309] Read of size 8 by task dumpsys/11268
> [32736.033511] =============================================================================
> [32736.042700] BUG task_struct (Tainted: G W O): kasan: bad access detected
The 'W' means this wasn't the first WARN you got, so this might be the
result of prior borkage.

Also, it says "BUG task_struct". Does that mean a task_struct was the
object accessed after free?
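
For reference, a minimal sketch of how the taint letters in the quoted
report can be read. The helper name decode_taint and the table below are
illustrative only and not part of any kernel tool; the meanings are taken
from the kernel's tainted-kernels documentation, and only the letters that
appear in this report (plus 'P') are listed. It assumes the letters are
separated by whitespace, as they are in the quoted lines.

```python
import re

# Meanings of the taint letters that appear in this report (plus 'P').
# This table is deliberately incomplete; it is a sketch, not a full decoder.
TAINT_FLAGS = {
    "G": "no proprietary module loaded",
    "P": "proprietary module loaded",
    "B": "bad page state was reported",
    "W": "a kernel warning (WARN) was issued earlier",
    "O": "out-of-tree module loaded",
}

def decode_taint(line):
    """Return (letter, meaning) pairs for the letters after 'Tainted:'."""
    _, sep, rest = line.partition("Tainted:")
    if not sep:
        return []
    # Match a run of single capital letters separated by whitespace.
    match = re.match(r"\s*((?:[A-Z]\s+)*[A-Z])\b", rest)
    if not match:
        return []
    letters = match.group(1).split()
    return [(c, TAINT_FLAGS.get(c, "not covered by this sketch")) for c in letters]

if __name__ == "__main__":
    line = "BUG task_struct (Tainted: G W O): kasan: bad access detected"
    for letter, meaning in decode_taint(line):
        print(letter, "-", meaning)
```

Run on the "BUG task_struct" line this lists G, W and O; the later
"CPU: 0 PID: 11268" line additionally carries B.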
> [32736.051002] -----------------------------------------------------------------------------
> [32736.051002]
> [32736.061840] Disabling lock debugging due to kernel taint
> [32736.067830] INFO: Slab 0xffffea0000995400 objects=5 used=3 fp=0xffff880026550000 flags=0x4000000000004080
> [32736.078572] INFO: Object 0xffff880026556440 @offset=25664 fp=0x (null)
> ...
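
A quick cross-check of the addresses quoted above, as a sketch. It uses
only the numbers from the report; the variable names are made up for
illustration. It shows where the faulting address lies relative to the
reported slab object.

```python
# Values copied from the quoted KASan/SLUB report.
OBJECT_ADDR = 0xffff880026556440   # "INFO: Object 0x... @offset=25664"
OBJECT_OFFSET = 25664              # offset of that object within its slab
ACCESS_ADDR = 0xffff8800265568c0   # "use after free ... at addr ..."

# Start of the slab, derived from the object address and its offset.
slab_base = OBJECT_ADDR - OBJECT_OFFSET

# Distance of the faulting read from the start of the reported object.
access_offset = ACCESS_ADDR - OBJECT_ADDR

print(hex(slab_base))
print(access_offset, hex(access_offset))
```

The derived slab base is 0xffff880026550000 and the read lands 1152
(0x480) bytes into the reported object. That would be consistent with the
read hitting a field inside the freed task_struct, assuming the struct is
larger than 1152 bytes in this build, which the report does not state.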
> [32738.776936] CPU: 0 PID: 11268 Comm: dumpsys Tainted: G B W O 3.14.70-x86_64-02260-g162539f #1
> [32738.787092] Hardware name: Insyde CherryTrail/T3 MRD, BIOS CHTMRD.A6.002.016 09/20/2016
> [32738.796082] ffff880026550000 0000000000000086 0000000000000000 ffff880065e05a70
> [32738.796215] ffffffff81fc9427 ffff880065803b40 ffff880026556440 ffff880065e05aa0
> [32738.796345] ffffffff8123fe2d ffff880065803b40 ffffea0000995400 ffff880026556440
> [32738.796475] Call Trace:
> [32738.796510] <NMI>
> [32738.796585] [<ffffffff81fc9427>] dump_stack+0x67/0x90
> [32738.802404] [<ffffffff8123fe2d>] print_trailer+0xfd/0x170
> [32738.808603] [<ffffffff81244f26>] object_err+0x36/0x40
> [32738.814417] [<ffffffff812467ed>] kasan_report_error+0x1fd/0x3d0
> [32738.821193] [<ffffffff81131b84>] ? __rcu_read_unlock+0x24/0x90
> [32738.827881] [<ffffffff81fe0888>] ? preempt_count_sub+0x18/0xf0
> [32738.834565] [<ffffffff811db32c>] ? perf_output_put_handle+0x5c/0x170
> [32738.841833] [<ffffffff81246e70>] kasan_report+0x40/0x50
> [32738.847838] [<ffffffff810d9975>] ? task_tgid_nr_ns+0x35/0xb0
> [32738.854327] [<ffffffff81245d59>] __asan_load8+0x69/0xa0
> [32738.860333] [<ffffffff811dba18>] ? perf_output_copy+0x88/0x120
> [32738.867020] [<ffffffff810d9975>] task_tgid_nr_ns+0x35/0xb0
So here we did: perf_event_[pt]id(event, current);
How can _current_ not be valid anymore?
> [32738.873319] [<ffffffff811cd5d8>] __perf_event_header__init_id+0xb8/0x200
> [32738.880970] [<ffffffff811d6f19>] perf_prepare_sample+0xa9/0x4a0
> [32738.887754] [<ffffffff811d7700>] __perf_event_overflow+0x3f0/0x460
> [32738.894835] [<ffffffff81022998>] ? x86_perf_event_set_period+0x128/0x210
> [32738.902496] [<ffffffff811d8494>] perf_event_overflow+0x14/0x20
> [32738.909180] [<ffffffff8102cabc>] intel_pmu_handle_irq+0x25c/0x520
> [32738.916156] [<ffffffff81245945>] ? __asan_store8+0x15/0xa0
> [32738.922460] [<ffffffff81fddb8b>] perf_event_nmi_handler+0x2b/0x50
> [32738.929437] [<ffffffff81fdd4a8>] nmi_handle+0x88/0x230
> [32738.935346] [<ffffffff81009873>] do_nmi+0x193/0x490
> [32738.940963] [<ffffffff81fdc6d6>] end_repeat_nmi+0x1a/0x1e
> [32738.947163] [<ffffffff81245d22>] ? __asan_load8+0x32/0xa0
> [32738.953358] [<ffffffff81245d22>] ? __asan_load8+0x32/0xa0
> [32738.959554] [<ffffffff81245d22>] ? __asan_load8+0x32/0xa0
> [32738.965718] <<EOE>>
> [32738.965787] [<ffffffff811065a2>] ? check_preempt_wakeup+0x1a2/0x3a0
> [32738.972970] [<ffffffff810f4618>] check_preempt_curr+0xf8/0x120
> [32738.979658] [<ffffffff810f465d>] ttwu_do_wakeup+0x1d/0x1b0
> [32738.985953] [<ffffffff810f4909>] ttwu_do_activate.constprop.105+0x89/0x90
> [32738.993710] [<ffffffff810f87fe>] try_to_wake_up+0x29e/0x4e0
> [32739.000100] [<ffffffff810f8aaf>] default_wake_function+0x2f/0x40
> [32739.006979] [<ffffffff81114338>] autoremove_wake_function+0x18/0x50
> [32739.014149] [<ffffffff81fe0888>] ? preempt_count_sub+0x18/0xf0
> [32739.020836] [<ffffffff81113ab9>] __wake_up_common+0x79/0xb0
> [32739.027232] [<ffffffff81113d69>] __wake_up+0x39/0x50
> [32739.032945] [<ffffffff81135918>] __call_rcu_nocb_enqueue+0x158/0x160
> [32739.040207] [<ffffffff81135a4c>] __call_rcu+0x12c/0x450
And while we just called release_task(), that call_rcu() should still be
pending at this point. Also, I don't think that can be current until
after do_task_dead(), where we schedule away from the dead task and
change current.
> [32739.046207] [<ffffffff81135dcd>] call_rcu+0x1d/0x20
> [32739.051821] [<ffffffff810ae2da>] release_task+0x6aa/0x8d0
> [32739.058022] [<ffffffff8111e86f>] ? do_raw_write_unlock+0x6f/0xd0
> [32739.064900] [<ffffffff810b1002>] do_exit+0xe52/0x1020
> [32739.070712] [<ffffffff810b1222>] SyS_exit+0x22/0x30
> [32739.076328] [<ffffffff81fe9063>] sysenter_dispatch+0x7/0x1f
> [32739.082725] [<ffffffff8152f33b>] ? trace_hardirqs_on_thunk+0x3a/0x3c
Oleg, any idea?