From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Daniel Díaz" <daniel.diaz@linaro.org>,
"Naresh Kamboju" <naresh.kamboju@linaro.org>,
"Stephen Rothwell" <sfr@canb.auug.org.au>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
zenglg.jy@cn.fujitsu.com,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
"Viresh Kumar" <viresh.kumar@linaro.org>,
"X86 ML" <x86@kernel.org>,
"open list" <linux-kernel@vger.kernel.org>,
lkft-triage@lists.linaro.org,
"Eric W. Biederman" <ebiederm@xmission.com>,
linux-mm <linux-mm@kvack.org>,
linux-m68k <linux-m68k@lists.linux-m68k.org>,
"Linux-Next Mailing List" <linux-next@vger.kernel.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
kasan-dev <kasan-dev@googlegroups.com>,
"Dmitry Vyukov" <dvyukov@google.com>,
"Geert Uytterhoeven" <geert@linux-m68k.org>,
"Christian Brauner" <christian.brauner@ubuntu.com>,
"Ingo Molnar" <mingo@redhat.com>, "LTP List" <ltp@lists.linux.it>,
"Al Viro" <viro@zeniv.linux.org.uk>
Subject: Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
Date: Thu, 22 Oct 2020 22:02:14 -0700
Message-ID: <20201023050214.GG23681@linux.intel.com>
In-Reply-To: <CAHk-=wgqAp5B46SWzgBt6UkheVGFPs2rrE6H4aqLExXE1TXRfQ@mail.gmail.com>

On Thu, Oct 22, 2020 at 08:05:05PM -0700, Linus Torvalds wrote:
> On Thu, Oct 22, 2020 at 6:36 PM Daniel Díaz <daniel.diaz@linaro.org> wrote:
> >
> > The kernel Naresh originally referred to is here:
> > https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/
>
> Thanks.
>
> And when I started looking at it, I realized that my original idea
> ("just look for __put_user_nocheck_X calls, there aren't so many of
> those") was garbage, and that I was just being stupid.
>
> Yes, the commit that broke was about __put_user(), but in order to not
> duplicate all the code, it re-used the regular put_user()
> infrastructure, and so all the normal put_user() calls are potential
> problem spots too if this is about the compiler interaction with KASAN
> and the asm changes.
>
> So it's not just a couple of special cases to look at, it's all the
> normal cases too.
>
> Ok, back to the drawing board, but I think reverting it is probably
> the right thing to do if I can't think of something smart.
>
> That said, since you see this on x86-64, where the whole ugly trick with that
>
> register asm("%"_ASM_AX)
>
> is unnecessary (because the 8-byte case is still just a single
> register, no %eax:%edx games needed), it would be interesting to hear
> if the attached patch fixes it. That would confirm that the problem
> really is due to some register allocation issue interaction (or,
> alternatively, it would tell me that there's something else going on).

I haven't reproduced the crash, but I did find a smoking gun that confirms the
"register shenanigans are evil shenanigans" theory. I ran into a similar thing
recently where a seemingly innocuous line of code after loading a value into a
register variable wreaked havoc because it clobbered the input register.

This put_user() in schedule_tail():

	if (current->set_child_tid)
		put_user(task_pid_vnr(current), current->set_child_tid);

generates the following assembly with KASAN out-of-line:

   0xffffffff810dccc9 <+73>:	xor    %edx,%edx
   0xffffffff810dcccb <+75>:	xor    %esi,%esi
   0xffffffff810dcccd <+77>:	mov    %rbp,%rdi
   0xffffffff810dccd0 <+80>:	callq  0xffffffff810bf5e0 <__task_pid_nr_ns>
   0xffffffff810dccd5 <+85>:	mov    %r12,%rdi
   0xffffffff810dccd8 <+88>:	callq  0xffffffff81388c60 <__asan_load8>
   0xffffffff810dccdd <+93>:	mov    0x590(%rbp),%rcx
   0xffffffff810dcce4 <+100>:	callq  0xffffffff817708a0 <__put_user_4>
   0xffffffff810dcce9 <+105>:	pop    %rbx
   0xffffffff810dccea <+106>:	pop    %rbp
   0xffffffff810dcceb <+107>:	pop    %r12

__task_pid_nr_ns() returns the pid in %rax, which gets clobbered by
__asan_load8()'s check on current for the current->set_child_tid dereference.
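
For illustration only, here is a minimal user-space sketch of one way to sidestep this class of problem. It is not the kernel's put_user() and not necessarily what the eventual fix looks like. The idea is to evaluate both macro arguments into ordinary temporaries first, and only then bind the value to the fixed register, so that no function call can land between the register assignment and the asm. Every name here (store_via_ax, fake_pid, checked_ptr, demo_store) is hypothetical, and the sketch assumes GCC or Clang; on anything other than x86-64 it falls back to a plain store.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the kernel. */
static int child_tid;                   /* plays the role of the user-space slot */
static int *set_child_tid = &child_tid;

/* noinline so each one is a real call that may clobber caller-saved registers */
__attribute__((noinline)) static int fake_pid(void)
{
	return 4242;
}

__attribute__((noinline)) static int *checked_ptr(void)
{
	assert(set_child_tid != NULL);  /* stands in for an instrumented load */
	return set_child_tid;
}

#if defined(__x86_64__)
/*
 * Evaluate both arguments into plain temporaries first, then bind the value
 * to %eax right before the asm, with no function calls in between.
 */
#define store_via_ax(x, ptr)						\
do {									\
	int *ptr_tmp = (ptr);	/* may call functions */		\
	int val_tmp = (x);	/* may call functions */		\
	register int val_ax __asm__("eax") = val_tmp;			\
	__asm__ __volatile__("movl %1, %0"				\
			     : "=m" (*ptr_tmp)				\
			     : "r" (val_ax));				\
} while (0)
#else
/* Portable fallback so the sketch still builds on other architectures. */
#define store_via_ax(x, ptr) do { *(ptr) = (x); } while (0)
#endif

/* Stores fake_pid() through checked_ptr() and returns what landed in the slot. */
int demo_store(void)
{
	child_tid = 0;
	store_via_ax(fake_pid(), checked_ptr());
	return child_tid;
}
```

Calling demo_store() exercises the macro with both arguments being function calls, which is the shape of the schedule_tail() case above. The ordering inside the macro is the point: the register variable is written last, from a temporary that no later call can disturb.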