All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sasha.levin@oracle.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Dave Jones <davej@redhat.com>, Michal Hocko <mhocko@suse.cz>,
	Borislav Petkov <bp@alien8.de>,
	the arch/x86 maintainers <x86@kernel.org>
Subject: Re: Hang on large copy_from_user with PREEMPT_NONE
Date: Mon, 06 Apr 2015 15:08:24 -0400	[thread overview]
Message-ID: <5522D9A8.5080104@oracle.com> (raw)
In-Reply-To: <CA+55aFztF9D1yK5zbbG9PiQufQU+6BX-hc4toDS_MHx4ieeYqQ@mail.gmail.com>

On 04/06/2015 01:26 PM, Linus Torvalds wrote:
> On Sun, Apr 5, 2015 at 8:59 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
>>
>> This is the result of getting copy_user_handle_tail to zero out a large block of
>> kernel memory very inefficiently:
> 
> Ugh.
> 
> Normally we should be able to just do
> 
>         if (zerorest)
>                 memset(to, 0, len);
> 
> and be done with it.
> 
> The only reason we don't do that actually looks like a buglet in
> 'copy_in_user()', which can have a source fault but should *not*
> necessarily try to clear the rest of the destination buffer. But it
> uses the shared "copy_user_generic()" logic, so it doesn't even
> realize that.
> 
> I call it a "buglet", because there's not necessarily anything
> horribly wrong with clearing the tail, it's just completely wasted
> work. And it makes the "oops, source is bad" case unnecessarily
> horribly slow.
> 
> In fact, the whole "zerorest" thing is garbage, I think. The
> copy_user_generic() code seems to always set it to just 'len', and
> it's because it doesn't even know or care about the actual direction.
> 
> The *real* test should just be "is the destination a kernel space
> buffer" (we have done the "access_ok()" things independently). And
> that test we can do without any 'zerorest' parameter.
> 
> So the attached (untested) patch should
> 
>  (a) remove the pointless 'zerorest' parameter
> 
>  (b) fix the 'copy_in_user()' buglet
> 
>  (c) make the kernel destination case be much more efficient with just
> a simple 'memset()'
> 
> Hmm? Comments? Sasha, do you have the initial random number state to
> recreate this easily?

Your patch just makes it hang in memset instead:

[  963.104556] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [trinity-c224:9845]
[  963.104556] Modules linked in:
[  963.104556] irq event stamp: 3773324
[  963.104556] hardirqs last  enabled at (3773323): [<ffffffffb7af8e3c>] restore_args+0x0/0x34
[  963.104556] hardirqs last disabled at (3773324): [<ffffffffb7af919e>] apic_timer_interrupt+0x6e/0x80
[  963.104556] softirqs last  enabled at (3773322): [<ffffffffad1f6589>] __do_softirq+0x709/0xd40
[  963.104556] softirqs last disabled at (3773317): [<ffffffffad1f746d>] irq_exit+0x29d/0x320
[  963.104556] CPU: 1 PID: 9845 Comm: trinity-c224 Not tainted 4.0.0-rc6-next-20150402-sasha-00039-ge0bdae3-dirty #2130
[  963.104556] task: ffff8802a3560000 ti: ffff8802a3568000 task.ti: ffff8802a3568000
[  963.104556] RIP: 0010:[<ffffffffaefbd173>]  [<ffffffffaefbd173>] memset_orig+0x33/0xb0
[  963.104556] RSP: 0000:ffff8802a356fdf0  EFLAGS: 00010212
[  963.104556] RAX: 0000000000000000 RBX: 000000007e777000 RCX: 000000000042f8bf
[  963.104556] RDX: 000000007e777000 RSI: 0000000000000000 RDI: ffffc9008f766000
[  963.104556] RBP: ffff8802a356fe18 R08: 000000000fceee00 R09: 0000000000000000
[  963.104556] R10: ffffc90021bd2000 R11: fffff52014069200 R12: 000000000fceee00
[  963.104556] R13: fffff520136892d8 R14: 000000000fceee00 R15: fffff520140691ff
[  963.104556] FS:  00007fd40c4fe700(0000) GS:ffff8800a2800000(0000) knlGS:0000000000000000
[  963.104556] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  963.104556] CR2: 00007fd40c018000 CR3: 00000002a351a000 CR4: 00000000000007a0
[  963.104556] DR0: 00007fd409b8f000 DR1: 0000000000000000 DR2: 0000000000000000
[  963.104556] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[  963.104556] Stack:
[  963.104556]  ffffffffad63b4ac ffffffffb7afd69c 000000007e777000 ffff8802a356fe78
[  963.104556]  ffff8802a356ff18 ffff8802a356fe38 ffffffffaefbd759 0000000000010206
[  963.104556]  1ffff100546adfcb ffff8802a356ff48 ffffffffad3c5ef0 ffffc90020b49000
[  963.104556] Call Trace:
[  963.104556]  [<ffffffffad63b4ac>] ? memset+0x2c/0x40
[  963.104556]  [<ffffffffb7afd69c>] ? bad_to_user+0x66/0x1391
[  963.104556]  [<ffffffffaefbd759>] copy_user_handle_tail+0x69/0x80
[  963.104556]  [<ffffffffad3c5ef0>] SyS_init_module+0x150/0x210
[  963.104556]  [<ffffffffad3c5da0>] ? load_module+0x96a0/0x96a0
[  963.104556]  [<ffffffffaefbd3b8>] ? trace_hardirqs_on_thunk+0x17/0x19
[  963.104556]  [<ffffffffb7af83b6>] system_call_fastpath+0x16/0x84
[  963.104556] Code: b8 01 01 01 01 01 01 01 01 48 0f af c1 41 89 f9 41 83 e1 07 75 70 48 89 d1 48 c1 e9 06 74 39 66 0f 1f 84 00 00 00 00 00 48 ff c9 <48> 89 07 48 89 47 08 48 89 47 10 48 89 47 18 48 89 47 20 48 89

It's easy to reproduce it with trinity by adding -cinit_module, no need to touch the seed.


Thanks,
Sasha

  reply	other threads:[~2015-04-06 19:09 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-06  3:59 Hang on large copy_from_user with PREEMPT_NONE Sasha Levin
2015-04-06 11:24 ` Borislav Petkov
2015-04-06 14:51   ` Sasha Levin
2015-04-06 16:52     ` Borislav Petkov
2015-04-06 17:26 ` Linus Torvalds
2015-04-06 19:08   ` Sasha Levin [this message]
2015-04-06 19:36     ` Borislav Petkov
2015-04-06 23:45       ` Sasha Levin
2015-04-06 20:42     ` Linus Torvalds
2015-04-07  1:49       ` Rusty Russell
2015-04-07  9:31       ` Ingo Molnar
2015-04-07 10:39         ` Borislav Petkov
2015-04-07 11:05           ` Ingo Molnar
2015-04-07 14:30             ` Michal Hocko
2015-04-07 14:37               ` Ingo Molnar
2015-04-07 17:00                 ` Sasha Levin
2015-04-07 17:33                   ` Linus Torvalds
2015-04-07 17:58                     ` Dave Jones
     [not found]                       ` <CA+55aFyxCb9aDfh0L4gyvHMSefOFoD7zftRpWbnvf5j9iZVaMw@mail.gmail.com>
2015-04-07 21:09                         ` Dave Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5522D9A8.5080104@oracle.com \
    --to=sasha.levin@oracle.com \
    --cc=bp@alien8.de \
    --cc=davej@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@suse.cz \
    --cc=rusty@rustcorp.com.au \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.