From: ebiederm@xmission.com (Eric W. Biederman)
To: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	syzkaller-bugs@googlegroups.com,
	Gargi Sharma <gs051095@gmail.com>,
	Alexey Dobriyan <adobriyan@gmail.com>
Subject: Re: proc_flush_task oops
Date: Wed, 20 Dec 2017 12:25:52 -0600
Message-ID: <871sjp1cjz.fsf@xmission.com>
In-Reply-To: <20171220052803.GA17079@codemonkey.org.uk> (Dave Jones's message of "Wed, 20 Dec 2017 00:28:03 -0500")

Dave Jones <davej@codemonkey.org.uk> writes:

> On Tue, Dec 19, 2017 at 07:54:24PM -0600, Eric W. Biederman wrote:
>
>  > > *Scratches my head*  I am not seeing anything obvious.
>  > 
>  > Can you try this patch as you reproduce this issue?
>  > 
>  > diff --git a/kernel/pid.c b/kernel/pid.c
>  > index b13b624e2c49..df9e5d4d8f83 100644
>  > --- a/kernel/pid.c
>  > +++ b/kernel/pid.c
>  > @@ -210,6 +210,7 @@ struct pid *alloc_pid(struct pid_namespace *ns)
>  >                 goto out_unlock;
>  >         for ( ; upid >= pid->numbers; --upid) {
>  >                 /* Make the PID visible to find_pid_ns. */
>  > +               WARN_ON(!upid->ns->proc_mnt);
>  >                 idr_replace(&upid->ns->idr, pid, upid->nr);
>  >                 upid->ns->pid_allocated++;
>  >         }
>  > 
>  > 
>  > If the warning triggers it means the bug is in alloc_pid and somehow
>  > something has gotten past the is_child_reaper check.
>
> You're onto something.
>
> WARNING: CPU: 1 PID: 12020 at kernel/pid.c:213 alloc_pid+0x230/0x280
> CPU: 1 PID: 12020 Comm: trinity-c29 Not tainted 4.15.0-rc4-think+ #3 
> RIP: 0010:alloc_pid+0x230/0x280
> RSP: 0018:ffffc90009977d48 EFLAGS: 00010046
> RAX: 0000000000000030 RBX: ffff8804fb431280 RCX: 8f5c28f5c28f5c29
> RDX: ffff88050a00de40 RSI: ffffffff82005218 RDI: ffff8804fc6aa9a8
> RBP: ffff8804fb431270 R08: 0000000000000000 R09: 0000000000000001
> R10: ffffc90009977cc0 R11: eab94e31da7171b7 R12: ffff8804fb431260
> R13: ffff8804fb431240 R14: ffffffff82005200 R15: ffff8804fb431268
> FS:  00007f49b9065700(0000) GS:ffff88050a000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f49b906a000 CR3: 00000004f7446001 CR4: 00000000001606e0
> DR0: 00007f0b4c405000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> Call Trace:
>  copy_process.part.41+0x14fa/0x1e30
>  _do_fork+0xe7/0x720
>  ? rcu_read_lock_sched_held+0x6c/0x80
>  ? syscall_trace_enter+0x2d7/0x340
>  do_syscall_64+0x60/0x210
>  entry_SYSCALL64_slow_path+0x25/0x25
>
> followed immediately by...
>
> Oops: 0000 [#1] SMP
> CPU: 1 PID: 12020 Comm: trinity-c29 Tainted: G        W        4.15.0-rc4-think+ #3 
> RIP: 0010:proc_flush_task+0x8e/0x1b0
> RSP: 0018:ffffc90009977c40 EFLAGS: 00010286
> RAX: 0000000000000001 RBX: 0000000000000001 RCX: 00000000fffffffb
> RDX: 0000000000000000 RSI: ffffc90009977c50 RDI: 0000000000000000
> RBP: ffffc90009977c63 R08: 0000000000000000 R09: 0000000000000002
> R10: ffffc90009977b70 R11: ffffc90009977c64 R12: 0000000000000004
> R13: 0000000000000000 R14: 0000000000000004 R15: ffff8804fb431240
> FS:  00007f49b9065700(0000) GS:ffff88050a000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 00000004f7446001 CR4: 00000000001606e0
> DR0: 00007f0b4c405000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> Call Trace:
>  ? release_task+0xaf/0x680
>  release_task+0xd2/0x680
>  ? wait_consider_task+0xb82/0xce0
>  wait_consider_task+0xbe9/0xce0
>  ? do_wait+0xe1/0x330
>  do_wait+0x151/0x330
>  kernel_wait4+0x8d/0x150
>  ? task_stopped_code+0x50/0x50
>  SYSC_wait4+0x95/0xa0
>  ? rcu_read_lock_sched_held+0x6c/0x80
>  ? syscall_trace_enter+0x2d7/0x340
>  ? do_syscall_64+0x60/0x210
>  do_syscall_64+0x60/0x210
>  entry_SYSCALL64_slow_path+0x25/0x25

I am not seeing where things go wrong, but that puts the recent
conversion of the pid bitmap and pid hash to an IDR in the suspect zone.

Can you try reverting those changes:

e8cfbc245e24 ("pid: remove pidhash")
95846ecf9dac ("pid: replace pid bitmap implementation with IDR API")

Please keep the warning in place so we can see whether the revert fixes
the allocation problem.
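
For anyone following along, here is a minimal user-space sketch of what
the warning is checking. It is not the kernel code: the struct names,
the helper publish_and_count_warnings() and the fprintf() standing in
for WARN_ON are all hypothetical, and it assumes the loop walks from the
deepest pid namespace level down to level 0. The only point is that a
pid should not become visible in a namespace whose proc mount is not
set up yet.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct pid_namespace_model {
	const void *proc_mnt;	/* NULL models a namespace without a proc mount */
	int pid_allocated;	/* number of pids made visible in this namespace */
};

struct upid_model {
	int nr;				/* pid number as seen in this namespace */
	struct pid_namespace_model *ns;	/* namespace the number belongs to */
};

/*
 * Walk the per-namespace entries from the deepest level down to level 0
 * and count the entries whose namespace has no proc mount. Each of those
 * is a case where the WARN_ON in the test patch would fire.
 */
static int publish_and_count_warnings(struct upid_model *numbers, int levels)
{
	int warnings = 0;

	for (int i = levels - 1; i >= 0; --i) {
		struct upid_model *upid = &numbers[i];

		if (upid->ns->proc_mnt == NULL) {
			/* Stand-in for WARN_ON(!upid->ns->proc_mnt). */
			fprintf(stderr, "warning: pid %d published without proc_mnt\n",
				upid->nr);
			warnings++;
		}
		/* Stand-in for making the pid visible in this namespace. */
		upid->ns->pid_allocated++;
	}
	return warnings;
}
```

With a two-level setup where only the child namespace lacks a proc
mount, the helper reports one warning, which corresponds to the single
WARNING in Dave's log.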

Eric
