From mboxrd@z Thu Jan 1 00:00:00 1970 From: Zefan Li Subject: Re: [PATCH cgroup/for-4.4] cgroup: fix race condition around termination check in css_task_iter_next() Date: Thu, 29 Oct 2015 09:53:36 +0800 Message-ID: <56317C20.80107@huawei.com> References: <20151027084504.GB8783@mtj.duckdns.org> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20151027084504.GB8783-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii"; format="flowed" To: Tejun Heo , Johannes Weiner Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Calvin Owens , kernel-team-b10kYP2dOMg@public.gmane.org On 2015/10/27 16:45, Tejun Heo wrote: > css_task_iter_next() checked @it->cur_task before grabbing > css_set_lock and assumed that the result won't change afterwards; > however, tasks could leave the cgroup being iterated terminating the > iterator before css_task_lock is acquired. If this happens, > css_task_iter_next() tries to calculate the current task from NULL > cg_list pointer leading to the following oops. > > BUG: unable to handle kernel paging request at fffffffffffff7d0 > IP: [] css_task_iter_next+0x42/0x80 > ... > CPU: 4 PID: 6391 Comm: JobQDisp2 Not tainted 4.0.9-22_fbk4_rc3_81616_ge8d9cb6 #1 > Hardware name: Quanta Freedom/Winterfell, BIOS F03_3B08 03/04/2014 > task: ffff880868e46400 ti: ffff88083404c000 task.ti: ffff88083404c000 > RIP: 0010:[] [] css_task_iter_next+0x42/0x80 > RSP: 0018:ffff88083404fd28 EFLAGS: 00010246 > RAX: 0000000000000000 RBX: ffff88083404fd68 RCX: ffff8804697fb8b0 > RDX: fffffffffffff7c0 RSI: ffff8803b7dff800 RDI: ffffffff822c0278 > RBP: ffff88083404fd38 R08: 0000000000017160 R09: ffff88046f4070c0 > R10: ffffffff810d61f7 R11: 0000000000000293 R12: ffff880863bf8400 > R13: ffff88046b87fd80 R14: 0000000000000000 R15: ffff88083404fe58 > FS: 00007fa0567e2700(0000) GS:ffff88046f900000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: fffffffffffff7d0 CR3: 0000000469568000 CR4: 00000000001406e0 > Stack: > 0000000000000246 0000000000000000 ffff88083404fde8 ffffffff810d6248 > ffff88083404fd68 0000000000000000 ffff8803b7dff800 000001ef000001ee > 0000000000000000 0000000000000000 ffff880863bf8568 0000000000000000 > Call Trace: > [] cgroup_pidlist_start+0x258/0x550 > [] cgroup_seqfile_start+0x1d/0x20 > [] kernfs_seq_start+0x5f/0xa0 > [] seq_read+0x166/0x380 > [] kernfs_fop_read+0x11d/0x180 > [] __vfs_read+0x18/0x50 > [] vfs_read+0x8d/0x150 > [] SyS_read+0x4f/0xb0 > [] system_call_fastpath+0x12/0x17 > > Fix it by moving the termination condition check inside css_set_lock. > @it->cur_task is now cleared after being put and @it->task_pos is > tested for termination instead of @it->cset_pos as they indicate the > same condition and @it->task_pos is what's being dereferenced. > > Signed-off-by: Tejun Heo > Reported-by: Calvin Owens > Fixes: ed27b9f7a17d ("cgroup: don't hold css_set_rwsem across css task iteration") Acked-by: Zefan Li From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965005AbbJ2ByM (ORCPT ); Wed, 28 Oct 2015 21:54:12 -0400 Received: from szxga02-in.huawei.com ([119.145.14.65]:3484 "EHLO szxga02-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750911AbbJ2ByK (ORCPT ); Wed, 28 Oct 2015 21:54:10 -0400 Subject: Re: [PATCH cgroup/for-4.4] cgroup: fix race condition around termination check in css_task_iter_next() To: Tejun Heo , Johannes Weiner References: <20151027084504.GB8783@mtj.duckdns.org> CC: , , Calvin Owens , From: Zefan Li Message-ID: <56317C20.80107@huawei.com> Date: Thu, 29 Oct 2015 09:53:36 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0 MIME-Version: 1.0 In-Reply-To: <20151027084504.GB8783@mtj.duckdns.org> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.19.236] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2015/10/27 16:45, Tejun Heo wrote: > css_task_iter_next() checked @it->cur_task before grabbing > css_set_lock and assumed that the result won't change afterwards; > however, tasks could leave the cgroup being iterated terminating the > iterator before css_task_lock is acquired. If this happens, > css_task_iter_next() tries to calculate the current task from NULL > cg_list pointer leading to the following oops. > > BUG: unable to handle kernel paging request at fffffffffffff7d0 > IP: [] css_task_iter_next+0x42/0x80 > ... > CPU: 4 PID: 6391 Comm: JobQDisp2 Not tainted 4.0.9-22_fbk4_rc3_81616_ge8d9cb6 #1 > Hardware name: Quanta Freedom/Winterfell, BIOS F03_3B08 03/04/2014 > task: ffff880868e46400 ti: ffff88083404c000 task.ti: ffff88083404c000 > RIP: 0010:[] [] css_task_iter_next+0x42/0x80 > RSP: 0018:ffff88083404fd28 EFLAGS: 00010246 > RAX: 0000000000000000 RBX: ffff88083404fd68 RCX: ffff8804697fb8b0 > RDX: fffffffffffff7c0 RSI: ffff8803b7dff800 RDI: ffffffff822c0278 > RBP: ffff88083404fd38 R08: 0000000000017160 R09: ffff88046f4070c0 > R10: ffffffff810d61f7 R11: 0000000000000293 R12: ffff880863bf8400 > R13: ffff88046b87fd80 R14: 0000000000000000 R15: ffff88083404fe58 > FS: 00007fa0567e2700(0000) GS:ffff88046f900000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: fffffffffffff7d0 CR3: 0000000469568000 CR4: 00000000001406e0 > Stack: > 0000000000000246 0000000000000000 ffff88083404fde8 ffffffff810d6248 > ffff88083404fd68 0000000000000000 ffff8803b7dff800 000001ef000001ee > 0000000000000000 0000000000000000 ffff880863bf8568 0000000000000000 > Call Trace: > [] cgroup_pidlist_start+0x258/0x550 > [] cgroup_seqfile_start+0x1d/0x20 > [] kernfs_seq_start+0x5f/0xa0 > [] seq_read+0x166/0x380 > [] kernfs_fop_read+0x11d/0x180 > [] __vfs_read+0x18/0x50 > [] vfs_read+0x8d/0x150 > [] SyS_read+0x4f/0xb0 > [] system_call_fastpath+0x12/0x17 > > Fix it by moving the termination condition check inside css_set_lock. > @it->cur_task is now cleared after being put and @it->task_pos is > tested for termination instead of @it->cset_pos as they indicate the > same condition and @it->task_pos is what's being dereferenced. > > Signed-off-by: Tejun Heo > Reported-by: Calvin Owens > Fixes: ed27b9f7a17d ("cgroup: don't hold css_set_rwsem across css task iteration") Acked-by: Zefan Li