From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753343AbcERT5i (ORCPT ); Wed, 18 May 2016 15:57:38 -0400 Received: from mx1.redhat.com ([209.132.183.28]:37466 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752101AbcERT5h (ORCPT ); Wed, 18 May 2016 15:57:37 -0400 Date: Wed, 18 May 2016 21:57:33 +0200 From: Oleg Nesterov To: Peter Zijlstra Cc: Kirill Tkhai , linux-kernel@vger.kernel.org, Ingo Molnar , Vladimir Davydov , Kirill Tkhai , Christoph Lameter Subject: Re: [PATCH 3/3] introduce task_rcu_dereference() Message-ID: <20160518195733.GA15914@redhat.com> References: <1413962231.19914.130.camel@tkhai> <20141027195339.GA11736@redhat.com> <20141027195446.GD11736@redhat.com> <20160518170218.GY3192@twins.programming.kicks-ass.net> <20160518182318.GA15661@redhat.com> <20160518191045.GP3193@twins.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160518191045.GP3193@twins.programming.kicks-ass.net> User-Agent: Mutt/1.5.24 (2015-08-30) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Wed, 18 May 2016 19:57:36 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 05/18, Peter Zijlstra wrote: > > OK, something like so then? Yes thanks! Just one note, > +struct task_struct *task_rcu_dereference(struct task_struct **ptask) > +{ > + struct sighand_struct *sighand; > + struct task_struct *task; > + > + /* > + * We need to verify that release_task() was not called and thus > + * delayed_put_task_struct() can't run and drop the last reference > + * before rcu_read_unlock(). We check task->sighand != NULL, > + * but we can read the already freed and reused memory. > + */ > +retry: > + task = rcu_dereference(*ptask); > + if (!task) > + return NULL; > + > + probe_kernel_address(&task->sighand, sighand); OK. Then I'll re-send the patch which adds the probe_slab_address() helper on top of this change. We do not want __probe_kernel_read() if if CONFIG_DEBUG_PAGEALLOC=n. > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -1374,30 +1374,15 @@ static void task_numa_compare(struct task_numa_env *env, > int dist = env->dist; > bool assigned = false; > > - rcu_read_lock(); > - > - raw_spin_lock_irq(&dst_rq->lock); > - cur = dst_rq->curr; > - /* > - * No need to move the exiting task or idle task. > - */ > - if ((cur->flags & PF_EXITING) || is_idle_task(cur)) > - cur = NULL; > - else { > - /* > - * The task_struct must be protected here to protect the > - * p->numa_faults access in the task_weight since the > - * numa_faults could already be freed in the following path: > - * finish_task_switch() > - * --> put_task_struct() > - * --> __put_task_struct() > - * --> task_numa_free() > - */ > - get_task_struct(cur); > + cur = try_get_task_struct(&dst_rq->curr); Do we really want try_get_task_struct() here? How about the change below? To me it would be more clean to do get_task_struct() in task_numa_assign(), it clearly pairs with put_task_struct(best_task) and task_numa_compare() looks a bit simpler this way, no need to put_task_struct() if we nullify cur. What do you think? In any case I think the change in sched/fair.c should probably come as a separate patch, but this is up to you. Oleg. diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 40748dc..8e7083e 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1254,6 +1254,8 @@ static void task_numa_assign(struct task_numa_env *env, { if (env->best_task) put_task_struct(env->best_task); + if (p) + get_task_struct(p); env->best_task = p; env->best_imp = imp; @@ -1321,31 +1323,11 @@ static void task_numa_compare(struct task_numa_env *env, long imp = env->p->numa_group ? groupimp : taskimp; long moveimp = imp; int dist = env->dist; - bool assigned = false; rcu_read_lock(); - - raw_spin_lock_irq(&dst_rq->lock); - cur = dst_rq->curr; - /* - * No need to move the exiting task or idle task. - */ - if ((cur->flags & PF_EXITING) || is_idle_task(cur)) + cur = task_rcu_dereference(&dst_rq->curr); + if (cur && ((cur->flags & PF_EXITING) || is_idle_task(cur))) cur = NULL; - else { - /* - * The task_struct must be protected here to protect the - * p->numa_faults access in the task_weight since the - * numa_faults could already be freed in the following path: - * finish_task_switch() - * --> put_task_struct() - * --> __put_task_struct() - * --> task_numa_free() - */ - get_task_struct(cur); - } - - raw_spin_unlock_irq(&dst_rq->lock); /* * Because we have preemption enabled we can get migrated around and @@ -1428,7 +1410,6 @@ balance: */ if (!load_too_imbalanced(src_load, dst_load, env)) { imp = moveimp - 1; - put_task_struct(cur); cur = NULL; goto assign; } @@ -1454,16 +1435,9 @@ balance: env->dst_cpu = select_idle_sibling(env->p, env->dst_cpu); assign: - assigned = true; task_numa_assign(env, cur, imp); unlock: rcu_read_unlock(); - /* - * The dst_rq->curr isn't assigned. The protection for task_struct is - * finished. - */ - if (cur && !assigned) - put_task_struct(cur); } static void task_numa_find_cpu(struct task_numa_env *env,