From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756221AbeASSzM (ORCPT ); Fri, 19 Jan 2018 13:55:12 -0500
Received: from smtp.codeaurora.org ([198.145.29.96]:55684 "EHLO
	smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755830AbeASSzD (ORCPT );
	Fri, 19 Jan 2018 13:55:03 -0500
DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org 8903860376
Authentication-Results: pdx-caf-mail.web.codeaurora.org; dmarc=none (p=none dis=none) header.from=codeaurora.org
Authentication-Results: pdx-caf-mail.web.codeaurora.org; spf=none smtp.mailfrom=pkondeti@codeaurora.org
Date: Sat, 20 Jan 2018 00:24:55 +0530
From: Pavan Kondeti
To: Steven Rostedt
Cc: williams@redhat.com, Ingo Molnar, LKML, Peter Zijlstra,
	Thomas Gleixner, bristot@redhat.com, jkacur@redhat.com,
	efault@gmx.de, hpa@zytor.com, torvalds@linux-foundation.org,
	swood@redhat.com, linux-tip-commits@vger.kernel.org
Subject: Re: [tip:sched/core] sched/rt: Simplify the IPI based RT balancing logic
Message-ID: <20180119185455.GB6563@codeaurora.org>
References: <20170424114732.1aac6dc4@gandalf.local.home>
	<20180119100353.7f9f5154@gandalf.local.home>
	<20180119174617.GA6563@codeaurora.org>
	<20180119131121.22dac3d3@gandalf.local.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180119131121.22dac3d3@gandalf.local.home>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jan 19, 2018 at 01:11:21PM -0500, Steven Rostedt wrote:
> On Fri, 19 Jan 2018 23:16:17 +0530
> Pavan Kondeti wrote:
> 
> > I am thinking of another problem because of the race between
> > rto_push_irq_work_func() and rq_attach_root() where rq->rd is modified.
> > 
> > Lets say, we cache the rq->rd here and queued the IRQ work on a remote CPU.
> > In the mean time, the rq_attach_root() might drop all the references
> > to this cached (old) rd and wants to free it. The rq->rd is freed in
> > RCU-sched callback. If that remote CPU is in RCU quiescent state, the rq->rd
> > can get freed before the IRQ work is executed. This results in the corruption
> > of the remote CPU's IRQ work list. Right?
> > 
> > Taking rq->lock in rto_push_irq_work_func() also does not help here. Probably
> > we have to wait for the IRQ work to finish before freeing the older root domain
> > in RCU-sched callback.
> 
> I was wondering about this too. Yeah, it would require an RCU like
> update. Once the rd was unreferenced, it would need to wait for the
> irq works to to finish before freeing it.
> 
> The easy way to do this is to simply up the refcount when sending the
> domain. Something like this:
> 
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 862a513adca3..89a086ed2b16 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1907,9 +1907,8 @@ static void push_rt_tasks(struct rq *rq)
>   * the rt_loop_next will cause the iterator to perform another scan.
>   *
>   */
> -static int rto_next_cpu(struct rq *rq)
> +static int rto_next_cpu(struct root_domain *rd)
>  {
> -	struct root_domain *rd = rq->rd;
>  	int next;
>  	int cpu;
>  
> @@ -1985,19 +1984,24 @@ static void tell_cpu_to_push(struct rq *rq)
>  	 * Otherwise it is finishing up and an ipi needs to be sent.
>  	 */
>  	if (rq->rd->rto_cpu < 0)
> -		cpu = rto_next_cpu(rq);
> +		cpu = rto_next_cpu(rq->rd);
>  
>  	raw_spin_unlock(&rq->rd->rto_lock);
>  
>  	rto_start_unlock(&rq->rd->rto_loop_start);
>  
> -	if (cpu >= 0)
> +	if (cpu >= 0) {
> +		/* Make sure the rd does not get freed while pushing */
> +		sched_get_rd(rq->rd);
>  		irq_work_queue_on(&rq->rd->rto_push_work, cpu);
> +	}
>  }

Since this is covered by rq->lock, it is guaranteed that we increment the
refcount on the older rd before RCU-sched callback is queued in
rq_attach_root(). Either we keep older rd alive or use the updated rd.
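
To spell out the lifetime argument, here is a minimal userspace sketch of
the pattern. It is not kernel code: rd_model, rd_get(), rd_put(),
queue_push_work() and run_push_work() are hypothetical names made up for
illustration, the "free" is only a flag, and the RCU-sched deferral is left
out. It also assumes the IRQ work drops its reference when it finishes,
which the quoted hunk above does not show.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct root_domain. */
struct rd_model {
	atomic_int refcount;	/* references held by the rq plus in-flight work */
	bool freed;		/* set instead of calling free(), so it can be observed */
	int pushes;		/* how many times the deferred work has run */
};

static void rd_init(struct rd_model *rd)
{
	atomic_init(&rd->refcount, 1);	/* reference owned by the rq pointing at rd */
	rd->freed = false;
	rd->pushes = 0;
}

static void rd_get(struct rd_model *rd)
{
	atomic_fetch_add(&rd->refcount, 1);
}

/* Drop one reference; whoever drops the last one "frees" the object. */
static void rd_put(struct rd_model *rd)
{
	if (atomic_fetch_sub(&rd->refcount, 1) == 1)
		rd->freed = true;
}

/* Models tell_cpu_to_push(): pin rd first, then hand it to the deferred work. */
static struct rd_model *queue_push_work(struct rd_model *rd)
{
	rd_get(rd);
	return rd;	/* the returned pointer stands for the queued work item */
}

/* Models the IRQ work running later on the remote CPU. */
static void run_push_work(struct rd_model *rd)
{
	assert(!rd->freed);	/* the reference taken at queue time keeps rd alive */
	rd->pushes++;
	rd_put(rd);		/* assumed: the work releases its reference when done */
}
```

If the rq's own reference is dropped (as rq_attach_root() would do) between
queue_push_work() and run_push_work(), the object stays alive until the work
has run, and the final rd_put() in the work is what frees it.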
We are good here, I think.

Thanks,
Pavan

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.