Date: Tue, 12 May 2026 17:34:19 +0200
From: Juri Lelli
To: Andrea Righi
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
 Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
 K Prateek Nayak, Frederic Weisbecker, linux-kernel@vger.kernel.org,
 David Haufe, Cao Ruichuang
Subject: Re: [PATCH] sched/deadline: Make
 dl-server nohz full aware
Message-ID:
References: <20260512-upstream-fix-dlserver-nohzfull-b4-v1-1-a94844387ae7@redhat.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:

Hi Andrea,

On 12/05/26 16:55, Andrea Righi wrote:
> Hi Juri,

Thanks for the quick review!

> On Tue, May 12, 2026 at 11:02:37AM +0200, Juri Lelli wrote:
> > The dl_server_timer() causes spurious IPIs on nohz_full cores, breaking
> > isolation guarantees. The timer executes on a housekeeping core and
> > eventually calls tick_nohz_dep_set_cpu(), sending IPIs to isolated cores
> > even when only a single task is running.
> >
> > The problem is that dl-servers are not coordinated with nohz_full tick
> > state. Timers can fire and send IPIs to otherwise undisturbed cores.
> >
> > Fix by managing servers in sched_can_stop_tick():
> >
> > - When RT tasks run with CFS/SCX tasks, start the appropriate server
> >   and keep the tick running
> > - When only RT tasks remain, stop all servers and allow tick to stop
> >   (except for >1 RR tasks which need the tick for round-robin)
> > - When only CFS/SCX tasks remain, stop all servers before stopping tick
> >
> > Introduce dl_servers_stop_all() to reduce duplication and abstract
> > server management from core.c. Unify RT handling into one block that
> > handles both RR and FIFO cases.
> >
> > Fixes: 557a6bfc662c ("sched/fair: Add trivial fair server")
> > Reported-by: David Haufe
> > Closes: https://lore.kernel.org/lkml/CAKJHwtOw_G67edzuHVtL1xC5Vyt6StcZzihtDd0yaKudW=rwVw@mail.gmail.com
> > Signed-off-by: Juri Lelli
> > ---
> > I had to modify my first original attempt at fixing this (please take a
> > look at the linked report/discussion) to also take SCX into
> > consideration.
> >
> As mentioned by Frederic, we don't allow to load BPF schedulers when isolcpus=
> is used, so I think we can simplify the sched_can_stop_tick() part.

Right! Thanks for confirming.

> >
> > FYI, I temporarily pushed the script I'm using to repro and verify the
> > fix here
> >
> > https://github.com/jlelli/sched-deadline-tests/blob/master/test-dlserver-nohz.sh
> > ---
> >  kernel/sched/core.c     | 43 +++++++++++++++++++++++--------------------
> >  kernel/sched/deadline.c | 14 ++++++++++++++
> >  kernel/sched/sched.h    |  1 +
> >  3 files changed, 38 insertions(+), 20 deletions(-)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index b905805bbcbe4..98759255c306b 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1414,30 +1414,35 @@ static inline bool __need_bw_check(struct rq *rq, struct task_struct *p)
> >
> >  bool sched_can_stop_tick(struct rq *rq)
> >  {
> > -	int fifo_nr_running;
> > -
> >  	/* Deadline tasks, even if single, need the tick */
> >  	if (rq->dl.dl_nr_running)
> >  		return false;
> >
> >  	/*
> > -	 * If there are more than one RR tasks, we need the tick to affect the
> > -	 * actual RR behaviour.
> > +	 * If there are RT tasks, we may need the tick (for >1 RR tasks),
> > +	 * but we must also service lower-priority CFS/SCX tasks via dl-servers.
>
> No need to mention SCX, maybe we can add a note that SCX is incompatible with
> isolcpus, so there's no SCX task to run here.

Ack.

>
> >  	 */
> > -	if (rq->rt.rr_nr_running) {
> > -		if (rq->rt.rr_nr_running == 1)
> > -			return true;
> > -		else
> > +	if (rq->rt.rt_nr_running) {
> > +		if (rq->cfs.h_nr_queued) {
> > +			dl_server_start(&rq->fair_server);
> > +			return false;
> > +		}
> > +#ifdef CONFIG_SCHED_CLASS_EXT
> > +		if (rq->scx.nr_running) {
> > +			dl_server_start(&rq->ext_server);
> > +			return false;
> > +		}
> > +#endif
>
> This #ifdef block can go away.
>
> > +		/*
> > +		 * Only RT tasks, no CFS/SCX. Stop servers to prevent spurious
>
> CFS/SCX -> CFS.
>
> > +		 * wakeups. Tick can stop for single RR or any FIFO, but must
> > +		 * run for multiple RR (round-robin behavior).
> > +		 */
> > +		dl_servers_stop_all(rq);
> > +		if (rq->rt.rr_nr_running > 1)
> >  			return false;
> > -	}
> > -
> > -	/*
> > -	 * If there's no RR tasks, but FIFO tasks, we can skip the tick, no
> > -	 * forced preemption between FIFO tasks.
> > -	 */
> > -	fifo_nr_running = rq->rt.rt_nr_running - rq->rt.rr_nr_running;
> > -	if (fifo_nr_running)
> >  		return true;
> > +	}
> >
> >  	/*
> >  	 * If there are no DL,RR/FIFO tasks, there must only be CFS or SCX tasks
> > @@ -1462,6 +1467,7 @@ bool sched_can_stop_tick(struct rq *rq)
> >  		return false;
> >  	}
> >
> > +	dl_servers_stop_all(rq);
> >  	return true;
> >  }
> >  #endif /* CONFIG_NO_HZ_FULL */
> > @@ -8810,10 +8816,7 @@ int sched_cpu_dying(unsigned int cpu)
> >  		WARN(true, "Dying CPU not properly vacated!");
> >  		dump_rq_tasks(rq, KERN_WARNING);
> >  	}
> > -	dl_server_stop(&rq->fair_server);
> > -#ifdef CONFIG_SCHED_CLASS_EXT
> > -	dl_server_stop(&rq->ext_server);
> > -#endif
> > +	dl_servers_stop_all(rq);
> >  	rq_unlock_irqrestore(rq, &rf);
> >
> >  	calc_load_migrate(rq);
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index edca7849b165d..c2b3d6bbe4828 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1826,6 +1826,20 @@ void dl_server_stop(struct sched_dl_entity *dl_se)
> >  	dl_se->dl_server_active = 0;
> >  }
> >
> > +/*
> > + * Stop all dl-servers on this runqueue. Called when transitioning to a state
> > + * where the tick can be stopped (e.g., single RR/FIFO task, or no RT tasks).
> > + * This ensures server timers are disarmed and won't cause spurious wakeups on
> > + * nohz_full isolated cores.
> > + */
> > +void dl_servers_stop_all(struct rq *rq)
> > +{
> > +	dl_server_stop(&rq->fair_server);
> > +#ifdef CONFIG_SCHED_CLASS_EXT
> > +	dl_server_stop(&rq->ext_server);
> > +#endif
> > +}
>
> And I think the dl_servers_stop_all() helper still makes sense, stopping the
> ext_server is still needed in sched_cpu_dying() and calling dl_server_stop() on
> an already-inactive server is harmless in the no-RT path.

And ack to all the above. Will send out a v2 soon.

Best,
Juri
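
For readers following the thread, the decision flow discussed above can be modelled in user space. This is a minimal sketch and not the v2 patch: struct rq_model, can_stop_tick_model() and the fair_server_active flag are hypothetical stand-ins for struct rq, sched_can_stop_tick() and the dl_server_start()/dl_servers_stop_all() calls. The SCX branch is dropped as suggested in the review, and the final fair-task check is an assumption, since that part of the function is not shown in the quoted hunks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the struct rq counters the patch reads. */
struct rq_model {
	unsigned int dl_nr_running;   /* deadline tasks */
	unsigned int rt_nr_running;   /* all RT tasks (RR + FIFO) */
	unsigned int rr_nr_running;   /* RR tasks only */
	unsigned int cfs_h_nr_queued; /* queued fair tasks */
	bool fair_server_active;      /* models the fair dl-server state */
};

/* Returns true when the model says the tick may stop on this runqueue. */
static bool can_stop_tick_model(struct rq_model *rq)
{
	/* RR tasks are a subset of RT tasks. */
	assert(rq->rr_nr_running <= rq->rt_nr_running);

	/* Deadline tasks, even if single, need the tick. */
	if (rq->dl_nr_running)
		return false;

	if (rq->rt_nr_running) {
		/* RT plus fair tasks: the fair server must run, keep the tick. */
		if (rq->cfs_h_nr_queued) {
			rq->fair_server_active = true;
			return false;
		}
		/* Only RT tasks: stop the server; only >1 RR tasks need the tick. */
		rq->fair_server_active = false;
		return rq->rr_nr_running <= 1;
	}

	/* Assumption: more than one fair task needs the tick for preemption. */
	if (rq->cfs_h_nr_queued > 1)
		return false;

	/* No RT tasks left: stop the server before letting the tick stop. */
	rq->fair_server_active = false;
	return true;
}
```

In this model, one FIFO task plus one fair task returns false and marks the server active. Removing the fair task returns true with the server stopped, which is the transition the nohz_full report is about.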