linux-kernel.vger.kernel.org archive mirror
* Kernel 6.14.11 dl_server_timer(...) causing IPI/Function Call Interrupts on isolcpu/nohz_full cores, performance regression
@ 2025-07-30 16:51 David Haufe
  2025-07-31  7:01 ` Juri Lelli
  0 siblings, 1 reply; 8+ messages in thread
From: David Haufe @ 2025-07-30 16:51 UTC (permalink / raw)
  To: Juri Lelli; +Cc: linux-kernel

[1.] Kernel 6.14.11 dl_server_timer(...) causing IPI/Function Call
Interrupts on isolcpu/nohz_full cores, performance regression
[2.] dl_server_timer() now causes IPI/Function Call Interrupts to fire
on isolcpu/nohz_full cores that previously received no interrupts.
When a single SCHED_OTHER process is running on an isolcpu/nohz_full
core, dl_server_timer() executes on a housekeeping core. This
ultimately invokes add_nr_running(), then
sched_update_tick_dependency(), and finally tick_nohz_dep_set_cpu(),
which kicks the isolated core. Setting the single process on the
isolcpu/nohz_full core to FIFO (rt priority) prevents the new
interrupt, because the process is no longer treated as a
fair-scheduled task. Having to use rt priority should be unnecessary
and is a regression from prior kernels. The kernel function_graph
trace below shows core 0 (housekeeping) sending the IPI to core 19
(nohz_full, isolcpu, rcu_nocb_poll), which is running a single
SCHED_OTHER process. I believe this has been observed by others:
https://community.clearlinux.org/t/sysjitter-worse-in-kernel-6-12-than-6-6/10206

dl_server_timer() {
  0)               |    raw_spin_rq_lock_nested() {
  0)               |      _raw_spin_lock() {
  0)   0.051 us    |        preempt_count_add();
  0)   0.160 us    |      }
  0)   0.357 us    |    }
  0)               |    update_rq_clock() {
  0)   0.115 us    |      arch_scale_cpu_capacity();
  0)   0.498 us    |    }
  0)   0.222 us    |    fair_server_has_tasks();
  0)               |    enqueue_dl_entity() {
  0)   0.056 us    |      replenish_dl_entity();
  0)   0.050 us    |      sched_can_stop_tick();
  0)               |      tick_nohz_dep_set_cpu() {
  0)   0.051 us    |        preempt_count_add();
  0)               |        tick_nohz_full_kick_cpu() {
  0)   0.052 us    |          preempt_count_add();
  0)               |          __smp_call_single_queue() {
  0)               |            /* csd_queue_cpu: cpu=19 callsite=return_to_handler+0x0/0x40 func=nohz_full_kick_func csd=000000001d10a127 */
  0)   0.178 us    |            call_function_single_prep_ipi();
  0)               |            /* ipi_send_cpu: cpu=19 callsite=return_to_handler+0x0/0x40 callback=generic_smp_call_function_single_interrupt+0x0/0x20 */
  0)               |            native_send_call_func_single_ipi() {
  0)               |              x2apic_send_IPI() {
  0)   0.084 us    |                __x2apic_send_IPI_dest();
  0)   0.203 us    |              }
  0)   0.298 us    |            }
  0)   0.713 us    |          }
  0)   0.053 us    |          preempt_count_sub();
  0)   1.080 us    |        }
  0)   0.052 us    |        preempt_count_sub();
  0)   1.586 us    |      }
  0)   0.237 us    |      cpupri_set();
  0)               |      cpudl_set() {
  0)               |        _raw_spin_lock_irqsave() {
  0)   0.053 us    |          preempt_count_add();
  0)   0.176 us    |        }
  0)   0.141 us    |        cpudl_heapify_up();
  0)               |        _raw_spin_unlock_irqrestore() {
  0)   0.052 us    |          preempt_count_sub();
  0)   0.158 us    |        }
  0)   0.725 us    |      }
  0)   3.014 us    |    }
  0)               |    resched_curr() {
  0)               |      __resched_curr() {
  0)               |        /* ipi_send_cpu: cpu=19 callsite=return_to_handler+0x0/0x40 callback=0x0 */
  0)               |        native_smp_send_reschedule() {
  0)               |          x2apic_send_IPI() {
  0)   0.084 us    |            __x2apic_send_IPI_dest();
  0)   0.200 us    |          }
  0)   0.296 us    |        }
  0)   0.580 us    |      }
  0)   0.678 us    |    }
  0)               |    raw_spin_rq_unlock() {
  0)               |      _raw_spin_unlock() {
  0)   0.052 us    |        preempt_count_sub();
  0)   0.159 us    |      }
  0)   0.260 us    |    }
  0)   5.436 us    |  }

[3.] SCHED_DEADLINE, CFS, nohz, isolcpu, nohz_full_kick_func
[4.] 6.14.11, code appears first in 6.12
[5.] 6.4.16 does not have this issue
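
One way to quantify the interrupts described in this report is to compare two snapshots of /proc/interrupts and look at the per-CPU deltas for the isolated core. The sketch below is only an illustration: the helper names and the sample numbers are hypothetical, and it assumes the x86 row labels CAL (function call interrupts) and RES (rescheduling interrupts).

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {row label: {cpu: count}}."""
    lines = text.splitlines()
    # The header row names the CPU columns, e.g. "CPU0 CPU19".
    cpus = [int(name[3:]) for name in lines[0].split()]
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        counts = rest.split()[:len(cpus)]
        # Rows with a single total (such as ERR) have no per-CPU columns; skip them.
        if len(counts) < len(cpus) or not all(c.isdigit() for c in counts):
            continue
        table[label.strip()] = dict(zip(cpus, map(int, counts)))
    return table


def ipi_delta(before, after, cpu, rows=("CAL", "RES")):
    """Return how much each IPI row grew on one CPU between two snapshots."""
    b, a = parse_interrupts(before), parse_interrupts(after)
    return {r: a[r][cpu] - b[r][cpu] for r in rows if r in a and r in b}


# Made-up sample snapshots for a housekeeping CPU 0 and an isolated CPU 19.
SAMPLE_BEFORE = """\
            CPU0       CPU19
 CAL:       1000          0   Function call interrupts
 RES:        500          3   Rescheduling interrupts
 ERR:          0
"""
SAMPLE_AFTER = """\
            CPU0       CPU19
 CAL:       1010         12   Function call interrupts
 RES:        505         15   Rescheduling interrupts
 ERR:          0
"""

print(ipi_delta(SAMPLE_BEFORE, SAMPLE_AFTER, cpu=19))
```

On a live system the two strings would come from reading /proc/interrupts a few seconds apart while the single SCHED_OTHER process runs on the isolated core; a non-zero delta on that core would match the behaviour reported here.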

-- 
DISCLAIMER: NOTICE REGARDING PRIVACY AND CONFIDENTIALITY 

The information 
contained in and/or accompanying this communication is intended only for 
use by the addressee(s) named herein and may contain legally privileged 
and/or confidential information. If you are not the intended recipient of 
this e-mail, you are hereby notified that any dissemination, distribution 
or copying of this information, and any attachments thereto, is strictly 
prohibited. If you have received this e-mail in error, please immediately 
notify the sender and permanently delete the original and any copy of any 
e-mail and any printout thereof. Electronic transmissions cannot be 
guaranteed to be secure or error-free. The sender therefore does not accept 
liability for any errors or omissions in the contents of this message which 
arise as a result of e-mail transmission. Simplex Trading, LLC and its 
affiliates reserves the right to intercept, monitor, and retain electronic 
communications to and from its system as permitted by law. Simplex Trading, 
LLC is a registered Broker Dealer with CBOE and a Member of SIPC.


* Re: Kernel 6.14.11 dl_server_timer(...) causing IPI/Function Call Interrupts on isolcpu/nohz_full cores, performance regression
  2025-07-30 16:51 Kernel 6.14.11 dl_server_timer(...) causing IPI/Function Call Interrupts on isolcpu/nohz_full cores, performance regression David Haufe
@ 2025-07-31  7:01 ` Juri Lelli
  2025-07-31 17:50   ` David Haufe
       [not found]   ` <CAKJHwtOZkrR9kEj+tffq=o0i1fPi3P+8BTHz3RyPDmn=uDOF7g@mail.gmail.com>
  0 siblings, 2 replies; 8+ messages in thread
From: Juri Lelli @ 2025-07-31  7:01 UTC (permalink / raw)
  To: David Haufe; +Cc: linux-kernel

Hello,

Thanks for the report.

On 30/07/25 11:51, David Haufe wrote:
> [1.] Kernel 6.14.11 dl_server_timer(...) causing IPI/Function Call
> Interrupts on isolcpu/nohz_full cores, performance regression
> [2.] The code for dl_server_timer is causing new IPI/Function Call
> Interrupts to fire on isolcpu/nohz_full cores which previously had no
> interrupts. When there is a single, SCHED_OTHER process running on an
> isolcpu/nohz_full core, dl_server_timer executes on a housekeeping
> core. This ultimately invokes add_nr_running() and
> sched_update_tick_dependency() and finally tick_nohz_dep_set_cpu().
> Setting the single process running on an isolcpu/nohz_full core to
> FIFO (rt priority) prevents this new interrupt, as it is not seen as a
> fair schedule process anymore. Having to use rt priority is
> unnecessary and a regression to prior kernels. Kernel function_graph
> trace below showing core 0 (housekeeping) sending the IPI to core 19
> (nohz_full, isolcpu, rcu_nocb_poll) which is running a single
> SCHED_OTHER process. I believe this has been observed by others.
> https://community.clearlinux.org/t/sysjitter-worse-in-kernel-6-12-than-6-6/10206

Would you be able to check if the following branch, containing multiple
fixes for dl-server, is still affected by the regression?

Thanks,
Juri



* Re: Kernel 6.14.11 dl_server_timer(...) causing IPI/Function Call Interrupts on isolcpu/nohz_full cores, performance regression
  2025-07-31  7:01 ` Juri Lelli
@ 2025-07-31 17:50   ` David Haufe
       [not found]   ` <CAKJHwtOZkrR9kEj+tffq=o0i1fPi3P+8BTHz3RyPDmn=uDOF7g@mail.gmail.com>
  1 sibling, 0 replies; 8+ messages in thread
From: David Haufe @ 2025-07-31 17:50 UTC (permalink / raw)
  To: Juri Lelli; +Cc: linux-kernel

Kernel 6.16 still shows the issue. kernel/sched/fair.c calls
dl_server_start(), and neither before nor after that call is there any
check for the isolcpu/nohz_full, single-process condition of the core.
The same function_graph trace is generated. The code is the same at
tip+sched/core.
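
When comparing function_graph captures between kernels, it can help to extract just the IPI events and their target CPUs. The following is a minimal sketch, assuming the capture contains ipi_send_cpu trace events formatted like those in the original report; the function name is hypothetical, and the sample text is copied from that report, including the line wrapping introduced by e-mail.

```python
import re
from collections import Counter

# \s+ also matches newlines, so events wrapped across lines are still found.
IPI_EVENT = re.compile(
    r"ipi_send_cpu:\s+cpu=(\d+)\s+callsite=\S+\s+callback=(\S+)"
)


def ipis_by_target(trace_text):
    """Count ipi_send_cpu events per (target CPU, callback) pair."""
    return Counter((int(cpu), cb) for cpu, cb in IPI_EVENT.findall(trace_text))


SAMPLE = """\
  0)               |            /* ipi_send_cpu: cpu=19
callsite=return_to_handler+0x0/0x40
callback=generic_smp_call_function_single_interrupt+0x0/0x20 */
  0)               |        /* ipi_send_cpu: cpu=19
callsite=return_to_handler+0x0/0x40 callback=0x0 */
"""

for (cpu, callback), count in ipis_by_target(SAMPLE).items():
    print(cpu, callback, count)
```

In the reported trace the first event sits under tick_nohz_full_kick_cpu() and the second, with callback=0x0, sits under resched_curr(), so both target the isolated core 19.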




* Re: Kernel 6.14.11 dl_server_timer(...) causing IPI/Function Call Interrupts on isolcpu/nohz_full cores, performance regression
       [not found]   ` <CAKJHwtOZkrR9kEj+tffq=o0i1fPi3P+8BTHz3RyPDmn=uDOF7g@mail.gmail.com>
@ 2025-08-01  9:06     ` Juri Lelli
  2025-08-01 15:28       ` David Haufe
  0 siblings, 1 reply; 8+ messages in thread
From: Juri Lelli @ 2025-08-01  9:06 UTC (permalink / raw)
  To: David Haufe; +Cc: linux-kernel

Hi,

On 31/07/25 12:48, David Haufe wrote:
> Kernel 6.16 shows the issue. /kernel/sched/fair.c calls dl_server_start()
> and there is no assessment prior to that point or later of the
> isolcpu/nohz_full+single-process condition of the core. Same function_graph
> trace generated. Code is the same at tip+sched/core.
> 
> On Thu, Jul 31, 2025 at 2:02 AM Juri Lelli <juri.lelli@redhat.com> wrote:
> 
> > Hello,
> >
> > Thanks for the report.
> >
> > On 30/07/25 11:51, David Haufe wrote:
> > > [1.] Kernel 6.14.11 dl_server_timer(...) causing IPI/Function Call
> > > Interrupts on isolcpu/nohz_full cores, performance regression
> > > [2.] The code for dl_server_timer is causing new IPI/Function Call
> > > Interrupts to fire on isolcpu/nohz_full cores which previously had no
> > > interrupts. When there is a single, SCHED_OTHER process running on an
> > > isolcpu/nohz_full core, dl_server_timer executes on a housekeeping
> > > core. This ultimately invokes add_nr_running() and
> > > sched_update_tick_dependency() and finally tick_nohz_dep_set_cpu().
> > > Setting the single process running on an isolcpu/nohz_full core to
> > > FIFO (rt priority) prevents this new interrupt, as it is not seen as a
> > > fair schedule process anymore. Having to use rt priority is
> > > unnecessary and a regression to prior kernels. Kernel function_graph
> > > trace below showing core 0 (housekeeping) sending the IPI to core 19
> > > (nohz_full, isolcpu, rcu_nocb_poll) which is running a single
> > > SCHED_OTHER process. I believe this has been observed by others.
> > >
> > https://community.clearlinux.org/t/sysjitter-worse-in-kernel-6-12-than-6-6/10206
> >
> > Would you be able to check if the following branch, containing multiple
> > fixes for dl-server, is still affected by the regression?

Apologies, I forgot to share the actual branch. :-/

Could you please test with

https://github.com/jlelli/linux/commits/upstream/fix-dlserver-1/

Among various other fixes, 219a63335b67 ("sched/deadline: Don't count
nr_running twice for dl_server proxy tasks") is making sure we don't
count fair tasks twice, so I am wondering if it can have an effect on
entering nohz_full.
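
The double count mentioned here can be pictured with a toy model. This is not the kernel code and not the patch itself: the class, the function names and the skip_server_in_nr_running flag are hypothetical stand-ins, and the model only illustrates the idea in the commit title that a dl_server acting as a proxy for fair tasks should not add to nr_running a second time.

```python
from dataclasses import dataclass


@dataclass
class ToyRq:
    """Hypothetical stand-in for a runqueue; only two counters are modelled."""
    nr_running: int = 0
    dl_nr_running: int = 0


def enqueue_fair_task(rq):
    rq.nr_running += 1


def enqueue_dl_entity(rq, is_dl_server, skip_server_in_nr_running):
    rq.dl_nr_running += 1
    # Assumed behaviour of the fix: a dl_server entity only stands in for
    # fair tasks that are already counted, so it is not counted again.
    if not (is_dl_server and skip_server_in_nr_running):
        rq.nr_running += 1


def nr_running_with_one_fair_task(skip_server_in_nr_running):
    rq = ToyRq()
    enqueue_fair_task(rq)
    enqueue_dl_entity(rq, is_dl_server=True,
                      skip_server_in_nr_running=skip_server_in_nr_running)
    return rq.nr_running


print(nr_running_with_one_fair_task(False))  # server counted as well
print(nr_running_with_one_fair_task(True))   # server skipped
```

A count of 2 for what is really one runnable task is the kind of double count the commit title refers to. How the count feeds into the tick decision is deliberately not modelled here.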

Thanks,
Juri



* Re: Kernel 6.14.11 dl_server_timer(...) causing IPI/Function Call Interrupts on isolcpu/nohz_full cores, performance regression
  2025-08-01  9:06     ` Juri Lelli
@ 2025-08-01 15:28       ` David Haufe
  2025-08-04 14:49         ` Juri Lelli
  0 siblings, 1 reply; 8+ messages in thread
From: David Haufe @ 2025-08-01 15:28 UTC (permalink / raw)
  To: Juri Lelli; +Cc: linux-kernel

I am sorry, but we cannot get this branch to boot on our hardware.
Looking through the code of the branch, I do not think it will address
the issue; I believe the issue is more fundamental. In
fair.c->enqueue_task_fair(), dl_server_start() is called when the
single fair/SCHED_OTHER task is added to the isolcpu/nohz_full core.
The check there only tests whether there is one or more process, then
kicks off dl_server_start() and the housekeeping timer in
start_dl_timer(). Once this timer is running, it invokes
dl_server_timer() continuously. The timer callback calls
__enqueue_dl_entity() and then inc_dl_tasks(), which increments
dl_rq->dl_nr_running and invokes add_nr_running(). This path
eventually calls sched_can_stop_tick(), but rq->dl.dl_nr_running is
now != 0, so that function will always return false. Something needs
to prevent this timer from running in the first place, or perhaps
there should be a check for a single fair/SCHED_OTHER process running
on an isolcpu/nohz_full core that removes the need for the deadline
code to run for that core.
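
The reasoning above can be sketched as a toy decision function. This is a simplified Python paraphrase, not the kernel implementation: only the dl_nr_running check is taken from this message, while the ordering of the remaining checks and the field names are assumptions that should be verified against sched_can_stop_tick() in kernel/sched/core.c for the kernel version in question.

```python
from dataclasses import dataclass


@dataclass
class ToyRq:
    """Hypothetical stand-in for the per-CPU runqueue counters."""
    nr_running: int = 0      # all runnable tasks
    dl_nr_running: int = 0   # queued deadline entities
    rt_nr_running: int = 0   # FIFO + RR tasks
    rr_nr_running: int = 0   # RR tasks only


def can_stop_tick(rq):
    # From the message: any queued deadline entity keeps the tick.
    if rq.dl_nr_running:
        return False
    # Assumed: round-robin tasks need the tick unless there is only one.
    if rq.rr_nr_running:
        return rq.rr_nr_running == 1
    # Assumed: FIFO tasks do not need the tick for time slicing.
    if rq.rt_nr_running - rq.rr_nr_running:
        return True
    # Assumed: more than one runnable task needs the tick for preemption.
    return rq.nr_running <= 1


single_fair = ToyRq(nr_running=1)
after_server_enqueue = ToyRq(nr_running=2, dl_nr_running=1)
single_fifo = ToyRq(nr_running=1, rt_nr_running=1)

print(can_stop_tick(single_fair))
print(can_stop_tick(after_server_enqueue))
print(can_stop_tick(single_fifo))
```

In this model a lone fair task allows the tick to stop until a deadline entity is queued on the same runqueue, which is the situation described above once dl_server_timer() has run.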



* Re: Kernel 6.14.11 dl_server_timer(...) causing IPI/Function Call Interrupts on isolcpu/nohz_full cores, performance regression
  2025-08-01 15:28       ` David Haufe
@ 2025-08-04 14:49         ` Juri Lelli
  2025-08-04 15:44           ` David Haufe
  0 siblings, 1 reply; 8+ messages in thread
From: Juri Lelli @ 2025-08-04 14:49 UTC (permalink / raw)
  To: David Haufe; +Cc: linux-kernel

On 01/08/25 10:28, David Haufe wrote:
> I am sorry, but we cannot get this branch to boot on our hardware.
> Looking through the code of the branch, it will not address the issue.
> I believe the issue is more fundamental. In
> fair.c->enqueue_task_fair(), dl_server_start() is called when the
> single fair/SCHED_OTHER task is added to the isolcpu/nohz_full core.
> The check here is simply checking if there is 1 or more process and
> kicks off the dl_server_start() and the housekeeping timer in
> start_dl_timer(). Once this timer is running, it will invoke
> dl_server_timer() continuously. This timer calls __enqueue_dl_entity()
> and then inc_dl_tasks(). inc_dl_tasks() increments
> dl_rq->dl_nr_running++ and invokes add_nr_running(). This code will
> eventually call the sched_can_stop_tick() function but
> rq->dl.dl_nr_running now != 0, so this function will always return
> false. Something needs to be done to prevent this timer from running
> in the first place, or maybe have some checks around single
> "fair/SCHED_OTHER/etc" process running on an isolcpu/nohz_full core
> which prevents the need for the deadline code to run for the core.

The fix commit I mentioned should at least make entering nohz_full work
again even when the dl_server is active (but deferred). We still have
one dl_server_timer firing each second (after a recent additional fix
by Peter), though. At least this is what I am seeing at my end.

Will try to see if we can remove that periodic timer once nohz_full mode
is entered.

Thanks,
Juri



* Re: Kernel 6.14.11 dl_server_timer(...) causing IPI/Function Call Interrupts on isolcpu/nohz_full cores, performance regression
  2025-08-04 14:49         ` Juri Lelli
@ 2025-08-04 15:44           ` David Haufe
  2025-08-04 16:58             ` Juri Lelli
  0 siblings, 1 reply; 8+ messages in thread
From: David Haufe @ 2025-08-04 15:44 UTC (permalink / raw)
  To: Juri Lelli; +Cc: linux-kernel

My apologies, I see what you mean now. add_nr_running() is not being
invoked if it is the dl_server. We are still trying to get this branch
to boot to verify ourselves. We will be on the lookout for this to be
merged for release.

Thanks again,
Dave



* Re: Kernel 6.14.11 dl_server_timer(...) causing IPI/Function Call Interrupts on isolcpu/nohz_full cores, performance regression
  2025-08-04 15:44           ` David Haufe
@ 2025-08-04 16:58             ` Juri Lelli
  0 siblings, 0 replies; 8+ messages in thread
From: Juri Lelli @ 2025-08-04 16:58 UTC (permalink / raw)
  To: David Haufe; +Cc: linux-kernel

On 04/08/25 10:44, David Haufe wrote:
> My apologies, I see what you mean now. add_nr_running() is not being
> invoked if it is the dl_server. We are still trying to get this branch
> to boot to verify ourselves. We will be on the lookout for this to be
> merged for release.

No worries, guess I wasn't clear the first time. :)

I added a very much experimental commit on

https://github.com/jlelli/linux/tree/upstream/fix-dlserver-1

that seems to be able to remove the one-per-second dl_server_timer and
start it back up as needed. But I only played with it briefly, so I am
not fully convinced it is what we want. Anyway, if you could test with
it as well, it would be a useful data point. In principle you could try
porting the following commits to your current tree and check whether
they improve things (in reverse order, starting from the bottom, from
the branch above):

f237e524f3c7 ("sched/deadline: Make dl-server nohz full aware")
219a63335b67 ("sched/deadline: Don't count nr_running twice for dl_server proxy tasks")
7620177e8108 ("sched/deadline: Fix RT task potential starvation when expiry time passed")
cccb45d7c429 ("sched/deadline: Less agressive dl_server handling")
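
The porting order described above (bottom of the list first) can be written down as a short sketch. The helper name is hypothetical, and the sketch assumes the branch above has already been fetched into the local repository so that the abbreviated hashes resolve; it only builds the git cherry-pick commands and does not run them.

```python
# Commits exactly as listed in this message, top to bottom.
COMMITS_AS_LISTED = [
    ("f237e524f3c7", "sched/deadline: Make dl-server nohz full aware"),
    ("219a63335b67", "sched/deadline: Don't count nr_running twice for dl_server proxy tasks"),
    ("7620177e8108", "sched/deadline: Fix RT task potential starvation when expiry time passed"),
    ("cccb45d7c429", "sched/deadline: Less agressive dl_server handling"),
]


def cherry_pick_plan(listed):
    """Return git cherry-pick argument lists, bottom of the list first."""
    return [["git", "cherry-pick", sha] for sha, _title in reversed(listed)]


for command in cherry_pick_plan(COMMITS_AS_LISTED):
    print(" ".join(command))
```

Each printed command could then be run by hand in the target tree, resolving any conflicts before moving on to the next one.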

Thanks!
Juri

