public inbox for linux-kernel@vger.kernel.org
From: Frederic Weisbecker <frederic@kernel.org>
To: Chuck Lever <cel@kernel.org>
Cc: Marco Crivellari <marco.crivellari@suse.com>,
	linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org,
	netdev@vger.kernel.org, Tejun Heo <tj@kernel.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Michal Hocko <mhocko@suse.com>,
	Trond Myklebust <trondmy@kernel.org>,
	Anna Schumaker <anna@kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>,
	Jeff Layton <jlayton@kernel.org>, NeilBrown <neil@brown.name>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
	"David S . Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>
Subject: Re: [RFC PATCH] xprtrdma: Move long delayed work on system_dfl_long_wq
Date: Thu, 30 Apr 2026 17:04:04 +0200
Message-ID: <afNvZKtiQPLbi-3F@localhost.localdomain>
In-Reply-To: <1e220a70-4318-49de-aaac-332c0a1cfab4@app.fastmail.com>

On Thu, Apr 30, 2026 at 10:05:52AM -0400, Chuck Lever wrote:
> 
> 
> On Thu, Apr 30, 2026, at 10:01 AM, Frederic Weisbecker wrote:
> > On Thu, Apr 30, 2026 at 09:35:20AM -0400, Chuck Lever wrote:
> >> 
> >> On Thu, Apr 30, 2026, at 4:54 AM, Marco Crivellari wrote:
> >> > Currently the code enqueues work items using {queue|mod}_delayed_work(),
> >> > using system_long_wq. This workqueue should be used when long works are
> >> > expected, but it is a per-cpu workqueue.
> >> >
> >> > This is important because queue_delayed_work() queues the work using:
> >> >
> >> >    queue_delayed_work_on(WORK_CPU_UNBOUND, ...);
> >> >
> >> > Note that WORK_CPU_UNBOUND = NR_CPUS.
> >> >
> >> > This would end up calling __queue_delayed_work() that does:
> >> >
> >> >     if (housekeeping_enabled(HK_TYPE_TIMER)) {
> >> >     //      [....]
> >> >     } else {
> >> >             if (likely(cpu == WORK_CPU_UNBOUND))
> >> >                     add_timer_global(timer);
> >> >             else
> >> >                     add_timer_on(timer, cpu);
> >> >     }
> >> >
> >> > So when cpu == WORK_CPU_UNBOUND the timer is global and is
> >> > not using a specific CPU. Later, when __queue_work() is called:
> >> >
> >> >     if (req_cpu == WORK_CPU_UNBOUND) {
> >> >             if (wq->flags & WQ_UNBOUND)
> >> >                     cpu = wq_select_unbound_cpu(raw_smp_processor_id());
> >> >             else
> >> >                     cpu = raw_smp_processor_id();
> >> >     }
> >> >
> >> > Because the wq is not unbound, it takes the CPU where the timer
> >> > fired and enqueues the work on that CPU.
> >> > The consequence of all of this is that the work can run anywhere,
> >> > depending on where the timer fired.
> >> >
> >> > Recently, a new unbound workqueue specific for long running work has
> >> > been added:
> >> >
> >> >    c116737e972e ("workqueue: Add system_dfl_long_wq for long unbound works")
> >> >
> >> > So change system_long_wq with system_dfl_long_wq so that the work may
> >> > benefit from scheduler task placement.
> >> 
> >> The patch description confuses me.
> >> 
> >> The message ends with "the work can run anywhere, depending on where
> >> the timer fired." Read literally, "can run anywhere" sounds like a
> >> feature, not a bug
> >
> > A feature, but incomplete :)
> >
> >> — and the proposed fix (WQ_UNBOUND) also lets it
> >> run anywhere, just via a different selection path. Without a sentence
> >> saying "and that anywhere includes isolated CPUs, which we don't want,"
> >> the reader is left to fill in the gap.
> >
> > Not quite, global timers don't fire on isolated CPUs. And since it gets enqueued
> > on the CPU where it fired, it won't be enqueued on an isolated CPU.
> >
> >> 
> >> So, could the commit message lead with the motivation? My guess is that
> >> this is about respecting HK_TYPE_TIMER housekeeping on isolated systems,
> >> which system_long_wq cannot do because its per-CPU pool ignores the
> >> housekeeping mask once the global timer fires. If that is the case,
> >> please say so directly and the mechanism trace becomes a supporting
> >> argument rather than the whole argument.
> >
> > The purpose is explained on the last line:
> >
> > """
> > So change system_long_wq with system_dfl_long_wq so that the work may
> > benefit from scheduler task placement.
> > """
> >
> > Arguably this could be elaborated. For example, we could change this:
> >
> > """
> > The consequence of all of this is that the work can run anywhere,
> > depending on where the timer fired.
> > """
> >
> > into that:
> >
> > """
> > The consequence of all of this is that the work can run on any
> > housekeeping CPU, irrespective of the scheduler that knows better
> > about the best task placement, which would apply if the work were
> > to be queued on an unbound workqueue.
> > """
> >
> > Would that help?
> 
> It's still not clearing it up for me.
> 
> Does the patch address a bug (work isn't getting rescheduled at
> all) or is it merely a minor optimization for certain platforms?
> 
> What's the user-visible issue that will be improved with this
> change?

It's not a bug. It's an optimization, both power-wise and performance-wise,
and also part of a broader cleanup:

- Long works have no reason to stick to a single CPU. If they are converted to
  be unbound, the scheduler can move them to relevant targets to optimize
  performance and power consumption. Hence the new system_dfl_long_wq.
  The goal is to remove system_long_wq if none of its users rely on locality.

- Using queue_delayed_work() with a bound workqueue doesn't make any sense
  since the target CPU is effectively arbitrary: it is whichever housekeeping
  CPU the global timer happens to fire on.
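
To make the two points above concrete, here is a rough user-space sketch of
the placement decision for the WORK_CPU_UNBOUND case that
queue_delayed_work() always takes. It is a toy model, not kernel code: the
function and the MODEL_* names are made up for illustration, and the unbound
case is collapsed into "the scheduler decides", ignoring pods and affinity
scopes.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative constants (assumptions); the real ones live in kernel headers. */
#define MODEL_NR_CPUS	8
#define MODEL_CPU_ANY	(-1)	/* no fixed CPU: the scheduler places the worker */

/*
 * Toy model of where a work item queued with queue_delayed_work() ends up.
 *
 * wq_unbound: true if the workqueue is unbound (e.g. system_dfl_long_wq),
 *             false if it is per-CPU (e.g. system_long_wq)
 * timer_cpu:  CPU on which the global timer happened to fire
 *
 * Returns the CPU the work is tied to, or MODEL_CPU_ANY when an unbound
 * worker runs it and the scheduler is free to choose.
 */
static int model_delayed_work_cpu(bool wq_unbound, int timer_cpu)
{
	assert(timer_cpu >= 0 && timer_cpu < MODEL_NR_CPUS);

	if (wq_unbound)
		return MODEL_CPU_ANY;	/* scheduler picks a suitable CPU */

	return timer_cpu;		/* per-CPU wq: stuck where the timer fired */
}
```

With a per-CPU workqueue, model_delayed_work_cpu(false, 3) yields 3 and
model_delayed_work_cpu(false, 5) yields 5: the caller never chose either CPU.
With an unbound workqueue the result is MODEL_CPU_ANY wherever the timer
fired, which is the behaviour the conversion is after.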

Thanks.

-- 
Frederic Weisbecker
SUSE Labs

Thread overview: 7+ messages
2026-04-30  8:54 [RFC PATCH] xprtrdma: Move long delayed work on system_dfl_long_wq Marco Crivellari
2026-04-30 13:35 ` Chuck Lever
2026-04-30 14:01   ` Frederic Weisbecker
2026-04-30 14:05     ` Chuck Lever
2026-04-30 15:04       ` Frederic Weisbecker [this message]
2026-04-30 15:09         ` Chuck Lever
2026-05-04  8:34           ` Marco Crivellari