From: Frederic Weisbecker <frederic@kernel.org>
To: Chuck Lever <cel@kernel.org>
Cc: Marco Crivellari <marco.crivellari@suse.com>,
	linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org,
	netdev@vger.kernel.org, Tejun Heo <tj@kernel.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Michal Hocko <mhocko@suse.com>,
	Trond Myklebust <trondmy@kernel.org>,
	Anna Schumaker <anna@kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>,
	Jeff Layton <jlayton@kernel.org>, NeilBrown <neil@brown.name>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
	"David S . Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>
Subject: Re: [RFC PATCH] xprtrdma: Move long delayed work on system_dfl_long_wq
Date: Thu, 30 Apr 2026 17:04:04 +0200
Message-ID: <afNvZKtiQPLbi-3F@localhost.localdomain>
In-Reply-To: <1e220a70-4318-49de-aaac-332c0a1cfab4@app.fastmail.com>

On Thu, Apr 30, 2026 at 10:05:52AM -0400, Chuck Lever wrote:
> 
> 
> On Thu, Apr 30, 2026, at 10:01 AM, Frederic Weisbecker wrote:
> > On Thu, Apr 30, 2026 at 09:35:20AM -0400, Chuck Lever wrote:
> >> 
> >> On Thu, Apr 30, 2026, at 4:54 AM, Marco Crivellari wrote:
> >> > Currently the code enqueues work items with {queue|mod}_delayed_work()
> >> > on system_long_wq. This workqueue is meant for work that is expected
> >> > to run long, but it is a per-cpu workqueue.
> >> >
> >> > This is important because queue_delayed_work() queues the work using:
> >> >
> >> >    queue_delayed_work_on(WORK_CPU_UNBOUND, ...);
> >> >
> >> > Note that WORK_CPU_UNBOUND = NR_CPUS.
> >> >
> >> > This would end up calling __queue_delayed_work() that does:
> >> >
> >> >     if (housekeeping_enabled(HK_TYPE_TIMER)) {
> >> >     //      [....]
> >> >     } else {
> >> >             if (likely(cpu == WORK_CPU_UNBOUND))
> >> >                     add_timer_global(timer);
> >> >             else
> >> >                     add_timer_on(timer, cpu);
> >> >     }
> >> >
> >> > So when cpu == WORK_CPU_UNBOUND the timer is global and is
> >> > not using a specific CPU. Later, when __queue_work() is called:
> >> >
> >> >     if (req_cpu == WORK_CPU_UNBOUND) {
> >> >             if (wq->flags & WQ_UNBOUND)
> >> >                     cpu = wq_select_unbound_cpu(raw_smp_processor_id());
> >> >             else
> >> >                     cpu = raw_smp_processor_id();
> >> >     }
> >> >
> >> > Because the wq is not unbound, it takes the CPU where the timer
> >> > fired and enqueues the work on that CPU.
> >> > The consequence of all of this is that the work can run anywhere,
> >> > depending on where the timer fired.
> >> >
> >> > Recently, a new unbound workqueue specific for long running work has
> >> > been added:
> >> >
> >> >    c116737e972e ("workqueue: Add system_dfl_long_wq for long unbound works")
> >> >
> >> > So change system_long_wq with system_dfl_long_wq so that the work may
> >> > benefit from scheduler task placement.
> >> 
> >> The patch description confuses me.
> >> 
> >> The message ends with "the work can run anywhere, depending on where
> >> the timer fired." Read literally, "can run anywhere" sounds like a
> >> feature, not a bug
> >
> > A feature, but incomplete :)
> >
> >> — and the proposed fix (WQ_UNBOUND) also lets it
> >> run anywhere, just via a different selection path. Without a sentence
> >> saying "and that anywhere includes isolated CPUs, which we don't want,"
> >> the reader is left to fill in the gap.
> >
> > Not quite, global timers don't fire on isolated CPUs. And since it gets enqueued
> > on the CPU where it fired, it won't be enqueued on an isolated CPU.
> >
> >> 
> >> So, could the commit message lead with the motivation? My guess is that
> >> this is about respecting HK_TYPE_TIMER housekeeping on isolated systems,
> >> which system_long_wq cannot do because its per-CPU pool ignores the
> >> housekeeping mask once the global timer fires. If that is the case,
> >> please say so directly and the mechanism trace becomes a supporting
> >> argument rather than the whole argument.
> >
> > The purpose is explained on the last line:
> >
> > """
> > So change system_long_wq with system_dfl_long_wq so that the work may
> >  benefit from scheduler task placement.
> > """
> >
> > Arguably this could be elaborated. For example we can change that:
> >
> > """
> > The consequence of all of this is that the work can run anywhere,
> > depending on where the timer fired.
> > """
> >
> > into that:
> >
> > """
> > The consequence of all of this is that the work can run on any
> > housekeeping CPU, irrespective of the scheduler that knows better
> > about the best task placement, which would apply if the work were
> > to be queued on an unbound workqueue.
> > """
> >
> > Would that help?
> 
> It's still not clearing it up for me.
> 
> Does the patch address a bug (work isn't getting rescheduled at
> all) or is it merely a minor optimization for certain platforms?
> 
> What's the user-visible issue that will be improved with this
> change?

It's not a bug. It's an optimization, both power-wise and performance-wise,
and also part of a broader cleanup:

- Long works have no reason to stick to a single CPU. If they are converted to
  be unbound, the scheduler can move them to relevant targets to optimize
  performance and power consumption. Hence the new system_dfl_long_wq.
  The goal is to remove system_long_wq if none of its users rely on locality.

- Using queue_delayed_work() with a bound workqueue doesn't make much sense,
  since the target CPU is simply whichever CPU the timer happens to fire on.

Thanks.

-- 
Frederic Weisbecker
SUSE Labs

Thread overview: 7+ messages
2026-04-30  8:54 [RFC PATCH] xprtrdma: Move long delayed work on system_dfl_long_wq Marco Crivellari
2026-04-30 13:35 ` Chuck Lever
2026-04-30 14:01   ` Frederic Weisbecker
2026-04-30 14:05     ` Chuck Lever
2026-04-30 15:04       ` Frederic Weisbecker [this message]
2026-04-30 15:09         ` Chuck Lever
2026-05-04  8:34           ` Marco Crivellari