* [PATCH] RDMA/rxe: Replace use of system_unbound_wq with system_dfl_wq
@ 2026-03-13 15:40 Marco Crivellari
2026-03-13 17:49 ` yanjun.zhu
2026-03-16 20:13 ` Leon Romanovsky
0 siblings, 2 replies; 15+ messages in thread
From: Marco Crivellari @ 2026-03-13 15:40 UTC (permalink / raw)
To: linux-kernel, linux-rdma
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
Zhu Yanjun, Jason Gunthorpe, Leon Romanovsky
This patch continues the effort to refactor workqueue APIs, which began
with the changes introducing new workqueues and a new alloc_workqueue flag:
commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
The point of the refactoring is to eventually alter the default behavior of
workqueues to become unbound by default so that their workload placement is
optimized by the scheduler.
Before that can happen, workqueue users must be converted to the better-named
new workqueues with no intended behaviour changes:
system_wq -> system_percpu_wq
system_unbound_wq -> system_dfl_wq
This way the old obsolete workqueues (system_wq, system_unbound_wq) can be
removed in the future.
Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index bc11b1ec59ac..d440c8cbaea5 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd,
work->frags[i].mr = mr;
}
- queue_work(system_unbound_wq, &work->work);
+ queue_work(system_dfl_wq, &work->work);
return 0;
--
2.53.0
^ permalink raw reply related [flat|nested] 15+ messages in thread

* Re: [PATCH] RDMA/rxe: Replace use of system_unbound_wq with system_dfl_wq
  2026-03-13 15:40 [PATCH] RDMA/rxe: Replace use of system_unbound_wq with system_dfl_wq Marco Crivellari
@ 2026-03-13 17:49 ` yanjun.zhu
  2026-03-16 20:13 ` Leon Romanovsky
  1 sibling, 0 replies; 15+ messages in thread
From: yanjun.zhu @ 2026-03-13 17:49 UTC (permalink / raw)
To: Marco Crivellari, linux-kernel, linux-rdma
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
	Sebastian Andrzej Siewior, Michal Hocko, Zhu Yanjun,
	Jason Gunthorpe, Leon Romanovsky

On 3/13/26 8:40 AM, Marco Crivellari wrote:
> This patch continues the effort to refactor workqueue APIs, which has begun
> with the changes introducing new workqueues and a new alloc_workqueue flag:
>
> commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
> commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
>
> The point of the refactoring is to eventually alter the default behavior of
> workqueues to become unbound by default so that their workload placement is
> optimized by the scheduler.
>
> Before that to happen, workqueue users must be converted to the better named
> new workqueues with no intended behaviour changes:
>
> system_wq -> system_percpu_wq
> system_unbound_wq -> system_dfl_wq
>
> This way the old obsolete workqueues (system_wq, system_unbound_wq) can be
> removed in the future.
>
> Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
> Suggested-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>

This patch is part of a broader effort to clarify workqueue semantics. As
discussed in the recent thread at
https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/, the move
towards system_dfl_wq is not just a renaming exercise; it's about ensuring
work items correctly respect the system's housekeeping cpumask.

RXE is a software-defined RDMA transport and does not have strict
hardware-to-CPU affinity requirements. Specifically, for the ODP prefetch
path modified here:

1. Prefetching doesn't rely on being executed on the local CPU where
   advise_mr was called.
2. The locality benefits of per-cpu execution are negligible compared to the
   importance of system-wide jitter reduction, especially in NOHZ_FULL
   environments.
3. By using system_dfl_wq, we allow the scheduler to offload prefetch tasks
   from isolated CPUs to housekeeping CPUs, which is a desirable behavior
   for real-time users.

The patch is safe, logically sound, and aligns with the current kernel-wide
modernization of workqueue placement.

I have tested this commit and it works correctly. I am fine with this.

Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>

Zhu Yanjun

> ---
>  drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
> index bc11b1ec59ac..d440c8cbaea5 100644
> --- a/drivers/infiniband/sw/rxe/rxe_odp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
> @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd,
>  		work->frags[i].mr = mr;
>  	}
>
> -	queue_work(system_unbound_wq, &work->work);
> +	queue_work(system_dfl_wq, &work->work);
>
>  	return 0;

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Replace use of system_unbound_wq with system_dfl_wq 2026-03-13 15:40 [PATCH] RDMA/rxe: Replace use of system_unbound_wq with system_dfl_wq Marco Crivellari 2026-03-13 17:49 ` yanjun.zhu @ 2026-03-16 20:13 ` Leon Romanovsky 2026-03-17 14:32 ` Marco Crivellari 2026-03-17 14:38 ` Zhu Yanjun 1 sibling, 2 replies; 15+ messages in thread From: Leon Romanovsky @ 2026-03-16 20:13 UTC (permalink / raw) To: Marco Crivellari Cc: linux-kernel, linux-rdma, Tejun Heo, Lai Jiangshan, Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko, Zhu Yanjun, Jason Gunthorpe On Fri, Mar 13, 2026 at 04:40:23PM +0100, Marco Crivellari wrote: > This patch continues the effort to refactor workqueue APIs, which has begun > with the changes introducing new workqueues and a new alloc_workqueue flag: > > commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq") > commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag") > > The point of the refactoring is to eventually alter the default behavior of > workqueues to become unbound by default so that their workload placement is > optimized by the scheduler. > > Before that to happen, workqueue users must be converted to the better named > new workqueues with no intended behaviour changes: > > system_wq -> system_percpu_wq > system_unbound_wq -> system_dfl_wq > > This way the old obsolete workqueues (system_wq, system_unbound_wq) can be > removed in the future. I recall earlier efforts to replace system workqueues with per‑driver queues, because unloading a driver forces a flush of the entire system workqueue, which is undesirable for overall system behavior. Wouldn't it be better to introduce a local workqueue here and use that instead? 
Thanks > > Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/ > Suggested-by: Tejun Heo <tj@kernel.org> > Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> > --- > drivers/infiniband/sw/rxe/rxe_odp.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c > index bc11b1ec59ac..d440c8cbaea5 100644 > --- a/drivers/infiniband/sw/rxe/rxe_odp.c > +++ b/drivers/infiniband/sw/rxe/rxe_odp.c > @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd, > work->frags[i].mr = mr; > } > > - queue_work(system_unbound_wq, &work->work); > + queue_work(system_dfl_wq, &work->work); > > return 0; > > -- > 2.53.0 > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Replace use of system_unbound_wq with system_dfl_wq 2026-03-16 20:13 ` Leon Romanovsky @ 2026-03-17 14:32 ` Marco Crivellari 2026-03-17 16:24 ` Leon Romanovsky 2026-03-17 14:38 ` Zhu Yanjun 1 sibling, 1 reply; 15+ messages in thread From: Marco Crivellari @ 2026-03-17 14:32 UTC (permalink / raw) To: Leon Romanovsky Cc: linux-kernel, linux-rdma, Tejun Heo, Lai Jiangshan, Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko, Zhu Yanjun, Jason Gunthorpe On Mon, Mar 16, 2026 at 9:13 PM Leon Romanovsky <leon@kernel.org> wrote: > [...] > I recall earlier efforts to replace system workqueues with per‑driver queues, > because unloading a driver forces a flush of the entire system workqueue, > which is undesirable for overall system behavior. > > Wouldn't it be better to introduce a local workqueue here and use that instead? > > Thanks Hi, There is only this wq here. But we can do so if needed, no problem. Where do you think is the most appropriate place for the workqueue struct declaration? Like `struct prefetch_mr_work` maybe? Do you have suggestions for an appropriate place to allocate the workqueue? Thanks! -- Marco Crivellari L3 Support Engineer ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Replace use of system_unbound_wq with system_dfl_wq
  2026-03-17 14:32 ` Marco Crivellari
@ 2026-03-17 16:24 ` Leon Romanovsky
  2026-03-18  8:34 ` Marco Crivellari
  2026-03-18 12:20 ` Marco Crivellari
  0 siblings, 2 replies; 15+ messages in thread
From: Leon Romanovsky @ 2026-03-17 16:24 UTC (permalink / raw)
To: Marco Crivellari
Cc: linux-kernel, linux-rdma, Tejun Heo, Lai Jiangshan,
	Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko,
	Zhu Yanjun, Jason Gunthorpe

On Tue, Mar 17, 2026 at 03:32:01PM +0100, Marco Crivellari wrote:
> On Mon, Mar 16, 2026 at 9:13 PM Leon Romanovsky <leon@kernel.org> wrote:
> > [...]
> > I recall earlier efforts to replace system workqueues with per-driver queues,
> > because unloading a driver forces a flush of the entire system workqueue,
> > which is undesirable for overall system behavior.
> >
> > Wouldn't it be better to introduce a local workqueue here and use that instead?
> >
> > Thanks
>
> Hi,
>
> There is only this wq here. But we can do so if needed, no problem.
>
> Where do you think is the most appropriate place for the workqueue struct
> declaration? Like `struct prefetch_mr_work` maybe?
>
> Do you have suggestions for an appropriate place to allocate the workqueue?

Actually, RXE already has one workqueue, allocated in rxe_alloc_wq(); just use it.

Thanks

> Thanks!
>
> --
>
> Marco Crivellari
>
> L3 Support Engineer

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Replace use of system_unbound_wq with system_dfl_wq 2026-03-17 16:24 ` Leon Romanovsky @ 2026-03-18 8:34 ` Marco Crivellari 2026-03-18 12:20 ` Marco Crivellari 1 sibling, 0 replies; 15+ messages in thread From: Marco Crivellari @ 2026-03-18 8:34 UTC (permalink / raw) To: Leon Romanovsky Cc: linux-kernel, linux-rdma, Tejun Heo, Lai Jiangshan, Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko, Zhu Yanjun, Jason Gunthorpe On Tue, Mar 17, 2026 at 5:24 PM Leon Romanovsky <leon@kernel.org> wrote: > [...] > Actually, RXE already have one workqueue in rxe_alloc_wq(), just use it. > > Thanks Thanks Leon, I will do as you suggested in the v2. -- Marco Crivellari L3 Support Engineer ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Replace use of system_unbound_wq with system_dfl_wq 2026-03-17 16:24 ` Leon Romanovsky 2026-03-18 8:34 ` Marco Crivellari @ 2026-03-18 12:20 ` Marco Crivellari 2026-03-18 14:47 ` Zhu Yanjun 2026-03-18 15:02 ` Leon Romanovsky 1 sibling, 2 replies; 15+ messages in thread From: Marco Crivellari @ 2026-03-18 12:20 UTC (permalink / raw) To: Leon Romanovsky Cc: linux-kernel, linux-rdma, Tejun Heo, Lai Jiangshan, Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko, Zhu Yanjun, Jason Gunthorpe On Tue, Mar 17, 2026 at 5:24 PM Leon Romanovsky <leon@kernel.org> wrote: > [...] > > Actually, RXE already have one workqueue in rxe_alloc_wq(), just use it. Hi Leon, I noticed the workqueue is declared as static into a C file. So I changed it a bit, tell me if it's not the right approach. You can see the diff below: --- diff --git a/drivers/infiniband/sw/rxe/rxe.h b/drivers/infiniband/sw/rxe/rxe.h index ff8cd53f5f28..c56bae376c7f 100644 --- a/drivers/infiniband/sw/rxe/rxe.h +++ b/drivers/infiniband/sw/rxe/rxe.h @@ -121,4 +121,6 @@ void rxe_port_up(struct rxe_dev *rxe); void rxe_port_down(struct rxe_dev *rxe); void rxe_set_port_state(struct rxe_dev *rxe); +extern struct workqueue_struct *rxe_wq; + #endif /* RXE_H */ diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c index d440c8cbaea5..ff904d5e54a7 100644 --- a/drivers/infiniband/sw/rxe/rxe_odp.c +++ b/drivers/infiniband/sw/rxe/rxe_odp.c @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd, work->frags[i].mr = mr; } - queue_work(system_dfl_wq, &work->work); + queue_work(rxe_wq, &work->work); return 0; diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c index f522820b950c..801d06c969c9 100644 --- a/drivers/infiniband/sw/rxe/rxe_task.c +++ b/drivers/infiniband/sw/rxe/rxe_task.c @@ -6,7 +6,7 @@ #include "rxe.h" -static struct workqueue_struct *rxe_wq; +struct workqueue_struct *rxe_wq; int rxe_alloc_wq(void) { 
--- Thanks! -- Marco Crivellari L3 Support Engineer ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Replace use of system_unbound_wq with system_dfl_wq
  2026-03-18 12:20 ` Marco Crivellari
@ 2026-03-18 14:47 ` Zhu Yanjun
  2026-03-18 15:02 ` Leon Romanovsky
  1 sibling, 0 replies; 15+ messages in thread
From: Zhu Yanjun @ 2026-03-18 14:47 UTC (permalink / raw)
To: Marco Crivellari, Leon Romanovsky, yanjun.zhu@linux.dev
Cc: linux-kernel, linux-rdma, Tejun Heo, Lai Jiangshan,
	Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko,
	Zhu Yanjun, Jason Gunthorpe

On 2026/3/18 5:20, Marco Crivellari wrote:
> On Tue, Mar 17, 2026 at 5:24 PM Leon Romanovsky <leon@kernel.org> wrote:
>> [...]
>>
>> Actually, RXE already have one workqueue in rxe_alloc_wq(), just use it.
>
> Hi Leon,
>
> I noticed the workqueue is declared as static into a C file. So I
> changed it a bit, tell me if
> it's not the right approach.
> You can see the diff below:
>
> ---
>
> diff --git a/drivers/infiniband/sw/rxe/rxe.h b/drivers/infiniband/sw/rxe/rxe.h
> index ff8cd53f5f28..c56bae376c7f 100644
> --- a/drivers/infiniband/sw/rxe/rxe.h
> +++ b/drivers/infiniband/sw/rxe/rxe.h
> @@ -121,4 +121,6 @@ void rxe_port_up(struct rxe_dev *rxe);
>  void rxe_port_down(struct rxe_dev *rxe);
>  void rxe_set_port_state(struct rxe_dev *rxe);
>
> +extern struct workqueue_struct *rxe_wq;

Hi, Marco

https://patchwork.kernel.org/project/linux-rdma/patch/20260318025739.5058-1-yanjun.zhu@linux.dev/

Please see the above link. A fix for this problem is already available.
Zhu Yanjun > + > #endif /* RXE_H */ > diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c > b/drivers/infiniband/sw/rxe/rxe_odp.c > index d440c8cbaea5..ff904d5e54a7 100644 > --- a/drivers/infiniband/sw/rxe/rxe_odp.c > +++ b/drivers/infiniband/sw/rxe/rxe_odp.c > @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd, > work->frags[i].mr = mr; > } > > - queue_work(system_dfl_wq, &work->work); > + queue_work(rxe_wq, &work->work); > > return 0; > > diff --git a/drivers/infiniband/sw/rxe/rxe_task.c > b/drivers/infiniband/sw/rxe/rxe_task.c > index f522820b950c..801d06c969c9 100644 > --- a/drivers/infiniband/sw/rxe/rxe_task.c > +++ b/drivers/infiniband/sw/rxe/rxe_task.c > @@ -6,7 +6,7 @@ > > #include "rxe.h" > > -static struct workqueue_struct *rxe_wq; > +struct workqueue_struct *rxe_wq; > > int rxe_alloc_wq(void) > { > > --- > > Thanks! > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Replace use of system_unbound_wq with system_dfl_wq 2026-03-18 12:20 ` Marco Crivellari 2026-03-18 14:47 ` Zhu Yanjun @ 2026-03-18 15:02 ` Leon Romanovsky 2026-03-18 15:08 ` Marco Crivellari 1 sibling, 1 reply; 15+ messages in thread From: Leon Romanovsky @ 2026-03-18 15:02 UTC (permalink / raw) To: Marco Crivellari Cc: linux-kernel, linux-rdma, Tejun Heo, Lai Jiangshan, Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko, Zhu Yanjun, Jason Gunthorpe On Wed, Mar 18, 2026 at 01:20:01PM +0100, Marco Crivellari wrote: > On Tue, Mar 17, 2026 at 5:24 PM Leon Romanovsky <leon@kernel.org> wrote: > > [...] > > > > Actually, RXE already have one workqueue in rxe_alloc_wq(), just use it. > > Hi Leon, > > I noticed the workqueue is declared as static into a C file. So I > changed it a bit, tell me if > it's not the right approach. Your fix is the most accurate and technically sound among all proposals. Thanks > You can see the diff below: > > --- > > diff --git a/drivers/infiniband/sw/rxe/rxe.h b/drivers/infiniband/sw/rxe/rxe.h > index ff8cd53f5f28..c56bae376c7f 100644 > --- a/drivers/infiniband/sw/rxe/rxe.h > +++ b/drivers/infiniband/sw/rxe/rxe.h > @@ -121,4 +121,6 @@ void rxe_port_up(struct rxe_dev *rxe); > void rxe_port_down(struct rxe_dev *rxe); > void rxe_set_port_state(struct rxe_dev *rxe); > > +extern struct workqueue_struct *rxe_wq; > + > #endif /* RXE_H */ > diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c > b/drivers/infiniband/sw/rxe/rxe_odp.c > index d440c8cbaea5..ff904d5e54a7 100644 > --- a/drivers/infiniband/sw/rxe/rxe_odp.c > +++ b/drivers/infiniband/sw/rxe/rxe_odp.c > @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd, > work->frags[i].mr = mr; > } > > - queue_work(system_dfl_wq, &work->work); > + queue_work(rxe_wq, &work->work); > > return 0; > > diff --git a/drivers/infiniband/sw/rxe/rxe_task.c > b/drivers/infiniband/sw/rxe/rxe_task.c > index f522820b950c..801d06c969c9 100644 > --- 
a/drivers/infiniband/sw/rxe/rxe_task.c > +++ b/drivers/infiniband/sw/rxe/rxe_task.c > @@ -6,7 +6,7 @@ > > #include "rxe.h" > > -static struct workqueue_struct *rxe_wq; > +struct workqueue_struct *rxe_wq; > > int rxe_alloc_wq(void) > { > > --- > > Thanks! > > -- > > Marco Crivellari > > L3 Support Engineer ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Replace use of system_unbound_wq with system_dfl_wq 2026-03-18 15:02 ` Leon Romanovsky @ 2026-03-18 15:08 ` Marco Crivellari 0 siblings, 0 replies; 15+ messages in thread From: Marco Crivellari @ 2026-03-18 15:08 UTC (permalink / raw) To: Leon Romanovsky Cc: linux-kernel, linux-rdma, Tejun Heo, Lai Jiangshan, Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko, Zhu Yanjun, Jason Gunthorpe On Wed, Mar 18, 2026 at 4:02 PM Leon Romanovsky <leon@kernel.org> wrote: > [...] > > I noticed the workqueue is declared as static into a C file. So I > > changed it a bit, tell me if > > it's not the right approach. > > Your fix is the most accurate and technically sound among all proposals. > > Thanks Thanks for your feedback Leon. I will send the v2 with the above code. -- Marco Crivellari L3 Support Engineer ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Replace use of system_unbound_wq with system_dfl_wq 2026-03-16 20:13 ` Leon Romanovsky 2026-03-17 14:32 ` Marco Crivellari @ 2026-03-17 14:38 ` Zhu Yanjun 2026-03-17 17:24 ` Yanjun.Zhu 1 sibling, 1 reply; 15+ messages in thread From: Zhu Yanjun @ 2026-03-17 14:38 UTC (permalink / raw) To: Leon Romanovsky, Marco Crivellari, yanjun.zhu@linux.dev Cc: linux-kernel, linux-rdma, Tejun Heo, Lai Jiangshan, Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko, Zhu Yanjun, Jason Gunthorpe 在 2026/3/16 13:13, Leon Romanovsky 写道: > On Fri, Mar 13, 2026 at 04:40:23PM +0100, Marco Crivellari wrote: >> This patch continues the effort to refactor workqueue APIs, which has begun >> with the changes introducing new workqueues and a new alloc_workqueue flag: >> >> commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq") >> commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag") >> >> The point of the refactoring is to eventually alter the default behavior of >> workqueues to become unbound by default so that their workload placement is >> optimized by the scheduler. >> >> Before that to happen, workqueue users must be converted to the better named >> new workqueues with no intended behaviour changes: >> >> system_wq -> system_percpu_wq >> system_unbound_wq -> system_dfl_wq >> >> This way the old obsolete workqueues (system_wq, system_unbound_wq) can be >> removed in the future. > > I recall earlier efforts to replace system workqueues with per‑driver queues, > because unloading a driver forces a flush of the entire system workqueue, > which is undesirable for overall system behavior. > > Wouldn't it be better to introduce a local workqueue here and use that instead? Thanks. 1.The initialization should be: my_wq = alloc_workqueue("my_driver_queue", WQ_UNBOUND | WQ_MEM_RECLAIM, 0); if (!my_wq) return -ENOMEM; 2. The Submission should be: queue_work(my_wq, &my_work); 3. 
Destroy should be: destroy_workqueue() Thanks, Zhu Yanjun > > Thanks > >> >> Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/ >> Suggested-by: Tejun Heo <tj@kernel.org> >> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> >> --- >> drivers/infiniband/sw/rxe/rxe_odp.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c >> index bc11b1ec59ac..d440c8cbaea5 100644 >> --- a/drivers/infiniband/sw/rxe/rxe_odp.c >> +++ b/drivers/infiniband/sw/rxe/rxe_odp.c >> @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd, >> work->frags[i].mr = mr; >> } >> >> - queue_work(system_unbound_wq, &work->work); >> + queue_work(system_dfl_wq, &work->work); >> >> return 0; >> >> -- >> 2.53.0 >> ^ permalink raw reply [flat|nested] 15+ messages in thread
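[Editor's note: the three steps listed above fit together as a module-lifetime sketch. This is a hedged illustration only; the queue name "my_driver_queue" and the my_* identifiers are hypothetical and do not come from this thread or the rxe driver.]

```c
#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *my_wq;	/* hypothetical driver-local queue */
static struct work_struct my_work;

static void my_work_fn(struct work_struct *work)
{
	/* deferred job; with WQ_UNBOUND the scheduler picks the CPU */
}

static int __init my_init(void)
{
	/* 1. allocate an unbound queue usable in memory-reclaim paths */
	my_wq = alloc_workqueue("my_driver_queue",
				WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
	if (!my_wq)
		return -ENOMEM;

	INIT_WORK(&my_work, my_work_fn);
	/* 2. submission goes to the driver-local queue, not a system one */
	queue_work(my_wq, &my_work);
	return 0;
}

static void __exit my_exit(void)
{
	/* 3. destroy_workqueue() drains pending work before freeing */
	destroy_workqueue(my_wq);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");
```

With a driver-local queue like this, module unload only has to drain the driver's own work items rather than flushing a shared system workqueue.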
* Re: [PATCH] RDMA/rxe: Replace use of system_unbound_wq with system_dfl_wq 2026-03-17 14:38 ` Zhu Yanjun @ 2026-03-17 17:24 ` Yanjun.Zhu 2026-03-17 19:03 ` Leon Romanovsky 0 siblings, 1 reply; 15+ messages in thread From: Yanjun.Zhu @ 2026-03-17 17:24 UTC (permalink / raw) To: Leon Romanovsky, Marco Crivellari, Zhu Yanjun Cc: linux-kernel, linux-rdma, Tejun Heo, Lai Jiangshan, Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko, Zhu Yanjun, Jason Gunthorpe On 3/17/26 7:38 AM, Zhu Yanjun wrote: > 在 2026/3/16 13:13, Leon Romanovsky 写道: >> On Fri, Mar 13, 2026 at 04:40:23PM +0100, Marco Crivellari wrote: >>> This patch continues the effort to refactor workqueue APIs, which >>> has begun >>> with the changes introducing new workqueues and a new >>> alloc_workqueue flag: >>> >>> commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and >>> system_dfl_wq") >>> commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag") >>> >>> The point of the refactoring is to eventually alter the default >>> behavior of >>> workqueues to become unbound by default so that their workload >>> placement is >>> optimized by the scheduler. >>> >>> Before that to happen, workqueue users must be converted to the >>> better named >>> new workqueues with no intended behaviour changes: >>> >>> system_wq -> system_percpu_wq >>> system_unbound_wq -> system_dfl_wq >>> >>> This way the old obsolete workqueues (system_wq, system_unbound_wq) >>> can be >>> removed in the future. >> >> I recall earlier efforts to replace system workqueues with per‑driver >> queues, >> because unloading a driver forces a flush of the entire system >> workqueue, >> which is undesirable for overall system behavior. >> >> Wouldn't it be better to introduce a local workqueue here and use >> that instead? > > Thanks. > > 1.The initialization should be: > > my_wq = alloc_workqueue("my_driver_queue", WQ_UNBOUND | > WQ_MEM_RECLAIM, 0); > if (!my_wq) > return -ENOMEM; > > 2. 
The Submission should be: > > queue_work(my_wq, &my_work); > > 3. Destroy should be: > > destroy_workqueue() > > Thanks, > Zhu Yanjun Hi, Leon The diff for a new work queue in rxe is as below. Please review it. diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c index bc11b1ec59ac..03199fef47fb 100644 --- a/drivers/infiniband/sw/rxe/rxe_odp.c +++ b/drivers/infiniband/sw/rxe/rxe_odp.c @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd, work->frags[i].mr = mr; } - queue_work(system_unbound_wq, &work->work); + rxe_queue_aux_work(&work->work); return 0; diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c index f522820b950c..a2da699b969e 100644 --- a/drivers/infiniband/sw/rxe/rxe_task.c +++ b/drivers/infiniband/sw/rxe/rxe_task.c @@ -6,19 +6,36 @@ #include "rxe.h" +/* work for rxe_task */ static struct workqueue_struct *rxe_wq; +/* work for other rxe jobs */ +static struct workqueue_struct *rxe_aux_wq; + int rxe_alloc_wq(void) { - rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE); + rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND | WQ_MEM_RECLAIM, + WQ_MAX_ACTIVE); if (!rxe_wq) return -ENOMEM; + rxe_aux_wq = alloc_workqueue("rxe_aux_wq", + WQ_UNBOUND | WQ_MEM_RECLAIM, WQ_MAX_ACTIVE); + if (!rxe_aux_wq) { + destroy_workqueue(rxe_wq); + return -ENOMEM; + + } + return 0; } void rxe_destroy_wq(void) { + flush_workqueue(rxe_aux_wq); + destroy_workqueue(rxe_aux_wq); + + flush_workqueue(rxe_wq); destroy_workqueue(rxe_wq); } @@ -254,6 +271,14 @@ void rxe_sched_task(struct rxe_task *task) spin_unlock_irqrestore(&task->lock, flags); } +/* rxe_wq for rxe tasks. rxe_aux_wq for other rxe jobs. + */ +void rxe_queue_aux_work(struct work_struct *work) +{ + WARN_ON_ONCE(!rxe_aux_wq); + queue_work(rxe_aux_wq, work); +} + /* rxe_disable/enable_task are only called from * rxe_modify_qp in process context. Task is moved * to the drained state by do_task. 
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h index a8c9a77b6027..e1c0a34808b4 100644 --- a/drivers/infiniband/sw/rxe/rxe_task.h +++ b/drivers/infiniband/sw/rxe/rxe_task.h @@ -36,6 +36,7 @@ int rxe_alloc_wq(void); void rxe_destroy_wq(void); +void rxe_queue_aux_work(struct work_struct *work); /* * init rxe_task structure * qp => parameter to pass to func Zhu Yanjun > >> >> Thanks >> >>> >>> Link: >>> https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/ >>> Suggested-by: Tejun Heo <tj@kernel.org> >>> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> >>> --- >>> drivers/infiniband/sw/rxe/rxe_odp.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c >>> b/drivers/infiniband/sw/rxe/rxe_odp.c >>> index bc11b1ec59ac..d440c8cbaea5 100644 >>> --- a/drivers/infiniband/sw/rxe/rxe_odp.c >>> +++ b/drivers/infiniband/sw/rxe/rxe_odp.c >>> @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct >>> ib_pd *ibpd, >>> work->frags[i].mr = mr; >>> } >>> - queue_work(system_unbound_wq, &work->work); >>> + queue_work(system_dfl_wq, &work->work); >>> return 0; >>> -- >>> 2.53.0 >>> > ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Replace use of system_unbound_wq with system_dfl_wq 2026-03-17 17:24 ` Yanjun.Zhu @ 2026-03-17 19:03 ` Leon Romanovsky 2026-03-17 19:31 ` Yanjun.Zhu 0 siblings, 1 reply; 15+ messages in thread From: Leon Romanovsky @ 2026-03-17 19:03 UTC (permalink / raw) To: Yanjun.Zhu Cc: Marco Crivellari, linux-kernel, linux-rdma, Tejun Heo, Lai Jiangshan, Frederic Weisbecker, Sebastian Andrzej Siewior, Michal Hocko, Zhu Yanjun, Jason Gunthorpe On Tue, Mar 17, 2026 at 10:24:11AM -0700, Yanjun.Zhu wrote: > > On 3/17/26 7:38 AM, Zhu Yanjun wrote: > > 在 2026/3/16 13:13, Leon Romanovsky 写道: > > > On Fri, Mar 13, 2026 at 04:40:23PM +0100, Marco Crivellari wrote: > > > > This patch continues the effort to refactor workqueue APIs, > > > > which has begun > > > > with the changes introducing new workqueues and a new > > > > alloc_workqueue flag: > > > > > > > > commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and > > > > system_dfl_wq") > > > > commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag") > > > > > > > > The point of the refactoring is to eventually alter the default > > > > behavior of > > > > workqueues to become unbound by default so that their workload > > > > placement is > > > > optimized by the scheduler. > > > > > > > > Before that to happen, workqueue users must be converted to the > > > > better named > > > > new workqueues with no intended behaviour changes: > > > > > > > > system_wq -> system_percpu_wq > > > > system_unbound_wq -> system_dfl_wq > > > > > > > > This way the old obsolete workqueues (system_wq, > > > > system_unbound_wq) can be > > > > removed in the future. > > > > > > I recall earlier efforts to replace system workqueues with > > > per‑driver queues, > > > because unloading a driver forces a flush of the entire system > > > workqueue, > > > which is undesirable for overall system behavior. > > > > > > Wouldn't it be better to introduce a local workqueue here and use > > > that instead? > > > > Thanks. 
> > > > 1.The initialization should be: > > > > my_wq = alloc_workqueue("my_driver_queue", WQ_UNBOUND | WQ_MEM_RECLAIM, > > 0); > > if (!my_wq) > > return -ENOMEM; > > > > 2. The Submission should be: > > > > queue_work(my_wq, &my_work); > > > > 3. Destroy should be: > > > > destroy_workqueue() > > > > Thanks, > > Zhu Yanjun > > Hi, Leon > > The diff for a new work queue in rxe is as below. Please review it. I'm not sure that you need second workqueue and destroy_workqueue already does flush_workqueue. There is no need to call it explicitly. Thanks > > > diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c > b/drivers/infiniband/sw/rxe/rxe_odp.c > index bc11b1ec59ac..03199fef47fb 100644 > --- a/drivers/infiniband/sw/rxe/rxe_odp.c > +++ b/drivers/infiniband/sw/rxe/rxe_odp.c > @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd, > work->frags[i].mr = mr; > } > > - queue_work(system_unbound_wq, &work->work); > + rxe_queue_aux_work(&work->work); > > return 0; > > diff --git a/drivers/infiniband/sw/rxe/rxe_task.c > b/drivers/infiniband/sw/rxe/rxe_task.c > index f522820b950c..a2da699b969e 100644 > --- a/drivers/infiniband/sw/rxe/rxe_task.c > +++ b/drivers/infiniband/sw/rxe/rxe_task.c > @@ -6,19 +6,36 @@ > > #include "rxe.h" > > +/* work for rxe_task */ > static struct workqueue_struct *rxe_wq; > > +/* work for other rxe jobs */ > +static struct workqueue_struct *rxe_aux_wq; > + > int rxe_alloc_wq(void) > { > - rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE); > + rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND | WQ_MEM_RECLAIM, > + WQ_MAX_ACTIVE); > if (!rxe_wq) > return -ENOMEM; > > + rxe_aux_wq = alloc_workqueue("rxe_aux_wq", > + WQ_UNBOUND | WQ_MEM_RECLAIM, WQ_MAX_ACTIVE); > + if (!rxe_aux_wq) { > + destroy_workqueue(rxe_wq); > + return -ENOMEM; > + > + } > + > return 0; > } > > void rxe_destroy_wq(void) > { > + flush_workqueue(rxe_aux_wq); > + destroy_workqueue(rxe_aux_wq); > + > + flush_workqueue(rxe_wq); > destroy_workqueue(rxe_wq); > 
} > > @@ -254,6 +271,14 @@ void rxe_sched_task(struct rxe_task *task) > spin_unlock_irqrestore(&task->lock, flags); > } > > +/* rxe_wq for rxe tasks. rxe_aux_wq for other rxe jobs. > + */ > +void rxe_queue_aux_work(struct work_struct *work) > +{ > + WARN_ON_ONCE(!rxe_aux_wq); > + queue_work(rxe_aux_wq, work); > +} > + > /* rxe_disable/enable_task are only called from > * rxe_modify_qp in process context. Task is moved > * to the drained state by do_task. > diff --git a/drivers/infiniband/sw/rxe/rxe_task.h > b/drivers/infiniband/sw/rxe/rxe_task.h > index a8c9a77b6027..e1c0a34808b4 100644 > --- a/drivers/infiniband/sw/rxe/rxe_task.h > +++ b/drivers/infiniband/sw/rxe/rxe_task.h > @@ -36,6 +36,7 @@ int rxe_alloc_wq(void); > > void rxe_destroy_wq(void); > > +void rxe_queue_aux_work(struct work_struct *work); > /* > * init rxe_task structure > * qp => parameter to pass to func > > Zhu Yanjun > > > > > > > > > Thanks > > > > > > > > > > > Link: > > > > https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/ > > > > Suggested-by: Tejun Heo <tj@kernel.org> > > > > Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> > > > > --- > > > > drivers/infiniband/sw/rxe/rxe_odp.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c > > > > b/drivers/infiniband/sw/rxe/rxe_odp.c > > > > index bc11b1ec59ac..d440c8cbaea5 100644 > > > > --- a/drivers/infiniband/sw/rxe/rxe_odp.c > > > > +++ b/drivers/infiniband/sw/rxe/rxe_odp.c > > > > @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct > > > > ib_pd *ibpd, > > > > work->frags[i].mr = mr; > > > > } > > > > - queue_work(system_unbound_wq, &work->work); > > > > + queue_work(system_dfl_wq, &work->work); > > > > return 0; > > > > -- > > > > 2.53.0 > > > > > > > ^ permalink raw reply [flat|nested] 15+ messages in thread
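[Editor's note: Leon's observation can be shown directly. destroy_workqueue() drains the queue itself before freeing it, so the explicit flush_workqueue() calls in the proposed rxe_destroy_wq() are redundant and the teardown reduces to the following sketch, reusing the rxe_wq/rxe_aux_wq names from the diff above.]

```c
void rxe_destroy_wq(void)
{
	/* destroy_workqueue() drains all queued and in-flight work
	 * internally; no separate flush_workqueue() is needed.
	 */
	destroy_workqueue(rxe_aux_wq);
	destroy_workqueue(rxe_wq);
}
```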
* Re: [PATCH] RDMA/rxe: Replace use of system_unbound_wq with system_dfl_wq
  2026-03-17 19:03         ` Leon Romanovsky
@ 2026-03-17 19:31           ` Yanjun.Zhu
  2026-03-17 20:15             ` Yanjun.Zhu
  0 siblings, 1 reply; 15+ messages in thread
From: Yanjun.Zhu @ 2026-03-17 19:31 UTC (permalink / raw)
To: Leon Romanovsky, Zhu Yanjun
Cc: Marco Crivellari, linux-kernel, linux-rdma, Tejun Heo,
	Lai Jiangshan, Frederic Weisbecker, Sebastian Andrzej Siewior,
	Michal Hocko, Zhu Yanjun, Jason Gunthorpe

On 3/17/26 12:03 PM, Leon Romanovsky wrote:
> On Tue, Mar 17, 2026 at 10:24:11AM -0700, Yanjun.Zhu wrote:
>> On 3/17/26 7:38 AM, Zhu Yanjun wrote:
>>> On 2026/3/16 13:13, Leon Romanovsky wrote:
>>>> On Fri, Mar 13, 2026 at 04:40:23PM +0100, Marco Crivellari wrote:
>>>>> This patch continues the effort to refactor workqueue APIs, which
>>>>> has begun with the changes introducing new workqueues and a new
>>>>> alloc_workqueue flag:
>>>>>
>>>>> commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
>>>>> commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
>>>>>
>>>>> The point of the refactoring is to eventually alter the default
>>>>> behavior of workqueues to become unbound by default so that their
>>>>> workload placement is optimized by the scheduler.
>>>>>
>>>>> Before that can happen, workqueue users must be converted to the
>>>>> better-named new workqueues with no intended behaviour changes:
>>>>>
>>>>> system_wq -> system_percpu_wq
>>>>> system_unbound_wq -> system_dfl_wq
>>>>>
>>>>> This way the old obsolete workqueues (system_wq, system_unbound_wq)
>>>>> can be removed in the future.
>>>> I recall earlier efforts to replace system workqueues with per-driver
>>>> queues, because unloading a driver forces a flush of the entire system
>>>> workqueue, which is undesirable for overall system behavior.
>>>>
>>>> Wouldn't it be better to introduce a local workqueue here and use
>>>> that instead?
>>> Thanks.
>>>
>>> 1. The initialization should be:
>>>
>>> my_wq = alloc_workqueue("my_driver_queue", WQ_UNBOUND | WQ_MEM_RECLAIM,
>>>                         0);
>>> if (!my_wq)
>>>         return -ENOMEM;
>>>
>>> 2. The submission should be:
>>>
>>> queue_work(my_wq, &my_work);
>>>
>>> 3. Destroy should be:
>>>
>>> destroy_workqueue()
>>>
>>> Thanks,
>>> Zhu Yanjun
>> Hi, Leon
>>
>> The diff for a new work queue in rxe is as below. Please review it.
> I'm not sure that you need a second workqueue, and destroy_workqueue()
> already does flush_workqueue(). There is no need to call it explicitly.

flush_workqueue() can be removed.

The second workqueue was introduced because rxe_wq is heavily utilized
by QP tasks. The additional workqueue helps offload and distribute the
workload, preventing rxe_wq from becoming a bottleneck.

If you believe the workload on rxe_wq is not significant, I can simplify
the design by removing the second workqueue and using rxe_wq for all
work items instead.

Zhu Yanjun

> Thanks
>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
>> index bc11b1ec59ac..03199fef47fb 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_odp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
>> @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd,
>>  		work->frags[i].mr = mr;
>>  	}
>>
>> -	queue_work(system_unbound_wq, &work->work);
>> +	rxe_queue_aux_work(&work->work);
>>
>>  	return 0;
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
>> index f522820b950c..a2da699b969e 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_task.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_task.c
>> @@ -6,19 +6,36 @@
>>
>>  #include "rxe.h"
>>
>> +/* work for rxe_task */
>>  static struct workqueue_struct *rxe_wq;
>>
>> +/* work for other rxe jobs */
>> +static struct workqueue_struct *rxe_aux_wq;
>> +
>>  int rxe_alloc_wq(void)
>>  {
>> -	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
>> +	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND | WQ_MEM_RECLAIM,
>> +				 WQ_MAX_ACTIVE);
>>  	if (!rxe_wq)
>>  		return -ENOMEM;
>>
>> +	rxe_aux_wq = alloc_workqueue("rxe_aux_wq",
>> +				     WQ_UNBOUND | WQ_MEM_RECLAIM, WQ_MAX_ACTIVE);
>> +	if (!rxe_aux_wq) {
>> +		destroy_workqueue(rxe_wq);
>> +		return -ENOMEM;
>> +	}
>> +
>>  	return 0;
>>  }
>>
>>  void rxe_destroy_wq(void)
>>  {
>> +	flush_workqueue(rxe_aux_wq);
>> +	destroy_workqueue(rxe_aux_wq);
>> +
>> +	flush_workqueue(rxe_wq);
>>  	destroy_workqueue(rxe_wq);
>>  }
>>
>> @@ -254,6 +271,14 @@ void rxe_sched_task(struct rxe_task *task)
>>  	spin_unlock_irqrestore(&task->lock, flags);
>>  }
>>
>> +/* rxe_wq for rxe tasks. rxe_aux_wq for other rxe jobs.
>> + */
>> +void rxe_queue_aux_work(struct work_struct *work)
>> +{
>> +	WARN_ON_ONCE(!rxe_aux_wq);
>> +	queue_work(rxe_aux_wq, work);
>> +}
>> +
>>  /* rxe_disable/enable_task are only called from
>>   * rxe_modify_qp in process context. Task is moved
>>   * to the drained state by do_task.
>> diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
>> index a8c9a77b6027..e1c0a34808b4 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_task.h
>> +++ b/drivers/infiniband/sw/rxe/rxe_task.h
>> @@ -36,6 +36,7 @@ int rxe_alloc_wq(void);
>>
>>  void rxe_destroy_wq(void);
>>
>> +void rxe_queue_aux_work(struct work_struct *work);
>>  /*
>>   * init rxe_task structure
>>   * qp => parameter to pass to func
>>
>> Zhu Yanjun
>>
>>>> Thanks
>>>>
>>>>> Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
>>>>> Suggested-by: Tejun Heo <tj@kernel.org>
>>>>> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
>>>>> ---
>>>>>  drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
>>>>> index bc11b1ec59ac..d440c8cbaea5 100644
>>>>> --- a/drivers/infiniband/sw/rxe/rxe_odp.c
>>>>> +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
>>>>> @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd,
>>>>>  		work->frags[i].mr = mr;
>>>>>  	}
>>>>> -	queue_work(system_unbound_wq, &work->work);
>>>>> +	queue_work(system_dfl_wq, &work->work);
>>>>>  	return 0;
>>>>> --
>>>>> 2.53.0
* Re: [PATCH] RDMA/rxe: Replace use of system_unbound_wq with system_dfl_wq
  2026-03-17 19:31           ` Yanjun.Zhu
@ 2026-03-17 20:15             ` Yanjun.Zhu
  0 siblings, 0 replies; 15+ messages in thread
From: Yanjun.Zhu @ 2026-03-17 20:15 UTC (permalink / raw)
To: Leon Romanovsky, Zhu Yanjun
Cc: Marco Crivellari, linux-kernel, linux-rdma, Tejun Heo,
	Lai Jiangshan, Frederic Weisbecker, Sebastian Andrzej Siewior,
	Michal Hocko, Zhu Yanjun, Jason Gunthorpe

On 3/17/26 12:31 PM, Yanjun.Zhu wrote:
>
> On 3/17/26 12:03 PM, Leon Romanovsky wrote:
>> On Tue, Mar 17, 2026 at 10:24:11AM -0700, Yanjun.Zhu wrote:
>>> On 3/17/26 7:38 AM, Zhu Yanjun wrote:
>>>> On 2026/3/16 13:13, Leon Romanovsky wrote:
>>>>> On Fri, Mar 13, 2026 at 04:40:23PM +0100, Marco Crivellari wrote:
>>>>>> This patch continues the effort to refactor workqueue APIs, which
>>>>>> has begun with the changes introducing new workqueues and a new
>>>>>> alloc_workqueue flag:
>>>>>>
>>>>>> commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
>>>>>> commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
>>>>>>
>>>>>> The point of the refactoring is to eventually alter the default
>>>>>> behavior of workqueues to become unbound by default so that their
>>>>>> workload placement is optimized by the scheduler.
>>>>>>
>>>>>> Before that can happen, workqueue users must be converted to the
>>>>>> better-named new workqueues with no intended behaviour changes:
>>>>>>
>>>>>> system_wq -> system_percpu_wq
>>>>>> system_unbound_wq -> system_dfl_wq
>>>>>>
>>>>>> This way the old obsolete workqueues (system_wq, system_unbound_wq)
>>>>>> can be removed in the future.
>>>>> I recall earlier efforts to replace system workqueues with per-driver
>>>>> queues, because unloading a driver forces a flush of the entire system
>>>>> workqueue, which is undesirable for overall system behavior.
>>>>>
>>>>> Wouldn't it be better to introduce a local workqueue here and use
>>>>> that instead?
>>>> Thanks.
>>>>
>>>> 1. The initialization should be:
>>>>
>>>> my_wq = alloc_workqueue("my_driver_queue", WQ_UNBOUND | WQ_MEM_RECLAIM,
>>>>                         0);
>>>> if (!my_wq)
>>>>         return -ENOMEM;
>>>>
>>>> 2. The submission should be:
>>>>
>>>> queue_work(my_wq, &my_work);
>>>>
>>>> 3. Destroy should be:
>>>>
>>>> destroy_workqueue()
>>>>
>>>> Thanks,
>>>> Zhu Yanjun
>>> Hi, Leon
>>>
>>> The diff for a new work queue in rxe is as below. Please review it.
>> I'm not sure that you need a second workqueue, and destroy_workqueue()
>> already does flush_workqueue(). There is no need to call it explicitly.
> flush_workqueue() can be removed.
>
> The second workqueue was introduced because rxe_wq is heavily utilized
> by QP tasks.
>
> The additional workqueue helps offload and distribute the workload,
> preventing rxe_wq from becoming a bottleneck.
>
> If you believe the workload on rxe_wq is not significant, I can simplify
> the design by removing the second workqueue and using rxe_wq for all
> work items instead.
>
> Zhu Yanjun

Hi, Leon

The latest commit is as below:

diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index bc11b1ec59ac..98092dcc1870 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd,
 		work->frags[i].mr = mr;
 	}
 
-	queue_work(system_unbound_wq, &work->work);
+	rxe_queue_work(&work->work);
 
 	return 0;
 
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index f522820b950c..0131829b5641 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -6,11 +6,13 @@
 
 #include "rxe.h"
 
+/* work for rxe_task */
 static struct workqueue_struct *rxe_wq;
 
 int rxe_alloc_wq(void)
 {
-	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
+	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND | WQ_MEM_RECLAIM,
+				 WQ_MAX_ACTIVE);
 	if (!rxe_wq)
 		return -ENOMEM;
 
@@ -254,6 +256,13 @@ void rxe_sched_task(struct rxe_task *task)
 	spin_unlock_irqrestore(&task->lock, flags);
 }
 
+/* Helper to queue auxiliary tasks into rxe_wq.
+ */
+void rxe_queue_work(struct work_struct *work)
+{
+	queue_work(rxe_wq, work);
+}
+
 /* rxe_disable/enable_task are only called from
  * rxe_modify_qp in process context. Task is moved
  * to the drained state by do_task.
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index a8c9a77b6027..60c085cc11a7 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -36,6 +36,7 @@ int rxe_alloc_wq(void);
 
 void rxe_destroy_wq(void);
 
+void rxe_queue_work(struct work_struct *work);
 /*
  * init rxe_task structure
  * qp => parameter to pass to func

>
>> Thanks
>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
>>> index bc11b1ec59ac..03199fef47fb 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_odp.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
>>> @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd,
>>>  		work->frags[i].mr = mr;
>>>  	}
>>>
>>> -	queue_work(system_unbound_wq, &work->work);
>>> +	rxe_queue_aux_work(&work->work);
>>>
>>>  	return 0;
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
>>> index f522820b950c..a2da699b969e 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_task.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_task.c
>>> @@ -6,19 +6,36 @@
>>>
>>>  #include "rxe.h"
>>>
>>> +/* work for rxe_task */
>>>  static struct workqueue_struct *rxe_wq;
>>>
>>> +/* work for other rxe jobs */
>>> +static struct workqueue_struct *rxe_aux_wq;
>>> +
>>>  int rxe_alloc_wq(void)
>>>  {
>>> -	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
>>> +	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND | WQ_MEM_RECLAIM,
>>> +				 WQ_MAX_ACTIVE);
>>>  	if (!rxe_wq)
>>>  		return -ENOMEM;
>>>
>>> +	rxe_aux_wq = alloc_workqueue("rxe_aux_wq",
>>> +				     WQ_UNBOUND | WQ_MEM_RECLAIM, WQ_MAX_ACTIVE);
>>> +	if (!rxe_aux_wq) {
>>> +		destroy_workqueue(rxe_wq);
>>> +		return -ENOMEM;
>>> +	}
>>> +
>>>  	return 0;
>>>  }
>>>
>>>  void rxe_destroy_wq(void)
>>>  {
>>> +	flush_workqueue(rxe_aux_wq);
>>> +	destroy_workqueue(rxe_aux_wq);
>>> +
>>> +	flush_workqueue(rxe_wq);
>>>  	destroy_workqueue(rxe_wq);
>>>  }
>>>
>>> @@ -254,6 +271,14 @@ void rxe_sched_task(struct rxe_task *task)
>>>  	spin_unlock_irqrestore(&task->lock, flags);
>>>  }
>>>
>>> +/* rxe_wq for rxe tasks. rxe_aux_wq for other rxe jobs.
>>> + */
>>> +void rxe_queue_aux_work(struct work_struct *work)
>>> +{
>>> +	WARN_ON_ONCE(!rxe_aux_wq);
>>> +	queue_work(rxe_aux_wq, work);
>>> +}
>>> +
>>>  /* rxe_disable/enable_task are only called from
>>>   * rxe_modify_qp in process context. Task is moved
>>>   * to the drained state by do_task.
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
>>> index a8c9a77b6027..e1c0a34808b4 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_task.h
>>> +++ b/drivers/infiniband/sw/rxe/rxe_task.h
>>> @@ -36,6 +36,7 @@ int rxe_alloc_wq(void);
>>>
>>>  void rxe_destroy_wq(void);
>>>
>>> +void rxe_queue_aux_work(struct work_struct *work);
>>>  /*
>>>   * init rxe_task structure
>>>   * qp => parameter to pass to func
>>>
>>> Zhu Yanjun
>>>
>>>>> Thanks
>>>>>
>>>>>> Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
>>>>>> Suggested-by: Tejun Heo <tj@kernel.org>
>>>>>> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
>>>>>> ---
>>>>>>  drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
>>>>>> index bc11b1ec59ac..d440c8cbaea5 100644
>>>>>> --- a/drivers/infiniband/sw/rxe/rxe_odp.c
>>>>>> +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
>>>>>> @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd,
>>>>>>  		work->frags[i].mr = mr;
>>>>>>  	}
>>>>>> -	queue_work(system_unbound_wq, &work->work);
>>>>>> +	queue_work(system_dfl_wq, &work->work);
>>>>>>  	return 0;
>>>>>> --
>>>>>> 2.53.0
end of thread, other threads:[~2026-03-18 15:08 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-13 15:40 [PATCH] RDMA/rxe: Replace use of system_unbound_wq with system_dfl_wq Marco Crivellari
2026-03-13 17:49 ` yanjun.zhu
2026-03-16 20:13 ` Leon Romanovsky
2026-03-17 14:32   ` Marco Crivellari
2026-03-17 16:24     ` Leon Romanovsky
2026-03-18  8:34       ` Marco Crivellari
2026-03-18 12:20         ` Marco Crivellari
2026-03-18 14:47           ` Zhu Yanjun
2026-03-18 15:02             ` Leon Romanovsky
2026-03-18 15:08               ` Marco Crivellari
2026-03-17 14:38 ` Zhu Yanjun
2026-03-17 17:24   ` Yanjun.Zhu
2026-03-17 19:03     ` Leon Romanovsky
2026-03-17 19:31       ` Yanjun.Zhu
2026-03-17 20:15         ` Yanjun.Zhu