* [PATCH 1/1] RDMA/rxe: Use a dedicated and robust workqueue for RXE tasks
@ 2026-03-18 2:57 Zhu Yanjun
2026-03-18 14:53 ` Leon Romanovsky
0 siblings, 1 reply; 5+ messages in thread
From: Zhu Yanjun @ 2026-03-18 2:57 UTC (permalink / raw)
To: zyjzyj2000, jgg, leon, linux-rdma, yanjun.zhu
Currently, the RXE driver uses the system-wide 'system_unbound_wq' for
auxiliary tasks like ODP prefetching. This can lead to interference
from other system services and lacks guaranteed forward progress
under memory pressure.
Queue all of these tasks on the driver-specific 'rxe_wq' instead.
Suggested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
drivers/infiniband/sw/rxe/rxe_task.c | 10 +++++++++-
drivers/infiniband/sw/rxe/rxe_task.h | 1 +
3 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index bc11b1ec59ac..98092dcc1870 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd,
work->frags[i].mr = mr;
}
- queue_work(system_unbound_wq, &work->work);
+ rxe_queue_work(&work->work);
return 0;
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index f522820b950c..4385137eb4d7 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -10,7 +10,8 @@ static struct workqueue_struct *rxe_wq;
int rxe_alloc_wq(void)
{
- rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
+ rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND | WQ_MEM_RECLAIM,
+ WQ_MAX_ACTIVE);
if (!rxe_wq)
return -ENOMEM;
@@ -254,6 +255,13 @@ void rxe_sched_task(struct rxe_task *task)
spin_unlock_irqrestore(&task->lock, flags);
}
+/* Helper to queue auxiliary tasks into rxe_wq.
+ */
+void rxe_queue_work(struct work_struct *work)
+{
+ queue_work(rxe_wq, work);
+}
+
/* rxe_disable/enable_task are only called from
* rxe_modify_qp in process context. Task is moved
* to the drained state by do_task.
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index a8c9a77b6027..60c085cc11a7 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -36,6 +36,7 @@ int rxe_alloc_wq(void);
void rxe_destroy_wq(void);
+void rxe_queue_work(struct work_struct *work);
/*
* init rxe_task structure
* qp => parameter to pass to func
--
2.43.0
* Re: [PATCH 1/1] RDMA/rxe: Use a dedicated and robust workqueue for RXE tasks
2026-03-18 2:57 [PATCH 1/1] RDMA/rxe: Use a dedicated and robust workqueue for RXE tasks Zhu Yanjun
@ 2026-03-18 14:53 ` Leon Romanovsky
2026-03-18 15:34 ` Zhu Yanjun
0 siblings, 1 reply; 5+ messages in thread
From: Leon Romanovsky @ 2026-03-18 14:53 UTC (permalink / raw)
To: Zhu Yanjun; +Cc: zyjzyj2000, jgg, linux-rdma
On Wed, Mar 18, 2026 at 03:57:39AM +0100, Zhu Yanjun wrote:
> Currently, the RXE driver uses the system-wide 'system_unbound_wq' for
> auxiliary tasks like ODP prefetching. This can lead to interference
> from other system services and lacks guaranteed forward progress
> under memory pressure.
>
> Currently make all the tasks queue into the driver-specific 'rxe_wq'.
>
> Suggested-by: Leon Romanovsky <leon@kernel.org>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> ---
> drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
> drivers/infiniband/sw/rxe/rxe_task.c | 10 +++++++++-
> drivers/infiniband/sw/rxe/rxe_task.h | 1 +
> 3 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
> index bc11b1ec59ac..98092dcc1870 100644
> --- a/drivers/infiniband/sw/rxe/rxe_odp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
> @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd,
> work->frags[i].mr = mr;
> }
>
> - queue_work(system_unbound_wq, &work->work);
> + rxe_queue_work(&work->work);
>
> return 0;
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
> index f522820b950c..4385137eb4d7 100644
> --- a/drivers/infiniband/sw/rxe/rxe_task.c
> +++ b/drivers/infiniband/sw/rxe/rxe_task.c
> @@ -10,7 +10,8 @@ static struct workqueue_struct *rxe_wq;
>
> int rxe_alloc_wq(void)
> {
> - rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
> + rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND | WQ_MEM_RECLAIM,
Why did you add WQ_MEM_RECLAIM flag? rxe_ib_advise_mr_prefetch() doesn't
perform any memory reclaim.
Thanks
* Re: [PATCH 1/1] RDMA/rxe: Use a dedicated and robust workqueue for RXE tasks
2026-03-18 14:53 ` Leon Romanovsky
@ 2026-03-18 15:34 ` Zhu Yanjun
2026-03-18 15:41 ` Leon Romanovsky
0 siblings, 1 reply; 5+ messages in thread
From: Zhu Yanjun @ 2026-03-18 15:34 UTC (permalink / raw)
To: Leon Romanovsky, yanjun.zhu@linux.dev; +Cc: zyjzyj2000, jgg, linux-rdma
On 2026/3/18 7:53, Leon Romanovsky wrote:
> On Wed, Mar 18, 2026 at 03:57:39AM +0100, Zhu Yanjun wrote:
>> Currently, the RXE driver uses the system-wide 'system_unbound_wq' for
>> auxiliary tasks like ODP prefetching. This can lead to interference
>> from other system services and lacks guaranteed forward progress
>> under memory pressure.
>>
>> Currently make all the tasks queue into the driver-specific 'rxe_wq'.
>>
>> Suggested-by: Leon Romanovsky <leon@kernel.org>
>> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
>> ---
>> drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
>> drivers/infiniband/sw/rxe/rxe_task.c | 10 +++++++++-
>> drivers/infiniband/sw/rxe/rxe_task.h | 1 +
>> 3 files changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
>> index bc11b1ec59ac..98092dcc1870 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_odp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
>> @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd,
>> work->frags[i].mr = mr;
>> }
>>
>> - queue_work(system_unbound_wq, &work->work);
>> + rxe_queue_work(&work->work);
>>
>> return 0;
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
>> index f522820b950c..4385137eb4d7 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_task.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_task.c
>> @@ -10,7 +10,8 @@ static struct workqueue_struct *rxe_wq;
>>
>> int rxe_alloc_wq(void)
>> {
>> - rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
>> + rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND | WQ_MEM_RECLAIM,
> Why did you add WQ_MEM_RECLAIM flag? rxe_ib_advise_mr_prefetch() doesn't
> perform any memory reclaim.
You are correct that rxe_ib_advise_mr_prefetch() does not directly call
memory reclaim functions.

However, the WQ_MEM_RECLAIM flag was added to prevent circular
dependencies during low-memory conditions.

Since rxe handles memory regions that may be part of the storage or
network stack, the workqueue must be able to make progress even when the
system is under extreme memory pressure. Without this flag, if the kernel
attempts to reclaim memory and that reclaim process depends on an RDMA
operation being processed by this workqueue, the system could deadlock
because the workqueue might be unable to spawn a new worker thread.

By setting WQ_MEM_RECLAIM, we ensure that a rescuer thread is
pre-allocated, guaranteeing that prefetch and MR-related tasks can
complete and allow the memory management subsystem to finish its reclaim
cycle.
Zhu Yanjun
>
> Thanks
--
Best Regards,
Yanjun.Zhu
* Re: [PATCH 1/1] RDMA/rxe: Use a dedicated and robust workqueue for RXE tasks
2026-03-18 15:34 ` Zhu Yanjun
@ 2026-03-18 15:41 ` Leon Romanovsky
2026-03-18 16:10 ` Yanjun.Zhu
0 siblings, 1 reply; 5+ messages in thread
From: Leon Romanovsky @ 2026-03-18 15:41 UTC (permalink / raw)
To: Zhu Yanjun; +Cc: zyjzyj2000, jgg, linux-rdma
On Wed, Mar 18, 2026 at 08:34:42AM -0700, Zhu Yanjun wrote:
>
> On 2026/3/18 7:53, Leon Romanovsky wrote:
> > On Wed, Mar 18, 2026 at 03:57:39AM +0100, Zhu Yanjun wrote:
> > > Currently, the RXE driver uses the system-wide 'system_unbound_wq' for
> > > auxiliary tasks like ODP prefetching. This can lead to interference
> > > from other system services and lacks guaranteed forward progress
> > > under memory pressure.
> > >
> > > Currently make all the tasks queue into the driver-specific 'rxe_wq'.
> > >
> > > Suggested-by: Leon Romanovsky <leon@kernel.org>
> > > Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> > > ---
> > > drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
> > > drivers/infiniband/sw/rxe/rxe_task.c | 10 +++++++++-
> > > drivers/infiniband/sw/rxe/rxe_task.h | 1 +
> > > 3 files changed, 11 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
> > > index bc11b1ec59ac..98092dcc1870 100644
> > > --- a/drivers/infiniband/sw/rxe/rxe_odp.c
> > > +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
> > > @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd,
> > > work->frags[i].mr = mr;
> > > }
> > > - queue_work(system_unbound_wq, &work->work);
> > > + rxe_queue_work(&work->work);
> > > return 0;
> > > diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
> > > index f522820b950c..4385137eb4d7 100644
> > > --- a/drivers/infiniband/sw/rxe/rxe_task.c
> > > +++ b/drivers/infiniband/sw/rxe/rxe_task.c
> > > @@ -10,7 +10,8 @@ static struct workqueue_struct *rxe_wq;
> > > int rxe_alloc_wq(void)
> > > {
> > > - rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
> > > + rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND | WQ_MEM_RECLAIM,
> > Why did you add WQ_MEM_RECLAIM flag? rxe_ib_advise_mr_prefetch() doesn't
> > perform any memory reclaim.
>
> You are correct that rxe_ib_advise_mr_prefetch() does not directly call
> memory reclaim functions.
>
> However, the WQ_MEM_RECLAIM flag was added to prevent circular
> dependencies during low-memory conditions.
>
> Since rxe handles memory regions that may be part of the storage or
> network stack, the workqueue must be able to make progress even when the
> system is under extreme memory pressure. Without this flag, if the kernel
> attempts to reclaim memory and that reclaim process depends on an RDMA
> operation being processed by this workqueue, the system could deadlock
> because the workqueue might be unable to spawn a new worker thread.
>
> By setting WQ_MEM_RECLAIM, we ensure that a rescuer thread is
> pre-allocated, guaranteeing that prefetch and MR-related tasks can
> complete and allow the memory management subsystem to finish its reclaim
> cycle.
Zhu,
Please avoid relying on AI when answering ML-related questions. The
response you received is broadly correct, but it is incorrect for RXE.
You should set the WQ_MEM_RECLAIM flag only when the workqueue handlers
free memory. RXE does the opposite in rxe_ib_advise_mr_prefetch().
Thanks
>
>
> Zhu Yanjun
>
> >
> > Thanks
>
> --
> Best Regards,
> Yanjun.Zhu
>
>
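As an aside, the rescuer mechanism discussed in the messages above can be
illustrated with a toy userspace model. This is a minimal sketch, not kernel
code: struct toy_wq, toy_spawn_worker() and toy_can_make_progress() are
hypothetical names invented for illustration. The model assumes only that
creating a worker needs memory, and that a workqueue created with
WQ_MEM_RECLAIM keeps one execution context in reserve.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_wq {
	/* Models WQ_MEM_RECLAIM: one worker reserved when the queue is created. */
	bool has_rescuer;
};

/* Models worker creation, which needs memory and so fails under pressure. */
static bool toy_spawn_worker(bool memory_pressure)
{
	return !memory_pressure;
}

/* Returns true if a queued work item is guaranteed an execution context. */
static bool toy_can_make_progress(const struct toy_wq *wq, bool memory_pressure)
{
	assert(wq != NULL);

	if (toy_spawn_worker(memory_pressure))
		return true;		/* a new worker could be created */

	return wq->has_rescuer;		/* last resort: the reserved rescuer */
}
```

In this model the rescuer changes the outcome only when no worker can be
created, which matters when memory reclaim is waiting on the queued work.
That lines up with the review comment above: a prefetch handler is not
something reclaim waits on, so the extra flag would presumably buy nothing
there.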
* Re: [PATCH 1/1] RDMA/rxe: Use a dedicated and robust workqueue for RXE tasks
2026-03-18 15:41 ` Leon Romanovsky
@ 2026-03-18 16:10 ` Yanjun.Zhu
0 siblings, 0 replies; 5+ messages in thread
From: Yanjun.Zhu @ 2026-03-18 16:10 UTC (permalink / raw)
To: Leon Romanovsky, Zhu Yanjun; +Cc: zyjzyj2000, jgg, linux-rdma
On 3/18/26 8:41 AM, Leon Romanovsky wrote:
> On Wed, Mar 18, 2026 at 08:34:42AM -0700, Zhu Yanjun wrote:
>> On 2026/3/18 7:53, Leon Romanovsky wrote:
>>> On Wed, Mar 18, 2026 at 03:57:39AM +0100, Zhu Yanjun wrote:
>>>> Currently, the RXE driver uses the system-wide 'system_unbound_wq' for
>>>> auxiliary tasks like ODP prefetching. This can lead to interference
>>>> from other system services and lacks guaranteed forward progress
>>>> under memory pressure.
>>>>
>>>> Currently make all the tasks queue into the driver-specific 'rxe_wq'.
>>>>
>>>> Suggested-by: Leon Romanovsky <leon@kernel.org>
>>>> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
>>>> ---
>>>> drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
>>>> drivers/infiniband/sw/rxe/rxe_task.c | 10 +++++++++-
>>>> drivers/infiniband/sw/rxe/rxe_task.h | 1 +
>>>> 3 files changed, 11 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
>>>> index bc11b1ec59ac..98092dcc1870 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_odp.c
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
>>>> @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd,
>>>> work->frags[i].mr = mr;
>>>> }
>>>> - queue_work(system_unbound_wq, &work->work);
>>>> + rxe_queue_work(&work->work);
>>>> return 0;
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
>>>> index f522820b950c..4385137eb4d7 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_task.c
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_task.c
>>>> @@ -10,7 +10,8 @@ static struct workqueue_struct *rxe_wq;
>>>> int rxe_alloc_wq(void)
>>>> {
>>>> - rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
>>>> + rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND | WQ_MEM_RECLAIM,
>>> Why did you add WQ_MEM_RECLAIM flag? rxe_ib_advise_mr_prefetch() doesn't
>>> perform any memory reclaim.
>> You are correct that rxe_ib_advise_mr_prefetch() does not directly call
>> memory reclaim functions.
>>
>> However, the WQ_MEM_RECLAIM flag was added to prevent circular
>> dependencies during low-memory conditions.
>>
>> Since rxe handles memory regions that may be part of the storage or
>> network stack, the workqueue must be able to make progress even when the
>> system is under extreme memory pressure. Without this flag, if the kernel
>> attempts to reclaim memory and that reclaim process depends on an RDMA
>> operation being processed by this workqueue, the system could deadlock
>> because the workqueue might be unable to spawn a new worker thread.
>>
>> By setting WQ_MEM_RECLAIM, we ensure that a rescuer thread is
>> pre-allocated, guaranteeing that prefetch and MR-related tasks can
>> complete and allow the memory management subsystem to finish its reclaim
>> cycle.
> Zhu,
>
> Please avoid relying on AI when answering ML-related questions. The
> response you received is broadly correct, but it is incorrect for RXE.
> You should set the WQ_MEM_RECLAIM flag only when the workqueue handlers
OK. Thanks a lot.
Zhu Yanjun
> free memory. RXE does the opposite in rxe_ib_advise_mr_prefetch().
>
> Thanks
>
>>
>> Zhu Yanjun
>>
>>> Thanks
>> --
>> Best Regards,
>> Yanjun.Zhu
>>
>>