public inbox for linux-rdma@vger.kernel.org
* [PATCH 1/1] RDMA/rxe: Use a dedicated and robust workqueue for RXE tasks
@ 2026-03-18  2:57 Zhu Yanjun
  2026-03-18 14:53 ` Leon Romanovsky
  0 siblings, 1 reply; 5+ messages in thread
From: Zhu Yanjun @ 2026-03-18  2:57 UTC
  To: zyjzyj2000, jgg, leon, linux-rdma, yanjun.zhu

Currently, the RXE driver uses the system-wide 'system_unbound_wq' for
auxiliary tasks like ODP prefetching. This can lead to interference
from other system services and lacks guaranteed forward progress
under memory pressure.

Queue all of these tasks on the driver-specific 'rxe_wq' instead.

Suggested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
 drivers/infiniband/sw/rxe/rxe_odp.c  |  2 +-
 drivers/infiniband/sw/rxe/rxe_task.c | 10 +++++++++-
 drivers/infiniband/sw/rxe/rxe_task.h |  1 +
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index bc11b1ec59ac..98092dcc1870 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd,
 		work->frags[i].mr = mr;
 	}
 
-	queue_work(system_unbound_wq, &work->work);
+	rxe_queue_work(&work->work);
 
 	return 0;
 
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index f522820b950c..4385137eb4d7 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -10,7 +10,8 @@ static struct workqueue_struct *rxe_wq;
 
 int rxe_alloc_wq(void)
 {
-	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
+	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND | WQ_MEM_RECLAIM,
+				WQ_MAX_ACTIVE);
 	if (!rxe_wq)
 		return -ENOMEM;
 
@@ -254,6 +255,13 @@ void rxe_sched_task(struct rxe_task *task)
 	spin_unlock_irqrestore(&task->lock, flags);
 }
 
+/* Helper to queue auxiliary tasks into rxe_wq.
+ */
+void rxe_queue_work(struct work_struct *work)
+{
+	queue_work(rxe_wq, work);
+}
+
 /* rxe_disable/enable_task are only called from
  * rxe_modify_qp in process context. Task is moved
  * to the drained state by do_task.
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index a8c9a77b6027..60c085cc11a7 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -36,6 +36,7 @@ int rxe_alloc_wq(void);
 
 void rxe_destroy_wq(void);
 
+void rxe_queue_work(struct work_struct *work);
 /*
  * init rxe_task structure
  *	qp  => parameter to pass to func
-- 
2.43.0



* Re: [PATCH 1/1] RDMA/rxe: Use a dedicated and robust workqueue for RXE tasks
  2026-03-18  2:57 [PATCH 1/1] RDMA/rxe: Use a dedicated and robust workqueue for RXE tasks Zhu Yanjun
@ 2026-03-18 14:53 ` Leon Romanovsky
  2026-03-18 15:34   ` Zhu Yanjun
  0 siblings, 1 reply; 5+ messages in thread
From: Leon Romanovsky @ 2026-03-18 14:53 UTC
  To: Zhu Yanjun; +Cc: zyjzyj2000, jgg, linux-rdma

On Wed, Mar 18, 2026 at 03:57:39AM +0100, Zhu Yanjun wrote:
> Currently, the RXE driver uses the system-wide 'system_unbound_wq' for
> auxiliary tasks like ODP prefetching. This can lead to interference
> from other system services and lacks guaranteed forward progress
> under memory pressure.
> 
> Currently make all the tasks queue into the driver-specific 'rxe_wq'.
> 
> Suggested-by: Leon Romanovsky <leon@kernel.org>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> ---
>  drivers/infiniband/sw/rxe/rxe_odp.c  |  2 +-
>  drivers/infiniband/sw/rxe/rxe_task.c | 10 +++++++++-
>  drivers/infiniband/sw/rxe/rxe_task.h |  1 +
>  3 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
> index bc11b1ec59ac..98092dcc1870 100644
> --- a/drivers/infiniband/sw/rxe/rxe_odp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
> @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd,
>  		work->frags[i].mr = mr;
>  	}
>  
> -	queue_work(system_unbound_wq, &work->work);
> +	rxe_queue_work(&work->work);
>  
>  	return 0;
>  
> diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
> index f522820b950c..4385137eb4d7 100644
> --- a/drivers/infiniband/sw/rxe/rxe_task.c
> +++ b/drivers/infiniband/sw/rxe/rxe_task.c
> @@ -10,7 +10,8 @@ static struct workqueue_struct *rxe_wq;
>  
>  int rxe_alloc_wq(void)
>  {
> -	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
> +	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND | WQ_MEM_RECLAIM,

Why did you add WQ_MEM_RECLAIM flag? rxe_ib_advise_mr_prefetch() doesn't
perform any memory reclaim.

Thanks


* Re: [PATCH 1/1] RDMA/rxe: Use a dedicated and robust workqueue for RXE tasks
  2026-03-18 14:53 ` Leon Romanovsky
@ 2026-03-18 15:34   ` Zhu Yanjun
  2026-03-18 15:41     ` Leon Romanovsky
  0 siblings, 1 reply; 5+ messages in thread
From: Zhu Yanjun @ 2026-03-18 15:34 UTC
  To: Leon Romanovsky, yanjun.zhu@linux.dev; +Cc: zyjzyj2000, jgg, linux-rdma


On 2026/3/18 7:53, Leon Romanovsky wrote:
> On Wed, Mar 18, 2026 at 03:57:39AM +0100, Zhu Yanjun wrote:
>> Currently, the RXE driver uses the system-wide 'system_unbound_wq' for
>> auxiliary tasks like ODP prefetching. This can lead to interference
>> from other system services and lacks guaranteed forward progress
>> under memory pressure.
>>
>> Currently make all the tasks queue into the driver-specific 'rxe_wq'.
>>
>> Suggested-by: Leon Romanovsky <leon@kernel.org>
>> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
>> ---
>>   drivers/infiniband/sw/rxe/rxe_odp.c  |  2 +-
>>   drivers/infiniband/sw/rxe/rxe_task.c | 10 +++++++++-
>>   drivers/infiniband/sw/rxe/rxe_task.h |  1 +
>>   3 files changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
>> index bc11b1ec59ac..98092dcc1870 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_odp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
>> @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd,
>>   		work->frags[i].mr = mr;
>>   	}
>>   
>> -	queue_work(system_unbound_wq, &work->work);
>> +	rxe_queue_work(&work->work);
>>   
>>   	return 0;
>>   
>> diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
>> index f522820b950c..4385137eb4d7 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_task.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_task.c
>> @@ -10,7 +10,8 @@ static struct workqueue_struct *rxe_wq;
>>   
>>   int rxe_alloc_wq(void)
>>   {
>> -	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
>> +	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND | WQ_MEM_RECLAIM,
> Why did you add WQ_MEM_RECLAIM flag? rxe_ib_advise_mr_prefetch() doesn't
> perform any memory reclaim.

You are correct that rxe_ib_advise_mr_prefetch() does not directly call
memory reclaim functions.

However, the WQ_MEM_RECLAIM flag was added to prevent circular
dependencies during low-memory conditions. Since rxe handles memory
regions that may be part of the storage or network stack, the workqueue
must be able to make progress even when the system is under extreme
memory pressure. Without this flag, if the kernel attempts to reclaim
memory and that reclaim process depends on an RDMA operation being
processed by this workqueue, the system could deadlock because the
workqueue might be unable to spawn a new worker thread.

By setting WQ_MEM_RECLAIM, we ensure that a rescuer thread is
pre-allocated, guaranteeing that prefetch and MR-related tasks can
complete and allow the memory management subsystem to finish its
reclaim cycle.


Zhu Yanjun

>
> Thanks

-- 
Best Regards,
Yanjun.Zhu



* Re: [PATCH 1/1] RDMA/rxe: Use a dedicated and robust workqueue for RXE tasks
  2026-03-18 15:34   ` Zhu Yanjun
@ 2026-03-18 15:41     ` Leon Romanovsky
  2026-03-18 16:10       ` Yanjun.Zhu
  0 siblings, 1 reply; 5+ messages in thread
From: Leon Romanovsky @ 2026-03-18 15:41 UTC
  To: Zhu Yanjun; +Cc: zyjzyj2000, jgg, linux-rdma

On Wed, Mar 18, 2026 at 08:34:42AM -0700, Zhu Yanjun wrote:
> 
> On 2026/3/18 7:53, Leon Romanovsky wrote:
> > On Wed, Mar 18, 2026 at 03:57:39AM +0100, Zhu Yanjun wrote:
> > > Currently, the RXE driver uses the system-wide 'system_unbound_wq' for
> > > auxiliary tasks like ODP prefetching. This can lead to interference
> > > from other system services and lacks guaranteed forward progress
> > > under memory pressure.
> > > 
> > > Currently make all the tasks queue into the driver-specific 'rxe_wq'.
> > > 
> > > Suggested-by: Leon Romanovsky <leon@kernel.org>
> > > Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> > > ---
> > >   drivers/infiniband/sw/rxe/rxe_odp.c  |  2 +-
> > >   drivers/infiniband/sw/rxe/rxe_task.c | 10 +++++++++-
> > >   drivers/infiniband/sw/rxe/rxe_task.h |  1 +
> > >   3 files changed, 11 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
> > > index bc11b1ec59ac..98092dcc1870 100644
> > > --- a/drivers/infiniband/sw/rxe/rxe_odp.c
> > > +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
> > > @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd,
> > >   		work->frags[i].mr = mr;
> > >   	}
> > > -	queue_work(system_unbound_wq, &work->work);
> > > +	rxe_queue_work(&work->work);
> > >   	return 0;
> > > diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
> > > index f522820b950c..4385137eb4d7 100644
> > > --- a/drivers/infiniband/sw/rxe/rxe_task.c
> > > +++ b/drivers/infiniband/sw/rxe/rxe_task.c
> > > @@ -10,7 +10,8 @@ static struct workqueue_struct *rxe_wq;
> > >   int rxe_alloc_wq(void)
> > >   {
> > > -	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
> > > +	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND | WQ_MEM_RECLAIM,
> > Why did you add WQ_MEM_RECLAIM flag? rxe_ib_advise_mr_prefetch() doesn't
> > perform any memory reclaim.
> 
> You are correct that rxe_ib_advise_mr_prefetch() does not directly call
> memory reclaim functions.
>
> However, the WQ_MEM_RECLAIM flag was added to prevent circular
> dependencies during low-memory conditions. Since rxe handles memory
> regions that may be part of the storage or network stack, the workqueue
> must be able to make progress even when the system is under extreme
> memory pressure. Without this flag, if the kernel attempts to reclaim
> memory and that reclaim process depends on an RDMA operation being
> processed by this workqueue, the system could deadlock because the
> workqueue might be unable to spawn a new worker thread.
>
> By setting WQ_MEM_RECLAIM, we ensure that a rescuer thread is
> pre-allocated, guaranteeing that prefetch and MR-related tasks can
> complete and allow the memory management subsystem to finish its
> reclaim cycle.

Zhu,

Please avoid relying on AI when answering questions on the mailing list.
The response you received is correct in general, but it does not apply
to RXE. You should set the WQ_MEM_RECLAIM flag only when the workqueue
handlers free memory. RXE does the opposite in rxe_ib_advise_mr_prefetch().

Thanks
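
The rule stated above (set WQ_MEM_RECLAIM only when the workqueue
handlers free memory) can be sketched as a small userspace C model.
This is an illustration under stated assumptions, not kernel code: the
MODEL_* constants, model_pick_wq_flags() and model_demo() are
hypothetical names invented here, and the bit values are stand-ins
rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in flag bits for illustration only. The real WQ_UNBOUND and
 * WQ_MEM_RECLAIM are defined by the kernel's workqueue header and
 * their values are not reproduced here.
 */
enum model_wq_flags {
	MODEL_WQ_UNBOUND     = 1 << 0,
	MODEL_WQ_MEM_RECLAIM = 1 << 1,
};

/* Hypothetical helper: choose flags for an unbound driver workqueue.
 * handlers_free_memory is true only if the work items queued on it
 * free memory, i.e. memory reclaim may have to wait for them.
 */
static unsigned int model_pick_wq_flags(bool handlers_free_memory)
{
	unsigned int flags = MODEL_WQ_UNBOUND;

	if (handlers_free_memory)
		flags |= MODEL_WQ_MEM_RECLAIM;

	return flags;
}

/* Usage: a prefetch-style handler does not free memory, so the
 * modeled queue keeps only the unbound flag.
 */
static void model_demo(void)
{
	assert(model_pick_wq_flags(false) == MODEL_WQ_UNBOUND);
	assert(model_pick_wq_flags(true) ==
	       (MODEL_WQ_UNBOUND | MODEL_WQ_MEM_RECLAIM));
}
```

Read against this thread, the model suggests that a queue serving only
the ODP prefetch work would keep plain WQ_UNBOUND, as in the
alloc_workqueue() line the patch removes.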

> 
> 
> Zhu Yanjun
> 
> > 
> > Thanks
> 
> -- 
> Best Regards,
> Yanjun.Zhu
> 
> 


* Re: [PATCH 1/1] RDMA/rxe: Use a dedicated and robust workqueue for RXE tasks
  2026-03-18 15:41     ` Leon Romanovsky
@ 2026-03-18 16:10       ` Yanjun.Zhu
  0 siblings, 0 replies; 5+ messages in thread
From: Yanjun.Zhu @ 2026-03-18 16:10 UTC
  To: Leon Romanovsky, Zhu Yanjun; +Cc: zyjzyj2000, jgg, linux-rdma


On 3/18/26 8:41 AM, Leon Romanovsky wrote:
> On Wed, Mar 18, 2026 at 08:34:42AM -0700, Zhu Yanjun wrote:
>> On 2026/3/18 7:53, Leon Romanovsky wrote:
>>> On Wed, Mar 18, 2026 at 03:57:39AM +0100, Zhu Yanjun wrote:
>>>> Currently, the RXE driver uses the system-wide 'system_unbound_wq' for
>>>> auxiliary tasks like ODP prefetching. This can lead to interference
>>>> from other system services and lacks guaranteed forward progress
>>>> under memory pressure.
>>>>
>>>> Currently make all the tasks queue into the driver-specific 'rxe_wq'.
>>>>
>>>> Suggested-by: Leon Romanovsky <leon@kernel.org>
>>>> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
>>>> ---
>>>>    drivers/infiniband/sw/rxe/rxe_odp.c  |  2 +-
>>>>    drivers/infiniband/sw/rxe/rxe_task.c | 10 +++++++++-
>>>>    drivers/infiniband/sw/rxe/rxe_task.h |  1 +
>>>>    3 files changed, 11 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
>>>> index bc11b1ec59ac..98092dcc1870 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_odp.c
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
>>>> @@ -545,7 +545,7 @@ static int rxe_ib_advise_mr_prefetch(struct ib_pd *ibpd,
>>>>    		work->frags[i].mr = mr;
>>>>    	}
>>>> -	queue_work(system_unbound_wq, &work->work);
>>>> +	rxe_queue_work(&work->work);
>>>>    	return 0;
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
>>>> index f522820b950c..4385137eb4d7 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_task.c
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_task.c
>>>> @@ -10,7 +10,8 @@ static struct workqueue_struct *rxe_wq;
>>>>    int rxe_alloc_wq(void)
>>>>    {
>>>> -	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
>>>> +	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND | WQ_MEM_RECLAIM,
>>> Why did you add WQ_MEM_RECLAIM flag? rxe_ib_advise_mr_prefetch() doesn't
>>> perform any memory reclaim.
>> You are correct that rxe_ib_advise_mr_prefetch() does not directly call
>> memory reclaim functions.
>>
>> However, the WQ_MEM_RECLAIM flag was added to prevent circular
>> dependencies during low-memory conditions. Since rxe handles memory
>> regions that may be part of the storage or network stack, the workqueue
>> must be able to make progress even when the system is under extreme
>> memory pressure. Without this flag, if the kernel attempts to reclaim
>> memory and that reclaim process depends on an RDMA operation being
>> processed by this workqueue, the system could deadlock because the
>> workqueue might be unable to spawn a new worker thread.
>>
>> By setting WQ_MEM_RECLAIM, we ensure that a rescuer thread is
>> pre-allocated, guaranteeing that prefetch and MR-related tasks can
>> complete and allow the memory management subsystem to finish its
>> reclaim cycle.
> Zhu,
>
> Please avoid relying on AI when answering ML-related questions. The
> response you received is broadly correct, but it is incorrect for RXE.
> You should set the WQ_MEM_RECLAIM flag only when the workqueue handlers

OK. Thanks a lot.

Zhu Yanjun

> free memory. RXE does the opposite in rxe_ib_advise_mr_prefetch().
>
> Thanks
>
>>
>> Zhu Yanjun
>>
>>> Thanks
>> -- 
>> Best Regards,
>> Yanjun.Zhu
>>
>>

