From: Leon Romanovsky <leon@kernel.org>
To: "yanjun.zhu" <yanjun.zhu@linux.dev>
Cc: Gui-Dong Han <hanguidong02@gmail.com>,
zyjzyj2000@gmail.com, jgg@ziepe.ca, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org, baijiaju1990@gmail.com,
stable@vger.kernel.org
Subject: Re: [PATCH] RDMA/rxe: Fix race in do_task() when draining
Date: Thu, 18 Sep 2025 12:58:44 +0300 [thread overview]
Message-ID: <20250918095844.GD10800@unreal> (raw)
In-Reply-To: <a321729d-f8a1-4901-ae9d-f08339b5093b@linux.dev>
On Wed, Sep 17, 2025 at 12:30:56PM -0700, yanjun.zhu wrote:
> On 9/17/25 3:06 AM, Gui-Dong Han wrote:
> > When do_task() exhausts its RXE_MAX_ITERATIONS budget, it unconditionally
>
> From the source code, it will check ret value, then set it to
> TASK_STATE_IDLE, not unconditionally.
>
> > sets the task state to TASK_STATE_IDLE to reschedule. This overwrites
> > the TASK_STATE_DRAINING state that may have been concurrently set by
> > rxe_cleanup_task() or rxe_disable_task().
>
> From the source code, there is a spin lock to protect the state. It will not
> make race condition.
>
> >
> > This race condition breaks the cleanup and disable logic, which expects
> > the task to stop processing new work. The cleanup code may proceed while
> > do_task() reschedules itself, leading to a potential use-after-free.
> >
>
> Can you post the call trace when this problem occurred?
>
> Hi, Jason && Leon
>
> Please comment on this problem.
The idea to recheck task->state looks correct to me, otherwise we overwrite it unconditionally.
However I would write this patch slightly different (without cont = 1):
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 6f8f353e95838..2ff5d7cc0a933 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -132,8 +132,10 @@ static void do_task(struct rxe_task *task)
* yield the cpu and reschedule the task
*/
if (!ret) {
- task->state = TASK_STATE_IDLE;
- resched = 1;
+ if (task->state != TASK_STATE_DRAINING) {
+ task->state = TASK_STATE_IDLE;
+ resched = 1;
+ }
goto exit;
}
@@ -151,7 +153,6 @@ static void do_task(struct rxe_task *task)
break;
case TASK_STATE_DRAINING:
- task->state = TASK_STATE_DRAINED;
break;
default:
(END)
>
> Thanks a lot.
> Yanjun.Zhu
>
> > This bug was introduced during the migration from tasklets to workqueues,
> > where the special handling for the draining case was lost.
> >
> > Fix this by restoring the original behavior. If the state is
> > TASK_STATE_DRAINING when iterations are exhausted, continue the loop by
> > setting cont to 1. This allows new iterations to finish the remaining
> > work and reach the switch statement, which properly transitions the
> > state to TASK_STATE_DRAINED and stops the task as intended.
> >
> > Fixes: 9b4b7c1f9f54 ("RDMA/rxe: Add workqueue support for rxe tasks")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com>
> > ---
> > drivers/infiniband/sw/rxe/rxe_task.c | 8 ++++++--
> > 1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
> > index 6f8f353e9583..f522820b950c 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_task.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_task.c
> > @@ -132,8 +132,12 @@ static void do_task(struct rxe_task *task)
> > * yield the cpu and reschedule the task
> > */
> > if (!ret) {
> > - task->state = TASK_STATE_IDLE;
> > - resched = 1;
> > + if (task->state != TASK_STATE_DRAINING) {
> > + task->state = TASK_STATE_IDLE;
> > + resched = 1;
> > + } else {
> > + cont = 1;
> > + }
> > goto exit;
> > }
>
next prev parent reply other threads:[~2025-09-18 9:58 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-17 10:06 [PATCH] RDMA/rxe: Fix race in do_task() when draining Gui-Dong Han
2025-09-17 19:30 ` yanjun.zhu
2025-09-18 2:21 ` Gui-Dong Han
2025-09-18 16:31 ` Yanjun.Zhu
2025-09-19 3:15 ` Gui-Dong Han
2025-09-18 9:58 ` Leon Romanovsky [this message]
2025-09-18 12:02 ` Gui-Dong Han
2025-09-18 12:10 ` Leon Romanovsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250918095844.GD10800@unreal \
--to=leon@kernel.org \
--cc=baijiaju1990@gmail.com \
--cc=hanguidong02@gmail.com \
--cc=jgg@ziepe.ca \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=stable@vger.kernel.org \
--cc=yanjun.zhu@linux.dev \
--cc=zyjzyj2000@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.