* Hung task panic as part of NFS RDMA Disconnect due to possible bug on 6.2.0-34-generic client
@ 2023-11-30 5:22 Sukruth Sridharan (he/him)
2023-11-30 7:36 ` Bagas Sanjaya
2023-11-30 14:08 ` Chuck Lever III
0 siblings, 2 replies; 6+ messages in thread
From: Sukruth Sridharan (he/him) @ 2023-11-30 5:22 UTC (permalink / raw)
To: linux-nfs
I notice the following hung task panic on 6.2.0-34 kernel during RDMA disconnect
[Wed Nov 1 08:03:54 2023] INFO: task kworker/u16:5:2274646 blocked
for more than 120 seconds.
[Wed Nov 1 08:03:55 2023] Tainted: G W OE
6.2.0-34-generic #34-Ubuntu
[Wed Nov 1 08:03:55 2023] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Nov 1 08:03:55 2023] task:kworker/u16:5 state:D stack:0
pid:2274646 ppid:2 flags:0x00004000
[Wed Nov 1 08:03:55 2023] Workqueue: xprtiod xprt_autoclose [sunrpc]
[Wed Nov 1 08:03:55 2023] Call Trace:
[Wed Nov 1 08:03:55 2023] <TASK>
[Wed Nov 1 08:03:55 2023] __schedule+0x2aa/0x610
[Wed Nov 1 08:03:55 2023] schedule+0x63/0x110
[Wed Nov 1 08:03:55 2023] schedule_timeout+0x157/0x170
[Wed Nov 1 08:03:55 2023] wait_for_completion+0x88/0x150
[Wed Nov 1 08:03:55 2023] rpcrdma_xprt_disconnect+0x33f/0x350 [rpcrdma]
[Wed Nov 1 08:03:55 2023] xprt_rdma_close+0x12/0x40 [rpcrdma]
[Wed Nov 1 08:03:55 2023] xprt_autoclose+0x5c/0x120 [sunrpc]
[Wed Nov 1 08:03:55 2023] process_one_work+0x225/0x430
[Wed Nov 1 08:03:55 2023] worker_thread+0x50/0x3e0
[Wed Nov 1 08:03:55 2023] ? __pfx_worker_thread+0x10/0x10
[Wed Nov 1 08:03:55 2023] kthread+0xe9/0x110
[Wed Nov 1 08:03:55 2023] ? __pfx_kthread+0x10/0x10
[Wed Nov 1 08:03:55 2023] ret_from_fork+0x2c/0x50
[Wed Nov 1 08:03:55 2023] </TASK>
The flow which induced the bug is as follows:
1. Client initiates connection
2. Server hands off the response to the first RPC on the connection to
the NIC (Mellanox ConnectX-5)
3. NIC tries to send the response around 6 times and fails the response with RNR
4. Client issues disconnect (possibly because it didn't receive a response)
5. Server cleans up the connection state
6. Client runs into the above panic as part of disconnect while draining the IOs
It looks like re_receiving is set only in rpcrdma_post_recvs, and the
reason why it wouldn't be reset is if memory-region allocation code
fails.
That is possible if disconnect on the client somehow blocks allocation.
void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, int needed, bool temp)
{
// ... (some initialization code)
if (atomic_inc_return(&ep->re_receiving) > 1)
goto out;
// ... (some allocation code)
if (!wr) // <<<<<<<<<<<<<<<<<< PROBLEM HERE >>>>>>>>>>>>>>>>>>>
goto out;
// ... (post recv code, and some error handling)
if (atomic_dec_return(&ep->re_receiving) > 0)
complete(&ep->re_done);
out:
trace_xprtrdma_post_recvs(r_xprt, count);
ep->re_receive_count += count;
return;
}
static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
{
struct rpcrdma_ep *ep = r_xprt->rx_ep;
struct rdma_cm_id *id = ep->re_id;
/* Wait for rpcrdma_post_recvs() to leave its critical
* section.
*/
if (atomic_inc_return(&ep->re_receiving) > 1) //
<<<<<<<<<<<<<<<<<<< This is not reset, so wait gets stuck
>>>>>>>>>>>>>>>>>
wait_for_completion(&ep->re_done);
/* Flush Receives, then wait for deferred Reply work
* to complete.
*/
ib_drain_rq(id->qp);
/* Deferred Reply processing might have scheduled
* local invalidations.
*/
ib_drain_sq(id->qp);
rpcrdma_ep_put(ep);
}
Can you help conclude if the above theory around the bug being in the
client code is right? If not, can you help with steps/data points
required to debug this further?
Thanks in advance.
Sukruth
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Hung task panic as part of NFS RDMA Disconnect due to possible bug on 6.2.0-34-generic client
2023-11-30 5:22 Hung task panic as part of NFS RDMA Disconnect due to possible bug on 6.2.0-34-generic client Sukruth Sridharan (he/him)
@ 2023-11-30 7:36 ` Bagas Sanjaya
2023-11-30 12:15 ` Sukruth Sridharan (he/him)
2023-11-30 12:23 ` Sukruth Sridharan (he/him)
2023-11-30 14:08 ` Chuck Lever III
1 sibling, 2 replies; 6+ messages in thread
From: Bagas Sanjaya @ 2023-11-30 7:36 UTC (permalink / raw)
To: Sukruth Sridharan (he/him), Linux Network File System, Linux RDMA
Cc: Chuck Lever, Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, Saeed Mahameed, Leon Romanovsky
[-- Attachment #1: Type: text/plain, Size: 3975 bytes --]
On Thu, Nov 30, 2023 at 10:52:59AM +0530, Sukruth Sridharan (he/him) wrote:
> I notice the following hung task panic on 6.2.0-34 kernel during RDMA disconnect
>
> [Wed Nov 1 08:03:54 2023] INFO: task kworker/u16:5:2274646 blocked
> for more than 120 seconds.
> [Wed Nov 1 08:03:55 2023] Tainted: G W OE
> 6.2.0-34-generic #34-Ubuntu
> [Wed Nov 1 08:03:55 2023] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Wed Nov 1 08:03:55 2023] task:kworker/u16:5 state:D stack:0
> pid:2274646 ppid:2 flags:0x00004000
> [Wed Nov 1 08:03:55 2023] Workqueue: xprtiod xprt_autoclose [sunrpc]
> [Wed Nov 1 08:03:55 2023] Call Trace:
> [Wed Nov 1 08:03:55 2023] <TASK>
> [Wed Nov 1 08:03:55 2023] __schedule+0x2aa/0x610
> [Wed Nov 1 08:03:55 2023] schedule+0x63/0x110
> [Wed Nov 1 08:03:55 2023] schedule_timeout+0x157/0x170
> [Wed Nov 1 08:03:55 2023] wait_for_completion+0x88/0x150
> [Wed Nov 1 08:03:55 2023] rpcrdma_xprt_disconnect+0x33f/0x350 [rpcrdma]
> [Wed Nov 1 08:03:55 2023] xprt_rdma_close+0x12/0x40 [rpcrdma]
> [Wed Nov 1 08:03:55 2023] xprt_autoclose+0x5c/0x120 [sunrpc]
> [Wed Nov 1 08:03:55 2023] process_one_work+0x225/0x430
> [Wed Nov 1 08:03:55 2023] worker_thread+0x50/0x3e0
> [Wed Nov 1 08:03:55 2023] ? __pfx_worker_thread+0x10/0x10
> [Wed Nov 1 08:03:55 2023] kthread+0xe9/0x110
> [Wed Nov 1 08:03:55 2023] ? __pfx_kthread+0x10/0x10
> [Wed Nov 1 08:03:55 2023] ret_from_fork+0x2c/0x50
> [Wed Nov 1 08:03:55 2023] </TASK>
>
> The flow which induced the bug is as follows:
> 1. Client initiates connection
> 2. Server hands off the response to the first RPC on the connection to
> the NIC (Mellanox ConnectX-5)
> 3. NIC tries to send the response around 6 times and fails the response with RNR
> 4. Client issues disconnect (possibly because it didn't receive a response)
> 5. Server cleans up the connection state
> 6. Client runs into the above panic as part of disconnect while draining the IOs
>
> It looks like re_receiving is set only in rpcrdma_post_recvs, and the
> reason why it wouldn't be reset is if memory-region allocation code
> fails.
> That is possible if disconnect on the client somehow blocks allocation.
>
> void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, int needed, bool temp)
> {
> // ... (some initialization code)
>
> if (atomic_inc_return(&ep->re_receiving) > 1)
> goto out;
>
> // ... (some allocation code)
>
> if (!wr) // <<<<<<<<<<<<<<<<<< PROBLEM HERE >>>>>>>>>>>>>>>>>>>
> goto out;
>
> // ... (post recv code, and some error handling)
>
> if (atomic_dec_return(&ep->re_receiving) > 0)
> complete(&ep->re_done);
>
> out:
> trace_xprtrdma_post_recvs(r_xprt, count);
> ep->re_receive_count += count;
> return;
> }
>
> static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
> {
> struct rpcrdma_ep *ep = r_xprt->rx_ep;
> struct rdma_cm_id *id = ep->re_id;
>
> /* Wait for rpcrdma_post_recvs() to leave its critical
> * section.
> */
> if (atomic_inc_return(&ep->re_receiving) > 1) //
> <<<<<<<<<<<<<<<<<<< This is not reset, so wait gets stuck
> >>>>>>>>>>>>>>>>>
> wait_for_completion(&ep->re_done);
>
> /* Flush Receives, then wait for deferred Reply work
> * to complete.
> */
> ib_drain_rq(id->qp);
>
> /* Deferred Reply processing might have scheduled
> * local invalidations.
> */
> ib_drain_sq(id->qp);
>
> rpcrdma_ep_put(ep);
> }
>
> Can you help conclude if the above theory around the bug being in the
> client code is right? If not, can you help with steps/data points
> required to debug this further?
>
Can you verify that the bug still occurs with latest vanilla kernel
(currently v6.7-rc3)?
Thanks.
--
An old man doll... just what I always wanted! - Clara
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Hung task panic as part of NFS RDMA Disconnect due to possible bug on 6.2.0-34-generic client
2023-11-30 7:36 ` Bagas Sanjaya
@ 2023-11-30 12:15 ` Sukruth Sridharan (he/him)
2023-11-30 12:17 ` Bagas Sanjaya
2023-11-30 12:23 ` Sukruth Sridharan (he/him)
1 sibling, 1 reply; 6+ messages in thread
From: Sukruth Sridharan (he/him) @ 2023-11-30 12:15 UTC (permalink / raw)
To: Bagas Sanjaya
Cc: Linux Network File System, Linux RDMA, Chuck Lever, Jeff Layton,
Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Saeed Mahameed, Leon Romanovsky
The issue has been seen once in the past few weeks.
Unfortunately, we're yet to see a repro of the same.
We will try to repro it on the latest kernel.
Curious if there's any improvements gone in that you suspect would
have resolved the issue?
Thanks,
Sukruth
On Thu, Nov 30, 2023 at 1:06 PM Bagas Sanjaya <bagasdotme@gmail.com> wrote:
>
> On Thu, Nov 30, 2023 at 10:52:59AM +0530, Sukruth Sridharan (he/him) wrote:
> > I notice the following hung task panic on 6.2.0-34 kernel during RDMA disconnect
> >
> > [Wed Nov 1 08:03:54 2023] INFO: task kworker/u16:5:2274646 blocked
> > for more than 120 seconds.
> > [Wed Nov 1 08:03:55 2023] Tainted: G W OE
> > 6.2.0-34-generic #34-Ubuntu
> > [Wed Nov 1 08:03:55 2023] "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [Wed Nov 1 08:03:55 2023] task:kworker/u16:5 state:D stack:0
> > pid:2274646 ppid:2 flags:0x00004000
> > [Wed Nov 1 08:03:55 2023] Workqueue: xprtiod xprt_autoclose [sunrpc]
> > [Wed Nov 1 08:03:55 2023] Call Trace:
> > [Wed Nov 1 08:03:55 2023] <TASK>
> > [Wed Nov 1 08:03:55 2023] __schedule+0x2aa/0x610
> > [Wed Nov 1 08:03:55 2023] schedule+0x63/0x110
> > [Wed Nov 1 08:03:55 2023] schedule_timeout+0x157/0x170
> > [Wed Nov 1 08:03:55 2023] wait_for_completion+0x88/0x150
> > [Wed Nov 1 08:03:55 2023] rpcrdma_xprt_disconnect+0x33f/0x350 [rpcrdma]
> > [Wed Nov 1 08:03:55 2023] xprt_rdma_close+0x12/0x40 [rpcrdma]
> > [Wed Nov 1 08:03:55 2023] xprt_autoclose+0x5c/0x120 [sunrpc]
> > [Wed Nov 1 08:03:55 2023] process_one_work+0x225/0x430
> > [Wed Nov 1 08:03:55 2023] worker_thread+0x50/0x3e0
> > [Wed Nov 1 08:03:55 2023] ? __pfx_worker_thread+0x10/0x10
> > [Wed Nov 1 08:03:55 2023] kthread+0xe9/0x110
> > [Wed Nov 1 08:03:55 2023] ? __pfx_kthread+0x10/0x10
> > [Wed Nov 1 08:03:55 2023] ret_from_fork+0x2c/0x50
> > [Wed Nov 1 08:03:55 2023] </TASK>
> >
> > The flow which induced the bug is as follows:
> > 1. Client initiates connection
> > 2. Server hands off the response to the first RPC on the connection to
> > the NIC (Mellanox ConnectX-5)
> > 3. NIC tries to send the response around 6 times and fails the response with RNR
> > 4. Client issues disconnect (possibly because it didn't receive a response)
> > 5. Server cleans up the connection state
> > 6. Client runs into the above panic as part of disconnect while draining the IOs
> >
> > It looks like re_receiving is set only in rpcrdma_post_recvs, and the
> > reason why it wouldn't be reset is if memory-region allocation code
> > fails.
> > That is possible if disconnect on the client somehow blocks allocation.
> >
> > void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, int needed, bool temp)
> > {
> > // ... (some initialization code)
> >
> > if (atomic_inc_return(&ep->re_receiving) > 1)
> > goto out;
> >
> > // ... (some allocation code)
> >
> > if (!wr) // <<<<<<<<<<<<<<<<<< PROBLEM HERE >>>>>>>>>>>>>>>>>>>
> > goto out;
> >
> > // ... (post recv code, and some error handling)
> >
> > if (atomic_dec_return(&ep->re_receiving) > 0)
> > complete(&ep->re_done);
> >
> > out:
> > trace_xprtrdma_post_recvs(r_xprt, count);
> > ep->re_receive_count += count;
> > return;
> > }
> >
> > static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
> > {
> > struct rpcrdma_ep *ep = r_xprt->rx_ep;
> > struct rdma_cm_id *id = ep->re_id;
> >
> > /* Wait for rpcrdma_post_recvs() to leave its critical
> > * section.
> > */
> > if (atomic_inc_return(&ep->re_receiving) > 1) //
> > <<<<<<<<<<<<<<<<<<< This is not reset, so wait gets stuck
> > >>>>>>>>>>>>>>>>>
> > wait_for_completion(&ep->re_done);
> >
> > /* Flush Receives, then wait for deferred Reply work
> > * to complete.
> > */
> > ib_drain_rq(id->qp);
> >
> > /* Deferred Reply processing might have scheduled
> > * local invalidations.
> > */
> > ib_drain_sq(id->qp);
> >
> > rpcrdma_ep_put(ep);
> > }
> >
> > Can you help conclude if the above theory around the bug being in the
> > client code is right? If not, can you help with steps/data points
> > required to debug this further?
> >
>
> Can you verify that the bug still occurs with latest vanilla kernel
> (currently v6.7-rc3)?
>
> Thanks.
>
> --
> An old man doll... just what I always wanted! - Clara
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Hung task panic as part of NFS RDMA Disconnect due to possible bug on 6.2.0-34-generic client
2023-11-30 12:15 ` Sukruth Sridharan (he/him)
@ 2023-11-30 12:17 ` Bagas Sanjaya
0 siblings, 0 replies; 6+ messages in thread
From: Bagas Sanjaya @ 2023-11-30 12:17 UTC (permalink / raw)
To: Sukruth Sridharan (he/him)
Cc: Linux Network File System, Linux RDMA, Chuck Lever, Jeff Layton,
Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Saeed Mahameed, Leon Romanovsky
On 11/30/23 19:15, Sukruth Sridharan (he/him) wrote:
> The issue has been seen once in the past few weeks.
> Unfortunately, we're yet to see a repro of the same.
> We will try to repro it on the latest kernel.
> Curious if there's any improvements gone in that you suspect would
> have resolved the issue?
>
Please don't top-post; reply inline with appropriate context instead.
Sorry, I don't know about that question.
Thanks.
--
An old man doll... just what I always wanted! - Clara
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Hung task panic as part of NFS RDMA Disconnect due to possible bug on 6.2.0-34-generic client
2023-11-30 7:36 ` Bagas Sanjaya
2023-11-30 12:15 ` Sukruth Sridharan (he/him)
@ 2023-11-30 12:23 ` Sukruth Sridharan (he/him)
1 sibling, 0 replies; 6+ messages in thread
From: Sukruth Sridharan (he/him) @ 2023-11-30 12:23 UTC (permalink / raw)
To: Bagas Sanjaya
Cc: Linux Network File System, Linux RDMA, Chuck Lever, Jeff Layton,
Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Saeed Mahameed, Leon Romanovsky
On Thu, Nov 30, 2023 at 1:06 PM Bagas Sanjaya <bagasdotme@gmail.com> wrote:
>
> On Thu, Nov 30, 2023 at 10:52:59AM +0530, Sukruth Sridharan (he/him) wrote:
> > I notice the following hung task panic on 6.2.0-34 kernel during RDMA disconnect
> >
> > [Wed Nov 1 08:03:54 2023] INFO: task kworker/u16:5:2274646 blocked
> > for more than 120 seconds.
> > [Wed Nov 1 08:03:55 2023] Tainted: G W OE
> > 6.2.0-34-generic #34-Ubuntu
> > [Wed Nov 1 08:03:55 2023] "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [Wed Nov 1 08:03:55 2023] task:kworker/u16:5 state:D stack:0
> > pid:2274646 ppid:2 flags:0x00004000
> > [Wed Nov 1 08:03:55 2023] Workqueue: xprtiod xprt_autoclose [sunrpc]
> > [Wed Nov 1 08:03:55 2023] Call Trace:
> > [Wed Nov 1 08:03:55 2023] <TASK>
> > [Wed Nov 1 08:03:55 2023] __schedule+0x2aa/0x610
> > [Wed Nov 1 08:03:55 2023] schedule+0x63/0x110
> > [Wed Nov 1 08:03:55 2023] schedule_timeout+0x157/0x170
> > [Wed Nov 1 08:03:55 2023] wait_for_completion+0x88/0x150
> > [Wed Nov 1 08:03:55 2023] rpcrdma_xprt_disconnect+0x33f/0x350 [rpcrdma]
> > [Wed Nov 1 08:03:55 2023] xprt_rdma_close+0x12/0x40 [rpcrdma]
> > [Wed Nov 1 08:03:55 2023] xprt_autoclose+0x5c/0x120 [sunrpc]
> > [Wed Nov 1 08:03:55 2023] process_one_work+0x225/0x430
> > [Wed Nov 1 08:03:55 2023] worker_thread+0x50/0x3e0
> > [Wed Nov 1 08:03:55 2023] ? __pfx_worker_thread+0x10/0x10
> > [Wed Nov 1 08:03:55 2023] kthread+0xe9/0x110
> > [Wed Nov 1 08:03:55 2023] ? __pfx_kthread+0x10/0x10
> > [Wed Nov 1 08:03:55 2023] ret_from_fork+0x2c/0x50
> > [Wed Nov 1 08:03:55 2023] </TASK>
> >
> > The flow which induced the bug is as follows:
> > 1. Client initiates connection
> > 2. Server hands off the response to the first RPC on the connection to
> > the NIC (Mellanox ConnectX-5)
> > 3. NIC tries to send the response around 6 times and fails the response with RNR
> > 4. Client issues disconnect (possibly because it didn't receive a response)
> > 5. Server cleans up the connection state
> > 6. Client runs into the above panic as part of disconnect while draining the IOs
> >
> > It looks like re_receiving is set only in rpcrdma_post_recvs, and the
> > reason why it wouldn't be reset is if memory-region allocation code
> > fails.
> > That is possible if disconnect on the client somehow blocks allocation.
> >
> > void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, int needed, bool temp)
> > {
> > // ... (some initialization code)
> >
> > if (atomic_inc_return(&ep->re_receiving) > 1)
> > goto out;
> >
> > // ... (some allocation code)
> >
> > if (!wr) // <<<<<<<<<<<<<<<<<< PROBLEM HERE >>>>>>>>>>>>>>>>>>>
> > goto out;
> >
> > // ... (post recv code, and some error handling)
> >
> > if (atomic_dec_return(&ep->re_receiving) > 0)
> > complete(&ep->re_done);
> >
> > out:
> > trace_xprtrdma_post_recvs(r_xprt, count);
> > ep->re_receive_count += count;
> > return;
> > }
> >
> > static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
> > {
> > struct rpcrdma_ep *ep = r_xprt->rx_ep;
> > struct rdma_cm_id *id = ep->re_id;
> >
> > /* Wait for rpcrdma_post_recvs() to leave its critical
> > * section.
> > */
> > if (atomic_inc_return(&ep->re_receiving) > 1) //
> > <<<<<<<<<<<<<<<<<<< This is not reset, so wait gets stuck
> > >>>>>>>>>>>>>>>>>
> > wait_for_completion(&ep->re_done);
> >
> > /* Flush Receives, then wait for deferred Reply work
> > * to complete.
> > */
> > ib_drain_rq(id->qp);
> >
> > /* Deferred Reply processing might have scheduled
> > * local invalidations.
> > */
> > ib_drain_sq(id->qp);
> >
> > rpcrdma_ep_put(ep);
> > }
> >
> > Can you help conclude if the above theory around the bug being in the
> > client code is right? If not, can you help with steps/data points
> > required to debug this further?
> >
>
> Can you verify that the bug still occurs with latest vanilla kernel
> (currently v6.7-rc3)?
>
> Thanks.
>
> --
> An old man doll... just what I always wanted! - Clara
The issue has been seen once in the past few weeks.
Unfortunately, we're yet to see a repro of the same.
We will try to repro it on the latest kernel.
Curious if there's any improvements gone in that you suspect would
have resolved the issue?
(Apologies for the top post earlier)
Thanks,
Sukruth
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Hung task panic as part of NFS RDMA Disconnect due to possible bug on 6.2.0-34-generic client
2023-11-30 5:22 Hung task panic as part of NFS RDMA Disconnect due to possible bug on 6.2.0-34-generic client Sukruth Sridharan (he/him)
2023-11-30 7:36 ` Bagas Sanjaya
@ 2023-11-30 14:08 ` Chuck Lever III
1 sibling, 0 replies; 6+ messages in thread
From: Chuck Lever III @ 2023-11-30 14:08 UTC (permalink / raw)
To: Sukruth Sridharan (he/him); +Cc: Linux NFS Mailing List
> On Nov 30, 2023, at 12:22 AM, Sukruth Sridharan (he/him) <susridharan@purestorage.com> wrote:
>
> I notice the following hung task panic on 6.2.0-34 kernel during RDMA disconnect
>
> [Wed Nov 1 08:03:54 2023] INFO: task kworker/u16:5:2274646 blocked
> for more than 120 seconds.
> [Wed Nov 1 08:03:55 2023] Tainted: G W OE
> 6.2.0-34-generic #34-Ubuntu
> [Wed Nov 1 08:03:55 2023] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Wed Nov 1 08:03:55 2023] task:kworker/u16:5 state:D stack:0
> pid:2274646 ppid:2 flags:0x00004000
> [Wed Nov 1 08:03:55 2023] Workqueue: xprtiod xprt_autoclose [sunrpc]
> [Wed Nov 1 08:03:55 2023] Call Trace:
> [Wed Nov 1 08:03:55 2023] <TASK>
> [Wed Nov 1 08:03:55 2023] __schedule+0x2aa/0x610
> [Wed Nov 1 08:03:55 2023] schedule+0x63/0x110
> [Wed Nov 1 08:03:55 2023] schedule_timeout+0x157/0x170
> [Wed Nov 1 08:03:55 2023] wait_for_completion+0x88/0x150
> [Wed Nov 1 08:03:55 2023] rpcrdma_xprt_disconnect+0x33f/0x350 [rpcrdma]
> [Wed Nov 1 08:03:55 2023] xprt_rdma_close+0x12/0x40 [rpcrdma]
> [Wed Nov 1 08:03:55 2023] xprt_autoclose+0x5c/0x120 [sunrpc]
> [Wed Nov 1 08:03:55 2023] process_one_work+0x225/0x430
> [Wed Nov 1 08:03:55 2023] worker_thread+0x50/0x3e0
> [Wed Nov 1 08:03:55 2023] ? __pfx_worker_thread+0x10/0x10
> [Wed Nov 1 08:03:55 2023] kthread+0xe9/0x110
> [Wed Nov 1 08:03:55 2023] ? __pfx_kthread+0x10/0x10
> [Wed Nov 1 08:03:55 2023] ret_from_fork+0x2c/0x50
> [Wed Nov 1 08:03:55 2023] </TASK>
Hi Sukruth -
This isn't a panic, fortunately. It's simply a report that
the task is not making progress. More below...
> The flow which induced the bug is as follows:
> 1. Client initiates connection
> 2. Server hands off the response to the first RPC on the connection to
> the NIC (Mellanox ConnectX-5)
> 3. NIC tries to send the response around 6 times and fails the response with RNR
> 4. Client issues disconnect (possibly because it didn't receive a response)
> 5. Server cleans up the connection state
> 6. Client runs into the above panic as part of disconnect while draining the IOs
>
> It looks like re_receiving is set only in rpcrdma_post_recvs, and the
> reason why it wouldn't be reset is if memory-region allocation code
> fails.
> That is possible if disconnect on the client somehow blocks allocation.
>
> void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, int needed, bool temp)
> {
> // ... (some initialization code)
>
> if (atomic_inc_return(&ep->re_receiving) > 1)
> goto out;
>
> // ... (some allocation code)
>
> if (!wr) // <<<<<<<<<<<<<<<<<< PROBLEM HERE >>>>>>>>>>>>>>>>>>>
> goto out;
>
> // ... (post recv code, and some error handling)
>
> if (atomic_dec_return(&ep->re_receiving) > 0)
> complete(&ep->re_done);
>
> out:
> trace_xprtrdma_post_recvs(r_xprt, count);
> ep->re_receive_count += count;
> return;
> }
>
> static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
> {
> struct rpcrdma_ep *ep = r_xprt->rx_ep;
> struct rdma_cm_id *id = ep->re_id;
>
> /* Wait for rpcrdma_post_recvs() to leave its critical
> * section.
> */
> if (atomic_inc_return(&ep->re_receiving) > 1) //
> <<<<<<<<<<<<<<<<<<< This is not reset, so wait gets stuck
>>>>>>>>>>>>>>>>>>
> wait_for_completion(&ep->re_done);
>
> /* Flush Receives, then wait for deferred Reply work
> * to complete.
> */
> ib_drain_rq(id->qp);
>
> /* Deferred Reply processing might have scheduled
> * local invalidations.
> */
> ib_drain_sq(id->qp);
>
> rpcrdma_ep_put(ep);
> }
>
> Can you help conclude if the above theory around the bug being in the
> client code is right? If not, can you help with steps/data points
> required to debug this further?
Try applying:
895cedc17919 ("xprtrdma: Remap Receive buffers after a reconnect")
--
Chuck Lever
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-11-30 14:08 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-30 5:22 Hung task panic as part of NFS RDMA Disconnect due to possible bug on 6.2.0-34-generic client Sukruth Sridharan (he/him)
2023-11-30 7:36 ` Bagas Sanjaya
2023-11-30 12:15 ` Sukruth Sridharan (he/him)
2023-11-30 12:17 ` Bagas Sanjaya
2023-11-30 12:23 ` Sukruth Sridharan (he/him)
2023-11-30 14:08 ` Chuck Lever III
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.