From: Dmitry Bogdanov <d.bogdanov@yadro.com>
To: Paul Dagnelie <paul.dagnelie@perforce.com>
Cc: "target-devel@vger.kernel.org" <target-devel@vger.kernel.org>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"mlombard@redhat.com" <mlombard@redhat.com>,
David Mendez <david.mendez@perforce.com>
Subject: Re: Leak of tpg->np_login_sem, possibly due to connection interruptions
Date: Fri, 9 Aug 2024 10:39:58 +0300
Message-ID: <20240809073958.GA30598@yadro.com>
In-Reply-To: <SJ0PR20MB5136A763BB1792A1FDECABD780BA2@SJ0PR20MB5136.namprd20.prod.outlook.com>
Hi Paul,
>
> I've done some more digging into this. I still would really appreciate any advice that folks have on how to root cause and fix this bug.
>
> I have access to a core file from a system that was taken while the system was suffering from this issue. In that core dump, we can see that the thread in __transport_wait_for_tasks is waiting for the LUN_RESET command to complete. This led me to realize that in the syslog output, the LUN_RESET message that occurred when the issue first happened is different from the other LUN_RESET commands I see: We never get the "LUN_RESET: TMR for [iblock] Complete" message. That led me to look for the thread that is blocked in processing the LUN_RESET command. That thread's stack trace looks like this:
>
> 0xffff9416b0fa2080 UNINTERRUPTIBLE 4
> __schedule+0x2bd
> ...
> target_put_cmd_and_wait+0x5a
> core_tmr_drain_state_list
> core_tmr_lun_reset+0x4e3
> target_tmr_work+0xd1
> ...
>
> The command *that* thread is waiting for has a t_state of TRANSPORT_WRITE_PENDING, and its transport_state is CMD_T_ABORTED. However, it still has a cmd_kref value of 2, which is why the LUN_RESET command can't proceed. It looks like it's a write command (execute_cmd is sbc_execute_rw and data_direction is DMA_TO_DEVICE). I'm still investigating further to try to understand how this state of affairs could occur. Any insight or information anyone could provide would be greatly appreciated.
5.15 is too old a kernel for iSCSI; plenty of patches that fix hanging
commands have been merged since then.
To start with, you definitely need this patchset:
https://lore.kernel.org/all/20230319015620.96006-1-michael.christie@oracle.com/
BR,
Dmitry