From: Zhu Lingshan <lszhu-IBi9RG/b67k@public.gmane.org>
To: Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: iscsi_trx going into D state
Date: Tue, 18 Oct 2016 11:06:11 +0800
Message-ID: <4fc72e32-26fb-96bd-8a0d-814eef712b43@suse.com>
In-Reply-To: <CAANLjFoh+C8QE=qcPKqUUG3SnH2EMmS7DWZ5D4AD7yWMxoK0Zw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Hi Robert,
I think the reason you cannot log out of the targets is that iscsi_np
is in D state. The patches seem to have fixed something, but it looks
like more than one code path can trigger similar hangs. As you can
see, there are several different call stacks, and I am still working
on it. In my environment I actually see another call stack that is
not listed in your mail.
Thanks,
BR
Zhu Lingshan
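Since several different call stacks are involved, it may help to capture the stack of every blocked task in one pass. The `ps aux | grep D` filter used in this thread also matches any line that merely contains a capital D somewhere. A minimal sketch that checks only the STAT column, assuming procps `ps`, root privileges, and a kernel that exposes /proc/<pid>/stack (the function names are hypothetical):

```shell
#!/bin/sh
# Sketch: print the kernel stack of every task in uninterruptible sleep (D).
# Assumes procps `ps`, root privileges, and /proc/<pid>/stack being present.

# Read "pid stat comm" lines on stdin and keep those whose STAT starts with D.
# Matching only the STAT column avoids the false positives of `grep D`,
# which also matches lines that contain a capital D anywhere else.
filter_d_state() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# List D-state tasks and dump each one's kernel stack.
dump_d_stacks() {
    ps -eo pid=,stat=,comm= | filter_d_state | while read -r pid comm; do
        echo "== $pid $comm"
        # The task may have exited, or the file may be unreadable without root.
        cat "/proc/$pid/stack" 2>/dev/null || echo "(stack unavailable)"
    done
}

dump_d_stacks
```

Run it as root and redirect the output to a file; tasks blocked in the same place, such as the kworkers sitting in iser_release_work, then show up as repeated stacks. If /proc/<pid>/stack is not available, `echo w > /proc/sysrq-trigger` (as root) asks the kernel to write all blocked tasks and their stacks to the kernel log, where dmesg shows them.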
On 10/18/2016 03:11 AM, Robert LeBlanc wrote:
> Sorry, hit send too soon.
>
> In addition, on the client we see:
> # ps -aux | grep D | grep kworker
> root 5583 0.0 0.0 0 0 ? D 11:55 0:03 [kworker/11:0]
> root 7721 0.1 0.0 0 0 ? D 12:00 0:04 [kworker/4:25]
> root 10877 0.0 0.0 0 0 ? D 09:27 0:00 [kworker/22:1]
> root 11246 0.0 0.0 0 0 ? D 10:28 0:00 [kworker/30:2]
> root 14034 0.0 0.0 0 0 ? D 12:20 0:02 [kworker/19:15]
> root 14048 0.0 0.0 0 0 ? D 12:20 0:00 [kworker/16:0]
> root 15871 0.0 0.0 0 0 ? D 12:25 0:00 [kworker/13:0]
> root 17442 0.0 0.0 0 0 ? D 12:28 0:00 [kworker/9:1]
> root 17816 0.0 0.0 0 0 ? D 12:30 0:00 [kworker/11:1]
> root 18744 0.0 0.0 0 0 ? D 12:32 0:00 [kworker/10:2]
> root 19060 0.0 0.0 0 0 ? D 12:32 0:00 [kworker/29:0]
> root 21748 0.0 0.0 0 0 ? D 12:40 0:00 [kworker/21:0]
> root 21967 0.0 0.0 0 0 ? D 12:40 0:00 [kworker/22:0]
> root 21978 0.0 0.0 0 0 ? D 12:40 0:00 [kworker/22:2]
> root 22024 0.0 0.0 0 0 ? D 12:40 0:00 [kworker/22:4]
> root 22035 0.0 0.0 0 0 ? D 12:40 0:00 [kworker/22:5]
> root 22060 0.0 0.0 0 0 ? D 12:40 0:00 [kworker/16:1]
> root 22282 0.0 0.0 0 0 ? D 12:41 0:00 [kworker/26:0]
> root 22362 0.0 0.0 0 0 ? D 12:42 0:00 [kworker/18:9]
> root 22426 0.0 0.0 0 0 ? D 12:42 0:00 [kworker/16:3]
> root 23298 0.0 0.0 0 0 ? D 12:43 0:00 [kworker/12:1]
> root 23302 0.0 0.0 0 0 ? D 12:43 0:00 [kworker/12:5]
> root 24264 0.0 0.0 0 0 ? D 12:46 0:00 [kworker/30:1]
> root 24271 0.0 0.0 0 0 ? D 12:46 0:00 [kworker/14:8]
> root 24441 0.0 0.0 0 0 ? D 12:47 0:00 [kworker/9:7]
> root 24443 0.0 0.0 0 0 ? D 12:47 0:00 [kworker/9:9]
> root 25005 0.0 0.0 0 0 ? D 12:48 0:00 [kworker/30:3]
> root 25158 0.0 0.0 0 0 ? D 12:49 0:00 [kworker/9:12]
> root 26382 0.0 0.0 0 0 ? D 12:52 0:00 [kworker/13:2]
> root 26453 0.0 0.0 0 0 ? D 12:52 0:00 [kworker/21:2]
> root 26724 0.0 0.0 0 0 ? D 12:53 0:00 [kworker/19:1]
> root 28400 0.0 0.0 0 0 ? D 05:20 0:00 [kworker/25:1]
> root 29552 0.0 0.0 0 0 ? D 11:40 0:00 [kworker/17:1]
> root 29811 0.0 0.0 0 0 ? D 11:40 0:00 [kworker/7:10]
> root 31903 0.0 0.0 0 0 ? D 11:43 0:00 [kworker/26:1]
>
> And all of the processes have this stack:
> [<ffffffffa0727ed5>] iser_release_work+0x25/0x60 [ib_iser]
> [<ffffffff8109633f>] process_one_work+0x14f/0x400
> [<ffffffff81096bb4>] worker_thread+0x114/0x470
> [<ffffffff8109c6f8>] kthread+0xd8/0xf0
> [<ffffffff8172004f>] ret_from_fork+0x3f/0x70
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> We are not able to log out of the sessions in all cases and have to
> restart the box.
>
> iscsiadm -m session will show messages like:
> iscsiadm: could not read session targetname: 5
> iscsiadm: could not find session info for session100
> iscsiadm: could not read session targetname: 5
> iscsiadm: could not find session info for session101
> iscsiadm: could not read session targetname: 5
> iscsiadm: could not find session info for session103
> ...
>
> I can't find any way to force iscsiadm to clean up these sessions,
> possibly because of the tasks in D state.
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>
>
> On Mon, Oct 17, 2016 at 10:32 AM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org> wrote:
>> Some more info, as we hit this this morning. We have volumes
>> mirrored between two targets; one target was running a kernel with
>> the three patches mentioned in this thread [0][1][2] and the other
>> was running a kernel without them. After a week and a half we decided
>> to get both targets on the same kernel, so we rebooted the
>> non-patched target. Within an hour we saw the iSCSI threads in D
>> state with the same stack traces, so it seems we are not hitting any
>> of the WARN_ON lines. Both iscsi_trx and iscsi_np are in D state, and
>> this time there are two iscsi_trx processes in D state. I don't know
>> whether stale sessions on the clients could be contributing to this
>> issue (the target trying to close non-existent sessions?). This is on
>> 4.4.23. Is there any more debug info we can gather to help with this
>> problem?
>>
>> Thank you,
>> Robert LeBlanc
>>
>> # ps aux | grep D | grep iscsi
>> root 16525 0.0 0.0 0 0 ? D 08:50 0:00 [iscsi_np]
>> root 16614 0.0 0.0 0 0 ? D 08:50 0:00 [iscsi_trx]
>> root 16674 0.0 0.0 0 0 ? D 08:50 0:00 [iscsi_trx]
>>
>> # for i in 16525 16614 16674; do echo $i; cat /proc/$i/stack; done
>> 16525
>> [<ffffffff814f0d5f>] iscsit_stop_session+0x19f/0x1d0
>> [<ffffffff814e2516>] iscsi_check_for_session_reinstatement+0x1e6/0x270
>> [<ffffffff814e4ed0>] iscsi_target_check_for_existing_instances+0x30/0x40
>> [<ffffffff814e5020>] iscsi_target_do_login+0x140/0x640
>> [<ffffffff814e63bc>] iscsi_target_start_negotiation+0x1c/0xb0
>> [<ffffffff814e410b>] iscsi_target_login_thread+0xa9b/0xfc0
>> [<ffffffff8109c748>] kthread+0xd8/0xf0
>> [<ffffffff8172018f>] ret_from_fork+0x3f/0x70
>> [<ffffffffffffffff>] 0xffffffffffffffff
>> 16614
>> [<ffffffff814cca79>] target_wait_for_sess_cmds+0x49/0x1a0
>> [<ffffffffa064692b>] isert_wait_conn+0x1ab/0x2f0 [ib_isert]
>> [<ffffffff814f0ef2>] iscsit_close_connection+0x162/0x870
>> [<ffffffff814df9bf>] iscsit_take_action_for_connection_exit+0x7f/0x100
>> [<ffffffff814f00a0>] iscsi_target_rx_thread+0x5a0/0xe80
>> [<ffffffff8109c748>] kthread+0xd8/0xf0
>> [<ffffffff8172018f>] ret_from_fork+0x3f/0x70
>> [<ffffffffffffffff>] 0xffffffffffffffff
>> 16674
>> [<ffffffff814cca79>] target_wait_for_sess_cmds+0x49/0x1a0
>> [<ffffffffa064692b>] isert_wait_conn+0x1ab/0x2f0 [ib_isert]
>> [<ffffffff814f0ef2>] iscsit_close_connection+0x162/0x870
>> [<ffffffff814df9bf>] iscsit_take_action_for_connection_exit+0x7f/0x100
>> [<ffffffff814f00a0>] iscsi_target_rx_thread+0x5a0/0xe80
>> [<ffffffff8109c748>] kthread+0xd8/0xf0
>> [<ffffffff8172018f>] ret_from_fork+0x3f/0x70
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>>
>> [0] https://www.spinics.net/lists/target-devel/msg13463.html
>> [1] http://marc.info/?l=linux-scsi&m=147282568910535&w=2
>> [2] http://www.spinics.net/lists/linux-scsi/msg100221.html
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Fri, Oct 7, 2016 at 8:59 PM, Zhu Lingshan <lszhu-IBi9RG/b67k@public.gmane.org> wrote:
>>> Hi Robert,
>>>
>>> I also see this issue, but this is not the only code path can trigger this
>>> problem, I think you may also see iscsi_np in D status. I fixed one code
>>> path whitch still not merged to mainline. I will forward you my patch later.
>>> Note: my patch only fixed one code path, you may see other call statck with
>>> D status.
>>>
>>> Thanks,
>>> BR
>>> Zhu Lingshan
>>>
>>>
>>> On 2016/10/1 1:14, Robert LeBlanc wrote:
>>>> We are having a recurring problem where iscsi_trx goes into D
>>>> state. It seems to be waiting for a session teardown (or something
>>>> similar) that never completes. We have to reboot these targets on
>>>> occasion. This is running the 4.4.12 kernel, and we have seen it on
>>>> several previous 4.4.x and 4.2.x kernels. There is no message in dmesg
>>>> or /var/log/messages. This seems to happen more frequently when there
>>>> is a disruption in our InfiniBand fabric, but it can also happen
>>>> without any changes to the fabric (other than hosts rebooting).
>>>>
>>>> # ps aux | grep iscsi | grep D
>>>> root 4185 0.0 0.0 0 0 ? D Sep29 0:00 [iscsi_trx]
>>>> root 18505 0.0 0.0 0 0 ? D Sep29 0:00 [iscsi_np]
>>>>
>>>> # cat /proc/4185/stack
>>>> [<ffffffff814cc999>] target_wait_for_sess_cmds+0x49/0x1a0
>>>> [<ffffffffa087292b>] isert_wait_conn+0x1ab/0x2f0 [ib_isert]
>>>> [<ffffffff814f0de2>] iscsit_close_connection+0x162/0x840
>>>> [<ffffffff814df8df>] iscsit_take_action_for_connection_exit+0x7f/0x100
>>>> [<ffffffff814effc0>] iscsi_target_rx_thread+0x5a0/0xe80
>>>> [<ffffffff8109c6f8>] kthread+0xd8/0xf0
>>>> [<ffffffff8172004f>] ret_from_fork+0x3f/0x70
>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>
>>>> # cat /proc/18505/stack
>>>> [<ffffffff814f0c71>] iscsit_stop_session+0x1b1/0x1c0
>>>> [<ffffffff814e2436>] iscsi_check_for_session_reinstatement+0x1e6/0x270
>>>> [<ffffffff814e4df0>] iscsi_target_check_for_existing_instances+0x30/0x40
>>>> [<ffffffff814e4f40>] iscsi_target_do_login+0x140/0x640
>>>> [<ffffffff814e62dc>] iscsi_target_start_negotiation+0x1c/0xb0
>>>> [<ffffffff814e402b>] iscsi_target_login_thread+0xa9b/0xfc0
>>>> [<ffffffff8109c6f8>] kthread+0xd8/0xf0
>>>> [<ffffffff8172004f>] ret_from_fork+0x3f/0x70
>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>
>>>> What can we do to help get this resolved?
>>>>
>>>> Thanks,
>>>>
>>>> ----------------
>>>> Robert LeBlanc
>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
Thread overview: 42+ messages
2016-09-30 17:14 iscsi_trx going into D state Robert LeBlanc
2016-10-04 7:55 Johannes Thumshirn
2016-10-04 9:11 Hannes Reinecke
2016-10-04 11:46 Christoph Hellwig
2016-10-04 16:39 Robert LeBlanc
2016-10-05 17:40 Robert LeBlanc
2016-10-05 18:03 Christoph Hellwig
2016-10-05 18:19 Robert LeBlanc
2016-10-08 2:59 Zhu Lingshan
2016-10-17 16:32 Robert LeBlanc
2016-10-17 19:03 Robert LeBlanc
2016-10-17 19:11 Robert LeBlanc
2016-10-18 3:06 Zhu Lingshan [this message]
2016-10-18 4:42 Robert LeBlanc
2016-10-18 7:05 Nicholas A. Bellinger
2016-10-18 7:52 Nicholas A. Bellinger
2016-10-18 22:13 Robert LeBlanc
2016-10-19 6:25 Nicholas A. Bellinger
2016-10-19 16:41 Robert LeBlanc
2016-10-29 22:29 Nicholas A. Bellinger
2016-10-31 16:34 Robert LeBlanc
2016-11-04 21:57 Robert LeBlanc
2016-12-12 23:57 Robert LeBlanc
2016-12-15 20:38 Robert LeBlanc
2016-12-21 23:39 Robert LeBlanc
2016-12-22 19:15 Doug Ledford
2016-12-27 20:22 Robert LeBlanc
2016-12-27 20:58 Robert LeBlanc
2016-12-28 20:39 Robert LeBlanc
2016-12-28 20:58 Robert LeBlanc
2016-12-29 21:23 Robert LeBlanc
2016-12-29 23:57 Robert LeBlanc
2016-12-30 23:07 Robert LeBlanc
2017-01-03 20:07 Robert LeBlanc
2017-01-04 0:11 Robert LeBlanc
2017-01-06 17:06 Laurence Oberman
2017-01-06 19:12 Robert LeBlanc
2017-01-12 21:22 Robert LeBlanc
2017-01-12 21:26 Robert LeBlanc
2017-01-13 15:10 Laurence Oberman
2017-01-13 23:38 Robert LeBlanc
2017-01-15 18:15 Laurence Oberman