From: Potnuri Bharat Teja <bharat@chelsio.com>
To: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: SWise OGC <swise@opengridcomputing.com>,
'Sagi Grimberg' <sagi@grimberg.me>,
"target-devel@vger.kernel.org" <target-devel@vger.kernel.org>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
Subject: Re: SQ overflow seen running isert traffic
Date: Mon, 20 Mar 2017 15:45:28 +0530 [thread overview]
Message-ID: <20170320101526.GA11699@chelsio.com> (raw)
In-Reply-To: <20161108100617.GA2812@chelsio.com>
On Tuesday, November 11/08/16, 2016 at 15:36:18 +0530, Potnuri Bharat Teja wrote:
> On Monday, October 10/31/16, 2016 at 09:10:08 +0530, Nicholas A. Bellinger
> wrote:
Hi Nicholas,
Tested the following 3 patches for SQ overflow.
https://www.spinics.net/lists/target-devel/msg13571.html
They along with an additional change added on top of those seems to hold good.( Aborts seen
intermittently as shown at the end of mail)
I have been testing the changes using fio with IO sizes ranging from 1KB to 32MB and test is
running fine (despite few Intermittent Aborts).
Here is what I see with the 3 patches alone applied:
-> In isert_put_datain() and isert_post_response() a corresponding recv WR is posted before
posting a send and hence for every post failure a recv is already posted into a tightly packed
RQ, causing it to overflow.
Below are the logs suggesting it:
[ 189.944714] isert: isert_rdma_rw_ctx_post: Cmd: ffff882d233e83d0 failed to post RDMA res
[ 189.944718] isert: isert_rdma_rw_ctx_post: Cmd: ffff882d233e83d0 failed to post RDMA res
[ 189.944721] isert: isert_rdma_rw_ctx_post: Cmd: ffff882d233e83d0 failed to post RDMA res
[ 189.944724] isert: isert_rdma_rw_ctx_post: Cmd: ffff882d233e83d0 failed to post RDMA res
[ 189.944729] isert: isert_rdma_rw_ctx_post: Cmd: ffff882d233e83d0 failed to post RDMA res
[ 189.944731] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[ 189.944731] isert: isert_put_datain: ib_post_recv failed with -12
[ 189.944732] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[ 189.944733] isert: isert_put_datain: ib_post_recv failed with -12
[ 189.944734] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[ 189.944735] isert: isert_put_datain: ib_post_recv failed with -12
[ 189.944736] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[ 189.944736] isert: isert_put_datain: ib_post_recv failed with -12
[ 189.944737] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[ 189.944738] isert: isert_put_datain: ib_post_recv failed with -12
[ 189.944739] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[ 189.944740] isert: isert_put_datain: ib_post_recv failed with -12
[ 189.944740] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[ 189.944741] isert: isert_put_datain: ib_post_recv failed with -12
[ 189.944746] isert: isert_rdma_rw_ctx_post: Cmd: ffff882d233e83d0 failed to post RDMA res
[ 189.944747] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[ 189.944748] isert: isert_put_datain: ib_post_recv failed with -12
[ 189.944749] isert: isert_post_recv: ib_post_recv() failed with ret: -12
[ 251.543847] ABORT_TASK: Found referenced iSCSI task_tag: 68
[ 251.543858] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 68
[ 251.544071] ABORT_TASK: Found referenced iSCSI task_tag: 46
[ 251.544076] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 46
[ 251.544300] ABORT_TASK: Found referenced iSCSI task_tag: 51
[ 251.544307] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 51
[ 251.544528] ABORT_TASK: Found referenced iSCSI task_tag: 117
[ 251.544533] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 117
[ 282.778402] Unable to locate ITT: 0x00000062 on CID: 0
[ 282.778404] Unable to locate RefTaskTag: 0x00000062 on CID: 0.
[ 282.778407] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 0
[ 282.778436] ABORT_TASK: Found referenced iSCSI task_tag: 37
[ 282.778439] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 37
[ 282.778519] ABORT_TASK: Found referenced iSCSI task_tag: 65
[ 282.778522] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 65
[ 282.778559] Unable to locate ITT: 0x00000019 on CID: 0
[ 282.778560] Unable to locate RefTaskTag: 0x00000019 on CID: 0.
[ 282.778563] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 0
[ 293.004632] ABORT_TASK: Found referenced iSCSI task_tag: 22
[ 293.004643] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 22
[ 293.004808] TMR Opcode TARGET_WARM_RESET authorization failed for Initiator Node: iqn.1994-05.com.redhat:37fc2b6a5331
[ 310.265993] iSCSI Login timeout on Network Portal 10.0.2.2:3260
[ 493.567143] INFO: task iscsi_np:3998 blocked for more than 120 seconds.
[ 493.567149] Tainted: G OE 4.9.0-For-DSGL-Debug+ #1
[ 493.567151] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 493.567153] iscsi_np D 0 3998 2 0x00000084
[ 493.567159] 0000000000000000 ffff883028a88b40 ffff88302693f7c0 ffff88302de3e300
[ 493.567163] ffff88303e057a40 ffffc900008abc00 ffffffff815e415b ffffc900008abca0
[ 493.567166] ffffffff81af4180 ffff883028a88b40 7fffffffffffffff ffff882f7426db68
[ 493.567170] Call Trace:
[ 493.567180] [<ffffffff815e415b>] ? __schedule+0x23b/0x6c0
[ 493.567184] [<ffffffff815e4616>] schedule+0x36/0x80
[ 493.567187] [<ffffffff815e71c3>] schedule_timeout+0x1f3/0x390
[ 493.567194] [<ffffffff812c1f2e>] ? number+0x2ce/0x300
[ 493.567197] [<ffffffff812c42be>] ? vsnprintf+0x32e/0x4b0
[ 493.567199] [<ffffffff815e56a4>] wait_for_completion+0xb4/0xf0
[ 493.567201] [<ffffffff8108a6b0>] ? wake_up_q+0x80/0x80
[ 493.567216] [<ffffffffa05284d1>] iscsit_stop_session+0x1b1/0x1c0 [iscsi_target_mod]
[ 493.567220] [<ffffffffa0519db2>] iscsi_check_for_session_reinstatement+0x1e2/0x270 [iscsi_target_mod]
[ 493.567225] [<ffffffffa051c8c0>] iscsi_target_check_for_existing_instances+0x30/0x40 [iscsi_target_mod]
[ 493.567230] [<ffffffffa051c9ee>] iscsi_target_do_login+0x11e/0x610 [iscsi_target_mod]
[ 493.567235] [<ffffffffa051dbbe>] iscsi_target_start_negotiation+0x4e/0xa0 [iscsi_target_mod]
[ 493.567239] [<ffffffffa051b933>] iscsi_target_login_thread+0xa73/0xfc0 [iscsi_target_mod]
[ 493.567244] [<ffffffffa051aec0>] ? iscsi_target_login_sess_out+0x250/0x250 [iscsi_target_mod]
[ 493.567246] [<ffffffff810660c4>] ? SyS_exit_group+0x14/0x20
[ 493.567249] [<ffffffff8107ee5a>] kthread+0xca/0xe0
[ 493.567250] [<ffffffff8107ed90>] ? kthread_park+0x60/0x60
[ 493.567252] [<ffffffff815e8752>] ret_from_fork+0x22/0x30
[ 493.567254] INFO: task iscsi_trx:4149 blocked for more than 120 seconds.
[ 493.567255] Tainted: G OE 4.9.0-For-DSGL-Debug+ #1
[ 493.567256] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 493.567256] iscsi_trx D 0 4149 2 0x00000084
[ 493.567258] 0000000000000000 ffff882f5ef70340 ffff882f76253040 ffff88302de40340
[ 493.567259] ffff88303e097a40 ffffc90000f93c90 ffffffff815e415b ffffffff815e56c2
[ 493.567261] 0000000000000001 ffff882f5ef70340 7fffffffffffffff ffff882d233cc9b0
[ 493.567262] Call Trace:
[ 493.567264] [<ffffffff815e415b>] ? __schedule+0x23b/0x6c0
[ 493.567265] [<ffffffff815e56c2>] ? wait_for_completion+0xd2/0xf0
[ 493.567267] [<ffffffff815e4616>] schedule+0x36/0x80
[ 493.567268] [<ffffffff815e71c3>] schedule_timeout+0x1f3/0x390
[ 493.567281] [<ffffffffa118e910>] ? ib_sg_to_pages+0x1a0/0x1a0 [ib_core]
[ 493.567285] [<ffffffffa118e910>] ? ib_sg_to_pages+0x1a0/0x1a0 [ib_core]
[ 493.567286] [<ffffffff815e56a4>] wait_for_completion+0xb4/0xf0
[ 493.567287] [<ffffffff8108a6b0>] ? wake_up_q+0x80/0x80
[ 493.567302] [<ffffffffa04d5399>] target_wait_for_sess_cmds+0x49/0x1a0 [target_core_mod]
[ 493.567306] [<ffffffffa0471644>] isert_wait_conn+0x1a4/0x2d0 [ib_isert]
[ 493.567311] [<ffffffffa05289ab>] iscsit_close_connection+0x14b/0x820 [iscsi_target_mod]
[ 493.567315] [<ffffffffa05172ef>] iscsit_take_action_for_connection_exit+0x7f/0x100 [iscsi_target_mod]
[ 493.567320] [<ffffffffa0528265>] iscsi_target_rx_thread+0x95/0xa0 [iscsi_target_mod]
[ 493.567324] [<ffffffffa05281d0>] ? iscsi_target_tx_thread+0x1b0/0x1b0 [iscsi_target_mod]
[ 493.567325] [<ffffffff8107ee5a>] kthread+0xca/0xe0
[ 493.567327] [<ffffffff8107ed90>] ? kthread_park+0x60/0x60
[ 493.567328] [<ffffffff815e8752>] ret_from_fork+0x22/0x30
-> In order to address this I tried the following (experimental) change on top of your 3 patches:
diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 501a0ad124a7..0aead5faa2a2 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -1817,16 +1817,16 @@ isert_post_response(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd)
struct ib_send_wr *wr_failed;
int ret;
- ret = isert_post_recv(isert_conn, isert_cmd->rx_desc);
+ ret = ib_post_send(isert_conn->qp, &isert_cmd->tx_desc.send_wr,
+ &wr_failed);
if (ret) {
- isert_err("ib_post_recv failed with %d\n", ret);
+ isert_err("ib_post_send failed with %d\n", ret);
return ret;
}
- ret = ib_post_send(isert_conn->qp, &isert_cmd->tx_desc.send_wr,
- &wr_failed);
+ ret = isert_post_recv(isert_conn, isert_cmd->rx_desc);
if (ret) {
- isert_err("ib_post_send failed with %d\n", ret);
+ isert_err("ib_post_recv failed with %d\n", ret);
return ret;
}
return ret;
@@ -2166,6 +2166,13 @@ isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
if (isert_prot_cmd(isert_conn, se_cmd)) {
isert_cmd->tx_desc.tx_cqe.done = isert_rdma_write_done;
cqe = &isert_cmd->tx_desc.tx_cqe;
+
+ rc = isert_rdma_rw_ctx_post(isert_cmd, isert_conn, cqe, chain_wr);
+ if (rc) {
+ isert_dbg("Cmd: %p posted RDMA_WRITE for iSER Data READ rc: %d\n",
+ isert_cmd, rc);
+ return rc;
+ }
} else {
/*
* Build isert_conn->tx_desc for iSCSI response PDU and attach
@@ -2178,18 +2185,21 @@ isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
isert_init_send_wr(isert_conn, isert_cmd,
&isert_cmd->tx_desc.send_wr);
+ chain_wr = &isert_cmd->tx_desc.send_wr;
+
+ rc = isert_rdma_rw_ctx_post(isert_cmd, isert_conn, cqe, chain_wr);
+ if (rc) {
+ isert_err("Cmd: %p posted RDMA_WRITE for iSER Data READ rc: %d\n",
+ isert_cmd, rc);
+ return rc;
+ }
+
rc = isert_post_recv(isert_conn, isert_cmd->rx_desc);
if (rc) {
isert_err("ib_post_recv failed with %d\n", rc);
return rc;
}
-
- chain_wr = &isert_cmd->tx_desc.send_wr;
}
- rc = isert_rdma_rw_ctx_post(isert_cmd, isert_conn, cqe, chain_wr);
- isert_dbg("Cmd: %p posted RDMA_WRITE for iSER Data READ rc: %d\n",
- isert_cmd, rc);
-
return rc;
}
-> With these changes it works fine, But I still see few Aborts intermittently as follows:
[10674.434643] isert: isert_rdma_rw_ctx_post: Cmd: ffff882cbb961138 failed to post RDMA res
[10674.434690] isert: isert_rdma_rw_ctx_post: Cmd: ffff882cbb9434c0 failed to post RDMA res
[10715.186372] ABORT_TASK: Found referenced iSCSI task_tag: 58
[10715.186417] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 58
-> Apart from the Aborts and post failure logs filling dmesg, luns are in state "running" and
fio countinues to run.
[root@bharat ~]# iscsiadm -m session -P3 | grep "Attached scsi disk"
Attached scsi disk sdj State: running
Attached scsi disk sdc State: running
Attached scsi disk sdd State: running
Attached scsi disk sde State: running
Attached scsi disk sdf State: running
Attached scsi disk sdg State: running
Attached scsi disk sdh State: running
Attached scsi disk sdi State: running
Thanks,
Bharat.
> > Hi Steve, Potnuri, & Co,
> >
> > On Tue, 2016-10-18 at 09:34 -0500, Steve Wise wrote:
> > > >
> > > > > I tried out this change and it works fine with iwarp. I dont see
> SQ
> > > > > overflow. Apparently we have increased the sq too big to overflow.
> I am
> > > going
> > > > > to let it run with higher workloads for longer time, to see if it
> holds
> > > good.
> > > >
> > > > Actually on second thought, this patch is an overkill. Effectively
> we
> > > > now set:
> > > >
> > > > MAX_CMD=266
> > > > and max_rdma_ctx=128 so together we take 394 which seems to too
> much.
> > > >
> > > > If we go by the scheme of 1 rdma + 1 send for each IO we need:
> > > > - 128 sends
> > > > - 128 rdmas
> > > > - 10 miscs
> > > >
> > > > so this gives 266.
> > > >
> > > > Perhaps this is due to the fact that iWARP needs to register memory
> for
> > > > rdma reads as well? (and also rdma writes > 128k for chelsio HW
> right?)
> > > >
> > >
> > > iWARP definitely needs to register memory for the target of reads, due
> to
> > > REMOTE_WRITE requirement for the protocol. The source of a write
> doesn't need
> > > to register memory, but the SGE depth can cause multiple WRITE WRs to
> be
> > > required to service the IO. And in theory there should be some
> threshold where
> > > it might be better performance-wise to do a memory register + 1 WRITE
> vs X
> > > WRITEs.
> > >
> > > As you mentioned, the RW API should account for this, but perhaps it
> is still
> > > off some. Bharat, have a look into the RDMA-RW API and let us see if
> we can
> > > figure out if the additional SQ depth it adds is sufficient.
> > >
> > > > What is the workload you are running? with immediatedata enabled you
> > > > should issue reg+rdma_read+send only for writes > 8k.
> > > >
> > > > Does this happen when you run only reads for example?
> > > >
> > > > I guess its time to get the sq accounting into shape...
> > >
> > > So to sum up - 2 issues:
> > >
> > > 1) we believe the iSER + RW API correctly sizes the SQ, yet we're
> seeing SQ
> > > overflows. So the SQ sizing needs more investigation.
> > >
> > > 2) if the SQ is full, then the iSER/target code is supposed to
> resubmit. And
> > > apparently that isn't working.
> > >
> >
> > For #2, target-core expects -ENOMEM or -EAGAIN return from fabric driver
> > callbacks to signal internal queue-full retry logic. Otherwise, the
> > extra se_cmd->cmd_kref response SCF_ACK_KREF is leaked until session
> > shutdown and/or reinstatement occurs.
> >
> > AFAICT, Potunri's earlier hung task with v4.8.y + ABORT_TASK is likely
> > the earlier v4.1+ regression:
> >
> >
> [1]https://github.com/torvalds/linux/commit/527268df31e57cf2b6d417198717c6d6afdb1e3e
> >
> > That said, there is room for improvement in target-core queue-full error
> > signaling, and iscsi-target/iser-target callback error propagation.
> >
> > Sending out a series shortly to address these particular items.
> > Please have a look.
> >
> Thanks for the changes Nicholas.
> Testing them now.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at [2]http://vger.kernel.org/majordomo-info.html
>
> References
>
> Visible links
> 1. https://github.com/torvalds/linux/commit/527268df31e57cf2b6d417198717c6d6afdb1e3e
> 2. http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2017-03-20 10:15 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-09-27 7:01 RQ overflow seen running isert traffic Potnuri Bharat Teja
[not found] ` <20160927070157.GA13140-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
2016-09-29 14:12 ` Steve Wise
2016-10-05 6:14 ` Sagi Grimberg
2016-10-17 11:16 ` Potnuri Bharat Teja
2016-10-17 18:29 ` Steve Wise
2016-10-18 8:04 ` Sagi Grimberg
2016-10-18 11:28 ` SQ " Potnuri Bharat Teja
2016-10-18 13:17 ` Sagi Grimberg
[not found] ` <ed7ebb39-be81-00b3-ef23-3f4c0e3afbb1-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2016-10-18 14:34 ` Steve Wise
2016-10-18 16:13 ` Jason Gunthorpe
2016-10-18 19:03 ` Steve Wise
2016-10-20 8:34 ` Sagi Grimberg
[not found] ` <f7a4b395-1786-3c7a-7639-195e830db5ad-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-03-20 13:05 ` Potnuri Bharat Teja
2017-03-20 15:04 ` Steve Wise
2016-10-31 3:40 ` Nicholas A. Bellinger
2016-11-02 17:03 ` Steve Wise
[not found] ` <1477885208.27946.8.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
2016-11-08 10:06 ` Potnuri Bharat Teja
2017-03-20 10:15 ` Potnuri Bharat Teja [this message]
2017-03-21 6:32 ` Nicholas A. Bellinger
2017-03-21 7:51 ` Potnuri Bharat Teja
[not found] ` <20170321075131.GA11565-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
2017-03-21 13:52 ` Sagi Grimberg
[not found] ` <945e2947-f67a-4202-cd27-d4631fe10f68-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-03-21 15:25 ` [SPAMMY (7.002)]Re: " Potnuri Bharat Teja
[not found] ` <20170321152506.GA32655-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
2017-03-21 16:38 ` Sagi Grimberg
[not found] ` <4dab6b43-20d3-86f0-765a-be0851e9f4a0-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-03-21 17:50 ` Potnuri Bharat Teja
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170320101526.GA11699@chelsio.com \
--to=bharat@chelsio.com \
--cc=linux-rdma@vger.kernel.org \
--cc=nab@linux-iscsi.org \
--cc=sagi@grimberg.me \
--cc=swise@opengridcomputing.com \
--cc=target-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).