From: Philip Pokorny <ppokorny-pabcTyWEv4ZW60MLeMDbCVaTQe2KTcn/@public.gmane.org>
To: Philip Pokorny
<ppokorny-pabcTyWEv4ZW60MLeMDbCVaTQe2KTcn/@public.gmane.org>,
scst-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: Arend Dittmer
<adittmer-pabcTyWEv4ZW60MLeMDbCVaTQe2KTcn/@public.gmane.org>
Subject: Re: SRPT and SCST
Date: Thu, 05 Nov 2009 03:51:13 -0500
Message-ID: <4AF29201.6000606@penguincomputing.com>
In-Reply-To: <654FA770A883FB43BAF3CB0B1E1DAC8C01C8C4DD-/U8SqUwOx9/OOpeOfUw7maQk6oIRg43YAL8bYrjMMd8@public.gmane.org>
Chris Worley asked that we post this to the scst-devel and linux-rdma
lists for discussion.

We're trying to get IB SRPT working and can't seem to get a stable
configuration using any of the various SCST, IB_SRPT, and kernel/distro
versions out there. In most cases we can crash the connection, and
typically the target as well, within minutes of 4 initiators pounding
the SRP block device with "mkfs.ext3", "tar xf" and "fsck".

Our target is a Penguin Computing Altus 2704 with disk expansion
chassis. That's a 4-socket AMD hex-core (24 total cores) with 128 GB of
memory and 24 1 TB drives attached to two LSI 1068 SAS controllers (aka
3801E). The drives are configured as 12 RAID-1 mirrors with 3-wide LVM
stripes over those mirrors. There are an additional 6 SSDs in the
server in a "fast" VG, also RAID-1 mirrored and LVM striped. Read-ahead
is disabled on the LVM volumes.

LVM volumes are exported via SCST as FILEIO block devices to initiators.
50 groups are defined, with two LVM volumes/block devices per group and
one initiator per group (its node GUID is added to "names" in the group).

With only 4 initiators, almost 100% of I/O is served from RAM and no
disk I/O is seen on the target.

Performance (when it's working) is generally good at 800 MB/s
aggregate, but we'd like to see better. It appeared we were getting
1.3 GB/s at one point.

On Wed, Nov 4, 2009 at 5:34 PM, Philip Pokorny
<ppokorny-pabcTyWEv4ZW60MLeMDbCVaTQe2KTcn/@public.gmane.org> wrote:
> We got a serial console attached and ran a test using the SCST and IB_SRPT
> versions that you recommended (Arend set it up so I'll defer to him on the
> exact SVN checkout that he used).
>
>> What sort of crashes are you seeing? I also have a customer
>> experiencing a crash, but I can't get details out of them.
>
> The client gets SCSI I/O errors and aborts the filesystem (putting it in
> read-only mode).
>
> After about 400 seconds of testing, the server side logs the following:
>
> [ 8418.697830] <6>[12426]: scst_check_sense:2444:Clearing dbl_ua_possible flag (dev ffff811816136000, cmd ffff81081017c1c8)
> [ 8418.697836] <6>[12426]: scst_dec_on_dev_cmd:577:cmd ffff81081017c1c8 (tag 17): unblocking dev ffff811816136000
> [ 8418.697843] <6>[0]: scst_unblock_dev:4653:Device UNBLOCK(new 0), dev ffff811816136000
> [ 8864.258468] ib_mthca 0000:81:00.0: SQ 000405 full (999320 head, 997272 tail, 2048 max, 0 nreq)
> [ 8864.294450] ***ERROR***: srpt_xfer_data[2374] ret=-12
> [ 8864.326702] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL: incrementing retry_cmds 1
> [ 8864.326709] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished, direct retry (finished_cmds=2023031, tgt->finished_cmds=2023137, retry_cmds=0)
> [ 8878.447081] ib_mthca 0000:81:00.0: SQ 000406 full (1080498 head, 1078450 tail, 2048 max, 0 nreq)
> [ 8878.484452] ***ERROR***: srpt_xfer_data[2374] ret=-12
> [ 8878.517595] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL: incrementing retry_cmds 1
> [ 8878.517608] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished, direct retry (finished_cmds=2256307, tgt->finished_cmds=2256504, retry_cmds=0)
> [ 8882.694684] ib_mthca 0000:81:00.0: SQ 000404 full (1087484 head, 1085436 tail, 2048 max, 0 nreq)
> [ 8882.732542] ***ERROR***: srpt_xfer_data[2374] ret=-12
> [ 8882.766396] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL: incrementing retry_cmds 1
> [ 8882.766403] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished, direct retry (finished_cmds=2310445, tgt->finished_cmds=2310539, retry_cmds=0)
> [ 8891.650890] ib_mthca 0000:81:00.0: SQ 000407 full (1155377 head, 1153329 tail, 2048 max, 0 nreq)
> [ 8891.689016] ***ERROR***: srpt_xfer_data[2374] ret=-12
> [ 8891.723548] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL: incrementing retry_cmds 1
> [ 8891.723556] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished, direct retry (finished_cmds=2381910, tgt->finished_cmds=2382001, retry_cmds=0)
> [ 8891.723573] ib_mthca 0000:81:00.0: too many gathers
> [ 8891.758000] ***ERROR***: srpt_xfer_data[2374] ret=-22
> [ 8891.792888] <6>[0]: scst: scst_rdy_to_xfer:985:***ERROR***: Target driver ib_srpt rdy_to_xfer() returned fatal error
>
> I hope that helps.
>
> I've seen that same "rdy_to_xfer() returned fatal error" message several
> times in different configurations. The screen shots we sent earlier had the
> same "ib_mthca ... SQ ... full (xx head..." message at the start, so that
> seems to be related as well.
>
> Thanks for the help,
> Phil P.
>
> --
> Philip Pokorny, RHCE
> Chief Hardware Architect - Penguin Computing
> Voice: 415-370-0835 Toll free: 888-PENGUIN
> www.penguincomputing.com
>
>
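
For anyone reading the log above, here is a minimal Python sketch that
pulls the numbers out of the "SQ ... full" lines and decodes the
srpt_xfer_data return codes. It assumes that "head - tail" counts the
work requests posted to the send queue but not yet completed (my reading
of the message, not something confirmed against the ib_mthca source),
and it relies on the standard Linux errno values (12 = ENOMEM,
22 = EINVAL). The names LOG and summarize() are made up for this
example.

```python
import errno
import re

# Excerpts copied from the target's console log quoted above
# (email line wraps removed, timestamps dropped).
LOG = """\
ib_mthca 0000:81:00.0: SQ 000405 full (999320 head, 997272 tail, 2048 max, 0 nreq)
***ERROR***: srpt_xfer_data[2374] ret=-12
ib_mthca 0000:81:00.0: SQ 000406 full (1080498 head, 1078450 tail, 2048 max, 0 nreq)
***ERROR***: srpt_xfer_data[2374] ret=-12
ib_mthca 0000:81:00.0: SQ 000404 full (1087484 head, 1085436 tail, 2048 max, 0 nreq)
***ERROR***: srpt_xfer_data[2374] ret=-12
ib_mthca 0000:81:00.0: SQ 000407 full (1155377 head, 1153329 tail, 2048 max, 0 nreq)
***ERROR***: srpt_xfer_data[2374] ret=-12
ib_mthca 0000:81:00.0: too many gathers
***ERROR***: srpt_xfer_data[2374] ret=-22
"""

SQ_FULL = re.compile(r"SQ (\w+) full \((\d+) head, (\d+) tail, (\d+) max")
RET = re.compile(r"ret=-(\d+)")


def summarize(log):
    """Return (sq_rows, ret_names) extracted from the log text.

    sq_rows   -- list of (queue id, head - tail, max) tuples
    ret_names -- symbolic errno name for each 'ret=-N' in the log
    """
    sq_rows = []
    for qp, head, tail, mx in SQ_FULL.findall(log):
        # Assumption: head - tail is the number of outstanding send requests.
        outstanding = int(head) - int(tail)
        sq_rows.append((qp, outstanding, int(mx)))
    ret_names = [errno.errorcode.get(int(n), "?") for n in RET.findall(log)]
    return sq_rows, ret_names


if __name__ == "__main__":
    rows, names = summarize(LOG)
    for qp, outstanding, mx in rows:
        print(f"SQ {qp}: {outstanding} of {mx} send-queue slots in use")
    print("srpt_xfer_data return codes:", names)
```

In every "SQ ... full" line the difference between head and tail is
exactly 2048, the reported maximum, on four different queues (000404
through 000407). If the assumption above holds, each of those send
queues was completely full when srpt_xfer_data failed with -12, and the
final -22 goes with the separate "too many gathers" message.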