From: Vladislav Bolkhovitin <vst-d+Crzxg7Rs0@public.gmane.org>
To: Philip Pokorny
<ppokorny-pabcTyWEv4ZW60MLeMDbCVaTQe2KTcn/@public.gmane.org>
Cc: scst-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Arend Dittmer
<adittmer-pabcTyWEv4ZW60MLeMDbCVaTQe2KTcn/@public.gmane.org>,
Vu Pham <vuhuong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Bart Van Assche
<bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Subject: Re: SRPT and SCST
Date: Thu, 05 Nov 2009 16:27:20 +0300
Message-ID: <4AF2D2B8.5080304@vlnb.net>
In-Reply-To: <4AF29201.6000606-pabcTyWEv4ZW60MLeMDbCVaTQe2KTcn/@public.gmane.org>
Philip Pokorny, on 11/05/2009 11:51 AM wrote:
> Chris Worley asked that we post this to scst and linux-rdma lists for
> discussion.
>
> We're trying to get IB SRPT working and can't seem to get a stable
> configuration using any of the various SCST, IB_SRPT, and kernel/distro
> versions out there. In most cases, we're able to crash the connection
> and typically the target within minutes of pounding by 4 initiators
> doing "mkfs.ext3", "tar xf" and "fsck" to the SRP block device.
>
> Our target is a Penguin Computing Altus 2704 with disk expansion
> chassis. That's a 4-socket AMD hex-core (24 total cores) with 128GB of
> memory and 24 1TB drives attached to two LSI 1068 SAS controllers (aka
> 3801E). The drives are configured as 12 RAID-1 mirrors and 3-wide LVM
> stripes over those mirrors. There are an additional 6 SSDs in the
> server in a "fast" VG, also RAID-1 mirrored and LVM striped. Read-ahead
> is disabled on the LVM volumes.
>
> LVM volumes are exported via SCST as FILEIO block devices to initiators.
> 50 groups are defined with two LVM volumes/block devices per group.
> One initiator per group. (NODE GUID added to "names" in the group)
>
> With only 4 initiators, almost 100% of I/O is to RAM and no disk I/O is
> seen on the target.
>
> Performance (when it's working) is generally good at 800MB/sec
> aggregate, but we'd like to see better. It appeared we were getting
> 1.3GB/s at one point.
>
> On Wed, Nov 4, 2009 at 5:34 PM, Philip Pokorny
> <ppokorny-pabcTyWEv4ZW60MLeMDbCVaTQe2KTcn/@public.gmane.org> wrote:
>> We got a serial console attached and ran a test using the SCST and IB_SRPT
>> versions that you recommended (Arend set it up so I'll defer to him on the
>> exact SVN checkout that he used).
>>
>>> What sort of crashes are you seeing? I also have a customer
>>> experiencing a crash, but I can't get details out of them.
>> The client gets SCSI I/O errors and aborts the filesystem (putting it in
>> read-only mode).
>>
>> After about 400 seconds of testing, the server side logs the following:
>>
>> [ 8418.697830] <6>[12426]: scst_check_sense:2444:Clearing dbl_ua_possible
>> flag (dev ffff811816136000, cmd ffff81081017c1c8)
>> [ 8418.697836] <6>[12426]: scst_dec_on_dev_cmd:577:cmd ffff81081017c1c8 (tag
>> 17): unblocking dev ffff811816136000
>> [ 8418.697843] <6>[0]: scst_unblock_dev:4653:Device UNBLOCK(new 0), dev
>> ffff811816136000
>> [ 8864.258468] ib_mthca 0000:81:00.0: SQ 000405 full (999320 head, 997272
>> tail, 2048 max, 0 nreq)
>> [ 8864.294450] ***ERROR***: srpt_xfer_data[2374] ret=-12
>> [ 8864.326702] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL:
>> incrementing retry_cmds 1
>> [ 8864.326709] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished,
>> direct retry (finished_cmds=2023031, tgt->finished_cmds=2023137,
>> retry_cmds=0)
>> [ 8878.447081] ib_mthca 0000:81:00.0: SQ 000406 full (1080498 head, 1078450
>> tail, 2048 max, 0 nreq)
>> [ 8878.484452] ***ERROR***: srpt_xfer_data[2374] ret=-12
>> [ 8878.517595] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL:
>> incrementing retry_cmds 1
>> [ 8878.517608] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished,
>> direct retry (finished_cmds=2256307, tgt->finished_cmds=2256504,
>> retry_cmds=0)
>> [ 8882.694684] ib_mthca 0000:81:00.0: SQ 000404 full (1087484 head, 1085436
>> tail, 2048 max, 0 nreq)
>> [ 8882.732542] ***ERROR***: srpt_xfer_data[2374] ret=-12
>> [ 8882.766396] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL:
>> incrementing retry_cmds 1
>> [ 8882.766403] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished,
>> direct retry (finished_cmds=2310445, tgt->finished_cmds=2310539,
>> retry_cmds=0)
>> [ 8891.650890] ib_mthca 0000:81:00.0: SQ 000407 full (1155377 head, 1153329
>> tail, 2048 max, 0 nreq)
>> [ 8891.689016] ***ERROR***: srpt_xfer_data[2374] ret=-12
>> [ 8891.723548] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL:
>> incrementing retry_cmds 1
>> [ 8891.723556] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished,
>> direct retry (finished_cmds=2381910, tgt->finished_cmds=2382001,
>> retry_cmds=0)
>> [ 8891.723573] ib_mthca 0000:81:00.0: too many gathers
>> [ 8891.758000] ***ERROR***: srpt_xfer_data[2374] ret=-22
>> [ 8891.792888] <6>[0]: scst: scst_rdy_to_xfer:985:***ERROR***: Target driver
>> ib_srpt rdy_to_xfer() returned fatal error
>>
>> I hope that helps.
>>
>> I've seen that same "rdy_to_xfer() returned fatal error" message several
>> times in different configurations. The screen shots we sent earlier had
>> the same "ib_mthca ... SQ ... full (xx head..." message at the start. So
>> that seems to be related as well.
Looks like ib_post_send() in srpt_perform_rdmas() returned -ENOMEM and
srpt_xfer_data() then "forgot" to unmap the corresponding SG list, with
all the related consequences.
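To illustrate, here is a minimal user-space sketch of that error path. All names (map_sg, post_rdma, xfer_data) are simplified stand-ins for the real srpt_xfer_data()/srpt_perform_rdmas() code, not the actual driver: the point is only that when the post fails (e.g. the SQ is full), the transfer routine must undo the SG mapping it created before returning, otherwise the mapping leaks.

```c
#include <errno.h>

static int mapped_sg_count;   /* counts outstanding SG mappings          */
static int simulate_sq_full;  /* forces the "SQ full" failure path       */

static int map_sg(void)    { mapped_sg_count++; return 0; }
static void unmap_sg(void) { mapped_sg_count--; }

/* Stand-in for ib_post_send(): fails with -ENOMEM when the SQ is full. */
static int post_rdma(void)
{
	return simulate_sq_full ? -ENOMEM : 0;
}

/*
 * Stand-in for srpt_xfer_data(). The essential part is the unmap_sg()
 * call in the error path; skipping it is the suspected bug. On success
 * the mapping is intentionally kept, since it would only be released
 * later, on RDMA completion.
 */
static int xfer_data(void)
{
	int ret;

	ret = map_sg();
	if (ret)
		return ret;

	ret = post_rdma();
	if (ret) {
		unmap_sg();  /* without this, the mapping leaks on SQ-full */
		return ret;
	}
	return 0;
}
```

With the unmap in place, a failed post leaves no mapping behind, so repeated SQ-full retries cannot exhaust mapping resources the way the logs above suggest.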
>> Thanks for the help,
>> Phil P.
>>
>> --
>> Philip Pokorny, RHCE
>> Chief Hardware Architect - Penguin Computing
>> Voice: 415-370-0835 Toll free: 888-PENGUIN
>> www.penguincomputing.com