From: Vladislav Bolkhovitin <vst-d+Crzxg7Rs0@public.gmane.org>
To: Philip Pokorny
<ppokorny-pabcTyWEv4ZW60MLeMDbCVaTQe2KTcn/@public.gmane.org>
Cc: scst-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Arend Dittmer
<adittmer-pabcTyWEv4ZW60MLeMDbCVaTQe2KTcn/@public.gmane.org>,
Vu Pham <vuhuong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Bart Van Assche
<bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Subject: Re: SRPT and SCST
Date: Thu, 05 Nov 2009 16:27:20 +0300
Message-ID: <4AF2D2B8.5080304@vlnb.net>
In-Reply-To: <4AF29201.6000606-pabcTyWEv4ZW60MLeMDbCVaTQe2KTcn/@public.gmane.org>
Philip Pokorny, on 11/05/2009 11:51 AM wrote:
> Chris Worley asked that we post this to scst and linux-rdma lists for
> discussion.
>
> We're trying to get IB SRPT working and can't seem to get a stable
> configuration using any of the various SCST, IB_SRPT, and kernel/distro
> versions out there. In most cases, we're able to crash the connection
> and typically the target within minutes of pounding by 4 initiators
> doing "mkfs.ext3", "tar xf" and "fsck" to the SRP block device.
>
> Our target is a Penguin Computing Altus 2704 with disk expansion
> chassis. That's a 4-socket AMD hex-core (24 total cores) with 128GB of
> memory and 24 1TB drives attached to two LSI 1068 SAS controllers. (aka
> 3801E) The drives are configured as 12 RAID-1 mirrors and 3-wide LVM
> stripes over those mirrors. There are an additional 6 SSD's in the
> server in a "fast" VG also RAID-1 mirrored and LVM striped. Read ahead
> is disabled on the LVM volumes.
>
> LVM volumes are exported via SCST as FILEIO block devices to initiators.
> 50 groups are defined with two LVM volumes/block devices per group.
> One initiator per group. (NODE GUID added to "names" in the group)
>
> With only 4 initiators, almost 100% of I/O is to RAM and no disk I/O is
> seen on the target.
>
> Performance (when it's working) is generally good at 800MB/sec
> aggregate, but we'd like to see better. It appeared we were getting
> 1.3GB/s at one point.
>
> On Wed, Nov 4, 2009 at 5:34 PM, Philip Pokorny
> <ppokorny-pabcTyWEv4ZW60MLeMDbCVaTQe2KTcn/@public.gmane.org> wrote:
>> We got a serial console attached and ran a test using the SCST and IB_SRPT
>> versions that you recommended (Arend set it up so I'll defer to him on the
>> exact SVN checkout that he used).
>>
>>> What sort of crashes are you seeing? I also have a customer
>>> experiencing a crash, but I can't get details out of them.
>> The client gets SCSI I/O errors and aborts the filesystem (putting it in
>> read-only mode).
>>
>> After about 400 seconds of testing, the server side logs the following:
>>
>> [ 8418.697830] <6>[12426]: scst_check_sense:2444:Clearing dbl_ua_possible
>> flag (dev ffff811816136000, cmd ffff81081017c1c8)
>> [ 8418.697836] <6>[12426]: scst_dec_on_dev_cmd:577:cmd ffff81081017c1c8 (tag
>> 17): unblocking dev ffff811816136000
>> [ 8418.697843] <6>[0]: scst_unblock_dev:4653:Device UNBLOCK(new 0), dev
>> ffff811816136000
>> [ 8864.258468] ib_mthca 0000:81:00.0: SQ 000405 full (999320 head, 997272
>> tail, 2048 max, 0 nreq)
>> [ 8864.294450] ***ERROR***: srpt_xfer_data[2374] ret=-12
>> [ 8864.326702] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL:
>> incrementing retry_cmds 1
>> [ 8864.326709] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished,
>> direct retry (finished_cmds=2023031, tgt->finished_cmds=2023137,
>> retry_cmds=0)
>> [ 8878.447081] ib_mthca 0000:81:00.0: SQ 000406 full (1080498 head, 1078450
>> tail, 2048 max, 0 nreq)
>> [ 8878.484452] ***ERROR***: srpt_xfer_data[2374] ret=-12
>> [ 8878.517595] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL:
>> incrementing retry_cmds 1
>> [ 8878.517608] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished,
>> direct retry (finished_cmds=2256307, tgt->finished_cmds=2256504,
>> retry_cmds=0)
>> [ 8882.694684] ib_mthca 0000:81:00.0: SQ 000404 full (1087484 head, 1085436
>> tail, 2048 max, 0 nreq)
>> [ 8882.732542] ***ERROR***: srpt_xfer_data[2374] ret=-12
>> [ 8882.766396] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL:
>> incrementing retry_cmds 1
>> [ 8882.766403] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished,
>> direct retry (finished_cmds=2310445, tgt->finished_cmds=2310539,
>> retry_cmds=0)
>> [ 8891.650890] ib_mthca 0000:81:00.0: SQ 000407 full (1155377 head, 1153329
>> tail, 2048 max, 0 nreq)
>> [ 8891.689016] ***ERROR***: srpt_xfer_data[2374] ret=-12
>> [ 8891.723548] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL:
>> incrementing retry_cmds 1
>> [ 8891.723556] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished,
>> direct retry (finished_cmds=2381910, tgt->finished_cmds=2382001,
>> retry_cmds=0)
>> [ 8891.723573] ib_mthca 0000:81:00.0: too many gathers
>> [ 8891.758000] ***ERROR***: srpt_xfer_data[2374] ret=-22
>> [ 8891.792888] <6>[0]: scst: scst_rdy_to_xfer:985:***ERROR***: Target driver
>> ib_srpt rdy_to_xfer() returned fatal error
>>
>> I hope that helps.
>>
>> I've seen that same "rdy_to_xfer() returned fatal error" several times in
>> different configurations. The screen shots we sent earlier had the same
>> "ib_mthca ... SQ ... full (xx head..." message at the start. So that seems
>> to be related as well.
It looks like ib_post_send() in srpt_perform_rdmas() returned -ENOMEM
(the ret=-12 in the log above), and srpt_xfer_data() then "forgot" to
unmap the corresponding SG list, with all the related consequences.
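
To illustrate the error path described above, here is a toy userspace
model in Python, not kernel code. Every name in it (SendQueue, Command,
map_sg, unmap_sg, perform_rdmas, xfer_data) is a hypothetical stand-in
and none of it is taken from the ib_srpt source. It only shows the
shape of the cleanup: if posting fails because the send queue is full,
the mapping is released before the error is returned so that a later
retry starts from a clean state.

```python
import errno


class SendQueue:
    """Toy send queue with a fixed number of work-request slots."""

    def __init__(self, max_wr):
        self.max_wr = max_wr
        self.head = 0  # work requests posted so far
        self.tail = 0  # work requests completed so far

    def post_send(self):
        # Same condition as the "SQ ... full" log lines: head - tail == max.
        if self.head - self.tail >= self.max_wr:
            return -errno.ENOMEM
        self.head += 1
        return 0

    def complete(self, n):
        # Retire up to n outstanding work requests.
        self.tail = min(self.head, self.tail + n)


class Command:
    """Toy command that needs n_rdmas work requests and one SG mapping."""

    def __init__(self, n_rdmas):
        self.n_rdmas = n_rdmas
        self.mapped = False


def map_sg(cmd):
    cmd.mapped = True


def unmap_sg(cmd):
    cmd.mapped = False


def perform_rdmas(sq, cmd):
    # Post one work request per RDMA; stop at the first failure.
    for _ in range(cmd.n_rdmas):
        ret = sq.post_send()
        if ret:
            return ret
    return 0


def xfer_data(sq, cmd):
    map_sg(cmd)
    ret = perform_rdmas(sq, cmd)
    if ret:
        # The cleanup step that appears to be missing. This toy ignores
        # work requests that were already posted before the failure;
        # real code would have to account for those as well.
        unmap_sg(cmd)
    return ret
```

With a two-slot queue, a first command that needs two work requests
succeeds, a second one fails with -ENOMEM and is left unmapped, and it
can be retried once completions have freed the slots.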
>> Thanks for the help,
>> Phil P.
>>
>> --
>> Philip Pokorny, RHCE
>> Chief Hardware Architect - Penguin Computing
>> Voice: 415-370-0835 Toll free: 888-PENGUIN
>> www.penguincomputing.com