public inbox for linux-rdma@vger.kernel.org
From: Vladislav Bolkhovitin <vst-d+Crzxg7Rs0@public.gmane.org>
To: Philip Pokorny
	<ppokorny-pabcTyWEv4ZW60MLeMDbCVaTQe2KTcn/@public.gmane.org>
Cc: scst-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Arend Dittmer
	<adittmer-pabcTyWEv4ZW60MLeMDbCVaTQe2KTcn/@public.gmane.org>,
	Vu Pham <vuhuong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	Bart Van Assche
	<bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Subject: Re: SRPT and SCST
Date: Thu, 05 Nov 2009 16:27:20 +0300
Message-ID: <4AF2D2B8.5080304@vlnb.net>
In-Reply-To: <4AF29201.6000606-pabcTyWEv4ZW60MLeMDbCVaTQe2KTcn/@public.gmane.org>

Philip Pokorny, on 11/05/2009 11:51 AM wrote:
> Chris Worley asked that we post this to scst and linux-rdma lists for 
> discussion.
> 
> We're trying to get IB SRPT working and can't seem to get a stable 
> configuration using any of the various SCST, IB_SRPT, and kernel/distro 
> versions out there.  In most cases, we're able to crash the connection 
> and typically the target within minutes of pounding by 4 initiators 
> doing "mkfs.ext3", "tar xf" and "fsck" to the SRP block device.
> 
> Our target is a Penguin Computing Altus 2704 with disk expansion 
> chassis.  That's a 4-socket AMD hex-core (24 total cores) with 128GB of 
> memory and 24 1TB drives attached to two LSI 1068 SAS controllers. (aka 
> 3801E)  The drives are configured as 12 RAID-1 mirrors and 3-wide LVM 
> stripes over those mirrors.  There are an additional 6 SSD's in the 
> server in a "fast" VG also RAID-1 mirrored and LVM striped.  Read ahead 
> is disabled on the LVM volumes.
> 
> LVM volumes are exported via SCST as FILEIO block devices to initiators. 
>   50 groups are defined with two LVM volumes/block devices per group. 
> One initiator per group. (NODE GUID added to "names" in the group)
> 
> With only 4 initiators, almost 100% of I/O is to RAM and no disk I/O is 
> seen on the target.
> 
> Performance (when it's working) is generally good at 800MB/sec 
> aggregate, but we'd like to see better.  It appeared we were getting 
> 1.3GB/s at one point.
> 
> On Wed, Nov 4, 2009 at 5:34 PM, Philip Pokorny
> <ppokorny-pabcTyWEv4ZW60MLeMDbCVaTQe2KTcn/@public.gmane.org> wrote:
>> We got a serial console attached and ran a test using the SCST and IB_SRPT
>> versions that you recommended (Arend set it up so I'll defer to him on the
>> exact SVN checkout that he used).
>>
>>> What sort of crashes are you seeing?  I also have a customer
>>> experiencing a crash, but I can't get details out of them.
>> The client gets SCSI I/O errors and aborts the filesystem (putting it in
>> read-only mode).
>>
>> After about 400 seconds of testing, the server side logs the following:
>>
>> [ 8418.697830] <6>[12426]: scst_check_sense:2444:Clearing dbl_ua_possible
>> flag (dev ffff811816136000, cmd ffff81081017c1c8)
>> [ 8418.697836] <6>[12426]: scst_dec_on_dev_cmd:577:cmd ffff81081017c1c8 (tag
>> 17): unblocking dev ffff811816136000
>> [ 8418.697843] <6>[0]: scst_unblock_dev:4653:Device UNBLOCK(new 0), dev
>> ffff811816136000
>> [ 8864.258468] ib_mthca 0000:81:00.0: SQ 000405 full (999320 head, 997272
>> tail, 2048 max, 0 nreq)
>> [ 8864.294450] ***ERROR***: srpt_xfer_data[2374] ret=-12
>> [ 8864.326702] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL:
>> incrementing retry_cmds 1
>> [ 8864.326709] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished,
>> direct retry (finished_cmds=2023031, tgt->finished_cmds=2023137,
>> retry_cmds=0)
>> [ 8878.447081] ib_mthca 0000:81:00.0: SQ 000406 full (1080498 head, 1078450
>> tail, 2048 max, 0 nreq)
>> [ 8878.484452] ***ERROR***: srpt_xfer_data[2374] ret=-12
>> [ 8878.517595] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL:
>> incrementing retry_cmds 1
>> [ 8878.517608] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished,
>> direct retry (finished_cmds=2256307, tgt->finished_cmds=2256504,
>> retry_cmds=0)
>> [ 8882.694684] ib_mthca 0000:81:00.0: SQ 000404 full (1087484 head, 1085436
>> tail, 2048 max, 0 nreq)
>> [ 8882.732542] ***ERROR***: srpt_xfer_data[2374] ret=-12
>> [ 8882.766396] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL:
>> incrementing retry_cmds 1
>> [ 8882.766403] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished,
>> direct retry (finished_cmds=2310445, tgt->finished_cmds=2310539,
>> retry_cmds=0)
>> [ 8891.650890] ib_mthca 0000:81:00.0: SQ 000407 full (1155377 head, 1153329
>> tail, 2048 max, 0 nreq)
>> [ 8891.689016] ***ERROR***: srpt_xfer_data[2374] ret=-12
>> [ 8891.723548] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL:
>> incrementing retry_cmds 1
>> [ 8891.723556] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished,
>> direct retry (finished_cmds=2381910, tgt->finished_cmds=2382001,
>> retry_cmds=0)
>> [ 8891.723573] ib_mthca 0000:81:00.0: too many gathers
>> [ 8891.758000] ***ERROR***: srpt_xfer_data[2374] ret=-22
>> [ 8891.792888] <6>[0]: scst: scst_rdy_to_xfer:985:***ERROR***: Target driver
>> ib_srpt rdy_to_xfer() returned fatal error
>>
>> I hope that helps.
>>
>> I've seen that same "rdy_to_xfer() returned fatal error" several times in
>> different configurations.  The screen shots we sent earlier had the same
>> "ib_mthca ... SQ ... full (xx head..." message at the start.  So that seems
>> to be related as well.

It looks like ib_post_send() in srpt_perform_rdmas() returned -ENOMEM 
(the ret=-12 in the log) and then srpt_xfer_data() "forgot" to unmap 
the corresponding SG, with all the related consequences.
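
To make the suspected error path concrete, here is a minimal user-space 
sketch in C. It is not the ib_srpt code: every name in it (mock_sq, 
mock_cmd, mock_post_send, mock_map_sg, mock_unmap_sg, mock_xfer_data) is 
hypothetical, and the queue model is only an assumption based on the 
"SQ ... full" lines above, where head - tail equals max (for example 
999320 - 997272 = 2048). It shows the general pattern only: if posting 
fails after the SG list was mapped, the mapping is undone before the 
error is returned, so a later retry starts from a clean state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a send queue; not the real ib_mthca structure. */
struct mock_sq {
    unsigned int head; /* work requests posted so far */
    unsigned int tail; /* work requests completed so far */
    unsigned int max;  /* queue capacity */
};

/* Hypothetical stand-in for a command that owns a mapped SG list. */
struct mock_cmd {
    bool sg_mapped;
};

/* Assumed model of the "SQ full" case: no free slot when head - tail >= max. */
static int mock_post_send(struct mock_sq *sq)
{
    if (sq->head - sq->tail >= sq->max)
        return -ENOMEM;
    sq->head++;
    return 0;
}

/* Mapping twice without an unmap in between is the bug being illustrated. */
static void mock_map_sg(struct mock_cmd *cmd)
{
    assert(!cmd->sg_mapped);
    cmd->sg_mapped = true;
}

static void mock_unmap_sg(struct mock_cmd *cmd)
{
    assert(cmd->sg_mapped);
    cmd->sg_mapped = false;
}

/* Map the SG list, post the request, and undo the mapping if posting fails. */
static int mock_xfer_data(struct mock_sq *sq, struct mock_cmd *cmd)
{
    int ret;

    mock_map_sg(cmd);
    ret = mock_post_send(sq);
    if (ret)
        mock_unmap_sg(cmd); /* the step that appears to be missing */
    return ret;
}
```

With the unmap on the failure path, calling mock_xfer_data() again for 
the same command after a -ENOMEM does not trip the double-mapping 
assertion. Without it, the second call would.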

>> Thanks for the help,
>> Phil P.
>>
>> --
>> Philip Pokorny, RHCE
>> Chief Hardware Architect - Penguin Computing
>> Voice: 415-370-0835  Toll free: 888-PENGUIN
>> www.penguincomputing.com

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
