From: Vladislav Bolkhovitin
Subject: Re: SRPT and SCST
Date: Thu, 05 Nov 2009 16:27:20 +0300
To: Philip Pokorny
Cc: scst-devel, linux-rdma, Arend Dittmer, Vu Pham, Bart Van Assche
In-Reply-To: <4AF29201.6000606@penguincomputing.com>

Philip Pokorny, on 11/05/2009 11:51 AM wrote:
> Chris Worley asked that we post this to the scst and linux-rdma lists
> for discussion.
>
> We're trying to get IB SRPT working and can't seem to get a stable
> configuration using any of the various SCST, IB_SRPT, and kernel/distro
> versions out there. In most cases we're able to crash the connection,
> and typically the target, within minutes of pounding by four initiators
> running "mkfs.ext3", "tar xf" and "fsck" against the SRP block device.
>
> Our target is a Penguin Computing Altus 2704 with a disk expansion
> chassis: a 4-socket AMD hex-core (24 cores total) with 128GB of memory
> and 24 1TB drives attached to two LSI 1068 SAS controllers (aka 3801E).
> The drives are configured as 12 RAID-1 mirrors with 3-wide LVM stripes
> over those mirrors. There are an additional 6 SSDs in the server in a
> "fast" VG, also RAID-1 mirrored and LVM-striped. Read-ahead is disabled
> on the LVM volumes.
>
> The LVM volumes are exported via SCST as FILEIO block devices to the
> initiators. 50 groups are defined, with two LVM volumes/block devices
> per group and one initiator per group (the initiator's node GUID is
> added to the group's "names" list).
>
> With only 4 initiators, almost 100% of the I/O is to RAM and no disk
> I/O is seen on the target.
>
> Performance (when it's working) is generally good at 800 MB/s
> aggregate, but we'd like to see better. It appeared we were getting
> 1.3 GB/s at one point.
>
> On Wed, Nov 4, 2009 at 5:34 PM, Philip Pokorny wrote:
>> We got a serial console attached and ran a test using the SCST and
>> IB_SRPT versions that you recommended (Arend set it up, so I'll defer
>> to him on the exact SVN checkout that he used).
>>
>>> What sort of crashes are you seeing? I also have a customer
>>> experiencing a crash, but I can't get details out of them.
>>
>> The client gets SCSI I/O errors and aborts the filesystem (putting it
>> in read-only mode).
>>
>> After about 400 seconds of testing, the server side logs the following:
>>
>> [ 8418.697830] <6>[12426]: scst_check_sense:2444:Clearing dbl_ua_possible flag (dev ffff811816136000, cmd ffff81081017c1c8)
>> [ 8418.697836] <6>[12426]: scst_dec_on_dev_cmd:577:cmd ffff81081017c1c8 (tag 17): unblocking dev ffff811816136000
>> [ 8418.697843] <6>[0]: scst_unblock_dev:4653:Device UNBLOCK(new 0), dev ffff811816136000
>> [ 8864.258468] ib_mthca 0000:81:00.0: SQ 000405 full (999320 head, 997272 tail, 2048 max, 0 nreq)
>> [ 8864.294450] ***ERROR***: srpt_xfer_data[2374] ret=-12
>> [ 8864.326702] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL: incrementing retry_cmds 1
>> [ 8864.326709] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished, direct retry (finished_cmds=2023031, tgt->finished_cmds=2023137, retry_cmds=0)
>> [ 8878.447081] ib_mthca 0000:81:00.0: SQ 000406 full (1080498 head, 1078450 tail, 2048 max, 0 nreq)
>> [ 8878.484452] ***ERROR***: srpt_xfer_data[2374] ret=-12
>> [ 8878.517595] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL: incrementing retry_cmds 1
>> [ 8878.517608] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished, direct retry (finished_cmds=2256307, tgt->finished_cmds=2256504, retry_cmds=0)
>> [ 8882.694684] ib_mthca 0000:81:00.0: SQ 000404 full (1087484 head, 1085436 tail, 2048 max, 0 nreq)
>> [ 8882.732542] ***ERROR***: srpt_xfer_data[2374] ret=-12
>> [ 8882.766396] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL: incrementing retry_cmds 1
>> [ 8882.766403] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished, direct retry (finished_cmds=2310445, tgt->finished_cmds=2310539, retry_cmds=0)
>> [ 8891.650890] ib_mthca 0000:81:00.0: SQ 000407 full (1155377 head, 1153329 tail, 2048 max, 0 nreq)
>> [ 8891.689016] ***ERROR***: srpt_xfer_data[2374] ret=-12
>> [ 8891.723548] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL: incrementing retry_cmds 1
>> [ 8891.723556] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished, direct retry (finished_cmds=2381910, tgt->finished_cmds=2382001, retry_cmds=0)
>> [ 8891.723573] ib_mthca 0000:81:00.0: too many gathers
>> [ 8891.758000] ***ERROR***: srpt_xfer_data[2374] ret=-22
>> [ 8891.792888] <6>[0]: scst: scst_rdy_to_xfer:985:***ERROR***: Target driver ib_srpt rdy_to_xfer() returned fatal error
>>
>> I hope that helps.
>>
>> I've seen that same "rdy_to_xfer() returned fatal error" several times
>> in different configurations. The screen shots we sent earlier had the
>> same "ib_mthca ... SQ ... full (xx head ..." message at the start, so
>> that seems to be related as well.
>>
>> Thanks for the help,
>> Phil P.
>>
>> --
>> Philip Pokorny, RHCE
>> Chief Hardware Architect - Penguin Computing
>> Voice: 415-370-0835 Toll free: 888-PENGUIN
>> www.penguincomputing.com

Looks like ib_post_send() in srpt_perform_rdmas() returned -ENOMEM
(hence the ret=-12 after each "SQ ... full" message) and srpt_xfer_data()
then "forgot" to unmap the corresponding SG list, with all the
consequences that follow.
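To make the suspected failure mode concrete, here is a rough sketch of
what the fixed error path in srpt_xfer_data() could look like. The helper
names (srpt_map_sg_to_ib_sge(), srpt_perform_rdmas(),
srpt_unmap_sg_to_ib_sge()) and the SCST status codes are taken from the
ib_srpt/SCST code of that era, but the function body below is
illustrative, not the actual source:

/*
 * Illustrative sketch only -- not the actual ib_srpt source. It shows
 * where the missing cleanup belongs, under the assumption that the
 * mapping is set up before srpt_perform_rdmas() posts the RDMA work
 * requests.
 */
static int srpt_xfer_data(struct srpt_rdma_ch *ch, struct srpt_ioctx *ioctx,
                          struct scst_cmd *scmnd)
{
        int ret;

        /* DMA-map the command's SG list for the RDMA transfer. */
        ret = srpt_map_sg_to_ib_sge(ch, ioctx, scmnd);
        if (ret) {
                ret = SCST_TGT_RES_FATAL_ERROR;
                goto out;
        }

        /* Posts the RDMA work requests via ib_post_send(). */
        ret = srpt_perform_rdmas(ch, ioctx);
        if (ret) {
                /*
                 * ib_post_send() failed, e.g. with -ENOMEM (ret=-12)
                 * because the send queue is full. The unmap below is the
                 * missing cleanup: without it the mapping set up above
                 * leaks, and SCST's direct retry on QUEUE FULL maps the
                 * command a second time -- consistent with the later
                 * "too many gathers" failure (-EINVAL, ret=-22) in the
                 * log.
                 */
                srpt_unmap_sg_to_ib_sge(ch, ioctx);
                if (ret == -ENOMEM || ret == -EAGAIN)
                        ret = SCST_TGT_RES_QUEUE_FULL;
                else
                        ret = SCST_TGT_RES_FATAL_ERROR;
                goto out;
        }

        ret = SCST_TGT_RES_SUCCESS;
out:
        return ret;
}

If that's what is happening, the QUEUE FULL retries would keep succeeding
for a while, which would match the target surviving several SQ-full
events before the -EINVAL finally makes rdy_to_xfer() return a fatal
error.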