public inbox for linux-rdma@vger.kernel.org
From: Vladislav Bolkhovitin <vst-d+Crzxg7Rs0@public.gmane.org>
To: Chris Worley <worleys-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	scst-devel
	<scst-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>,
	OpenIB
	<general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org>
Subject: Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
Date: Wed, 16 Sep 2009 22:15:28 +0400
Message-ID: <4AB12B40.9050902@vlnb.net>
In-Reply-To: <f3177b9e0909151351p12173c78oe01cc8bcca957550-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Chris Worley, on 09/16/2009 12:51 AM wrote:
> On Tue, Sep 15, 2009 at 11:10 AM, Vladislav Bolkhovitin <vst-d+Crzxg7Rs0@public.gmane.org> wrote:
>> Chris Worley, on 09/15/2009 09:01 PM wrote:
>>> On Tue, Sep 15, 2009 at 10:57 AM, Vladislav Bolkhovitin <vst-d+Crzxg7Rs0@public.gmane.org>
>>> wrote:
>>>> Chris Worley, on 09/15/2009 08:53 PM wrote:
>>>>> On Tue, Sep 15, 2009 at 10:43 AM, Vladislav Bolkhovitin <vst-d+Crzxg7Rs0@public.gmane.org>
>>>>> wrote:
>>>>>> Chris Worley, on 09/15/2009 07:50 PM wrote:
>>>>>>> On Tue, Sep 15, 2009 at 12:10 AM, Bart Van Assche
>>>>>>> <bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>>>>>> On Tue, Sep 15, 2009 at 1:03 AM, Chris Worley <worleys-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>>>>>>>> wrote:
>>>>>>>>> On Mon, Sep 14, 2009 at 12:51 PM, Vladislav Bolkhovitin
>>>>>>>>> <vst-d+Crzxg7Rs0@public.gmane.org>
>>>>>>>>> wrote:
>>>>>>>>>> Chris Worley, on 09/11/2009 11:50 PM wrote:
>>>>>>>>>>> I've definitely removed the switch/firmware from being the cause.
>>>>>>>>>>>
>>>>>>>>>>> I'm thinking the reason you can't repeat the test may be latency
>>>>>>>>>>> related.  We get ~50usecs average latency (on small block sizes),
>>>>>>>>>>> which can't be achieved using regular SSDs (and rotating drives
>>>>>>>>>>> are nowhere close).  Maybe a ramdisk would help repeat the issue.
>>>>>>>>>> I think you should try to reproduce the problem with ramdisk or
>>>>>>>>>> nullio. By doing so you will eliminate possible influence of the
>>>>>>>>>> SSD backend.
>>>>>>>>> W/ 12GB RAM in the target, I created a 7GB ramdisk:
>>>>>>>>>
>>>>>>>>> mount -t ramfs -o size=7g ramfs /mnt/
>>>>>>>>> dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
>>>>>>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>>>>>> echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices
>>>>>>>>>
>>>>>>>>> Then, on the initiator, I tested it... and it hung during sequential
>>>>>>>>> 8KB block reads:
>>>>>>>>>
>>>>>>>>> fio --rw=read --bs=8k --numjobs=64 --iodepth=64 --sync=0 --direct=1
>>>>>>>>> --randrepeat=0 \
>>>>>>>>>  --group_reporting --ioengine=libaio --filename=/dev/sde --name=test
>>>>>>>>> --loops=10000 --runtime=600
>>>>>>>>>
>>>>>>>>> Note that I was running the SM on the target this time too.
>>>>>>>> Which Linux distro was installed on the initiator and on the target
>>>>>>>> ? And if applicable, which OFED version ? Which kernel messages were
>>>>>>>> logged by SRPT around the time the issue occurred (after having
>>>>>>>> enabled SRPT logging first) ?
>>>>>>> As logging hadn't helped this issue previously, I've not been enabling
>>>>>>> it.  That plus the kernel hacks needed to invoke logging, it's not
>>>>>>> worth enabling.
>>>>>>>
>>>>>>> This was with Ubuntu 8.10, built-in IB on the 2.6.27-14-server kernel.
>>>>>>>
>>>>>>> I couldn't get ramdisks working w/ SCST in RHEL5.2.  When running:
>>>>>>>
>>>>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>>>>
>>>>>>> I get the error:
>>>>>>>
>>>>>>> dev_vdisk: ***ERROR***: Wrong f_op or FS doesn't have required
>>>>>>> capabilities
>>>>>>>
>>>>>>> ... which doesn't occur in the Ubuntu kernel, so I've been unable to
>>>>>>> test RHEL kernels w/ ramdisks.  In general, this problem occurs w/ 8KB
>>>>>>> and smaller blocks w/ the Ubuntu kernels, and 2KB and smaller blocks
>>>>>>> w/ RHEL kernels.
>>>>>> Use ramfs instead.
>>>>> Do you mean:
>>>>>
>>>>> mount -t ramfs -o size=7g ramfs /mnt/
>>>> You should then create a file on it and use it.
>>> That's what I'm doing, I believe.  From above:
>>>
>>>>>>>>> mount -t ramfs -o size=7g ramfs /mnt/
>>>>>>>>> dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
>>>>>>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>>>>>> echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices
>>> ... but the "open", on RHEL5.2 kernel 2.6.18-92.el5, generates the
>>> following kernel messages:
>>>
>>> dev_vdisk: Registering virtual FILEIO device ramdisk
>>> scst: Processing thread started, PID 9629
>>> scst: Processing thread started, PID 9630
>>> scst: Processing thread started, PID 9631
>>> scst: Processing thread started, PID 9632
>>> scst: Processing thread started, PID 9633
>>> dev_vdisk: ***ERROR***: Wrong f_op or FS doesn't have required
>>> capabilities
>>> scst: ***ERROR***: New device handler's vdisk attach() failed: -22
>>> scst: Processing thread PID 9629 finished
>>> scst: Processing thread PID 9630 finished
>>> scst: Processing thread PID 9631 finished
>>> scst: Processing thread PID 9632 finished
>>> scst: Processing thread PID 9633 finished
>>> scst: Failed to attach to virtual device ramdisk
>>>
>>> Chris
>>>>> ?
>>>>>
>>>>> That's what I'm doing.
>> That's strange. I'm doing it all the time, although with not so old kernels
>> as 2.6.18.
> 
> In lots of testing today, I've seen this panic twice on the Ubuntu 8.10 targets:
> 
> [  330.155992] ib_srpt: disconnected session
> 0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
> been received.
> [  357.207046] ib_srpt: srpt_xmit_response: tag= 17 channel in bad state 2
> [  357.207052] ib_srpt: disconnected session
> 0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
> been received.
> [  357.207100] ib_srpt: srpt_xmit_response: tag= 47 channel in bad state 2
> [  357.207104] scst: ***ERROR***: Target driver ib_srpt
> xmit_response() returned fatal error
> [  357.241429] scst: ***ERROR***: Target driver ib_srpt
> xmit_response() returned fatal error
> [  357.250234] ------------[ cut here ]------------
> [  357.250537] ib_srpt: srpt_xmit_response: tag= 26 channel in bad state 2
> [  357.250539] scst: ***ERROR***: Target driver ib_srpt
> xmit_response() returned fatal error
> [  357.250550] ib_srpt: srpt_xmit_response: tag= 38 channel in bad state 2
> [  357.250553] scst: ***ERROR***: Target driver ib_srpt
> xmit_response() returned fatal error
> [  357.250560] ib_srpt: srpt_xmit_response: tag= 27 channel in bad state 2
> <repeated many times>
> [  357.301253] kernel BUG at /root/scst/scst/src/scst_targ.c:3089!
> [  357.301253] invalid opcode: 0000 [1] SMP
> [  357.301253] CPU 0
> ...
> [  357.301253] RIP: 0010:[<ffffffffa04759f6>]  [<ffffffffa04759f6>]
> scst_tgt_cmd_done+0x26/0x30 [scst]
> [  357.301253] RSP: 0018:ffff88039ad27b50  EFLAGS: 00010297
> [  357.301253] RAX: 0000000000000200 RBX: ffff8803ad9c68f8 RCX: 0000000000000000
> [  357.301253] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffff8803ad9c68f8
> [  357.301253] RBP: ffff88039ad27b50 R08: 0000000000000000 R09: 0000000000000000
> [  357.301253] R10: ffff88039ad277c0 R11: ffff88041ad278cf R12: ffff8803c2972180
> [  357.301253] R13: ffff88039ada0000 R14: 0000000000000001 R15: ffff8803fb00c2b0
> [  357.301253] FS:  0000000000000000(0000) GS:ffffffff807dd000(0000)
> knlGS:0000000000000000
> [  357.301253] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [  357.301253] CR2: 00007f9281e64000 CR3: 0000000000201000 CR4: 00000000000006e0
> [  357.301253] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  357.301253] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  357.301253] Process ib_cm/0 (pid: 8299, threadinfo
> ffff88039ad26000, task ffff88039ad40000)
> [  357.301253] Stack:  ffff88039ad27b80 ffffffffa04c0c47
> ffff88039a8db900 ffff8803c2972180
> [  357.301253]  ffff8803fb00c240 ffff8803fb00c284 ffff88039ad27bc0
> ffffffffa04c0d93
> [  357.301253]  ffff88042a4959c0 ffff88042a9d7800 ffff88042544da00
> ffff88042a9d7898
> [  357.301253] Call Trace:
> [  357.301253]  [<ffffffffa04c0c47>] srpt_abort_scst_cmd+0xd7/0x160 [ib_srpt]
> [  357.301253]  [<ffffffffa04c0d93>] srpt_release_channel+0xc3/0x190 [ib_srpt]
> [  357.301253]  [<ffffffffa04c0e82>]
> srpt_find_and_release_channel+0x22/0x30 [ib_srpt]
> [  357.301253]  [<ffffffffa04c227d>] srpt_cm_handler+0x6d/0xbb8 [ib_srpt]

This happens because srpt called scst_tgt_cmd_done() for a command that
had not yet been passed to the xmit_response() callback. srpt should use
a different function to abort commands that are in this state.
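
To illustrate the idea, here is a minimal user-space sketch of an abort
path that picks its completion route from the command state. All names
in it (cmd_state, abort_action, choose_abort_action) are hypothetical
and are not the real SCST or ib_srpt definitions; which SCST function
srpt should actually call for a command that has not reached
xmit_response() is left open here.

```c
#include <assert.h>

/* Hypothetical command states, for illustration only. */
enum cmd_state {
	CMD_NEW,        /* received, target core has not asked for a response */
	CMD_NEED_DATA,  /* still waiting for data from the initiator */
	CMD_IN_XMIT,    /* target core has already called xmit_response() */
	CMD_DONE        /* already completed, nothing left to abort */
};

/* Hypothetical abort routes a target driver could take. */
enum abort_action {
	ABORT_VIA_ERROR_PATH, /* report failure without claiming "response done" */
	ABORT_VIA_CMD_DONE,   /* the only state where a "done" call is valid */
	ABORT_NOTHING
};

/*
 * Choose the abort route from the command state. A "done" notification
 * is only chosen once the command has reached xmit_response(); earlier
 * states must go through a different path.
 */
enum abort_action choose_abort_action(enum cmd_state state)
{
	switch (state) {
	case CMD_NEW:
	case CMD_NEED_DATA:
		return ABORT_VIA_ERROR_PATH;
	case CMD_IN_XMIT:
		return ABORT_VIA_CMD_DONE;
	case CMD_DONE:
	default:
		return ABORT_NOTHING;
	}
}
```

In the trace above, srpt_release_channel() aborted every outstanding
command the same way, which corresponds to always returning
ABORT_VIA_CMD_DONE regardless of state.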

Vlad
