linux-scsi.vger.kernel.org archive mirror
From: Mike Christie <michaelc@cs.wisc.edu>
To: FUJITA Tomonori <tomof@acm.org>
Cc: James.Bottomley@HansenPartnership.com, pw@osc.edu,
	fujita.tomonori@lab.ntt.co.jp, linux-scsi@vger.kernel.org,
	erezz@voltaire.com, Jens.Axboe@oracle.com
Subject: Re: Serious regression caused by fix for [BUG 1/3] bsg queue oops with iscsi logout
Date: Wed, 26 Mar 2008 20:51:44 -0500
Message-ID: <47EAFDB0.4090503@cs.wisc.edu>
In-Reply-To: <20080326235900H.tomof@acm.org>

FUJITA Tomonori wrote:
> On Wed, 26 Mar 2008 07:36:26 -0700
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
>> On Wed, 2008-03-26 at 23:22 +0900, FUJITA Tomonori wrote:
>>> On Sat, 22 Mar 2008 11:06:00 -0500
>>> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>>>
>>>> On Tue, 2008-03-11 at 00:36 -0500, Mike Christie wrote:
>>>>> Mike Christie wrote:
>>>>>> Pete Wyckoff wrote:
>>>>>>> I think this used not to happen; not sure.  But I changed two things
>>>>>> This most likely did not happen before 2.6.25-rc* or it broke in 
>>>>>> slightly different ways, because iscsi used to try and do
>>>>>>
>>>>>> echo 1 > /sys/block/sdX/device/delete
>>>>>>
>>>>>> from userspace instead of calling scsi_remove_target from the kernel.
>>>>>>
>>>>>> As you know, around 2.6.21 the behavior of doing the echo to the delete 
>>>>>> file changed due to a driver model and scsi change, and that broke the 
>>>>>> iscsi tools. The iscsi tools' userspace removal was sort of a hack in the 
>>>>>> first place and was racy, so we switched to removing devices/targets 
>>>>>> like the FC class.
>>>>>>
>>>>>>
>>>>>>> lately.  2.6.25-rc1 to -rc4 and fedora 8 iscsi-initiator-utils (865) to
>>>>>>> fedora devel (868).  Bidi and varlen patches always too.
>>>>>>>
>>>>>>> I'll follow with some more variations on this theme.  Looks like bsg
>>>>>>> needs to protect more carefully against the device going away.  Any
>>>>>>> ideas how best to do this?  What was the approach in sg?
>>>>>>>
>>>>>> I think sg is broken in similar ways. The iser guys have some test 
>>>>>> cases that have broken sg while IO is outstanding. I am ccing Erez.
>>>>> Actually one of the problems looks a little different from some of the 
>>>>> problems hit with sg and is caused because we remove the bsg device too 
>>>>> soon. I think we want to wait until all the references from the 
>>>>> commands/requests are released. The attached patch (untested) moves the 
>>>>> bsg unreg call to the scsi device release fn.
>>>> Well, this fix is now upstream.  However, it's causing all our
>>>> scsi_devices never to get released, which is a serious regression.
>>>> We're also doing spurious bsg_unregister_queue() for things that never
>>>> actually registered one (all scan devices that return DID_NO_CONNECT),
>>>> but bsg doesn't seem to be complaining about this.
>>>>
>>>> The essence of the problem is that bsg_register_queue() takes a ref to
>>>> the sdev_gendev, so you can't move bsg_unregister_queue() into the
>>>> release function because nothing ever puts bsg's device ref and so
>>>> release is never called.
>>>>
>>>> Options for fixing this before 2.6.25 are
>>>>
>>>>      1. revert the patch
>>>>      2. Do an additional put for the bsg reference in
>>>>         __scsi_remove_device (patch below).  It's nasty but it preserves
>>>>         the semantics and does what you want
>>> After some investigation, this patch doesn't fix the bug that Pete
>>> reported (I'll send a new patch shortly).
>>>
>>> Can you revert the commit 4b6f5b3a993cbe34b4280f252bccc76967c185c8
>>> instead of merging this?
>> Sure ... I didn't like the hack either.  As long as iSCSI is fine with
>> the reversion it's the quickest way to fix the problem.
> 
> How about this? With the commit reversion, I confirmed that this patch
> fixes the first bug that Pete reported:
> 
> http://marc.info/?l=linux-scsi&m=120508166505141&w=2
> 
> I suspect that this could fix the rest too.
> 
> =
> From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> Subject: [PATCH] bsg: takes a ref to struct device in fops->open
> 
> bsg_register_queue() takes a ref to struct device that a caller
> passes. For example, it takes a ref to the sdev_gendev with scsi
> devices. However, bsg doesn't take a ref to it in fops->open. So
> while an application opens a bsg device, the scsi device that the bsg
> device holds can go away (bsg also takes a ref to a queue, but it
> doesn't prevent the device from going away).
> 
> With this, bsg takes a ref to struct device in fops->open and frees it
> in fops->release.
> 
> Note that bsg doesn't need to take a ref to a queue for SCSI devices
> at least. I think that it would be better to remove the code but I left
> it alone for now.
> 
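
The regression James describes in the quoted text can be reproduced with a
small userspace model. This is only a sketch: struct toy_dev and every
toy_* function are hypothetical stand-ins, not kernel code. It shows that
if the only call that drops bsg's device reference sits in the release
function, the count never reaches zero and release never runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct device: a bare reference count. */
struct toy_dev {
    int refs;
    bool bsg_registered;
    bool released;
};

/* Runs only when the last reference is dropped. In the regressed
 * ordering, the bsg unregister call lived here. */
static void toy_dev_release(struct toy_dev *d)
{
    d->released = true;
}

static void toy_dev_get(struct toy_dev *d)
{
    d->refs++;
}

static void toy_dev_put(struct toy_dev *d)
{
    assert(d->refs > 0);
    if (--d->refs == 0)
        toy_dev_release(d);
}

/* Models bsg_register_queue() taking a ref on the device. */
static void toy_bsg_register(struct toy_dev *d)
{
    toy_dev_get(d);
    d->bsg_registered = true;
}

/* Models bsg_unregister_queue() dropping that ref. */
static void toy_bsg_unregister(struct toy_dev *d)
{
    if (!d->bsg_registered)
        return;
    d->bsg_registered = false;
    toy_dev_put(d);
}

/* Tear the device down and report whether release ever ran. */
static bool toy_dev_scenario(bool unregister_in_remove)
{
    struct toy_dev d = { .refs = 1 };   /* owner's initial ref */

    toy_bsg_register(&d);               /* refs: 2 */
    if (unregister_in_remove)
        toy_bsg_unregister(&d);         /* refs: 1 */
    toy_dev_put(&d);                    /* owner drops its ref */
    return d.released;
}
```

toy_dev_scenario(true) returns true, while toy_dev_scenario(false) returns
false: with the unregister deferred to release, bsg's reference is the one
that keeps release from being called.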

Why does bsg_add_device do kobject_get instead of blk_get_queue?

It seems like if we added a blk_get_queue when we opened the device and 
a blk_put_queue when bsg_release is called, we could remove the 
get/put_device calls. I am not sure if that is cleaner or not. I was 
just thinking that bsg goes from bsg->request_queue->scsi_device, so 
maybe it should not worry about the device.
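
To make the idea concrete, here is a rough userspace sketch of pinning the
queue for the lifetime of an open file. struct toy_queue, struct
toy_bsg_file and the toy_* helpers are made-up names loosely modelled on
the get/put pattern, not the real block layer API. It only shows the
queue's lifetime; whether a queue reference is enough to keep the scsi
device around is the open question here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request queue with a reference count. */
struct toy_queue {
    int refs;
    bool freed;
};

static void toy_queue_get(struct toy_queue *q)
{
    assert(!q->freed);
    q->refs++;
}

static void toy_queue_put(struct toy_queue *q)
{
    assert(q->refs > 0);
    if (--q->refs == 0)
        q->freed = true;
}

/* Hypothetical per-open state, like what a bsg file would hold. */
struct toy_bsg_file {
    struct toy_queue *q;
};

/* open: pin the queue for as long as the file stays open. */
static void toy_bsg_open(struct toy_bsg_file *f, struct toy_queue *q)
{
    toy_queue_get(q);
    f->q = q;
}

/* release: drop the reference taken in open. */
static void toy_bsg_release(struct toy_bsg_file *f)
{
    toy_queue_put(f->q);
    f->q = NULL;
}

/* True if the queue outlives its owner while the file is open and is
 * freed once the file is released. */
static bool toy_queue_survives_while_open(void)
{
    struct toy_queue q = { .refs = 1 };  /* owner's initial ref */
    struct toy_bsg_file f;
    bool alive_while_open;

    toy_bsg_open(&f, &q);                /* refs: 2 */
    toy_queue_put(&q);                   /* owner goes away, refs: 1 */
    alive_while_open = !q.freed;
    toy_bsg_release(&f);                 /* refs: 0, queue freed */
    return alive_while_open && q.freed;
}
```

In this model the get in open and the put in release are the only pair the
file needs, which is the symmetry I was after.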


