From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
To: bharrosh@panasas.com
Cc: fujita.tomonori@lab.ntt.co.jp,
James.Bottomley@HansenPartnership.com, tomof@acm.org,
michaelc@cs.wisc.edu, pw@osc.edu, linux-scsi@vger.kernel.org,
erezz@voltaire.com, Jens.Axboe@oracle.com
Subject: Re: Serious regression caused by fix for [BUG 1/3] bsg queue oops with iscsi logout
Date: Thu, 3 Apr 2008 06:00:35 +0900
Message-ID: <20080403060033H.tomof@acm.org>
In-Reply-To: <47F3D364.3050505@panasas.com>

On Wed, 02 Apr 2008 21:41:40 +0300
Boaz Harrosh <bharrosh@panasas.com> wrote:
> FUJITA Tomonori wrote:
> > On Sun, 30 Mar 2008 12:39:36 -0500
> > James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> >
> >> On Thu, 2008-03-27 at 21:18 +0900, FUJITA Tomonori wrote:
> >>> On Thu, 27 Mar 2008 20:11:52 +0900
> >>> FUJITA Tomonori <tomof@acm.org> wrote:
> >>>
> >>>> On Wed, 26 Mar 2008 20:51:44 -0500
> >>>> Mike Christie <michaelc@cs.wisc.edu> wrote:
> >>>>
> >>>>> FUJITA Tomonori wrote:
> >>>>>> On Wed, 26 Mar 2008 07:36:26 -0700
> >>>>>> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> >>>>>>
> >>>>>>> On Wed, 2008-03-26 at 23:22 +0900, FUJITA Tomonori wrote:
> >>>>>>>> On Sat, 22 Mar 2008 11:06:00 -0500
> >>>>>>>> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> >>>>>>>>
> >>>>>>>>> On Tue, 2008-03-11 at 00:36 -0500, Mike Christie wrote:
> >>>>>>>>>> Mike Christie wrote:
> >>>>>>>>>>> Pete Wyckoff wrote:
> >>>>>>>>>>>> I think this used not to happen; not sure. But I changed two things
> >>>>>>>>>>> This most likely did not happen before 2.6.25-rc* or it broke in
> >>>>>>>>>>> slightly different ways, because iscsi used to try and do
> >>>>>>>>>>>
> >>>>>>>>>>> echo 1 > /sys/block/sdX/device/delete
> >>>>>>>>>>>
> >>>>>>>>>>> from userspace instead of calling scsi_remove_target from the kernel.
> >>>>>>>>>>>
> >>>>>>>>>>> As you know around 2.6.21, the behavior of doing the echo to the delete
> >>>>>>>>>>> file changed due to a driver model and scsi change and that broke the
> >>>>>>>>>>> iscsi tools. The iscsi tools userspace removal was sort of hack in the
> >>>>>>>>>>> first place and was racey, so we switched to removing devices/target
> >>>>>>>>>>> like the FC class.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> lately. 2.6.25-rc1 to -rc4 and fedora 8 iscsi-initiator-utils (865) to
> >>>>>>>>>>>> fedora devel (868). Bidi and varlen patches always too.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'll follow with some more variations on this theme. Looks like bsg
> >>>>>>>>>>>> needs to protect more carefully against the device going away. Any
> >>>>>>>>>>>> ideas how best to do this? What was the approach in sg?
> >>>>>>>>>>>>
> >>>>>>>>>>> I think sg is broken in similar ways. The iser guys have some tests
> >>>>>>>>>>> cases that have broken sg while IO is outstanding. I am ccing Erez.
> >>>>>>>>>> Actually one of the problems looks a little different than some of the
> >>>>>>>>>> problems hit with sg and are caused because we remove the bsg device too
> >>>>>>>>>> soon. I think we want to wait until all the references from the
> >>>>>>>>>> commands/requests are released. The attached patch (untested) moves the
> >>>>>>>>>> bsg unreg call to the scsi device release fn.
> >>>>>>>>> Well, this fix is now upstream. However, it's causing all our
> >>>>>>>>> scsi_devices never to get released, which is a serious regression.
> >>>>>>>>> We're also doing spurious bsg_unregister_queue() for things that never
> >>>>>>>>> actually registered one (all scan devices that return DID_NO_CONNECT),
> >>>>>>>>> but bsg doesn't seem to be complaining about this.
> >>>>>>>>>
> >>>>>>>>> The essence of the problem is that bsg_register_queue() takes a ref to
> >>>>>>>>> the sdev_gendev, so you can't move bsg_unregister_queue() into the
> >>>>>>>>> release function because nothing ever puts bsg's device ref and so
> >>>>>>>>> release is never called.
> >>>>>>>>>
> >>>>>>>>> Options for fixing this before 2.6.25 are
> >>>>>>>>>
> >>>>>>>>> 1. revert the patch
> >>>>>>>>> 2. Do an additional put for the bsg reference in
> >>>>>>>>> __scsi_remove_device (patch below). It's nasty but it preserves
> >>>>>>>>> the semantics and does what you want
> >>>>>>>> After some investigation, this patch doesn't fix the bug that Pete
> >>>>>>>> reported (I'll send a new patch shortly).
> >>>>>>>>
> >>>>>>>> Can you revert the commit 4b6f5b3a993cbe34b4280f252bccc76967c185c8
> >>>>>>>> instead of merging this?
> >>>>>>> Sure ... I didn't like the hack either. As long as iSCSI is fine with
> >>>>>>> the reversion it's the quickest way to fix the problem.
> >>>>>> How about this? With the commit reversion, I confirmed that this patch
> >>>>>> fixes the first bug that Pete reported:
> >>>>>>
> >>>>>> http://marc.info/?l=linux-scsi&m=120508166505141&w=2
> >>>>>>
> >>>>>> I suspect that this could fix the rest too.
> >>>>>>
> >>>>>> =
> >>>>>> From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> >>>>>> Subject: [PATCH] bsg: takes a ref to struct device in fops->open
> >>>>>>
> >>>>>> bsg_register_queue() takes a ref to struct device that a caller
> >>>>>> passes. For example, it takes a ref to the sdev_gendev with scsi
> >>>>>> devices. However, bsg doesn't take a ref to it in fops->open. So
> >>>>>> while an application opens a bsg device, the scsi device that the bsg
> >>>>>> device holds can go away (bsg also takes a ref to a queue, but it
> >>>>>> doesn't prevent the device from going away).
> >>>>>>
> >>>>>> With this, bsg takes a ref to struct device in fops->open and frees it
> >>>>>> in fops->release.
> >>>>>>
> >>>>>> Note that bsg doesn't need to take a ref to a queue for SCSI devices
> >>>>>> at least. I think that it would be better to remove the code but I let
> >>>>>> it alone for now.
> >>>>>>
> >>>>> Why does bsg_add_device do kobject_get instead of blk_get_queue?
> >>>> I think that it's a bug. But both takes a ref to a queue (though
> >>>> kobject_get doesn't see QUEUE_FLAG_DEAD), so I think that it's not
> >>>> related with the current problems.
> >>>>
> >>>>
> >>>>> It seems like if we added a blk_get_queue when we opened the device and
> >>>>> a blk_put_queue when bsg_release is called we could remove the
> >>>>> get/put_device calls. I am not sure if that is cleaner or not. I was
> >>>>> just thinking that bsg goes from bsg->request_queue->scsi_device so
> >>>>> maybe it should not worry about the device.
> >>>> kobject_get takes a ref to a queue. If we don't take a ref to a
> >>>> device, the scsi device has gone though the queue is still there
> >>>> because the queue release is done from the device release. If the scsi
> >>>> device has gone, we are dead, right?
> >>>>
> >>>>
> >>>> Anyway, here's a patch to replace kobject_get with blk_get_queue.
> >>>>
> >>>> James, please apply this patch too.
> >>>>
> >>>> =
> >>>> From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> >>>> Subject: [PATCH] bsg: replace kobject_get with blk_get_queue
> >>> Really sorry, please apply this one.
> >>>
> >>> =
> >>> From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> >>> Subject: [PATCH] bsg: replace kobject_get with blk_get_queue
> >>>
> >>> Both takes a ref to a queue. But blk_get_queue checks QUEUE_FLAG_DEAD
> >>> and is more appropriate interface here.
> >>>
> >>> Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> >>> Cc: Jens Axboe <jens.axboe@oracle.com>
> >>> Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
> >>
> >> This looks reasonable to me. It's probably a rc-fixes patch, so could I
> >> get Jen's ack and some evidence of testing (and that it actually fixes
> >> the bug).
> >
> > Do you mean that the patch to take a ref to struct device
> > (e.g. sdev_gendev for scsi devices) in fops->open is a reasonable fix?
> >
> > http://marc.info/?l=linux-scsi&m=120654365424916&w=2
> >
> > The patch with the commit reversion fixes all the problems for me that
> > Pete reported. Pete, can you test the patch?
> >
> >
> > It's a rc-fixes patch, but I'm fine with applying it to scsi-misc
> > (I'll send it to the stable tree later on).
> >
> > The patch has one bug in an error handling path (I should have used
> > IS_ERR there). So I'll send an updated version shortly.
>
> Hi Tomo.
> Do you please have an accumulated latest patch for this problem.
> (Or point me to the right one, I can't find it). I want to test
> it here too. (Over rc-fixes)

No change since I submitted last time:

http://marc.info/?l=linux-scsi&m=120692552424155&w=2

They need to be applied to the latest Linus git (or scsi-fixes).

If you prefer a git tree:

git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git bsg


James pointed out another race:

1. we hold the bsg device open and remove it.
2. we add a new device.
3. we try to open the new device.
4. we get a ref to the removed device (which is still held open)
   instead of the new one.

I overlooked this race (James, thanks a lot for pointing it out).
Fortunately, the fourth patch fixes this race. I've confirmed it.

When submitting the patchset, I said that only the first patch is
crucial. However, the fourth patch is crucial too.

I'm fine with either scsi-misc or scsi-fixes.