All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com>
To: Mike Snitzer <snitzer@redhat.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
	jaxboe@fusionio.com, jbottomley@parallels.com,
	roland@purestorage.com,
	device-mapper development <dm-devel@redhat.com>,
	"Alasdair G. Kergon" <agk@redhat.com>
Subject: Re: [PATCH v2] dm mpath: maintain reference count for underlying devices
Date: Tue, 20 Sep 2011 14:29:47 +0900	[thread overview]
Message-ID: <4E7824CB.9030405@ce.jp.nec.com> (raw)
In-Reply-To: <20110919143411.GA7965@redhat.com>

Hi Mike,

On 09/19/11 23:34, Mike Snitzer wrote:
> On Mon, Sep 19 2011 at  2:49am -0400,
> Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote:
>> DM opens underlying devices and it should be sufficient to keep
>> request_queue from being freed.
> 
> I welcome your review but please be more specific in the future.
> 
> Sure DM opens the underlying devices:
> 
> dm_get_device()
>   -> open_dev()
>      -> blkdev_get_by_dev()
>      	-> bdget()
> 	-> blkdev_get()
> 
> But DM only gets a reference on the associated block_device.

Point is the above should be sufficient to keep the queue from freeing.
Otherwise, 'q->_something_' everywhere could cause invalid pointer access
as the queue is freed.


Below are additional details replying to your comments:

> 
> DM multipath makes use of the request_queue of each paths'
> block_device.  Having a reference on the block_device isn't the same as
> having a reference on the request_queue.

Yes. But it does not necessarily mean we have to raise
a reference count of the request_queue.

> 
> Point is, blk_cleanup_queue() could easily be called by the SCSI
> subsystem for a device that is removed -- a request_queue reference is
> taken by the underlying driver at blk_alloc_queue_node() time.  So SCSI
> is free to drop the only reference in blk_cleanup_queue() which frees
> the request_queue (unless upper layer driver like mpath also takes a
> request_queue reference).

As for SCSI, it takes another reference count and drops it
in scsi_device_dev_release.
So blk_cleanup_queue is not dropping the last reference.

> 
> FYI, I got looking at mpath's request_queue references, or lack thereof,
> because of this report/patch on LKML from Roland Drier:
> https://lkml.org/lkml/2011/7/8/457
> 
> here was my follow-up to Roland:
> https://lkml.org/lkml/2011/7/11/410

For that problem, taking a reference count is not a remedy.
The problem occurs because elevator is freed regardless of the
reference count.

The cause of the problem was:
  a) SCSI has moved blk_cleanup_queue() to earlier stage
     where there still is a opener
  b) blk_cleanup_queue() frees elevator after marking
     the queue DEAD
  c) blk_insert_cloned_request() uses elevator without
     checking QUEUE_FLAG_DEAD

Roland's patch was to fix c) by adding QUEUE_FLAG_DEAD check.
However, it's not possible to do it safely because
QUEUE_FLAG_DEAD means we can't even access q->queue_lock.
(See Vivek's comment in the same thread)
And without queue_lock, there's a window for a race.

James's suggestion was to fix b) by not freeing elevator
until blk_release_queue() is called:
  https://lkml.org/lkml/2011/8/10/421
But it would hit the same issue that there's no guarantee
of q->queue_lock validity after blk_cleanup_queue().
James's other suggestion was to add a callback mechanism
for driver to free q->queue_lock.

I was trying to fix a) in the following thread:
  https://lkml.org/lkml/2011/8/18/103
but haven't gotten response from James so it seems rejected.


> 
> James Bottomley points out that we should always have a reference on the
> request_queue (otherwise final put frees the request_queue on us):
> https://lkml.org/lkml/2011/7/12/265

SCSI is taking a reference count of device being used.

> 
>> If it was not enough, any other openers would have to get the reference
>> count, too, and that should be done in more generic place.
> 
> For DM, dm-multipath is the only direct consumer of request_queue(s)
> that DM didn't allocate.

In DM, there are various users of underlying request_queue's member;
e.g. device_flush_capable(), dm_table_any_congested(), etc.

They would not be safe if request_queue was suddenly freed when
someone accessing 'q->something'.

> 
> We have no intention of adding another request-based target (in fact
> there is serious doubt that request-based DM was ever worth it).  So I
> avoided complicating the DM core (even if only slightly) for rq-based
> concerns that are localized to dm-multipath.

Where to put the code is about the maintainability.
So I don't mind if that's the maintainers' preference.

But for this specific patch, I don't understand why it is necessary
nor sufficient.

> p.s. it should be noted that AFAIK this patch is already part of Oracle
> Linux's uek kernel...

The patch should be harmless except for the extra complexity and
possible future maintenance burden.

Thanks,
-- 
Jun'ichi Nomura, NEC Corporation

  reply	other threads:[~2011-09-20  5:29 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-09-16 13:59 [PATCH v2] dm mpath: maintain reference count for underlying devices Mike Snitzer
2011-09-19  6:49 ` Jun'ichi Nomura
2011-09-19 14:34   ` Mike Snitzer
2011-09-20  5:29     ` Jun'ichi Nomura [this message]
2011-09-20  7:13       ` James Bottomley
2011-09-20 11:31         ` Jun'ichi Nomura
2011-10-17 20:15       ` Mike Snitzer
2011-10-18 13:00         ` Jun'ichi Nomura

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E7824CB.9030405@ce.jp.nec.com \
    --to=j-nomura@ce.jp.nec.com \
    --cc=agk@redhat.com \
    --cc=dm-devel@redhat.com \
    --cc=jaxboe@fusionio.com \
    --cc=jbottomley@parallels.com \
    --cc=martin.petersen@oracle.com \
    --cc=roland@purestorage.com \
    --cc=snitzer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.