Linux Device Mapper development
 help / color / mirror / Atom feed
From: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com>
To: Mike Snitzer <snitzer@redhat.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
	jaxboe@fusionio.com, jbottomley@parallels.com,
	roland@purestorage.com,
	device-mapper development <dm-devel@redhat.com>,
	"Alasdair G. Kergon" <agk@redhat.com>
Subject: Re: [PATCH v2] dm mpath: maintain reference count for underlying devices
Date: Tue, 20 Sep 2011 14:29:47 +0900	[thread overview]
Message-ID: <4E7824CB.9030405@ce.jp.nec.com> (raw)
In-Reply-To: <20110919143411.GA7965@redhat.com>

Hi Mike,

On 09/19/11 23:34, Mike Snitzer wrote:
> On Mon, Sep 19 2011 at  2:49am -0400,
> Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote:
>> DM opens underlying devices and it should be sufficient to keep
>> request_queue from being freed.
> 
> I welcome your review but please be more specific in the future.
> 
> Sure DM opens the underlying devices:
> 
> dm_get_device()
>   -> open_dev()
>      -> blkdev_get_by_dev()
>      	-> bdget()
> 	-> blkdev_get()
> 
> But DM only gets a reference on the associated block_device.

Point is the above should be sufficient to keep the queue from freeing.
Otherwise, 'q->_something_' everywhere could cause invalid pointer access
as the queue is freed.


Below are additional details replying to your comments:

> 
> DM multipath makes use of the request_queue of each paths'
> block_device.  Having a reference on the block_device isn't the same as
> having a reference on the request_queue.

Yes. But it does not necessarily mean we have to raise
a reference count of the request_queue.

> 
> Point is, blk_cleanup_queue() could easily be called by the SCSI
> subsystem for a device that is removed -- a request_queue reference is
> taken by the underlying driver at blk_alloc_queue_node() time.  So SCSI
> is free to drop the only reference in blk_cleanup_queue() which frees
> the request_queue (unless upper layer driver like mpath also takes a
> request_queue reference).

As for SCSI, it takes another reference count and drops it
in scsi_device_dev_release.
So blk_cleanup_queue is not dropping the last reference.

> 
> FYI, I got looking at mpath's request_queue references, or lack thereof,
> because of this report/patch on LKML from Roland Drier:
> https://lkml.org/lkml/2011/7/8/457
> 
> here was my follow-up to Roland:
> https://lkml.org/lkml/2011/7/11/410

For that problem, taking a reference count is not a remedy.
The problem occurs because elevator is freed regardless of the
reference count.

The cause of the problem was:
  a) SCSI has moved blk_cleanup_queue() to earlier stage
     where there still is a opener
  b) blk_cleanup_queue() frees elevator after marking
     the queue DEAD
  c) blk_insert_cloned_request() uses elevator without
     checking QUEUE_FLAG_DEAD

Roland's patch was to fix c) by adding QUEUE_FLAG_DEAD check.
However, it's not possible to do it safely because
QUEUE_FLAG_DEAD means we can't even access q->queue_lock.
(See Vivek's comment in the same thread)
And without queue_lock, there's a window for a race.

James's suggestion was to fix b) by not freeing elevator
until blk_release_queue() is called:
  https://lkml.org/lkml/2011/8/10/421
But it would hit the same issue that there's no guarantee
of q->queue_lock validity after blk_cleanup_queue().
James's other suggestion was to add a callback mechanism
for driver to free q->queue_lock.

I was trying to fix a) in the following thread:
  https://lkml.org/lkml/2011/8/18/103
but haven't gotten response from James so it seems rejected.


> 
> James Bottomley points out that we should always have a reference on the
> request_queue (otherwise final put frees the request_queue on us):
> https://lkml.org/lkml/2011/7/12/265

SCSI is taking a reference count of device being used.

> 
>> If it was not enough, any other openers would have to get the reference
>> count, too, and that should be done in more generic place.
> 
> For DM, dm-multipath is the only direct consumer of request_queue(s)
> that DM didn't allocate.

In DM, there are various users of underlying request_queue's member;
e.g. device_flush_capable(), dm_table_any_congested(), etc.

They would not be safe if request_queue was suddenly freed when
someone accessing 'q->something'.

> 
> We have no intention of adding another request-based target (in fact
> there is serious doubt that request-based DM was ever worth it).  So I
> avoided complicating the DM core (even if only slightly) for rq-based
> concerns that are localized to dm-multipath.

Where to put the code is about the maintainability.
So I don't mind if that's the maintainers' preference.

But for this specific patch, I don't understand why it is necessary
nor sufficient.

> p.s. it should be noted that AFAIK this patch is already part of Oracle
> Linux's uek kernel...

The patch should be harmless except for the extra complexity and
possible future maintenance burden.

Thanks,
-- 
Jun'ichi Nomura, NEC Corporation

  reply	other threads:[~2011-09-20  5:29 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-09-16 13:59 [PATCH v2] dm mpath: maintain reference count for underlying devices Mike Snitzer
2011-09-19  6:49 ` Jun'ichi Nomura
2011-09-19 14:34   ` Mike Snitzer
2011-09-20  5:29     ` Jun'ichi Nomura [this message]
2011-09-20  7:13       ` James Bottomley
2011-09-20 11:31         ` Jun'ichi Nomura
2011-10-17 20:15       ` Mike Snitzer
2011-10-18 13:00         ` Jun'ichi Nomura

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E7824CB.9030405@ce.jp.nec.com \
    --to=j-nomura@ce.jp.nec.com \
    --cc=agk@redhat.com \
    --cc=dm-devel@redhat.com \
    --cc=jaxboe@fusionio.com \
    --cc=jbottomley@parallels.com \
    --cc=martin.petersen@oracle.com \
    --cc=roland@purestorage.com \
    --cc=snitzer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox