Re: [RFC PATCH 2/2] dm: only initialize full request_queue for request-based device

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Mike Snitzer <snitzer@redhat.com>
To: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Cc: dm-devel@redhat.com, linux-kernel@vger.kernel.org,
	Jens Axboe <jens.axboe@oracle.com>,
	"Jun'ichi Nomura" <j-nomura@ce.jp.nec.com>,
	Vivek Goyal <vgoyal@redhat.com>,
	Nikanth Karthikesan <knikanth@suse.de>,
	Alasdair Kergon <agk@redhat.com>
Subject: Re: [RFC PATCH 2/2] dm: only initialize full request_queue for  request-based device
Date: Wed, 19 May 2010 08:01:30 -0400	[thread overview]
Message-ID: <AANLkTinndXq8DkMBOUf5z8R_8lWRoRbDuOZ1LEmWeI21@mail.gmail.com> (raw)
In-Reply-To: <4BF37DD5.9050409@ct.jp.nec.com>

On Wed, May 19, 2010 at 1:57 AM, Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:
> Hi Mike,
>
> On 05/18/2010 10:46 PM +0900, Mike Snitzer wrote:
>> Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:
>>> On 05/18/2010 02:27 AM +0900, Mike Snitzer wrote:
>>>> Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:
>>>>> As far as I understand, the current model of device-mapper is:
>>>>>   - a table (precisely, a target) has various attributes,
>>>>>     bio-based/request-based is one of such attributes
>>>>>   - a table and its attributes are bound to the block device on resume
>>>>> If we want to fix a problem, I think we should either work based on
>>>>> this model or change the model.
>>>>>
>>>>> Your patch makes that loading table affects the block device, so you
>>>>> are changing the model.
>>>>>
>>>>> If you change the model, it should be done carefully.
>>>>> For example, the current model allows most of the table loading code
>>>>> to run without exclusive lock on the device because it doesn't affect
>>>>> the device itself.  If you change this model, table loading needs to
>>>>> be serialized with appropriate locking.
>>>>
>>>> Nice catch, yes md->queue needs protection (see patch below).
>>>
>>> Not enough. (See drivers/md/dm-ioctl.c:table_load().)
>>> Table load sequence is:
>>>   1. populate table
>>>   2. set the table to ->new_map of the hash_cell for the mapped_device
>>>      in protection by _hash_lock.
>>>
>>> Since your fix only serializes the step 1, concurrent table loading
>>> could end up with inconsistent status; e.g. request-based table is
>>> bound to the mapped_device while the queue is initialized as bio-based.
>>> With your new model, those 2 steps above must be atomic.
>>
>> Ah, yes.. I looked at the possibility of serializing the entirety of
>> table_load but determined that would be too excessive (would reduce
>> parallelism of table_load).  But I clearly missed the fact that there
>> could be a race to the _hash_lock protected critical section in
>> table_load() -- leading to queue inconsistency.
>>
>> I'll post v5 of the overall patch which will revert the mapped_device
>> 'queue_lock' serialization that I proposed in v4.  v5 will contain
>> the following patch to localize all table load related queue
>> manipulation to the _hash_lock protected critical section in
>> table_load().  So it sets the queue up _after_ the table's type is
>> established with dm_table_set_type().
>
> dm_table_setup_md_queue() may allocate memory with blocking mode.
> Blocking allocation inside exclusive _hash_lock can cause deadlock;
> e.g. when it has to wait for other dm devices to resume to free some
> memory.

We make no guarantees that other DM devices will resume before a table
load -- so calling dm_table_setup_md_queue() within the exclusive
_hash_lock is no different than other DM devices being suspended while
a request-based DM device performs its first table_load().

My thinking was this should not be a problem as it is only valid to
call dm_table_setup_md_queue() before the newly created request-based
DM device has been resumed.

AFAIK we don't have any explicit constraints on memory allocations
during table load (e.g. table loads shouldn't depend on other devices'
writeback) -- but any GFP_KERNEL allocation could recurse
(elevator_alloc() currently uses GFP_KERNEL with kmalloc_node)...

I'll have to review the DM code further to see if all memory
allocations during table_load() are done via mempools.  I'll also
bring this up on this week's LVM call.

> Also, your patch changes the queue configuration even when a table is
> already active and used.  (e.g. Loading bio-based table to a mapped_device
> which is already active/used as request-based sets q->requst_fn in NULL.)
> That could cause some critical problems.

Yes, that is possible and I can add additional checks to prevent this.
But this speaks to a more general problem with the existing DM code.

dm_swap_table() has the negative check to prevent such table loads, e.g.:
/* cannot change the device type, once a table is bound */

This check should come during table_load, as part of
dm_table_set_type(), rather than during table resume.

Thanks,
Mike

next prev parent reply	other threads:[~2010-05-19 12:01 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-05-10 22:55 [RFC PATCH 1/2] block: allow initialization of previously allocated request_queue Mike Snitzer
2010-05-10 22:55 ` [RFC PATCH 2/2] dm: only initialize full request_queue for request-based device Mike Snitzer
2010-05-11  4:23   ` Kiyoshi Ueda
2010-05-11 13:15     ` Mike Snitzer
2010-05-12  8:23       ` Kiyoshi Ueda
2010-05-13  3:57         ` Mike Snitzer
2010-05-14  8:06           ` Kiyoshi Ueda
2010-05-14 14:08             ` Mike Snitzer
2010-05-17  9:27               ` Kiyoshi Ueda
2010-05-17 17:27                 ` Mike Snitzer
2010-05-18  8:32                   ` Kiyoshi Ueda
2010-05-18 13:46                     ` Mike Snitzer
2010-05-19  5:57                       ` Kiyoshi Ueda
2010-05-19 12:01                         ` Mike Snitzer [this message]
2010-05-19 14:39                           ` Mike Snitzer
2010-05-19 14:45                             ` Mike Snitzer
2010-05-20 11:21                             ` Kiyoshi Ueda
2010-05-20 17:07                               ` Mike Snitzer
2010-05-21  8:32                                 ` Kiyoshi Ueda
2010-05-21 13:34                                   ` Mike Snitzer
2010-05-24  9:58                                     ` Kiyoshi Ueda
2010-05-19 21:51                           ` Mike Snitzer
2010-05-13  4:31   ` [PATCH 2/2 v2] " Mike Snitzer
2010-05-13  5:02     ` [RFC PATCH 3/2] dm: bio-based device must not register elevator in sysfs Mike Snitzer
2010-05-13 22:14       ` [PATCH 3/2 v2] " Mike Snitzer
2010-05-11  6:55 ` [RFC PATCH 1/2] block: allow initialization of previously allocated request_queue Jens Axboe
2010-05-11 13:18   ` Mike Snitzer
2010-05-11 13:21     ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AANLkTinndXq8DkMBOUf5z8R_8lWRoRbDuOZ1LEmWeI21@mail.gmail.com \
    --to=snitzer@redhat.com \
    --cc=agk@redhat.com \
    --cc=dm-devel@redhat.com \
    --cc=j-nomura@ce.jp.nec.com \
    --cc=jens.axboe@oracle.com \
    --cc=k-ueda@ct.jp.nec.com \
    --cc=knikanth@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).