public inbox for linux-kernel@vger.kernel.org
From: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
To: Nikanth Karthikesan <knikanth@suse.de>
Cc: Alasdair G Kergon <agk@redhat.com>,
	Mike Snitzer <snitzer@redhat.com>,
	Jens Axboe <jens.axboe@oracle.com>,
	dm-devel@redhat.com, linux-kernel@vger.kernel.org,
	Hannes Reinecke <hare@suse.de>
Subject: Re: [PATCH-v2 2/2] Initialize mempool and elevator only for request-based dm devices
Date: Fri, 14 Aug 2009 16:01:07 +0900
Message-ID: <4A850BB3.3070703@ct.jp.nec.com>
In-Reply-To: <200908121417.28760.knikanth@suse.de>

Hi Nikanth,

On 08/12/2009 05:47 PM +0900, Nikanth Karthikesan wrote:
> Hi Kiyoshi Ueda,
> 
> On Wednesday 12 August 2009 07:45:56 Kiyoshi Ueda wrote:
>> Hi Nikanth,
>>
>> On 08/11/2009 06:05 PM +0900, Nikanth Karthikesan wrote:
>>> On Tuesday 11 August 2009 13:36:24 Kiyoshi Ueda wrote:
>>>> On 08/10/2009 07:48 PM +0900, Nikanth Karthikesan wrote:
>>>>> +
>>>>> +		/*
>>>>> +		 * reinitialize make_request_fn as it was reset to the
>>>>> +		 * default __make_request by blk_init_allocate_queue
>>>>> +		 */
>>>>> +		md->saved_make_request_fn = md->queue->make_request_fn;
>>>>> +		blk_queue_make_request(md->queue, dm_request);
>>>>> +
>>>>> +		blk_queue_softirq_done(md->queue, dm_softirq_done);
>>>>> +		blk_queue_prep_rq(md->queue, dm_prep_fn);
>>>>> +		blk_queue_lld_busy(md->queue, dm_lld_busy);
>>>>> +	}
>>>>> +
>>>>>  	__unbind(md);
>>>>>  	r = __bind(md, table, &limits);
>>>> The queue has been registered at the device creation time by
>>>> add_disk() in alloc_dev().
>>>> Since the queue is reconfigured (elevator is attached), you have to
>>>> update the queue registration (e.g. unregister, then re-register).
>>>> But it may not be easy.  At least, there is no exported interface to
>>>> unregister/re-register queue.
>>> Ah, yes. The scheduler attributes will not be exported in
>>> /sys/block/dm*/queue/iosched. Exporting elv_register_queue() and calling
>>> it here solves it. Something like..
>>>
>>> @@ -2203,6 +2199,29 @@ int dm_swap_table(struct mapped_device *md, struct
>>> dm_table *table)
>>>  		goto out;
>>>  	}
>>>
>>> +	/* new device is being marked as request-based */
>>> +	if (!md->map && dm_table_request_based(table)) {
>>> +		/* initialize queue for request-based dm */
>>> +		r = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
>>> +		if (r)
>>> +			goto out;
>>> +
>>> +		r = elv_register_queue(md->queue);
>>> +		/* if (r)
>>> +		 *	 goto out; Better to ignore, just like add_disk does ;-)
>>> +		 */
>>> +		/*
>>> +		 * reinitialize make_request_fn as it was reset to the
>>> +		 * default __make_request by blk_init_allocate_queue
>>> +		 */
>>> +		md->saved_make_request_fn = md->queue->make_request_fn;
>>> +		blk_queue_make_request(md->queue, dm_request);
>>> +
>>> +		blk_queue_softirq_done(md->queue, dm_softirq_done);
>>> +		blk_queue_prep_rq(md->queue, dm_prep_fn);
>>> +		blk_queue_lld_busy(md->queue, dm_lld_busy);
>>> +	}
>>> +
>>>  	__unbind(md);
>>>  	r = __bind(md, table, &limits);
>>>
>>> I would post the v3 of the patches with this change. Do you see any
>>> problems in this?
>> Humm, it might work for now, but I disagree with that.
>>
>> Since elevator is block internal and dm doesn't really care
>> (its initialization is actually hidden in blk_init_allocated_queue()),
>> directly calling elv_register_queue() from dm seems not right.
>> It will likely introduce a bug by future changes in block layer.
>>
>> I think the right approach is to define a proper block layer interface
>> to reflect the queue configuration change.
>> That's why I said "Updating the queue registration may not be easy".
> 
> I do not see too much of overhead in the future with this approach,
> atleast no more than "proper block layer interface".

I don't think so.
Just exporting elv_register_queue() would impose some maintenance cost
on request-based dm developers, as explained below.

Currently, the elevator is the only queue feature that is needed
solely by request-based dm, but other such features may be added
to the queue in the future.
The developer who adds such a feature may then not notice that
request-based dm needs to register it here, as long as nothing
breaks visibly (e.g. a compile error or a panic) without it.
The result would be that only request-based dm silently lacks the feature.
Therefore, request-based dm developers would always have to watch
block-layer changes and update the registration code accordingly.
I think that is a fairly large maintenance cost.

So we should write the code so that block-layer changes take effect
in request-based dm automatically, as much as possible.
In this case, you should create and call an interface for the whole
queue, not only for the elevator, since dm can't/shouldn't know how
blk_init_allocated_queue() initializes the queue.
(And the interface should also be used in other generic paths, e.g. add_disk().)
That is the proper block-layer interface I mentioned, and in the long
run this approach should have less overhead than yours.
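To make the "whole queue" interface idea concrete, here is a minimal
user-space sketch in C. Everything in it (struct queue, struct
queue_feature, queue_update_registration() and the feature names) is
hypothetical and only models the idea; it is not an existing or proposed
block-layer API. The point is that the feature list lives in one place
owned by the block layer, and dm only calls one generic function after
reconfiguring the queue, so a feature added later is picked up by
request-based dm without any dm change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical user-space model, NOT the kernel's struct request_queue.
 * Each bit in 'registered' records that one feature's sysfs entries exist.
 */
struct queue {
	bool request_based;       /* true once a request_fn is attached */
	unsigned int registered;  /* bit i set => features[i] is registered */
};

/* A queue feature owned by the block layer (names are illustrative). */
struct queue_feature {
	const char *name;
	bool (*applies)(const struct queue *q);
};

static bool request_based_only(const struct queue *q)
{
	return q->request_based;
}

static const struct queue_feature features[] = {
	{ "iosched", request_based_only },
	/* A feature added later is listed here once, by its block-layer author. */
	{ "future_rq_feature", request_based_only },
};

#define NR_FEATURES (sizeof(features) / sizeof(features[0]))

/*
 * Hypothetical whole-queue interface: add_disk()-like paths and dm both
 * call this, so neither needs to know which features exist or how they
 * are set up. Already-registered features are skipped, so calling it again
 * after a reconfiguration only registers what became newly applicable.
 */
static void queue_update_registration(struct queue *q)
{
	assert(q != NULL);
	for (size_t i = 0; i < NR_FEATURES; i++) {
		unsigned int bit = 1u << i;

		if (features[i].applies(q) && !(q->registered & bit)) {
			q->registered |= bit;
			printf("registered %s\n", features[i].name);
		}
	}
}
```

For example, a bio-based device registers nothing at creation time; after
the first request-based table is bound and the same function is called
again, both request-based-only features get registered, and a further
call is a no-op.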


> IMHO, unregistering the queue and registering the queue again with
> the elevator, is basically wasting CPU cycles and possibly would
> confuse the user-space, which may be watching the sysfs... 

Right, that's why I said "Updating may not be easy."
(By the way, wasting CPU cycles doesn't matter here, since it happens
 only when the device is initialized, which shouldn't be frequent.)


> Or asking block layer to recheck and find what we have changed
> in the request_queue. It does not sound like the best solution.

I think this is a better solution than exposing part of the queue
internals, as I described above.


> It is better to tell the block-layer that we have added a q->request_fn 
> function, so initialize the elevator.

I don't think it's better, for the reason described above.
(dm can't/shouldn't know how blk_init_allocated_queue() initializes
 the queue.)



By the way, another approach to optimizing the memory usage would be
to determine whether the dm device is bio-based or request-based
at device creation time instead of at table binding time.
We currently need the delayed allocation because, with the
auto-detection mechanism, the kernel can't decide the device type
until the first table is bound.  The auto-detection is good for keeping
compatibility with existing user-space tools.  But once user-space tools
are changed to specify the device type at device creation time, we can
eventually remove the auto-detection.
Then the kernel can decide the device type in alloc_dev(), and
the initialization code in the kernel will become very simple.
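A rough sketch of the creation-time approach, again as a hypothetical
user-space model in C: the device type is passed to the allocation
function, request-based-only resources (standing in for the mempools and
the elevator) are set up only for request-based devices, and table
binding merely checks that the table matches the fixed type.
alloc_dev_model(), bind_table_model(), free_dev_model() and the enum are
illustrative names, not the real dm code, and how user space would pass
the type is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical device types, fixed when the device is created. */
enum dm_dev_type { DM_TYPE_BIO_BASED, DM_TYPE_REQUEST_BASED };

/* Hypothetical model of a mapped device, NOT struct mapped_device. */
struct mapped_device_model {
	enum dm_dev_type type;
	void *io_pool;   /* stands in for request-based-only mempools */
	bool elevator;   /* stands in for request_fn/elevator setup */
};

/* Allocate request-based-only resources only when they are needed. */
static struct mapped_device_model *alloc_dev_model(enum dm_dev_type type)
{
	struct mapped_device_model *md = calloc(1, sizeof(*md));

	if (!md)
		return NULL;
	md->type = type;
	if (type == DM_TYPE_REQUEST_BASED) {
		md->io_pool = malloc(64);   /* placeholder allocation */
		if (!md->io_pool) {
			free(md);
			return NULL;
		}
		md->elevator = true;
	}
	return md;
}

/* Binding no longer reconfigures the queue; it only validates the table. */
static int bind_table_model(const struct mapped_device_model *md,
			    bool table_request_based)
{
	assert(md != NULL);
	bool dev_request_based = (md->type == DM_TYPE_REQUEST_BASED);

	return dev_request_based == table_request_based ? 0 : -1;
}

static void free_dev_model(struct mapped_device_model *md)
{
	if (!md)
		return;
	free(md->io_pool);
	free(md);
}
```

With this shape, a bio-based device never pays for the request-based
resources, and a mismatched table is simply rejected instead of
triggering a late queue reinitialization.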

FYI, actually, I had this approach in a very early stage of
request-based dm development:
    [kernel]     http://marc.info/?l=dm-devel&m=116656637419846&w=2
    [kernel]     http://marc.info/?l=dm-devel&m=116656689701459&w=2
    [kernel]     http://marc.info/?l=dm-devel&m=116656689707043&w=2
    [user-space] http://marc.info/?l=dm-devel&m=116656689906056&w=2
Since request-based dm is already available, you can now change
user-space first, ahead of the kernel.

Thanks,
Kiyoshi Ueda



Thread overview: 11+ messages
2009-08-08  4:56 [PATCH 2/2] Initialize mempool and elevator only for request-based dm devices Nikanth Karthikesan
2009-08-08 16:21 ` Mike Snitzer
2009-08-10 10:21   ` Nikanth Karthikesan
2009-08-10 10:48     ` [PATCH-v2 " Nikanth Karthikesan
2009-08-11  8:06       ` Kiyoshi Ueda
2009-08-11  9:05         ` Nikanth Karthikesan
2009-08-11  9:32           ` [PATCH-v3 " Nikanth Karthikesan
2009-08-12  2:15           ` [PATCH-v2 " Kiyoshi Ueda
2009-08-12  8:47             ` Nikanth Karthikesan
2009-08-14  7:01               ` Kiyoshi Ueda [this message]
2010-05-11 16:23                 ` Mike Snitzer
