Linux block layer
 help / color / mirror / Atom feed
* Re: [PATCH 2/4] blk-mq: merge mq and sq make_request instances
From: Bart Van Assche @ 2017-03-13 21:01 UTC (permalink / raw)
  To: hch@lst.de, axboe@kernel.dk; +Cc: linux-block@vger.kernel.org
In-Reply-To: <20170313154833.14165-3-hch@lst.de>

On Mon, 2017-03-13 at 09:48 -0600, Christoph Hellwig wrote:
> @@ -1534,7 +1529,36 @@ static blk_qc_t blk_mq_make_request(struct request=
_queue *q, struct bio *bio)
>  	}
> =20
>  	plug =3D current->plug;
> -	if (((plug && !blk_queue_nomerges(q)) || is_sync)) {
> +	if (plug && q->nr_hw_queues =3D=3D 1) {
> +		struct request *last =3D NULL;
> +
> +		blk_mq_bio_to_request(rq, bio);
> +
> +		/*
> +		 * @request_count may become stale because of schedule
> +		 * out, so check the list again.
> +		 */

The above comment was relevant as long as there was a request_count assignm=
ent
above blk_mq_sched_get_request(). This patch moves that assignment inside i=
f
(plug && q->nr_hw_queues =3D=3D 1). Does that mean that the above comment s=
hould be
removed entirely?

> +		if (list_empty(&plug->mq_list))
> +			request_count =3D 0;
> +		else if (blk_queue_nomerges(q))
> +			request_count =3D blk_plug_queued_count(q);
> +
> +		if (!request_count)
> +			trace_block_plug(q);
> +		else
> +			last =3D list_entry_rq(plug->mq_list.prev);
> +
> +		blk_mq_put_ctx(data.ctx);
> +
> +		if (request_count >=3D BLK_MAX_REQUEST_COUNT || (last &&
> +		    blk_rq_bytes(last) >=3D BLK_PLUG_FLUSH_SIZE)) {
> +			blk_flush_plug_list(plug, false);
> +			trace_block_plug(q);
> +		}
> +
> +		list_add_tail(&rq->queuelist, &plug->mq_list);
> +		goto done;
> +	} else if (((plug && !blk_queue_nomerges(q)) || is_sync)) {
>  		struct request *old_rq =3D NULL;
> =20
>  		blk_mq_bio_to_request(rq, bio);

Bart.=

^ permalink raw reply

* Re: [PATCH 3/4] blk-mq: improve blk_mq_try_issue_directly
From: Bart Van Assche @ 2017-03-13 21:02 UTC (permalink / raw)
  To: hch@lst.de, axboe@kernel.dk; +Cc: linux-block@vger.kernel.org
In-Reply-To: <20170313154833.14165-4-hch@lst.de>

On Mon, 2017-03-13 at 09:48 -0600, Christoph Hellwig wrote:
> Rename blk_mq_try_issue_directly to __blk_mq_try_issue_directly and add a
> new wrapper that takes care of RCU / SRCU locking to avoid having
> boileplate code in the caller which would get duplicated with new callers=
.

Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>=

^ permalink raw reply

* Re: NULL deref in cpu hot unplug on jens for-linus branch
From: Sagi Grimberg @ 2017-03-13 21:46 UTC (permalink / raw)
  To: Jens Axboe, linux-block@vger.kernel.org, linux-nvme
In-Reply-To: <54f833f1-ab98-5d22-e7d4-5e6059a4c467@fb.com>


> Are you saying your code works on top of 4.11-rc2, but not on top of my
> for-linus?

I was actually on Linus 4.11-rc1 before I rebased on top of your
for-linus.

> That seems odd. Looking at the oops, you are crashing with
> !tags in __blk_mq_tag_idle. The below should work around it, but I'm
> puzzled why this is new.

I got it just once (out of a single run :)), but maybe it is
possible that its racy and not really new.

But another example where this can happen:
blk_mq_realloc_hw_ctxs explicitly checks on hctx->tags != NULL
but right after calls blk_mq_exit_hctx() which goes in the
same route, won't this happen there too? Or is it assumed that
hctx->state does not have BLK_MQ_S_TAG_ACTIVE on here?

> Is it related to the other path you fixed in this patch:
>
> commit 0067d4b020ea07a58540acb2c5fcd3364bf326e0
> Author: Sagi Grimberg <sagi@grimberg.me>
> Date:   Mon Mar 13 16:10:11 2017 +0200
>
>     blk-mq: Fix tagset reinit in the presence of cpu hot-unplug
>
> Since that's also handling hctx->tags == NULL.

The above patch prevented a NULL deref earlier when the
tags were reinitialized, now we are all setup and we
happen to remove an old namespace.

> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 9d97bfc4d465..1283f74bfdfb 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -54,9 +54,11 @@ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
>  	if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
>  		return;
>
> -	atomic_dec(&tags->active_queues);
> +	if (tags) {
> +		atomic_dec(&tags->active_queues);
>
> -	blk_mq_tag_wakeup_all(tags, false);
> +		blk_mq_tag_wakeup_all(tags, false);
> +	}
>  }
>
>  /*
>

I'll see if I can test it out later this week. thanks.

^ permalink raw reply

* Re: [PATCH 1/4] blk-mq: remove BLK_MQ_F_DEFER_ISSUE
From: hch @ 2017-03-13 23:16 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch@lst.de, axboe@kernel.dk, linux-block@vger.kernel.org
In-Reply-To: <1489438361.2658.21.camel@sandisk.com>

On Mon, Mar 13, 2017 at 08:52:54PM +0000, Bart Van Assche wrote:
> > -	if (((plug && !blk_queue_nomerges(q)) || is_sync) &&
> > -	    !(data.hctx->flags & BLK_MQ_F_DEFER_ISSUE)) {
> > +	if (((plug && !blk_queue_nomerges(q)) || is_sync)) {
> 
> A minor comment: due to this change the outer pair of parentheses
> became superfluous. Please consider removing these.

The last patch in the series removes the statement in this form. But
if I have to respin the series for some reason I'll make sure it
gets removed here already.

^ permalink raw reply

* Re: [PATCH 2/4] blk-mq: merge mq and sq make_request instances
From: hch @ 2017-03-13 23:17 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch@lst.de, axboe@kernel.dk, linux-block@vger.kernel.org
In-Reply-To: <1489438828.2658.23.camel@sandisk.com>

On Mon, Mar 13, 2017 at 09:01:08PM +0000, Bart Van Assche wrote:
> > +		/*
> > +		 * @request_count may become stale because of schedule
> > +		 * out, so check the list again.
> > +		 */
> 
> The above comment was relevant as long as there was a request_count assignment
> above blk_mq_sched_get_request(). This patch moves that assignment inside if
> (plug && q->nr_hw_queues == 1). Does that mean that the above comment should be
> removed entirely?

I don't think so - for the !blk_queue_nomerges cases we still rely
on blk_attempt_plug_merge calculatіng request_count, so the comment
still applies.

^ permalink raw reply

* Re: [PATCH RFC 01/14] block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler
From: Paolo Valente @ 2017-03-14 11:28 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Tejun Heo, Fabio Checconi, Arianna Avanzini, linux-block,
	Linux-Kernal, Ulf Hansson, Linus Walleij, broonie
In-Reply-To: <7654009c-f38b-20f8-7536-2982338a8228@kernel.dk>


> Il giorno 06 mar 2017, alle ore 21:46, Jens Axboe <axboe@kernel.dk> ha =
scritto:
>=20
> On 03/05/2017 09:02 AM, Paolo Valente wrote:
>>=20
>>> Il giorno 05 mar 2017, alle ore 16:16, Jens Axboe <axboe@kernel.dk> =
ha scritto:
>>>=20
>>> On 03/04/2017 09:01 AM, Paolo Valente wrote:
>>>> We tag as v0 the version of BFQ containing only BFQ's engine plus
>>>> hierarchical support. BFQ's engine is introduced by this commit, =
while
>>>> hierarchical support is added by next commit. We use the v0 tag to
>>>> distinguish this minimal version of BFQ from the versions =
containing
>>>> also the features and the improvements added by next commits. =
BFQ-v0
>>>> coincides with the version of BFQ submitted a few years ago [1], =
apart
>>>> from the introduction of preemption, described below.
>>>>=20
>>>> BFQ is a proportional-share I/O scheduler, whose general structure,
>>>> plus a lot of code, are borrowed from CFQ.
>>>=20
>>> I'll take a closer look at this in the coming week.
>>=20
>> ok
>>=20
>>> But one quick
>>> comment - don't default to BFQ. Both because it might not be fully
>>> stable yet, and also because the performance limitation of it is
>>> quite severe. Whereas deadline doesn't really hurt single queue
>>> flash at all, BFQ will.
>>>=20
>>=20
>> Ok, sorry.  I was doubtful on what to do, but, to not bother you on
>> every details, I went for setting it as default, because I thought
>> people would have preferred to test it, even from boot, in this
>> preliminary stage.  I reset elevator.c in the submission, unless you
>> want me to do it even before receiving your and others' reviews.
>=20
> I don't think it's stable enough for that yet, it's seen very little
> testing outside of your own testing.

Hi Jens.

Yes, sorry, in general people of course use a kernel for a little bit
more than just testing bfq ...

> Given that, it's much better that
> people opt in to testing BFQ, so they at least know they can expect
> crashes.
>=20
> Speaking of testing, I ran into this bug:
>=20
> [ 9469.621413] general protection fault: 0000 [#1] PREEMPT SMP
> [ 9469.627872] Modules linked in: loop dm_mod xfs libcrc32c =
bfq_iosched x86_pkg_temp_thermal btrfe
> [ 9469.648196] CPU: 0 PID: 2114 Comm: kworker/0:1H Tainted: G        W =
      4.11.0-rc1+ #249
> [ 9469.657873] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS =
2.3.4 11/09/2016
> [ 9469.666742] Workqueue: xfs-log/nvme4n1p1 xfs_buf_ioend_work [xfs]
> [ 9469.674213] task: ffff881fe97646c0 task.stack: ffff881ff13d0000
> [ 9469.681053] RIP: 0010:__bfq_bfqq_expire+0xb3/0x110 [bfq_iosched]
> [ 9469.687991] RSP: 0018:ffff881fff603dd8 EFLAGS: 00010082
> [ 9469.694052] RAX: 6b6b6b6b6b6b6b6b RBX: ffff883fe8e0eb58 RCX: =
0000000000010004
> [ 9469.702251] RDX: 0000000000010004 RSI: 0000000000000000 RDI: =
00000000ffffffff
> [ 9469.710456] RBP: ffff881fff603de8 R08: 0000000000000000 R09: =
0000000000000001
> [ 9469.718659] R10: 0000000000000001 R11: 0000000000000000 R12: =
ffff883fe8dbf4e8
> [ 9469.726863] R13: 0000000000000000 R14: 0000000000000001 R15: =
0000000000007904
> [ 9469.735063] FS:  0000000000000000(0000) GS:ffff881fff600000(0000) =
knlGS:0000000000000000
> [ 9469.744539] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 9469.751189] CR2: 00000000020c0018 CR3: 0000001fec1db000 CR4: =
00000000003406f0
> [ 9469.759392] DR0: 0000000000000000 DR1: 0000000000000000 DR2: =
0000000000000000
> [ 9469.767596] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: =
0000000000000400
> [ 9469.775794] Call Trace:
> [ 9469.778748]  <IRQ>
> [ 9469.781222]  bfq_bfqq_expire+0x104/0x2f0 [bfq_iosched]
> [ 9469.787193]  ? bfq_idle_slice_timer+0x2a/0xc0 [bfq_iosched]
> [ 9469.793650]  bfq_idle_slice_timer+0x7c/0xc0 [bfq_iosched]
> [ 9469.799914]  __hrtimer_run_queues+0xd9/0x500
> [ 9469.804911]  ? bfq_rq_enqueued+0x340/0x340 [bfq_iosched]
> [ 9469.811072]  hrtimer_interrupt+0xb0/0x200
> [ 9469.815781]  local_apic_timer_interrupt+0x31/0x50
> [ 9469.821264]  smp_apic_timer_interrupt+0x33/0x50
> [ 9469.826555]  apic_timer_interrupt+0x90/0xa0
>=20
> just running the xfstest suite. It's test generic/299. Have you done
> full runs of xfstest? I'd greatly recommend that for shaking out bugs.
> Run a full loop with xfs, one with btrfs, and one with ext4 for better
> confidence in the stability of the code.
>=20

Thanks for this information and suggestions.  I'm running xfstests
right now, to reproduce at least the failure you spotted.

Thanks,
Paolo

> (gdb) l *__bfq_bfqq_expire+0xb3
> 0x5983 is in __bfq_bfqq_expire (block/bfq-iosched.c:2664).
> 2659		 * been properly deactivated or requeued, so we can =
safely
> 2660		 * execute the final step: reset in_service_entity along =
the
> 2661		 * path from entity to the root.
> 2662		 */
> 2663		for_each_entity(entity)
> 2664			entity->sched_data->in_service_entity =3D NULL;
> 2665	}
> 2666=09
> 2667	static void bfq_deactivate_bfqq(struct bfq_data *bfqd, struct =
bfq_queue *bfqq,
> 2668					bo

^ permalink raw reply

* Re: [PATCH 06/16] mmc: core: replace waitqueue with worker
From: Adrian Hunter @ 2017-03-14 12:59 UTC (permalink / raw)
  To: Jens Axboe, Linus Walleij
  Cc: linux-mmc@vger.kernel.org, Ulf Hansson, Paolo Valente,
	Chunyan Zhang, Baolin Wang, linux-block, Christoph Hellwig,
	Arnd Bergmann
In-Reply-To: <97f2a4ea-0802-9f23-6ac7-b9d6c6afbfcc@kernel.dk>

On 13/03/17 16:19, Jens Axboe wrote:
> On 03/13/2017 03:25 AM, Adrian Hunter wrote:
>> On 11/03/17 00:05, Jens Axboe wrote:
>>> On 03/10/2017 07:21 AM, Adrian Hunter wrote:
>>>>> Essentially I take out that thread and replace it with this one worker
>>>>> introduced in this very patch. I agree the driver can block in many ways
>>>>> and that is why I need to have it running in process context, and this
>>>>> is what the worker introduced here provides.
>>>>
>>>> The last time I looked at the blk-mq I/O scheduler code, it pulled up to
>>>> qdepth requests from the I/O scheduler and left them on a local list while
>>>> running ->queue_rq().  That means blocking in ->queue_rq() leaves some
>>>> number of requests in limbo (not issued but also not in the I/O scheduler)
>>>> for that time.
>>>
>>> Look again, if we're not handling the requeued dispatches, we pull one
>>> at the time from the scheduler.
>>>
>>
>> That's good :-)
>>
>> Now the next thing ;-)
>>
>> It looks like we either set BLK_MQ_F_BLOCKING and miss the possibility of
>> issuing synchronous requests immediately, or we don't set BLK_MQ_F_BLOCKING
>> in which case we are never allowed to sleep in ->queue_rq().  Is that true?
> 
> Only one of those statements is true - if you don't set BLK_MQ_F_BLOCKING,
> then you may never block in your ->queue_rq() function. But if you do set it,
> it does not preclude immediate issue of sync requests.

I meant it gets put to the workqueue rather than issued in the context of
the submitter.

^ permalink raw reply

* Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq
From: Paolo Valente @ 2017-03-14 14:12 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Tejun Heo, Fabio Checconi, Arianna Avanzini,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	ulf.hansson@linaro.org, linus.walleij@linaro.org,
	broonie@kernel.org
In-Reply-To: <1D08B61A9CF0974AA09887BE32D889DA0DA827@ULS-OP-MBXIP03.sdcorp.global.sandisk.com>


> Il giorno 07 mar 2017, alle ore 01:22, Bart Van Assche =
<bart.vanassche@sandisk.com> ha scritto:
>=20
> On 03/04/2017 08:01 AM, Paolo Valente wrote:
>> Some patch generates WARNINGS with checkpatch.pl, but these WARNINGS
>> seem to be either unavoidable for the involved pieces of code (which
>> the patch just extends), or false positives.
>=20
> The code in this series looks reasonably clean from a code style point
> of view,

Great, thanks!

> but please address all checkpatch warnings that can be
> addressed easily. A few examples of such checkpatch warnings:
>=20
> ERROR: "foo * bar" should be "foo *bar"
>=20

The offending line is:
*(__PTR) =3D (u64)__data * NSEC_PER_USEC;

so this seems a false positive.

> WARNING: Symbolic permissions 'S_IRUGO|S_IWUSR' are not preferred.
> Consider using octal permissions '0644'.
>=20

I have used symbolic permissions because I find them much easier to
remember and decode than numeric constants, and because it is done so
in cfq-iosched.c, deadline-iosched.c and now mq-deadline.c.  But,
since you share this checkpatch complain, I will switch to constants.

Thanks,
Paolo

> Thanks,
>=20
> Bart.

^ permalink raw reply

* Re: [PATCH RFC 13/14] block, bfq: boost the throughput with random I/O on NCQ-capable HDDs
From: Paolo Valente @ 2017-03-14 14:12 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: tj@kernel.org, axboe@kernel.dk, ulf.hansson@linaro.org,
	linux-kernel@vger.kernel.org, fchecconi@gmail.com,
	Arianna Avanzini, linux-block@vger.kernel.org,
	linus.walleij@linaro.org, broonie@kernel.org
In-Reply-To: <1488848033.3125.12.camel@sandisk.com>


> Il giorno 07 mar 2017, alle ore 01:54, Bart Van Assche =
<bart.vanassche@sandisk.com> ha scritto:
>=20
> On Sat, 2017-03-04 at 17:01 +0100, Paolo Valente wrote:
>> @@ -8301,7 +8297,7 @@ static struct blkcg_policy blkcg_policy_bfq =3D =
{
>> static int __init bfq_init(void)
>> {
>> 	int ret;
>> -	char msg[50] =3D "BFQ I/O-scheduler: v6";
>> +	char msg[50] =3D "BFQ I/O-scheduler: v7r3";
>>=20
>> #ifdef CONFIG_BFQ_GROUP_IOSCHED
>> 	ret =3D blkcg_policy_register(&blkcg_policy_bfq);
>=20
> In the Linux kernel the kernel as a whole has a version number but we
> do not assign a version number to individual core components.
>=20

I'll remove that.

Thanks,
Paolo

> Bart.

^ permalink raw reply

* Re: [PATCH RFC 01/14] block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler
From: Paolo Valente @ 2017-03-14 14:18 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Tejun Heo, Fabio Checconi, Arianna Avanzini,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	ulf.hansson@linaro.org, linus.walleij@linaro.org,
	broonie@kernel.org
In-Reply-To: <1D08B61A9CF0974AA09887BE32D889DA0DA145@ULS-OP-MBXIP03.sdcorp.global.sandisk.com>


> Il giorno 06 mar 2017, alle ore 20:40, Bart Van Assche =
<bart.vanassche@sandisk.com> ha scritto:
>=20

Hi Bart,
thanks for such an accurate review.  I'm addressing the issues you
raised, and I'll get back in touch as soon as I have finished.

Paolo

> On 03/04/2017 08:01 AM, Paolo Valente wrote:
>> BFQ is a proportional-share I/O scheduler, whose general structure,
>> plus a lot of code, are borrowed from CFQ.
>> [ ... ]
>=20
> This description is very useful. However, since it is identical to the
> description this patch adds to Documentation/block/bfq-iosched.txt I
> propose to leave it out from the patch description.
>=20
> What seems missing to me is an overview of the limitations of BFQ. =
Does
> BFQ e.g. support multiple hardware queues?
>=20
>> +3. What are BFQ's tunable?
>> +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D
>> +[ ... ]
>=20
> A thorough knowledge of BFQ is required to tune it properly. Users =
don't
> want to tune I/O schedulers. Has it been considered to invent =
algorithms
> to tune these parameters automatically?
>=20
>> + * Licensed under GPL-2.
>=20
> The COPYING file at the top of the tree mentions that GPL-v2 licensing
> should be specified as follows close to the start of each source file:
>=20
>    This program is free software; you can redistribute it and/or =
modify
>    it under the terms of the GNU General Public License as published =
by
>    the Free Software Foundation; either version 2 of the License, or
>    (at your option) any later version.
>=20
>    This program is distributed in the hope that it will be useful,
>    but WITHOUT ANY WARRANTY; without even the implied warranty of
>    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>    GNU General Public License for more details.
>=20
>    You should have received a copy of the GNU General Public License
>    along with this program; if not, write to the Free Software
>    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
>    02110-1301 USA
>=20
>> + * BFQ is a proportional-share I/O scheduler, with some extra
>> + * low-latency capabilities. BFQ also supports full hierarchical
>> + * scheduling through cgroups. Next paragraphs provide an =
introduction
>> + * on BFQ inner workings. Details on BFQ benefits and usage can be
>> + * found in Documentation/block/bfq-iosched.txt.
>=20
> That reference should be sufficient - please do not duplicate
> Documentation/block/bfq-iosched.txt in block/bfq-iosched.c.
>=20
>> +/**
>> + * struct bfq_service_tree - per ioprio_class service tree.
>> + *
>> + * Each service tree represents a B-WF2Q+ scheduler on its own.  =
Each
>> + * ioprio_class has its own independent scheduler, and so its own
>> + * bfq_service_tree.  All the fields are protected by the queue lock
>> + * of the containing bfqd.
>> + */
>> +struct bfq_service_tree {
>> +	/* tree for active entities (i.e., those backlogged) */
>> +	struct rb_root active;
>> +	/* tree for idle entities (i.e., not backlogged, with V <=3D =
F_i)*/
>> +	struct rb_root idle;
>> +
>> +	struct bfq_entity *first_idle;	/* idle entity with minimum F_i =
*/
>> +	struct bfq_entity *last_idle;	/* idle entity with maximum F_i =
*/
>> +
>> +	u64 vtime; /* scheduler virtual time */
>> +	/* scheduler weight sum; active and idle entities contribute to =
it */
>> +	unsigned long wsum;
>> +};
>=20
> Inline comments next to structure members are ugly and make the
> structure definition hard to read. Please follow the instructions in
> Documentation/kernel-doc-nano-HOWTO.txt for documenting structure =
members.
>=20
>> +	u64 finish; /* B-WF2Q+ finish timestamp (aka F_i) */
>> +	u64 start;  /* B-WF2Q+ start timestamp (aka S_i) */
>=20
> For all times and timestamps, please document the time unit (e.g. s, =
ms,
> us, ns, jiffies, ...).
>=20
>> +enum bfq_device_speed {
>> +	BFQ_BFQD_FAST,
>> +	BFQ_BFQD_SLOW,
>> +};
>=20
> What is the meaning of "fast" and "slow" devices in this context?
> Anyway, since the first patch that uses this enum is patch 6, please
> defer introduction of this enum until patch 6.
>=20
>> +
>> +/**
>> + * struct bfq_data - per-device data structure.
>> + *
>> + * All the fields are protected by @lock.
>> + */
>> +struct bfq_data {
>> +	/* device request queue */
>> +	struct request_queue *queue;
>> +	[ ... ]
>> +
>> +	/* on-disk position of the last served request */
>> +	sector_t last_position;
>=20
> What is the relevance of last_position if there are multiple hardware
> queues? Will the BFQ algorithm fail to realize its guarantees in that =
case?
>=20
> What is the relevance of this structure member for block devices that
> have multiple spindles, e.g. arrays of hard disks?
>=20
>> +enum bfqq_state_flags {
>> +	BFQ_BFQQ_FLAG_busy =3D 0,		/* has requests or is in =
service */
>> +	BFQ_BFQQ_FLAG_wait_request,	/* waiting for a request */
>> +	BFQ_BFQQ_FLAG_non_blocking_wait_rq, /*
>> +					     * waiting for a request
>> +					     * without idling the device
>> +					     */
>> +	BFQ_BFQQ_FLAG_fifo_expire,	/* FIFO checked in this slice */
>> +	BFQ_BFQQ_FLAG_idle_window,	/* slice idling enabled */
>> +	BFQ_BFQQ_FLAG_sync,		/* synchronous queue */
>> +	BFQ_BFQQ_FLAG_budget_new,	/* no completion with this =
budget */
>> +	BFQ_BFQQ_FLAG_IO_bound,		/*
>> +					 * bfqq has timed-out at least =
once
>> +					 * having consumed at most 2/10 =
of
>> +					 * its budget
>> +					 */
>> +};
>=20
> The "BFQ_BFQQ_FLAG_" prefix looks silly and too long to me. How about
> e.g. using the prefix "BFQQF_" instead?
>=20
>> +#define BFQ_BFQQ_FNS(name)						=
\
>> +static void bfq_mark_bfqq_##name(struct bfq_queue *bfqq)		=
\
>> +{									=
\
>> +	(bfqq)->flags |=3D (1 << BFQ_BFQQ_FLAG_##name);			=
\
>> +}									=
\
>> +static void bfq_clear_bfqq_##name(struct bfq_queue *bfqq)		=
\
>> +{									=
\
>> +	(bfqq)->flags &=3D ~(1 << BFQ_BFQQ_FLAG_##name);			=
\
>> +}									=
\
>> +static int bfq_bfqq_##name(const struct bfq_queue *bfqq)		=
\
>> +{									=
\
>> +	return ((bfqq)->flags & (1 << BFQ_BFQQ_FLAG_##name)) !=3D 0;	=
\
>> +}
>=20
> Are the bodies of the above functions duplicates of __set_bit(),
> __clear_bit() and test_bit()?
>=20
>> +/* Expiration reasons. */
>> +enum bfqq_expiration {
>> +	BFQ_BFQQ_TOO_IDLE =3D 0,		/*
>> +					 * queue has been idling for
>> +					 * too long
>> +					 */
>> +	BFQ_BFQQ_BUDGET_TIMEOUT,	/* budget took too long to be =
used */
>> +	BFQ_BFQQ_BUDGET_EXHAUSTED,	/* budget consumed */
>> +	BFQ_BFQQ_NO_MORE_REQUESTS,	/* the queue has no more =
requests */
>> +	BFQ_BFQQ_PREEMPTED		/* preemption in progress */
>> +};
>=20
> The prefix of these constants refers twice to "BFQ" and does not make =
it
> clear that these constants are about expiration. How about using the
> "BFQQE_" prefix instead?
>=20
>> +/* Maximum backwards seek, in KiB. */
>> +static const int bfq_back_max =3D 16 * 1024;
>=20
> Where does this constant come from? Should it depend on geometry data
> like e.g. the number of sectors in a cylinder?
>=20
>> +#define for_each_entity(entity)	\
>> +	for (; entity ; entity =3D NULL)
>=20
> Why has this confusing #define been introduced? Shouldn't all
> occurrences of this macro be changed into the equivalent "if =
(entity)"?
> We don't want silly macros like this in the Linux kernel.
>=20
>> +#define for_each_entity_safe(entity, parent) \
>> +	for (parent =3D NULL; entity ; entity =3D parent)
>=20
> Same question here - why has this macro been introduced and how has =
its
> name been chosen? Since this macro is used only once and since no =
value
> is assigned to 'parent' in the code controlled by this construct, =
please
> remove this macro and use something that is less confusing than a =
"for"
> loop for something that is not a loop.
>=20
>> +/**
>> + * bfq_weight_to_ioprio - calc an ioprio from a weight.
>> + * @weight: the weight value to convert.
>> + *
>> + * To preserve as much as possible the old only-ioprio user =
interface,
>> + * 0 is used as an escape ioprio value for weights (numerically) =
equal or
>> + * larger than IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF.
>> + */
>> +static unsigned short bfq_weight_to_ioprio(int weight)
>> +{
>> +	return IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF - weight < 0 ?
>> +		0 : IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF - weight;
>> +}
>=20
> Please consider using max() or max_t() to make this function less =
verbose.
>=20
>> +
>> +/**
>> + * bfq_active_extract - remove an entity from the active tree.
>> + * @st: the service_tree containing the tree.
>> + * @entity: the entity being removed.
>> + */
>> +static void bfq_active_extract(struct bfq_service_tree *st,
>> +			       struct bfq_entity *entity)
>> +{
>> +	struct bfq_queue *bfqq =3D bfq_entity_to_bfqq(entity);
>> +	struct rb_node *node;
>> +
>> +	node =3D bfq_find_deepest(&entity->rb_node);
>> +	bfq_extract(&st->active, entity);
>> +
>> +	if (node)
>> +		bfq_update_active_tree(node);
>> +
>> +	if (bfqq)
>> +		list_del(&bfqq->bfqq_list);
>> +}
>=20
> Which locks protect the data structures manipulated by this and other
> functions? Have you considered to use lockdep_assert_held() to =
document
> these assumptions?
>=20
>> +	case (BFQ_RQ1_WRAP|BFQ_RQ2_WRAP): /* both rqs wrapped */
>=20
> Please don't use parentheses if no confusion is possible. =
Additionally,
> checkpatch should have requested you to insert a space before and =
after
> the logical or operator.
>=20
>> +static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
>> +				       struct bfq_queue *bfqq)
>> +{
>> +	if (bfqq) {
>> +		bfq_mark_bfqq_budget_new(bfqq);
>> +		bfq_clear_bfqq_fifo_expire(bfqq);
>> +
>> +		bfqd->budgets_assigned =3D (bfqd->budgets_assigned*7 + =
256) / 8;
>=20
> Checkpatch should have asked you to insert spaces around the
> multiplication operator.
>=20
>> +/*
>> + * bfq_default_budget - return the default budget for @bfqq on =
@bfqd.
>> + * @bfqd: the device descriptor.
>> + * @bfqq: the queue to consider.
>> + *
>> + * We use 3/4 of the @bfqd maximum budget as the default value
>> + * for the max_budget field of the queues.  This lets the feedback
>> + * mechanism to start from some middle ground, then the behavior
>> + * of the process will drive the heuristics towards high values, if
>> + * it behaves as a greedy sequential reader, or towards small values
>> + * if it shows a more intermittent behavior.
>> + */
>> +static unsigned long bfq_default_budget(struct bfq_data *bfqd,
>> +					struct bfq_queue *bfqq)
>> +{
>> +	unsigned long budget;
>> +
>> +	/*
>> +	 * When we need an estimate of the peak rate we need to avoid
>> +	 * to give budgets that are too short due to previous =
measurements.
>> +	 * So, in the first 10 assignments use a ``safe'' budget value.
>> +	 */
>> +	if (bfqd->budgets_assigned < 194 && bfqd->bfq_user_max_budget =3D=3D=
 0)
>> +		budget =3D bfq_default_max_budget;
>> +	else
>> +		budget =3D bfqd->bfq_max_budget;
>> +
>> +	return budget - budget / 4;
>> +}
>=20
> Where does the magic constant "194" come from?
>=20
>=20
>> +	} else
>> +		/*
>> +		 * Async queues get always the maximum possible
>> +		 * budget, as for them we do not care about latency
>> +		 * (in addition, their ability to dispatch is limited
>> +		 * by the charging factor).
>> +		 */
>> +		budget =3D bfqd->bfq_max_budget;
>> +
>=20
> Please balance braces. Checkpatch should have warned about the use of =
"}
> else" instead of "} else {".
>=20
>> +static unsigned long bfq_calc_max_budget(u64 peak_rate, u64 timeout)
>> +{
>> +	unsigned long max_budget;
>> +
>> +	/*
>> +	 * The max_budget calculated when autotuning is equal to the
>> +	 * amount of sectors transferred in timeout at the
>> +	 * estimated peak rate.
>> +	 */
>> +	max_budget =3D (unsigned long)(peak_rate * 1000 *
>> +				     timeout >> BFQ_RATE_SHIFT);
>> +
>> +	return max_budget;
>> +}
>=20
> Where does the constant 1000 come from? What are the units of =
peak_rate
> and timeout? What is the maximum value of peak_rate? Can the
> multiplication overflow?
>=20
>> +/*
>> + * In addition to updating the peak rate, checks whether the process
>> + * is "slow", and returns 1 if so. This slow flag is used, in =
addition
>> + * to the budget timeout, to reduce the amount of service provided =
to
>> + * seeky processes, and hence reduce their chances to lower the
>> + * throughput. See the code for more details.
>> + */
>> +static bool bfq_update_peak_rate(struct bfq_data *bfqd, struct =
bfq_queue *bfqq,
>> +				 bool compensate)
>> +{
>> +	u64 bw, usecs, expected, timeout;
>> +	ktime_t delta;
>> +	int update =3D 0;
>> +
>> +	if (!bfq_bfqq_sync(bfqq) || bfq_bfqq_budget_new(bfqq))
>> +		return false;
>> +
>> +	if (compensate)
>> +		delta =3D bfqd->last_idling_start;
>> +	else
>> +		delta =3D ktime_get();
>> +	delta =3D ktime_sub(delta, bfqd->last_budget_start);
>> +	usecs =3D ktime_to_us(delta);
>> +
>> +	/* Don't trust short/unrealistic values. */
>> +	if (usecs < 100 || usecs >=3D LONG_MAX)
>> +		return false;
>=20
> If usecs >=3D LONG_MAX that indicates a kernel bug. Please consider
> triggering a kernel warning in that case.
>=20
>> +/*
>> + * Budget timeout is not implemented through a dedicated timer, but
>> + * just checked on request arrivals and completions, as well as on
>> + * idle timer expirations.
>> + */
>> +static bool bfq_bfqq_budget_timeout(struct bfq_queue *bfqq)
>> +{
>> +	if (bfq_bfqq_budget_new(bfqq) ||
>> +	    time_before(jiffies, bfqq->budget_timeout))
>> +		return false;
>> +	return true;
>> +}
>=20
> Have you considered to use time_is_after_jiffies() instead of
> time_before(jiffies, ...)?
>=20
>> +static void bfq_init_bfqq(struct bfq_data *bfqd, struct bfq_queue =
*bfqq,
>> +			  struct bfq_io_cq *bic, pid_t pid, int is_sync)
>> +{
>> +	RB_CLEAR_NODE(&bfqq->entity.rb_node);
>> +	INIT_LIST_HEAD(&bfqq->fifo);
>> +
>> +	bfqq->ref =3D 0;
>> +	bfqq->bfqd =3D bfqd;
>> +
>> +	if (bic)
>> +		bfq_set_next_ioprio_data(bfqq, bic);
>> +
>> +	if (is_sync) {
>> +		if (!bfq_class_idle(bfqq))
>> +			bfq_mark_bfqq_idle_window(bfqq);
>> +		bfq_mark_bfqq_sync(bfqq);
>> +	} else
>> +		bfq_clear_bfqq_sync(bfqq);
>> +
>> +	bfqq->ttime.last_end_request =3D ktime_get_ns() - (1ULL<<32);
>> +
>> +	bfq_mark_bfqq_IO_bound(bfqq);
>> +
>> +	bfqq->pid =3D pid;
>> +
>> +	/* Tentative initial value to trade off between thr and lat */
>> +	bfqq->max_budget =3D bfq_default_budget(bfqd, bfqq);
>> +	bfqq->budget_timeout =3D bfq_smallest_from_now();
>> +	bfqq->pid =3D pid;
>> +
>> +	/* first request is almost certainly seeky */
>> +	bfqq->seek_history =3D 1;
>> +}
>=20
> What is the meaning of the 1ULL << 32 constant?
>=20
>> +static int __init bfq_init(void)
>> +{
>> +	int ret;
>> +	char msg[50] =3D "BFQ I/O-scheduler: v0";
>=20
> Please leave out "[50]" and use "static const char" instead of "char".
>=20
>> diff --git a/block/elevator.c b/block/elevator.c
>> index 01139f5..786fdcd 100644
>> --- a/block/elevator.c
>> +++ b/block/elevator.c
>> @@ -221,14 +221,20 @@ int elevator_init(struct request_queue *q, char =
*name)
>>=20
>> 	if (!e) {
>> 		/*
>> -		 * For blk-mq devices, we default to using mq-deadline,
>> -		 * if available, for single queue devices. If deadline
>> -		 * isn't available OR we have multiple queues, default
>> -		 * to "none".
>> +		 * For blk-mq devices, we default to using bfq, if
>> +		 * available, for single queue devices. If bfq isn't
>> +		 * available, we try mq-deadline. If neither is
>> +		 * available, OR we have multiple queues, default to
>> +		 * "none".
>> 		 */
>> 		if (q->mq_ops) {
>> +			if (q->nr_hw_queues =3D=3D 1) {
>> +				e =3D elevator_get("bfq", false);
>> +				if (!e)
>> +					e =3D =
elevator_get("mq-deadline", false);
>> +			}
>> 			if (q->nr_hw_queues =3D=3D 1)
>> -				e =3D elevator_get("mq-deadline", =
false);
>> +				e =3D elevator_get("bfq", false);
>> 			if (!e)
>> 				return 0;
>> 		} else
>>=20
>=20
> As Jens wrote, it's way too early to make BFQ the default scheduler.
>=20
> Bart.

^ permalink raw reply

* Re: [PATCH 06/16] mmc: core: replace waitqueue with worker
From: Jens Axboe @ 2017-03-14 14:36 UTC (permalink / raw)
  To: Adrian Hunter, Linus Walleij
  Cc: linux-mmc@vger.kernel.org, Ulf Hansson, Paolo Valente,
	Chunyan Zhang, Baolin Wang, linux-block, Christoph Hellwig,
	Arnd Bergmann
In-Reply-To: <a3571278-b30d-c3cb-bea6-d1776eb53025@intel.com>

On 03/14/2017 06:59 AM, Adrian Hunter wrote:
> On 13/03/17 16:19, Jens Axboe wrote:
>> On 03/13/2017 03:25 AM, Adrian Hunter wrote:
>>> On 11/03/17 00:05, Jens Axboe wrote:
>>>> On 03/10/2017 07:21 AM, Adrian Hunter wrote:
>>>>>> Essentially I take out that thread and replace it with this one worker
>>>>>> introduced in this very patch. I agree the driver can block in many ways
>>>>>> and that is why I need to have it running in process context, and this
>>>>>> is what the worker introduced here provides.
>>>>>
>>>>> The last time I looked at the blk-mq I/O scheduler code, it pulled up to
>>>>> qdepth requests from the I/O scheduler and left them on a local list while
>>>>> running ->queue_rq().  That means blocking in ->queue_rq() leaves some
>>>>> number of requests in limbo (not issued but also not in the I/O scheduler)
>>>>> for that time.
>>>>
>>>> Look again, if we're not handling the requeued dispatches, we pull one
>>>> at the time from the scheduler.
>>>>
>>>
>>> That's good :-)
>>>
>>> Now the next thing ;-)
>>>
>>> It looks like we either set BLK_MQ_F_BLOCKING and miss the possibility of
>>> issuing synchronous requests immediately, or we don't set BLK_MQ_F_BLOCKING
>>> in which case we are never allowed to sleep in ->queue_rq().  Is that true?
>>
>> Only one of those statements is true - if you don't set BLK_MQ_F_BLOCKING,
>> then you may never block in your ->queue_rq() function. But if you do set it,
>> it does not preclude immediate issue of sync requests.
> 
> I meant it gets put to the workqueue rather than issued in the context of
> the submitter.

There's one case that doesn't look like it was converted properly, but
that's a mistake. The general insert-and-run cases run inline if we can,
but the direct-issue needs a fixup, see below.


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 159187a28d66..4196d6bee92d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1434,7 +1434,8 @@ static blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx, struct request *rq)
 	return blk_tag_to_qc_t(rq->internal_tag, hctx->queue_num, true);
 }
 
-static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie)
+static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie,
+				      bool can_block)
 {
 	struct request_queue *q = rq->q;
 	struct blk_mq_queue_data bd = {
@@ -1475,7 +1476,7 @@ static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie)
 	}
 
 insert:
-	blk_mq_sched_insert_request(rq, false, true, true, false);
+	blk_mq_sched_insert_request(rq, false, true, false, can_block);
 }
 
 /*
@@ -1569,11 +1570,11 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 
 		if (!(data.hctx->flags & BLK_MQ_F_BLOCKING)) {
 			rcu_read_lock();
-			blk_mq_try_issue_directly(old_rq, &cookie);
+			blk_mq_try_issue_directly(old_rq, &cookie, false);
 			rcu_read_unlock();
 		} else {
 			srcu_idx = srcu_read_lock(&data.hctx->queue_rq_srcu);
-			blk_mq_try_issue_directly(old_rq, &cookie);
+			blk_mq_try_issue_directly(old_rq, &cookie, true);
 			srcu_read_unlock(&data.hctx->queue_rq_srcu, srcu_idx);
 		}
 		goto done;

-- 
Jens Axboe

^ permalink raw reply related

* Re: [PATCH 06/16] mmc: core: replace waitqueue with worker
From: Christoph Hellwig @ 2017-03-14 14:43 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Adrian Hunter, Linus Walleij, linux-mmc@vger.kernel.org,
	Ulf Hansson, Paolo Valente, Chunyan Zhang, Baolin Wang,
	linux-block, Christoph Hellwig, Arnd Bergmann
In-Reply-To: <a3216031-0801-45e2-320c-bfc88fa742b4@kernel.dk>

On Tue, Mar 14, 2017 at 08:36:26AM -0600, Jens Axboe wrote:
> There's one case that doesn't look like it was converted properly, but
> that's a mistake. The general insert-and-run cases run inline if we can,
> but the direct-issue needs a fixup, see below.

Note that blk_mq_try_issue_directly is only called for the multiple
hardware queue case, so MMC would not hit it.

^ permalink raw reply

* Re: [PATCH 06/16] mmc: core: replace waitqueue with worker
From: Jens Axboe @ 2017-03-14 14:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Adrian Hunter, Linus Walleij, linux-mmc@vger.kernel.org,
	Ulf Hansson, Paolo Valente, Chunyan Zhang, Baolin Wang,
	linux-block, Arnd Bergmann
In-Reply-To: <20170314144357.GA31043@lst.de>

On 03/14/2017 08:43 AM, Christoph Hellwig wrote:
> On Tue, Mar 14, 2017 at 08:36:26AM -0600, Jens Axboe wrote:
>> There's one case that doesn't look like it was converted properly, but
>> that's a mistake. The general insert-and-run cases run inline if we can,
>> but the direct-issue needs a fixup, see below.
> 
> Note that blk_mq_try_issue_directly is only called for the multiple
> hardware queue case, so MMC would not hit it.

Right, which is why I said that the general case works fine, it's
only the explicit issue-direct that currently does not. Not by
design, just an error.

-- 
Jens Axboe

^ permalink raw reply

* [PATCH] blk-mq-sched: don't run the queue async from blk_mq_try_issue_directly()
From: Jens Axboe @ 2017-03-14 14:57 UTC (permalink / raw)
  To: linux-block@vger.kernel.org

If we have scheduling enabled, we jump directly to insert-and-run.
That's fine, but we run the queue async and we don't pass in information
on whether we can block from this context or not. Fixup both these
cases.

Signed-off-by: Jens Axboe <axboe@fb.com>

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 159187a28d66..4196d6bee92d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1434,7 +1434,8 @@ static blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx, struct request *rq)
 	return blk_tag_to_qc_t(rq->internal_tag, hctx->queue_num, true);
 }
 
-static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie)
+static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie,
+				      bool can_block)
 {
 	struct request_queue *q = rq->q;
 	struct blk_mq_queue_data bd = {
@@ -1475,7 +1476,7 @@ static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie)
 	}
 
 insert:
-	blk_mq_sched_insert_request(rq, false, true, true, false);
+	blk_mq_sched_insert_request(rq, false, true, false, can_block);
 }
 
 /*
@@ -1569,11 +1570,11 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 
 		if (!(data.hctx->flags & BLK_MQ_F_BLOCKING)) {
 			rcu_read_lock();
-			blk_mq_try_issue_directly(old_rq, &cookie);
+			blk_mq_try_issue_directly(old_rq, &cookie, false);
 			rcu_read_unlock();
 		} else {
 			srcu_idx = srcu_read_lock(&data.hctx->queue_rq_srcu);
-			blk_mq_try_issue_directly(old_rq, &cookie);
+			blk_mq_try_issue_directly(old_rq, &cookie, true);
 			srcu_read_unlock(&data.hctx->queue_rq_srcu, srcu_idx);
 		}
 		goto done;

-- 
Jens Axboe

^ permalink raw reply related

* Re: [PATCH] blk-mq-sched: don't run the queue async from blk_mq_try_issue_directly()
From: Bart Van Assche @ 2017-03-14 15:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block@vger.kernel.org
In-Reply-To: <3ac4f49e-3bc0-7fc6-e93a-db9ec7ad21cf@kernel.dk>

On 03/14/2017 07:57 AM, Jens Axboe wrote:=0A=
> If we have scheduling enabled, we jump directly to insert-and-run.=0A=
> That's fine, but we run the queue async and we don't pass in information=
=0A=
> on whether we can block from this context or not. Fixup both these=0A=
> cases.=0A=
=0A=
How about renaming "can_block" into "may_sleep"? Otherwise this patch=0A=
looks fine to me.=0A=
=0A=
Bart.=0A=

^ permalink raw reply

* Re: [PATCH] blk-mq-sched: don't run the queue async from blk_mq_try_issue_directly()
From: Jens Axboe @ 2017-03-14 15:26 UTC (permalink / raw)
  To: Bart Van Assche, linux-block@vger.kernel.org
In-Reply-To: <1D08B61A9CF0974AA09887BE32D889DA0EC183@ULS-OP-MBXIP03.sdcorp.global.sandisk.com>

On 03/14/2017 09:17 AM, Bart Van Assche wrote:
> On 03/14/2017 07:57 AM, Jens Axboe wrote:
>> If we have scheduling enabled, we jump directly to insert-and-run.
>> That's fine, but we run the queue async and we don't pass in information
>> on whether we can block from this context or not. Fixup both these
>> cases.
> 
> How about renaming "can_block" into "may_sleep"? Otherwise this patch
> looks fine to me.

Sure, either one is fine with me, at least to me they convey the same
information.

-- 
Jens Axboe

^ permalink raw reply

* Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq
From: Paolo Valente @ 2017-03-14 15:35 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: tj@kernel.org, axboe@kernel.dk, ulf.hansson@linaro.org,
	linux-kernel@vger.kernel.org, fchecconi@gmail.com,
	Arianna Avanzini, linux-block@vger.kernel.org,
	linus.walleij@linaro.org, broonie@kernel.org
In-Reply-To: <1488848390.3125.14.camel@sandisk.com>


> Il giorno 07 mar 2017, alle ore 02:00, Bart Van Assche =
<bart.vanassche@sandisk.com> ha scritto:
>=20
> On Sat, 2017-03-04 at 17:01 +0100, Paolo Valente wrote:
>> Finally, a few details on the patchset.
>>=20
>> The first two patches introduce BFQ-v0, which is more or less the
>> first version of BFQ submitted a few years ago [1].  The remaining
>> patches turn progressively BFQ-v0 into BFQ-v8r8, the current version
>> of BFQ.
>=20
> Hello Paolo,
>=20

Hi Bart,

> Thank you for having done the work to improve, test, fix and post the
> BFQ scheduler as a patch series. However, from what I have seen in the
> patches there is a large number of tunable constants in the code for
> which no scientific approach exists to chose an optimal value.

I'm very sorry about that, I have exported those parameters over the
years, just as an aid for debugging and tuning.  Then I have forgot to
remove them :(

They'll disappear in my next submission.

> Additionally, the complexity of the code is huge. Just like for CFQ,
> sooner or later someone will run into a bug or a performance issue
> and will post a patch to fix it. However, the complexity of BFQ is
> such that a source code review alone won't be sufficient to verify
> whether or not such a patch negatively affects a workload or device
> that has not been tested by the author of the patch. This makes me
> wonder what process should be followed to verify future BFQ patches?
>=20

I've a sort of triple reply for that.

First, I've developed BFQ in a sort of
first-the-problem-then-the-solution way.  That is, each time, I have
first implemented a benchmark that enabled me to highlight the problem
and get all relevant statistics on it, then I have worked on BFQ to
try to solve that problem, using the benchmark as a support.  All
those benchmarks are in the public S suite now.  In particular, by
running one script, and waiting at most one hour, you get graphs of
- throughput with read/write/random/sequential workloads
- start-up times of bash, xterm, gnome terminal and libreoffice, when
all the above combinations of workloads are executed in the background
- frame drop rate for the playback of a movie, again with both all the
above combinations of workloads and the recurrent start of a bash
shell in the background
- kernel-task execution times (compilation, merge, ...), again with
all the above combinations of workloads in the background
- fairness with various combinations of weights and processes
- throughput against interleaved I/O, with a number of readers ranging
from 2 to 9

Every time I fix a bug, add a new feature or port BFQ to a new kernel
version, I just run that script and compare new graphs with previous
ones.  Any regression shows up immediately.  We already have a
similar, working script for Android too, although covering only
throughput, responsiveness and frame drops for the moment.  Of course,
the coverage of these scripts is limited to only the goals for which I
have devised and tuned BFQ so far.  But I hope that it won't be too
hard to extend them to other important use cases (e.g., dbms).

Second, IMO BFQ is complex also because it contains a lot of features.
We have adopted the usual approach for handling this type of
complexity: find clean cuts to get independent pieces, and put each
piece in a separate file, plus one header glue file.  The pieces were:
scheduling engine, hierarchical-scheduling support (allowing the
engine to scheduler generic nodes in the hierarchy), cgroups support.
Yet, Tejun last year, and Jens more recently, have asked to put
everything in one file; for other good reasons of course.  If you do
think that turning back to multiple files may somehow help, and there
are no strong objections from others, then I'm willing to resume this
option and possibly find event better splits.

Third and last, a proposal: why don't we discuss this issue at LSF
too?  In particular, we could talk about the parts of BFQ that seem
more complex to understand, until they become clearer to you.  Then I
could try to understand what helped make them clearer, and translate
it into extra comments in the code or into other, more radical
changes.

Thanks,
Paolo

> Thanks,
>=20
> Bart.

^ permalink raw reply

* Re: [PATCH 4/4] blk-mq: streamline blk_mq_make_request
From: Bart Van Assche @ 2017-03-14 15:40 UTC (permalink / raw)
  To: hch@lst.de, axboe@kernel.dk; +Cc: linux-block@vger.kernel.org
In-Reply-To: <20170313154833.14165-5-hch@lst.de>

On Mon, 2017-03-13 at 09:48 -0600, Christoph Hellwig wrote:
> Turn the different ways of merging or issuing I/O into a series of if/els=
e
> statements instead of the current maze of gotos.  Note that this means we
> pin the CPU a little longer for some cases as the CTX put is moved to
> common code at the end of the function.
>=20
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/blk-mq.c | 67 +++++++++++++++++++++++-----------------------------=
------
>  1 file changed, 27 insertions(+), 40 deletions(-)
>=20
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 48748cb799ed..18e449cc832f 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1534,16 +1534,17 @@ static blk_qc_t blk_mq_make_request(struct reques=
t_queue *q, struct bio *bio)
> =20
>  	cookie =3D request_to_qc_t(data.hctx, rq);
> =20
> +	plug =3D current->plug;
>  	if (unlikely(is_flush_fua)) {
> -		if (q->elevator)
> -			goto elv_insert;
>  		blk_mq_bio_to_request(rq, bio);
> -		blk_insert_flush(rq);
> -		goto run_queue;
> -	}
> -
> -	plug =3D current->plug;
> -	if (plug && q->nr_hw_queues =3D=3D 1) {
> +		if (q->elevator) {
> +			blk_mq_sched_insert_request(rq, false, true,
> +						!is_sync || is_flush_fua, true);
> +		} else {
> +			blk_insert_flush(rq);
> +			blk_mq_run_hw_queue(data.hctx, !is_sync || is_flush_fua);
> +		}
> +	} else if (plug && q->nr_hw_queues =3D=3D 1) {
>  		struct request *last =3D NULL;
> =20
>  		blk_mq_bio_to_request(rq, bio);

This change introduces the following construct:

if (is_flush_fua) {
	/* use is_flush_fua */
} else ...

Have you considered to simplify the code that uses is_flush_fua further?

> @@ -1562,8 +1563,6 @@ static blk_qc_t blk_mq_make_request(struct request_=
queue *q, struct bio *bio)
>  		else
>  			last =3D list_entry_rq(plug->mq_list.prev);
> =20
> -		blk_mq_put_ctx(data.ctx);
> -
>  		if (request_count >=3D BLK_MAX_REQUEST_COUNT || (last &&
>  		    blk_rq_bytes(last) >=3D BLK_PLUG_FLUSH_SIZE)) {
>  			blk_flush_plug_list(plug, false);
> @@ -1571,56 +1570,44 @@ static blk_qc_t blk_mq_make_request(struct reques=
t_queue *q, struct bio *bio)
>  		}
> =20
>  		list_add_tail(&rq->queuelist, &plug->mq_list);
> -		goto done;
> -	} else if (((plug && !blk_queue_nomerges(q)) || is_sync)) {
> -		struct request *old_rq =3D NULL;
> -
> +	} else if (plug && !blk_queue_nomerges(q)) {
>  		blk_mq_bio_to_request(rq, bio);
> =20
>  		/*
>  		 * We do limited plugging. If the bio can be merged, do that.
>  		 * Otherwise the existing request in the plug list will be
>  		 * issued. So the plug list will have one request at most
> +		 *
> +		 * The plug list might get flushed before this. If that happens,
> +		 * the plug list is emptry and same_queue_rq is invalid.
>  		 */

"emptry" looks like a typo?

> -		if (plug) {
> -			/*
> -			 * The plug list might get flushed before this. If that
> -			 * happens, same_queue_rq is invalid and plug list is
> -			 * empty
> -			 */
> -			if (same_queue_rq && !list_empty(&plug->mq_list)) {
> -				old_rq =3D same_queue_rq;
> -				list_del_init(&old_rq->queuelist);
> -			}
> -			list_add_tail(&rq->queuelist, &plug->mq_list);
> -		} else /* is_sync */
> -			old_rq =3D rq;
> -		blk_mq_put_ctx(data.ctx);
> -		if (old_rq)
> -			blk_mq_try_issue_directly(data.hctx, old_rq, &cookie);
> -		goto done;
> -	}
> +		if (!list_empty(&plug->mq_list))
> +			list_del_init(&same_queue_rq->queuelist);
> +		else
> +			same_queue_rq =3D NULL;
> =20
> -	if (q->elevator) {
> -elv_insert:
> -		blk_mq_put_ctx(data.ctx);
> +		list_add_tail(&rq->queuelist, &plug->mq_list);
> +		if (same_queue_rq)
> +			blk_mq_try_issue_directly(data.hctx, same_queue_rq,
> +					&cookie);
> +	} else if (is_sync) {
> +		blk_mq_bio_to_request(rq, bio);
> +		blk_mq_try_issue_directly(data.hctx, rq, &cookie);
> +	} else if (q->elevator) {
>  		blk_mq_bio_to_request(rq, bio);
>  		blk_mq_sched_insert_request(rq, false, true,
>  						!is_sync || is_flush_fua, true);
> -		goto done;
> -	}
> -	if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) {
> +	} else if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) {
>  		/*
>  		 * For a SYNC request, send it to the hardware immediately. For
>  		 * an ASYNC request, just ensure that we run it later on. The
>  		 * latter allows for merging opportunities and more efficient
>  		 * dispatching.
>  		 */
> -run_queue:
>  		blk_mq_run_hw_queue(data.hctx, !is_sync || is_flush_fua);
>  	}

Since this code occurs in the else branch of an if (is_flush_fua) statement=
,
can "|| is_flush_fua" be left out?

What I noticed while reviewing this patch is that there are multiple change=
s in
this patch: not only the goto statements have been eliminated but the old_r=
q
variable has been eliminated too. I think this patch would be easier to rev=
iew
if it would be split in three patches: one that removes the old_rq variable=
,
one that eliminates the goto statements and one that removes dead code.

Thanks,

Bart.=

^ permalink raw reply

* Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq
From: Jens Axboe @ 2017-03-14 15:42 UTC (permalink / raw)
  To: Paolo Valente, Bart Van Assche
  Cc: tj@kernel.org, ulf.hansson@linaro.org,
	linux-kernel@vger.kernel.org, fchecconi@gmail.com,
	Arianna Avanzini, linux-block@vger.kernel.org,
	linus.walleij@linaro.org, broonie@kernel.org
In-Reply-To: <81048010-02AB-4A7A-8C10-FAF7E3242DCC@linaro.org>

On 03/14/2017 09:35 AM, Paolo Valente wrote:
> First, I've developed BFQ in a sort of
> first-the-problem-then-the-solution way.  That is, each time, I have
> first implemented a benchmark that enabled me to highlight the problem
> and get all relevant statistics on it, then I have worked on BFQ to
> try to solve that problem, using the benchmark as a support.  All
> those benchmarks are in the public S suite now.  In particular, by
> running one script, and waiting at most one hour, you get graphs of
> - throughput with read/write/random/sequential workloads
> - start-up times of bash, xterm, gnome terminal and libreoffice, when
> all the above combinations of workloads are executed in the background
> - frame drop rate for the playback of a movie, again with both all the
> above combinations of workloads and the recurrent start of a bash
> shell in the background
> - kernel-task execution times (compilation, merge, ...), again with
> all the above combinations of workloads in the background
> - fairness with various combinations of weights and processes
> - throughput against interleaved I/O, with a number of readers ranging
> from 2 to 9
> 
> Every time I fix a bug, add a new feature or port BFQ to a new kernel
> version, I just run that script and compare new graphs with previous
> ones.  Any regression shows up immediately.  We already have a
> similar, working script for Android too, although covering only
> throughput, responsiveness and frame drops for the moment.  Of course,
> the coverage of these scripts is limited to only the goals for which I
> have devised and tuned BFQ so far.  But I hope that it won't be too
> hard to extend them to other important use cases (e.g., dbms).

This is great, btw, and a really nice tool set to have when evaluating
new changes.

> Second, IMO BFQ is complex also because it contains a lot of features.
> We have adopted the usual approach for handling this type of
> complexity: find clean cuts to get independent pieces, and put each
> piece in a separate file, plus one header glue file.  The pieces were:
> scheduling engine, hierarchical-scheduling support (allowing the
> engine to scheduler generic nodes in the hierarchy), cgroups support.
> Yet, Tejun last year, and Jens more recently, have asked to put
> everything in one file; for other good reasons of course.  If you do
> think that turning back to multiple files may somehow help, and there
> are no strong objections from others, then I'm willing to resume this
> option and possibly find event better splits.
> 
> Third and last, a proposal: why don't we discuss this issue at LSF
> too?  In particular, we could talk about the parts of BFQ that seem
> more complex to understand, until they become clearer to you.  Then I
> could try to understand what helped make them clearer, and translate
> it into extra comments in the code or into other, more radical
> changes.

A big issue here is the lack of nicely structured code. It's one massive
file of code, 8751 lines, or almost 270K of code. It might be a lot
easier to read and understand if it was split into smaller files,
containing various parts of it.

-- 
Jens Axboe

^ permalink raw reply

* Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq
From: Bart Van Assche @ 2017-03-14 16:32 UTC (permalink / raw)
  To: paolo.valente@linaro.org
  Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	fchecconi@gmail.com, linus.walleij@linaro.org, axboe@kernel.dk,
	avanzini.arianna@gmail.com, broonie@kernel.org, tj@kernel.org,
	ulf.hansson@linaro.org
In-Reply-To: <81048010-02AB-4A7A-8C10-FAF7E3242DCC@linaro.org>

On Tue, 2017-03-14 at 16:35 +0100, Paolo Valente wrote:
> > Il giorno 07 mar 2017, alle ore 02:00, Bart Van Assche <bart.vanassche@=
sandisk.com> ha scritto:
> >=20
> > Additionally, the complexity of the code is huge. Just like for CFQ,
> > sooner or later someone will run into a bug or a performance issue
> > and will post a patch to fix it. However, the complexity of BFQ is
> > such that a source code review alone won't be sufficient to verify
> > whether or not such a patch negatively affects a workload or device
> > that has not been tested by the author of the patch. This makes me
> > wonder what process should be followed to verify future BFQ patches?
>=20
> Third and last, a proposal: why don't we discuss this issue at LSF
> too?  In particular, we could talk about the parts of BFQ that seem
> more complex to understand, until they become clearer to you.  Then I
> could try to understand what helped make them clearer, and translate
> it into extra comments in the code or into other, more radical
> changes.

Hello Paolo,

Sorry if my comment was not clear enough. Suppose that e.g. someone would
like to modify the following code:

static int bfq_min_budget(struct bfq_data *bfqd)
{
       if (bfqd->budgets_assigned < bfq_stats_min_budgets)
               return bfq_default_max_budget / 32;
       else
               return bfqd->bfq_max_budget / 32;
}

How to predict the performance impact of any changes in e.g. this function?
It is really great that a performance benchmark is available. But what shou=
ld
a developer do who only has access to a small subset of all the storage
devices that are supported by the Linux kernel and hence who can not run th=
e
benchmark against every supported storage device? Do developers who do not
fully understand the BFQ algorithms and who run into a performance problem
have any other option than trial and error for fixing such performance issu=
es?

Thanks,

Bart.=

^ permalink raw reply

* Re: [PATCH] blk-mq-sched: don't run the queue async from blk_mq_try_issue_directly()
From: Omar Sandoval @ 2017-03-14 17:48 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block@vger.kernel.org
In-Reply-To: <3ac4f49e-3bc0-7fc6-e93a-db9ec7ad21cf@kernel.dk>

On Tue, Mar 14, 2017 at 08:57:50AM -0600, Jens Axboe wrote:
> If we have scheduling enabled, we jump directly to insert-and-run.
> That's fine, but we run the queue async and we don't pass in information
> on whether we can block from this context or not. Fixup both these
> cases.

Reviewed-by: Omar Sandoval <osandov@fb.com>

Just one question: we call blk_mq_get_driver_tag() with wait=false in
blk_mq_try_issue_directly(). Should we change that to wait=can_block?
Maybe it's pointless to try a direct issue if we'd have to wait for a
tag anyways, though.

> Signed-off-by: Jens Axboe <axboe@fb.com>
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 159187a28d66..4196d6bee92d 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1434,7 +1434,8 @@ static blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx, struct request *rq)
>  	return blk_tag_to_qc_t(rq->internal_tag, hctx->queue_num, true);
>  }
>  
> -static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie)
> +static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie,
> +				      bool can_block)
>  {
>  	struct request_queue *q = rq->q;
>  	struct blk_mq_queue_data bd = {
> @@ -1475,7 +1476,7 @@ static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie)
>  	}
>  
>  insert:
> -	blk_mq_sched_insert_request(rq, false, true, true, false);
> +	blk_mq_sched_insert_request(rq, false, true, false, can_block);
>  }
>  
>  /*
> @@ -1569,11 +1570,11 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>  
>  		if (!(data.hctx->flags & BLK_MQ_F_BLOCKING)) {
>  			rcu_read_lock();
> -			blk_mq_try_issue_directly(old_rq, &cookie);
> +			blk_mq_try_issue_directly(old_rq, &cookie, false);
>  			rcu_read_unlock();
>  		} else {
>  			srcu_idx = srcu_read_lock(&data.hctx->queue_rq_srcu);
> -			blk_mq_try_issue_directly(old_rq, &cookie);
> +			blk_mq_try_issue_directly(old_rq, &cookie, true);
>  			srcu_read_unlock(&data.hctx->queue_rq_srcu, srcu_idx);
>  		}
>  		goto done;
> 
> -- 
> Jens Axboe
> 

^ permalink raw reply

* Re: [PATCH] blk-mq-sched: don't run the queue async from blk_mq_try_issue_directly()
From: Jens Axboe @ 2017-03-14 17:51 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-block@vger.kernel.org
In-Reply-To: <20170314174837.GB2352@vader.DHCP.thefacebook.com>

On 03/14/2017 11:48 AM, Omar Sandoval wrote:
> On Tue, Mar 14, 2017 at 08:57:50AM -0600, Jens Axboe wrote:
>> If we have scheduling enabled, we jump directly to insert-and-run.
>> That's fine, but we run the queue async and we don't pass in information
>> on whether we can block from this context or not. Fixup both these
>> cases.
> 
> Reviewed-by: Omar Sandoval <osandov@fb.com>
> 
> Just one question: we call blk_mq_get_driver_tag() with wait=false in
> blk_mq_try_issue_directly(). Should we change that to wait=can_block?
> Maybe it's pointless to try a direct issue if we'd have to wait for a
> tag anyways, though.

Exactly, don't want to wait for a tag, at that point we are just
pointlessly stalling an app that could perhaps be submitting more IO.
So I don't think we should factor that in here, better to let the
blocking vs non-blocking drivers behave the same in that regard.

-- 
Jens Axboe

^ permalink raw reply

* [PATCH 0/4] block: callback-based statistics
From: Omar Sandoval @ 2017-03-14 21:03 UTC (permalink / raw)
  To: linux-block; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

This patchset generalizes the blk-stats infrastructure to allow users to
register a callback to be called at a given time with the statistics of
requests completed during that window. Writeback throttling and hybrid
polling are converted to the new infrastructure. The ultimate goal is to
use this infrastructure for the mq I/O scheduler.

The details are in patch 4, which is the actual conversion. Patches 1-3
are preparation cleanups.

A couple of minor open issues:

- Since there is no single source of truth for stats anymore (it's
  per-callback now), I got rid of the sysfs/debugfs files. If we still
  want these, we could potentially expose the statistics that poll is
  using.
- Polling used to use per-hctx stats, but now the stats are gathered to
  a per-request_queue cache. This might make the stats noisier, but we
  can improve this in the future by splitting out stats by request size
  or something like that.

Omar Sandoval (4):
  block: tear out blk-stat debugfs/sysfs entries
  block: remove extra calls to wbt_exit()
  block: change stats from array type to struct type
  blk-stats: convert to callback-based statistics reporting

 block/blk-core.c           |   7 +-
 block/blk-mq-debugfs.c     |  55 ---------
 block/blk-mq.c             |  79 ++++++++----
 block/blk-mq.h             |   1 -
 block/blk-stat.c           | 300 ++++++++++++++++++++-------------------------
 block/blk-stat.h           |  98 ++++++++++++---
 block/blk-sysfs.c          |  31 +----
 block/blk-wbt.c            |  66 +++++-----
 block/blk-wbt.h            |   2 +-
 include/linux/blk_types.h  |   8 +-
 include/linux/blkdev.h     |  10 +-
 include/trace/events/wbt.h |  20 +--
 12 files changed, 331 insertions(+), 346 deletions(-)

-- 
2.12.0

^ permalink raw reply

* [PATCH 1/4] block: tear out blk-stat debugfs/sysfs entries
From: Omar Sandoval @ 2017-03-14 21:03 UTC (permalink / raw)
  To: linux-block; +Cc: kernel-team
In-Reply-To: <cover.1489524591.git.osandov@fb.com>

From: Omar Sandoval <osandov@fb.com>

These made sense when there was a single stats buffer for the request
queue, but that's going away.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 block/blk-mq-debugfs.c | 55 --------------------------------------------------
 block/blk-sysfs.c      | 26 ------------------------
 2 files changed, 81 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index f6d917977b33..18672225c417 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -322,60 +322,6 @@ static const struct file_operations hctx_io_poll_fops = {
 	.release	= single_release,
 };
 
-static void print_stat(struct seq_file *m, struct blk_rq_stat *stat)
-{
-	seq_printf(m, "samples=%d, mean=%lld, min=%llu, max=%llu",
-		   stat->nr_samples, stat->mean, stat->min, stat->max);
-}
-
-static int hctx_stats_show(struct seq_file *m, void *v)
-{
-	struct blk_mq_hw_ctx *hctx = m->private;
-	struct blk_rq_stat stat[2];
-
-	blk_stat_init(&stat[BLK_STAT_READ]);
-	blk_stat_init(&stat[BLK_STAT_WRITE]);
-
-	blk_hctx_stat_get(hctx, stat);
-
-	seq_puts(m, "read: ");
-	print_stat(m, &stat[BLK_STAT_READ]);
-	seq_puts(m, "\n");
-
-	seq_puts(m, "write: ");
-	print_stat(m, &stat[BLK_STAT_WRITE]);
-	seq_puts(m, "\n");
-	return 0;
-}
-
-static int hctx_stats_open(struct inode *inode, struct file *file)
-{
-	return single_open(file, hctx_stats_show, inode->i_private);
-}
-
-static ssize_t hctx_stats_write(struct file *file, const char __user *buf,
-				size_t count, loff_t *ppos)
-{
-	struct seq_file *m = file->private_data;
-	struct blk_mq_hw_ctx *hctx = m->private;
-	struct blk_mq_ctx *ctx;
-	int i;
-
-	hctx_for_each_ctx(hctx, ctx, i) {
-		blk_stat_init(&ctx->stat[BLK_STAT_READ]);
-		blk_stat_init(&ctx->stat[BLK_STAT_WRITE]);
-	}
-	return count;
-}
-
-static const struct file_operations hctx_stats_fops = {
-	.open		= hctx_stats_open,
-	.read		= seq_read,
-	.write		= hctx_stats_write,
-	.llseek		= seq_lseek,
-	.release	= single_release,
-};
-
 static int hctx_dispatched_show(struct seq_file *m, void *v)
 {
 	struct blk_mq_hw_ctx *hctx = m->private;
@@ -646,7 +592,6 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
 	{"sched_tags", 0400, &hctx_sched_tags_fops},
 	{"sched_tags_bitmap", 0400, &hctx_sched_tags_bitmap_fops},
 	{"io_poll", 0600, &hctx_io_poll_fops},
-	{"stats", 0600, &hctx_stats_fops},
 	{"dispatched", 0600, &hctx_dispatched_fops},
 	{"queued", 0600, &hctx_queued_fops},
 	{"run", 0600, &hctx_run_fops},
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index c44b321335f3..6e1cf622af21 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -503,26 +503,6 @@ static ssize_t queue_dax_show(struct request_queue *q, char *page)
 	return queue_var_show(blk_queue_dax(q), page);
 }
 
-static ssize_t print_stat(char *page, struct blk_rq_stat *stat, const char *pre)
-{
-	return sprintf(page, "%s samples=%llu, mean=%lld, min=%lld, max=%lld\n",
-			pre, (long long) stat->nr_samples,
-			(long long) stat->mean, (long long) stat->min,
-			(long long) stat->max);
-}
-
-static ssize_t queue_stats_show(struct request_queue *q, char *page)
-{
-	struct blk_rq_stat stat[2];
-	ssize_t ret;
-
-	blk_queue_stat_get(q, stat);
-
-	ret = print_stat(page, &stat[BLK_STAT_READ], "read :");
-	ret += print_stat(page + ret, &stat[BLK_STAT_WRITE], "write:");
-	return ret;
-}
-
 static struct queue_sysfs_entry queue_requests_entry = {
 	.attr = {.name = "nr_requests", .mode = S_IRUGO | S_IWUSR },
 	.show = queue_requests_show,
@@ -691,11 +671,6 @@ static struct queue_sysfs_entry queue_dax_entry = {
 	.show = queue_dax_show,
 };
 
-static struct queue_sysfs_entry queue_stats_entry = {
-	.attr = {.name = "stats", .mode = S_IRUGO },
-	.show = queue_stats_show,
-};
-
 static struct queue_sysfs_entry queue_wb_lat_entry = {
 	.attr = {.name = "wbt_lat_usec", .mode = S_IRUGO | S_IWUSR },
 	.show = queue_wb_lat_show,
@@ -733,7 +708,6 @@ static struct attribute *default_attrs[] = {
 	&queue_poll_entry.attr,
 	&queue_wc_entry.attr,
 	&queue_dax_entry.attr,
-	&queue_stats_entry.attr,
 	&queue_wb_lat_entry.attr,
 	&queue_poll_delay_entry.attr,
 	NULL,
-- 
2.12.0

^ permalink raw reply related

* [PATCH 2/4] block: remove extra calls to wbt_exit()
From: Omar Sandoval @ 2017-03-14 21:03 UTC (permalink / raw)
  To: linux-block; +Cc: kernel-team
In-Reply-To: <cover.1489524591.git.osandov@fb.com>

From: Omar Sandoval <osandov@fb.com>

We always call wbt_exit() from blk_release_queue(), so these are
unnecessary.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 block/blk-core.c | 1 -
 block/blk-mq.c   | 2 --
 2 files changed, 3 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 0eeb99ef654f..82425017c9b8 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -889,7 +889,6 @@ int blk_init_allocated_queue(struct request_queue *q)
 		q->exit_rq_fn(q, q->fq->flush_rq);
 out_free_flush_queue:
 	blk_free_flush_queue(q->fq);
-	wbt_exit(q);
 	return -ENOMEM;
 }
 EXPORT_SYMBOL(blk_init_allocated_queue);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 159187a28d66..eb84daf550f4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2430,8 +2430,6 @@ void blk_mq_free_queue(struct request_queue *q)
 	list_del_init(&q->all_q_node);
 	mutex_unlock(&all_q_mutex);
 
-	wbt_exit(q);
-
 	blk_mq_del_queue_tag_set(q);
 
 	blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
-- 
2.12.0

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox