Linux block layer
 help / color / mirror / Atom feed
* Re: [PATCH] blk-cgroup: provide stubs for blkcg_get_fc_appid()
From: Jens Axboe @ 2022-05-19 13:36 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Christoph Hellwig, Keith Busch, linux-block
In-Reply-To: <20220519130207.6492-1-hare@suse.de>

On 5/19/22 7:02 AM, Hannes Reinecke wrote:
> Signed-off-by: Hannes Reinecke <hare@suse.de>

This needs both an actual commit message and a Fixes tag.

-- 
Jens Axboe


^ permalink raw reply

* Re: [PATCH V2 0/1] ubd: add io_uring based userspace block driver
From: Ming Lei @ 2022-05-19 13:33 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, Harris James R, io-uring,
	Gabriel Krisman Bertazi, ZiyangZhang, Xiaoguang Wang,
	Stefan Hajnoczi, ming.lei
In-Reply-To: <20220517055358.3164431-1-ming.lei@redhat.com>

On Tue, May 17, 2022 at 01:53:57PM +0800, Ming Lei wrote:
> Hello Guys,
> 
> ubd driver is one kernel driver for implementing generic userspace block
> device/driver, which delivers io request from ubd block device(/dev/ubdbN) into
> ubd server[1] which is the userspace part of ubd for communicating
> with ubd driver and handling specific io logic by its target module.
> 
> Another thing ubd driver handles is to copy data between user space buffer
> and request/bio's pages, or take zero copy if mm is ready for support it in
> future. ubd driver doesn't handle any IO logic of the specific driver, so
> it is small/simple, and all io logics are done by the target code in ubdserver.
> 
> The above two are main jobs done by ubd driver.
> 
> ubd driver can help to move IO logic into userspace, in which the
> development work is easier/more effective than doing in kernel, such as,
> ubd-loop takes < 200 lines of loop specific code to get basically same 
> function with kernel loop block driver, meantime the performance is
> still good. ubdsrv[1] provide built-in test for comparing both by running
> "make test T=loop".
> 
> Another example is high performance qcow2 support[2], which could be built with
> ubd framework more easily than doing it inside kernel.
> 
> Also there are more people who express interests on userspace block driver[3],
> Gabriel Krisman Bertazi proposes this topic in lsf/mm/ebpf 2022 and mentioned
> requirement from Google. Ziyang Zhang from Alibaba said they "plan to
> replace TCMU by UBD as a new choice" because UBD can get better throughput than
> TCMU even with single queue[4], meantime UBD is simple. Also there is userspace
> storage service for providing storage to containers.
> 
> It is io_uring based: io request is delivered to userspace via new added
> io_uring command which has been proved as very efficient for making nvme
> passthrough IO to get better IOPS than io_uring(READ/WRITE). Meantime one
> shared/mmap buffer is used for sharing io descriptor to userspace, the
> buffer is readonly for userspace, each IO just takes 24bytes so far.
> It is suggested to use io_uring in userspace(target part of ubd server)
> to handle IO request too. And it is still easy for ubdserver to support
> io handling by non-io_uring, and this work isn't done yet, but can be
> supported easily with help o eventfd.
> 
> This way is efficient since no extra io command copy is required, no sleep
> is needed in transferring io command to userspace. Meantime the communication
> protocol is simple and efficient, one single command of
> UBD_IO_COMMIT_AND_FETCH_REQ can handle both fetching io request desc and commit
> command result in one trip. IO handling is often batched after single
> io_uring_enter() returns, both IO requests from ubd server target and
> IO commands could be handled as a whole batch.
> 
> Remove RFC now because ubd driver codes gets lots of cleanup, enhancement and
> bug fixes since V1:
> 
> - cleanup uapi: remove ubd specific error code,  switch to linux error code,
> remove one command op, remove one field from cmd_desc
> 
> - add monitor mechanism to handle ubq_daemon being killed, ubdsrv[1]
>   includes builtin tests for covering heavy IO with deleting ubd / killing
>   ubq_daemon at the same time, and V2 pass all the two tests(make test T=generic),
>   and the abort/stop mechanism is simple
> 
> - fix MQ command buffer mmap bug, and now 'xfstetests -g auto' works well on
>   MQ ubd-loop devices(test/scratch)
> 
> - improve batching submission as suggested by Jens
> 
> - improve handling for starting device, replace random wait/poll with
> completion
> 
> - all kinds of cleanup, bug fix,..
> 
> And the patch by patch change since V1 can be found in the following
> tree:
> 
> https://github.com/ming1/linux/commits/my_for-5.18-ubd-devel_v2

BTW, a one-line fix[1] is added to above branch, which fixes performance
obviously on small BS(< 128k) test. If anyone run performance test,
please include this fix.

[1] https://github.com/ming1/linux/commit/fa91354b418e83953304a3efad4ee6ac40ea6110

Thanks,
Ming


^ permalink raw reply

* Re: [PATCH -next 7/8] block, bfq: cleanup bfq_bfqq_update_budg_for_activation()
From: yukuai (C) @ 2022-05-19 13:21 UTC (permalink / raw)
  To: Jan Kara
  Cc: paolo.valente, axboe, tj, linux-block, cgroups, linux-kernel,
	yi.zhang
In-Reply-To: <20220519111856.wvk4oetm7odnkg3w@quack3.lan>

在 2022/05/19 19:18, Jan Kara 写道:
> On Sat 14-05-22 17:05:21, Yu Kuai wrote:
>> It will only be called from bfq_bfqq_handle_idle_busy_switch() in
>> specific code branch, there is no need to precaculate
>> 'bfqq_wants_to_preempt' each time bfq_bfqq_handle_idle_busy_switch()
>> is caleld.
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> 
> Please see below:
> 
>> @@ -1816,14 +1807,6 @@ static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd,
>>   		  (bfqq->bic || RQ_BIC(rq)->stably_merged) &&
>>   		   (*interactive || soft_rt)));
>>   
>> -	/*
>> -	 * Using the last flag, update budget and check whether bfqq
>> -	 * may want to preempt the in-service queue.
>> -	 */
>> -	bfqq_wants_to_preempt =
>> -		bfq_bfqq_update_budg_for_activation(bfqd, bfqq,
>> -						    arrived_in_time);
>> -
>>   	/*
>>   	 * If bfqq happened to be activated in a burst, but has been
>>   	 * idle for much more than an interactive queue, then we
> ...
>> @@ -1918,7 +1900,7 @@ static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd,
>>   	 * (2) this switch of bfqq to busy changes the scenario.
>>   	 */
>>   	if (bfqd->in_service_queue &&
>> -	    ((bfqq_wants_to_preempt &&
>> +	    ((bfq_bfqq_update_budg_for_activation(bfqd, bfqq) &&
>>   	      bfqq->wr_coeff >= bfqd->in_service_queue->wr_coeff) ||
>>   	     bfq_bfqq_higher_class_or_weight(bfqq, bfqd->in_service_queue) ||
>>   	     !bfq_better_to_idle(bfqd->in_service_queue)) &&
> 
> So these changes are actually wrong because
> bfq_bfqq_update_budg_for_activation() relies on
> bfq_bfqq_non_blocking_wait_rq() but bfq_add_bfqq_busy() clears that. And
> bfq_add_bfqq_busy() is called between the place where
> bfq_bfqq_update_budg_for_activation() was called previously and now so your
> patch breaks this logic.

Hi,

You are right, thanks for the explanation, I'll remove this patch and
the next patch in next version.

Kuai
> 
> 								Honza
> 

^ permalink raw reply

* Re: [linux-next] build fails modpost: "blkcg_get_fc_appid" [drivers/nvme/host/nvme-fc.ko] undefined!
From: Hannes Reinecke @ 2022-05-19 13:02 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Tasmiya Nalatwad, linux-block, linux-kernel, Abdul Haleem,
	sachinp, mputtash, Christoph Hellwig,
	linux-nvme@lists.infradead.org
In-Reply-To: <c2d252a1-7223-4899-e5c9-e4bb27e2fc8a@kernel.dk>

On 5/19/22 14:42, Jens Axboe wrote:
> On 5/19/22 6:40 AM, Jens Axboe wrote:
>> On Thu, May 19, 2022 at 6:38 AM Hannes Reinecke <hare@suse.de> wrote:
>>>
>>> On 5/19/22 14:14, Jens Axboe wrote:
>>>> On 5/19/22 1:49 AM, Tasmiya Nalatwad wrote:
>>>>> Greetings,
>>>>>
>>>>> linux-next build fails modpost: "blkcg_get_fc_appid" [drivers/nvme/host/nvme-fc.ko] undefined!
>>>>>
>>>>> Console Logs :
>>>>>
>>>>> [console-expect]#make -j 17 -s && make modules && make modules_install && make install
>>>>> make -j 17 -s && make modules && make modules_install && make install
>>>>> ERROR: modpost: "blkcg_get_fc_appid" [drivers/nvme/host/nvme-fc.ko] undefined!
>>>>> make[1]: *** [scripts/Makefile.modpost:134: modules-only.symvers] Error 1
>>>>> make: *** [Makefile:1914: modules] Error 2
>>>>> make: *** Waiting for unfinished jobs....
>>>>
>>>> Christoph, can you fix this up?
>>>>
>>> Cannot reproduce with commit 21498d01d045c5b95b93e0a0625ae965b4330ebe.
>>> Please share details.
>>
>> The kerneltest bot also reported this a few days ago, you might be able
>> to find the details there as that includes config etc.
> 
> Here: https://lore.kernel.org/linux-mm/202205190527.o9wVEvHI-lkp@intel.com/
> 
Right. Send a patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

^ permalink raw reply

* [PATCH] blk-cgroup: provide stubs for blkcg_get_fc_appid()
From: Hannes Reinecke @ 2022-05-19 13:02 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, Keith Busch, linux-block, Hannes Reinecke

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 include/linux/blk-cgroup.h | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 9f40dbc65f82..4756f4d2b8e7 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -33,6 +33,9 @@ void blkcg_unpin_online(struct cgroup_subsys_state *blkcg_css);
 struct list_head *blkcg_get_cgwb_list(struct cgroup_subsys_state *css);
 struct cgroup_subsys_state *bio_blkcg_css(struct bio *bio);
 
+int blkcg_set_fc_appid(char *app_id, u64 cgrp_id, size_t app_id_len);
+char *blkcg_get_fc_appid(struct bio *bio);
+
 #else	/* CONFIG_BLK_CGROUP */
 
 #define blkcg_root_css	((struct cgroup_subsys_state *)ERR_PTR(-EINVAL))
@@ -44,9 +47,15 @@ static inline struct cgroup_subsys_state *bio_blkcg_css(struct bio *bio)
 {
 	return NULL;
 }
+static inline int blkcg_set_fc_appid(char *app_id, u64 cgrp_id,
+				     size_t app_id_len)
+{
+	return -EINVAL;
+}
+static inline char *blkcg_get_fc_appid(struct bio *bio)
+{
+	return NULL;
+}
 #endif	/* CONFIG_BLK_CGROUP */
 
-int blkcg_set_fc_appid(char *app_id, u64 cgrp_id, size_t app_id_len);
-char *blkcg_get_fc_appid(struct bio *bio);
-
 #endif	/* _BLK_CGROUP_H */
-- 
2.29.2


^ permalink raw reply related

* Re: [PATCH 0/4] bfq: Improve waker detection
From: Jens Axboe @ 2022-05-19 12:52 UTC (permalink / raw)
  To: paolo.valente, jack; +Cc: linux-block
In-Reply-To: <20220519104621.15456-1-jack@suse.cz>

On Thu, 19 May 2022 12:52:28 +0200, Jan Kara wrote:
> this patch series restores regression in dbench for large number of processes
> that was introduced by c65e6fd460b4 ("bfq: Do not let waker requests skip
> proper accounting"). The detailed explanation is in the first patch but the
> culprit in the end was that with large number of dbench clients it often
> happened that flush worker bfqq replaced jbd2 bfqq as a waker of the bfqq
> shared by dbench clients and that resulted in lot of idling and reduced
> throughput.
> 
> [...]

Applied, thanks!

[1/4] bfq: Relax waker detection for shared queues
      commit: f950667356ce90a41b446b726d4595a10cb65415
[2/4] bfq: Allow current waker to defend against a tentative one
      commit: c5ac56bb6110e42e79d3106866658376b2e48ab9
[3/4] bfq: Remove superfluous conversion from RQ_BIC()
      commit: e79cf8892e332b9dafc99aef02189a2897eced24
[4/4] bfq: Remove bfq_requeue_request_body()
      commit: a249ca7dfbce1eb82bcd3a5a6bb21daeade20469

Best regards,
-- 
Jens Axboe



^ permalink raw reply

* Re: [PATCHv2 0/3] direct io alignment relax
From: Jens Axboe @ 2022-05-19 12:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, linux-fsdevel, linux-block, Kernel Team, bvanassche,
	damien.lemoal, Keith Busch
In-Reply-To: <20220519074225.GH22301@lst.de>

On 5/19/22 1:42 AM, Christoph Hellwig wrote:
> On Wed, May 18, 2022 at 04:45:10PM -0600, Jens Axboe wrote:
>> On 5/18/22 11:11 AM, Keith Busch wrote:
>>> From: Keith Busch <kbusch@kernel.org>
>>>
>>> Including the fs list this time.
>>>
>>> I am still working on a better interface to report the dio alignment to
>>> an application. The most recent suggestion of using statx is proving to
>>> be less straight forward than I thought, but I don't want to hold this
>>> series up for that.
>>
>> This looks good to me. Anyone object to queueing this one up?
> 
> Yes.  I really do like this feature, but I don't think it is ready to
> rush it in.  In addition to the ongoing discussions in this thread
> we absolutely need proper statx support for the alignments to avoid
> userspace growing all kinds of sysfs growling crap to make use of it.

OK fair enough, I do agree that we need a better story for exposing this
data, and in fact for a whole bunch of other stuff that is currently
hard to get programatically. We can defer to 5.20 and get the statx side
hashed out at the same time.

-- 
Jens Axboe


^ permalink raw reply

* Re: [linux-next] build fails modpost: "blkcg_get_fc_appid" [drivers/nvme/host/nvme-fc.ko] undefined!
From: Jens Axboe @ 2022-05-19 12:42 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Tasmiya Nalatwad, linux-block, linux-kernel, Abdul Haleem,
	sachinp, mputtash, Christoph Hellwig,
	linux-nvme@lists.infradead.org
In-Reply-To: <900f57bc-9978-9ba6-22fb-48f03fcf5011@kernel.dk>

On 5/19/22 6:40 AM, Jens Axboe wrote:
> On Thu, May 19, 2022 at 6:38 AM Hannes Reinecke <hare@suse.de> wrote:
>>
>> On 5/19/22 14:14, Jens Axboe wrote:
>>> On 5/19/22 1:49 AM, Tasmiya Nalatwad wrote:
>>>> Greetings,
>>>>
>>>> linux-next build fails modpost: "blkcg_get_fc_appid" [drivers/nvme/host/nvme-fc.ko] undefined!
>>>>
>>>> Console Logs :
>>>>
>>>> [console-expect]#make -j 17 -s && make modules && make modules_install && make install
>>>> make -j 17 -s && make modules && make modules_install && make install
>>>> ERROR: modpost: "blkcg_get_fc_appid" [drivers/nvme/host/nvme-fc.ko] undefined!
>>>> make[1]: *** [scripts/Makefile.modpost:134: modules-only.symvers] Error 1
>>>> make: *** [Makefile:1914: modules] Error 2
>>>> make: *** Waiting for unfinished jobs....
>>>
>>> Christoph, can you fix this up?
>>>
>> Cannot reproduce with commit 21498d01d045c5b95b93e0a0625ae965b4330ebe.
>> Please share details.
> 
> The kerneltest bot also reported this a few days ago, you might be able
> to find the details there as that includes config etc.

Here: https://lore.kernel.org/linux-mm/202205190527.o9wVEvHI-lkp@intel.com/

-- 
Jens Axboe


^ permalink raw reply

* Re: [linux-next] build fails modpost: "blkcg_get_fc_appid" [drivers/nvme/host/nvme-fc.ko] undefined!
From: Jens Axboe @ 2022-05-19 12:40 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Tasmiya Nalatwad, linux-block, linux-kernel, Abdul Haleem,
	sachinp, mputtash, Christoph Hellwig,
	linux-nvme@lists.infradead.org
In-Reply-To: <fc5f8b96-3c93-5400-b917-a1d991cbe7c9@suse.de>

On Thu, May 19, 2022 at 6:38 AM Hannes Reinecke <hare@suse.de> wrote:
>
> On 5/19/22 14:14, Jens Axboe wrote:
> > On 5/19/22 1:49 AM, Tasmiya Nalatwad wrote:
> >> Greetings,
> >>
> >> linux-next build fails modpost: "blkcg_get_fc_appid" [drivers/nvme/host/nvme-fc.ko] undefined!
> >>
> >> Console Logs :
> >>
> >> [console-expect]#make -j 17 -s && make modules && make modules_install && make install
> >> make -j 17 -s && make modules && make modules_install && make install
> >> ERROR: modpost: "blkcg_get_fc_appid" [drivers/nvme/host/nvme-fc.ko] undefined!
> >> make[1]: *** [scripts/Makefile.modpost:134: modules-only.symvers] Error 1
> >> make: *** [Makefile:1914: modules] Error 2
> >> make: *** Waiting for unfinished jobs....
> >
> > Christoph, can you fix this up?
> >
> Cannot reproduce with commit 21498d01d045c5b95b93e0a0625ae965b4330ebe.
> Please share details.

The kerneltest bot also reported this a few days ago, you might be able
to find the details there as that includes config etc.

-- 
Jens Axboe


^ permalink raw reply

* Re: [linux-next] build fails modpost: "blkcg_get_fc_appid" [drivers/nvme/host/nvme-fc.ko] undefined!
From: Hannes Reinecke @ 2022-05-19 12:38 UTC (permalink / raw)
  To: Jens Axboe, Tasmiya Nalatwad, linux-block
  Cc: linux-kernel, abdhalee, sachinp, mputtash, Christoph Hellwig,
	linux-nvme@lists.infradead.org
In-Reply-To: <7b16694b-0281-d06d-7564-c4f26760a25e@kernel.dk>

On 5/19/22 14:14, Jens Axboe wrote:
> On 5/19/22 1:49 AM, Tasmiya Nalatwad wrote:
>> Greetings,
>>
>> linux-next build fails modpost: "blkcg_get_fc_appid" [drivers/nvme/host/nvme-fc.ko] undefined!
>>
>> Console Logs :
>>
>> [console-expect]#make -j 17 -s && make modules && make modules_install && make install
>> make -j 17 -s && make modules && make modules_install && make install
>> ERROR: modpost: "blkcg_get_fc_appid" [drivers/nvme/host/nvme-fc.ko] undefined!
>> make[1]: *** [scripts/Makefile.modpost:134: modules-only.symvers] Error 1
>> make: *** [Makefile:1914: modules] Error 2
>> make: *** Waiting for unfinished jobs....
> 
> Christoph, can you fix this up?
> 
Cannot reproduce with commit 21498d01d045c5b95b93e0a0625ae965b4330ebe.
Please share details.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

^ permalink raw reply

* Re: [PATCH -next v3 2/2] blk-throttle: fix io hung due to configuration updates
From: yukuai (C) @ 2022-05-19 12:14 UTC (permalink / raw)
  To: Michal Koutný
  Cc: tj, axboe, ming.lei, geert, cgroups, linux-block, linux-kernel,
	yi.zhang
In-Reply-To: <20220519095857.GE16096@blackbody.suse.cz>

在 2022/05/19 17:58, Michal Koutný 写道:
> Hello Kuayi.
> 
> On Thu, May 19, 2022 at 04:58:11PM +0800, Yu Kuai <yukuai3@huawei.com> wrote:
>> If new configuration is submitted while a bio is throttled, then new
>> waiting time is recaculated regardless that the bio might aready wait
>> for some time:
>>
>> tg_conf_updated
>>   throtl_start_new_slice
>>    tg_update_disptime
>>    throtl_schedule_next_dispatch
>>
>> Then io hung can be triggered by always submmiting new configuration
>> before the throttled bio is dispatched.
> 
> O.K.
> 
>> -	/*
>> -	 * We're already holding queue_lock and know @tg is valid.  Let's
>> -	 * apply the new config directly.
>> -	 *
>> -	 * Restart the slices for both READ and WRITES. It might happen
>> -	 * that a group's limit are dropped suddenly and we don't want to
>> -	 * account recently dispatched IO with new low rate.
>> -	 */
>> -	throtl_start_new_slice(tg, READ);
>> -	throtl_start_new_slice(tg, WRITE);
>> +	throtl_update_slice(tg, old_limits);
> 
> throtl_start_new_slice zeroes *_disp fields.
Hi,

The problem is not just zeroes *_disp fields, in fact, the real problem
is that 'slice_start' is reset to jiffies.

> If for instance, new config allowed only 0.5 throughput, the *_disp
> fields would be scaled to 0.5.
> How that change helps (better) the previously throttled bio to be dispatched?
> 
tg_with_in_bps_limit() is caculating 'wait' based on 'slice_start'and
'bytes_disp':

tg_with_in_bps_limit:
  jiffy_elapsed_rnd = jiffies - tg->slice_start[rw];
  tmp = bps_limit * jiffy_elapsed_rnd;
  do_div(tmp, HZ);
  bytes_allowed = tmp; -> how many bytes are allowed in this slice,
		         incluing dispatched.
  if (tg->bytes_disp[rw] + bio_size <= bytes_allowed)
   *wait = 0 -> no need to wait if this bio is within limit

  extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
  -> extra_bytes is based on 'bytes_disp'

For example:

1) bps_limit is 2k, we issue two io, (1k and 9k)
2) the first io(1k) will be dispatched, bytes_disp = 1k, slice_start = 0
    the second io(9k) is waiting for (9 - (2 - 1)) / 2 = 4 s
3) after 3 s, we update bps_limit to 1k, then new waiting is caculated:

without this patch:  bytes_disp = 0, slict_start =3:
bytes_allowed = 1k
extra_bytes = 9k - 1k = 8k
wait = 8s

whth this patch: bytes_disp = 0.5k, slice_start =  0,
bytes_allowed = 1k * 3 + 1k = 4k
extra_bytes =  0.5k + 9k - 4k = 5.5k
wait = 5.5s

I hope I can expliain it clearly...

Thanks,
Kuai
> (Is it because you omit update of slice_{start,end}?)
> 
> Thanks,
> Michal
> 
> .
> 

^ permalink raw reply

* Re: [linux-next] build fails modpost: "blkcg_get_fc_appid" [drivers/nvme/host/nvme-fc.ko] undefined!
From: Jens Axboe @ 2022-05-19 12:14 UTC (permalink / raw)
  To: Tasmiya Nalatwad, linux-block
  Cc: linux-kernel, abdhalee, sachinp, mputtash, Christoph Hellwig,
	linux-nvme@lists.infradead.org
In-Reply-To: <86768c9d-9a9c-653b-ab99-86de3bc434d8@linux.vnet.ibm.com>

On 5/19/22 1:49 AM, Tasmiya Nalatwad wrote:
> Greetings,
> 
> linux-next build fails modpost: "blkcg_get_fc_appid" [drivers/nvme/host/nvme-fc.ko] undefined!
> 
> Console Logs :
> 
> [console-expect]#make -j 17 -s && make modules && make modules_install && make install
> make -j 17 -s && make modules && make modules_install && make install
> ERROR: modpost: "blkcg_get_fc_appid" [drivers/nvme/host/nvme-fc.ko] undefined!
> make[1]: *** [scripts/Makefile.modpost:134: modules-only.symvers] Error 1
> make: *** [Makefile:1914: modules] Error 2
> make: *** Waiting for unfinished jobs....

Christoph, can you fix this up?

-- 
Jens Axboe


^ permalink raw reply

* Re: [RFC: kdevops] Standardizing on failure rate nomenclature for expunges
From: Zorro Lang @ 2022-05-19 11:24 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Luis Chamberlain, linux-fsdevel, linux-block, pankydev8,
	Theodore Tso, Josef Bacik, jmeneghi, Jan Kara, Davidlohr Bueso,
	Dan Williams, Jake Edge, Klaus Jensen, fstests
In-Reply-To: <CAOQ4uxhKHMjGq0QKKMPFAV6iJFwe1H5hBomCVVeT1EWJzo0eXg@mail.gmail.com>

On Thu, May 19, 2022 at 09:36:41AM +0300, Amir Goldstein wrote:
> [adding fstests and Zorro]
> 
> On Thu, May 19, 2022 at 6:07 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
> >
> > I've been promoting the idea that running fstests once is nice,
> > but things get interesting if you try to run fstests multiple
> > times until a failure is found. It turns out at least kdevops has
> > found tests which fail with a failure rate of typically 1/2 to
> > 1/30 average failure rate. That is 1/2 means a failure can happen
> > 50% of the time, whereas 1/30 means it takes 30 runs to find the
> > failure.
> >
> > I have tried my best to annotate failure rates when I know what
> > they might be on the test expunge list, as an example:
> >
> > workflows/fstests/expunges/5.17.0-rc7/xfs/unassigned/xfs_reflink.txt:generic/530 # failure rate about 1/15 https://gist.github.com/mcgrof/4129074db592c170e6bf748aa11d783d
> >
> > The term "failure rate 1/15" is 16 characters long, so I'd like
> > to propose to standardize a way to represent this. How about
> >
> > generic/530 # F:1/15
> >
> 
> I am not fond of the 1/15 annotation at all, because the only fact that you
> are able to document is that the test failed after 15 runs.
> Suggesting that this means failure rate of 1/15 is a very big step.
> 
> > Then we could extend the definition. F being current estimate, and this
> > can be just how long it took to find the first failure. A more valuable
> > figure would be failure rate avarage, so running the test multiple
> > times, say 10, to see what the failure rate is and then averaging the
> > failure out. So this could be a more accurate representation. For this
> > how about:
> >
> > generic/530 # FA:1/15
> >
> > This would mean on average there failure rate has been found to be about
> > 1/15, and this was determined based on 10 runs.
> >
> > We should also go extend check for fstests/blktests to run a test
> > until a failure is found and report back the number of successes.
> >
> > Thoughts?
> >
> 
> I have had a discussion about those tests with Zorro.

Hi Amir,

Thanks for publicing this discussion.

Yes, we talked about this, but if I don't rememeber wrong, I recommended each
downstream testers maintain their own "testing data/config", likes exclude
list, failed ratio, known failures etc. I think they're not suitable to be
fixed in the mainline fstests.

About the other idea I metioned in LSF, we can create some more group names to
mark those cases with random load/data/env etc, they're worth to be run more
times. I also talked about that with Darrick, we haven't maken a decision,
but I'd like to push that if most of other forks would like to see that.

In my internal regression test for RHEL, I give some fstests cases a new
group name "redhat_random" (sure, I know it's not a good name, it's just
for my internal test, welcome better name, I'm not a good english speaker :).
Then combine with quick and stress group name, I loop run "redhat_random"
cases different times, with different LOAD/TIME_FACTOR.

So I hope to have one "or more specific" group name to mark those random
test cases at first, likes [1] (I'm sure it's incomplete, but can be improved
if we can get more help from more people :)

Thanks,
Zorro

[1]
generic/013
generic/019
generic/051
generic/068
generic/070
generic/075
generic/076
generic/083
generic/091
generic/112
generic/117
generic/127
generic/231
generic/232
generic/233
generic/263
generic/269
generic/270
generic/388
generic/390
generic/413
generic/455
generic/457
generic/461
generic/464
generic/475
generic/476
generic/482
generic/521
generic/522
generic/547
generic/551
generic/560
generic/561
generic/616
generic/617
generic/648
generic/650
xfs/011
xfs/013
xfs/017
xfs/032
xfs/051
xfs/057
xfs/068
xfs/079
xfs/104
xfs/137
xfs/141
xfs/167
xfs/297
xfs/305
xfs/442
xfs/517

> 
> Those tests that some people refer to as "flaky" are valuable,
> but they are not deterministic, they are stochastic.
> 
> I think MTBF is the standard way to describe reliability
> of such tests, but I am having a hard time imagining how
> the community can manage to document accurate annotations
> of this sort, so I would stick with documenting the facts
> (i.e. the test fails after N runs).
> 
> OTOH, we do have deterministic tests, maybe even the majority of
> fstests are deterministic(?)
> 
> Considering that every auto test loop takes ~2 hours on our rig and that
> I have been running over 100 loops over the past two weeks, if half
> of fstests are deterministic, that is a lot of wait time and a lot of carbon
> emission gone to waste.
> 
> It would have been nice if I was able to exclude a "deterministic" group.
> The problem is - can a developer ever tag a test as being "deterministic"?
> 
> Thanks,
> Amir.
> 


^ permalink raw reply

* Re: [PATCH -next 8/8] block, bfq: cleanup bfq_bfqq_handle_idle_busy_switch()
From: Jan Kara @ 2022-05-19 11:23 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, paolo.valente, axboe, tj, linux-block, cgroups,
	linux-kernel, yi.zhang
In-Reply-To: <20220514090522.1669270-9-yukuai3@huawei.com>

On Sat 14-05-22 17:05:22, Yu Kuai wrote:
> 'wr_or_deserves_wr' is only used in bfq_update_bfqq_wr_on_rq_arrival(),
> which is only called from bfq_bfqq_handle_idle_busy_switch() in specific
> code branch, thus there is no need to precaculate 'wr_or_deserves_wr'
> each time bfq_bfqq_handle_idle_busy_switch() is called.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

With this patch there's the same problem as with the previous one. Some of
the variables passed to bfq_update_bfqq_wr_on_rq_arrival() (in_burst,
soft_rt) would actually evaluate differently later in the function.

								Honza

> ---
>  block/bfq-iosched.c | 110 +++++++++++++++++++++++++-------------------
>  1 file changed, 62 insertions(+), 48 deletions(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 1e57d76c8dd3..cea8cb3f5ee2 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -1624,15 +1624,65 @@ static unsigned long bfq_smallest_from_now(void)
>  	return jiffies - MAX_JIFFY_OFFSET;
>  }
>  
> +/*
> + * bfqq deserves to be weight-raised if:
> + * - it is sync,
> + * - it does not belong to a large burst,
> + * - it has been idle for enough time or is soft real-time,
> + * - is linked to a bfq_io_cq (it is not shared in any sense),
> + * - has a default weight (otherwise we assume the user wanted
> + *   to control its weight explicitly)
> + *
> + * Merged bfq_queues are kept out of weight-raising
> + * (low-latency) mechanisms. The reason is that these queues
> + * are usually created for non-interactive and
> + * non-soft-real-time tasks. Yet this is not the case for
> + * stably-merged queues. These queues are merged just because
> + * they are created shortly after each other. So they may
> + * easily serve the I/O of an interactive or soft-real time
> + * application, if the application happens to spawn multiple
> + * processes. So let also stably-merged queued enjoy weight
> + * raising.
> + */
> +static bool bfqq_wr_or_deserves_wr(struct bfq_data *bfqd,
> +				   struct bfq_queue *bfqq,
> +				   struct request *rq,
> +				   bool interactive, bool soft_rt)
> +{
> +	if (!bfqd->low_latency)
> +		return false;
> +
> +	if (bfqq->wr_coeff > 1)
> +		return true;
> +
> +	if (!bfq_bfqq_sync(bfqq))
> +		return false;
> +
> +	if (!bfqq->bic && !RQ_BIC(rq)->stably_merged)
> +		return false;
> +
> +	if (!interactive && !soft_rt)
> +		return false;
> +
> +	return true;
> +}
> +
>  static void bfq_update_bfqq_wr_on_rq_arrival(struct bfq_data *bfqd,
>  					     struct bfq_queue *bfqq,
>  					     unsigned int old_wr_coeff,
> -					     bool wr_or_deserves_wr,
> -					     bool interactive,
> -					     bool in_burst,
> -					     bool soft_rt)
> -{
> -	if (old_wr_coeff == 1 && wr_or_deserves_wr) {
> +					     struct request *rq,
> +					     bool interactive)
> +{
> +	bool in_burst = bfq_bfqq_in_large_burst(bfqq);
> +	bool soft_rt = bfqd->bfq_wr_max_softrt_rate > 0 &&
> +		       !BFQQ_TOTALLY_SEEKY(bfqq) &&
> +		       !in_burst &&
> +		       time_is_before_jiffies(bfqq->soft_rt_next_start) &&
> +		       bfqq->dispatched == 0 &&
> +		       bfqq->entity.new_weight == 40;
> +
> +	if (old_wr_coeff == 1 &&
> +	    bfqq_wr_or_deserves_wr(bfqd, bfqq, rq, interactive, soft_rt)) {
>  		/* start a weight-raising period */
>  		if (interactive) {
>  			bfqq->service_from_wr = 0;
> @@ -1674,9 +1724,9 @@ static void bfq_update_bfqq_wr_on_rq_arrival(struct bfq_data *bfqd,
>  		if (interactive) { /* update wr coeff and duration */
>  			bfqq->wr_coeff = bfqd->bfq_wr_coeff;
>  			bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
> -		} else if (in_burst)
> +		} else if (in_burst) {
>  			bfqq->wr_coeff = 1;
> -		else if (soft_rt) {
> +		} else if (soft_rt) {
>  			/*
>  			 * The application is now or still meeting the
>  			 * requirements for being deemed soft rt.  We
> @@ -1768,44 +1818,11 @@ static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd,
>  					     struct request *rq,
>  					     bool *interactive)
>  {
> -	bool soft_rt, in_burst,	wr_or_deserves_wr,
> -		idle_for_long_time = bfq_bfqq_idle_for_long_time(bfqd, bfqq);
> +	bool in_burst = bfq_bfqq_in_large_burst(bfqq);
> +	bool idle_for_long_time = bfq_bfqq_idle_for_long_time(bfqd, bfqq);
>  
> -	/*
> -	 * bfqq deserves to be weight-raised if:
> -	 * - it is sync,
> -	 * - it does not belong to a large burst,
> -	 * - it has been idle for enough time or is soft real-time,
> -	 * - is linked to a bfq_io_cq (it is not shared in any sense),
> -	 * - has a default weight (otherwise we assume the user wanted
> -	 *   to control its weight explicitly)
> -	 */
> -	in_burst = bfq_bfqq_in_large_burst(bfqq);
> -	soft_rt = bfqd->bfq_wr_max_softrt_rate > 0 &&
> -		!BFQQ_TOTALLY_SEEKY(bfqq) &&
> -		!in_burst &&
> -		time_is_before_jiffies(bfqq->soft_rt_next_start) &&
> -		bfqq->dispatched == 0 &&
> -		bfqq->entity.new_weight == 40;
>  	*interactive = !in_burst && idle_for_long_time &&
>  		bfqq->entity.new_weight == 40;
> -	/*
> -	 * Merged bfq_queues are kept out of weight-raising
> -	 * (low-latency) mechanisms. The reason is that these queues
> -	 * are usually created for non-interactive and
> -	 * non-soft-real-time tasks. Yet this is not the case for
> -	 * stably-merged queues. These queues are merged just because
> -	 * they are created shortly after each other. So they may
> -	 * easily serve the I/O of an interactive or soft-real time
> -	 * application, if the application happens to spawn multiple
> -	 * processes. So let also stably-merged queued enjoy weight
> -	 * raising.
> -	 */
> -	wr_or_deserves_wr = bfqd->low_latency &&
> -		(bfqq->wr_coeff > 1 ||
> -		 (bfq_bfqq_sync(bfqq) &&
> -		  (bfqq->bic || RQ_BIC(rq)->stably_merged) &&
> -		   (*interactive || soft_rt)));
>  
>  	/*
>  	 * If bfqq happened to be activated in a burst, but has been
> @@ -1840,11 +1857,8 @@ static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd,
>  		if (time_is_before_jiffies(bfqq->split_time +
>  					   bfqd->bfq_wr_min_idle_time)) {
>  			bfq_update_bfqq_wr_on_rq_arrival(bfqd, bfqq,
> -							 old_wr_coeff,
> -							 wr_or_deserves_wr,
> -							 *interactive,
> -							 in_burst,
> -							 soft_rt);
> +							 old_wr_coeff, rq,
> +							 *interactive);
>  
>  			if (old_wr_coeff != bfqq->wr_coeff)
>  				bfqq->entity.prio_changed = 1;
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

* Re: [PATCH -next 7/8] block, bfq: cleanup bfq_bfqq_update_budg_for_activation()
From: Jan Kara @ 2022-05-19 11:18 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, paolo.valente, axboe, tj, linux-block, cgroups,
	linux-kernel, yi.zhang
In-Reply-To: <20220514090522.1669270-8-yukuai3@huawei.com>

On Sat 14-05-22 17:05:21, Yu Kuai wrote:
> It will only be called from bfq_bfqq_handle_idle_busy_switch() in
> specific code branch, there is no need to precaculate
> 'bfqq_wants_to_preempt' each time bfq_bfqq_handle_idle_busy_switch()
> is caleld.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Please see below:

> @@ -1816,14 +1807,6 @@ static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd,
>  		  (bfqq->bic || RQ_BIC(rq)->stably_merged) &&
>  		   (*interactive || soft_rt)));
>  
> -	/*
> -	 * Using the last flag, update budget and check whether bfqq
> -	 * may want to preempt the in-service queue.
> -	 */
> -	bfqq_wants_to_preempt =
> -		bfq_bfqq_update_budg_for_activation(bfqd, bfqq,
> -						    arrived_in_time);
> -
>  	/*
>  	 * If bfqq happened to be activated in a burst, but has been
>  	 * idle for much more than an interactive queue, then we
...
> @@ -1918,7 +1900,7 @@ static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd,
>  	 * (2) this switch of bfqq to busy changes the scenario.
>  	 */
>  	if (bfqd->in_service_queue &&
> -	    ((bfqq_wants_to_preempt &&
> +	    ((bfq_bfqq_update_budg_for_activation(bfqd, bfqq) &&
>  	      bfqq->wr_coeff >= bfqd->in_service_queue->wr_coeff) ||
>  	     bfq_bfqq_higher_class_or_weight(bfqq, bfqd->in_service_queue) ||
>  	     !bfq_better_to_idle(bfqd->in_service_queue)) &&

So these changes are actually wrong because
bfq_bfqq_update_budg_for_activation() relies on
bfq_bfqq_non_blocking_wait_rq() but bfq_add_bfqq_busy() clears that. And
bfq_add_bfqq_busy() is called between the place where
bfq_bfqq_update_budg_for_activation() was called previously and now so your
patch breaks this logic.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

* Re: [PATCH -next 6/8] block, bfq: remove dead code for updating 'rq_in_driver'
From: Jan Kara @ 2022-05-19 11:06 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, paolo.valente, axboe, tj, linux-block, cgroups,
	linux-kernel, yi.zhang
In-Reply-To: <20220514090522.1669270-7-yukuai3@huawei.com>

On Sat 14-05-22 17:05:20, Yu Kuai wrote:
> Such code are not even compiled since they are inside marco "#if 0".
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Sure. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  block/bfq-iosched.c | 16 ----------------
>  1 file changed, 16 deletions(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 1d0141c450c0..e36a16684fb4 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -2322,22 +2322,6 @@ static sector_t get_sdist(sector_t last_pos, struct request *rq)
>  	return 0;
>  }
>  
> -#if 0 /* Still not clear if we can do without next two functions */
> -static void bfq_activate_request(struct request_queue *q, struct request *rq)
> -{
> -	struct bfq_data *bfqd = q->elevator->elevator_data;
> -
> -	bfqd->rq_in_driver++;
> -}
> -
> -static void bfq_deactivate_request(struct request_queue *q, struct request *rq)
> -{
> -	struct bfq_data *bfqd = q->elevator->elevator_data;
> -
> -	bfqd->rq_in_driver--;
> -}
> -#endif
> -
>  static void bfq_remove_request(struct request_queue *q,
>  			       struct request *rq)
>  {
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

* Re: [PATCH -next 5/8] block, bfq: cleanup bfq_activate_requeue_entity()
From: Jan Kara @ 2022-05-19 11:06 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, paolo.valente, axboe, tj, linux-block, cgroups,
	linux-kernel, yi.zhang
In-Reply-To: <20220514090522.1669270-6-yukuai3@huawei.com>

On Sat 14-05-22 17:05:19, Yu Kuai wrote:
> Just make the code a litter cleaner by removing the unnecessary
> variable 'sd'.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

OK. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  block/bfq-wf2q.c | 14 +++++---------
>  1 file changed, 5 insertions(+), 9 deletions(-)
> 
> diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
> index 15b97687493a..da189c732376 100644
> --- a/block/bfq-wf2q.c
> +++ b/block/bfq-wf2q.c
> @@ -1085,12 +1085,12 @@ static void __bfq_requeue_entity(struct bfq_entity *entity)
>  }
>  
>  static void __bfq_activate_requeue_entity(struct bfq_entity *entity,
> -					  struct bfq_sched_data *sd,
>  					  bool non_blocking_wait_rq)
>  {
>  	struct bfq_service_tree *st = bfq_entity_service_tree(entity);
>  
> -	if (sd->in_service_entity == entity || entity->tree == &st->active)
> +	if (entity->sched_data->in_service_entity == entity ||
> +	    entity->tree == &st->active)
>  		 /*
>  		  * in service or already queued on the active tree,
>  		  * requeue or reposition
> @@ -1122,14 +1122,10 @@ static void bfq_activate_requeue_entity(struct bfq_entity *entity,
>  					bool non_blocking_wait_rq,
>  					bool requeue, bool expiration)
>  {
> -	struct bfq_sched_data *sd;
> -
>  	for_each_entity(entity) {
> -		sd = entity->sched_data;
> -		__bfq_activate_requeue_entity(entity, sd, non_blocking_wait_rq);
> -
> -		if (!bfq_update_next_in_service(sd, entity, expiration) &&
> -		    !requeue)
> +		__bfq_activate_requeue_entity(entity, non_blocking_wait_rq);
> +		if (!bfq_update_next_in_service(entity->sched_data, entity,
> +						expiration) && !requeue)
>  			break;
>  	}
>  }
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

* Re: [PATCH -next 3/8] block, bfq: factor out code to update 'active_entities'
From: Jan Kara @ 2022-05-19 11:04 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, paolo.valente, axboe, tj, linux-block, cgroups,
	linux-kernel, yi.zhang
In-Reply-To: <20220514090522.1669270-4-yukuai3@huawei.com>

On Sat 14-05-22 17:05:17, Yu Kuai wrote:
> Current code is a bit ugly and hard to read.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Yeah, looks a bit better. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  block/bfq-wf2q.c | 61 +++++++++++++++++++++++++-----------------------
>  1 file changed, 32 insertions(+), 29 deletions(-)
> 
> diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
> index 2f3fb45a32c3..c58568a4b009 100644
> --- a/block/bfq-wf2q.c
> +++ b/block/bfq-wf2q.c
> @@ -230,6 +230,26 @@ static void bfq_dec_busy_queues(struct bfq_queue *bfqq)
>  		bfqq->bfqd->num_groups_with_busy_queues--;
>  }
>  
> +static void bfq_inc_active_entities(struct bfq_entity *entity)
> +{
> +	struct bfq_sched_data *sd = entity->sched_data;
> +	struct bfq_group *bfqg = container_of(sd, struct bfq_group, sched_data);
> +	struct bfq_data *bfqd = (struct bfq_data *)bfqg->bfqd;
> +
> +	if (bfqg != bfqd->root_group)
> +		bfqg->active_entities++;
> +}
> +
> +static void bfq_dec_active_entities(struct bfq_entity *entity)
> +{
> +	struct bfq_sched_data *sd = entity->sched_data;
> +	struct bfq_group *bfqg = container_of(sd, struct bfq_group, sched_data);
> +	struct bfq_data *bfqd = (struct bfq_data *)bfqg->bfqd;
> +
> +	if (bfqg != bfqd->root_group)
> +		bfqg->active_entities--;
> +}
> +
>  #else /* CONFIG_BFQ_GROUP_IOSCHED */
>  
>  static bool bfq_update_parent_budget(struct bfq_entity *next_in_service)
> @@ -250,6 +270,14 @@ static void bfq_dec_busy_queues(struct bfq_queue *bfqq)
>  {
>  }
>  
> +static void bfq_inc_active_entities(struct bfq_entity *entity)
> +{
> +}
> +
> +static void bfq_dec_active_entities(struct bfq_entity *entity)
> +{
> +}
> +
>  #endif /* CONFIG_BFQ_GROUP_IOSCHED */
>  
>  /*
> @@ -476,11 +504,6 @@ static void bfq_active_insert(struct bfq_service_tree *st,
>  {
>  	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
>  	struct rb_node *node = &entity->rb_node;
> -#ifdef CONFIG_BFQ_GROUP_IOSCHED
> -	struct bfq_sched_data *sd = NULL;
> -	struct bfq_group *bfqg = NULL;
> -	struct bfq_data *bfqd = NULL;
> -#endif
>  
>  	bfq_insert(&st->active, entity);
>  
> @@ -491,17 +514,10 @@ static void bfq_active_insert(struct bfq_service_tree *st,
>  
>  	bfq_update_active_tree(node);
>  
> -#ifdef CONFIG_BFQ_GROUP_IOSCHED
> -	sd = entity->sched_data;
> -	bfqg = container_of(sd, struct bfq_group, sched_data);
> -	bfqd = (struct bfq_data *)bfqg->bfqd;
> -#endif
>  	if (bfqq)
>  		list_add(&bfqq->bfqq_list, &bfqq->bfqd->active_list);
> -#ifdef CONFIG_BFQ_GROUP_IOSCHED
> -	if (bfqg != bfqd->root_group)
> -		bfqg->active_entities++;
> -#endif
> +
> +	bfq_inc_active_entities(entity);
>  }
>  
>  /**
> @@ -578,29 +594,16 @@ static void bfq_active_extract(struct bfq_service_tree *st,
>  {
>  	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
>  	struct rb_node *node;
> -#ifdef CONFIG_BFQ_GROUP_IOSCHED
> -	struct bfq_sched_data *sd = NULL;
> -	struct bfq_group *bfqg = NULL;
> -	struct bfq_data *bfqd = NULL;
> -#endif
>  
>  	node = bfq_find_deepest(&entity->rb_node);
>  	bfq_extract(&st->active, entity);
>  
>  	if (node)
>  		bfq_update_active_tree(node);
> -
> -#ifdef CONFIG_BFQ_GROUP_IOSCHED
> -	sd = entity->sched_data;
> -	bfqg = container_of(sd, struct bfq_group, sched_data);
> -	bfqd = (struct bfq_data *)bfqg->bfqd;
> -#endif
>  	if (bfqq)
>  		list_del(&bfqq->bfqq_list);
> -#ifdef CONFIG_BFQ_GROUP_IOSCHED
> -	if (bfqg != bfqd->root_group)
> -		bfqg->active_entities--;
> -#endif
> +
> +	bfq_dec_active_entities(entity);
>  }
>  
>  /**
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

* Re: [PATCH -next 2/8] block, bfq: cleanup __bfq_weights_tree_remove()
From: Jan Kara @ 2022-05-19 11:02 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, paolo.valente, axboe, tj, linux-block, cgroups,
	linux-kernel, yi.zhang
In-Reply-To: <20220514090522.1669270-3-yukuai3@huawei.com>

On Sat 14-05-22 17:05:16, Yu Kuai wrote:
> It's the same with bfq_weights_tree_remove() now.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Sure. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  block/bfq-iosched.c | 13 +------------
>  block/bfq-iosched.h |  1 -
>  block/bfq-wf2q.c    |  2 +-
>  3 files changed, 2 insertions(+), 14 deletions(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index bcbe78d71143..1d0141c450c0 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -944,8 +944,7 @@ void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq)
>   * See the comments to the function bfq_weights_tree_add() for considerations
>   * about overhead.
>   */
> -void __bfq_weights_tree_remove(struct bfq_data *bfqd,
> -			       struct bfq_queue *bfqq)
> +void bfq_weights_tree_remove(struct bfq_data *bfqd, struct bfq_queue *bfqq)
>  {
>  	struct rb_root_cached *root;
>  	if (!bfqq->weight_counter)
> @@ -964,16 +963,6 @@ void __bfq_weights_tree_remove(struct bfq_data *bfqd,
>  	bfq_put_queue(bfqq);
>  }
>  
> -/*
> - * Invoke __bfq_weights_tree_remove on bfqq and decrement the number
> - * of active groups for each queue's inactive parent entity.
> - */
> -void bfq_weights_tree_remove(struct bfq_data *bfqd,
> -			     struct bfq_queue *bfqq)
> -{
> -	__bfq_weights_tree_remove(bfqd, bfqq);
> -}
> -
>  /*
>   * Return expired entry, or NULL to just start from scratch in rbtree.
>   */
> diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
> index 0a3415abb994..bc54b9824b1e 100644
> --- a/block/bfq-iosched.h
> +++ b/block/bfq-iosched.h
> @@ -933,7 +933,6 @@ void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync);
>  struct bfq_data *bic_to_bfqd(struct bfq_io_cq *bic);
>  void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq);
>  void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq);
> -void __bfq_weights_tree_remove(struct bfq_data *bfqd, struct bfq_queue *bfqq);
>  void bfq_weights_tree_remove(struct bfq_data *bfqd, struct bfq_queue *bfqq);
>  void bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>  		     bool compensate, enum bfqq_expiration reason);
> diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
> index ccd227fed1c3..2f3fb45a32c3 100644
> --- a/block/bfq-wf2q.c
> +++ b/block/bfq-wf2q.c
> @@ -790,7 +790,7 @@ __bfq_entity_update_weight_prio(struct bfq_service_tree *old_st,
>  		 * there is a counter associated with the entity).
>  		 */
>  		if (prev_weight != new_weight && bfqq)
> -			__bfq_weights_tree_remove(bfqd, bfqq);
> +			bfq_weights_tree_remove(bfqd, bfqq);
>  		entity->weight = new_weight;
>  		/*
>  		 * Add the entity, if it is not a weight-raised queue,
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

* Re: [PATCH -next 4/8] block, bfq: don't declare 'bfqd' as type 'void *' in bfq_group
From: Jan Kara @ 2022-05-19 11:01 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, paolo.valente, axboe, tj, linux-block, cgroups,
	linux-kernel, yi.zhang
In-Reply-To: <20220514090522.1669270-5-yukuai3@huawei.com>

On Sat 14-05-22 17:05:18, Yu Kuai wrote:
> Prevent unnecessary format conversion for bfqg->bfqd in multiple
> places.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Yeah, this was annoying me as well :) Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  block/bfq-cgroup.c  | 2 +-
>  block/bfq-iosched.h | 2 +-
>  block/bfq-wf2q.c    | 8 +++-----
>  3 files changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
> index 4d516879d9fa..b4e39ab4ad17 100644
> --- a/block/bfq-cgroup.c
> +++ b/block/bfq-cgroup.c
> @@ -224,7 +224,7 @@ void bfqg_stats_update_io_add(struct bfq_group *bfqg, struct bfq_queue *bfqq,
>  {
>  	blkg_rwstat_add(&bfqg->stats.queued, op, 1);
>  	bfqg_stats_end_empty_time(&bfqg->stats);
> -	if (!(bfqq == ((struct bfq_data *)bfqg->bfqd)->in_service_queue))
> +	if (!(bfqq == bfqg->bfqd->in_service_queue))
>  		bfqg_stats_set_start_group_wait_time(bfqg, bfqq_group(bfqq));
>  }
>  
> diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
> index bc54b9824b1e..d57e4848f57f 100644
> --- a/block/bfq-iosched.h
> +++ b/block/bfq-iosched.h
> @@ -894,7 +894,7 @@ struct bfq_group {
>  	struct bfq_entity entity;
>  	struct bfq_sched_data sched_data;
>  
> -	void *bfqd;
> +	struct bfq_data *bfqd;
>  
>  	struct bfq_queue *async_bfqq[2][IOPRIO_NR_LEVELS];
>  	struct bfq_queue *async_idle_bfqq;
> diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
> index c58568a4b009..15b97687493a 100644
> --- a/block/bfq-wf2q.c
> +++ b/block/bfq-wf2q.c
> @@ -234,9 +234,8 @@ static void bfq_inc_active_entities(struct bfq_entity *entity)
>  {
>  	struct bfq_sched_data *sd = entity->sched_data;
>  	struct bfq_group *bfqg = container_of(sd, struct bfq_group, sched_data);
> -	struct bfq_data *bfqd = (struct bfq_data *)bfqg->bfqd;
>  
> -	if (bfqg != bfqd->root_group)
> +	if (bfqg != bfqg->bfqd->root_group)
>  		bfqg->active_entities++;
>  }
>  
> @@ -244,9 +243,8 @@ static void bfq_dec_active_entities(struct bfq_entity *entity)
>  {
>  	struct bfq_sched_data *sd = entity->sched_data;
>  	struct bfq_group *bfqg = container_of(sd, struct bfq_group, sched_data);
> -	struct bfq_data *bfqd = (struct bfq_data *)bfqg->bfqd;
>  
> -	if (bfqg != bfqd->root_group)
> +	if (bfqg != bfqg->bfqd->root_group)
>  		bfqg->active_entities--;
>  }
>  
> @@ -741,7 +739,7 @@ __bfq_entity_update_weight_prio(struct bfq_service_tree *old_st,
>  		else {
>  			sd = entity->my_sched_data;
>  			bfqg = container_of(sd, struct bfq_group, sched_data);
> -			bfqd = (struct bfq_data *)bfqg->bfqd;
> +			bfqd = bfqg->bfqd;
>  		}
>  #endif
>  
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

* Re: [PATCH -next 1/8] block, bfq: cleanup bfq_weights_tree add/remove apis
From: Jan Kara @ 2022-05-19 10:59 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, paolo.valente, axboe, tj, linux-block, cgroups,
	linux-kernel, yi.zhang
In-Reply-To: <20220514090522.1669270-2-yukuai3@huawei.com>

On Sat 14-05-22 17:05:15, Yu Kuai wrote:
> They already pass 'bfqd' as the first parameter, there is no need to
> pass 'bfqd->queue_weights_tree' as another parameter.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good. Just one nit below:

> @@ -945,12 +945,13 @@ void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>   * about overhead.
>   */
>  void __bfq_weights_tree_remove(struct bfq_data *bfqd,
> -			       struct bfq_queue *bfqq,
> -			       struct rb_root_cached *root)
> +			       struct bfq_queue *bfqq)
>  {
> +	struct rb_root_cached *root;

Add empty line here please.

>  	if (!bfqq->weight_counter)
>  		return;
>  
> +	root = &bfqd->queue_weights_tree;
>  	bfqq->weight_counter->num_active--;
>  	if (bfqq->weight_counter->num_active > 0)
>  		goto reset_entity_pointer;

Otherwise the patch looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

* [PATCH 1/4] bfq: Relax waker detection for shared queues
From: Jan Kara @ 2022-05-19 10:52 UTC (permalink / raw)
  To: Paolo Valente; +Cc: linux-block, Jens Axboe, Jan Kara
In-Reply-To: <20220519104621.15456-1-jack@suse.cz>

Currently we look for waker only if current queue has no requests. This
makes sense for bfq queues with a single process however for shared
queues when there is a larger number of processes the condition that
queue has no requests is difficult to meet because often at least one
process has some request in flight although all the others are waiting
for the waker to do the work and this harms throughput. Relax the "no
queued request for bfq queue" condition to "the current task has no
queued requests yet". For this, we also need to start tracking number of
requests in flight for each task.

This patch (together with the following one) restores the performance
for dbench with 128 clients that regressed with commit c65e6fd460b4
("bfq: Do not let waker requests skip proper accounting") because
this commit makes requests of wakers properly enter BFQ queues and thus
these queues become ineligible for the old waker detection logic.
Dbench results:

         Vanilla 5.18-rc3        5.18-rc3 + revert      5.18-rc3 patched
Mean     1237.36 (   0.00%)      950.16 *  23.21%*      988.35 *  20.12%*

Numbers are time to complete workload so lower is better.

Fixes: c65e6fd460b4 ("bfq: Do not let waker requests skip proper accounting")
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 5 +++--
 block/bfq-iosched.h | 1 +
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 2e0dd68a3cbe..397f6be3c8fe 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2127,7 +2127,6 @@ static void bfq_check_waker(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 	if (!bfqd->last_completed_rq_bfqq ||
 	    bfqd->last_completed_rq_bfqq == bfqq ||
 	    bfq_bfqq_has_short_ttime(bfqq) ||
-	    bfqq->dispatched > 0 ||
 	    now_ns - bfqd->last_completion >= 4 * NSEC_PER_MSEC ||
 	    bfqd->last_completed_rq_bfqq == bfqq->waker_bfqq)
 		return;
@@ -2204,7 +2203,7 @@ static void bfq_add_request(struct request *rq)
 	bfqq->queued[rq_is_sync(rq)]++;
 	bfqd->queued++;
 
-	if (RB_EMPTY_ROOT(&bfqq->sort_list) && bfq_bfqq_sync(bfqq)) {
+	if (bfq_bfqq_sync(bfqq) && RQ_BIC(rq)->requests <= 1) {
 		bfq_check_waker(bfqd, bfqq, now_ns);
 
 		/*
@@ -6557,6 +6556,7 @@ static void bfq_finish_requeue_request(struct request *rq)
 		bfq_completed_request(bfqq, bfqd);
 	}
 	bfq_finish_requeue_request_body(bfqq);
+	RQ_BIC(rq)->requests--;
 	spin_unlock_irqrestore(&bfqd->lock, flags);
 
 	/*
@@ -6790,6 +6790,7 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
 
 	bfqq_request_allocated(bfqq);
 	bfqq->ref++;
+	bic->requests++;
 	bfq_log_bfqq(bfqd, bfqq, "get_request %p: bfqq %p, %d",
 		     rq, bfqq, bfqq->ref);
 
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 3b83e3d1c2e5..25fada961bc9 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -468,6 +468,7 @@ struct bfq_io_cq {
 	struct bfq_queue *stable_merge_bfqq;
 
 	bool stably_merged;	/* non splittable if true */
+	unsigned int requests;	/* Number of requests this process has in flight */
 };
 
 /**
-- 
2.35.3


^ permalink raw reply related

* [PATCH 4/4] bfq: Remove bfq_requeue_request_body()
From: Jan Kara @ 2022-05-19 10:52 UTC (permalink / raw)
  To: Paolo Valente; +Cc: linux-block, Jens Axboe, Jan Kara
In-Reply-To: <20220519104621.15456-1-jack@suse.cz>

The function has only a single caller and two lines. Just remove it
since it is pointless and just harming readability.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 904fe56495ce..a1127dfe647e 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6352,12 +6352,6 @@ static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd)
 		bfq_schedule_dispatch(bfqd);
 }
 
-static void bfq_finish_requeue_request_body(struct bfq_queue *bfqq)
-{
-	bfqq_request_freed(bfqq);
-	bfq_put_queue(bfqq);
-}
-
 /*
  * The processes associated with bfqq may happen to generate their
  * cumulative I/O at a lower rate than the rate at which the device
@@ -6554,7 +6548,8 @@ static void bfq_finish_requeue_request(struct request *rq)
 
 		bfq_completed_request(bfqq, bfqd);
 	}
-	bfq_finish_requeue_request_body(bfqq);
+	bfqq_request_freed(bfqq);
+	bfq_put_queue(bfqq);
 	RQ_BIC(rq)->requests--;
 	spin_unlock_irqrestore(&bfqd->lock, flags);
 
-- 
2.35.3


^ permalink raw reply related

* [PATCH 2/4] bfq: Allow current waker to defend against a tentative one
From: Jan Kara @ 2022-05-19 10:52 UTC (permalink / raw)
  To: Paolo Valente; +Cc: linux-block, Jens Axboe, Jan Kara
In-Reply-To: <20220519104621.15456-1-jack@suse.cz>

The code in bfq_check_waker() ignores wake up events from the current
waker. This makes it more likely we select a new tentative waker
although the current one is generating more wake up events. Treat
current waker the same way as any other process and allow it to reset
the waker detection logic.

Fixes: 71217df39dc6 ("block, bfq: make waker-queue detection more robust")
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 397f6be3c8fe..1cc242c90fb6 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2127,8 +2127,7 @@ static void bfq_check_waker(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 	if (!bfqd->last_completed_rq_bfqq ||
 	    bfqd->last_completed_rq_bfqq == bfqq ||
 	    bfq_bfqq_has_short_ttime(bfqq) ||
-	    now_ns - bfqd->last_completion >= 4 * NSEC_PER_MSEC ||
-	    bfqd->last_completed_rq_bfqq == bfqq->waker_bfqq)
+	    now_ns - bfqd->last_completion >= 4 * NSEC_PER_MSEC)
 		return;
 
 	/*
-- 
2.35.3


^ permalink raw reply related

* [PATCH 3/4] bfq: Remove superfluous conversion from RQ_BIC()
From: Jan Kara @ 2022-05-19 10:52 UTC (permalink / raw)
  To: Paolo Valente; +Cc: linux-block, Jens Axboe, Jan Kara
In-Reply-To: <20220519104621.15456-1-jack@suse.cz>

We store struct bfq_io_cq pointer in rq->elv.priv[0] in bfq_init_rq().
Thus a call to icq_to_bic() in RQ_BIC() is wrong. Luckily it does no
harm currently because struct io_iq is the first one in struct
bfq_io_cq.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 1cc242c90fb6..904fe56495ce 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -374,7 +374,7 @@ static const unsigned long bfq_activation_stable_merging = 600;
  */
 static const unsigned long bfq_late_stable_merging = 600;
 
-#define RQ_BIC(rq)		icq_to_bic((rq)->elv.priv[0])
+#define RQ_BIC(rq)		((struct bfq_io_cq *)((rq)->elv.priv[0]))
 #define RQ_BFQQ(rq)		((rq)->elv.priv[1])
 
 struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync)
-- 
2.35.3


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox