linux-block.vger.kernel.org archive mirror
* [PATCH RFC] Block/blk-wbt: do not let background writes block sync writes
@ 2017-08-26  0:14 Liu Bo
  2017-08-27 17:14 ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Liu Bo @ 2017-08-26  0:14 UTC (permalink / raw)
  To: linux-block

While using blk-wbt, sometimes sync writes are blocked by background
writes, for example,

a) a background write reaches the (background) limit returned by
get_limit(), so it's added into the rqw->wait (non kswapd's rqw in
this case) and goes to sleep.

b) then a sync write gets queued and goes to sleep when finding that
waitqueue_active() returns true and someone else is already on the
waiting list.

Thus, the sync write will only get its rq after the background write
has gotten one.

With this, only background writes will check waitqueue's status and
sync writes will only be throttled by the (max) limit returned by
get_limit().

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---

- Besides the above problem, it seems waitqueue_active() also requires
  an smp_mb().
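To make the smp_mb() concern concrete: the kernel's comment on waitqueue_active() warns that, without a full barrier between the waker's condition store and its waitqueue_active() check, the check can be satisfied before the store is visible, and a just-arrived waiter misses its wakeup. The userspace toy below is not kernel code; it simply replays that reordered interleaving sequentially.

```c
#include <stdbool.h>

struct race {
	bool cond;          /* condition the waiter sleeps on */
	bool on_waitqueue;  /* stands in for waitqueue_active(&rqw->wait) */
	bool woken;         /* whether wake_up() ever ran */
};

/*
 * Replay one bad interleaving: the waker's waitqueue_active() load is
 * hoisted above its store to 'cond' (legal without a full barrier).
 * Returns whether the waiter was woken; in this interleaving it never is.
 */
static bool waiter_gets_woken(void)
{
	struct race r = { false, false, false };

	/* waker, reordered: reads the (still empty) waitqueue first */
	bool active = r.on_waitqueue;

	/* waiter: sees cond == false, queues itself and goes to sleep */
	if (!r.cond)
		r.on_waitqueue = true;

	/* waker: the cond store finally lands, but wake_up() is skipped */
	r.cond = true;
	if (active)
		r.woken = true;

	return r.woken;   /* false: a lost wakeup */
}
```

An smp_mb() between the condition store and the waitqueue_active() check forbids exactly this hoist.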

 block/blk-wbt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 6a9a0f0..698d9f7 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -519,7 +519,8 @@ static inline bool may_queue(struct rq_wb *rwb, struct rq_wait *rqw,
 	 * If the waitqueue is already active and we are not the next
 	 * in line to be woken up, wait for our turn.
 	 */
-	if (waitqueue_active(&rqw->wait) &&
+	if ((rw & REQ_BACKGROUND) &&
+	    waitqueue_active(&rqw->wait) &&
 	    rqw->wait.head.next != &wait->entry)
 		return false;
 
-- 
2.9.4


* Re: [PATCH RFC] Block/blk-wbt: do not let background writes block sync writes
  2017-08-26  0:14 [PATCH RFC] Block/blk-wbt: do not let background writes block sync writes Liu Bo
@ 2017-08-27 17:14 ` Jens Axboe
  2017-08-28 17:45   ` Liu Bo
  0 siblings, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2017-08-27 17:14 UTC (permalink / raw)
  To: Liu Bo, linux-block

On 08/25/2017 06:14 PM, Liu Bo wrote:
> While using blk-wbt, sometimes sync writes are blocked by background
> writes, for example,
> 
> a) a background write reaches the (background) limit returned by
> get_limit(), so it's added into the rqw->wait (non kswapd's rqw in
> this case) and goes to sleep.
> 
> b) then a sync write gets queued and goes to sleep when finding that
> waitqueue_active() returns true and someone else is already on the
> waiting list.
> 
> Thus, the sync write will only get its rq after the background write
> has gotten one.
> 
> With this, only background writes will check waitqueue's status and
> sync writes will only be throttled by the (max) limit returned by
> get_limit().

Curious how you ran into this. Is this just speculation, or is it
observed behavior?

I don't think the patch makes sense, conceptually. If you end up in
may_queue(), you are (by definition) in a class of writes that
should be throttled. Skipping the wait line would be unfair.

One modification would be to check the depth, as we assign
different allowances to different types of writes. If the sync
write isn't above its limit, we should not sleep.

-- 
Jens Axboe


* Re: [PATCH RFC] Block/blk-wbt: do not let background writes block sync writes
  2017-08-27 17:14 ` Jens Axboe
@ 2017-08-28 17:45   ` Liu Bo
  2017-08-28 18:07     ` Liu Bo
  2017-08-28 21:05     ` Jens Axboe
  0 siblings, 2 replies; 5+ messages in thread
From: Liu Bo @ 2017-08-28 17:45 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

On Sun, Aug 27, 2017 at 11:14:20AM -0600, Jens Axboe wrote:
> On 08/25/2017 06:14 PM, Liu Bo wrote:
> > While using blk-wbt, sometimes sync writes are blocked by background
> > writes, for example,
> > 
> > a) a background write reaches the (background) limit returned by
> > get_limit(), so it's added into the rqw->wait (non kswapd's rqw in
> > this case) and goes to sleep.
> > 
> > b) then a sync write gets queued and goes to sleep when finding that
> > waitqueue_active() returns true and someone else is already on the
> > waiting list.
> > 
> > Thus, the sync write will only get its rq after the background write
> > has gotten one.
> > 
> > With this, only background writes will check waitqueue's status and
> > sync writes will only be throttled by the (max) limit returned by
> > get_limit().
> 
> Curious how you ran into this. Is this just speculation, or is it
> observed behavior?
>

I got that from a simple test,

xfs_io -f -c "falloc 0 4K" /mnt/btrfs/foobar
xfs_io -f -c "falloc 0 5G" /mnt/btrfs/dummy
xfs_io -f -c "pwrite -b 128M 0 5G" /mnt/btrfs/dummy &

off=0
while true; do
        date
        off=$((off + 1))
        xfs_io -s -c "pwrite $off 1"  /mnt/btrfs/foobar > /dev/null
        sleep 1
done

---

So a lot of writeback is happening in the background, and a write with
O_SYNC is issued every second.

My ftrace observation shows that the write with O_SYNC is waiting in
may_queue() because of the running writeback writes.

   kworker/u16:2-137   [003] ....  3104.193996: block_bio_queue: 8,64 W 8751312 + 2048 [kworker/u16:2]
   kworker/u16:2-137   [003] ....  3104.194003: may_queue.part.12: inflight 23 limit 24 wait ffffc900007cb500
   kworker/u16:2-137   [003] ....  3104.194003: block_getrq: 8,64 W 8751312 + 2048 [kworker/u16:2]
   kworker/u16:2-137   [003] ....  3104.194004: blk_mq_flush_plug_list <-blk_flush_plug_list
   kworker/u16:2-137   [003] ....  3104.194004: block_unplug: [kworker/u16:2] 1
   kworker/u16:2-137   [003] ....  3104.194005: block_rq_insert: 8,64 W 1048576 () 8749264 + 2048 [kworker/u16:2]
   kworker/u16:2-137   [003] ....  3104.194006: block_rq_issue: 8,64 W 1048576 () 8749264 + 2048 [kworker/u16:2]
   kworker/u16:2-137   [003] ....  3104.194017: block_plug: [kworker/u16:2]
   kworker/u16:2-137   [003] d.h.  3104.194454: block_rq_complete: 8,64 W () 8704208 + 2048 [0]
   kworker/u16:2-137   [003] ....  3104.194978: block_bio_queue: 8,64 W 8753360 + 2048 [kworker/u16:2]
   kworker/u16:2-137   [003] ....  3104.194985: may_queue.part.12: inflight 23 limit 24 wait ffffc900007cb500
   kworker/u16:2-137   [003] ....  3104.194985: block_getrq: 8,64 W 8753360 + 2048 [kworker/u16:2]
   kworker/u16:2-137   [003] ....  3104.194986: blk_mq_flush_plug_list <-blk_flush_plug_list
   kworker/u16:2-137   [003] ....  3104.194987: block_unplug: [kworker/u16:2] 1
   kworker/u16:2-137   [003] ....  3104.194987: block_rq_insert: 8,64 W 1048576 () 8751312 + 2048 [kworker/u16:2]
   kworker/u16:2-137   [003] ....  3104.194990: block_rq_issue: 8,64 W 1048576 () 8751312 + 2048 [kworker/u16:2]
   kworker/u16:2-137   [003] ....  3104.195001: block_plug: [kworker/u16:2]
   kworker/u16:2-137   [003] ....  3104.195654: block_bio_queue: 8,64 W 8755408 + 2048 [kworker/u16:2]
   kworker/u16:2-137   [003] ....  3104.195661: may_queue.part.12: inflight 24 limit 24 wait ffffc900007cb500
   kworker/u16:2-137   [003] ....  3104.195662: may_queue.part.12: inflight 24 limit 24 wait ffffc900007cb500
                                                                   ^^^^^^^^^^^^^^^
								   (wb write puts itself on rqw->wait when reaching the limit)
								   
   kworker/u16:2-137   [003] ....  3104.195662: io_schedule <-wbt_wait
   kworker/u16:2-137   [003] ....  3104.195662: blk_mq_flush_plug_list <-blk_flush_plug_list
   kworker/u16:2-137   [003] ....  3104.195663: block_unplug: [kworker/u16:2] 1
   kworker/u16:2-137   [003] ....  3104.195663: block_rq_insert: 8,64 W 1048576 () 8753360 + 2048 [kworker/u16:2]
    kworker/3:1H-162   [003] ....  3104.195680: block_rq_issue: 8,64 W 1048576 () 8753360 + 2048 [kworker/3:1H]
          <idle>-0     [003] d.h.  3104.200533: block_rq_complete: 8,64 W () 8706256 + 2048 [0]
          <idle>-0     [003] d.h.  3104.205473: block_rq_complete: 8,64 W () 8708304 + 2048 [0]
          <idle>-0     [003] d.h.  3104.211451: block_rq_complete: 8,64 W () 8710352 + 2048 [0]
          <idle>-0     [003] d.h.  3104.216163: block_rq_complete: 8,64 W () 8712400 + 2048 [0]
          <idle>-0     [003] d.h.  3104.222072: block_rq_complete: 8,64 W () 8714448 + 2048 [0]
          <idle>-0     [003] d.h.  3104.226931: block_rq_complete: 8,64 W () 8716496 + 2048 [0]
          <idle>-0     [003] d.h.  3104.232915: block_rq_complete: 8,64 W () 8718544 + 2048 [0]
          <idle>-0     [003] d.h.  3104.238851: block_rq_complete: 8,64 W () 8720592 + 2048 [0]
          <idle>-0     [003] d.h.  3104.243593: block_rq_complete: 8,64 W () 8722640 + 2048 [0]
          <idle>-0     [003] d.h.  3104.249667: block_rq_complete: 8,64 W () 8724688 + 2048 [0]
          <idle>-0     [003] d.h.  3104.254248: block_rq_complete: 8,64 W () 8726736 + 2048 [0]
          xfs_io-1045  [007] ....  3104.254347: btrfs_sync_file: 77102a7b-fc17-42cb-afb7-ac57020b6bfb: root=5(FS_TREE) ino=257 parent=256 datasync=0
          xfs_io-1045  [007] ....  3104.254384: cow_file_range.isra.58: ino 257 start 0x0 disk_num_bytes 0x1000
          xfs_io-1045  [007] ....  3104.254402: btrfs_ordered_extent_add: 77102a7b-fc17-42cb-afb7-ac57020b6bfb: root=5(FS_TREE) ino=257 file_offset=0 start=12615680 len=4096 disk_len=4096 truncated_len=18446744073709551615 bytes_left=4096 flags= compress_type=0 refs=1
          xfs_io-1045  [007] ....  3104.254441: block_bio_queue: 8,64 WS 24640 + 8 [xfs_io]
          xfs_io-1045  [007] ....  3104.254446: wbt_wait: someone is waiting ffffc900007cb500
          xfs_io-1045  [007] ....  3104.254448: wbt_wait: someone is waiting ffffc900007cb500
          xfs_io-1045  [007] ....  3104.254449: io_schedule <-wbt_wait
	                                                    ^^^^^^^^^^^^^^^^^^^
							    (write with O_SYNC waits for wb write,
							    while the sync limit is more than 24)

          <idle>-0     [003] d.h.  3104.260393: block_rq_complete: 8,64 W () 8728784 + 2048 [0]
          <idle>-0     [003] d.h.  3104.266310: block_rq_complete: 8,64 W () 8730832 + 2048 [0]
          <idle>-0     [003] d.h.  3104.271152: block_rq_complete: 8,64 W () 8732880 + 2048 [0]
          <idle>-0     [003] d.h.  3104.277141: block_rq_complete: 8,64 W () 8734928 + 2048 [0]
          <idle>-0     [003] d.h.  3104.281908: block_rq_complete: 8,64 W () 8736976 + 2048 [0]
          <idle>-0     [003] d.h.  3104.287744: block_rq_complete: 8,64 W () 8739024 + 2048 [0]
          <idle>-0     [003] d.h.  3104.292584: block_rq_complete: 8,64 W () 8741072 + 2048 [0]
          <idle>-0     [003] d.h.  3104.298541: block_rq_complete: 8,64 W () 8743120 + 2048 [0]
          <idle>-0     [003] d.h.  3104.303266: block_rq_complete: 8,64 W () 8745168 + 2048 [0]
          <idle>-0     [003] d.h.  3104.309273: block_rq_complete: 8,64 W () 8747216 + 2048 [0]
          <idle>-0     [003] d.h.  3104.315312: block_rq_complete: 8,64 W () 8749264 + 2048 [0]
          <idle>-0     [003] d.h.  3104.320060: block_rq_complete: 8,64 W () 8751312 + 2048 [0]
          <idle>-0     [003] d.h.  3104.326148: block_rq_complete: 8,64 W () 8753360 + 2048 [0]
   kworker/u16:2-137   [003] ....  3104.326534: may_queue.part.12: inflight 0 limit 24 wait ffffc900007cb500
   kworker/u16:2-137   [003] ....  3104.326537: block_getrq: 8,64 W 8755408 + 2048 [kworker/u16:2]
   kworker/u16:2-137   [003] ....  3104.326543: block_plug: [kworker/u16:2]
          xfs_io-1045  [007] ....  3104.326546: may_queue.part.12: inflight 1 limit 96 wait ffffc90000ab7890
          xfs_io-1045  [007] ....  3104.326548: block_getrq: 8,64 WS 24640 + 8 [xfs_io]
          xfs_io-1045  [007] ....  3104.326553: block_rq_insert: 8,64 WS 4096 () 24640 + 8 [xfs_io]
	                                        ^^^^^^^^^^^^^^^^^
						(wb write and write with O_SYNC got waked up one by one)


> I don't think the patch makes sense, conceptually. If you end up in
> may_queue(), you are (by definition) in a class of writes that
> should be throttled. Skipping the wait line would be unfair.
>

OK, got it.

> One modification would be to check the depth, as we assign
> different allowances to different types of writes. If the sync
> write isn't above its limit, we should not sleep.
>

Yeah, that seems better, so it'd be like

if ((rw & REQ_HIPRIO) &&
    atomic_inc_below(&rqw->inflight, get_limit(rwb, rw)))
    return true;

if (waitqueue_active(&rqw->wait) &&
    rqw->wait.head.next != &wait->entry)
	return false;
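For context, both the original check and the snippet above gate admission on atomic_inc_below(): take an inflight slot only while the count is still below the caller's limit. A userspace C11 model of that helper (the kernel version in block/blk-wbt.c is a cmpxchg loop of the same shape):

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace model of blk-wbt's atomic_inc_below(): atomically bump the
 * inflight count, but only while it is still below 'below'. Returns
 * true if we got a slot, false if we are at or over the limit.
 */
static bool atomic_inc_below(atomic_int *v, int below)
{
	int cur = atomic_load(v);

	for (;;) {
		if (cur >= below)
			return false;
		/* on failure, 'cur' is reloaded with the current value */
		if (atomic_compare_exchange_weak(v, &cur, cur + 1))
			return true;
	}
}

/* tiny demo: at inflight 3 with limit 4, one slot remains, then none */
static int demo_inc_below(void)
{
	atomic_int inflight = 3;

	bool got = atomic_inc_below(&inflight, 4);      /* 3 < 4: granted */
	bool refused = !atomic_inc_below(&inflight, 4); /* 4 >= 4: denied */

	return (got && refused) ? atomic_load(&inflight) : -1;
}
```

Because inflight is shared but get_limit() hands sync writes a larger allowance, a sync write can still take a slot here even while background writers keep failing against their smaller limit.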


What about kswapd?  It's also a high priority write, isn't it?

Thanks,

-liubo


* Re: [PATCH RFC] Block/blk-wbt: do not let background writes block sync writes
  2017-08-28 17:45   ` Liu Bo
@ 2017-08-28 18:07     ` Liu Bo
  2017-08-28 21:05     ` Jens Axboe
  1 sibling, 0 replies; 5+ messages in thread
From: Liu Bo @ 2017-08-28 18:07 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

On Mon, Aug 28, 2017 at 11:45:31AM -0600, Liu Bo wrote:
> On Sun, Aug 27, 2017 at 11:14:20AM -0600, Jens Axboe wrote:
> > On 08/25/2017 06:14 PM, Liu Bo wrote:
> > > While using blk-wbt, sometimes sync writes are blocked by background
> > > writes, for example,
> > > 
> > > a) a background write reaches the (background) limit returned by
> > > get_limit(), so it's added into the rqw->wait (non kswapd's rqw in
> > > this case) and goes to sleep.
> > > 
> > > b) then a sync write gets queued and goes to sleep when finding that
> > > waitqueue_active() returns true and someone else is already on the
> > > waiting list.
> > > 
> > > Thus, the sync write will only get its rq after the background write
> > > has gotten one.
> > > 
> > > With this, only background writes will check waitqueue's status and
> > > sync writes will only be throttled by the (max) limit returned by
> > > get_limit().
> > 
> > Curious how you ran into this. Is this just speculation, or is it
> > observed behavior?
> >
> 
> I got that from a simple test,
> 
> xfs_io -f -c "falloc 0 4K" /mnt/btrfs/foobar
> xfs_io -f -c "falloc 0 5G" /mnt/btrfs/dummy
> xfs_io -f -c "pwrite -b 128M 0 5G" /mnt/btrfs/dummy &
> 
> off=0
> while true; do
>         date
>         off=$((off + 1))
>         xfs_io -s -c "pwrite $off 1"  /mnt/btrfs/foobar > /dev/null
>         sleep 1
> done
> 
> ---
> 
> So a lot of writeback is happening in the background, and a write with
> O_SYNC is issued every second.
> 
> My ftrace observation shows that the write with O_SYNC is waiting in
> may_queue() because of the running writeback writes.
> 
>    kworker/u16:2-137   [003] ....  3104.193996: block_bio_queue: 8,64 W 8751312 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.194003: may_queue.part.12: inflight 23 limit 24 wait ffffc900007cb500
>    kworker/u16:2-137   [003] ....  3104.194003: block_getrq: 8,64 W 8751312 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.194004: blk_mq_flush_plug_list <-blk_flush_plug_list
>    kworker/u16:2-137   [003] ....  3104.194004: block_unplug: [kworker/u16:2] 1
>    kworker/u16:2-137   [003] ....  3104.194005: block_rq_insert: 8,64 W 1048576 () 8749264 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.194006: block_rq_issue: 8,64 W 1048576 () 8749264 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.194017: block_plug: [kworker/u16:2]
>    kworker/u16:2-137   [003] d.h.  3104.194454: block_rq_complete: 8,64 W () 8704208 + 2048 [0]
>    kworker/u16:2-137   [003] ....  3104.194978: block_bio_queue: 8,64 W 8753360 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.194985: may_queue.part.12: inflight 23 limit 24 wait ffffc900007cb500
>    kworker/u16:2-137   [003] ....  3104.194985: block_getrq: 8,64 W 8753360 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.194986: blk_mq_flush_plug_list <-blk_flush_plug_list
>    kworker/u16:2-137   [003] ....  3104.194987: block_unplug: [kworker/u16:2] 1
>    kworker/u16:2-137   [003] ....  3104.194987: block_rq_insert: 8,64 W 1048576 () 8751312 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.194990: block_rq_issue: 8,64 W 1048576 () 8751312 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.195001: block_plug: [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.195654: block_bio_queue: 8,64 W 8755408 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.195661: may_queue.part.12: inflight 24 limit 24 wait ffffc900007cb500
>    kworker/u16:2-137   [003] ....  3104.195662: may_queue.part.12: inflight 24 limit 24 wait ffffc900007cb500
>                                                                    ^^^^^^^^^^^^^^^
> 								   (wb write puts itself on rqw->wait when reaching the limit)
> 								   
>    kworker/u16:2-137   [003] ....  3104.195662: io_schedule <-wbt_wait
>    kworker/u16:2-137   [003] ....  3104.195662: blk_mq_flush_plug_list <-blk_flush_plug_list
>    kworker/u16:2-137   [003] ....  3104.195663: block_unplug: [kworker/u16:2] 1
>    kworker/u16:2-137   [003] ....  3104.195663: block_rq_insert: 8,64 W 1048576 () 8753360 + 2048 [kworker/u16:2]
>     kworker/3:1H-162   [003] ....  3104.195680: block_rq_issue: 8,64 W 1048576 () 8753360 + 2048 [kworker/3:1H]
>           <idle>-0     [003] d.h.  3104.200533: block_rq_complete: 8,64 W () 8706256 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.205473: block_rq_complete: 8,64 W () 8708304 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.211451: block_rq_complete: 8,64 W () 8710352 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.216163: block_rq_complete: 8,64 W () 8712400 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.222072: block_rq_complete: 8,64 W () 8714448 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.226931: block_rq_complete: 8,64 W () 8716496 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.232915: block_rq_complete: 8,64 W () 8718544 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.238851: block_rq_complete: 8,64 W () 8720592 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.243593: block_rq_complete: 8,64 W () 8722640 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.249667: block_rq_complete: 8,64 W () 8724688 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.254248: block_rq_complete: 8,64 W () 8726736 + 2048 [0]
>           xfs_io-1045  [007] ....  3104.254347: btrfs_sync_file: 77102a7b-fc17-42cb-afb7-ac57020b6bfb: root=5(FS_TREE) ino=257 parent=256 datasync=0
>           xfs_io-1045  [007] ....  3104.254384: cow_file_range.isra.58: ino 257 start 0x0 disk_num_bytes 0x1000
>           xfs_io-1045  [007] ....  3104.254402: btrfs_ordered_extent_add: 77102a7b-fc17-42cb-afb7-ac57020b6bfb: root=5(FS_TREE) ino=257 file_offset=0 start=12615680 len=4096 disk_len=4096 truncated_len=18446744073709551615 bytes_left=4096 flags= compress_type=0 refs=1
>           xfs_io-1045  [007] ....  3104.254441: block_bio_queue: 8,64 WS 24640 + 8 [xfs_io]
>           xfs_io-1045  [007] ....  3104.254446: wbt_wait: someone is waiting ffffc900007cb500
>           xfs_io-1045  [007] ....  3104.254448: wbt_wait: someone is waiting ffffc900007cb500
>           xfs_io-1045  [007] ....  3104.254449: io_schedule <-wbt_wait
> 	                                                    ^^^^^^^^^^^^^^^^^^^
> 							    (write with O_SYNC waits for wb write,
> 							    while the sync limit is more than 24)
> 
>           <idle>-0     [003] d.h.  3104.260393: block_rq_complete: 8,64 W () 8728784 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.266310: block_rq_complete: 8,64 W () 8730832 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.271152: block_rq_complete: 8,64 W () 8732880 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.277141: block_rq_complete: 8,64 W () 8734928 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.281908: block_rq_complete: 8,64 W () 8736976 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.287744: block_rq_complete: 8,64 W () 8739024 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.292584: block_rq_complete: 8,64 W () 8741072 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.298541: block_rq_complete: 8,64 W () 8743120 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.303266: block_rq_complete: 8,64 W () 8745168 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.309273: block_rq_complete: 8,64 W () 8747216 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.315312: block_rq_complete: 8,64 W () 8749264 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.320060: block_rq_complete: 8,64 W () 8751312 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.326148: block_rq_complete: 8,64 W () 8753360 + 2048 [0]
>    kworker/u16:2-137   [003] ....  3104.326534: may_queue.part.12: inflight 0 limit 24 wait ffffc900007cb500
>    kworker/u16:2-137   [003] ....  3104.326537: block_getrq: 8,64 W 8755408 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.326543: block_plug: [kworker/u16:2]
>           xfs_io-1045  [007] ....  3104.326546: may_queue.part.12: inflight 1 limit 96 wait ffffc90000ab7890
>           xfs_io-1045  [007] ....  3104.326548: block_getrq: 8,64 WS 24640 + 8 [xfs_io]
>           xfs_io-1045  [007] ....  3104.326553: block_rq_insert: 8,64 WS 4096 () 24640 + 8 [xfs_io]
> 	                                        ^^^^^^^^^^^^^^^^^
> 						(wb write and write with O_SYNC got waked up one by one)
> 
> 
> > I don't think the patch makes sense, conceptually. If you end up in
> > may_queue(), you are (by definition) in a class of writes that
> > should be throttled. Skipping the wait line would be unfair.
> >
> 
> OK, got it.
> 
> > One modification would be to check the depth, as we assign
> > different allowances to different types of writes. If the sync
> > write isn't above its limit, we should not sleep.
> >
> 
> Yeah, that seems better, so it'd be like
> 
> if ((rw & REQ_HIPRIO) &&
>     atomic_inc_below(&rqw->inflight, get_limit(rwb, rw)))
>     return true;
> 
> if (waitqueue_active(&rqw->wait) &&
>     rqw->wait.head.next != &wait->entry)
> 	return false;
> 
> 
> What about kswapd?  It's also a high priority write, isn't it?

I think it would be fine without checking kswapd, as it has a different
rqw.
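Right: blk-wbt of this era keeps an array of rq_wait structures indexed by whether the submitter is kswapd, so kswapd throttles against its own waitqueue and inflight count. A minimal sketch of that selection (names mirror blk-wbt's get_rq_wait(); the inflight stand-in is simplified):

```c
#include <stdbool.h>

enum {
	WBT_RWQ_BG = 0,     /* everything that is not kswapd */
	WBT_RWQ_KSWAPD,     /* kswapd-issued writes */
	WBT_NUM_RWQ,
};

struct rq_wait {
	int inflight;       /* stand-in for the kernel's atomic_t + waitqueue */
};

struct rq_wb {
	struct rq_wait rq_wait[WBT_NUM_RWQ];
};

/* kswapd queues and counts against its own rq_wait, not the bg one */
static struct rq_wait *get_rq_wait(struct rq_wb *rwb, bool is_kswapd)
{
	return &rwb->rq_wait[is_kswapd];
}
```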

I'll send an updated one if the above code looks OK.

Thanks,

-liubo


* Re: [PATCH RFC] Block/blk-wbt: do not let background writes block sync writes
  2017-08-28 17:45   ` Liu Bo
  2017-08-28 18:07     ` Liu Bo
@ 2017-08-28 21:05     ` Jens Axboe
  1 sibling, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2017-08-28 21:05 UTC (permalink / raw)
  To: bo.li.liu; +Cc: linux-block

On 08/28/2017 11:45 AM, Liu Bo wrote:
> On Sun, Aug 27, 2017 at 11:14:20AM -0600, Jens Axboe wrote:
>> On 08/25/2017 06:14 PM, Liu Bo wrote:
>>> While using blk-wbt, sometimes sync writes are blocked by background
>>> writes, for example,
>>>
>>> a) a background write reaches the (background) limit returned by
>>> get_limit(), so it's added into the rqw->wait (non kswapd's rqw in
>>> this case) and goes to sleep.
>>>
>>> b) then a sync write gets queued and goes to sleep when finding that
>>> waitqueue_active() returns true and someone else is already on the
>>> waiting list.
>>>
>>> Thus, the sync write will only get its rq after the background write
>>> has gotten one.
>>>
>>> With this, only background writes will check waitqueue's status and
>>> sync writes will only be throttled by the (max) limit returned by
>>> get_limit().
>>
>> Curious how you ran into this. Is this just speculation, or is it
>> observed behavior?
>>
> 
> I got that from a simple test,
> 
> xfs_io -f -c "falloc 0 4K" /mnt/btrfs/foobar
> xfs_io -f -c "falloc 0 5G" /mnt/btrfs/dummy
> xfs_io -f -c "pwrite -b 128M 0 5G" /mnt/btrfs/dummy &
> 
> off=0
> while true; do
>         date
>         off=$((off + 1))
>         xfs_io -s -c "pwrite $off 1"  /mnt/btrfs/foobar > /dev/null
>         sleep 1
> done
> 
> ---
> 
> So a lot of writeback is happening in the background, and a write with
> O_SYNC is issued every second.
> 
> My ftrace observation shows that the write with O_SYNC is waiting in
> may_queue() because of the running writeback writes.
> 
>    kworker/u16:2-137   [003] ....  3104.193996: block_bio_queue: 8,64 W 8751312 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.194003: may_queue.part.12: inflight 23 limit 24 wait ffffc900007cb500
>    kworker/u16:2-137   [003] ....  3104.194003: block_getrq: 8,64 W 8751312 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.194004: blk_mq_flush_plug_list <-blk_flush_plug_list
>    kworker/u16:2-137   [003] ....  3104.194004: block_unplug: [kworker/u16:2] 1
>    kworker/u16:2-137   [003] ....  3104.194005: block_rq_insert: 8,64 W 1048576 () 8749264 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.194006: block_rq_issue: 8,64 W 1048576 () 8749264 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.194017: block_plug: [kworker/u16:2]
>    kworker/u16:2-137   [003] d.h.  3104.194454: block_rq_complete: 8,64 W () 8704208 + 2048 [0]
>    kworker/u16:2-137   [003] ....  3104.194978: block_bio_queue: 8,64 W 8753360 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.194985: may_queue.part.12: inflight 23 limit 24 wait ffffc900007cb500
>    kworker/u16:2-137   [003] ....  3104.194985: block_getrq: 8,64 W 8753360 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.194986: blk_mq_flush_plug_list <-blk_flush_plug_list
>    kworker/u16:2-137   [003] ....  3104.194987: block_unplug: [kworker/u16:2] 1
>    kworker/u16:2-137   [003] ....  3104.194987: block_rq_insert: 8,64 W 1048576 () 8751312 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.194990: block_rq_issue: 8,64 W 1048576 () 8751312 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.195001: block_plug: [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.195654: block_bio_queue: 8,64 W 8755408 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.195661: may_queue.part.12: inflight 24 limit 24 wait ffffc900007cb500
>    kworker/u16:2-137   [003] ....  3104.195662: may_queue.part.12: inflight 24 limit 24 wait ffffc900007cb500
>                                                                    ^^^^^^^^^^^^^^^
> 								   (wb write puts itself on rqw->wait when reaching the limit)
> 								   
>    kworker/u16:2-137   [003] ....  3104.195662: io_schedule <-wbt_wait
>    kworker/u16:2-137   [003] ....  3104.195662: blk_mq_flush_plug_list <-blk_flush_plug_list
>    kworker/u16:2-137   [003] ....  3104.195663: block_unplug: [kworker/u16:2] 1
>    kworker/u16:2-137   [003] ....  3104.195663: block_rq_insert: 8,64 W 1048576 () 8753360 + 2048 [kworker/u16:2]
>     kworker/3:1H-162   [003] ....  3104.195680: block_rq_issue: 8,64 W 1048576 () 8753360 + 2048 [kworker/3:1H]
>           <idle>-0     [003] d.h.  3104.200533: block_rq_complete: 8,64 W () 8706256 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.205473: block_rq_complete: 8,64 W () 8708304 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.211451: block_rq_complete: 8,64 W () 8710352 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.216163: block_rq_complete: 8,64 W () 8712400 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.222072: block_rq_complete: 8,64 W () 8714448 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.226931: block_rq_complete: 8,64 W () 8716496 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.232915: block_rq_complete: 8,64 W () 8718544 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.238851: block_rq_complete: 8,64 W () 8720592 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.243593: block_rq_complete: 8,64 W () 8722640 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.249667: block_rq_complete: 8,64 W () 8724688 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.254248: block_rq_complete: 8,64 W () 8726736 + 2048 [0]
>           xfs_io-1045  [007] ....  3104.254347: btrfs_sync_file: 77102a7b-fc17-42cb-afb7-ac57020b6bfb: root=5(FS_TREE) ino=257 parent=256 datasync=0
>           xfs_io-1045  [007] ....  3104.254384: cow_file_range.isra.58: ino 257 start 0x0 disk_num_bytes 0x1000
>           xfs_io-1045  [007] ....  3104.254402: btrfs_ordered_extent_add: 77102a7b-fc17-42cb-afb7-ac57020b6bfb: root=5(FS_TREE) ino=257 file_offset=0 start=12615680 len=4096 disk_len=4096 truncated_len=18446744073709551615 bytes_left=4096 flags= compress_type=0 refs=1
>           xfs_io-1045  [007] ....  3104.254441: block_bio_queue: 8,64 WS 24640 + 8 [xfs_io]
>           xfs_io-1045  [007] ....  3104.254446: wbt_wait: someone is waiting ffffc900007cb500
>           xfs_io-1045  [007] ....  3104.254448: wbt_wait: someone is waiting ffffc900007cb500
>           xfs_io-1045  [007] ....  3104.254449: io_schedule <-wbt_wait
> 	                                                    ^^^^^^^^^^^^^^^^^^^
> 							    (write with O_SYNC waits for wb write,
> 							    while the sync limit is more than 24)
> 
>           <idle>-0     [003] d.h.  3104.260393: block_rq_complete: 8,64 W () 8728784 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.266310: block_rq_complete: 8,64 W () 8730832 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.271152: block_rq_complete: 8,64 W () 8732880 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.277141: block_rq_complete: 8,64 W () 8734928 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.281908: block_rq_complete: 8,64 W () 8736976 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.287744: block_rq_complete: 8,64 W () 8739024 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.292584: block_rq_complete: 8,64 W () 8741072 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.298541: block_rq_complete: 8,64 W () 8743120 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.303266: block_rq_complete: 8,64 W () 8745168 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.309273: block_rq_complete: 8,64 W () 8747216 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.315312: block_rq_complete: 8,64 W () 8749264 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.320060: block_rq_complete: 8,64 W () 8751312 + 2048 [0]
>           <idle>-0     [003] d.h.  3104.326148: block_rq_complete: 8,64 W () 8753360 + 2048 [0]
>    kworker/u16:2-137   [003] ....  3104.326534: may_queue.part.12: inflight 0 limit 24 wait ffffc900007cb500
>    kworker/u16:2-137   [003] ....  3104.326537: block_getrq: 8,64 W 8755408 + 2048 [kworker/u16:2]
>    kworker/u16:2-137   [003] ....  3104.326543: block_plug: [kworker/u16:2]
>           xfs_io-1045  [007] ....  3104.326546: may_queue.part.12: inflight 1 limit 96 wait ffffc90000ab7890
>           xfs_io-1045  [007] ....  3104.326548: block_getrq: 8,64 WS 24640 + 8 [xfs_io]
>           xfs_io-1045  [007] ....  3104.326553: block_rq_insert: 8,64 WS 4096 () 24640 + 8 [xfs_io]
> 	                                        ^^^^^^^^^^^^^^^^^
> 						(wb write and write with O_SYNC got waked up one by one)
> 
> 
>> I don't think the patch makes sense, conceptually. If you end up in
>> may_queue(), you are (by definition) in a class of writes that
>> should be throttled. Skipping the wait line would be unfair.
>>
> 
> OK, got it.
> 
>> One modification would be to check the depth, as we assign
>> different allowances to different types of writes. If the sync
>> write isn't above its limit, we should not sleep.
>>
> 
> Yeah, that seems better, so it'd be like
> 
> if ((rw & REQ_HIPRIO) &&
>     atomic_inc_below(&rqw->inflight, get_limit(rwb, rw)))
>     return true;
> 
> if (waitqueue_active(&rqw->wait) &&
>     rqw->wait.head.next != &wait->entry)
> 	return false;

I thought a bit more about this, and one potential issue with doing it
this way, is that we'll potentially starve the background writes
forever. Let's say in the current system the stable limits are 4/8/16
for background/normal/high priority writes. Let's further assume that we
have some normal or sync writers that keep the queue depth at eg 6.
Background writes would be added to the wait queue, and the normal write
would proceed. If the normal write keeps the queue depth over the
background limit, then we'll never do any background writes. That's
probably worse than the current approach, which at least ensures
forward progress.

We could split the wait queues for the bg/normal/high priority writes,
and add some heuristics to ensure fairness.
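The starvation argument can be sketched numerically. Assume the 4/8/16 allowances above with a shared inflight count: a depth-only admission check never admits a background write while normal or sync traffic holds the depth above 4 (illustrative model, not kernel code):

```c
#include <stdbool.h>

enum wb_class { WB_BG = 0, WB_NORMAL, WB_SYNC };

/* illustrative per-class allowances: 4/8/16 as in the example above */
static const int limit[] = { 4, 8, 16 };

/*
 * "Check the depth" admission: a writer gets in only while the shared
 * inflight count is below its own class's allowance.
 */
static bool may_queue_depth(int inflight, enum wb_class cls)
{
	return inflight < limit[cls];
}
```

At a steady depth of 6, the background class fails this check on every retry while the normal class keeps passing, so per-class wait queues would also need a fairness heuristic that periodically admits the starved class.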

> What about kswapd?  It's also a high priority write, isn't it?

As you discovered in the next email, kswapd has its own wait queue.

-- 
Jens Axboe


end of thread, other threads:[~2017-08-28 21:05 UTC | newest]

Thread overview: 5+ messages
2017-08-26  0:14 [PATCH RFC] Block/blk-wbt: do not let background writes block sync writes Liu Bo
2017-08-27 17:14 ` Jens Axboe
2017-08-28 17:45   ` Liu Bo
2017-08-28 18:07     ` Liu Bo
2017-08-28 21:05     ` Jens Axboe
