All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@fb.com>
To: Dave Chinner <david@fromorbit.com>
Cc: <linux-kernel@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>,
	<linux-block@vger.kernel.org>
Subject: Re: [PATCHSET v3][RFC] Make background writeback not suck
Date: Fri, 1 Apr 2016 08:33:18 -0600	[thread overview]
Message-ID: <56FE86AE.7070704@fb.com> (raw)
In-Reply-To: <20160401061610.GX11812@dastard>

On 04/01/2016 12:16 AM, Dave Chinner wrote:
> On Thu, Mar 31, 2016 at 09:39:25PM -0600, Jens Axboe wrote:
>> On 03/31/2016 09:29 PM, Jens Axboe wrote:
>>>>> I can't seem to reproduce this at all. On an nvme device, I get a
>>>>> fairly steady 60K/sec file creation rate, and we're nowhere near
>>>>> being IO bound. So the throttling has no effect at all.
>>>>
>>>> That's too slow to show the stalls - your likely concurrency bound
>>>> in allocation by the default AG count (4) from mkfs. Use mkfs.xfs -d
>>>> agcount=32 so that every thread works in it's own AG.
>>>
>>> That's the key, with that I get 300-400K ops/sec instead. I'll run some
>>> testing with this tomorrow and see what I can find, it did one full run
>>> now and I didn't see any issues, but I need to run it at various
>>> settings and see if I can find the issue.
>>
>> No stalls seen, I get the same performance with it disabled and with
>> it enabled, at both default settings, and lower ones
>> (wb_percent=20). Looking at iostat, we don't drive a lot of depth,
>> so it makes sense, even with the throttling we're doing essentially
>> the same amount of IO.
>
> Try appending numa=fake=4 to your guest's kernel command line.
>
> (that's what I'm using)

Sure, I can give that a go.

>> What does 'nr_requests' say for your virtio_blk device? Looks like
>> virtio_blk has a queue_depth setting, but it's not set by default,
>> and then it uses the free entries in the ring. But I don't know what
>> that is...
>
> $ cat /sys/block/vdc/queue/nr_requests
> 128

OK, so that would put you in the 16/32/64 category for idle/normal/high 
priority writeback. Which fits with the iostat below, which is in the 
~16 range.

So the META thing should help, it'll bump it up a bit. But we're also 
seeing smaller requests, and I think that could be because after we do 
throttle, we could potentially have a merge candidate. The code doesn't 
check post-sleeping, it'll allow any merges before though. Though that 
part is a little harder to read from the iostat numbers, but there does 
seem to be a correlation between your higher depths and bigger request 
sizes.

> I'll try the "don't throttle REQ_META" patch, but this seems like a
> fragile way to solve this problem - it shuts up the messenger, but
> doesn't solve the problem for any other subsystem that might have a
> similer issue. e.g. next we're going to have to make sure direct IO
> (which is also REQ_WRITE dispatch) does not get throttled, and so
> on....

I don't think there's anything wrong with the REQ_META patch. Sure, we 
could have better classifications (like discussed below), but that's 
mainly tweaking. As long as we get the same answers, it's fine. There's 
no throttling of O_DIRECT writes in the current code, it specifically 
doesn't include those. It's only for the unbounded writes, which 
writeback tends to be.

> It seems to me that the right thing to do here is add a separate
> classification flag for IO that can be throttled. e.g. as
> REQ_WRITEBACK and only background writeback work sets this flag.
> That would ensure that when the IO is being dispatched from other
> sources (e.g. fsync, sync_file_range(), direct IO, filesystem
> metadata, etc) it is clear that it is not a target for throttling.
> This would also allow us to easily switch off throttling if
> writeback is occurring for memory reclaim reasons, and so on.
> Throttling policy decisions belong above the block layer, even
> though the throttle mechanism itself is in the block layer.

We're already doing all of that, it's just doesn't include a specific 
REQ_WRITEBACK flag. And yeah, that would clean up the checking for 
request type, but functionally it should be the same as it is now. It'll 
be a bit more robust and easier to read if we just have a REQ_WRITEBACK, 
right now it's WRITE_SYNC vs WRITE for important vs not-important, with 
a check for write vs O_DIRECT write as well.


-- 
Jens Axboe


  reply	other threads:[~2016-04-01 14:33 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-30 15:07 [PATCHSET v3][RFC] Make background writeback not suck Jens Axboe
2016-03-30 15:07 ` [PATCH 1/9] writeback: propagate the various reasons for writeback Jens Axboe
2016-03-30 15:07 ` [PATCH 2/9] writeback: add wbc_to_write() Jens Axboe
2016-03-30 15:07 ` [PATCH 3/9] writeback: use WRITE_SYNC for reclaim or sync writeback Jens Axboe
2016-03-30 15:07 ` [PATCH 4/9] writeback: track if we're sleeping on progress in balance_dirty_pages() Jens Axboe
2016-04-13 13:08   ` Jan Kara
2016-04-13 14:20     ` Jens Axboe
2016-03-30 15:07 ` [PATCH 5/9] block: add ability to flag write back caching on a device Jens Axboe
2016-03-30 15:42   ` Christoph Hellwig
2016-03-30 15:46     ` Jens Axboe
2016-03-30 16:23       ` Jens Axboe
2016-03-30 17:29         ` Christoph Hellwig
2016-03-30 15:07 ` [PATCH 6/9] sd: inform block layer of write cache state Jens Axboe
2016-03-30 15:07 ` [PATCH 7/9] NVMe: " Jens Axboe
2016-03-30 15:07 ` [PATCH 8/9] block: add code to track actual device queue depth Jens Axboe
2016-03-30 15:07 ` [PATCH 9/9] writeback: throttle buffered writeback Jens Axboe
2016-03-31  8:24 ` [PATCHSET v3][RFC] Make background writeback not suck Dave Chinner
2016-03-31 14:29   ` Jens Axboe
2016-03-31 16:21     ` Jens Axboe
2016-04-01  0:56       ` Dave Chinner
2016-04-01  3:29         ` Jens Axboe
2016-04-01  3:33           ` Jens Axboe
2016-04-01  3:39           ` Jens Axboe
2016-04-01  6:16             ` Dave Chinner
2016-04-01 14:33               ` Jens Axboe [this message]
2016-04-01  5:04           ` Dave Chinner
2016-04-01  0:46     ` Dave Chinner
2016-04-01  3:25       ` Jens Axboe
2016-04-01  6:27         ` Dave Chinner
2016-04-01 14:34           ` Jens Axboe
2016-03-31 22:09 ` Holger Hoffstätte
2016-04-01  1:01   ` Dave Chinner
2016-04-01  1:01     ` Dave Chinner
2016-04-01 16:58     ` Holger Hoffstätte

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56FE86AE.7070704@fb.com \
    --to=axboe@fb.com \
    --cc=david@fromorbit.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.