From: NeilBrown <neilb@suse.de>
To: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>,
Nikolai Grigoriev <ngrigoriev@gmail.com>,
linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-raid@vger.kernel.org, linux-mm@kvack.org,
Jens Axboe <axboe@kernel.dk>
Subject: Re: ext4 vs btrfs performance on SSD array
Date: Wed, 3 Sep 2014 10:01:58 +1000 [thread overview]
Message-ID: <20140903100158.34916d34@notabene.brown> (raw)
In-Reply-To: <20140902012222.GA21405@infradead.org>
On Mon, 1 Sep 2014 18:22:22 -0700 Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Sep 02, 2014 at 10:08:22AM +1000, Dave Chinner wrote:
> > Pretty obvious difference: avgrq-sz. btrfs is doing 512k IOs, ext4
> > and XFS are doing 128k IOs because that's the default block
> > device readahead size. 'blockdev --setra 1024 /dev/sdd' before
> > mounting the filesystem will probably fix it.
>
> Btw, it's really getting time to make Linux storage fs work out of the
> box. There are way too many things that are stupid by default and we
> require everyone to fix up manually:
>
> - the ridiculously low max_sectors default
> - the very small max readahead size
> - replacing cfq with deadline (or noop)
> - the too small RAID5 stripe cache size
>
> and probably a few I forgot about. It's time to make things perform
> well out of the box.
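(For reference, most of those defaults are runtime-tunable today; a rough
sketch, with the device name as a placeholder and the stripe cache knob shown
further down:

    # request size limit and readahead, both in KiB
    cat /sys/block/sdd/queue/max_sectors_kb
    echo 1024 > /sys/block/sdd/queue/max_sectors_kb
    echo 512 > /sys/block/sdd/queue/read_ahead_kb   # same setting 'blockdev --setra 1024' changes, which counts 512-byte sectors

    # I/O scheduler
    cat /sys/block/sdd/queue/scheduler
    echo deadline > /sys/block/sdd/queue/scheduler

but of course the point is that nobody should have to touch these by hand.)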
Do we still need maximums at all?
There was a time when the queue limit in the block device (or bdi) was an
important part of the write throttle strategy. Without a queue limit, all of
memory could be consumed by pages under write-back, all queued for some device.
This wasn't healthy.
But since then the write throttling has been completely re-written. I'm not
certain (and should check) but I suspect it doesn't depend on submit_bio
blocking when the queue is full any more.
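If I'm reading the situation right, the throttle is now driven by the global
dirty-page limits rather than by any per-device queue depth; roughly, these
are the knobs involved:

    # global dirty-page writeback throttle, as a percentage of memory
    sysctl vm.dirty_background_ratio vm.dirty_ratio
    # or as absolute byte limits, if set
    sysctl vm.dirty_background_bytes vm.dirty_bytes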
So can we just remove the limit on max_sectors and the RAID5 stripe cache
size? I'm certainly keen to remove the latter and just use a mempool if the
limit isn't needed.
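(The limit in question is the md sysfs knob; md0 below is just a placeholder:

    # current limit and how much of it is actually in use, counted in stripes
    cat /sys/block/md0/md/stripe_cache_size
    cat /sys/block/md0/md/stripe_cache_active
    # raise the limit, e.g. to 8192 stripes
    echo 8192 > /sys/block/md0/md/stripe_cache_size
)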
I have seen reports that a very large RAID5 stripe cache size can cause
a reduction in performance. I don't know why, but I suspect it is a bug that
should be found and fixed.
Do we need max_sectors ??
NeilBrown