From: Matthias Ferdinand <bcache@mfedv.net>
To: Adriano Silva <adriano_da_silva@yahoo.com.br>
Cc: Bcache Linux <linux-bcache@vger.kernel.org>
Subject: Re: Bcache in writes direct with fsync. Are IOPS limited?
Date: Wed, 11 May 2022 23:21:43 +0200 [thread overview]
Message-ID: <Ynwo5yUIaP6irAHW@xoff> (raw)
In-Reply-To: <935414163.1122596.1652273928576@mail.yahoo.com>
On Wed, May 11, 2022 at 12:58:48PM +0000, Adriano Silva wrote:
> Thank you for your answer!
>
> > bcache needs to do a lot of metadata work, resulting in a noticeable
> > write amplification. My testing with bcache (some years ago and only with
> > SATA SSDs) showed that bcache latency increases a lot with high amounts
> > of dirty data
>
> I'm testing with empty devices, no data.
>
> Wouldn't write amplification be noticeable in dstat? Because it doesn't seem significant during the tests, since I monitor reads and writes in all disks in dstat.
Yes, you are right, that would be visible. I was misled by the ~3k
writes to nvme (vs. ~1.5k writes from fio), but the same ~3k writes
show up on bcache as well.
> > I also found performance to increase slightly when a bcache device
> > was created with 4k block size instead of default 512bytes.
>
> Are you talking about changing the block size for the cache device or the backing device?
Neither - it was the "-w" argument to make-bcache. I found an old
logfile from my tests. Although both hdd and ssd showed up as
512b-sector devices, the command to create the bcache device was

make-bcache --data_offset 2048 --wipe-bcache -w 4k -C /dev/sde1 -B /dev/sdb

/sys/block/bcacheX/queue/hw_sector_size then reports "4096".
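If it helps, this is roughly how I would verify the resulting block
size after creation (bcache0 is just an example device name, adjust to
your setup):

```shell
# Check the logical block size the bcache device presents.
# With "-w 4k" at creation time this should print 4096.
cat /sys/block/bcache0/queue/hw_sector_size
```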
> But when I remove the fsync flag in the test with fio, which tells the application to wait for the write response, the 4K write happens much faster, reaching 73.6 MB/s and 17k IOPS. This is half the device's performance, but it's more than enough for my case. The fsync flag makes no significant difference to the performance of my flash disk when testing directly on it. The fact that bcache speeds up when the fsync flag is removed makes me believe that bcache is not slow to write, but for some reason, bcache is taking a while to respond that the write is complete. I think that should be the point!
I can't claim to fully understand what fsync does (or how a block
device driver is supposed to handle it), but this might account for the
roughly doubled writes shown with dstat as opposed to the fio results.
From the name "journal-test" I guess you are trying something like
https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
He uses very similar parameters, except that he passes "--sync=1"
rather than "--fsync=1".
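For comparison, his invocation looks roughly like this (the device
path is a placeholder - and be careful, this writes directly to the
device and destroys any data on it):

```shell
# WARNING: destructive - writes raw 4k blocks directly to the device.
# --sync=1 opens the device with O_SYNC instead of issuing fsync()
# after each write (which is what --fsync=1 would do).
fio --filename=/dev/bcache0 --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=journal-test
```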
This is a proper benchmark for the old ceph filestore journal, as this
was written linearly, and in the worst case could have been written in
chunks as small as 4k.
As you are using proxmox, I guess you want to use its ceph component.
They use the modern ceph bluestore format, and there is no journal
anymore. I don't know whether the bluestore WAL exhibits access
patterns similar to the old journal, or whether this benchmark still
has real-world relevance. But if you have enough NVMe disk space, the
usual advice is to put the bluestore WAL, and ideally also the
bluestore DB, directly on NVMe, and use bcache only for the bluestore
data part. If you do so, make sure to set rotational=1 on the bcache
device before creating the OSD, or ceph will use unsuitable bluestore
parameters, possibly overwhelming the hdd:
https://www.spinics.net/lists/ceph-users/msg71646.html
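An untested sketch of that workaround (device name is an assumption on
my part, see the linked thread for the details):

```shell
# Mark the bcache device as rotational so ceph picks hdd-appropriate
# bluestore tunables. Must be done before "ceph-volume ... create",
# and again after every reboot, since the sysfs setting is not
# persistent.
echo 1 > /sys/block/bcache0/queue/rotational
```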
Matthias