From: Adriano Silva <adriano_da_silva@yahoo.com.br>
To: Eric Wheeler <bcache@lists.ewheeler.net>, Coly Li <colyli@suse.de>
Cc: Bcache Linux <linux-bcache@vger.kernel.org>,
Martin McClure <martin.mcclure@gemtalksystems.com>
Subject: Re: Writeback cache all used.
Date: Thu, 4 May 2023 14:34:33 +0000 (UTC)
Message-ID: <230809962.2194275.1683210873801@mail.yahoo.com>
In-Reply-To: <95701AD2-A13A-4E79-AE27-AAEFF6AA87D3@suse.de>
Hi Coly,
If I can help you with anything, please let me know.
Thanks!
Guys, may I take this opportunity to ask one more question? If you prefer, I'll open another thread, but since it is related to the subject discussed here, I'll ask it here for now.
I decided (for now) to write a bash script that changes the cache parameters, as a temporary manual workaround for the issue in at least one of my clusters.
So: in production I use cache_mode writeback and writeback_percent 2 (I think a low value is safer and makes a flush faster, while staying at 10 hasn't shown better performance in my case. Am I wrong?). I set discard to false, because discarding every modified bucket is slow (I believe discards would need to be issued in large batches of free buckets). I set sequential_cutoff to 0 (zero) because, with the BlueStore storage backend (from Ceph), it seems to me that with any other value bcache treats everything as sequential and bypasses it to the backing disk. I also set congested_read_threshold_us and congested_write_threshold_us to 0 (zero), as that seems to give slightly better performance and lower latency. I always set rotational to 1 and never change it; I have always been told that Ceph works better that way, and I've been using it ever since. I apply these parameters at system startup.
So, I decided that at 01:00 I will run a bash script that changes these parameters in order to clear the cache, and then back up my data from databases and other sources. I change writeback_percent to 0 (zero) so that all dirty data is written back from the cache. Then I keep checking the state until it is "clean". I then switch cache_mode to writethrough.
Next, I confirm that the cache remains "clean". If it is "clean", I change cache_mode to "none" and then run the following line:
echo $cache_cset > /sys/block/$bcache_device/bcache/cache/unregister
Here ends the script that runs at 01:00 am.
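The 01:00 steps described above could be sketched roughly as follows. This is only an illustration under assumptions, not the actual script: the function names `wait_until_clean` and `flush_and_bypass`, the timeout argument, and the `SYSFS_ROOT` override (there so the logic can be dry-run against a fake directory tree) are all hypothetical. It relies on the `state`, `writeback_percent`, and `cache_mode` files under /sys/block/<bcache device>/bcache.

```shell
#!/usr/bin/env bash
# Hypothetical override: point this at a fake tree to dry-run the logic.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Poll the bcache "state" file until it reads "clean", or give up after
# roughly $2 seconds (default 3600). Returns 0 when clean, 1 on timeout.
wait_until_clean() {
    local dev="$1" timeout="${2:-3600}" waited=0
    local dir="$SYSFS_ROOT/block/$dev/bcache"
    while [ "$(cat "$dir/state")" != "clean" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $dev to become clean" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Flush dirty data, then stop caching. The cache_mode is only changed
# after the device has reported "clean".
flush_and_bypass() {
    local dev="$1" timeout="${2:-3600}"
    local dir="$SYSFS_ROOT/block/$dev/bcache"
    echo 0 > "$dir/writeback_percent"
    wait_until_clean "$dev" "$timeout" || return 1
    echo writethrough > "$dir/cache_mode"
    wait_until_clean "$dev" "$timeout" || return 1
    echo none > "$dir/cache_mode"
}

# Example (as root): flush_and_bypass bcache0 7200 || exit 1
```

Returning non-zero on timeout lets the calling script stop before the unregister step if the cache never reaches "clean".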
Then I perform the backups, so that the data read by the backup does not pass through the cache and end up being written into it. (Am I thinking correctly?)
Users will continue to use the system normally, although it will be slower because the Ceph OSD will be running on top of the bcache device without a cache. For my case, lower performance during that window is acceptable.
After the backup is complete, at 05:00 am I run the following sequence:
wipefs -a /dev/nvme0n1p1
sleep 1
blkdiscard /dev/nvme0n1p1
sleep 1
makebcache=$(make-bcache --wipe-bcache -w 4k --bucket 256K -C /dev/$cache_device)
sleep 1
cache_cset=$(bcache-super-show /dev/$cache_device | grep cset | awk '{ print $2 }')
echo $cache_cset > /sys/block/bcache0/bcache/attach
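Instead of fixed `sleep 1` pauses, the re-attach step could wait until the new cache set actually shows up under /sys/fs/bcache before writing to `attach`. A minimal sketch, assuming the cache device gets registered automatically (for example by udev) after `make-bcache`; `wait_for_cset` and `BCACHE_FS_ROOT` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical override so the helper can be tried against a fake directory.
BCACHE_FS_ROOT="${BCACHE_FS_ROOT:-/sys/fs/bcache}"

# Wait until the directory for cache set UUID $1 exists, polling once per
# second for at most $2 seconds (default 30). Returns 1 on timeout or if
# the UUID is empty.
wait_for_cset() {
    local uuid="$1" timeout="${2:-30}" waited=0
    [ -n "$uuid" ] || { echo "empty cache set UUID" >&2; return 1; }
    while [ ! -d "$BCACHE_FS_ROOT/$uuid" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
}

# Example, with $cache_cset read from bcache-super-show as above:
#   wait_for_cset "$cache_cset" && echo "$cache_cset" > /sys/block/bcache0/bcache/attach
```

The empty-UUID check also guards against attaching with a blank value if the grep/awk pipeline finds nothing.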
One thing to point out here is the bucket size I use (256K), which I chose based on the performance tests I ran. Although I didn't notice any big performance differences in those tests, 256K seemed to be the smallest bucket size that performed best on my NVMe device, which is an enterprise device (with non-volatile cache). However, I have no information about its minimum erase block size. I could not find it anywhere: I looked in several ways, the manufacturer didn't tell me, and the nvme-cli tool didn't show it either. Would 256K really be a good number to use?
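For reference, the kernel exposes a few I/O hints for the device under /sys/block/<dev>/queue, which can be printed as in the sketch below. These values are whatever the device reports; they are not necessarily the erase block size, so they are at best a hint when choosing a bucket size. `show_queue_hints` and `SYSFS_ROOT` are hypothetical names, and `nvme0n1` is just the device from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical override: point this at a fake tree to try the function.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print "<attribute>=<value>" for each queue hint that exists for device $1.
show_queue_hints() {
    local dev="$1" attr file
    for attr in discard_granularity minimum_io_size optimal_io_size physical_block_size; do
        file="$SYSFS_ROOT/block/$dev/queue/$attr"
        if [ -r "$file" ]; then
            echo "$attr=$(cat "$file")"
        fi
    done
}

# Example: show_queue_hints nvme0n1
```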
Anyway, after attaching the cache again, I return the parameters to what I have been using in production:
echo writeback > /sys/block/$bcache_device/bcache/cache_mode
echo 1 > /sys/devices/virtual/block/$bcache_device/queue/rotational
echo 1 > /sys/fs/bcache/$cache_cset/internal/gc_after_writeback
echo 1 > /sys/block/$bcache_device/bcache/cache/internal/trigger_gc
echo 2 > /sys/block/$bcache_device/bcache/writeback_percent
echo 0 > /sys/fs/bcache/$cache_cset/cache0/discard
echo 0 > /sys/block/$bcache_device/bcache/sequential_cutoff
echo 0 > /sys/fs/bcache/$cache_cset/congested_read_threshold_us
echo 0 > /sys/fs/bcache/$cache_cset/congested_write_threshold_us
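The parameter restore above could also be wrapped in a small helper that reports a missing sysfs file instead of failing silently part-way through. A sketch only; `set_param` is a hypothetical helper, not part of bcache-tools.

```shell
#!/usr/bin/env bash
# Write value $1 to sysfs file $2; complain and return 1 if the file is
# missing or the write fails.
set_param() {
    local value="$1" file="$2"
    if [ ! -e "$file" ]; then
        echo "skipping $file: not found" >&2
        return 1
    fi
    echo "$value" > "$file" || { echo "failed to write $file" >&2; return 1; }
}

# Example, using the same variables as the commands above:
#   set_param writeback "/sys/block/$bcache_device/bcache/cache_mode"
#   set_param 2 "/sys/block/$bcache_device/bcache/writeback_percent"
#   set_param 0 "/sys/fs/bcache/$cache_cset/congested_read_threshold_us"
```

Checking for the file first matters because a plain `echo` redirect to a mistyped path under a writable directory would create a new file rather than fail.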
I created the scripts in a test environment and they seem to have worked as expected.
My question: is this a correct way to work around the problem temporarily, as a palliative? Is it safe to do this with a mounted file system, with files in use by users and with running databases? Are there greater risks involved in putting this into production? Do you see any problems, or anything that should be done differently?
Thanks!
On Thursday, May 4, 2023 at 01:56:23 BRT, Coly Li <colyli@suse.de> wrote:
> On May 3, 2023, at 04:34, Eric Wheeler <bcache@lists.ewheeler.net> wrote:
>
> On Thu, 20 Apr 2023, Adriano Silva wrote:
>> I continue to investigate the situation. There is actually a performance
>> gain when the bcache device is only half filled versus full. There is a
>> reduction and greater stability in the latency of direct writes and this
>> improves my scenario.
>
> Hi Coly, have you been able to look at this?
>
> This sounds like a great optimization and Adriano is in a place to test
> this now and report his findings.
>
> I think you said this should be a simple hack to add early reclaim, so
> maybe you can throw a quick patch together (even a rough first-pass with
> hard-coded reclaim values)
>
> If we can get back to Adriano quickly then he can test while he has an
> easy-to-reproduce environment. Indeed, this could benefit all bcache
> users.
My current to-do list is a little long. Yes, I'd like to do it and plan to, but I cannot estimate the response time.
Coly Li
[snipped]