From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF063C6FD1D for ; Sun, 2 Apr 2023 00:01:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229452AbjDBABK (ORCPT ); Sat, 1 Apr 2023 20:01:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60680 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229448AbjDBABK (ORCPT ); Sat, 1 Apr 2023 20:01:10 -0400 Received: from mx.ewheeler.net (mx.ewheeler.net [173.205.220.69]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 954A91880E for ; Sat, 1 Apr 2023 17:01:06 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by mx.ewheeler.net (Postfix) with ESMTP id 0ED4085; Sat, 1 Apr 2023 17:01:06 -0700 (PDT) X-Virus-Scanned: amavisd-new at ewheeler.net Received: from mx.ewheeler.net ([127.0.0.1]) by localhost (mx.ewheeler.net [127.0.0.1]) (amavisd-new, port 10024) with LMTP id Ik8PHsgngJqT; Sat, 1 Apr 2023 17:01:04 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx.ewheeler.net (Postfix) with ESMTPSA id 6EB5345; Sat, 1 Apr 2023 17:01:04 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 mx.ewheeler.net 6EB5345 Date: Sat, 1 Apr 2023 17:01:04 -0700 (PDT) From: Eric Wheeler To: Coly Li cc: Bcache Linux , Martin McClure , Adriano Silva Subject: Re: Writeback cache all used. In-Reply-To: <1121771993.1793905.1680221827127@mail.yahoo.com> Message-ID: References: <1012241948.1268315.1680082721600.ref@mail.yahoo.com> <1012241948.1268315.1680082721600@mail.yahoo.com> <1121771993.1793905.1680221827127@mail.yahoo.com> MIME-Version: 1.0 Content-Type: multipart/mixed; BOUNDARY="8323328-367368634-1680393344=:27286" Precedence: bulk List-ID: X-Mailing-List: linux-bcache@vger.kernel.org This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --8323328-367368634-1680393344=:27286 Content-Type: text/plain; CHARSET=ISO-8859-15 Content-Transfer-Encoding: 8BIT On Fri, 31 Mar 2023, Adriano Silva wrote: > Thank you very much! > > > I don't know for sure, but I'd think that since 91% of the cache is > > evictable, writing would just evict some data from the cache (without > > writing to the HDD, since it's not dirty data) and write to that area of > > the cache, *not* to the HDD. It wouldn't make sense in many cases to > > actually remove data from the cache, because then any reads of that data > > would have to read from the HDD; leaving it in the cache has very little > > cost and would speed up any reads of that data. > > Maybe you're right, it seems to be writing to the cache, despite it > indicating that the cache is at 100% full. > > I noticed that it has excellent reading performance, but the writing > performance dropped a lot when the cache was full. It's still a higher > performance than the HDD, but much lower than it is when it's half full > or empty. > > Sequential writing tests with "_fio" now show me 240MB/s of writing, > which was already 900MB/s when the cache was still half full. Write > latency has also increased. IOPS on random 4K writes are now in the 5K > range. It was 16K with half used cache. At random 4K with Ioping, > latency went up. With half cache it was 500us. It is now 945us. > > For reading, nothing has changed. > > However, for systems where writing time is critical, it makes a > significant difference. If possible I would like to always keep it with > a reasonable amount of empty space, to improve writing responses. Reduce > 4K latency, mostly. Even if it were for me to program a script in > crontab or something like that, so that during the night or something > like that the system executes a command for it to clear a percentage of > the cache (about 30% for example) that has been unused for the longest > time . This would possibly make the cache more efficient on writes as > well. That is an intersting idea since it saves latency. Keeping a few unused ready to go would prevent GC during a cached write. Coly, would this be an easy feature to add? Bcache would need a `cache_min_free` tunable that would (asynchronously) free the least recently used buckets that are not dirty. -Eric > > If anyone knows a solution, thanks! > > > > On 3/29/23 02:38, Adriano Silva wrote: > > Hey guys, > > > > I'm using bcache to support Ceph. Ten Cluster nodes have a bcache device each consisting of an HDD block device and an NVMe cache. But I am noticing what I consider to be a problem: My cache is 100% used even though I still have 80% of the space available on my HDD. > > > > It is true that there is more data written than would fit in the cache. However, I imagine that most of them should only be on the HDD and not in the cache, as they are cold data, almost never used. > > > > I noticed that there was a significant drop in performance on the disks (writes) and went to check. Benchmark tests confirmed this. Then I noticed that there was 100% cache full and 85% cache evictable. There was a bit of dirty cache. I found an internet message talking about the garbage collector, so I tried the following: > > > > echo 1 > /sys/block/bcache0/bcache/cache/internal/trigger_gc > > > > That doesn't seem to have helped. > > > > Then I collected the following data: > > > > --- bcache --- > > Device /dev/sdc (8:32) > > UUID 38e81dff-a7c9-449f-9ddd-182128a19b69 > > Block Size 4.00KiB > > Bucket Size 256.00KiB > > Congested? False > > Read Congestion 0.0ms > > Write Congestion 0.0ms > > Total Cache Size 553.31GiB > > Total Cache Used 547.78GiB (99%) > > Total Unused Cache 5.53GiB (1%) > > Dirty Data 0B (0%) > > Evictable Cache 503.52GiB (91%) > > Replacement Policy [lru] fifo random > > Cache Mode writethrough [writeback] writearound none > > Total Hits 33361829 (99%) > > Total Missions 185029 > > Total Bypass Hits 6203 (100%) > > Total Bypass Misses 0 > > Total Bypassed 59.20MiB > > --- Cache Device --- > >     Device /dev/nvme0n1p1 (259:1) > >     Size 553.31GiB > >     Block Size 4.00KiB > >     Bucket Size 256.00KiB > >     Replacement Policy [lru] fifo random > >     Discard? False > >     I/O Errors 0 > >     Metadata Written 395.00GiB > >     Data Written 1.50 TiB > >     Buckets 2266376 > >     Cache Used 547.78GiB (99%) > >     Cache Unused 5.53GiB (0%) > > --- Backing Device --- > >     Device /dev/sdc (8:32) > >     Size 5.46TiB > >     Cache Mode writethrough [writeback] writearound none > >     Readhead > >     Sequential Cutoff 0B > >     Sequential merge? False > >     state clean > >     Writeback? true > >     Dirty Data 0B > >     Total Hits 32903077 (99%) > >     Total Missions 185029 > >     Total Bypass Hits 6203 (100%) > >     Total Bypass Misses 0 > >     Total Bypassed 59.20MiB > > > > The dirty data has disappeared. But the cache remains 99% utilization, down just 1%. Already the evictable cache increased to 91%! > > > > The impression I have is that this harms the write cache. That is, if I need to write again, the data goes straight to the HDD disks, as there is no space available in the Cache. > > > > Shouldn't bcache remove the least used part of the cache? > > I don't know for sure, but I'd think that since 91% of the cache is > evictable, writing would just evict some data from the cache (without > writing to the HDD, since it's not dirty data) and write to that area of > the cache, *not* to the HDD. It wouldn't make sense in many cases to > actually remove data from the cache, because then any reads of that data > would have to read from the HDD; leaving it in the cache has very little > cost and would speed up any reads of that data. > > Regards, > -Martin > > > > > > Does anyone know why this isn't happening? > > > > I may be talking nonsense, but isn't there a way to tell bcache to keep a write-free space rate in the cache automatically? Or even if it was manually by some command that I would trigger at low disk access times? > > > > Thanks! > > --8323328-367368634-1680393344=:27286--