Subject: Re: [RFC][PATCH] ubi: Implement a read cache
To: Boris Brezillon
References: <1468587454-7590-1-git-send-email-richard@nod.at> <20160718162141.1d061791@bbrezillon>
Cc: linux-mtd@lists.infradead.org, david@sigma-star.at, computersforpeace@gmail.com, dedekind1@gmail.com
From: Richard Weinberger
Message-ID: <578CE807.5060204@nod.at>
Date: Mon, 18 Jul 2016 16:30:31 +0200
In-Reply-To: <20160718162141.1d061791@bbrezillon>
List-Id: Linux MTD discussion mailing list

Boris,

On 18.07.2016 at 16:21, Boris Brezillon wrote:
> Hi Richard,
>
> On Fri, 15 Jul 2016 14:57:34 +0200
> Richard Weinberger wrote:
>
>> Implement a simple read cache such that small adjacent reads
>> within the same NAND page can be cached.
>> When a read smaller than a NAND page is requested, UBI reads the full
>> page and caches it.
>> The NAND core already has such a cache, but it will go away soon.
>>
>> To be able to benchmark better, there is also a debugfs file which reports
>> cache stats: hits, misses and no_query.
>> Hits and misses should be clear; no_query is incremented when the cache
>> cannot be used, i.e. for reads larger than a page.
>>
>> This patch is just a PoC.
>>
>
> Just a few comments on the approach. I'll try to review the code later.
>
>> Signed-off-by: Richard Weinberger
>> ---
>> Hi!
>>
>> This is an initial draft for a read cache implemented in UBI.
>>
>> The cache sits in ubi_eba_read_leb() and is used when the read length is
>> smaller than a NAND page and does not cross a page boundary; this can be
>> improved later.
>
> Hm, we should probably use the cache for aligned accesses and
> multi-page reads.
>
>> Currently the cache consists of 61 cache lines, where the PEB number acts
>> as the selector.
>
> If we go for a multi-page cache, we should probably use an LRU cache
> instead of this 'simple PEB hash + no collision' approach. Remember
> that reading a page from NAND is more time-consuming than iterating
> over a small array of NAND page buffers.

Yeah, this implementation more or less moves the cache we have in
nand_base.c into UBI.

>> Maybe fewer cache lines would also do; this needs to be benchmarked
>> in more detail.
>
> See my comment below.
>
>>
>> I already did some benchmarks.
>> The system I've used has a 4GiB MLC NAND with a 16KiB page size.
>> Userspace is Debian 8 with systemd.
>>
>> The following tasks have been benchmarked so far:
>> 1. system bootup (userspace time as reported by systemd-analyze)
>> 2. walking /usr; since this involves a lot of TNC lookups, this is interesting
>> 3. removing a 1GiB file, which also involves a lot of TNC lookups
>>
>> Baseline
>> ========
>>
>> 1. bootup time: 18 seconds
>>
>> 2. time find /usr > /dev/null
>> real 0m19.869s
>> user 0m0.120s
>> sys 0m7.300s
>>
>> 3. time rm /root/viel
>> real 0m32.992s
>> user 0m0.000s
>> sys 0m12.380s
>>
>> While the results for 1 and 2 are okay, we observe that removing the file
>> took a rather long time.
>> The NAND driver on this target supports subpage reads and therefore the NAND
>> page cache was never used.
>> Since removing a large file leads to a lot of TNC lookups, UBIFS issues many
>> reads with a length of around 200 bytes, and the overhead of every single
>> mtd_read() kicks in.
>>
>> Baseline + NAND cache (i.e. subpage read support disabled in our driver)
>> =========================================================================
>>
>> 1. bootup time: 19.5 seconds
>>
>> 2. time find /usr > /dev/null
>> real 0m24.555s
>> user 0m0.150s
>> sys 0m8.220s
>>
>> 3. time rm /root/viel
>> real 0m5.327s
>> user 0m0.000s
>> sys 0m3.300s
>>
>> We observe that the bootup took a little longer, but the jitter here is about
>> one second.
>> Walking /usr took about 5 seconds longer; I'm not sure why. This needs more
>> inspection.
>>
>> The most interesting result is that removing the large file took only 5 seconds;
>> compared to 33 seconds this is a huge speedup.
>> Since the znodes of the same file are adjacent data structures on the flash,
>> the cache at the NAND level helped a lot.
>>
>> UBI read cache with 1 cache line
>> ================================
>>
>> 1. bootup time: 19.5 seconds
>
> Hm, I would expect an attach time close to what we have in Baseline
> with subpage reads. Do you know where the difference comes from?

Boot time is pure userspace boot time. I suspect systemd "jitter".
Debian 8 starts a lot of stuff in parallel...

As soon as I have time, I'll benchmark the UBI attach time.
Or when David has time...

>> cache usage:
>> hits: 1546
>> misses: 6887
>> no_query: 649
>>
>>
>> 2. time find /usr > /dev/null
>> real 0m24.297s
>> user 0m0.100s
>> sys 0m7.800s
>>
>> cache usage:
>> hits: 4068
>> misses: 17669
>> no_query: 107
>>
>> 3. time rm /root/viel
>> real 0m5.457s
>> user 0m0.000s
>> sys 0m2.730s
>>
>> cache usage:
>> hits: 34517
>> misses: 2490
>> no_query: 212
>>
>> The results are more or less the same as with caching on the NAND side.
>> So, we could drop the NAND cache and have it implemented in UBI.
>> As expected, the cache helps most when deleting large files.
>>
>> UBI read cache with 61 cache lines
>> ==================================
>>
>> 1. bootup time: 17.5 seconds
>> hits: 2991
>> misses: 5436
>> no_query: 649
>>
>> 2. time find /usr > /dev/null
>> real 0m20.557s
>> user 0m0.120s
>> sys 0m7.080s
>>
>> hits: 7064
>> misses: 14145
>> no_query: 116
>>
>> 3. time rm /root/viel
>> real 0m5.244s
>> user 0m0.000s
>> sys 0m2.840s
>>
>> hits: 34228
>> misses: 2248
>> no_query: 202
>>
>> With more cache lines we can reduce the boot time a bit; we also observe
>> that, compared to a single line, the cache hit rate is better.
>> Same for walking /usr.
>> Removing a large file is also fast.
>> So, having more cache lines helps, but we first need to figure out how
>> many lines make sense.
>
> That's true, but if we go for an adjustable UBI write-unit cache, I'd
> prefer to have an implementation relying on the page-cache mechanism so
> that the system can adjust the cache size (by reclaiming cached pages)
> on demand instead of hardcoding the number of cache lines at compile
> time.
>
> IMO, we should either implement a basic 'single page cache' mechanism
> at the UBI level to replace the one at the NAND level and still benefit
> from subpage reads for EC and VID headers, or go for the more complex
> page-cache-based mechanism.
>
> Artem, Brian, any opinion?
>
> BTW, this implementation still assumes that the upper layer will try to
> fill NAND pages entirely. This seems to be true for UBIFS, so I'm not
> sure we should bother about other cases now, but I'd still like to have
> your opinions.
Yeah, it does what nand_base.c does and helps fulfill UBIFS' assumptions.

I'm not sure whether a complex LRU cache is worth it for the performance
gain. But we'll see. :-)

Thanks,
//richard
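
For readers following the archive, here is a rough, self-contained sketch of
the kind of direct-mapped, PEB-keyed single-page read cache discussed above.
It is not the code from the RFC patch; all names (ubi_rcache, ubi_rcache_read,
rcache_fill_page, ...) and the userspace-style types are invented for
illustration, the 16KiB page size is only taken from the benchmarked hardware,
and locking as well as invalidation on write/erase are left out.

/*
 * Illustrative sketch only -- not the RFC patch. A direct-mapped read
 * cache with 61 lines, selected by PEB number, each line holding one
 * NAND page. Reads that fit inside a single page are served from (or
 * refill) the matching line; page-crossing reads bypass the cache and
 * only bump the no_query counter.
 */
#include <string.h>

#define UBI_RCACHE_LINES	61	/* as in the RFC; tune via benchmarks */
#define UBI_RCACHE_PAGE_SIZE	16384	/* 16KiB, matching the benchmarked NAND */

struct ubi_rcache_line {
	int pnum;	/* cached PEB number, -1 if the line is empty */
	int page;	/* page index inside that PEB */
	unsigned char buf[UBI_RCACHE_PAGE_SIZE];
};

struct ubi_rcache {
	struct ubi_rcache_line lines[UBI_RCACHE_LINES];
	unsigned long hits, misses, no_query;	/* the debugfs counters */
};

/* Hypothetical backend hook; in the kernel this would end up in mtd_read(). */
static int rcache_fill_page(int pnum, int page, unsigned char *buf)
{
	(void)pnum; (void)page;
	memset(buf, 0xff, UBI_RCACHE_PAGE_SIZE);	/* stub: pretend erased flash */
	return 0;
}

void ubi_rcache_init(struct ubi_rcache *rc)
{
	int i;

	memset(rc, 0, sizeof(*rc));
	for (i = 0; i < UBI_RCACHE_LINES; i++)
		rc->lines[i].pnum = -1;
}

/*
 * Try to serve a read from the cache. Returns 0 on success, a negative
 * error from the backend, or 1 if the request cannot be cached and the
 * caller should fall back to a plain uncached read.
 */
int ubi_rcache_read(struct ubi_rcache *rc, int pnum, int offset, int len,
		    unsigned char *dst)
{
	int page = offset / UBI_RCACHE_PAGE_SIZE;
	int in_page = offset % UBI_RCACHE_PAGE_SIZE;
	struct ubi_rcache_line *line = &rc->lines[pnum % UBI_RCACHE_LINES];
	int err;

	if (in_page + len > UBI_RCACHE_PAGE_SIZE) {	/* larger than a page or crossing a boundary */
		rc->no_query++;
		return 1;
	}

	if (line->pnum != pnum || line->page != page) {
		rc->misses++;
		err = rcache_fill_page(pnum, page, line->buf);
		if (err)
			return err;
		line->pnum = pnum;
		line->page = page;
	} else {
		rc->hits++;
	}

	memcpy(dst, line->buf + in_page, len);
	return 0;
}

The "pnum % UBI_RCACHE_LINES" selection is what the thread calls the
'simple PEB hash + no collision' approach: two PEBs mapping to the same
line simply evict each other.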
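
For contrast, a similarly hypothetical sketch of the LRU alternative Boris
suggests: lines are keyed by (PEB, page) rather than selected by a PEB hash,
every lookup scans the small array, and on a miss the least recently used
line is evicted. Names are again invented; it reuses UBI_RCACHE_LINES,
UBI_RCACHE_PAGE_SIZE and rcache_fill_page() from the sketch above and assumes
lines start out with pnum == -1, as set by a similar init helper.

/*
 * Hypothetical LRU variant: each line carries a last-use stamp; a hit
 * refreshes the stamp, a miss refills the line with the oldest stamp.
 * Scanning a handful of entries is still far cheaper than re-reading a
 * NAND page.
 */
struct ubi_lru_line {
	int pnum;			/* -1 if the line is empty */
	int page;
	unsigned long last_use;
	unsigned char buf[UBI_RCACHE_PAGE_SIZE];
};

struct ubi_lru_cache {
	struct ubi_lru_line lines[UBI_RCACHE_LINES];
	unsigned long clock;		/* incremented on every lookup */
	unsigned long hits, misses;
};

/* Return the line holding (pnum, page), refilling the LRU line on a miss. */
struct ubi_lru_line *ubi_lru_get(struct ubi_lru_cache *rc, int pnum, int page)
{
	struct ubi_lru_line *victim = &rc->lines[0];
	int i;

	rc->clock++;
	for (i = 0; i < UBI_RCACHE_LINES; i++) {
		struct ubi_lru_line *line = &rc->lines[i];

		if (line->pnum == pnum && line->page == page) {
			rc->hits++;
			line->last_use = rc->clock;
			return line;
		}
		if (line->last_use < victim->last_use)
			victim = line;	/* remember the oldest line */
	}

	rc->misses++;
	if (rcache_fill_page(pnum, page, victim->buf))
		return NULL;
	victim->pnum = pnum;
	victim->page = page;
	victim->last_use = rc->clock;
	return victim;
}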