From mboxrd@z Thu Jan  1 00:00:00 1970
From: Eric Nelson
Date: Mon, 21 Mar 2016 10:56:07 -0700
Subject: [U-Boot] [RFC V2 PATCH 0/3] Add cache for block devices
In-Reply-To: <56F02617.6010700@denx.de>
References: <1458524727-4643-1-git-send-email-eric@nelint.com>
 <56EF559D.3040608@denx.de> <56EFFBAD.4070700@cox.net>
 <56F02617.6010700@denx.de>
Message-ID: <56F035B7.6090408@nelint.com>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: u-boot@lists.denx.de

Hi Marek,

On 03/21/2016 09:49 AM, Marek Vasut wrote:
> On 03/21/2016 02:48 PM, Eric Nelson wrote:
>> On 03/20/2016 06:59 PM, Marek Vasut wrote:
>>> On 03/21/2016 02:45 AM, Eric Nelson wrote:
>>>> Here's a more full-featured implementation of a cache for block
>>>> devices that uses a small linked list of cache blocks.
>>>
>>> Why do you use a linked list? You have four entries, so you could
>>> just as well use a fixed array. Maybe you should implement an
>>> adaptive cache that would use the unpopulated malloc area and hash
>>> the sector number(s) into that area?
>>>
>>
>> I was looking for a simple implementation that would allow tweaking
>> of the max entries/size per entry.
>>
>> We could get higher performance through hashing, but with such a
>> small cache, it's probably not worth the extra code.
>
> The hashing function can be a simple modulo on the sector number ;-)
> That'd be less code than linked lists.
>

I'm not seeing how. I'm going to look first at a better way to
integrate than the approach taken in patch 3.

>> Using an array and re-allocating on changes to the max entries
>> variable is feasible, but I think it would be slightly more code.
>
> That would indeed be more code.
>
>>>> Experimentation loading a 4.5 MiB kernel from the root directory of
>>>> a FAT filesystem shows that a single cache entry of a single
>>>> block is the only
>>>
>>> only ... what? This is where things started to get interesting, but
>>> you leave us hanging :)
>>>
>>
>> Oops.
>>
>> ...
>> I was planning on re-wording that.
>>
>> My testing showed no gain in performance (additional cache hits) past
>> a single entry of a single block. This was done on a small (32 MiB)
>> partition with a small number of files (~10), and only a single
>> read is skipped.
>
> I'd kinda expect that indeed.
>

Yeah, and the single-digit-ms improvement isn't worth much.

>> => blkc c ; blkc i ; blkc 0 0 ;
>> changed to max of 0 entries of 0 blocks each
>> => load mmc 0 10008000 /zImage
>> reading /zImage
>> 4955304 bytes read in 247 ms (19.1 MiB/s)
>> => blkc
>> block cache:
>> 0 hits
>> 7 misses
>> 0 entries in cache
>> trace off
>> max blocks/entry 0
>> max entries 0
>> => blkc c ; blkc i ; blkc 1 1 ;
>> changed to max of 1 entries of 1 blocks each
>> => load mmc 0 10008000 /zImage
>> reading /zImage
>> 4955304 bytes read in 243 ms (19.4 MiB/s)
>> => blkc
>> block cache:
>> 1 hits
>> 6 misses
>> 1 entries in cache
>> trace off
>> max blocks/entry 1
>> max entries 1
>>
>> I don't believe that enabling the cache is worth the extra code
>> for this use case.
>>
>> By comparison, a load of a 150 MiB compressed disk image from
>> ext4 showed a 30x speedup with the V1 patch (single block,
>> single entry), from ~150 s to 5 s.
>>
>> Without some form of cache, the 150 s was long enough to make
>> a user (me) think something was broken.
>
> I'm obviously loving this improvement.
>

Glad to hear it.