From mboxrd@z Thu Jan  1 00:00:00 1970
From: Marek Vasut
Date: Mon, 21 Mar 2016 17:49:27 +0100
Subject: [U-Boot] [RFC V2 PATCH 0/3] Add cache for block devices
In-Reply-To: <56EFFBAD.4070700@cox.net>
References: <1458524727-4643-1-git-send-email-eric@nelint.com>
 <56EF559D.3040608@denx.de> <56EFFBAD.4070700@cox.net>
Message-ID: <56F02617.6010700@denx.de>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: u-boot@lists.denx.de

On 03/21/2016 02:48 PM, Eric Nelson wrote:
> Hi Marek,
>
> On 03/20/2016 06:59 PM, Marek Vasut wrote:
>> On 03/21/2016 02:45 AM, Eric Nelson wrote:
>>> Here's a more full-featured implementation of a cache for block
>>> devices that uses a small linked list of cache blocks.
>>
>> Why do you use a linked list? You have four entries, so you could
>> just as well use a fixed array. Maybe you should implement an
>> adaptive cache which would use the unpopulated malloc area and hash
>> the sector number(s) into that area?
>>
>
> I was looking for a simple implementation that would allow tweaking of
> the max entries/size per entry.
>
> We could get higher performance through hashing, but with such a
> small cache, it's probably not worth the extra code.

The hashing function can be a simple modulo on the sector number ;-)
That'd be less code than linked lists.

> Using an array and re-allocating on changes to the max entries variable
> is feasible, but I think it would be slightly more code.

That would indeed be more code.

>>> Experimentation loading a 4.5 MiB kernel from the root directory of
>>> a FAT filesystem shows that a single cache entry of a single
>>> block is the only
>>
>> only ... what? This is where things started to get interesting, but
>> you leave us hanging :)
>>
>
> Oops.
>
> ... I was planning on re-wording that.
>
> My testing showed no gain in performance (additional cache hits) past a
> single entry of a single block.
> This was done on a small (32 MiB)
> partition with a small number of files (~10), and only a single
> read was skipped.

I'd kinda expect that, indeed.

> => blkc c ; blkc i ; blkc 0 0 ;
> changed to max of 0 entries of 0 blocks each
> => load mmc 0 10008000 /zImage
> reading /zImage
> 4955304 bytes read in 247 ms (19.1 MiB/s)
> => blkc
> block cache:
>   0 hits
>   7 misses
>   0 entries in cache
>   trace off
>   max blocks/entry 0
>   max entries 0
> => blkc c ; blkc i ; blkc 1 1 ;
> changed to max of 1 entries of 1 blocks each
> => load mmc 0 10008000 /zImage
> reading /zImage
> 4955304 bytes read in 243 ms (19.4 MiB/s)
> => blkc
> block cache:
>   1 hits
>   6 misses
>   1 entries in cache
>   trace off
>   max blocks/entry 1
>   max entries 1
>
> I don't believe that enabling the cache is worth the extra code
> for this use case.
>
> By comparison, loading a 150 MiB compressed disk image from
> ext4 showed a 30x speedup with the V1 patch (single block,
> single entry), from ~150 s to 5 s.
>
> Without some form of cache, the 150 s was long enough to make
> a user (me) think something was broken.

I'm obviously loving this improvement.

Best regards,
Marek Vasut
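
P.S. The "simple modulo on the sector number" scheme could look roughly
like the sketch below: a direct-mapped cache where the sector number
modulo the slot count picks the slot. This is only an illustration of
the idea; the names (`blkcache_read`, `blkcache_fill`), the slot count,
and the block size are assumptions, not U-Boot's actual block-cache
interface.

```c
#include <stdint.h>
#include <string.h>

#define BLK_SZ      512	/* assumed bytes per block */
#define CACHE_SLOTS 4	/* hypothetical slot count */

typedef uint64_t lbaint_t;	/* stand-in for U-Boot's lbaint_t */

struct blk_slot {
	lbaint_t blknr;		/* sector number held in this slot */
	int valid;		/* non-zero once the slot holds data */
	uint8_t data[BLK_SZ];
};

static struct blk_slot cache[CACHE_SLOTS];

/* Direct-mapped lookup: blknr % CACHE_SLOTS selects the slot. */
static uint8_t *blkcache_read(lbaint_t blknr)
{
	struct blk_slot *s = &cache[blknr % CACHE_SLOTS];

	if (s->valid && s->blknr == blknr)
		return s->data;	/* hit */
	return NULL;		/* miss: caller reads the device */
}

/* Fill (or overwrite) the slot for blknr after a device read. */
static void blkcache_fill(lbaint_t blknr, const uint8_t *data)
{
	struct blk_slot *s = &cache[blknr % CACHE_SLOTS];

	s->blknr = blknr;
	s->valid = 1;
	memcpy(s->data, data, BLK_SZ);
}
```

Two sectors that collide modulo the slot count simply evict each other,
so there is no list walking and no allocation at all.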