From mboxrd@z Thu Jan  1 00:00:00 1970
From: Eric Nelson
Date: Mon, 21 Mar 2016 10:56:07 -0700
Subject: [U-Boot] [RFC V2 PATCH 0/3] Add cache for block devices
In-Reply-To: <56F02617.6010700@denx.de>
References: <1458524727-4643-1-git-send-email-eric@nelint.com>
 <56EF559D.3040608@denx.de> <56EFFBAD.4070700@cox.net>
 <56F02617.6010700@denx.de>
Message-ID: <56F035B7.6090408@nelint.com>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: u-boot@lists.denx.de

Hi Marek,

On 03/21/2016 09:49 AM, Marek Vasut wrote:
> On 03/21/2016 02:48 PM, Eric Nelson wrote:
>> On 03/20/2016 06:59 PM, Marek Vasut wrote:
>>> On 03/21/2016 02:45 AM, Eric Nelson wrote:
>>>> Here's a more full-featured implementation of a cache for block
>>>> devices that uses a small linked list of cache blocks.
>>>
>>> Why do you use a linked list? You have four entries, so you could
>>> just as well use a fixed array. Maybe you should implement an
>>> adaptive cache that would use the unpopulated malloc area and hash
>>> the sector number(s) into that area?
>>>
>>
>> I was looking for a simple implementation that would allow tweaking
>> of the max entries/size per entry.
>>
>> We could get higher performance through hashing, but with such a
>> small cache, it's probably not worth the extra code.
>
> The hashing function can be a simple modulo on the sector number ;-)
> That'd be less code than linked lists.
>

I'm not seeing how. I'm going to look first at a better way to
integrate than the approach taken in patch 3.

>> Using an array and re-allocating on changes to the max entries
>> variable is feasible, but I think it would be slightly more code.
>
> That would indeed be more code.
>
>>>> Experimentation loading a 4.5 MiB kernel from the root directory of
>>>> a FAT filesystem shows that a single cache entry of a single
>>>> block is the only
>>>
>>> only ... what? This is where things started to get interesting, but
>>> you leave us hanging :)
>>>
>>
>> Oops.
>>
>> ...
>> I was planning on re-wording that.
>>
>> My testing showed no gain in performance (additional cache hits) past
>> a single entry of a single block. This was done on a small (32 MiB)
>> partition with a small number of files (~10), and only a single
>> read is skipped.
>
> I'd kinda expect that indeed.
>

Yeah, and the single-digit-ms improvement isn't worth much.

>> => blkc c ; blkc i ; blkc 0 0 ;
>> changed to max of 0 entries of 0 blocks each
>> => load mmc 0 10008000 /zImage
>> reading /zImage
>> 4955304 bytes read in 247 ms (19.1 MiB/s)
>> => blkc
>> block cache:
>> 0 hits
>> 7 misses
>> 0 entries in cache
>> trace off
>> max blocks/entry 0
>> max entries 0
>> => blkc c ; blkc i ; blkc 1 1 ;
>> changed to max of 1 entries of 1 blocks each
>> => load mmc 0 10008000 /zImage
>> reading /zImage
>> 4955304 bytes read in 243 ms (19.4 MiB/s)
>> => blkc
>> block cache:
>> 1 hits
>> 6 misses
>> 1 entries in cache
>> trace off
>> max blocks/entry 1
>> max entries 1
>>
>> I don't believe that enabling the cache is worth the extra code
>> for this use case.
>>
>> By comparison, a load of a 150 MiB compressed disk image from
>> ext4 showed a 30x speedup with the V1 patch (single block,
>> single entry), from ~150 s to 5 s.
>>
>> Without some form of cache, the 150 s was long enough to make
>> a user (me) think something was broken.
>
> I'm obviously loving this improvement.
>

Glad to hear it.