public inbox for u-boot@lists.denx.de
From: Marek Vasut <marex@denx.de>
To: u-boot@lists.denx.de
Subject: [U-Boot] [PATCH 6/9] CACHE: nand read/write: Test if start address is aligned
Date: Tue, 26 Jun 2012 22:39:08 +0200	[thread overview]
Message-ID: <201206262239.08804.marex@denx.de> (raw)
In-Reply-To: <4FEA0C90.7060606@freescale.com>

Dear Scott Wood,

> On 06/25/2012 08:33 PM, Marek Vasut wrote:
> > Dear Scott Wood,
> > 
> >> On 06/25/2012 06:37 PM, Marek Vasut wrote:
> >>> Dear Scott Wood,
> >>> 
> >>>> On 06/24/2012 07:17 PM, Marek Vasut wrote:
> >>>>> but that involves a lot of copying and therefore degrades performance
> >>>>> rapidly. Therefore disallow this possibility of unaligned load
> >>>>> address altogether if data cache is on.
> >>>> 
> >>>> How about use the bounce buffer only if the address is misaligned?
> >>> 
> >>> Not happening, bounce buffer is bullshit,
> >> 
> >> Hacking up the common frontend with a new limitation because you can't
> >> be bothered to fix your drivers is bullshit.
> > 
> > The drivers are not broken, they have hardware limitations.
> 
> They're broken because they ignore those limitations.
> 
> > And checking for
> > those has to be done as early as possible.
> 
> Why?

Why keep the overhead?

> > And it's not a new common frontend!
> 
> No, it's a compatibility-breaking change to the existing common frontend.

Well, those are corner cases; it's not like people will start hitting it en masse.

I agree it should somehow be platform- or even CPU-specific.

> >>> It's like driving a car in the wrong lane. Sure, you can do it, but
> >>> it'll eventually have some consequences. And using a bounce buffer is
> >>> like driving a tank in the wrong lane ...
> >> 
> >> Using a bounce buffer is like parking your car before going into the
> >> building, rather than insisting the building's hallways be paved.
> > 
> > The other is obviously faster, more comfortable and lets you carry more
> > stuff at once.
> 
> Then you end up needing buildings to be many times as large to give
> every cubicle an adjacent parking spot, maneuvering room, etc.  You'll
> be breathing fumes all day, and it'll be a lot less comfortable to get
> even across the hallway without using a car, etc.  Communication between
> coworkers would be limited to horns and obscene gestures. :-)

Ok, this has gone waaay out of hand here :-)

> > And if you drive a truck, you can dump a lot of payload instead of
> > carrying it back and forth from the building. That's why there's a
> > special garage for trucks possibly with cargo elevators etc.
> 
> Yes, it's called targeted optimization rather than premature optimization.
> 
> >>>> The
> >>>> corrective action a user has to take is the same as with this patch,
> >>>> except for an additional option of living with the slight performance
> >>>> penalty.
> >>> 
> >>> Slight is very weak word here.
> >> 
> >> Prove me wrong with benchmarks.
> > 
> > Well, copying data back and forth is tremendous overhead. You don't need
> > a benchmark to calculate something like this:
> > 
> > 133MHz SDRAM (pumped) gives you what ... 133 Mb/s throughput
> 
> You're saying you get only a little more bandwidth from memory than
> you'd get from a 100 Mb/s ethernet port?  Come on.  Data buses are not
> one bit wide.

Good point; it was too late and I forgot to factor that in.

> And how fast can you pull data out of a NAND chip, even with DMA?
> 
> > (now if it's DDR, dual/quad pumped, that doesn't give you any more
> > advantage
> 
> So such things were implemented for fun?
> 
> > since you have to: send address, read the data, send address, write the
> > data ...
> 
> What about bursts?  I'm pretty sure you don't have to send the address
> separately for every single byte.

If you do memcpy? You only have registers; sure, you can optimize it, but on 
Intel, for example, you don't have many of those.

> > this is expensive ... without data cache on, even more so)
> 
> Why do we care about "without data cache"?  You don't need the bounce
> buffer in that case.

Correct, it's expensive in both cases though.

> > Now consider you do it via really dumb memcpy, what happens:
> 
> It looks like ARM U-Boot has an optimized memcpy.
> 
> > 1) You need to read the data into register
> > 1a) Send address
> > 1b) Read the data into register
> > 2) You need to write the data to a new location
> > 2a) Send address
> > 2b) Write the data into the memory
> > 
> > In the meantime, you get some refresh cycles etc. Now, if you take read
> > and write in 1 time unit and "send address" in 0.5 time unit (this gives
> > total 3 time units per one loop) and consider you're not doing sustained
> > read/write, you should see you'll be able to copy at speed of about
> > 133/3 ~= 40Mb/s
> > 
> > If you want to load 3MB kernel at 40Mb/s onto an unaligned address via
> > DMA, the DMA will deploy it via sustained write, that'll be at 10MB/s,
> > therefore in 300ms. But the subsequent copy will take another 600ms.
> 
> On a p5020ds, using NAND hardware that doesn't do DMA at all, I'm able
> to load a 3MiB image from NAND in around 300-400 ms.  This is with using
> memcpy_fromio() on an uncached hardware buffer.

Blazing, almost half a second ... but everyone these days wants a faster boot; 
without the memcpy, it can go down to 100 ms or even less! And this kind of 
limitation is not something that'd inconvenience anyone.

> Again, I'm not saying that bounce buffers are always negligible overhead
> -- just that I doubt NAND is fast enough that it makes a huge difference
> in this specific case.

It does make a difference. But correct, thinking about it -- implementing a generic 
bounce buffer for cache-incoherent hardware might be a better way to go.

> >>>> How often does this actually happen?  How much does it
> >>>> actually slow things down compared to the speed of the NAND chip?
> >>> 
> >>> If the user is dumb, always. But if you tell the user how to milk the
> >>> most of the hardware, he'll be happier.
> >> 
> >> So, if you use bounce buffers conditionally (based on whether the
> >> address is misaligned), there's no impact except to "dumb" users, and
> >> for those users they would merely get a performance degradation rather
> >> than breakage.  How is this "bullshit"?
> > 
> > Correct, but users will complain if they get a subpar performance.
> 
> If you expend the minimal effort required to make the bounce buffer
> usage conditional on the address actually being misaligned, the only
> users that will see subpar performance are those who would see breakage
> with your approach.  Users will complain if they see breakage even more
> than when they see subpar performance.
> 
> If the issue is educating the user to avoid the performance hit
> (regardless of magnitude), and you care enough, have the driver print a
> warning (not error) message the first time it needs to use a bounce buffer.

OK, on second thought, you're right. Let's do it the same way we did it with 
mmc.

> -Scott

Best regards,
Marek Vasut

Thread overview: 36+ messages
2012-06-25  0:17 [U-Boot] [PATCH 0/9] CACHE: Finishing touches Marek Vasut
2012-06-25  0:17 ` [U-Boot] [PATCH 1/9] COMMON: Add __stringify() function Marek Vasut
2012-06-25  0:17 ` [U-Boot] [PATCH 2/9] CACHE: Add cache_aligned() macro Marek Vasut
2012-06-25 21:12   ` Scott Wood
2012-06-25 23:30     ` Marek Vasut
2012-07-07  3:00       ` Aneesh V
2012-06-25  0:17 ` [U-Boot] [PATCH 3/9] CACHE: ext2load: Test if start address is aligned Marek Vasut
2012-06-25  0:17 ` [U-Boot] [PATCH 4/9] CACHE: fatload: " Marek Vasut
2012-06-25  0:17 ` [U-Boot] [PATCH 5/9] CACHE: mmc read/write: " Marek Vasut
2012-06-25  0:17 ` [U-Boot] [PATCH 6/9] CACHE: nand " Marek Vasut
2012-06-25 16:58   ` Scott Wood
2012-06-25 18:43     ` Tom Rini
2012-06-25 20:08       ` Scott Wood
2012-06-25 20:48         ` Tom Rini
2012-06-25 21:17           ` Scott Wood
2012-06-25 21:22             ` Tom Rini
2012-06-25 23:42             ` Marek Vasut
2012-06-26  0:37               ` Scott Wood
2012-06-26  1:16                 ` Marek Vasut
2012-06-26 19:38                   ` Scott Wood
2012-06-25 23:38       ` Marek Vasut
2012-06-25 23:37     ` Marek Vasut
2012-06-25 23:57       ` Scott Wood
2012-06-26  1:33         ` Marek Vasut
2012-06-26 19:25           ` Scott Wood
2012-06-26 20:39             ` Marek Vasut [this message]
2012-07-07  3:05   ` Aneesh V
2012-06-25  0:17 ` [U-Boot] [PATCH 7/9] CACHE: net: " Marek Vasut
2012-06-25 18:05   ` Joe Hershberger
2012-06-25 23:16     ` Marek Vasut
2012-06-25  0:17 ` [U-Boot] [PATCH 8/9] CACHE: net: asix: Fix asix driver to work with data cache on Marek Vasut
2012-06-25 18:07   ` Joe Hershberger
2012-06-25 23:16     ` Marek Vasut
2012-07-06 23:09     ` Marek Vasut
2012-07-06 23:16       ` Marek Vasut
2012-06-25  0:17 ` [U-Boot] [PATCH 9/9] M28EVK: Enable instruction and data cache Marek Vasut
