All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marek Vasut <marex@denx.de>
To: u-boot@lists.denx.de
Subject: [U-Boot] [PATCH 6/9] CACHE: nand read/write: Test if start address is aligned
Date: Tue, 26 Jun 2012 22:39:08 +0200	[thread overview]
Message-ID: <201206262239.08804.marex@denx.de> (raw)
In-Reply-To: <4FEA0C90.7060606@freescale.com>

Dear Scott Wood,

> On 06/25/2012 08:33 PM, Marek Vasut wrote:
> > Dear Scott Wood,
> > 
> >> On 06/25/2012 06:37 PM, Marek Vasut wrote:
> >>> Dear Scott Wood,
> >>> 
> >>>> On 06/24/2012 07:17 PM, Marek Vasut wrote:
> >>>>> but that involves a lot of copying and therefore degrades performance
> >>>>> rapidly. Therefore disallow this possibility of unaligned load
> >>>>> address altogether if data cache is on.
> >>>> 
> >>>> How about use the bounce buffer only if the address is misaligned?
> >>> 
> >>> Not happening, bounce buffer is bullshit,
> >> 
> >> Hacking up the common frontend with a new limitation because you can't
> >> be bothered to fix your drivers is bullshit.
> > 
> > The drivers are not broken, they have hardware limitations.
> 
> They're broken because they ignore those limitations.
> 
> > And checking for
> > those has to be done as early as possible.
> 
> Why?

Why keep an overhead.

> > And it's not a new common frontend!
> 
> No, it's a compatibility-breaking change to the existing common frontend.

Well, those are corner cases, it's not like the people will start hitting it en-
masse.

I agree it should be somehow platform or even CPU specific.

> >>> It's like driving a car in the wrong lane. Sure, you can do it, but
> >>> it'll eventually have some consequences. And using a bounce buffer is
> >>> like driving a tank in the wrong lane ...
> >> 
> >> Using a bounce buffer is like parking your car before going into the
> >> building, rather than insisting the building's hallways be paved.
> > 
> > The other is obviously faster, more comfortable and lets you carry more
> > stuff at once.
> 
> Then you end up needing buildings to be many times as large to give
> every cubicle an adjacent parking spot, maneuvering room, etc.  You'll
> be breathing fumes all day, and it'll be a lot less comfortable to get
> even across the hallway without using a car, etc.  Communication between
> coworkers would be limited to horns and obscene gestures. :-)

Ok, this has gone waaay out of hand here :-)

> > And if you drive a truck, you can dump a lot of payload instead of
> > carrying it back and forth from the building. That's why there's a
> > special garage for trucks possibly with cargo elevators etc.
> 
> Yes, it's called targeted optimization rather than premature optimization.
> 
> >>>> The
> >>>> corrective action a user has to take is the same as with this patch,
> >>>> except for an additional option of living with the slight performance
> >>>> penalty.
> >>> 
> >>> Slight is very weak word here.
> >> 
> >> Prove me wrong with benchmarks.
> > 
> > Well, copying data back and forth is tremendous overhead. You don't need
> > a benchmark to calculate something like this:
> > 
> > 133MHz SDRAM (pumped) gives you what ... 133 Mb/s throughput
> 
> You're saying you get only a little more bandwidth from memory than
> you'd get from a 100 Mb/s ethernet port?  Come on.  Data buses are not
> one bit wide.

Good point, it was too late and I forgot to count that in.

> And how fast can you pull data out of a NAND chip, even with DMA?
> 
> > (now if it's DDR, dual/quad pumped, that doesn't give you any more
> > advantage
> 
> So such things were implemented for fun?
> 
> > since you have to: send address, read the data, send address, write the
> > data ...
> 
> What about bursts?  I'm pretty sure you don't have to send the address
> separately for every single byte.

If you do memcpy? You only have registers, sure, you can optimize it, but on 
intel for example, you don't have many of those.

> > this is expensive ... without data cache on, even more so)
> 
> Why do we care about "without data cache"?  You don't need the bounce
> buffer in that case.

Correct, it's expensive in both cases though.

> > Now consider you do it via really dump memcpy, what happens:
> It looks like ARM U-Boot has an optimized memcpy.
> 
> > 1) You need to read the data into register
> > 1a) Send address
> > 1b) Read the data into register
> > 2) You need to write the data to a new location
> > 2a) Send address
> > 2b) Write the data into the memory
> > 
> > In the meantime, you get some refresh cycles etc. Now, if you take read
> > and write in 1 time unit and "send address" in 0.5 time unit (this gives
> > total 3 time units per one loop) and consider you're not doing sustained
> > read/write, you should see you'll be able to copy at speed of about
> > 133/3 ~= 40Mb/s
> > 
> > If you want to load 3MB kernel at 40Mb/s onto an unaligned address via
> > DMA, the DMA will deploy it via sustained write, that'll be at 10MB/s,
> > therefore in 300ms. But the subsequent copy will take another 600ms.
> 
> On a p5020ds, using NAND hardware that doesn't do DMA at all, I'm able
> to load a 3MiB image from NAND in around 300-400 ms.  This is with using
> memcpy_fromio() on an uncached hardware buffer.

blazing almost half a second ... but everyone these days wants a faster boot, 
without memcpy, it can go down to 100ms or even less! And this kind of 
limitation is not something that'd inconvenience anyone.

> Again, I'm not saying that bounce buffers are always negligible overhead
> -- just that I doubt NAND is fast enough that it makes a huge difference
> in this specific case.

It does make a difference. Correct, thinking about it -- implementing a generic 
bounce buffer for cache-impotent hardware might be a better way to go.

> >>>> How often does this actually happen?  How much does it
> >>>> actually slow things down compared to the speed of the NAND chip?
> >>> 
> >>> If the user is dumb, always. But if you tell the user how to milk the
> >>> most of the hardware, he'll be happier.
> >> 
> >> So, if you use bounce buffers conditionally (based on whether the
> >> address is misaligned), there's no impact except to "dumb" users, and
> >> for those users they would merely get a performance degradation rather
> >> than breakage.  How is this "bullshit"?
> > 
> > Correct, but users will complain if they get a subpar performance.
> 
> If you expend the minimal effort required to make the bounce buffer
> usage conditional on the address actually being misaligned, the only
> users that will see subpar performance are those who would see breakage
> with your approach.  Users will complain if they see breakage even more
> than when the see subpar performance.
> 
> If the issue is educating the user to avoid the performance hit
> (regardless of magnitude), and you care enough, have the driver print a
> warning (not error) message the first time it needs to use a bounce buffer.

Ok, on a second thought, you're right. Let's do it the same way we did it with 
mmc.

> -Scott

Best regards,
Marek Vasut

  reply	other threads:[~2012-06-26 20:39 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-25  0:17 [U-Boot] [PATCH 0/9] CACHE: Finishing touches Marek Vasut
2012-06-25  0:17 ` [U-Boot] [PATCH 1/9] COMMON: Add __stringify() function Marek Vasut
2012-06-25  0:17 ` [U-Boot] [PATCH 2/9] CACHE: Add cache_aligned() macro Marek Vasut
2012-06-25 21:12   ` Scott Wood
2012-06-25 23:30     ` Marek Vasut
2012-07-07  3:00       ` Aneesh V
2012-06-25  0:17 ` [U-Boot] [PATCH 3/9] CACHE: ext2load: Test if start address is aligned Marek Vasut
2012-06-25  0:17 ` [U-Boot] [PATCH 4/9] CACHE: fatload: " Marek Vasut
2012-06-25  0:17 ` [U-Boot] [PATCH 5/9] CACHE: mmc read/write: " Marek Vasut
2012-06-25  0:17 ` [U-Boot] [PATCH 6/9] CACHE: nand " Marek Vasut
2012-06-25 16:58   ` Scott Wood
2012-06-25 18:43     ` Tom Rini
2012-06-25 20:08       ` Scott Wood
2012-06-25 20:48         ` Tom Rini
2012-06-25 21:17           ` Scott Wood
2012-06-25 21:22             ` Tom Rini
2012-06-25 23:42             ` Marek Vasut
2012-06-26  0:37               ` Scott Wood
2012-06-26  1:16                 ` Marek Vasut
2012-06-26 19:38                   ` Scott Wood
2012-06-25 23:38       ` Marek Vasut
2012-06-25 23:37     ` Marek Vasut
2012-06-25 23:57       ` Scott Wood
2012-06-26  1:33         ` Marek Vasut
2012-06-26 19:25           ` Scott Wood
2012-06-26 20:39             ` Marek Vasut [this message]
2012-07-07  3:05   ` Aneesh V
2012-06-25  0:17 ` [U-Boot] [PATCH 7/9] CACHE: net: " Marek Vasut
2012-06-25 18:05   ` Joe Hershberger
2012-06-25 23:16     ` Marek Vasut
2012-06-25  0:17 ` [U-Boot] [PATCH 8/9] CACHE: net: asix: Fix asix driver to work with data cache on Marek Vasut
2012-06-25 18:07   ` Joe Hershberger
2012-06-25 23:16     ` Marek Vasut
2012-07-06 23:09     ` Marek Vasut
2012-07-06 23:16       ` Marek Vasut
2012-06-25  0:17 ` [U-Boot] [PATCH 9/9] M28EVK: Enable instruction and data cache Marek Vasut

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201206262239.08804.marex@denx.de \
    --to=marex@denx.de \
    --cc=u-boot@lists.denx.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.