* [U-Boot] [PATCH 1/2] part_dos: allocate cacheline aligned buffers
@ 2012-04-24 7:53 Thierry Reding
2012-04-24 7:53 ` [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines Thierry Reding
2012-04-24 18:31 ` [U-Boot] [PATCH 1/2] part_dos: allocate cacheline aligned buffers Simon Glass
0 siblings, 2 replies; 21+ messages in thread
From: Thierry Reding @ 2012-04-24 7:53 UTC (permalink / raw)
To: u-boot
Some hardware requires the data buffers to be cacheline-aligned to make
sure DMA operations can be properly executed.
This patch uses the ALLOC_CACHE_ALIGN_BUFFER macro to allocate buffers
with proper alignment. The same was already done for EFI partitions in
commit f75dd58 "part_efi: dcache: allocate cacheline aligned buffers".
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
---
This fixes boot failures on Avionic Design Plutux and Medcom boards.
disk/part_dos.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/disk/part_dos.c b/disk/part_dos.c
index b5bcb37..c028aaf 100644
--- a/disk/part_dos.c
+++ b/disk/part_dos.c
@@ -87,7 +87,7 @@ static int test_block_type(unsigned char *buffer)
int test_part_dos (block_dev_desc_t *dev_desc)
{
- unsigned char buffer[dev_desc->blksz];
+ ALLOC_CACHE_ALIGN_BUFFER(unsigned char, buffer, dev_desc->blksz);
if ((dev_desc->block_read(dev_desc->dev, 0, 1, (ulong *) buffer) != 1) ||
(buffer[DOS_PART_MAGIC_OFFSET + 0] != 0x55) ||
@@ -102,7 +102,7 @@ int test_part_dos (block_dev_desc_t *dev_desc)
static void print_partition_extended (block_dev_desc_t *dev_desc, int ext_part_sector, int relative,
int part_num)
{
- unsigned char buffer[dev_desc->blksz];
+ ALLOC_CACHE_ALIGN_BUFFER(unsigned char, buffer, dev_desc->blksz);
dos_partition_t *pt;
int i;
@@ -166,7 +166,7 @@ static int get_partition_info_extended (block_dev_desc_t *dev_desc, int ext_part
int relative, int part_num,
int which_part, disk_partition_t *info)
{
- unsigned char buffer[dev_desc->blksz];
+ ALLOC_CACHE_ALIGN_BUFFER(unsigned char, buffer, dev_desc->blksz);
dos_partition_t *pt;
int i;
--
1.7.10
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines
2012-04-24 7:53 [U-Boot] [PATCH 1/2] part_dos: allocate cacheline aligned buffers Thierry Reding
@ 2012-04-24 7:53 ` Thierry Reding
2012-04-25 22:54 ` Mike Frysinger
2012-04-24 18:31 ` [U-Boot] [PATCH 1/2] part_dos: allocate cacheline aligned buffers Simon Glass
1 sibling, 1 reply; 21+ messages in thread
From: Thierry Reding @ 2012-04-24 7:53 UTC (permalink / raw)
To: u-boot
The MMC core sometimes reads buffers that are smaller than a complete
cacheline, for example when reading the SCR. In order to avoid a warning
from the ARM v7 cache handling code, this patch makes sure that complete
cachelines are flushed.
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
---
drivers/mmc/tegra2_mmc.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/mmc/tegra2_mmc.c b/drivers/mmc/tegra2_mmc.c
index fb8a57d..3615876 100644
--- a/drivers/mmc/tegra2_mmc.c
+++ b/drivers/mmc/tegra2_mmc.c
@@ -323,12 +323,13 @@ static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
}
writel(mask, &host->reg->norintsts);
if (data->flags & MMC_DATA_READ) {
+ ulong end = (ulong)data->dest +
+ roundup(data->blocks * data->blocksize,
+ ARCH_DMA_MINALIGN);
if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1))
printf("Warning: unaligned read from %p "
"may fail\n", data->dest);
- invalidate_dcache_range((ulong)data->dest,
- (ulong)data->dest +
- data->blocks * data->blocksize);
+ invalidate_dcache_range((ulong)data->dest, end);
}
}
--
1.7.10
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 1/2] part_dos: allocate cacheline aligned buffers
2012-04-24 7:53 [U-Boot] [PATCH 1/2] part_dos: allocate cacheline aligned buffers Thierry Reding
2012-04-24 7:53 ` [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines Thierry Reding
@ 2012-04-24 18:31 ` Simon Glass
2012-04-24 18:45 ` Thierry Reding
1 sibling, 1 reply; 21+ messages in thread
From: Simon Glass @ 2012-04-24 18:31 UTC (permalink / raw)
To: u-boot
On Tue, Apr 24, 2012 at 7:53 PM, Thierry Reding
<thierry.reding@avionic-design.de> wrote:
> Some hardware requires the data buffers to be cacheline-aligned to make
> sure DMA operations can be properly executed.
>
> This patch uses the ALLOC_CACHE_ALIGN_BUFFER macro to allocate buffers
> with proper alignment. The same was already done for EFI partitions in
> commit f75dd58 "part_efi: dcache: allocate cacheline aligned buffers".
>
> Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Acked-by: Simon Glass <sjg@chromium.org>
> ---
> This fixes boot failures on Avionic Design Plutux and Medcom boards.
>
> ?disk/part_dos.c | ? ?6 +++---
> ?1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/disk/part_dos.c b/disk/part_dos.c
> index b5bcb37..c028aaf 100644
> --- a/disk/part_dos.c
> +++ b/disk/part_dos.c
> @@ -87,7 +87,7 @@ static int test_block_type(unsigned char *buffer)
>
> ?int test_part_dos (block_dev_desc_t *dev_desc)
> ?{
> - ? ? ? unsigned char buffer[dev_desc->blksz];
> + ? ? ? ALLOC_CACHE_ALIGN_BUFFER(unsigned char, buffer, dev_desc->blksz);
>
> ? ? ? ?if ((dev_desc->block_read(dev_desc->dev, 0, 1, (ulong *) buffer) != 1) ||
> ? ? ? ? ? ?(buffer[DOS_PART_MAGIC_OFFSET + 0] != 0x55) ||
> @@ -102,7 +102,7 @@ int test_part_dos (block_dev_desc_t *dev_desc)
> ?static void print_partition_extended (block_dev_desc_t *dev_desc, int ext_part_sector, int relative,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? int part_num)
> ?{
> - ? ? ? unsigned char buffer[dev_desc->blksz];
> + ? ? ? ALLOC_CACHE_ALIGN_BUFFER(unsigned char, buffer, dev_desc->blksz);
> ? ? ? ?dos_partition_t *pt;
> ? ? ? ?int i;
>
> @@ -166,7 +166,7 @@ static int get_partition_info_extended (block_dev_desc_t *dev_desc, int ext_part
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? int relative, int part_num,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? int which_part, disk_partition_t *info)
> ?{
> - ? ? ? unsigned char buffer[dev_desc->blksz];
> + ? ? ? ALLOC_CACHE_ALIGN_BUFFER(unsigned char, buffer, dev_desc->blksz);
> ? ? ? ?dos_partition_t *pt;
> ? ? ? ?int i;
>
> --
> 1.7.10
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 1/2] part_dos: allocate cacheline aligned buffers
2012-04-24 18:31 ` [U-Boot] [PATCH 1/2] part_dos: allocate cacheline aligned buffers Simon Glass
@ 2012-04-24 18:45 ` Thierry Reding
0 siblings, 0 replies; 21+ messages in thread
From: Thierry Reding @ 2012-04-24 18:45 UTC (permalink / raw)
To: u-boot
* Simon Glass wrote:
> On Tue, Apr 24, 2012 at 7:53 PM, Thierry Reding
> <thierry.reding@avionic-design.de> wrote:
> > Some hardware requires the data buffers to be cacheline-aligned to make
> > sure DMA operations can be properly executed.
> >
> > This patch uses the ALLOC_CACHE_ALIGN_BUFFER macro to allocate buffers
> > with proper alignment. The same was already done for EFI partitions in
> > commit f75dd58 "part_efi: dcache: allocate cacheline aligned buffers".
> >
> > Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
>
> Acked-by: Simon Glass <sjg@chromium.org>
I just noticed that the same patch was already posted by Eric Nelson back in
March. Anatolij Gustschin just applied it to u-boot-staging/agust at denx.de.
Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: not available
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20120424/044dc5dd/attachment.pgp>
^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines
2012-04-24 7:53 ` [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines Thierry Reding
@ 2012-04-25 22:54 ` Mike Frysinger
2012-04-26 6:18 ` Thierry Reding
0 siblings, 1 reply; 21+ messages in thread
From: Mike Frysinger @ 2012-04-25 22:54 UTC (permalink / raw)
To: u-boot
On Tuesday 24 April 2012 03:53:44 Thierry Reding wrote:
> The MMC core sometimes reads buffers that are smaller than a complete
> cacheline, for example when reading the SCR. In order to avoid a warning
> from the ARM v7 cache handling code, this patch makes sure that complete
> cachelines are flushed.
this is still wrong. all you've done is bypass the error message without
addressing the underlying problem -- you're invalidating too much.
-mike
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: This is a digitally signed message part.
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20120425/536ec64c/attachment.pgp>
^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines
2012-04-25 22:54 ` Mike Frysinger
@ 2012-04-26 6:18 ` Thierry Reding
2012-04-26 10:16 ` Simon Glass
0 siblings, 1 reply; 21+ messages in thread
From: Thierry Reding @ 2012-04-26 6:18 UTC (permalink / raw)
To: u-boot
* Mike Frysinger wrote:
> On Tuesday 24 April 2012 03:53:44 Thierry Reding wrote:
> > The MMC core sometimes reads buffers that are smaller than a complete
> > cacheline, for example when reading the SCR. In order to avoid a warning
> > from the ARM v7 cache handling code, this patch makes sure that complete
> > cachelines are flushed.
>
> this is still wrong. all you've done is bypass the error message without
> addressing the underlying problem -- you're invalidating too much.
Reading 8 bytes is always less than a cacheline, so we don't have much
choice, do we? We could of course always read a whole cacheline even if only
8 bytes are requested, but does that have any advantage over reading 8 bytes
and then invalidating the cacheline?
Or maybe I'm missing the point.
Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: not available
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20120426/f3ea4a75/attachment.pgp>
^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines
2012-04-26 6:18 ` Thierry Reding
@ 2012-04-26 10:16 ` Simon Glass
2012-04-26 10:34 ` Thierry Reding
0 siblings, 1 reply; 21+ messages in thread
From: Simon Glass @ 2012-04-26 10:16 UTC (permalink / raw)
To: u-boot
Hi Thierry,
On Thu, Apr 26, 2012 at 6:18 PM, Thierry Reding <
thierry.reding@avionic-design.de> wrote:
> * Mike Frysinger wrote:
> > On Tuesday 24 April 2012 03:53:44 Thierry Reding wrote:
> > > The MMC core sometimes reads buffers that are smaller than a complete
> > > cacheline, for example when reading the SCR. In order to avoid a
> warning
> > > from the ARM v7 cache handling code, this patch makes sure that
> complete
> > > cachelines are flushed.
> >
> > this is still wrong. all you've done is bypass the error message without
> > addressing the underlying problem -- you're invalidating too much.
>
> Reading 8 bytes is always less than a cacheline, so we don't have much
> choice, do we? We could of course always read a whole cacheline even if
> only
> 8 bytes are requested, but does that have any advantage over reading 8
> bytes
> and then invalidating the cacheline?
>
> Or maybe I'm missing the point.
>
Well the point is that you can read 8 bytes but you still must use a buffer
that is large enough for DMA activity. So the caller must allocate a buffer
larger than 8 bytes. I worry that what you have done will just introduce
obscure bugs, since we will potentially invalidate stack variants (for
example) and lose their values.
With the case problems, we are trying to fix them at source (i.e. at the
higher level).
Regards,
Simon
>
> Thierry
>
> _______________________________________________
> U-Boot mailing list
> U-Boot at lists.denx.de
> http://lists.denx.de/mailman/listinfo/u-boot
>
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines
2012-04-26 10:16 ` Simon Glass
@ 2012-04-26 10:34 ` Thierry Reding
2012-04-27 5:29 ` Mike Frysinger
0 siblings, 1 reply; 21+ messages in thread
From: Thierry Reding @ 2012-04-26 10:34 UTC (permalink / raw)
To: u-boot
* Simon Glass wrote:
> Hi Thierry,
>
> On Thu, Apr 26, 2012 at 6:18 PM, Thierry Reding <
> thierry.reding at avionic-design.de> wrote:
>
> > * Mike Frysinger wrote:
> > > On Tuesday 24 April 2012 03:53:44 Thierry Reding wrote:
> > > > The MMC core sometimes reads buffers that are smaller than a complete
> > > > cacheline, for example when reading the SCR. In order to avoid a
> > warning
> > > > from the ARM v7 cache handling code, this patch makes sure that
> > complete
> > > > cachelines are flushed.
> > >
> > > this is still wrong. all you've done is bypass the error message without
> > > addressing the underlying problem -- you're invalidating too much.
> >
> > Reading 8 bytes is always less than a cacheline, so we don't have much
> > choice, do we? We could of course always read a whole cacheline even if
> > only
> > 8 bytes are requested, but does that have any advantage over reading 8
> > bytes
> > and then invalidating the cacheline?
> >
> > Or maybe I'm missing the point.
> >
>
> Well the point is that you can read 8 bytes but you still must use a buffer
> that is large enough for DMA activity. So the caller must allocate a buffer
> larger than 8 bytes.
The buffer used in this case is allocated with the ALLOC_CACHE_ALIGN_BUFFER()
macro, so the buffer size is not an issue. However, the mmc_data structure's
blocksize field is set to 8, which is the value that will eventually end up
in invalidate_dcache_range().
For reference, see sd_change_freq() in drivers/mmc/mmc.c.
> I worry that what you have done will just introduce obscure bugs, since we
> will potentially invalidate stack variants (for example) and lose their
> values.
>
> With the case problems, we are trying to fix them at source (i.e. at the
> higher level).
I understand. The question then becomes how to best fix the passed in size.
Always passing the size of a complete cacheline in the SEND_SCR command
doesn't seem like a better option because it may have other implications on
other hardware.
Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: not available
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20120426/778720f5/attachment.pgp>
^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines
2012-04-26 10:34 ` Thierry Reding
@ 2012-04-27 5:29 ` Mike Frysinger
2012-04-27 5:50 ` Simon Glass
2012-11-02 20:22 ` Stephen Warren
0 siblings, 2 replies; 21+ messages in thread
From: Mike Frysinger @ 2012-04-27 5:29 UTC (permalink / raw)
To: u-boot
On Thursday 26 April 2012 06:34:43 Thierry Reding wrote:
> For reference, see sd_change_freq() in drivers/mmc/mmc.c.
yes, that shows what we're talking about.
int sd_change_freq(struct mmc *mmc)
{
... stack vars ...
int err;
ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2);
struct mmc_data data;
int timeout;
... stack vars ...
...
data.dest = (char *)scr;
data.blocksize = 8;
data.blocks = 1;
data.flags = MMC_DATA_READ;
err = mmc_send_cmd(mmc, &cmd, &data);
...
static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
struct mmc_data *data)
{
...
if (data->flags & MMC_DATA_READ) {
if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1))
printf("Warning: unaligned read from %p "
"may fail\n", data->dest);
invalidate_dcache_range((ulong)data->dest,
(ulong)data->dest +
data->blocks * data->blocksize);
}
...
so what invalidate_dcache_range() will invalidate is from &scr and 8 bytes
after that. the higher layers make sure &scr is aligned properly, but not
that it spans a full cache line. so if the stack layout is:
[random stuff]
0x100 [scr[0]]
0x104 [scr[1]]
0x108 [err]
0x10a [timeout]
[more stuff]
this implicitly invalidates [err] and [timeout] and more stuff. by you
forcibly rounding up the length, you no longer get a warning, but you still
(silently) blew away those variables from cache possibly losing changes that
weren't written back to external memory yet.
> > I worry that what you have done will just introduce obscure bugs, since
> > we will potentially invalidate stack variants (for example) and lose
> > their values.
> >
> > With the case problems, we are trying to fix them at source (i.e. at the
> > higher level).
>
> I understand. The question then becomes how to best fix the passed in size.
> Always passing the size of a complete cacheline in the SEND_SCR command
> doesn't seem like a better option because it may have other implications on
> other hardware.
i proposed this in a previous thread, but i think Simon missed it.
perhaps the warning in the core code could be dropped and all your changes in
fringe code obsoleted (such as these USB patches): when it detects that an
address is starting on an unaligned boundary, flush that line first, and then
let it be invalidated. accordingly, when the end length is on an unaligned
boundary, do the same flush-then-invalidate step. this should also make things
work without a (significant) loss in performance. if anything, i suspect the
overhead of doing runtime buffer size calculations and manually aligning
pointers (which is what ALLOC_CACHE_ALIGN_BUFFER does) is a wash compared to
partially flushing cache lines in the core ...
Simon: what do you think of this last idea ?
-mike
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: This is a digitally signed message part.
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20120427/d6c2667e/attachment.pgp>
^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines
2012-04-27 5:29 ` Mike Frysinger
@ 2012-04-27 5:50 ` Simon Glass
2012-11-02 20:22 ` Stephen Warren
1 sibling, 0 replies; 21+ messages in thread
From: Simon Glass @ 2012-04-27 5:50 UTC (permalink / raw)
To: u-boot
Hi Mike,
On Fri, Apr 27, 2012 at 5:29 PM, Mike Frysinger <vapier@gentoo.org> wrote:
> On Thursday 26 April 2012 06:34:43 Thierry Reding wrote:
> > For reference, see sd_change_freq() in drivers/mmc/mmc.c.
>
> yes, that shows what we're talking about.
> int sd_change_freq(struct mmc *mmc)
> {
> ... stack vars ...
> int err;
> ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2);
> struct mmc_data data;
> int timeout;
> ... stack vars ...
> ...
> data.dest = (char *)scr;
> data.blocksize = 8;
> data.blocks = 1;
> data.flags = MMC_DATA_READ;
> err = mmc_send_cmd(mmc, &cmd, &data);
> ...
>
> static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
> struct mmc_data *data)
> {
> ...
> if (data->flags & MMC_DATA_READ) {
> if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1))
> printf("Warning: unaligned read from %p "
> "may fail\n", data->dest);
> invalidate_dcache_range((ulong)data->dest,
> (ulong)data->dest +
> data->blocks * data->blocksize);
> }
> ...
>
> so what invalidate_dcache_range() will invalidate is from &scr and 8 bytes
> after that. the higher layers make sure &scr is aligned properly, but not
> that it spans a full cache line. so if the stack layout is:
> [random stuff]
> 0x100 [scr[0]]
> 0x104 [scr[1]]
> 0x108 [err]
> 0x10a [timeout]
> [more stuff]
>
> this implicitly invalidates [err] and [timeout] and more stuff. by you
> forcibly rounding up the length, you no longer get a warning, but you still
> (silently) blew away those variables from cache possibly losing changes
> that
> weren't written back to external memory yet.
>
> > > I worry that what you have done will just introduce obscure bugs, since
> > > we will potentially invalidate stack variants (for example) and lose
> > > their values.
> > >
> > > With the case problems, we are trying to fix them at source (i.e. at
> the
> > > higher level).
> >
> > I understand. The question then becomes how to best fix the passed in
> size.
> > Always passing the size of a complete cacheline in the SEND_SCR command
> > doesn't seem like a better option because it may have other implications
> on
> > other hardware.
>
> i proposed this in a previous thread, but i think Simon missed it.
>
> perhaps the warning in the core code could be dropped and all your changes
> in
> fringe code obsoleted (such as these USB patches): when it detects that an
> address is starting on an unaligned boundary, flush that line first, and
> then
> let it be invalidated. accordingly, when the end length is on an unaligned
> boundary, do the same flush-then-invalidate step. this should also make
> things
> work without a (significant) loss in performance. if anything, i suspect
> the
> overhead of doing runtime buffer size calculations and manually aligning
> pointers (which is what ALLOC_CACHE_ALIGN_BUFFER does) is a wash compared
> to
> partially flushing cache lines in the core ...
>
> Simon: what do you think of this last idea ?
>
I think I proved to myself a while back that it can't be done. Even if you
flush you can't know that more activity will not occur on that cache line
when you access the stack a microsecond later. Then much later when the DMA
updates the RAM address with the data, you will not see it, because you
have already pulled in that cache line. We are relying on the invalidate to
'hold' until we are ready to read the data.
How about just using an array of one element? So instead of:
struct fred fred;
do_something_with(&fred);
use:
ALLOC_CACHE_ALIGN_BUFFER(struct fred, scr, 1)
do_something_with(fred);
Regards,
Simon
> -mike
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines
2012-04-27 5:29 ` Mike Frysinger
2012-04-27 5:50 ` Simon Glass
@ 2012-11-02 20:22 ` Stephen Warren
2012-11-02 20:25 ` Simon Glass
1 sibling, 1 reply; 21+ messages in thread
From: Stephen Warren @ 2012-11-02 20:22 UTC (permalink / raw)
To: u-boot
On 04/26/2012 11:29 PM, Mike Frysinger wrote:
> On Thursday 26 April 2012 06:34:43 Thierry Reding wrote:
>> For reference, see sd_change_freq() in drivers/mmc/mmc.c.
This is a follow-up to:
http://lists.denx.de/pipermail/u-boot/2012-April/123080.html
which was referenced from:
http://lists.denx.de/pipermail/u-boot/2012-September/133641.html
> yes, that shows what we're talking about. int sd_change_freq(struct
> mmc *mmc) { ... stack vars ... int err;
> ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int
> timeout; ... stack vars ... ... data.dest = (char *)scr;
> data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ;
> err = mmc_send_cmd(mmc, &cmd, &data); ...
>
> static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
> struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if
> ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning:
> unaligned read from %p " "may fail\n", data->dest);
> invalidate_dcache_range((ulong)data->dest, (ulong)data->dest +
> data->blocks * data->blocksize); } ...
>
> so what invalidate_dcache_range() will invalidate is from &scr and
> 8 bytes after that. the higher layers make sure &scr is aligned
> properly, but not that it spans a full cache line. so if the stack
> layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108
> [err] 0x10a [timeout] [more stuff]
That's not the stack layout. scr is allocated (as quoted above) using
ALLOC_CACHE_ALIGN_BUFFER. Therefore, the storage space allocated for
scr is always cache-aligned in address and size, even if the code only
uses 8 bytes of it.
> this implicitly invalidates [err] and [timeout] and more stuff. by
> you forcibly rounding up the length, you no longer get a warning,
> but you still (silently) blew away those variables from cache
> possibly losing changes that weren't written back to external
> memory yet.
So, this problem won't actually occur here.
So, Thierry's proposed solution (which I'll re-quote below) would
actually work just fine:
> [PATCH 2/2] mmc: tegra: invalidate complete cachelines
>
> The MMC core sometimes reads buffers that are smaller than a
> complete cacheline, for example when reading the SCR. In order to
> avoid a warning from the ARM v7 cache handling code, this patch
> makes sure that complete cachelines are flushed.
>
> Signed-off-by: Thierry Reding <thierry.reding at
> avionic-design.de> --- drivers/mmc/tegra2_mmc.c | 7 ++++--- 1
> file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/mmc/tegra2_mmc.c b/drivers/mmc/tegra2_mmc.c
> index fb8a57d..3615876 100644 --- a/drivers/mmc/tegra2_mmc.c +++
> b/drivers/mmc/tegra2_mmc.c @@ -323,12 +323,13 @@ static int
> mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, } writel(mask,
> &host->reg->norintsts); if (data->flags & MMC_DATA_READ) { +
> ulong end = (ulong)data->dest + + roundup(data->blocks *
> data->blocksize, + ARCH_DMA_MINALIGN); if
> ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning:
> unaligned read from %p " "may fail\n", data->dest); -
> invalidate_dcache_range((ulong)data->dest, - (ulong)data->dest
> + - data->blocks * data->blocksize); +
> invalidate_dcache_range((ulong)data->dest, end); } }
>
> --
... so long as we require any struct mmc_data data's .dest field to
point at a buffer that was allocated using ALLOC_CACHE_ALIGN_BUFFER.
So, I'd like to propose that we take Thierry's fix, perhaps having
validated (or implemented some way of enforcing) that
ALLOC_CACHE_ALIGN_BUFFER() was used to allocate the buffer pointer.
^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines
2012-11-02 20:22 ` Stephen Warren
@ 2012-11-02 20:25 ` Simon Glass
2012-11-02 20:38 ` Marek Vasut
2012-11-02 20:55 ` Stephen Warren
0 siblings, 2 replies; 21+ messages in thread
From: Simon Glass @ 2012-11-02 20:25 UTC (permalink / raw)
To: u-boot
Hi Stephen,
On Fri, Nov 2, 2012 at 1:22 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
> On 04/26/2012 11:29 PM, Mike Frysinger wrote:
>> On Thursday 26 April 2012 06:34:43 Thierry Reding wrote:
>>> For reference, see sd_change_freq() in drivers/mmc/mmc.c.
>
> This is a follow-up to:
>
> http://lists.denx.de/pipermail/u-boot/2012-April/123080.html
>
> which was referenced from:
>
> http://lists.denx.de/pipermail/u-boot/2012-September/133641.html
>
>> yes, that shows what we're talking about. int sd_change_freq(struct
>> mmc *mmc) { ... stack vars ... int err;
>> ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int
>> timeout; ... stack vars ... ... data.dest = (char *)scr;
>> data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ;
>> err = mmc_send_cmd(mmc, &cmd, &data); ...
>>
>> static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
>> struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if
>> ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning:
>> unaligned read from %p " "may fail\n", data->dest);
>> invalidate_dcache_range((ulong)data->dest, (ulong)data->dest +
>> data->blocks * data->blocksize); } ...
>>
>> so what invalidate_dcache_range() will invalidate is from &scr and
>> 8 bytes after that. the higher layers make sure &scr is aligned
>> properly, but not that it spans a full cache line. so if the stack
>> layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108
>> [err] 0x10a [timeout] [more stuff]
>
> That's not the stack layout. scr is allocated (as quoted above) using
> ALLOC_CACHE_ALIGN_BUFFER. Therefore, the storage space allocated for
> scr is always cache-aligned in address and size, even if the code only
> uses 8 bytes of it.
>
>> this implicitly invalidates [err] and [timeout] and more stuff. by
>> you forcibly rounding up the length, you no longer get a warning,
>> but you still (silently) blew away those variables from cache
>> possibly losing changes that weren't written back to external
>> memory yet.
>
> So, this problem won't actually occur here.
>
> So, Thierry's proposed solution (which I'll re-quote below) would
> actually work just fine:
>
>> [PATCH 2/2] mmc: tegra: invalidate complete cachelines
>>
>> The MMC core sometimes reads buffers that are smaller than a
>> complete cacheline, for example when reading the SCR. In order to
>> avoid a warning from the ARM v7 cache handling code, this patch
>> makes sure that complete cachelines are flushed.
>>
>> Signed-off-by: Thierry Reding <thierry.reding at
>> avionic-design.de> --- drivers/mmc/tegra2_mmc.c | 7 ++++--- 1
>> file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/mmc/tegra2_mmc.c b/drivers/mmc/tegra2_mmc.c
>> index fb8a57d..3615876 100644 --- a/drivers/mmc/tegra2_mmc.c +++
>> b/drivers/mmc/tegra2_mmc.c @@ -323,12 +323,13 @@ static int
>> mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, } writel(mask,
>> &host->reg->norintsts); if (data->flags & MMC_DATA_READ) { +
>> ulong end = (ulong)data->dest + + roundup(data->blocks *
>> data->blocksize, + ARCH_DMA_MINALIGN); if
>> ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning:
>> unaligned read from %p " "may fail\n", data->dest); -
>> invalidate_dcache_range((ulong)data->dest, - (ulong)data->dest
>> + - data->blocks * data->blocksize); +
>> invalidate_dcache_range((ulong)data->dest, end); } }
>>
>> --
>
> ... so long as we require any struct mmc_data data's .dest field to
> point at a buffer that was allocated using ALLOC_CACHE_ALIGN_BUFFER.
>
> So, I'd like to propose that we take Thierry's fix, perhaps having
> validated (or implemented some way of enforcing) that
> ALLOC_CACHE_ALIGN_BUFFER() was used to allocate the buffer pointer.
That sounds ok to me.
It is important to avoid silent failure if we can. Are you proposing
to pass a 'size' parameter along with any buffer pointer, or something
else?
Regards,
Simon
^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines
2012-11-02 20:25 ` Simon Glass
@ 2012-11-02 20:38 ` Marek Vasut
2012-11-02 20:58 ` Stephen Warren
2012-11-02 20:55 ` Stephen Warren
1 sibling, 1 reply; 21+ messages in thread
From: Marek Vasut @ 2012-11-02 20:38 UTC (permalink / raw)
To: u-boot
Dear Simon Glass,
> Hi Stephen,
>
> On Fri, Nov 2, 2012 at 1:22 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
> > On 04/26/2012 11:29 PM, Mike Frysinger wrote:
> >> On Thursday 26 April 2012 06:34:43 Thierry Reding wrote:
> >>> For reference, see sd_change_freq() in drivers/mmc/mmc.c.
> >
> > This is a follow-up to:
> >
> > http://lists.denx.de/pipermail/u-boot/2012-April/123080.html
> >
> > which was referenced from:
> >
> > http://lists.denx.de/pipermail/u-boot/2012-September/133641.html
> >
> >> yes, that shows what we're talking about. int sd_change_freq(struct
> >> mmc *mmc) { ... stack vars ... int err;
> >> ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int
> >> timeout; ... stack vars ... ... data.dest = (char *)scr;
> >> data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ;
> >> err = mmc_send_cmd(mmc, &cmd, &data); ...
> >>
> >> static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
> >> struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if
> >> ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning:
> >> unaligned read from %p " "may fail\n", data->dest);
> >> invalidate_dcache_range((ulong)data->dest, (ulong)data->dest +
> >> data->blocks * data->blocksize); } ...
> >>
> >> so what invalidate_dcache_range() will invalidate is from &scr and
> >> 8 bytes after that. the higher layers make sure &scr is aligned
> >> properly, but not that it spans a full cache line. so if the stack
> >> layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108
> >> [err] 0x10a [timeout] [more stuff]
> >
> > That's not the stack layout. scr is allocated (as quoted above) using
> > ALLOC_CACHE_ALIGN_BUFFER. Therefore, the storage space allocated for
> > scr is always cache-aligned in address and size, even if the code only
> > uses 8 bytes of it.
> >
> >> this implicitly invalidates [err] and [timeout] and more stuff. by
> >> you forcibly rounding up the length, you no longer get a warning,
> >> but you still (silently) blew away those variables from cache
> >> possibly losing changes that weren't written back to external
> >> memory yet.
> >
> > So, this problem won't actually occur here.
> >
> > So, Thierry's proposed solution (which I'll re-quote below) would
> >
> > actually work just fine:
> >> [PATCH 2/2] mmc: tegra: invalidate complete cachelines
> >>
> >> The MMC core sometimes reads buffers that are smaller than a
> >> complete cacheline, for example when reading the SCR. In order to
> >> avoid a warning from the ARM v7 cache handling code, this patch
> >> makes sure that complete cachelines are flushed.
> >>
> >> Signed-off-by: Thierry Reding <thierry.reding at
> >> avionic-design.de> --- drivers/mmc/tegra2_mmc.c | 7 ++++--- 1
> >> file changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/mmc/tegra2_mmc.c b/drivers/mmc/tegra2_mmc.c
> >> index fb8a57d..3615876 100644 --- a/drivers/mmc/tegra2_mmc.c +++
> >> b/drivers/mmc/tegra2_mmc.c @@ -323,12 +323,13 @@ static int
> >> mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, } writel(mask,
> >> &host->reg->norintsts); if (data->flags & MMC_DATA_READ) { +
> >> ulong end = (ulong)data->dest + +
> >> roundup(data->blocks * data->blocksize, +
> >> ARCH_DMA_MINALIGN); if ((uintptr_t)data->dest &
> >> (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may
> >> fail\n", data->dest); -
> >> invalidate_dcache_range((ulong)data->dest, -
> >> (ulong)data->dest + - data->blocks *
> >> data->blocksize); + invalidate_dcache_range((ulong)data->dest, end); }
> >> }
> >>
> >> --
> >
> > ... so long as we require any struct mmc_data data's .dest field to
> > point at a buffer that was allocated using ALLOC_CACHE_ALIGN_BUFFER.
> >
> > So, I'd like to propose that we take Thierry's fix, perhaps having
> > validated (or implemented some way of enforcing) that
> > ALLOC_CACHE_ALIGN_BUFFER() was used to allocate the buffer pointer.
>
> That sounds ok to me.
>
> It is important to avoid silent failure if we can. Are you proposing
> to pass a 'size' parameter along with any buffer pointer, or something
> else?
Dumb question -- might be unrelated. Does the tegra mmc driver do DMA? And if
so, what happens if you do raw read to unaligned address (aka. how come you
don't need the bounce buffer)?
^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines
2012-11-02 20:25 ` Simon Glass
2012-11-02 20:38 ` Marek Vasut
@ 2012-11-02 20:55 ` Stephen Warren
1 sibling, 0 replies; 21+ messages in thread
From: Stephen Warren @ 2012-11-02 20:55 UTC (permalink / raw)
To: u-boot
On 11/02/2012 02:25 PM, Simon Glass wrote:
> Hi Stephen,
>
> On Fri, Nov 2, 2012 at 1:22 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>> On 04/26/2012 11:29 PM, Mike Frysinger wrote:
>>> On Thursday 26 April 2012 06:34:43 Thierry Reding wrote:
>>>> For reference, see sd_change_freq() in drivers/mmc/mmc.c.
>>
>> This is a follow-up to:
>>
>> http://lists.denx.de/pipermail/u-boot/2012-April/123080.html
>>
>> which was referenced from:
>>
>> http://lists.denx.de/pipermail/u-boot/2012-September/133641.html
>>
>>> yes, that shows what we're talking about. int sd_change_freq(struct
>>> mmc *mmc) { ... stack vars ... int err;
>>> ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int
>>> timeout; ... stack vars ... ... data.dest = (char *)scr;
>>> data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ;
>>> err = mmc_send_cmd(mmc, &cmd, &data); ...
>>>
>>> static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
>>> struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if
>>> ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning:
>>> unaligned read from %p " "may fail\n", data->dest);
>>> invalidate_dcache_range((ulong)data->dest, (ulong)data->dest +
>>> data->blocks * data->blocksize); } ...
>>>
>>> so what invalidate_dcache_range() will invalidate is from &scr and
>>> 8 bytes after that. the higher layers make sure &scr is aligned
>>> properly, but not that it spans a full cache line. so if the stack
>>> layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108
>>> [err] 0x10a [timeout] [more stuff]
>>
>> That's not the stack layout. scr is allocated (as quoted above) using
>> ALLOC_CACHE_ALIGN_BUFFER. Therefore, the storage space allocated for
>> scr is always cache-aligned in address and size, even if the code only
>> uses 8 bytes of it.
>>
>>> this implicitly invalidates [err] and [timeout] and more stuff. by
>>> you forcibly rounding up the length, you no longer get a warning,
>>> but you still (silently) blew away those variables from cache
>>> possibly losing changes that weren't written back to external
>>> memory yet.
>>
>> So, this problem won't actually occur here.
>>
>> So, Thierry's proposed solution (which I'll re-quote below) would
>> actually work just fine:
>>
>>> [PATCH 2/2] mmc: tegra: invalidate complete cachelines
>>>
>>> The MMC core sometimes reads buffers that are smaller than a
>>> complete cacheline, for example when reading the SCR. In order to
>>> avoid a warning from the ARM v7 cache handling code, this patch
>>> makes sure that complete cachelines are flushed.
>>>
>>> Signed-off-by: Thierry Reding <thierry.reding at
>>> avionic-design.de> --- drivers/mmc/tegra2_mmc.c | 7 ++++--- 1
>>> file changed, 4 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/mmc/tegra2_mmc.c b/drivers/mmc/tegra2_mmc.c
>>> index fb8a57d..3615876 100644 --- a/drivers/mmc/tegra2_mmc.c +++
>>> b/drivers/mmc/tegra2_mmc.c @@ -323,12 +323,13 @@ static int
>>> mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, } writel(mask,
>>> &host->reg->norintsts); if (data->flags & MMC_DATA_READ) { +
>>> ulong end = (ulong)data->dest + + roundup(data->blocks *
>>> data->blocksize, + ARCH_DMA_MINALIGN); if
>>> ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning:
>>> unaligned read from %p " "may fail\n", data->dest); -
>>> invalidate_dcache_range((ulong)data->dest, - (ulong)data->dest
>>> + - data->blocks * data->blocksize); +
>>> invalidate_dcache_range((ulong)data->dest, end); } }
>>>
>>> --
>>
>> ... so long as we require any struct mmc_data data's .dest field to
>> point at a buffer that was allocated using ALLOC_CACHE_ALIGN_BUFFER.
>>
>> So, I'd like to propose that we take Thierry's fix, perhaps having
>> validated (or implemented some way of enforcing) that
>> ALLOC_CACHE_ALIGN_BUFFER() was used to allocate the buffer pointer.
>
> That sounds ok to me.
>
> It is important to avoid silent failure if we can. Are you proposing
> to pass a 'size' parameter along with any buffer pointer, or something
> else?
I was wondering about creating some kind of macro to initialize the
buffer pointer in struct mmc_data, and writing that macro so that the
buffer passed to it had to have been allocated using
ALLOC_CACHE_ALIGN_BUFFER (e.g. by referencing the hidden variable it
creates, or something like that).
Passing in the buffer size would also work, although it seems like it'd
be easier to screw that up and hide the silent errors you mentioned?
^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines
2012-11-02 20:38 ` Marek Vasut
@ 2012-11-02 20:58 ` Stephen Warren
2012-11-02 21:28 ` Marek Vasut
0 siblings, 1 reply; 21+ messages in thread
From: Stephen Warren @ 2012-11-02 20:58 UTC (permalink / raw)
To: u-boot
On 11/02/2012 02:38 PM, Marek Vasut wrote:
> Dear Simon Glass,
>
>> Hi Stephen,
>>
>> On Fri, Nov 2, 2012 at 1:22 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>> On 04/26/2012 11:29 PM, Mike Frysinger wrote:
>>>> On Thursday 26 April 2012 06:34:43 Thierry Reding wrote:
>>>>> For reference, see sd_change_freq() in drivers/mmc/mmc.c.
>>>
>>> This is a follow-up to:
>>>
>>> http://lists.denx.de/pipermail/u-boot/2012-April/123080.html
>>>
>>> which was referenced from:
>>>
>>> http://lists.denx.de/pipermail/u-boot/2012-September/133641.html
>>>
>>>> yes, that shows what we're talking about. int sd_change_freq(struct
>>>> mmc *mmc) { ... stack vars ... int err;
>>>> ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int
>>>> timeout; ... stack vars ... ... data.dest = (char *)scr;
>>>> data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ;
>>>> err = mmc_send_cmd(mmc, &cmd, &data); ...
>>>>
>>>> static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
>>>> struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if
>>>> ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning:
>>>> unaligned read from %p " "may fail\n", data->dest);
>>>> invalidate_dcache_range((ulong)data->dest, (ulong)data->dest +
>>>> data->blocks * data->blocksize); } ...
>>>>
>>>> so what invalidate_dcache_range() will invalidate is from &scr and
>>>> 8 bytes after that. the higher layers make sure &scr is aligned
>>>> properly, but not that it spans a full cache line. so if the stack
>>>> layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108
>>>> [err] 0x10a [timeout] [more stuff]
>>>
>>> That's not the stack layout. scr is allocated (as quoted above) using
>>> ALLOC_CACHE_ALIGN_BUFFER. Therefore, the storage space allocated for
>>> scr is always cache-aligned in address and size, even if the code only
>>> uses 8 bytes of it.
>>>
>>>> this implicitly invalidates [err] and [timeout] and more stuff. by
>>>> you forcibly rounding up the length, you no longer get a warning,
>>>> but you still (silently) blew away those variables from cache
>>>> possibly losing changes that weren't written back to external
>>>> memory yet.
>>>
>>> So, this problem won't actually occur here.
>>>
>>> So, Thierry's proposed solution (which I'll re-quote below) would
>>>
>>> actually work just fine:
>>>> [PATCH 2/2] mmc: tegra: invalidate complete cachelines
>>>>
>>>> The MMC core sometimes reads buffers that are smaller than a
>>>> complete cacheline, for example when reading the SCR. In order to
>>>> avoid a warning from the ARM v7 cache handling code, this patch
>>>> makes sure that complete cachelines are flushed.
>>>>
>>>> Signed-off-by: Thierry Reding <thierry.reding at
>>>> avionic-design.de> --- drivers/mmc/tegra2_mmc.c | 7 ++++--- 1
>>>> file changed, 4 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/mmc/tegra2_mmc.c b/drivers/mmc/tegra2_mmc.c
>>>> index fb8a57d..3615876 100644 --- a/drivers/mmc/tegra2_mmc.c +++
>>>> b/drivers/mmc/tegra2_mmc.c @@ -323,12 +323,13 @@ static int
>>>> mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, } writel(mask,
>>>> &host->reg->norintsts); if (data->flags & MMC_DATA_READ) { +
>>>> ulong end = (ulong)data->dest + +
>>>> roundup(data->blocks * data->blocksize, +
>>>> ARCH_DMA_MINALIGN); if ((uintptr_t)data->dest &
>>>> (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may
>>>> fail\n", data->dest); -
>>>> invalidate_dcache_range((ulong)data->dest, -
>>>> (ulong)data->dest + - data->blocks *
>>>> data->blocksize); + invalidate_dcache_range((ulong)data->dest, end); }
>>>> }
>>>>
>>>> --
>>>
>>> ... so long as we require any struct mmc_data data's .dest field to
>>> point at a buffer that was allocated using ALLOC_CACHE_ALIGN_BUFFER.
>>>
>>> So, I'd like to propose that we take Thierry's fix, perhaps having
>>> validated (or implemented some way of enforcing) that
>>> ALLOC_CACHE_ALIGN_BUFFER() was used to allocate the buffer pointer.
>>
>> That sounds ok to me.
>>
>> It is important to avoid silent failure if we can. Are you proposing
>> to pass a 'size' parameter along with any buffer pointer, or something
>> else?
>
> Dumb question -- might be unrelated. Does the tegra mmc driver do DMA? And if
> so, what happens if you do raw read to unaligned address (aka. how come you
> don't need the bounce buffer)?
Yes, it does DMA, I believe. (At least if it doesn't, I have no idea why
the driver is flushing caches!)
I guess we only support the use of aligned addresses, so e.g. the
following would work:
ext2load mmc 0:1 0x00100000 /file
but the following wouldn't:
ext2load mmc 0:1 0x00100004 /file
which while I suppose it is an artificial restriction, hasn't been an
issue in practice.
^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines
2012-11-02 20:58 ` Stephen Warren
@ 2012-11-02 21:28 ` Marek Vasut
2012-11-02 21:31 ` Stephen Warren
0 siblings, 1 reply; 21+ messages in thread
From: Marek Vasut @ 2012-11-02 21:28 UTC (permalink / raw)
To: u-boot
Dear Stephen Warren,
> On 11/02/2012 02:38 PM, Marek Vasut wrote:
> > Dear Simon Glass,
> >
> >> Hi Stephen,
> >>
> >> On Fri, Nov 2, 2012 at 1:22 PM, Stephen Warren <swarren@wwwdotorg.org>
wrote:
> >>> On 04/26/2012 11:29 PM, Mike Frysinger wrote:
> >>>> On Thursday 26 April 2012 06:34:43 Thierry Reding wrote:
> >>>>> For reference, see sd_change_freq() in drivers/mmc/mmc.c.
> >>>
> >>> This is a follow-up to:
> >>>
> >>> http://lists.denx.de/pipermail/u-boot/2012-April/123080.html
> >>>
> >>> which was referenced from:
> >>>
> >>> http://lists.denx.de/pipermail/u-boot/2012-September/133641.html
> >>>
> >>>> yes, that shows what we're talking about. int sd_change_freq(struct
> >>>> mmc *mmc) { ... stack vars ... int err;
> >>>> ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int
> >>>> timeout; ... stack vars ... ... data.dest = (char *)scr;
> >>>> data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ;
> >>>> err = mmc_send_cmd(mmc, &cmd, &data); ...
> >>>>
> >>>> static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
> >>>> struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if
> >>>> ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning:
> >>>> unaligned read from %p " "may fail\n", data->dest);
> >>>> invalidate_dcache_range((ulong)data->dest, (ulong)data->dest +
> >>>> data->blocks * data->blocksize); } ...
> >>>>
> >>>> so what invalidate_dcache_range() will invalidate is from &scr and
> >>>> 8 bytes after that. the higher layers make sure &scr is aligned
> >>>> properly, but not that it spans a full cache line. so if the stack
> >>>> layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108
> >>>> [err] 0x10a [timeout] [more stuff]
> >>>
> >>> That's not the stack layout. scr is allocated (as quoted above) using
> >>> ALLOC_CACHE_ALIGN_BUFFER. Therefore, the storage space allocated for
> >>> scr is always cache-aligned in address and size, even if the code only
> >>> uses 8 bytes of it.
> >>>
> >>>> this implicitly invalidates [err] and [timeout] and more stuff. by
> >>>> you forcibly rounding up the length, you no longer get a warning,
> >>>> but you still (silently) blew away those variables from cache
> >>>> possibly losing changes that weren't written back to external
> >>>> memory yet.
> >>>
> >>> So, this problem won't actually occur here.
> >>>
> >>> So, Thierry's proposed solution (which I'll re-quote below) would
> >>>
> >>> actually work just fine:
> >>>> [PATCH 2/2] mmc: tegra: invalidate complete cachelines
> >>>>
> >>>> The MMC core sometimes reads buffers that are smaller than a
> >>>> complete cacheline, for example when reading the SCR. In order to
> >>>> avoid a warning from the ARM v7 cache handling code, this patch
> >>>> makes sure that complete cachelines are flushed.
> >>>>
> >>>> Signed-off-by: Thierry Reding <thierry.reding at
> >>>> avionic-design.de> --- drivers/mmc/tegra2_mmc.c | 7 ++++--- 1
> >>>> file changed, 4 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/drivers/mmc/tegra2_mmc.c b/drivers/mmc/tegra2_mmc.c
> >>>> index fb8a57d..3615876 100644 --- a/drivers/mmc/tegra2_mmc.c +++
> >>>> b/drivers/mmc/tegra2_mmc.c @@ -323,12 +323,13 @@ static int
> >>>> mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, } writel(mask,
> >>>> &host->reg->norintsts); if (data->flags & MMC_DATA_READ) { +
> >>>> ulong end = (ulong)data->dest + +
> >>>> roundup(data->blocks * data->blocksize, +
> >>>>
> >>>> ARCH_DMA_MINALIGN); if ((uintptr_t)data->dest &
> >>>>
> >>>> (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p "
> >>>> "may fail\n", data->dest); -
> >>>> invalidate_dcache_range((ulong)data->dest, -
> >>>> (ulong)data->dest + - data->blocks *
> >>>> data->blocksize); + invalidate_dcache_range((ulong)data->dest, end); }
> >>>> }
> >>>>
> >>>> --
> >>>
> >>> ... so long as we require any struct mmc_data data's .dest field to
> >>> point at a buffer that was allocated using ALLOC_CACHE_ALIGN_BUFFER.
> >>>
> >>> So, I'd like to propose that we take Thierry's fix, perhaps having
> >>> validated (or implemented some way of enforcing) that
> >>> ALLOC_CACHE_ALIGN_BUFFER() was used to allocate the buffer pointer.
> >>
> >> That sounds ok to me.
> >>
> >> It is important to avoid silent failure if we can. Are you proposing
> >> to pass a 'size' parameter along with any buffer pointer, or something
> >> else?
> >
> > Dumb question -- might be unrelated. Does the tegra mmc driver do DMA?
> > And if so, what happens if you do raw read to unaligned address (aka.
> > how come you don't need the bounce buffer)?
>
> Yes, it does DMA, I believe. (At least if it doesn't, I have no idea why
> the driver is flushing caches!)
>
> I guess we only support the use of aligned addresses, so e.g. the
> following would work:
>
> ext2load mmc 0:1 0x00100000 /file
>
> but the following wouldn't:
>
> ext2load mmc 0:1 0x00100004 /file
>
> which while I suppose it is an artificial restriction, hasn't been an
> issue in practice.
Then just enable the bounce buffer and it will work ;-)
^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines
2012-11-02 21:28 ` Marek Vasut
@ 2012-11-02 21:31 ` Stephen Warren
2012-11-02 22:10 ` Marek Vasut
0 siblings, 1 reply; 21+ messages in thread
From: Stephen Warren @ 2012-11-02 21:31 UTC (permalink / raw)
To: u-boot
On 11/02/2012 03:28 PM, Marek Vasut wrote:
> Dear Stephen Warren,
>
>> On 11/02/2012 02:38 PM, Marek Vasut wrote:
...
>>> Dumb question -- might be unrelated. Does the tegra mmc driver do DMA?
>>> And if so, what happens if you do raw read to unaligned address (aka.
>>> how come you don't need the bounce buffer)?
>>
>> Yes, it does DMA, I believe. (At least if it doesn't, I have no idea why
>> the driver is flushing caches!)
>>
>> I guess we only support the use of aligned addresses, so e.g. the
>> following would work:
>>
>> ext2load mmc 0:1 0x00100000 /file
>>
>> but the following wouldn't:
>>
>> ext2load mmc 0:1 0x00100004 /file
>>
>> which while I suppose it is an artificial restriction, hasn't been an
>> issue in practice.
>
> Then just enable the bounce buffer and it will work ;-)
You suggested that last time, and it made no difference then... In fact,
the config option you mentioned isn't used anywhere in the srouce tree
except adding bouncebuf.o into the build right now; which config option
do you think I should use?
^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines
2012-11-02 21:31 ` Stephen Warren
@ 2012-11-02 22:10 ` Marek Vasut
2012-11-05 19:50 ` Stephen Warren
0 siblings, 1 reply; 21+ messages in thread
From: Marek Vasut @ 2012-11-02 22:10 UTC (permalink / raw)
To: u-boot
Dear Stephen Warren,
> On 11/02/2012 03:28 PM, Marek Vasut wrote:
> > Dear Stephen Warren,
> >
> >> On 11/02/2012 02:38 PM, Marek Vasut wrote:
> ...
>
> >>> Dumb question -- might be unrelated. Does the tegra mmc driver do DMA?
> >>> And if so, what happens if you do raw read to unaligned address (aka.
> >>> how come you don't need the bounce buffer)?
> >>
> >> Yes, it does DMA, I believe. (At least if it doesn't, I have no idea why
> >> the driver is flushing caches!)
> >>
> >> I guess we only support the use of aligned addresses, so e.g. the
> >> following would work:
> >>
> >> ext2load mmc 0:1 0x00100000 /file
> >>
> >> but the following wouldn't:
> >>
> >> ext2load mmc 0:1 0x00100004 /file
> >>
> >> which while I suppose it is an artificial restriction, hasn't been an
> >> issue in practice.
> >
> > Then just enable the bounce buffer and it will work ;-)
>
> You suggested that last time, and it made no difference then... In fact,
> the config option you mentioned isn't used anywhere in the srouce tree
> except adding bouncebuf.o
Bouncebuf.o ?
> into the build right now; which config option
> do you think I should use?
CONFIG_BOUNCE_BUFFER
It's used on mx28-based boards.
Best regards,
Marek Vasut
^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines
2012-11-02 22:10 ` Marek Vasut
@ 2012-11-05 19:50 ` Stephen Warren
2012-11-05 22:59 ` Marek Vasut
2012-11-05 23:07 ` Simon Glass
0 siblings, 2 replies; 21+ messages in thread
From: Stephen Warren @ 2012-11-05 19:50 UTC (permalink / raw)
To: u-boot
On 11/02/2012 04:10 PM, Marek Vasut wrote:
> Dear Stephen Warren,
>
>> On 11/02/2012 03:28 PM, Marek Vasut wrote:
>>> Dear Stephen Warren,
>>>
>>>> On 11/02/2012 02:38 PM, Marek Vasut wrote:
>> ...
>>
>>>>> Dumb question -- might be unrelated. Does the tegra mmc driver do DMA?
>>>>> And if so, what happens if you do raw read to unaligned address (aka.
>>>>> how come you don't need the bounce buffer)?
>>>>
>>>> Yes, it does DMA, I believe. (At least if it doesn't, I have no idea why
>>>> the driver is flushing caches!)
>>>>
>>>> I guess we only support the use of aligned addresses, so e.g. the
>>>> following would work:
>>>>
>>>> ext2load mmc 0:1 0x00100000 /file
>>>>
>>>> but the following wouldn't:
>>>>
>>>> ext2load mmc 0:1 0x00100004 /file
>>>>
>>>> which while I suppose it is an artificial restriction, hasn't been an
>>>> issue in practice.
>>>
>>> Then just enable the bounce buffer and it will work ;-)
>>
>> You suggested that last time, and it made no difference then... In fact,
>> the config option you mentioned isn't used anywhere in the srouce tree
>> except adding bouncebuf.o
>
> Bouncebuf.o ?
common/bouncebuf.c
>> into the build right now; which config option
>> do you think I should use?
>
> CONFIG_BOUNCE_BUFFER
>
> It's used on mx28-based boards.
Ahah, there's much more to it than just defining the config option, and
grep'ing for the config option doesn't reveal that.
Defining CONFIG_BOUNCE_BUFFER simply pulls in bouncebuf.o into the
build. It's then up to the individual MMC/... driver to actually make
use of the new functions - i.e. explicitly call e.g.
bounce_buffer_start(). The MXS MMC driver calls these without any ifdef
CONFIG_BOUNCE_BUFFER, and hence doesn't show up when you grep for that.
I had assumed that such functionality would be a conditional part of the
MMC core rather than the individual drivers, hence had missed the usage
in the MXS driver.
So, that config option plus some changes to the Tegra MMC driver might
work - I'll take a look. However, making that change will simply hide
any unaligned buffer and reduce performance, so I'd still prefer to
require that buffers be aligned in the future where possible.
^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines
2012-11-05 19:50 ` Stephen Warren
@ 2012-11-05 22:59 ` Marek Vasut
2012-11-05 23:07 ` Simon Glass
1 sibling, 0 replies; 21+ messages in thread
From: Marek Vasut @ 2012-11-05 22:59 UTC (permalink / raw)
To: u-boot
Dear Stephen Warren,
> On 11/02/2012 04:10 PM, Marek Vasut wrote:
> > Dear Stephen Warren,
> >
> >> On 11/02/2012 03:28 PM, Marek Vasut wrote:
> >>> Dear Stephen Warren,
> >>>
> >>>> On 11/02/2012 02:38 PM, Marek Vasut wrote:
> >> ...
> >>
> >>>>> Dumb question -- might be unrelated. Does the tegra mmc driver do
> >>>>> DMA? And if so, what happens if you do raw read to unaligned address
> >>>>> (aka. how come you don't need the bounce buffer)?
> >>>>
> >>>> Yes, it does DMA, I believe. (At least if it doesn't, I have no idea
> >>>> why the driver is flushing caches!)
> >>>>
> >>>> I guess we only support the use of aligned addresses, so e.g. the
> >>>> following would work:
> >>>>
> >>>> ext2load mmc 0:1 0x00100000 /file
> >>>>
> >>>> but the following wouldn't:
> >>>>
> >>>> ext2load mmc 0:1 0x00100004 /file
> >>>>
> >>>> which while I suppose it is an artificial restriction, hasn't been an
> >>>> issue in practice.
> >>>
> >>> Then just enable the bounce buffer and it will work ;-)
> >>
> >> You suggested that last time, and it made no difference then... In fact,
> >> the config option you mentioned isn't used anywhere in the srouce tree
> >> except adding bouncebuf.o
> >
> > Bouncebuf.o ?
>
> common/bouncebuf.c
Ugh, it was applied ...
> >> into the build right now; which config option
> >> do you think I should use?
> >
> > CONFIG_BOUNCE_BUFFER
> >
> > It's used on mx28-based boards.
>
> Ahah, there's much more to it than just defining the config option, and
> grep'ing for the config option doesn't reveal that.
>
> Defining CONFIG_BOUNCE_BUFFER simply pulls in bouncebuf.o into the
> build. It's then up to the individual MMC/... driver to actually make
> use of the new functions - i.e. explicitly call e.g.
> bounce_buffer_start(). The MXS MMC driver calls these without any ifdef
> CONFIG_BOUNCE_BUFFER, and hence doesn't show up when you grep for that.
> I had assumed that such functionality would be a conditional part of the
> MMC core rather than the individual drivers, hence had missed the usage
> in the MXS driver.
Sorry about that, I didn't know the patch was pulled in so I expected the old
approach.
> So, that config option plus some changes to the Tegra MMC driver might
> work - I'll take a look. However, making that change will simply hide
> any unaligned buffer and reduce performance, so I'd still prefer to
> require that buffers be aligned in the future where possible.
Best regards,
Marek Vasut
^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines
2012-11-05 19:50 ` Stephen Warren
2012-11-05 22:59 ` Marek Vasut
@ 2012-11-05 23:07 ` Simon Glass
1 sibling, 0 replies; 21+ messages in thread
From: Simon Glass @ 2012-11-05 23:07 UTC (permalink / raw)
To: u-boot
Hi Stephen,
On Mon, Nov 5, 2012 at 11:50 AM, Stephen Warren <swarren@wwwdotorg.org> wrote:
> On 11/02/2012 04:10 PM, Marek Vasut wrote:
>> Dear Stephen Warren,
>>
>>> On 11/02/2012 03:28 PM, Marek Vasut wrote:
>>>> Dear Stephen Warren,
>>>>
>>>>> On 11/02/2012 02:38 PM, Marek Vasut wrote:
>>> ...
>>>
>>>>>> Dumb question -- might be unrelated. Does the tegra mmc driver do DMA?
>>>>>> And if so, what happens if you do raw read to unaligned address (aka.
>>>>>> how come you don't need the bounce buffer)?
>>>>>
>>>>> Yes, it does DMA, I believe. (At least if it doesn't, I have no idea why
>>>>> the driver is flushing caches!)
>>>>>
>>>>> I guess we only support the use of aligned addresses, so e.g. the
>>>>> following would work:
>>>>>
>>>>> ext2load mmc 0:1 0x00100000 /file
>>>>>
>>>>> but the following wouldn't:
>>>>>
>>>>> ext2load mmc 0:1 0x00100004 /file
>>>>>
>>>>> which while I suppose it is an artificial restriction, hasn't been an
>>>>> issue in practice.
>>>>
>>>> Then just enable the bounce buffer and it will work ;-)
>>>
>>> You suggested that last time, and it made no difference then... In fact,
>>> the config option you mentioned isn't used anywhere in the srouce tree
>>> except adding bouncebuf.o
>>
>> Bouncebuf.o ?
>
> common/bouncebuf.c
>
>>> into the build right now; which config option
>>> do you think I should use?
>>
>> CONFIG_BOUNCE_BUFFER
>>
>> It's used on mx28-based boards.
>
> Ahah, there's much more to it than just defining the config option, and
> grep'ing for the config option doesn't reveal that.
>
> Defining CONFIG_BOUNCE_BUFFER simply pulls in bouncebuf.o into the
> build. It's then up to the individual MMC/... driver to actually make
> use of the new functions - i.e. explicitly call e.g.
> bounce_buffer_start(). The MXS MMC driver calls these without any ifdef
> CONFIG_BOUNCE_BUFFER, and hence doesn't show up when you grep for that.
> I had assumed that such functionality would be a conditional part of the
> MMC core rather than the individual drivers, hence had missed the usage
> in the MXS driver.
>
> So, that config option plus some changes to the Tegra MMC driver might
> work - I'll take a look. However, making that change will simply hide
> any unaligned buffer and reduce performance, so I'd still prefer to
> require that buffers be aligned in the future where possible.
Yes that seems right to me.
Regards,
Simon
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2012-11-05 23:07 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-04-24 7:53 [U-Boot] [PATCH 1/2] part_dos: allocate cacheline aligned buffers Thierry Reding
2012-04-24 7:53 ` [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines Thierry Reding
2012-04-25 22:54 ` Mike Frysinger
2012-04-26 6:18 ` Thierry Reding
2012-04-26 10:16 ` Simon Glass
2012-04-26 10:34 ` Thierry Reding
2012-04-27 5:29 ` Mike Frysinger
2012-04-27 5:50 ` Simon Glass
2012-11-02 20:22 ` Stephen Warren
2012-11-02 20:25 ` Simon Glass
2012-11-02 20:38 ` Marek Vasut
2012-11-02 20:58 ` Stephen Warren
2012-11-02 21:28 ` Marek Vasut
2012-11-02 21:31 ` Stephen Warren
2012-11-02 22:10 ` Marek Vasut
2012-11-05 19:50 ` Stephen Warren
2012-11-05 22:59 ` Marek Vasut
2012-11-05 23:07 ` Simon Glass
2012-11-02 20:55 ` Stephen Warren
2012-04-24 18:31 ` [U-Boot] [PATCH 1/2] part_dos: allocate cacheline aligned buffers Simon Glass
2012-04-24 18:45 ` Thierry Reding
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox