* [U-Boot] [PATCH 1/2] part_dos: allocate cacheline aligned buffers @ 2012-04-24 7:53 Thierry Reding 2012-04-24 7:53 ` [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines Thierry Reding 2012-04-24 18:31 ` [U-Boot] [PATCH 1/2] part_dos: allocate cacheline aligned buffers Simon Glass 0 siblings, 2 replies; 21+ messages in thread From: Thierry Reding @ 2012-04-24 7:53 UTC (permalink / raw) To: u-boot Some hardware requires the data buffers to be cacheline-aligned to make sure DMA operations can be properly executed. This patch uses the ALLOC_CACHE_ALIGN_BUFFER macro to allocate buffers with proper alignment. The same was already done for EFI partitions in commit f75dd58 "part_efi: dcache: allocate cacheline aligned buffers". Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de> --- This fixes boot failures on Avionic Design Plutux and Medcom boards. disk/part_dos.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/disk/part_dos.c b/disk/part_dos.c index b5bcb37..c028aaf 100644 --- a/disk/part_dos.c +++ b/disk/part_dos.c @@ -87,7 +87,7 @@ static int test_block_type(unsigned char *buffer) int test_part_dos (block_dev_desc_t *dev_desc) { - unsigned char buffer[dev_desc->blksz]; + ALLOC_CACHE_ALIGN_BUFFER(unsigned char, buffer, dev_desc->blksz); if ((dev_desc->block_read(dev_desc->dev, 0, 1, (ulong *) buffer) != 1) || (buffer[DOS_PART_MAGIC_OFFSET + 0] != 0x55) || @@ -102,7 +102,7 @@ int test_part_dos (block_dev_desc_t *dev_desc) static void print_partition_extended (block_dev_desc_t *dev_desc, int ext_part_sector, int relative, int part_num) { - unsigned char buffer[dev_desc->blksz]; + ALLOC_CACHE_ALIGN_BUFFER(unsigned char, buffer, dev_desc->blksz); dos_partition_t *pt; int i; @@ -166,7 +166,7 @@ static int get_partition_info_extended (block_dev_desc_t *dev_desc, int ext_part int relative, int part_num, int which_part, disk_partition_t *info) { - unsigned char buffer[dev_desc->blksz]; + ALLOC_CACHE_ALIGN_BUFFER(unsigned char, buffer, dev_desc->blksz); dos_partition_t *pt; int i; -- 1.7.10 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines 2012-04-24 7:53 [U-Boot] [PATCH 1/2] part_dos: allocate cacheline aligned buffers Thierry Reding @ 2012-04-24 7:53 ` Thierry Reding 2012-04-25 22:54 ` Mike Frysinger 2012-04-24 18:31 ` [U-Boot] [PATCH 1/2] part_dos: allocate cacheline aligned buffers Simon Glass 1 sibling, 1 reply; 21+ messages in thread From: Thierry Reding @ 2012-04-24 7:53 UTC (permalink / raw) To: u-boot The MMC core sometimes reads buffers that are smaller than a complete cacheline, for example when reading the SCR. In order to avoid a warning from the ARM v7 cache handling code, this patch makes sure that complete cachelines are flushed. Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de> --- drivers/mmc/tegra2_mmc.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/mmc/tegra2_mmc.c b/drivers/mmc/tegra2_mmc.c index fb8a57d..3615876 100644 --- a/drivers/mmc/tegra2_mmc.c +++ b/drivers/mmc/tegra2_mmc.c @@ -323,12 +323,13 @@ static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, } writel(mask, &host->reg->norintsts); if (data->flags & MMC_DATA_READ) { + ulong end = (ulong)data->dest + + roundup(data->blocks * data->blocksize, + ARCH_DMA_MINALIGN); if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may fail\n", data->dest); - invalidate_dcache_range((ulong)data->dest, - (ulong)data->dest + - data->blocks * data->blocksize); + invalidate_dcache_range((ulong)data->dest, end); } } -- 1.7.10 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines 2012-04-24 7:53 ` [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines Thierry Reding @ 2012-04-25 22:54 ` Mike Frysinger 2012-04-26 6:18 ` Thierry Reding 0 siblings, 1 reply; 21+ messages in thread From: Mike Frysinger @ 2012-04-25 22:54 UTC (permalink / raw) To: u-boot On Tuesday 24 April 2012 03:53:44 Thierry Reding wrote: > The MMC core sometimes reads buffers that are smaller than a complete > cacheline, for example when reading the SCR. In order to avoid a warning > from the ARM v7 cache handling code, this patch makes sure that complete > cachelines are flushed. this is still wrong. all you've done is bypass the error message without addressing the underlying problem -- you're invalidating too much. -mike -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 836 bytes Desc: This is a digitally signed message part. URL: <http://lists.denx.de/pipermail/u-boot/attachments/20120425/536ec64c/attachment.pgp> ^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines 2012-04-25 22:54 ` Mike Frysinger @ 2012-04-26 6:18 ` Thierry Reding 2012-04-26 10:16 ` Simon Glass 0 siblings, 1 reply; 21+ messages in thread From: Thierry Reding @ 2012-04-26 6:18 UTC (permalink / raw) To: u-boot * Mike Frysinger wrote: > On Tuesday 24 April 2012 03:53:44 Thierry Reding wrote: > > The MMC core sometimes reads buffers that are smaller than a complete > > cacheline, for example when reading the SCR. In order to avoid a warning > > from the ARM v7 cache handling code, this patch makes sure that complete > > cachelines are flushed. > > this is still wrong. all you've done is bypass the error message without > addressing the underlying problem -- you're invalidating too much. Reading 8 bytes is always less than a cacheline, so we don't have much choice, do we? We could of course always read a whole cacheline even if only 8 bytes are requested, but does that have any advantage over reading 8 bytes and then invalidating the cacheline? Or maybe I'm missing the point. Thierry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available URL: <http://lists.denx.de/pipermail/u-boot/attachments/20120426/f3ea4a75/attachment.pgp> ^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines 2012-04-26 6:18 ` Thierry Reding @ 2012-04-26 10:16 ` Simon Glass 2012-04-26 10:34 ` Thierry Reding 0 siblings, 1 reply; 21+ messages in thread From: Simon Glass @ 2012-04-26 10:16 UTC (permalink / raw) To: u-boot Hi Thierry, On Thu, Apr 26, 2012 at 6:18 PM, Thierry Reding < thierry.reding@avionic-design.de> wrote: > * Mike Frysinger wrote: > > On Tuesday 24 April 2012 03:53:44 Thierry Reding wrote: > > > The MMC core sometimes reads buffers that are smaller than a complete > > > cacheline, for example when reading the SCR. In order to avoid a > warning > > > from the ARM v7 cache handling code, this patch makes sure that > complete > > > cachelines are flushed. > > > > this is still wrong. all you've done is bypass the error message without > > addressing the underlying problem -- you're invalidating too much. > > Reading 8 bytes is always less than a cacheline, so we don't have much > choice, do we? We could of course always read a whole cacheline even if > only > 8 bytes are requested, but does that have any advantage over reading 8 > bytes > and then invalidating the cacheline? > > Or maybe I'm missing the point. > Well the point is that you can read 8 bytes but you still must use a buffer that is large enough for DMA activity. So the caller must allocate a buffer larger than 8 bytes. I worry that what you have done will just introduce obscure bugs, since we will potentially invalidate stack variants (for example) and lose their values. With the case problems, we are trying to fix them at source (i.e. at the higher level). Regards, Simon > > Thierry > > _______________________________________________ > U-Boot mailing list > U-Boot at lists.denx.de > http://lists.denx.de/mailman/listinfo/u-boot > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines 2012-04-26 10:16 ` Simon Glass @ 2012-04-26 10:34 ` Thierry Reding 2012-04-27 5:29 ` Mike Frysinger 0 siblings, 1 reply; 21+ messages in thread From: Thierry Reding @ 2012-04-26 10:34 UTC (permalink / raw) To: u-boot * Simon Glass wrote: > Hi Thierry, > > On Thu, Apr 26, 2012 at 6:18 PM, Thierry Reding < > thierry.reding at avionic-design.de> wrote: > > > * Mike Frysinger wrote: > > > On Tuesday 24 April 2012 03:53:44 Thierry Reding wrote: > > > > The MMC core sometimes reads buffers that are smaller than a complete > > > > cacheline, for example when reading the SCR. In order to avoid a > > warning > > > > from the ARM v7 cache handling code, this patch makes sure that > > complete > > > > cachelines are flushed. > > > > > > this is still wrong. all you've done is bypass the error message without > > > addressing the underlying problem -- you're invalidating too much. > > > > Reading 8 bytes is always less than a cacheline, so we don't have much > > choice, do we? We could of course always read a whole cacheline even if > > only > > 8 bytes are requested, but does that have any advantage over reading 8 > > bytes > > and then invalidating the cacheline? > > > > Or maybe I'm missing the point. > > > > Well the point is that you can read 8 bytes but you still must use a buffer > that is large enough for DMA activity. So the caller must allocate a buffer > larger than 8 bytes. The buffer used in this case is allocated with the ALLOC_CACHE_ALIGN_BUFFER() macro, so the buffer size is not an issue. However, the mmc_data structure's blocksize field is set to 8, which is the value that will eventually end up in invalidate_dcache_range(). For reference, see sd_change_freq() in drivers/mmc/mmc.c. > I worry that what you have done will just introduce obscure bugs, since we > will potentially invalidate stack variants (for example) and lose their > values. > > With the case problems, we are trying to fix them at source (i.e. at the > higher level). I understand. The question then becomes how to best fix the passed in size. Always passing the size of a complete cacheline in the SEND_SCR command doesn't seem like a better option because it may have other implications on other hardware. Thierry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available URL: <http://lists.denx.de/pipermail/u-boot/attachments/20120426/778720f5/attachment.pgp> ^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines 2012-04-26 10:34 ` Thierry Reding @ 2012-04-27 5:29 ` Mike Frysinger 2012-04-27 5:50 ` Simon Glass 2012-11-02 20:22 ` Stephen Warren 0 siblings, 2 replies; 21+ messages in thread From: Mike Frysinger @ 2012-04-27 5:29 UTC (permalink / raw) To: u-boot On Thursday 26 April 2012 06:34:43 Thierry Reding wrote: > For reference, see sd_change_freq() in drivers/mmc/mmc.c. yes, that shows what we're talking about. int sd_change_freq(struct mmc *mmc) { ... stack vars ... int err; ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int timeout; ... stack vars ... ... data.dest = (char *)scr; data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ; err = mmc_send_cmd(mmc, &cmd, &data); ... static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may fail\n", data->dest); invalidate_dcache_range((ulong)data->dest, (ulong)data->dest + data->blocks * data->blocksize); } ... so what invalidate_dcache_range() will invalidate is from &scr and 8 bytes after that. the higher layers make sure &scr is aligned properly, but not that it spans a full cache line. so if the stack layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108 [err] 0x10a [timeout] [more stuff] this implicitly invalidates [err] and [timeout] and more stuff. by you forcibly rounding up the length, you no longer get a warning, but you still (silently) blew away those variables from cache possibly losing changes that weren't written back to external memory yet. > > I worry that what you have done will just introduce obscure bugs, since > > we will potentially invalidate stack variants (for example) and lose > > their values. > > > > With the case problems, we are trying to fix them at source (i.e. at the > > higher level). > > I understand. The question then becomes how to best fix the passed in size. > Always passing the size of a complete cacheline in the SEND_SCR command > doesn't seem like a better option because it may have other implications on > other hardware. i proposed this in a previous thread, but i think Simon missed it. perhaps the warning in the core code could be dropped and all your changes in fringe code obsoleted (such as these USB patches): when it detects that an address is starting on an unaligned boundary, flush that line first, and then let it be invalidated. accordingly, when the end length is on an unaligned boundary, do the same flush-then-invalidate step. this should also make things work without a (significant) loss in performance. if anything, i suspect the overhead of doing runtime buffer size calculations and manually aligning pointers (which is what ALLOC_CACHE_ALIGN_BUFFER does) is a wash compared to partially flushing cache lines in the core ... Simon: what do you think of this last idea ? -mike -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 836 bytes Desc: This is a digitally signed message part. URL: <http://lists.denx.de/pipermail/u-boot/attachments/20120427/d6c2667e/attachment.pgp> ^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines 2012-04-27 5:29 ` Mike Frysinger @ 2012-04-27 5:50 ` Simon Glass 2012-11-02 20:22 ` Stephen Warren 1 sibling, 0 replies; 21+ messages in thread From: Simon Glass @ 2012-04-27 5:50 UTC (permalink / raw) To: u-boot Hi Mike, On Fri, Apr 27, 2012 at 5:29 PM, Mike Frysinger <vapier@gentoo.org> wrote: > On Thursday 26 April 2012 06:34:43 Thierry Reding wrote: > > For reference, see sd_change_freq() in drivers/mmc/mmc.c. > > yes, that shows what we're talking about. > int sd_change_freq(struct mmc *mmc) > { > ... stack vars ... > int err; > ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); > struct mmc_data data; > int timeout; > ... stack vars ... > ... > data.dest = (char *)scr; > data.blocksize = 8; > data.blocks = 1; > data.flags = MMC_DATA_READ; > err = mmc_send_cmd(mmc, &cmd, &data); > ... > > static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, > struct mmc_data *data) > { > ... > if (data->flags & MMC_DATA_READ) { > if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) > printf("Warning: unaligned read from %p " > "may fail\n", data->dest); > invalidate_dcache_range((ulong)data->dest, > (ulong)data->dest + > data->blocks * data->blocksize); > } > ... > > so what invalidate_dcache_range() will invalidate is from &scr and 8 bytes > after that. the higher layers make sure &scr is aligned properly, but not > that it spans a full cache line. so if the stack layout is: > [random stuff] > 0x100 [scr[0]] > 0x104 [scr[1]] > 0x108 [err] > 0x10a [timeout] > [more stuff] > > this implicitly invalidates [err] and [timeout] and more stuff. by you > forcibly rounding up the length, you no longer get a warning, but you still > (silently) blew away those variables from cache possibly losing changes > that > weren't written back to external memory yet. > > > > I worry that what you have done will just introduce obscure bugs, since > > > we will potentially invalidate stack variants (for example) and lose > > > their values. > > > > > > With the case problems, we are trying to fix them at source (i.e. at > the > > > higher level). > > > > I understand. The question then becomes how to best fix the passed in > size. > > Always passing the size of a complete cacheline in the SEND_SCR command > > doesn't seem like a better option because it may have other implications > on > > other hardware. > > i proposed this in a previous thread, but i think Simon missed it. > > perhaps the warning in the core code could be dropped and all your changes > in > fringe code obsoleted (such as these USB patches): when it detects that an > address is starting on an unaligned boundary, flush that line first, and > then > let it be invalidated. accordingly, when the end length is on an unaligned > boundary, do the same flush-then-invalidate step. this should also make > things > work without a (significant) loss in performance. if anything, i suspect > the > overhead of doing runtime buffer size calculations and manually aligning > pointers (which is what ALLOC_CACHE_ALIGN_BUFFER does) is a wash compared > to > partially flushing cache lines in the core ... > > Simon: what do you think of this last idea ? > I think I proved to myself a while back that it can't be done. Even if you flush you can't know that more activity will not occur on that cache line when you access the stack a microsecond later. Then much later when the DMA updates the RAM address with the data, you will not see it, because you have already pulled in that cache line. We are relying on the invalidate to 'hold' until we are ready to read the data. How about just using an array of one element? So instead of: struct fred fred; do_something_with(&fred); use: ALLOC_CACHE_ALIGN_BUFFER(struct fred, scr, 1) do_something_with(fred); Regards, Simon > -mike > ^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines 2012-04-27 5:29 ` Mike Frysinger 2012-04-27 5:50 ` Simon Glass @ 2012-11-02 20:22 ` Stephen Warren 2012-11-02 20:25 ` Simon Glass 1 sibling, 1 reply; 21+ messages in thread From: Stephen Warren @ 2012-11-02 20:22 UTC (permalink / raw) To: u-boot On 04/26/2012 11:29 PM, Mike Frysinger wrote: > On Thursday 26 April 2012 06:34:43 Thierry Reding wrote: >> For reference, see sd_change_freq() in drivers/mmc/mmc.c. This is a follow-up to: http://lists.denx.de/pipermail/u-boot/2012-April/123080.html which was referenced from: http://lists.denx.de/pipermail/u-boot/2012-September/133641.html > yes, that shows what we're talking about. int sd_change_freq(struct > mmc *mmc) { ... stack vars ... int err; > ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int > timeout; ... stack vars ... ... data.dest = (char *)scr; > data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ; > err = mmc_send_cmd(mmc, &cmd, &data); ... > > static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, > struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if > ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: > unaligned read from %p " "may fail\n", data->dest); > invalidate_dcache_range((ulong)data->dest, (ulong)data->dest + > data->blocks * data->blocksize); } ... > > so what invalidate_dcache_range() will invalidate is from &scr and > 8 bytes after that. the higher layers make sure &scr is aligned > properly, but not that it spans a full cache line. so if the stack > layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108 > [err] 0x10a [timeout] [more stuff] That's not the stack layout. scr is allocated (as quoted above) using ALLOC_CACHE_ALIGN_BUFFER. Therefore, the storage space allocated for scr is always cache-aligned in address and size, even if the code only uses 8 bytes of it. > this implicitly invalidates [err] and [timeout] and more stuff. by > you forcibly rounding up the length, you no longer get a warning, > but you still (silently) blew away those variables from cache > possibly losing changes that weren't written back to external > memory yet. So, this problem won't actually occur here. So, Thierry's proposed solution (which I'll re-quote below) would actually work just fine: > [PATCH 2/2] mmc: tegra: invalidate complete cachelines > > The MMC core sometimes reads buffers that are smaller than a > complete cacheline, for example when reading the SCR. In order to > avoid a warning from the ARM v7 cache handling code, this patch > makes sure that complete cachelines are flushed. > > Signed-off-by: Thierry Reding <thierry.reding at > avionic-design.de> --- drivers/mmc/tegra2_mmc.c | 7 ++++--- 1 > file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/drivers/mmc/tegra2_mmc.c b/drivers/mmc/tegra2_mmc.c > index fb8a57d..3615876 100644 --- a/drivers/mmc/tegra2_mmc.c +++ > b/drivers/mmc/tegra2_mmc.c @@ -323,12 +323,13 @@ static int > mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, } writel(mask, > &host->reg->norintsts); if (data->flags & MMC_DATA_READ) { + > ulong end = (ulong)data->dest + + roundup(data->blocks * > data->blocksize, + ARCH_DMA_MINALIGN); if > ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: > unaligned read from %p " "may fail\n", data->dest); - > invalidate_dcache_range((ulong)data->dest, - (ulong)data->dest > + - data->blocks * data->blocksize); + > invalidate_dcache_range((ulong)data->dest, end); } } > > -- ... so long as we require any struct mmc_data data's .dest field to point at a buffer that was allocated using ALLOC_CACHE_ALIGN_BUFFER. So, I'd like to propose that we take Thierry's fix, perhaps having validated (or implemented some way of enforcing) that ALLOC_CACHE_ALIGN_BUFFER() was used to allocate the buffer pointer. ^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines 2012-11-02 20:22 ` Stephen Warren @ 2012-11-02 20:25 ` Simon Glass 2012-11-02 20:38 ` Marek Vasut 2012-11-02 20:55 ` Stephen Warren 0 siblings, 2 replies; 21+ messages in thread From: Simon Glass @ 2012-11-02 20:25 UTC (permalink / raw) To: u-boot Hi Stephen, On Fri, Nov 2, 2012 at 1:22 PM, Stephen Warren <swarren@wwwdotorg.org> wrote: > On 04/26/2012 11:29 PM, Mike Frysinger wrote: >> On Thursday 26 April 2012 06:34:43 Thierry Reding wrote: >>> For reference, see sd_change_freq() in drivers/mmc/mmc.c. > > This is a follow-up to: > > http://lists.denx.de/pipermail/u-boot/2012-April/123080.html > > which was referenced from: > > http://lists.denx.de/pipermail/u-boot/2012-September/133641.html > >> yes, that shows what we're talking about. int sd_change_freq(struct >> mmc *mmc) { ... stack vars ... int err; >> ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int >> timeout; ... stack vars ... ... data.dest = (char *)scr; >> data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ; >> err = mmc_send_cmd(mmc, &cmd, &data); ... >> >> static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, >> struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if >> ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: >> unaligned read from %p " "may fail\n", data->dest); >> invalidate_dcache_range((ulong)data->dest, (ulong)data->dest + >> data->blocks * data->blocksize); } ... >> >> so what invalidate_dcache_range() will invalidate is from &scr and >> 8 bytes after that. the higher layers make sure &scr is aligned >> properly, but not that it spans a full cache line. so if the stack >> layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108 >> [err] 0x10a [timeout] [more stuff] > > That's not the stack layout. scr is allocated (as quoted above) using > ALLOC_CACHE_ALIGN_BUFFER. Therefore, the storage space allocated for > scr is always cache-aligned in address and size, even if the code only > uses 8 bytes of it. > >> this implicitly invalidates [err] and [timeout] and more stuff. by >> you forcibly rounding up the length, you no longer get a warning, >> but you still (silently) blew away those variables from cache >> possibly losing changes that weren't written back to external >> memory yet. > > So, this problem won't actually occur here. > > So, Thierry's proposed solution (which I'll re-quote below) would > actually work just fine: > >> [PATCH 2/2] mmc: tegra: invalidate complete cachelines >> >> The MMC core sometimes reads buffers that are smaller than a >> complete cacheline, for example when reading the SCR. In order to >> avoid a warning from the ARM v7 cache handling code, this patch >> makes sure that complete cachelines are flushed. >> >> Signed-off-by: Thierry Reding <thierry.reding at >> avionic-design.de> --- drivers/mmc/tegra2_mmc.c | 7 ++++--- 1 >> file changed, 4 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/mmc/tegra2_mmc.c b/drivers/mmc/tegra2_mmc.c >> index fb8a57d..3615876 100644 --- a/drivers/mmc/tegra2_mmc.c +++ >> b/drivers/mmc/tegra2_mmc.c @@ -323,12 +323,13 @@ static int >> mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, } writel(mask, >> &host->reg->norintsts); if (data->flags & MMC_DATA_READ) { + >> ulong end = (ulong)data->dest + + roundup(data->blocks * >> data->blocksize, + ARCH_DMA_MINALIGN); if >> ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: >> unaligned read from %p " "may fail\n", data->dest); - >> invalidate_dcache_range((ulong)data->dest, - (ulong)data->dest >> + - data->blocks * data->blocksize); + >> invalidate_dcache_range((ulong)data->dest, end); } } >> >> -- > > ... so long as we require any struct mmc_data data's .dest field to > point at a buffer that was allocated using ALLOC_CACHE_ALIGN_BUFFER. > > So, I'd like to propose that we take Thierry's fix, perhaps having > validated (or implemented some way of enforcing) that > ALLOC_CACHE_ALIGN_BUFFER() was used to allocate the buffer pointer. That sounds ok to me. It is important to avoid silent failure if we can. Are you proposing to pass a 'size' parameter along with any buffer pointer, or something else? Regards, Simon ^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines 2012-11-02 20:25 ` Simon Glass @ 2012-11-02 20:38 ` Marek Vasut 2012-11-02 20:58 ` Stephen Warren 2012-11-02 20:55 ` Stephen Warren 1 sibling, 1 reply; 21+ messages in thread From: Marek Vasut @ 2012-11-02 20:38 UTC (permalink / raw) To: u-boot Dear Simon Glass, > Hi Stephen, > > On Fri, Nov 2, 2012 at 1:22 PM, Stephen Warren <swarren@wwwdotorg.org> wrote: > > On 04/26/2012 11:29 PM, Mike Frysinger wrote: > >> On Thursday 26 April 2012 06:34:43 Thierry Reding wrote: > >>> For reference, see sd_change_freq() in drivers/mmc/mmc.c. > > > > This is a follow-up to: > > > > http://lists.denx.de/pipermail/u-boot/2012-April/123080.html > > > > which was referenced from: > > > > http://lists.denx.de/pipermail/u-boot/2012-September/133641.html > > > >> yes, that shows what we're talking about. int sd_change_freq(struct > >> mmc *mmc) { ... stack vars ... int err; > >> ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int > >> timeout; ... stack vars ... ... data.dest = (char *)scr; > >> data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ; > >> err = mmc_send_cmd(mmc, &cmd, &data); ... > >> > >> static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, > >> struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if > >> ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: > >> unaligned read from %p " "may fail\n", data->dest); > >> invalidate_dcache_range((ulong)data->dest, (ulong)data->dest + > >> data->blocks * data->blocksize); } ... > >> > >> so what invalidate_dcache_range() will invalidate is from &scr and > >> 8 bytes after that. the higher layers make sure &scr is aligned > >> properly, but not that it spans a full cache line. so if the stack > >> layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108 > >> [err] 0x10a [timeout] [more stuff] > > > > That's not the stack layout. scr is allocated (as quoted above) using > > ALLOC_CACHE_ALIGN_BUFFER. Therefore, the storage space allocated for > > scr is always cache-aligned in address and size, even if the code only > > uses 8 bytes of it. > > > >> this implicitly invalidates [err] and [timeout] and more stuff. by > >> you forcibly rounding up the length, you no longer get a warning, > >> but you still (silently) blew away those variables from cache > >> possibly losing changes that weren't written back to external > >> memory yet. > > > > So, this problem won't actually occur here. > > > > So, Thierry's proposed solution (which I'll re-quote below) would > > > > actually work just fine: > >> [PATCH 2/2] mmc: tegra: invalidate complete cachelines > >> > >> The MMC core sometimes reads buffers that are smaller than a > >> complete cacheline, for example when reading the SCR. In order to > >> avoid a warning from the ARM v7 cache handling code, this patch > >> makes sure that complete cachelines are flushed. > >> > >> Signed-off-by: Thierry Reding <thierry.reding at > >> avionic-design.de> --- drivers/mmc/tegra2_mmc.c | 7 ++++--- 1 > >> file changed, 4 insertions(+), 3 deletions(-) > >> > >> diff --git a/drivers/mmc/tegra2_mmc.c b/drivers/mmc/tegra2_mmc.c > >> index fb8a57d..3615876 100644 --- a/drivers/mmc/tegra2_mmc.c +++ > >> b/drivers/mmc/tegra2_mmc.c @@ -323,12 +323,13 @@ static int > >> mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, } writel(mask, > >> &host->reg->norintsts); if (data->flags & MMC_DATA_READ) { + > >> ulong end = (ulong)data->dest + + > >> roundup(data->blocks * data->blocksize, + > >> ARCH_DMA_MINALIGN); if ((uintptr_t)data->dest & > >> (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may > >> fail\n", data->dest); - > >> invalidate_dcache_range((ulong)data->dest, - > >> (ulong)data->dest + - data->blocks * > >> data->blocksize); + invalidate_dcache_range((ulong)data->dest, end); } > >> } > >> > >> -- > > > > ... so long as we require any struct mmc_data data's .dest field to > > point at a buffer that was allocated using ALLOC_CACHE_ALIGN_BUFFER. > > > > So, I'd like to propose that we take Thierry's fix, perhaps having > > validated (or implemented some way of enforcing) that > > ALLOC_CACHE_ALIGN_BUFFER() was used to allocate the buffer pointer. > > That sounds ok to me. > > It is important to avoid silent failure if we can. Are you proposing > to pass a 'size' parameter along with any buffer pointer, or something > else? Dumb question -- might be unrelated. Does the tegra mmc driver do DMA? And if so, what happens if you do raw read to unaligned address (aka. how come you don't need the bounce buffer)? ^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines 2012-11-02 20:38 ` Marek Vasut @ 2012-11-02 20:58 ` Stephen Warren 2012-11-02 21:28 ` Marek Vasut 0 siblings, 1 reply; 21+ messages in thread From: Stephen Warren @ 2012-11-02 20:58 UTC (permalink / raw) To: u-boot On 11/02/2012 02:38 PM, Marek Vasut wrote: > Dear Simon Glass, > >> Hi Stephen, >> >> On Fri, Nov 2, 2012 at 1:22 PM, Stephen Warren <swarren@wwwdotorg.org> wrote: >>> On 04/26/2012 11:29 PM, Mike Frysinger wrote: >>>> On Thursday 26 April 2012 06:34:43 Thierry Reding wrote: >>>>> For reference, see sd_change_freq() in drivers/mmc/mmc.c. >>> >>> This is a follow-up to: >>> >>> http://lists.denx.de/pipermail/u-boot/2012-April/123080.html >>> >>> which was referenced from: >>> >>> http://lists.denx.de/pipermail/u-boot/2012-September/133641.html >>> >>>> yes, that shows what we're talking about. int sd_change_freq(struct >>>> mmc *mmc) { ... stack vars ... int err; >>>> ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int >>>> timeout; ... stack vars ... ... data.dest = (char *)scr; >>>> data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ; >>>> err = mmc_send_cmd(mmc, &cmd, &data); ... >>>> >>>> static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, >>>> struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if >>>> ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: >>>> unaligned read from %p " "may fail\n", data->dest); >>>> invalidate_dcache_range((ulong)data->dest, (ulong)data->dest + >>>> data->blocks * data->blocksize); } ... >>>> >>>> so what invalidate_dcache_range() will invalidate is from &scr and >>>> 8 bytes after that. the higher layers make sure &scr is aligned >>>> properly, but not that it spans a full cache line. so if the stack >>>> layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108 >>>> [err] 0x10a [timeout] [more stuff] >>> >>> That's not the stack layout. scr is allocated (as quoted above) using >>> ALLOC_CACHE_ALIGN_BUFFER. Therefore, the storage space allocated for >>> scr is always cache-aligned in address and size, even if the code only >>> uses 8 bytes of it. >>> >>>> this implicitly invalidates [err] and [timeout] and more stuff. by >>>> you forcibly rounding up the length, you no longer get a warning, >>>> but you still (silently) blew away those variables from cache >>>> possibly losing changes that weren't written back to external >>>> memory yet. >>> >>> So, this problem won't actually occur here. >>> >>> So, Thierry's proposed solution (which I'll re-quote below) would >>> >>> actually work just fine: >>>> [PATCH 2/2] mmc: tegra: invalidate complete cachelines >>>> >>>> The MMC core sometimes reads buffers that are smaller than a >>>> complete cacheline, for example when reading the SCR. In order to >>>> avoid a warning from the ARM v7 cache handling code, this patch >>>> makes sure that complete cachelines are flushed. >>>> >>>> Signed-off-by: Thierry Reding <thierry.reding at >>>> avionic-design.de> --- drivers/mmc/tegra2_mmc.c | 7 ++++--- 1 >>>> file changed, 4 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/drivers/mmc/tegra2_mmc.c b/drivers/mmc/tegra2_mmc.c >>>> index fb8a57d..3615876 100644 --- a/drivers/mmc/tegra2_mmc.c +++ >>>> b/drivers/mmc/tegra2_mmc.c @@ -323,12 +323,13 @@ static int >>>> mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, } writel(mask, >>>> &host->reg->norintsts); if (data->flags & MMC_DATA_READ) { + >>>> ulong end = (ulong)data->dest + + >>>> roundup(data->blocks * data->blocksize, + >>>> ARCH_DMA_MINALIGN); if ((uintptr_t)data->dest & >>>> (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " "may >>>> fail\n", data->dest); - >>>> invalidate_dcache_range((ulong)data->dest, - >>>> (ulong)data->dest + - data->blocks * >>>> data->blocksize); + invalidate_dcache_range((ulong)data->dest, end); } >>>> } >>>> >>>> -- >>> >>> ... so long as we require any struct mmc_data data's .dest field to >>> point at a buffer that was allocated using ALLOC_CACHE_ALIGN_BUFFER. >>> >>> So, I'd like to propose that we take Thierry's fix, perhaps having >>> validated (or implemented some way of enforcing) that >>> ALLOC_CACHE_ALIGN_BUFFER() was used to allocate the buffer pointer. >> >> That sounds ok to me. >> >> It is important to avoid silent failure if we can. Are you proposing >> to pass a 'size' parameter along with any buffer pointer, or something >> else? > > Dumb question -- might be unrelated. Does the tegra mmc driver do DMA? And if > so, what happens if you do raw read to unaligned address (aka. how come you > don't need the bounce buffer)? Yes, it does DMA, I believe. (At least if it doesn't, I have no idea why the driver is flushing caches!) I guess we only support the use of aligned addresses, so e.g. the following would work: ext2load mmc 0:1 0x00100000 /file but the following wouldn't: ext2load mmc 0:1 0x00100004 /file which while I suppose it is an artificial restriction, hasn't been an issue in practice. ^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines 2012-11-02 20:58 ` Stephen Warren @ 2012-11-02 21:28 ` Marek Vasut 2012-11-02 21:31 ` Stephen Warren 0 siblings, 1 reply; 21+ messages in thread From: Marek Vasut @ 2012-11-02 21:28 UTC (permalink / raw) To: u-boot Dear Stephen Warren, > On 11/02/2012 02:38 PM, Marek Vasut wrote: > > Dear Simon Glass, > > > >> Hi Stephen, > >> > >> On Fri, Nov 2, 2012 at 1:22 PM, Stephen Warren <swarren@wwwdotorg.org> wrote: > >>> On 04/26/2012 11:29 PM, Mike Frysinger wrote: > >>>> On Thursday 26 April 2012 06:34:43 Thierry Reding wrote: > >>>>> For reference, see sd_change_freq() in drivers/mmc/mmc.c. > >>> > >>> This is a follow-up to: > >>> > >>> http://lists.denx.de/pipermail/u-boot/2012-April/123080.html > >>> > >>> which was referenced from: > >>> > >>> http://lists.denx.de/pipermail/u-boot/2012-September/133641.html > >>> > >>>> yes, that shows what we're talking about. int sd_change_freq(struct > >>>> mmc *mmc) { ... stack vars ... int err; > >>>> ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int > >>>> timeout; ... stack vars ... ... data.dest = (char *)scr; > >>>> data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ; > >>>> err = mmc_send_cmd(mmc, &cmd, &data); ... > >>>> > >>>> static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, > >>>> struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if > >>>> ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: > >>>> unaligned read from %p " "may fail\n", data->dest); > >>>> invalidate_dcache_range((ulong)data->dest, (ulong)data->dest + > >>>> data->blocks * data->blocksize); } ... > >>>> > >>>> so what invalidate_dcache_range() will invalidate is from &scr and > >>>> 8 bytes after that. the higher layers make sure &scr is aligned > >>>> properly, but not that it spans a full cache line. so if the stack > >>>> layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108 > >>>> [err] 0x10a [timeout] [more stuff] > >>> > >>> That's not the stack layout. scr is allocated (as quoted above) using > >>> ALLOC_CACHE_ALIGN_BUFFER. Therefore, the storage space allocated for > >>> scr is always cache-aligned in address and size, even if the code only > >>> uses 8 bytes of it. > >>> > >>>> this implicitly invalidates [err] and [timeout] and more stuff. by > >>>> you forcibly rounding up the length, you no longer get a warning, > >>>> but you still (silently) blew away those variables from cache > >>>> possibly losing changes that weren't written back to external > >>>> memory yet. > >>> > >>> So, this problem won't actually occur here. > >>> > >>> So, Thierry's proposed solution (which I'll re-quote below) would > >>> > >>> actually work just fine: > >>>> [PATCH 2/2] mmc: tegra: invalidate complete cachelines > >>>> > >>>> The MMC core sometimes reads buffers that are smaller than a > >>>> complete cacheline, for example when reading the SCR. In order to > >>>> avoid a warning from the ARM v7 cache handling code, this patch > >>>> makes sure that complete cachelines are flushed. > >>>> > >>>> Signed-off-by: Thierry Reding <thierry.reding at > >>>> avionic-design.de> --- drivers/mmc/tegra2_mmc.c | 7 ++++--- 1 > >>>> file changed, 4 insertions(+), 3 deletions(-) > >>>> > >>>> diff --git a/drivers/mmc/tegra2_mmc.c b/drivers/mmc/tegra2_mmc.c > >>>> index fb8a57d..3615876 100644 --- a/drivers/mmc/tegra2_mmc.c +++ > >>>> b/drivers/mmc/tegra2_mmc.c @@ -323,12 +323,13 @@ static int > >>>> mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, } writel(mask, > >>>> &host->reg->norintsts); if (data->flags & MMC_DATA_READ) { + > >>>> ulong end = (ulong)data->dest + + > >>>> roundup(data->blocks * data->blocksize, + > >>>> > >>>> ARCH_DMA_MINALIGN); if ((uintptr_t)data->dest & > >>>> > >>>> (ARCH_DMA_MINALIGN - 1)) printf("Warning: unaligned read from %p " > >>>> "may fail\n", data->dest); - > >>>> invalidate_dcache_range((ulong)data->dest, - > >>>> (ulong)data->dest + - data->blocks * > >>>> data->blocksize); + invalidate_dcache_range((ulong)data->dest, end); } > >>>> } > >>>> > >>>> -- > >>> > >>> ... so long as we require any struct mmc_data data's .dest field to > >>> point at a buffer that was allocated using ALLOC_CACHE_ALIGN_BUFFER. > >>> > >>> So, I'd like to propose that we take Thierry's fix, perhaps having > >>> validated (or implemented some way of enforcing) that > >>> ALLOC_CACHE_ALIGN_BUFFER() was used to allocate the buffer pointer. > >> > >> That sounds ok to me. > >> > >> It is important to avoid silent failure if we can. Are you proposing > >> to pass a 'size' parameter along with any buffer pointer, or something > >> else? > > > > Dumb question -- might be unrelated. Does the tegra mmc driver do DMA? > > And if so, what happens if you do raw read to unaligned address (aka. > > how come you don't need the bounce buffer)? > > Yes, it does DMA, I believe. (At least if it doesn't, I have no idea why > the driver is flushing caches!) > > I guess we only support the use of aligned addresses, so e.g. the > following would work: > > ext2load mmc 0:1 0x00100000 /file > > but the following wouldn't: > > ext2load mmc 0:1 0x00100004 /file > > which while I suppose it is an artificial restriction, hasn't been an > issue in practice. Then just enable the bounce buffer and it will work ;-) ^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines 2012-11-02 21:28 ` Marek Vasut @ 2012-11-02 21:31 ` Stephen Warren 2012-11-02 22:10 ` Marek Vasut 0 siblings, 1 reply; 21+ messages in thread From: Stephen Warren @ 2012-11-02 21:31 UTC (permalink / raw) To: u-boot On 11/02/2012 03:28 PM, Marek Vasut wrote: > Dear Stephen Warren, > >> On 11/02/2012 02:38 PM, Marek Vasut wrote: ... >>> Dumb question -- might be unrelated. Does the tegra mmc driver do DMA? >>> And if so, what happens if you do raw read to unaligned address (aka. >>> how come you don't need the bounce buffer)? >> >> Yes, it does DMA, I believe. (At least if it doesn't, I have no idea why >> the driver is flushing caches!) >> >> I guess we only support the use of aligned addresses, so e.g. the >> following would work: >> >> ext2load mmc 0:1 0x00100000 /file >> >> but the following wouldn't: >> >> ext2load mmc 0:1 0x00100004 /file >> >> which while I suppose it is an artificial restriction, hasn't been an >> issue in practice. > > Then just enable the bounce buffer and it will work ;-) You suggested that last time, and it made no difference then... In fact, the config option you mentioned isn't used anywhere in the srouce tree except adding bouncebuf.o into the build right now; which config option do you think I should use? ^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines 2012-11-02 21:31 ` Stephen Warren @ 2012-11-02 22:10 ` Marek Vasut 2012-11-05 19:50 ` Stephen Warren 0 siblings, 1 reply; 21+ messages in thread From: Marek Vasut @ 2012-11-02 22:10 UTC (permalink / raw) To: u-boot Dear Stephen Warren, > On 11/02/2012 03:28 PM, Marek Vasut wrote: > > Dear Stephen Warren, > > > >> On 11/02/2012 02:38 PM, Marek Vasut wrote: > ... > > >>> Dumb question -- might be unrelated. Does the tegra mmc driver do DMA? > >>> And if so, what happens if you do raw read to unaligned address (aka. > >>> how come you don't need the bounce buffer)? > >> > >> Yes, it does DMA, I believe. (At least if it doesn't, I have no idea why > >> the driver is flushing caches!) > >> > >> I guess we only support the use of aligned addresses, so e.g. the > >> following would work: > >> > >> ext2load mmc 0:1 0x00100000 /file > >> > >> but the following wouldn't: > >> > >> ext2load mmc 0:1 0x00100004 /file > >> > >> which while I suppose it is an artificial restriction, hasn't been an > >> issue in practice. > > > > Then just enable the bounce buffer and it will work ;-) > > You suggested that last time, and it made no difference then... In fact, > the config option you mentioned isn't used anywhere in the srouce tree > except adding bouncebuf.o Bouncebuf.o ? > into the build right now; which config option > do you think I should use? CONFIG_BOUNCE_BUFFER It's used on mx28-based boards. Best regards, Marek Vasut ^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines 2012-11-02 22:10 ` Marek Vasut @ 2012-11-05 19:50 ` Stephen Warren 2012-11-05 22:59 ` Marek Vasut 2012-11-05 23:07 ` Simon Glass 0 siblings, 2 replies; 21+ messages in thread From: Stephen Warren @ 2012-11-05 19:50 UTC (permalink / raw) To: u-boot On 11/02/2012 04:10 PM, Marek Vasut wrote: > Dear Stephen Warren, > >> On 11/02/2012 03:28 PM, Marek Vasut wrote: >>> Dear Stephen Warren, >>> >>>> On 11/02/2012 02:38 PM, Marek Vasut wrote: >> ... >> >>>>> Dumb question -- might be unrelated. Does the tegra mmc driver do DMA? >>>>> And if so, what happens if you do raw read to unaligned address (aka. >>>>> how come you don't need the bounce buffer)? >>>> >>>> Yes, it does DMA, I believe. (At least if it doesn't, I have no idea why >>>> the driver is flushing caches!) >>>> >>>> I guess we only support the use of aligned addresses, so e.g. the >>>> following would work: >>>> >>>> ext2load mmc 0:1 0x00100000 /file >>>> >>>> but the following wouldn't: >>>> >>>> ext2load mmc 0:1 0x00100004 /file >>>> >>>> which while I suppose it is an artificial restriction, hasn't been an >>>> issue in practice. >>> >>> Then just enable the bounce buffer and it will work ;-) >> >> You suggested that last time, and it made no difference then... In fact, >> the config option you mentioned isn't used anywhere in the srouce tree >> except adding bouncebuf.o > > Bouncebuf.o ? common/bouncebuf.c >> into the build right now; which config option >> do you think I should use? > > CONFIG_BOUNCE_BUFFER > > It's used on mx28-based boards. Ahah, there's much more to it than just defining the config option, and grep'ing for the config option doesn't reveal that. Defining CONFIG_BOUNCE_BUFFER simply pulls in bouncebuf.o into the build. It's then up to the individual MMC/... driver to actually make use of the new functions - i.e. explicitly call e.g. bounce_buffer_start(). The MXS MMC driver calls these without any ifdef CONFIG_BOUNCE_BUFFER, and hence doesn't show up when you grep for that. I had assumed that such functionality would be a conditional part of the MMC core rather than the individual drivers, hence had missed the usage in the MXS driver. So, that config option plus some changes to the Tegra MMC driver might work - I'll take a look. However, making that change will simply hide any unaligned buffer and reduce performance, so I'd still prefer to require that buffers be aligned in the future where possible. ^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines 2012-11-05 19:50 ` Stephen Warren @ 2012-11-05 22:59 ` Marek Vasut 2012-11-05 23:07 ` Simon Glass 1 sibling, 0 replies; 21+ messages in thread From: Marek Vasut @ 2012-11-05 22:59 UTC (permalink / raw) To: u-boot Dear Stephen Warren, > On 11/02/2012 04:10 PM, Marek Vasut wrote: > > Dear Stephen Warren, > > > >> On 11/02/2012 03:28 PM, Marek Vasut wrote: > >>> Dear Stephen Warren, > >>> > >>>> On 11/02/2012 02:38 PM, Marek Vasut wrote: > >> ... > >> > >>>>> Dumb question -- might be unrelated. Does the tegra mmc driver do > >>>>> DMA? And if so, what happens if you do raw read to unaligned address > >>>>> (aka. how come you don't need the bounce buffer)? > >>>> > >>>> Yes, it does DMA, I believe. (At least if it doesn't, I have no idea > >>>> why the driver is flushing caches!) > >>>> > >>>> I guess we only support the use of aligned addresses, so e.g. the > >>>> following would work: > >>>> > >>>> ext2load mmc 0:1 0x00100000 /file > >>>> > >>>> but the following wouldn't: > >>>> > >>>> ext2load mmc 0:1 0x00100004 /file > >>>> > >>>> which while I suppose it is an artificial restriction, hasn't been an > >>>> issue in practice. > >>> > >>> Then just enable the bounce buffer and it will work ;-) > >> > >> You suggested that last time, and it made no difference then... In fact, > >> the config option you mentioned isn't used anywhere in the srouce tree > >> except adding bouncebuf.o > > > > Bouncebuf.o ? > > common/bouncebuf.c Ugh, it was applied ... > >> into the build right now; which config option > >> do you think I should use? > > > > CONFIG_BOUNCE_BUFFER > > > > It's used on mx28-based boards. > > Ahah, there's much more to it than just defining the config option, and > grep'ing for the config option doesn't reveal that. > > Defining CONFIG_BOUNCE_BUFFER simply pulls in bouncebuf.o into the > build. It's then up to the individual MMC/... driver to actually make > use of the new functions - i.e. explicitly call e.g. > bounce_buffer_start(). The MXS MMC driver calls these without any ifdef > CONFIG_BOUNCE_BUFFER, and hence doesn't show up when you grep for that. > I had assumed that such functionality would be a conditional part of the > MMC core rather than the individual drivers, hence had missed the usage > in the MXS driver. Sorry about that, I didn't know the patch was pulled in so I expected the old approach. > So, that config option plus some changes to the Tegra MMC driver might > work - I'll take a look. However, making that change will simply hide > any unaligned buffer and reduce performance, so I'd still prefer to > require that buffers be aligned in the future where possible. Best regards, Marek Vasut ^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines 2012-11-05 19:50 ` Stephen Warren 2012-11-05 22:59 ` Marek Vasut @ 2012-11-05 23:07 ` Simon Glass 1 sibling, 0 replies; 21+ messages in thread From: Simon Glass @ 2012-11-05 23:07 UTC (permalink / raw) To: u-boot Hi Stephen, On Mon, Nov 5, 2012 at 11:50 AM, Stephen Warren <swarren@wwwdotorg.org> wrote: > On 11/02/2012 04:10 PM, Marek Vasut wrote: >> Dear Stephen Warren, >> >>> On 11/02/2012 03:28 PM, Marek Vasut wrote: >>>> Dear Stephen Warren, >>>> >>>>> On 11/02/2012 02:38 PM, Marek Vasut wrote: >>> ... >>> >>>>>> Dumb question -- might be unrelated. Does the tegra mmc driver do DMA? >>>>>> And if so, what happens if you do raw read to unaligned address (aka. >>>>>> how come you don't need the bounce buffer)? >>>>> >>>>> Yes, it does DMA, I believe. (At least if it doesn't, I have no idea why >>>>> the driver is flushing caches!) >>>>> >>>>> I guess we only support the use of aligned addresses, so e.g. the >>>>> following would work: >>>>> >>>>> ext2load mmc 0:1 0x00100000 /file >>>>> >>>>> but the following wouldn't: >>>>> >>>>> ext2load mmc 0:1 0x00100004 /file >>>>> >>>>> which while I suppose it is an artificial restriction, hasn't been an >>>>> issue in practice. >>>> >>>> Then just enable the bounce buffer and it will work ;-) >>> >>> You suggested that last time, and it made no difference then... In fact, >>> the config option you mentioned isn't used anywhere in the srouce tree >>> except adding bouncebuf.o >> >> Bouncebuf.o ? > > common/bouncebuf.c > >>> into the build right now; which config option >>> do you think I should use? >> >> CONFIG_BOUNCE_BUFFER >> >> It's used on mx28-based boards. > > Ahah, there's much more to it than just defining the config option, and > grep'ing for the config option doesn't reveal that. > > Defining CONFIG_BOUNCE_BUFFER simply pulls in bouncebuf.o into the > build. It's then up to the individual MMC/... driver to actually make > use of the new functions - i.e. explicitly call e.g. > bounce_buffer_start(). The MXS MMC driver calls these without any ifdef > CONFIG_BOUNCE_BUFFER, and hence doesn't show up when you grep for that. > I had assumed that such functionality would be a conditional part of the > MMC core rather than the individual drivers, hence had missed the usage > in the MXS driver. > > So, that config option plus some changes to the Tegra MMC driver might > work - I'll take a look. However, making that change will simply hide > any unaligned buffer and reduce performance, so I'd still prefer to > require that buffers be aligned in the future where possible. Yes that seems right to me. Regards, Simon ^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines 2012-11-02 20:25 ` Simon Glass 2012-11-02 20:38 ` Marek Vasut @ 2012-11-02 20:55 ` Stephen Warren 1 sibling, 0 replies; 21+ messages in thread From: Stephen Warren @ 2012-11-02 20:55 UTC (permalink / raw) To: u-boot On 11/02/2012 02:25 PM, Simon Glass wrote: > Hi Stephen, > > On Fri, Nov 2, 2012 at 1:22 PM, Stephen Warren <swarren@wwwdotorg.org> wrote: >> On 04/26/2012 11:29 PM, Mike Frysinger wrote: >>> On Thursday 26 April 2012 06:34:43 Thierry Reding wrote: >>>> For reference, see sd_change_freq() in drivers/mmc/mmc.c. >> >> This is a follow-up to: >> >> http://lists.denx.de/pipermail/u-boot/2012-April/123080.html >> >> which was referenced from: >> >> http://lists.denx.de/pipermail/u-boot/2012-September/133641.html >> >>> yes, that shows what we're talking about. int sd_change_freq(struct >>> mmc *mmc) { ... stack vars ... int err; >>> ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2); struct mmc_data data; int >>> timeout; ... stack vars ... ... data.dest = (char *)scr; >>> data.blocksize = 8; data.blocks = 1; data.flags = MMC_DATA_READ; >>> err = mmc_send_cmd(mmc, &cmd, &data); ... >>> >>> static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, >>> struct mmc_data *data) { ... if (data->flags & MMC_DATA_READ) { if >>> ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: >>> unaligned read from %p " "may fail\n", data->dest); >>> invalidate_dcache_range((ulong)data->dest, (ulong)data->dest + >>> data->blocks * data->blocksize); } ... >>> >>> so what invalidate_dcache_range() will invalidate is from &scr and >>> 8 bytes after that. the higher layers make sure &scr is aligned >>> properly, but not that it spans a full cache line. so if the stack >>> layout is: [random stuff] 0x100 [scr[0]] 0x104 [scr[1]] 0x108 >>> [err] 0x10a [timeout] [more stuff] >> >> That's not the stack layout. scr is allocated (as quoted above) using >> ALLOC_CACHE_ALIGN_BUFFER. Therefore, the storage space allocated for >> scr is always cache-aligned in address and size, even if the code only >> uses 8 bytes of it. >> >>> this implicitly invalidates [err] and [timeout] and more stuff. by >>> you forcibly rounding up the length, you no longer get a warning, >>> but you still (silently) blew away those variables from cache >>> possibly losing changes that weren't written back to external >>> memory yet. >> >> So, this problem won't actually occur here. >> >> So, Thierry's proposed solution (which I'll re-quote below) would >> actually work just fine: >> >>> [PATCH 2/2] mmc: tegra: invalidate complete cachelines >>> >>> The MMC core sometimes reads buffers that are smaller than a >>> complete cacheline, for example when reading the SCR. In order to >>> avoid a warning from the ARM v7 cache handling code, this patch >>> makes sure that complete cachelines are flushed. >>> >>> Signed-off-by: Thierry Reding <thierry.reding at >>> avionic-design.de> --- drivers/mmc/tegra2_mmc.c | 7 ++++--- 1 >>> file changed, 4 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/mmc/tegra2_mmc.c b/drivers/mmc/tegra2_mmc.c >>> index fb8a57d..3615876 100644 --- a/drivers/mmc/tegra2_mmc.c +++ >>> b/drivers/mmc/tegra2_mmc.c @@ -323,12 +323,13 @@ static int >>> mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd, } writel(mask, >>> &host->reg->norintsts); if (data->flags & MMC_DATA_READ) { + >>> ulong end = (ulong)data->dest + + roundup(data->blocks * >>> data->blocksize, + ARCH_DMA_MINALIGN); if >>> ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1)) printf("Warning: >>> unaligned read from %p " "may fail\n", data->dest); - >>> invalidate_dcache_range((ulong)data->dest, - (ulong)data->dest >>> + - data->blocks * data->blocksize); + >>> invalidate_dcache_range((ulong)data->dest, end); } } >>> >>> -- >> >> ... so long as we require any struct mmc_data data's .dest field to >> point at a buffer that was allocated using ALLOC_CACHE_ALIGN_BUFFER. >> >> So, I'd like to propose that we take Thierry's fix, perhaps having >> validated (or implemented some way of enforcing) that >> ALLOC_CACHE_ALIGN_BUFFER() was used to allocate the buffer pointer. > > That sounds ok to me. > > It is important to avoid silent failure if we can. Are you proposing > to pass a 'size' parameter along with any buffer pointer, or something > else? I was wondering about creating some kind of macro to initialize the buffer pointer in struct mmc_data, and writing that macro so that the buffer passed to it had to have been allocated using ALLOC_CACHE_ALIGN_BUFFER (e.g. by referencing the hidden variable it creates, or something like that). Passing in the buffer size would also work, although it seems like it'd be easier to screw that up and hide the silent errors you mentioned? ^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 1/2] part_dos: allocate cacheline aligned buffers 2012-04-24 7:53 [U-Boot] [PATCH 1/2] part_dos: allocate cacheline aligned buffers Thierry Reding 2012-04-24 7:53 ` [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines Thierry Reding @ 2012-04-24 18:31 ` Simon Glass 2012-04-24 18:45 ` Thierry Reding 1 sibling, 1 reply; 21+ messages in thread From: Simon Glass @ 2012-04-24 18:31 UTC (permalink / raw) To: u-boot On Tue, Apr 24, 2012 at 7:53 PM, Thierry Reding <thierry.reding@avionic-design.de> wrote: > Some hardware requires the data buffers to be cacheline-aligned to make > sure DMA operations can be properly executed. > > This patch uses the ALLOC_CACHE_ALIGN_BUFFER macro to allocate buffers > with proper alignment. The same was already done for EFI partitions in > commit f75dd58 "part_efi: dcache: allocate cacheline aligned buffers". > > Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de> Acked-by: Simon Glass <sjg@chromium.org> > --- > This fixes boot failures on Avionic Design Plutux and Medcom boards. > > ?disk/part_dos.c | ? ?6 +++--- > ?1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/disk/part_dos.c b/disk/part_dos.c > index b5bcb37..c028aaf 100644 > --- a/disk/part_dos.c > +++ b/disk/part_dos.c > @@ -87,7 +87,7 @@ static int test_block_type(unsigned char *buffer) > > ?int test_part_dos (block_dev_desc_t *dev_desc) > ?{ > - ? ? ? unsigned char buffer[dev_desc->blksz]; > + ? ? ? ALLOC_CACHE_ALIGN_BUFFER(unsigned char, buffer, dev_desc->blksz); > > ? ? ? ?if ((dev_desc->block_read(dev_desc->dev, 0, 1, (ulong *) buffer) != 1) || > ? ? ? ? ? ?(buffer[DOS_PART_MAGIC_OFFSET + 0] != 0x55) || > @@ -102,7 +102,7 @@ int test_part_dos (block_dev_desc_t *dev_desc) > ?static void print_partition_extended (block_dev_desc_t *dev_desc, int ext_part_sector, int relative, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? int part_num) > ?{ > - ? ? ? unsigned char buffer[dev_desc->blksz]; > + ? ? ? ALLOC_CACHE_ALIGN_BUFFER(unsigned char, buffer, dev_desc->blksz); > ? ? ? ?dos_partition_t *pt; > ? ? ? ?int i; > > @@ -166,7 +166,7 @@ static int get_partition_info_extended (block_dev_desc_t *dev_desc, int ext_part > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? int relative, int part_num, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? int which_part, disk_partition_t *info) > ?{ > - ? ? ? unsigned char buffer[dev_desc->blksz]; > + ? ? ? ALLOC_CACHE_ALIGN_BUFFER(unsigned char, buffer, dev_desc->blksz); > ? ? ? ?dos_partition_t *pt; > ? ? ? ?int i; > > -- > 1.7.10 > ^ permalink raw reply [flat|nested] 21+ messages in thread
* [U-Boot] [PATCH 1/2] part_dos: allocate cacheline aligned buffers 2012-04-24 18:31 ` [U-Boot] [PATCH 1/2] part_dos: allocate cacheline aligned buffers Simon Glass @ 2012-04-24 18:45 ` Thierry Reding 0 siblings, 0 replies; 21+ messages in thread From: Thierry Reding @ 2012-04-24 18:45 UTC (permalink / raw) To: u-boot * Simon Glass wrote: > On Tue, Apr 24, 2012 at 7:53 PM, Thierry Reding > <thierry.reding@avionic-design.de> wrote: > > Some hardware requires the data buffers to be cacheline-aligned to make > > sure DMA operations can be properly executed. > > > > This patch uses the ALLOC_CACHE_ALIGN_BUFFER macro to allocate buffers > > with proper alignment. The same was already done for EFI partitions in > > commit f75dd58 "part_efi: dcache: allocate cacheline aligned buffers". > > > > Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de> > > Acked-by: Simon Glass <sjg@chromium.org> I just noticed that the same patch was already posted by Eric Nelson back in March. Anatolij Gustschin just applied it to u-boot-staging/agust at denx.de. Thierry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available URL: <http://lists.denx.de/pipermail/u-boot/attachments/20120424/044dc5dd/attachment.pgp> ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2012-11-05 23:07 UTC | newest] Thread overview: 21+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-04-24 7:53 [U-Boot] [PATCH 1/2] part_dos: allocate cacheline aligned buffers Thierry Reding 2012-04-24 7:53 ` [U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines Thierry Reding 2012-04-25 22:54 ` Mike Frysinger 2012-04-26 6:18 ` Thierry Reding 2012-04-26 10:16 ` Simon Glass 2012-04-26 10:34 ` Thierry Reding 2012-04-27 5:29 ` Mike Frysinger 2012-04-27 5:50 ` Simon Glass 2012-11-02 20:22 ` Stephen Warren 2012-11-02 20:25 ` Simon Glass 2012-11-02 20:38 ` Marek Vasut 2012-11-02 20:58 ` Stephen Warren 2012-11-02 21:28 ` Marek Vasut 2012-11-02 21:31 ` Stephen Warren 2012-11-02 22:10 ` Marek Vasut 2012-11-05 19:50 ` Stephen Warren 2012-11-05 22:59 ` Marek Vasut 2012-11-05 23:07 ` Simon Glass 2012-11-02 20:55 ` Stephen Warren 2012-04-24 18:31 ` [U-Boot] [PATCH 1/2] part_dos: allocate cacheline aligned buffers Simon Glass 2012-04-24 18:45 ` Thierry Reding
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox