* [U-Boot] [PATCH] blk: Invalidate block cache when switching hwpart @ 2019-07-11 7:10 Weijie Gao 2019-07-19 0:00 ` Tom Rini 2019-08-22 13:58 ` Felix Brack 0 siblings, 2 replies; 7+ messages in thread From: Weijie Gao @ 2019-07-11 7:10 UTC (permalink / raw) To: u-boot Some storage devices have multiple hw partitions and both address from zero, for example eMMC. However currently block cache invalidation only applies to block write/erase. This can cause a problem that data of current hw partition is cached before switching to another hw partition. And the following read operation of the latter hw partition will get wrong data when reading from the addresses that have been cached previously. To solve this problem, invalidate block cache after a successful select_hwpart operation. Signed-off-by: Weijie Gao <weijie.gao@mediatek.com> --- drivers/block/blk-uclass.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/block/blk-uclass.c b/drivers/block/blk-uclass.c index baaf431e5e0..c23b6682a6c 100644 --- a/drivers/block/blk-uclass.c +++ b/drivers/block/blk-uclass.c @@ -208,7 +208,11 @@ int blk_select_hwpart_devnum(enum if_type if_type, int devnum, int hwpart) if (ret) return ret; - return blk_select_hwpart(dev, hwpart); + ret = blk_select_hwpart(dev, hwpart); + if (!ret) + blkcache_invalidate(if_type, devnum); + + return ret; } int blk_list_part(enum if_type if_type) @@ -348,7 +352,13 @@ int blk_select_hwpart(struct udevice *dev, int hwpart) int blk_dselect_hwpart(struct blk_desc *desc, int hwpart) { - return blk_select_hwpart(desc->bdev, hwpart); + int ret; + + ret = blk_select_hwpart(desc->bdev, hwpart); + if (!ret) + blkcache_invalidate(desc->if_type, desc->devnum); + + return ret; } int blk_first_device(int if_type, struct udevice **devp) -- 2.18.0 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [U-Boot] [PATCH] blk: Invalidate block cache when switching hwpart 2019-07-11 7:10 [U-Boot] [PATCH] blk: Invalidate block cache when switching hwpart Weijie Gao @ 2019-07-19 0:00 ` Tom Rini 2019-08-22 13:58 ` Felix Brack 1 sibling, 0 replies; 7+ messages in thread From: Tom Rini @ 2019-07-19 0:00 UTC (permalink / raw) To: u-boot On Thu, Jul 11, 2019 at 03:10:23PM +0800, Weijie Gao wrote: > Some storage devices have multiple hw partitions and both address from > zero, for example eMMC. > However currently block cache invalidation only applies to block > write/erase. > This can cause a problem that data of current hw partition is cached > before switching to another hw partition. And the following read > operation of the latter hw partition will get wrong data when reading > from the addresses that have been cached previously. > > To solve this problem, invalidate block cache after a successful > select_hwpart operation. > > Signed-off-by: Weijie Gao <weijie.gao@mediatek.com> Applied to u-boot/master, thanks! -- Tom -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: not available URL: <http://lists.denx.de/pipermail/u-boot/attachments/20190718/f2d83dd7/attachment.sig> ^ permalink raw reply [flat|nested] 7+ messages in thread
* [U-Boot] [PATCH] blk: Invalidate block cache when switching hwpart 2019-07-11 7:10 [U-Boot] [PATCH] blk: Invalidate block cache when switching hwpart Weijie Gao 2019-07-19 0:00 ` Tom Rini @ 2019-08-22 13:58 ` Felix Brack 2019-08-26 8:19 ` Weijie Gao 1 sibling, 1 reply; 7+ messages in thread From: Felix Brack @ 2019-08-22 13:58 UTC (permalink / raw) To: u-boot On 11.07.19 09:10, Weijie Gao wrote: > Some storage devices have multiple hw partitions and both address from > zero, for example eMMC. > However currently block cache invalidation only applies to block > write/erase. > This can cause a problem that data of current hw partition is cached > before switching to another hw partition. And the following read > operation of the latter hw partition will get wrong data when reading > from the addresses that have been cached previously. > > To solve this problem, invalidate block cache after a successful > select_hwpart operation. > > Signed-off-by: Weijie Gao <weijie.gao@mediatek.com> > --- This patch breaks correct operation of PDU001 board. > drivers/block/blk-uclass.c | 14 ++++++++++++-- > 1 file changed, 12 insertions(+), 2 deletions(-) > > diff --git a/drivers/block/blk-uclass.c b/drivers/block/blk-uclass.c > index baaf431e5e0..c23b6682a6c 100644 > --- a/drivers/block/blk-uclass.c > +++ b/drivers/block/blk-uclass.c > @@ -208,7 +208,11 @@ int blk_select_hwpart_devnum(enum if_type if_type, int devnum, int hwpart) > if (ret) > return ret; > > - return blk_select_hwpart(dev, hwpart); > + ret = blk_select_hwpart(dev, hwpart); > + if (!ret) > + blkcache_invalidate(if_type, devnum); > + > + return ret; > } > > int blk_list_part(enum if_type if_type) > @@ -348,7 +352,13 @@ int blk_select_hwpart(struct udevice *dev, int hwpart) > > int blk_dselect_hwpart(struct blk_desc *desc, int hwpart) > { > - return blk_select_hwpart(desc->bdev, hwpart); > + int ret; > + > + ret = blk_select_hwpart(desc->bdev, hwpart); > + if (!ret) > + blkcache_invalidate(desc->if_type, desc->devnum); > + Commenting the invalidation of the block cache on this line results in proper working of the board. > + return ret; > } > > int blk_first_device(int if_type, struct udevice **devp) > With the patch active, files from the boot FAT partition of the SD-card (mmc device 1) do not load anymore, hence booting fails. Using the eMMC (mmc device 0) instead works fine, files can bee loaded and the board boots. To isolate the problem I have modified the configuration to only load a 10k test file (test.bin) instead of loading the DTB and zImage files followed by booting LINUX. Furthermore I have enabled debugging for some parts of the code. When I boot from the SD-card (which fails) I get the following logged: === CPU : AM335X-GP rev 1.0 Model: EETS,PDU001 DRAM: 256 MiB mmc_bind: alias ret=-2, devnum=-1 mmc_bind: alias ret=-2, devnum=-1 MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1 In: serial at 481a6000 Out: serial at 481a6000 Err: serial at 481a6000 Press SPACE to abort autoboot in 1 seconds blk_get_devnum_by_typename: if_type=6, devnum=1: mmc at 48060000.blk, 6, 0 blk_get_devnum_by_typename: if_type=6, devnum=1: mmc at 481d8000.blk, 6, 1 clock is disabled (0Hz) clock is enabled (400000Hz) clock is enabled (50000000Hz) miss: start 0, count 1 blk_find_device: if_type=6, devnum=1: mmc at 48060000.blk, 6, 0 blk_find_device: if_type=6, devnum=1: mmc at 481d8000.blk, 6, 1 fill: start 0, count 1 part_init: try 'EFI': ret=-1 hit: start 0, count 1 part_init: try 'DOS': ret=0 blk_get_devnum_by_typename: Device desc 8dfb3a98 miss: start 0, count 1 blk_find_device: if_type=6, devnum=1: mmc at 48060000.blk, 6, 0 blk_find_device: if_type=6, devnum=1: mmc at 481d8000.blk, 6, 1 fill: start 0, count 1 part_init: try 'EFI': ret=-1 hit: start 0, count 1 part_init: try 'DOS': ret=0 hit: start 0, count 1 miss: start 800, count 1 blk_find_device: if_type=6, devnum=1: mmc at 48060000.blk, 6, 0 blk_find_device: if_type=6, devnum=1: mmc at 481d8000.blk, 6, 1 fill: start 800, count 1 hit: start 800, count 1 FAT16, fat_sect: 1, fatlength: 64 Rootdir begins at cluster: 0, sector: 129, offset: 10200 Data begins at: 153 Sector size: 512, cluster size: 4 FAT read(sect=129), clust_size=4, read_size=4, DIRENTSPERBLOCK=16 miss: start 881, count 4 blk_find_device: if_type=6, devnum=1: mmc at 48060000.blk, 6, 0 blk_find_device: if_type=6, devnum=1: mmc at 481d8000.blk, 6, 1 fill: start 881, count 4 miss: start 800, count 1 blk_find_device: if_type=6, devnum=1: mmc at 48060000.blk, 6, 0 blk_find_device: if_type=6, devnum=1: mmc at 481d8000.blk, 6, 1 fill: start 800, count 1 FAT16, fat_sect: 1, fatlength: 64 Rootdir begins at cluster: 0, sector: 129, offset: 10200 Data begins at: 153 Sector size: 512, cluster size: 4 FAT read(sect=129), clust_size=4, read_size=4, DIRENTSPERBLOCK=16 miss: start 881, count 4 blk_find_device: if_type=6, devnum=1: mmc at 48060000.blk, 6, 0 blk_find_device: if_type=6, devnum=1: mmc at 481d8000.blk, 6, 1 fill: start 881, count 4 reading test.bin at pos 0 Filesize: 0 bytes Read position past EOF: 0 0 bytes read in 94 ms (0 Bytes/s) === Obviously the the file test.bin does not get loaded. Booting from the eMMC (which works) logs the following: === CPU : AM335X-GP rev 1.0 Model: EETS,PDU001 DRAM: 256 MiB mmc_bind: alias ret=-2, devnum=-1 mmc_bind: alias ret=-2, devnum=-1 MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1 In: serial at 481a6000 Out: serial at 481a6000 Err: serial at 481a6000 Press SPACE to abort autoboot in 1 seconds blk_get_devnum_by_typename: if_type=6, devnum=0: mmc at 48060000.blk, 6, 0 clock is disabled (0Hz) clock is enabled (400000Hz) clock is enabled (25000000Hz) clock is enabled (52000000Hz) miss: start 0, count 1 blk_find_device: if_type=6, devnum=0: mmc at 48060000.blk, 6, 0 fill: start 0, count 1 part_init: try 'EFI': ret=-1 hit: start 0, count 1 part_init: try 'DOS': ret=0 blk_get_devnum_by_typename: Device desc 8dfb3800 miss: start 0, count 1 blk_find_device: if_type=6, devnum=0: mmc at 48060000.blk, 6, 0 fill: start 0, count 1 part_init: try 'EFI': ret=-1 hit: start 0, count 1 part_init: try 'DOS': ret=0 hit: start 0, count 1 miss: start 1, count 1 blk_find_device: if_type=6, devnum=0: mmc at 48060000.blk, 6, 0 fill: start 1, count 1 hit: start 1, count 1 FAT16, fat_sect: 1, fatlength: 20 Rootdir begins at cluster: 0, sector: 41, offset: 5200 Data begins at: 65 Sector size: 512, cluster size: 4 FAT read(sect=41), clust_size=4, read_size=4, DIRENTSPERBLOCK=16 miss: start 2a, count 4 blk_find_device: if_type=6, devnum=0: mmc at 48060000.blk, 6, 0 fill: start 2a, count 4 miss: start 1, count 1 blk_find_device: if_type=6, devnum=0: mmc at 48060000.blk, 6, 0 fill: start 1, count 1 FAT16, fat_sect: 1, fatlength: 20 Rootdir begins at cluster: 0, sector: 41, offset: 5200 Data begins at: 65 Sector size: 512, cluster size: 4 FAT read(sect=41), clust_size=4, read_size=4, DIRENTSPERBLOCK=16 miss: start 2a, count 4 blk_find_device: if_type=6, devnum=0: mmc at 48060000.blk, 6, 0 fill: start 2a, count 4 reading test.bin at pos 0 Filesize: 10240 bytes 10240 bytes FAT16: entry: 0x000000a3 = 163, offset: 0x00a3 = 163 miss: start 2, count 6 blk_find_device: if_type=6, devnum=0: mmc at 48060000.blk, 6, 0 fill: start 2, count 6 FAT16: ret: 0x000000a4, entry: 0x000000a3, offset: 0x00a3 FAT16: entry: 0x000000a4 = 164, offset: 0x00a4 = 164 FAT16: ret: 0x000000a5, entry: 0x000000a4, offset: 0x00a4 FAT16: entry: 0x000000a5 = 165, offset: 0x00a5 = 165 FAT16: ret: 0x000000a6, entry: 0x000000a5, offset: 0x00a5 FAT16: entry: 0x000000a6 = 166, offset: 0x00a6 = 166 FAT16: ret: 0x000000a7, entry: 0x000000a6, offset: 0x00a6 gc - clustnum: 163, startsect: 717 miss: start 2ce, count 20 blk_find_device: if_type=6, devnum=0: mmc at 48060000.blk, 6, 0 10240 bytes read in 135 ms (73.2 KiB/s) === This time the file test.bin gets loaded correctly. To be honest I don't really see why things are going wrong. It looks like the block cache gets filled but then again invalidated before it's data is used. I also feel that this problem is related to the SD-card not being the first (number 0) but the second device (number 1). Can anybody shed some light on this and give me a hint? many thanks, Felix ^ permalink raw reply [flat|nested] 7+ messages in thread
* [U-Boot] [PATCH] blk: Invalidate block cache when switching hwpart 2019-08-22 13:58 ` Felix Brack @ 2019-08-26 8:19 ` Weijie Gao 2019-08-26 12:43 ` Felix Brack 0 siblings, 1 reply; 7+ messages in thread From: Weijie Gao @ 2019-08-26 8:19 UTC (permalink / raw) To: u-boot On Thu, 2019-08-22 at 15:58 +0200, Felix Brack wrote: > On 11.07.19 09:10, Weijie Gao wrote: > > > Some storage devices have multiple hw partitions and both address from > > zero, for example eMMC. > > However currently block cache invalidation only applies to block > > write/erase. > > This can cause a problem that data of current hw partition is cached > > before switching to another hw partition. And the following read > > operation of the latter hw partition will get wrong data when reading > > from the addresses that have been cached previously. > > > > To solve this problem, invalidate block cache after a successful > > select_hwpart operation. > > > > Signed-off-by: Weijie Gao <weijie.gao@mediatek.com> > > --- > This patch breaks correct operation of PDU001 board. > > > drivers/block/blk-uclass.c | 14 ++++++++++++-- > > 1 file changed, 12 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/block/blk-uclass.c b/drivers/block/blk-uclass.c > > index baaf431e5e0..c23b6682a6c 100644 > > --- a/drivers/block/blk-uclass.c > > +++ b/drivers/block/blk-uclass.c > > @@ -208,7 +208,11 @@ int blk_select_hwpart_devnum(enum if_type if_type, int devnum, int hwpart) > > if (ret) > > return ret; > > > > - return blk_select_hwpart(dev, hwpart); > > + ret = blk_select_hwpart(dev, hwpart); > > + if (!ret) > > + blkcache_invalidate(if_type, devnum); > > + > > + return ret; > > } > > > > int blk_list_part(enum if_type if_type) > > @@ -348,7 +352,13 @@ int blk_select_hwpart(struct udevice *dev, int hwpart) > > > > int blk_dselect_hwpart(struct blk_desc *desc, int hwpart) > > { > > - return blk_select_hwpart(desc->bdev, hwpart); > > + int ret; > > + > > + ret = blk_select_hwpart(desc->bdev, hwpart); > > + if (!ret) > > + blkcache_invalidate(desc->if_type, desc->devnum); > > + > Commenting the invalidation of the block cache on this line results in > proper working of the board. > > > + return ret; > > } > > > > int blk_first_device(int if_type, struct udevice **devp) > > > With the patch active, files from the boot FAT partition of the SD-card > (mmc device 1) do not load anymore, hence booting fails. > Using the eMMC (mmc device 0) instead works fine, files can bee loaded > and the board boots. > > To isolate the problem I have modified the configuration to only load a > 10k test file (test.bin) instead of loading the DTB and zImage files > followed by booting LINUX. Furthermore I have enabled debugging for some > parts of the code. > > When I boot from the SD-card (which fails) I get the following logged: > === > CPU : AM335X-GP rev 1.0 > === > > This time the file test.bin gets loaded correctly. > > To be honest I don't really see why things are going wrong. It looks > like the block cache gets filled but then again invalidated before it's > data is used. I also feel that this problem is related to the SD-card > not being the first (number 0) but the second device (number 1). > Can anybody shed some light on this and give me a hint? > > many thanks, Felix Hi Felix, I've found the root cause. There is a bug in the FAT driver. In function file_fat_read_at(), malloc_cache_aligned() allocates memory for the itr. The itr is a pointer to struct fat_itr. fat_itr has a member block[MAX_CLUSTSIZE] used for storing blocks read from MMC. The FAT driver resolves the file table and stores the required dir entry in the member block. Then the pointer of the dir entry is passed to get_contents() to get the contents of the file. However the itr is freed BEFORE the call to get_contents(). This means the contents dir entry points to is no longer valid. And it results to your situation. But this only happens when CONFIG_BLOCK_CACHE=y and the patch applied. I've also found its cause. The MMC driver calls blk_dselect_hwpart() unconditionally in mmc_bread(). This causes block cache being invalidated and refilled many times during the FAT load operation. The frequent block cache invalidation and refilling finally results in frequent memory allocation and freeing. There is a mechanism in the dlmalloc.c: if the total free space reaches a threshold, the malloc_trim() will be called. malloc_trim() calls sbrk(), which fills the memory being freed with zeros and of course the data of itr is damaged, otherwise the original data remains unchanged. So this is the reason the bug only appears when block cache is enabled and cache invalidation is called after every select_hwpart. I have made a patch, could you please have a try? I've tested it worked. If it works for you too, I will submit it to u-boot. And I will submit another patch to optimize the cache invalidation for select_hwpart. --- a/fs/fat/fat.c +++ b/fs/fat/fat.c @@ -1174,10 +1174,6 @@ int file_fat_read_at(const char *filename, loff_t pos, void *buffer, /* For saving default max clustersize memory allocated to malloc pool */ dir_entry *dentptr = itr->dent; - free(itr); - - itr = NULL; - ret = get_contents(&fsdata, dentptr, pos, buffer, maxsize, actread); out_free_both: BTW, I reproduced this failure on a MT7623 board, which also has an eMMC and a SD. And this patch is indeed used for MT7623 to solve the bug on switching hwpart on eMMC. Best Regards, Weijie ^ permalink raw reply [flat|nested] 7+ messages in thread
* [U-Boot] [PATCH] blk: Invalidate block cache when switching hwpart 2019-08-26 8:19 ` Weijie Gao @ 2019-08-26 12:43 ` Felix Brack 2019-08-27 3:07 ` Weijie Gao 0 siblings, 1 reply; 7+ messages in thread From: Felix Brack @ 2019-08-26 12:43 UTC (permalink / raw) To: u-boot Hello Weijie, On 26.08.19 10:19, Weijie Gao wrote: > On Thu, 2019-08-22 at 15:58 +0200, Felix Brack wrote: >> On 11.07.19 09:10, Weijie Gao wrote: >> >>> Some storage devices have multiple hw partitions and both address from >>> zero, for example eMMC. >>> However currently block cache invalidation only applies to block >>> write/erase. >>> This can cause a problem that data of current hw partition is cached >>> before switching to another hw partition. And the following read >>> operation of the latter hw partition will get wrong data when reading >>> from the addresses that have been cached previously. >>> >>> To solve this problem, invalidate block cache after a successful >>> select_hwpart operation. >>> >>> Signed-off-by: Weijie Gao <weijie.gao@mediatek.com> >>> --- >> This patch breaks correct operation of PDU001 board. >> >>> drivers/block/blk-uclass.c | 14 ++++++++++++-- >>> 1 file changed, 12 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/block/blk-uclass.c b/drivers/block/blk-uclass.c >>> index baaf431e5e0..c23b6682a6c 100644 >>> --- a/drivers/block/blk-uclass.c >>> +++ b/drivers/block/blk-uclass.c >>> @@ -208,7 +208,11 @@ int blk_select_hwpart_devnum(enum if_type if_type, int devnum, int hwpart) >>> if (ret) >>> return ret; >>> >>> - return blk_select_hwpart(dev, hwpart); >>> + ret = blk_select_hwpart(dev, hwpart); >>> + if (!ret) >>> + blkcache_invalidate(if_type, devnum); >>> + >>> + return ret; >>> } >>> >>> int blk_list_part(enum if_type if_type) >>> @@ -348,7 +352,13 @@ int blk_select_hwpart(struct udevice *dev, int hwpart) >>> >>> int blk_dselect_hwpart(struct blk_desc *desc, int hwpart) >>> { >>> - return blk_select_hwpart(desc->bdev, hwpart); >>> + int ret; >>> + >>> + ret = blk_select_hwpart(desc->bdev, hwpart); >>> + if (!ret) >>> + blkcache_invalidate(desc->if_type, desc->devnum); >>> + >> Commenting the invalidation of the block cache on this line results in >> proper working of the board. >> >>> + return ret; >>> } >>> >>> int blk_first_device(int if_type, struct udevice **devp) >>> >> With the patch active, files from the boot FAT partition of the SD-card >> (mmc device 1) do not load anymore, hence booting fails. >> Using the eMMC (mmc device 0) instead works fine, files can bee loaded >> and the board boots. >> >> To isolate the problem I have modified the configuration to only load a >> 10k test file (test.bin) instead of loading the DTB and zImage files >> followed by booting LINUX. Furthermore I have enabled debugging for some >> parts of the code. >> >> When I boot from the SD-card (which fails) I get the following logged: >> === >> CPU : AM335X-GP rev 1.0 >> === >> >> This time the file test.bin gets loaded correctly. >> >> To be honest I don't really see why things are going wrong. It looks >> like the block cache gets filled but then again invalidated before it's >> data is used. I also feel that this problem is related to the SD-card >> not being the first (number 0) but the second device (number 1). >> Can anybody shed some light on this and give me a hint? >> >> many thanks, Felix > > Hi Felix, > > I've found the root cause. > Many thanks for looking into this! > There is a bug in the FAT driver. In function file_fat_read_at(), > malloc_cache_aligned() allocates memory for the itr. The itr is a > pointer to struct fat_itr. fat_itr has a member block[MAX_CLUSTSIZE] > used for storing blocks read from MMC. > > The FAT driver resolves the file table and stores the required dir entry > in the member block. Then the pointer of the dir entry is passed to > get_contents() to get the contents of the file. > > However the itr is freed BEFORE the call to get_contents(). This means > the contents dir entry points to is no longer valid. And it results to > your situation. > Indeed this is the root of all trouble. The freeing of itr of course also invalidates the assignment made earlier: dir_entry *dentptr = itr->dent; As dentptr is used in the call to get_contents() the evil takes its course... Again many thanks for analyzing and explaining this as detailed as you did. > But this only happens when CONFIG_BLOCK_CACHE=y and the patch applied. > I've also found its cause. > > The MMC driver calls blk_dselect_hwpart() unconditionally in > mmc_bread(). This causes block cache being invalidated and refilled many > times during the FAT load operation. > Yes I have noticed this frequent invalidations of the cache due to calls to blk_dselect_hwpart(). Some optimization, if possible, would be great. > The frequent block cache invalidation and refilling finally results in > frequent memory allocation and freeing. There is a mechanism in the > dlmalloc.c: if the total free space reaches a threshold, the > malloc_trim() will be called. malloc_trim() calls sbrk(), which > fills the memory being freed with zeros and of course the data of itr is > damaged, otherwise the original data remains unchanged. > > So this is the reason the bug only appears when block cache is enabled > and cache invalidation is called after every select_hwpart. > I see and now understand the different behavior of eMMC and SD-card. > I have made a patch, could you please have a try? I've tested it worked. > If it works for you too, I will submit it to u-boot. And I will submit > another patch to optimize the cache invalidation for select_hwpart. > > --- a/fs/fat/fat.c > +++ b/fs/fat/fat.c > @@ -1174,10 +1174,6 @@ int file_fat_read_at(const char *filename, loff_t > pos, void *buffer, > /* For saving default max clustersize memory allocated to malloc pool > */ > dir_entry *dentptr = itr->dent; > > - free(itr); > - > - itr = NULL; > - > ret = get_contents(&fsdata, dentptr, pos, buffer, maxsize, actread); > > out_free_both: > I can confirm that your patch works perfect, thanks! Please CC me when submitting the two patches you mentioned above. It will be my pleasure to test them on my hardware. > > BTW, I reproduced this failure on a MT7623 board, which also has an eMMC > and a SD. And this patch is indeed used for MT7623 to solve the bug on > switching hwpart on eMMC. > > > Best Regards, > > Weijie > regard, Felix ^ permalink raw reply [flat|nested] 7+ messages in thread
* [U-Boot] [PATCH] blk: Invalidate block cache when switching hwpart 2019-08-26 12:43 ` Felix Brack @ 2019-08-27 3:07 ` Weijie Gao 2019-08-27 7:46 ` Felix Brack 0 siblings, 1 reply; 7+ messages in thread From: Weijie Gao @ 2019-08-27 3:07 UTC (permalink / raw) To: u-boot On Mon, 2019-08-26 at 14:43 +0200, Felix Brack wrote: > Hello Weijie, > > On 26.08.19 10:19, Weijie Gao wrote: > > On Thu, 2019-08-22 at 15:58 +0200, Felix Brack wrote: > >> On 11.07.19 09:10, Weijie Gao wrote: > >> > >>> Some storage devices have multiple hw partitions and both address from > >>> zero, for example eMMC. > >>> However currently block cache invalidation only applies to block > >>> write/erase. > >>> This can cause a problem that data of current hw partition is cached > >>> before switching to another hw partition. And the following read > >>> operation of the latter hw partition will get wrong data when reading > >>> from the addresses that have been cached previously. > >>> > >>> To solve this problem, invalidate block cache after a successful > >>> select_hwpart operation. > >>> > >>> Signed-off-by: Weijie Gao <weijie.gao@mediatek.com> > >>> --- > >> This patch breaks correct operation of PDU001 board. > >> > >>> drivers/block/blk-uclass.c | 14 ++++++++++++-- > >>> 1 file changed, 12 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/drivers/block/blk-uclass.c b/drivers/block/blk-uclass.c > >>> index baaf431e5e0..c23b6682a6c 100644 > >>> --- a/drivers/block/blk-uclass.c > >>> +++ b/drivers/block/blk-uclass.c > >>> @@ -208,7 +208,11 @@ int blk_select_hwpart_devnum(enum if_type if_type, int devnum, int hwpart) > >>> if (ret) > >>> return ret; > >>> > >>> - return blk_select_hwpart(dev, hwpart); > >>> + ret = blk_select_hwpart(dev, hwpart); > >>> + if (!ret) > >>> + blkcache_invalidate(if_type, devnum); > >>> + > >>> + return ret; > >>> } > >>> > >>> int blk_list_part(enum if_type if_type) > >>> @@ -348,7 +352,13 @@ int blk_select_hwpart(struct udevice *dev, int hwpart) > >>> > >>> int blk_dselect_hwpart(struct blk_desc *desc, int hwpart) > >>> { > >>> - return blk_select_hwpart(desc->bdev, hwpart); > >>> + int ret; > >>> + > >>> + ret = blk_select_hwpart(desc->bdev, hwpart); > >>> + if (!ret) > >>> + blkcache_invalidate(desc->if_type, desc->devnum); > >>> + > >> Commenting the invalidation of the block cache on this line results in > >> proper working of the board. > >> > >>> + return ret; > >>> } > >>> > >>> int blk_first_device(int if_type, struct udevice **devp) > >>> > >> With the patch active, files from the boot FAT partition of the SD-card > >> (mmc device 1) do not load anymore, hence booting fails. > >> Using the eMMC (mmc device 0) instead works fine, files can bee loaded > >> and the board boots. > >> > >> To isolate the problem I have modified the configuration to only load a > >> 10k test file (test.bin) instead of loading the DTB and zImage files > >> followed by booting LINUX. Furthermore I have enabled debugging for some > >> parts of the code. > >> > >> When I boot from the SD-card (which fails) I get the following logged: > >> === > >> CPU : AM335X-GP rev 1.0 > >> === > >> > >> This time the file test.bin gets loaded correctly. > >> > >> To be honest I don't really see why things are going wrong. It looks > >> like the block cache gets filled but then again invalidated before it's > >> data is used. I also feel that this problem is related to the SD-card > >> not being the first (number 0) but the second device (number 1). > >> Can anybody shed some light on this and give me a hint? > >> > >> many thanks, Felix > > > > Hi Felix, > > > > I've found the root cause. > > > Many thanks for looking into this! > > > There is a bug in the FAT driver. In function file_fat_read_at(), > > malloc_cache_aligned() allocates memory for the itr. The itr is a > > pointer to struct fat_itr. fat_itr has a member block[MAX_CLUSTSIZE] > > used for storing blocks read from MMC. > > > > The FAT driver resolves the file table and stores the required dir entry > > in the member block. Then the pointer of the dir entry is passed to > > get_contents() to get the contents of the file. > > > > However the itr is freed BEFORE the call to get_contents(). This means > > the contents dir entry points to is no longer valid. And it results to > > your situation. > > > Indeed this is the root of all trouble. The freeing of itr of course > also invalidates the assignment made earlier: > > dir_entry *dentptr = itr->dent; > > As dentptr is used in the call to get_contents() the evil takes its > course... > > Again many thanks for analyzing and explaining this as detailed as you did. > > > But this only happens when CONFIG_BLOCK_CACHE=y and the patch applied. > > I've also found its cause. > > > > The MMC driver calls blk_dselect_hwpart() unconditionally in > > mmc_bread(). This causes block cache being invalidated and refilled many > > times during the FAT load operation. > > > Yes I have noticed this frequent invalidations of the cache due to calls > to blk_dselect_hwpart(). Some optimization, if possible, would be great. > > > The frequent block cache invalidation and refilling finally results in > > frequent memory allocation and freeing. There is a mechanism in the > > dlmalloc.c: if the total free space reaches a threshold, the > > malloc_trim() will be called. malloc_trim() calls sbrk(), which > > fills the memory being freed with zeros and of course the data of itr is > > damaged, otherwise the original data remains unchanged. > > > > So this is the reason the bug only appears when block cache is enabled > > and cache invalidation is called after every select_hwpart. > > > I see and now understand the different behavior of eMMC and SD-card. > > > I have made a patch, could you please have a try? I've tested it worked. > > If it works for you too, I will submit it to u-boot. And I will submit > > another patch to optimize the cache invalidation for select_hwpart. > > > > --- a/fs/fat/fat.c > > +++ b/fs/fat/fat.c > > @@ -1174,10 +1174,6 @@ int file_fat_read_at(const char *filename, loff_t > > pos, void *buffer, > > /* For saving default max clustersize memory allocated to malloc pool > > */ > > dir_entry *dentptr = itr->dent; > > > > - free(itr); > > - > > - itr = NULL; > > - > > ret = get_contents(&fsdata, dentptr, pos, buffer, maxsize, actread); > > > > out_free_both: > > > I can confirm that your patch works perfect, thanks! > > Please CC me when submitting the two patches you mentioned above. It > will be my pleasure to test them on my hardware. > > > > > BTW, I reproduced this failure on a MT7623 board, which also has an eMMC > > and a SD. And this patch is indeed used for MT7623 to solve the bug on > > switching hwpart on eMMC. > > > > > > Best Regards, > > > > Weijie > > > > regard, Felix Hi Felix, I found there's already a commit merged 6 hours ago (mailed on Aug 20) that fixed this issue: d7af2a863017be1f5fd1b65a858ddc7e87d7b876 So there's no need to send patch again. What you need is to pull the latest master branch. I will still send a patch to optimize the invalidation in select_hwpart. Best Regards, Weijie ^ permalink raw reply [flat|nested] 7+ messages in thread
* [U-Boot] [PATCH] blk: Invalidate block cache when switching hwpart 2019-08-27 3:07 ` Weijie Gao @ 2019-08-27 7:46 ` Felix Brack 0 siblings, 0 replies; 7+ messages in thread From: Felix Brack @ 2019-08-27 7:46 UTC (permalink / raw) To: u-boot Hello Weijie, On 27.08.19 05:07, Weijie Gao wrote: > Hi Felix, > > I found there's already a commit merged 6 hours ago (mailed on Aug 20) > that fixed this issue: d7af2a863017be1f5fd1b65a858ddc7e87d7b876 > Thanks for the information. The patch referenced can be found here: https://patchwork.ozlabs.org/patch/1151091/ > So there's no need to send patch again. What you need is to pull the > latest master branch. > > I will still send a patch to optimize the invalidation in select_hwpart. > > Best Regards, > > Weijie > > regards, Felix ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2019-08-27 7:46 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-07-11 7:10 [U-Boot] [PATCH] blk: Invalidate block cache when switching hwpart Weijie Gao 2019-07-19 0:00 ` Tom Rini 2019-08-22 13:58 ` Felix Brack 2019-08-26 8:19 ` Weijie Gao 2019-08-26 12:43 ` Felix Brack 2019-08-27 3:07 ` Weijie Gao 2019-08-27 7:46 ` Felix Brack
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox