* [PATCH 0/3] memtest performance improvements
@ 2025-08-22 18:18 Rasmus Villemoes
2025-08-22 18:18 ` [PATCH 1/3] memtest: don't volatile-qualify local variables Rasmus Villemoes
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Rasmus Villemoes @ 2025-08-22 18:18 UTC (permalink / raw)
To: u-boot; +Cc: Tom Rini, Rasmus Villemoes
The first two patches make memtest run ~40x faster (when, as it should
be, dcache is disabled), with the second patch being responsible for
most of that. At least on the beagleboneblack which I used for
testing; other boards and configurations will likely see different
numbers.
This is for CONFIG_SYS_ALT_MEMTEST=y and
CONFIG_SYS_ALT_MEMTEST_BITFLIP=n; one could probably get a similar
improvement in the bitflip case since that also has a schedule() call
in the inner loop.
Rasmus Villemoes (3):
memtest: don't volatile-qualify local variables
memtest: only call schedule() once for every 256 words
memtest: remove use of vu_long typedef in mem_test_alt
cmd/mem.c | 40 +++++++++++++++++++++-------------------
1 file changed, 21 insertions(+), 19 deletions(-)
--
2.50.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/3] memtest: don't volatile-qualify local variables
2025-08-22 18:18 [PATCH 0/3] memtest performance improvements Rasmus Villemoes
@ 2025-08-22 18:18 ` Rasmus Villemoes
2025-08-22 18:18 ` [PATCH 2/3] memtest: only call schedule() once for every 256 words Rasmus Villemoes
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Rasmus Villemoes @ 2025-08-22 18:18 UTC (permalink / raw)
To: u-boot; +Cc: Tom Rini, Rasmus Villemoes
It is obviously important that the addr pointer used to access the
memory region being tested is volatile-qualified, to prevent the
compiler from optimizing out the "write this value, read it back,
check that it is what we expect".
However, none of these auxiliary variables have any such need for,
effectively, being forced to live on the stack and cause each and
every reference to them to do a memory access.
This makes the memtest about 15% faster on a beagleboneblack.
Before:
=> dcache off
=> time mtest 0x81000000 0x81100000 0 1
Testing 81000000 ... 81100000:
Iteration: 1
Tested 1 iteration(s) with 0 errors.
time: 10.868 seconds
After:
=> dcache off
=> time mtest 0x81000000 0x81100000 0 1
Testing 81000000 ... 81100000:
Iteration: 1
Tested 1 iteration(s) with 0 errors.
time: 9.209 seconds
[Without the 'dcache off', there's no difference in the time, about
0.6s, but the memtest cannot usefully be done with dcache enabled.]
Signed-off-by: Rasmus Villemoes <ravi@prevas.dk>
---
cmd/mem.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/cmd/mem.c b/cmd/mem.c
index 9e716776393..9a1cfa4534c 100644
--- a/cmd/mem.c
+++ b/cmd/mem.c
@@ -717,12 +717,9 @@ static ulong mem_test_alt(vu_long *buf, ulong start_addr, ulong end_addr,
ulong errs = 0;
ulong val, readback;
int j;
- vu_long offset;
- vu_long test_offset;
- vu_long pattern;
- vu_long temp;
- vu_long anti_pattern;
- vu_long num_words;
+ ulong offset, test_offset;
+ ulong pattern, anti_pattern;
+ ulong temp, num_words;
static const ulong bitpattern[] = {
0x00000001, /* single bit */
0x00000003, /* two adjacent bits */
--
2.50.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/3] memtest: only call schedule() once for every 256 words
2025-08-22 18:18 [PATCH 0/3] memtest performance improvements Rasmus Villemoes
2025-08-22 18:18 ` [PATCH 1/3] memtest: don't volatile-qualify local variables Rasmus Villemoes
@ 2025-08-22 18:18 ` Rasmus Villemoes
2025-08-22 18:18 ` [PATCH 3/3] memtest: remove use of vu_long typedef in mem_test_alt Rasmus Villemoes
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Rasmus Villemoes @ 2025-08-22 18:18 UTC (permalink / raw)
To: u-boot; +Cc: Tom Rini, Rasmus Villemoes
A function call itself for every word written or read+written in these
loops is bad enough. But since the memory test must be run with dcache
disabled, the schedule() call, traversing the linked list of
registered cyclic clients, and accessing the 'struct cyclic_info' for
each to see if any are due for a callback, is quite expensive. On a
beagleboneblack, testing a modest 16MiB region takes 2.5 minutes:
=> dcache off
=> time mtest 0x81000000 0x82000000 0 1
Testing 81000000 ... 82000000:
Iteration: 1
Tested 1 iteration(s) with 0 errors.
time: 2 minutes, 28.946 seconds
There is really no need for calling schedule() so frequently. It is
quite easy to limit the calls to once for every 256 words by using a
u8 variable. With that, the same test as above becomes 37 times
faster:
=> dcache off
=> time mtest 0x81000000 0x82000000 0 1
Testing 81000000 ... 82000000:
Iteration: 1
Tested 1 iteration(s) with 0 errors.
time: 4.052 seconds
Note that we are still making a total of
3 loops * (4 * 2^20 words/loop) / (256 words/call) = 49152 calls
during those ~4000 milliseconds, so the schedule() calls are still
done less than 0.1ms apart.
These numbers are just for a beagleboneblack, other boards may have a
slower memory, but we are _two orders of magnitude_ away from
schedule() "only" being called at 100Hz, which is still more than
enough to ensure any watchdog is kept happy.
Signed-off-by: Rasmus Villemoes <ravi@prevas.dk>
---
cmd/mem.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/cmd/mem.c b/cmd/mem.c
index 9a1cfa4534c..013e93f09df 100644
--- a/cmd/mem.c
+++ b/cmd/mem.c
@@ -730,6 +730,8 @@ static ulong mem_test_alt(vu_long *buf, ulong start_addr, ulong end_addr,
0x00000055, /* four non-adjacent bits */
0xaaaaaaaa, /* alternating 1/0 */
};
+ /* Rate-limit schedule() calls to one for every 256 words. */
+ u8 count = 0;
num_words = (end_addr - start_addr) / sizeof(vu_long);
@@ -885,7 +887,8 @@ static ulong mem_test_alt(vu_long *buf, ulong start_addr, ulong end_addr,
* Fill memory with a known pattern.
*/
for (pattern = 1, offset = 0; offset < num_words; pattern++, offset++) {
- schedule();
+ if (!count++)
+ schedule();
addr[offset] = pattern;
}
@@ -893,7 +896,8 @@ static ulong mem_test_alt(vu_long *buf, ulong start_addr, ulong end_addr,
* Check each location and invert it for the second pass.
*/
for (pattern = 1, offset = 0; offset < num_words; pattern++, offset++) {
- schedule();
+ if (!count++)
+ schedule();
temp = addr[offset];
if (temp != pattern) {
printf("\nFAILURE (read/write) @ 0x%.8lx:"
@@ -913,7 +917,8 @@ static ulong mem_test_alt(vu_long *buf, ulong start_addr, ulong end_addr,
* Check each location for the inverted pattern and zero it.
*/
for (pattern = 1, offset = 0; offset < num_words; pattern++, offset++) {
- schedule();
+ if (!count++)
+ schedule();
anti_pattern = ~pattern;
temp = addr[offset];
if (temp != anti_pattern) {
--
2.50.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 3/3] memtest: remove use of vu_long typedef in mem_test_alt
2025-08-22 18:18 [PATCH 0/3] memtest performance improvements Rasmus Villemoes
2025-08-22 18:18 ` [PATCH 1/3] memtest: don't volatile-qualify local variables Rasmus Villemoes
2025-08-22 18:18 ` [PATCH 2/3] memtest: only call schedule() once for every 256 words Rasmus Villemoes
@ 2025-08-22 18:18 ` Rasmus Villemoes
2025-08-28 13:48 ` [PATCH 0/3] memtest performance improvements Anshul Dalal
2025-09-02 22:57 ` Tom Rini
4 siblings, 0 replies; 6+ messages in thread
From: Rasmus Villemoes @ 2025-08-22 18:18 UTC (permalink / raw)
To: u-boot; +Cc: Tom Rini, Rasmus Villemoes
Hiding a qualifier such as "volatile" inside a typedef makes the code
much harder to understand. Since addr and dummy being
volatile-qualified are important for the correctness of the test code,
make it more obvious by spelling it out as "volatile ulong".
Signed-off-by: Rasmus Villemoes <ravi@prevas.dk>
---
cmd/mem.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/cmd/mem.c b/cmd/mem.c
index 013e93f09df..c8c602ec9c4 100644
--- a/cmd/mem.c
+++ b/cmd/mem.c
@@ -710,10 +710,10 @@ static int do_mem_loopw(struct cmd_tbl *cmdtp, int flag, int argc,
#endif /* CONFIG_LOOPW */
#ifdef CONFIG_CMD_MEMTEST
-static ulong mem_test_alt(vu_long *buf, ulong start_addr, ulong end_addr,
- vu_long *dummy)
+static ulong mem_test_alt(volatile ulong *buf, ulong start_addr, ulong end_addr,
+ volatile ulong *dummy)
{
- vu_long *addr;
+ volatile ulong *addr;
ulong errs = 0;
ulong val, readback;
int j;
@@ -733,7 +733,7 @@ static ulong mem_test_alt(vu_long *buf, ulong start_addr, ulong end_addr,
/* Rate-limit schedule() calls to one for every 256 words. */
u8 count = 0;
- num_words = (end_addr - start_addr) / sizeof(vu_long);
+ num_words = (end_addr - start_addr) / sizeof(ulong);
/*
* Data line test: write a pattern to the first
@@ -815,8 +815,8 @@ static ulong mem_test_alt(vu_long *buf, ulong start_addr, ulong end_addr,
*
* Returns: 0 if the test succeeds, 1 if the test fails.
*/
- pattern = (vu_long)0xaaaaaaaaaaaaaaaa;
- anti_pattern = (vu_long)0x5555555555555555;
+ pattern = (ulong)0xaaaaaaaaaaaaaaaa;
+ anti_pattern = (ulong)0x5555555555555555;
debug("%s:%d: length = 0x%.8lx\n", __func__, __LINE__, num_words);
/*
@@ -837,7 +837,7 @@ static ulong mem_test_alt(vu_long *buf, ulong start_addr, ulong end_addr,
if (temp != pattern) {
printf("\nFAILURE: Address bit stuck high @ 0x%.8lx:"
" expected 0x%.8lx, actual 0x%.8lx\n",
- start_addr + offset*sizeof(vu_long),
+ start_addr + offset*sizeof(ulong),
pattern, temp);
errs++;
if (ctrlc())
@@ -859,7 +859,7 @@ static ulong mem_test_alt(vu_long *buf, ulong start_addr, ulong end_addr,
printf("\nFAILURE: Address bit stuck low or"
" shorted @ 0x%.8lx: expected 0x%.8lx,"
" actual 0x%.8lx\n",
- start_addr + offset*sizeof(vu_long),
+ start_addr + offset*sizeof(ulong),
pattern, temp);
errs++;
if (ctrlc())
@@ -902,7 +902,7 @@ static ulong mem_test_alt(vu_long *buf, ulong start_addr, ulong end_addr,
if (temp != pattern) {
printf("\nFAILURE (read/write) @ 0x%.8lx:"
" expected 0x%.8lx, actual 0x%.8lx)\n",
- start_addr + offset*sizeof(vu_long),
+ start_addr + offset*sizeof(ulong),
pattern, temp);
errs++;
if (ctrlc())
@@ -924,7 +924,7 @@ static ulong mem_test_alt(vu_long *buf, ulong start_addr, ulong end_addr,
if (temp != anti_pattern) {
printf("\nFAILURE (read/write): @ 0x%.8lx:"
" expected 0x%.8lx, actual 0x%.8lx)\n",
- start_addr + offset*sizeof(vu_long),
+ start_addr + offset*sizeof(ulong),
anti_pattern, temp);
errs++;
if (ctrlc())
--
2.50.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 0/3] memtest performance improvements
2025-08-22 18:18 [PATCH 0/3] memtest performance improvements Rasmus Villemoes
` (2 preceding siblings ...)
2025-08-22 18:18 ` [PATCH 3/3] memtest: remove use of vu_long typedef in mem_test_alt Rasmus Villemoes
@ 2025-08-28 13:48 ` Anshul Dalal
2025-09-02 22:57 ` Tom Rini
4 siblings, 0 replies; 6+ messages in thread
From: Anshul Dalal @ 2025-08-28 13:48 UTC (permalink / raw)
To: Rasmus Villemoes, u-boot; +Cc: Tom Rini
Hi Rasmus,
On Fri Aug 22, 2025 at 11:48 PM IST, Rasmus Villemoes wrote:
> The first two patches make memtest run ~40x faster (when, as it should
> be, dcache is disabled), with the second patch being responsible for
> most of that. At least on the beagleboneblack which I used for
> testing; other boards and configurations will likely see different
> numbers.
>
> This is for CONFIG_SYS_ALT_MEMTEST=y and
> CONFIG_SYS_ALT_MEMTEST_BITFLIP=n; one could probably get a similar
> improvement in the bitflip case since that also has a schedule() call
> in the inner loop.
>
> Rasmus Villemoes (3):
> memtest: don't volatile-qualify local variables
> memtest: only call schedule() once for every 256 words
> memtest: remove use of vu_long typedef in mem_test_alt
>
> cmd/mem.c | 40 +++++++++++++++++++++-------------------
> 1 file changed, 21 insertions(+), 19 deletions(-)
For the entire series:
Tested-by: Anshul Dalal <anshuld@ti.com>
Tested on TI AM62P EVM, with ~10x improvment.
Patch:
diff --git a/configs/am62px_evm_a53_defconfig b/configs/am62px_evm_a53_defconfig
index fa857e51137..896e98fb7ca 100644
--- a/configs/am62px_evm_a53_defconfig
+++ b/configs/am62px_evm_a53_defconfig
@@ -150,6 +150,9 @@ CONFIG_SPL_USB_HOST=y
CONFIG_SPL_USB_STORAGE=y
CONFIG_FS_FAT_MAX_CLUSTSIZE=16384
CONFIG_EFI_SET_TIME=y
+CONFIG_CMD_MEMTEST=y
+CONFIG_SYS_ALT_MEMTEST=y
+CONFIG_SYS_ALT_MEMTEST_BITFLIP=n
#include <configs/k3_efi_capsule.config>
#include <configs/am62x_a53_usbdfu.config>
Results:
Before:
=> dcache off
=> time mtest 0x81000000 0x82000000 0 1
Testing 81000000 ... 82000000:
Iteration: 1
Tested 1 iteration(s) with 0 errors.
time: 10.848 seconds
After:
=> dcache off
=> time mtest 0x81000000 0x82000000 0 1
Testing 81000000 ... 82000000:
Iteration: 1
Tested 1 iteration(s) with 0 errors.
time: 1.054 seconds
Regards,
Anshul
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 0/3] memtest performance improvements
2025-08-22 18:18 [PATCH 0/3] memtest performance improvements Rasmus Villemoes
` (3 preceding siblings ...)
2025-08-28 13:48 ` [PATCH 0/3] memtest performance improvements Anshul Dalal
@ 2025-09-02 22:57 ` Tom Rini
4 siblings, 0 replies; 6+ messages in thread
From: Tom Rini @ 2025-09-02 22:57 UTC (permalink / raw)
To: u-boot, Rasmus Villemoes
On Fri, 22 Aug 2025 20:18:45 +0200, Rasmus Villemoes wrote:
> The first two patches make memtest run ~40x faster (when, as it should
> be, dcache is disabled), with the second patch being responsible for
> most of that. At least on the beagleboneblack which I used for
> testing; other boards and configurations will likely see different
> numbers.
>
> This is for CONFIG_SYS_ALT_MEMTEST=y and
> CONFIG_SYS_ALT_MEMTEST_BITFLIP=n; one could probably get a similar
> improvement in the bitflip case since that also has a schedule() call
> in the inner loop.
>
> [...]
Applied to u-boot/next, thanks!
[1/3] memtest: don't volatile-qualify local variables
commit: 86c5c25b6ca99025ac8ebcbe5c53ea0f398d1f44
[2/3] memtest: only call schedule() once for every 256 words
commit: 835915bb7d2b76d7e422b6d60628a0f7cef30ddb
[3/3] memtest: remove use of vu_long typedef in mem_test_alt
commit: 42529beba5a262992b53893477e0de646c1754b6
--
Tom
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-09-02 22:57 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-22 18:18 [PATCH 0/3] memtest performance improvements Rasmus Villemoes
2025-08-22 18:18 ` [PATCH 1/3] memtest: don't volatile-qualify local variables Rasmus Villemoes
2025-08-22 18:18 ` [PATCH 2/3] memtest: only call schedule() once for every 256 words Rasmus Villemoes
2025-08-22 18:18 ` [PATCH 3/3] memtest: remove use of vu_long typedef in mem_test_alt Rasmus Villemoes
2025-08-28 13:48 ` [PATCH 0/3] memtest performance improvements Anshul Dalal
2025-09-02 22:57 ` Tom Rini
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox