* [LTP] [PATCH] ksm: fix segfault on s390
@ 2025-05-20 20:24 Luiz Capitulino via ltp
2025-05-21 14:08 ` Li Wang via ltp
2025-05-27 9:40 ` Cyril Hrubis
0 siblings, 2 replies; 12+ messages in thread
From: Luiz Capitulino via ltp @ 2025-05-20 20:24 UTC (permalink / raw)
To: ltp
Recently, we started seeing the following segfault when running ksm01
and ksm02 tests on an s390 KSM guest:
"""
[ 119.302817] User process fault: interruption code 0011 ilc:3 in libc.so.6[b14ae,3ff91500000+1c9000]
[ 119.302824] Failing address: 000003ff91400000 TEID: 000003ff91400800
[ 119.302826] Fault in primary space mode while using user ASCE.
[ 119.302828] AS:0000000084bec1c7 R3:00000000824cc007 S:0000000081a28001 P:0000000000000400
[ 119.302833] CPU: 0 UID: 0 PID: 5578 Comm: ksm01 Kdump: loaded Not tainted 6.15.0-rc6+ #8 NONE
[ 119.302837] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
[ 119.302839] User PSW : 0705200180000000 000003ff915b14ae
[ 119.302841] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
[ 119.302843] User GPRS: cccccccccccccccd 000000000007efff 000003ff91400000 000003ff814ff010
[ 119.302845] 0000000007ffffff 0000000000000000 0000000000000000 000003ff00000000
[ 119.302847] 0000000000000063 0000000000100000 00000000023db500 0000000008000000
[ 119.302848] 0000000000000063 0000000000000080 00000000010066da 000003ffd7777e20
[ 119.302855] User Code: 000003ff915b149e: a784ffee brc 8,000003ff915b147a
000003ff915b14a2: e31032000036 pfd 1,512(%r3)
#000003ff915b14a8: e31022000036 pfd 1,512(%r2)
>000003ff915b14ae: d5ff30002000 clc 0(256,%r3),0(%r2)
000003ff915b14b4: a784ffef brc 8,000003ff915b1492
000003ff915b14b8: b2220020 ipm %r2
000003ff915b14bc: eb220022000d sllg %r2,%r2,34
000003ff915b14c2: eb22003e000a srag %r2,%r2,62
[ 119.302867] Last Breaking-Event-Address:
[ 119.302868] [<000003ff915b14b4>] libc.so.6[b14b4,3ff91500000+1c9000]
"""
This segfault is triggered by the memcmp() call in verify():
"""
memcmp(memory[start], s, (end - start) * (end2 - start2)
"""
In the default case, this call checks if the memory area starting in
memory[0] (since start=0 by default) matches 's' for 128MB. IOW, this
assumes that the memory areas in memory[] are contiguous. This is wrong,
since create_ksm_child() allocates 128 individual areas of 1MB each. As,
in this particular case, memory[0] happens to be the last 1MB area in
the VMA created by the kernel, we hit a segault at the first byte beyond
memory[0].
Now, the question is how this has worked for so long and why it may still
work on arm64 and x86 (even on s390 it ocassionaly works).
For the s390 case, the reason is upstream kernel commit efa7df3e3bb5
("mm: align larger anonymous mappings on THP boundaries"). Before this
commit, the kernel would always map a library right after the memory[0]
area in the process address space. This causes memcmp() to return
non-zero when reading the first byte beyond memory[0], which in turn
causes the nested loop in verify() to execute. The nested loop is correct
(ie. it doesn't assume the memory areas in memory[] are contiguous) so
the test doesn't fail. The mentioned upstream commit causes the first byte
beyond memory[0] not to be mapped most of the time on s390, which may
result in a segfault.
Now, as it turns out on arm64 and x86 the kernel still maps a library right
after memory[0] which causes the test to suceed as explained above (this
can be easily verified by printing the return value for memcmp()).
This commit fixes verify() to do a byte-by-byte check on each individual
memory area. This also simplifies verify() a lot, which is what we want
to avoid this kind of issue in the future.
Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
---
testcases/kernel/mem/ksm/ksm_test.h | 21 +++++++--------------
1 file changed, 7 insertions(+), 14 deletions(-)
diff --git a/testcases/kernel/mem/ksm/ksm_test.h b/testcases/kernel/mem/ksm/ksm_test.h
index 0db759d5a..cbad147d4 100644
--- a/testcases/kernel/mem/ksm/ksm_test.h
+++ b/testcases/kernel/mem/ksm/ksm_test.h
@@ -74,22 +74,15 @@ static inline void verify(char **memory, char value, int proc,
int start, int end, int start2, int end2)
{
int i, j;
- void *s = NULL;
-
- s = SAFE_MALLOC((end - start) * (end2 - start2));
tst_res(TINFO, "child %d verifies memory content.", proc);
- memset(s, value, (end - start) * (end2 - start2));
- if (memcmp(memory[start], s, (end - start) * (end2 - start2))
- != 0)
- for (j = start; j < end; j++)
- for (i = start2; i < end2; i++)
- if (memory[j][i] != value)
- tst_res(TFAIL, "child %d has %c at "
- "%d,%d,%d.",
- proc, memory[j][i], proc,
- j, i);
- free(s);
+
+ for (j = start; j < end; j++)
+ for (i = start2; i < end2; i++)
+ if (memory[j][i] != value)
+ tst_res(TFAIL, "child %d has %c at "
+ "%d,%d,%d.",
+ proc, memory[j][i], proc, j, i);
}
struct ksm_merge_data {
--
2.49.0
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [LTP] [PATCH] ksm: fix segfault on s390
2025-05-20 20:24 [LTP] [PATCH] ksm: fix segfault on s390 Luiz Capitulino via ltp
@ 2025-05-21 14:08 ` Li Wang via ltp
2025-05-21 17:10 ` Luiz Capitulino via ltp
2025-05-27 9:40 ` Cyril Hrubis
1 sibling, 1 reply; 12+ messages in thread
From: Li Wang via ltp @ 2025-05-21 14:08 UTC (permalink / raw)
To: Luiz Capitulino; +Cc: ltp
Hi Luiz,
This is a good catch, thank you, comment inline below.
On Wed, May 21, 2025 at 4:29 PM Luiz Capitulino via ltp
<ltp@lists.linux.it> wrote:
>
> Recently, we started seeing the following segfault when running ksm01
> and ksm02 tests on an s390 KSM guest:
>
> """
> [ 119.302817] User process fault: interruption code 0011 ilc:3 in libc.so.6[b14ae,3ff91500000+1c9000]
> [ 119.302824] Failing address: 000003ff91400000 TEID: 000003ff91400800
> [ 119.302826] Fault in primary space mode while using user ASCE.
> [ 119.302828] AS:0000000084bec1c7 R3:00000000824cc007 S:0000000081a28001 P:0000000000000400
> [ 119.302833] CPU: 0 UID: 0 PID: 5578 Comm: ksm01 Kdump: loaded Not tainted 6.15.0-rc6+ #8 NONE
> [ 119.302837] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
> [ 119.302839] User PSW : 0705200180000000 000003ff915b14ae
> [ 119.302841] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
> [ 119.302843] User GPRS: cccccccccccccccd 000000000007efff 000003ff91400000 000003ff814ff010
> [ 119.302845] 0000000007ffffff 0000000000000000 0000000000000000 000003ff00000000
> [ 119.302847] 0000000000000063 0000000000100000 00000000023db500 0000000008000000
> [ 119.302848] 0000000000000063 0000000000000080 00000000010066da 000003ffd7777e20
> [ 119.302855] User Code: 000003ff915b149e: a784ffee brc 8,000003ff915b147a
> 000003ff915b14a2: e31032000036 pfd 1,512(%r3)
> #000003ff915b14a8: e31022000036 pfd 1,512(%r2)
> >000003ff915b14ae: d5ff30002000 clc 0(256,%r3),0(%r2)
> 000003ff915b14b4: a784ffef brc 8,000003ff915b1492
> 000003ff915b14b8: b2220020 ipm %r2
> 000003ff915b14bc: eb220022000d sllg %r2,%r2,34
> 000003ff915b14c2: eb22003e000a srag %r2,%r2,62
> [ 119.302867] Last Breaking-Event-Address:
> [ 119.302868] [<000003ff915b14b4>] libc.so.6[b14b4,3ff91500000+1c9000]
> """
>
> This segfault is triggered by the memcmp() call in verify():
>
> """
> memcmp(memory[start], s, (end - start) * (end2 - start2)
> """
>
> In the default case, this call checks if the memory area starting in
> memory[0] (since start=0 by default) matches 's' for 128MB. IOW, this
> assumes that the memory areas in memory[] are contiguous. This is wrong,
> since create_ksm_child() allocates 128 individual areas of 1MB each. As,
> in this particular case, memory[0] happens to be the last 1MB area in
> the VMA created by the kernel, we hit a segault at the first byte beyond
> memory[0].
>
> Now, the question is how this has worked for so long and why it may still
> work on arm64 and x86 (even on s390 it ocassionaly works).
>
> For the s390 case, the reason is upstream kernel commit efa7df3e3bb5
> ("mm: align larger anonymous mappings on THP boundaries"). Before this
> commit, the kernel would always map a library right after the memory[0]
> area in the process address space. This causes memcmp() to return
> non-zero when reading the first byte beyond memory[0], which in turn
> causes the nested loop in verify() to execute. The nested loop is correct
> (ie. it doesn't assume the memory areas in memory[] are contiguous) so
> the test doesn't fail. The mentioned upstream commit causes the first byte
> beyond memory[0] not to be mapped most of the time on s390, which may
> result in a segfault.
>
> Now, as it turns out on arm64 and x86 the kernel still maps a library right
> after memory[0] which causes the test to suceed as explained above (this
> can be easily verified by printing the return value for memcmp()).
>
> This commit fixes verify() to do a byte-by-byte check on each individual
> memory area. This also simplifies verify() a lot, which is what we want
> to avoid this kind of issue in the future.
>
> Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
> ---
> testcases/kernel/mem/ksm/ksm_test.h | 21 +++++++--------------
> 1 file changed, 7 insertions(+), 14 deletions(-)
>
> diff --git a/testcases/kernel/mem/ksm/ksm_test.h b/testcases/kernel/mem/ksm/ksm_test.h
> index 0db759d5a..cbad147d4 100644
> --- a/testcases/kernel/mem/ksm/ksm_test.h
> +++ b/testcases/kernel/mem/ksm/ksm_test.h
> @@ -74,22 +74,15 @@ static inline void verify(char **memory, char value, int proc,
> int start, int end, int start2, int end2)
> {
> int i, j;
> - void *s = NULL;
> -
> - s = SAFE_MALLOC((end - start) * (end2 - start2));
>
> tst_res(TINFO, "child %d verifies memory content.", proc);
> - memset(s, value, (end - start) * (end2 - start2));
> - if (memcmp(memory[start], s, (end - start) * (end2 - start2))
> - != 0)
> - for (j = start; j < end; j++)
> - for (i = start2; i < end2; i++)
> - if (memory[j][i] != value)
> - tst_res(TFAIL, "child %d has %c at "
> - "%d,%d,%d.",
> - proc, memory[j][i], proc,
> - j, i);
> - free(s);
> +
> + for (j = start; j < end; j++)
> + for (i = start2; i < end2; i++)
> + if (memory[j][i] != value)
> + tst_res(TFAIL, "child %d has %c at "
> + "%d,%d,%d.",
> + proc, memory[j][i], proc, j, i);
> }
Or, can we optimize the verify() function by using memcmp() per memory
block, rather than falling back to the slow nested loop that checks each
byte individually?
Something like:
------------------
...
char *expected = SAFE_MALLOC(end2 - start2);
memset(expected, value, block_size);
for (j = start; j < end; j++) {
if (memcmp(&memory[j][start2], expected, end2 - start2) != 0)
...
--
Regards,
Li Wang
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [LTP] [PATCH] ksm: fix segfault on s390
2025-05-21 14:08 ` Li Wang via ltp
@ 2025-05-21 17:10 ` Luiz Capitulino via ltp
2025-05-21 23:27 ` Li Wang via ltp
0 siblings, 1 reply; 12+ messages in thread
From: Luiz Capitulino via ltp @ 2025-05-21 17:10 UTC (permalink / raw)
To: Li Wang; +Cc: ltp
On 2025-05-21 10:08, Li Wang wrote:
> Hi Luiz,
>
> This is a good catch, thank you, comment inline below.
Hi Li, thanks for taking a look (answers below).
>
> On Wed, May 21, 2025 at 4:29 PM Luiz Capitulino via ltp
> <ltp@lists.linux.it> wrote:
>>
>> Recently, we started seeing the following segfault when running ksm01
>> and ksm02 tests on an s390 KSM guest:
>>
>> """
>> [ 119.302817] User process fault: interruption code 0011 ilc:3 in libc.so.6[b14ae,3ff91500000+1c9000]
>> [ 119.302824] Failing address: 000003ff91400000 TEID: 000003ff91400800
>> [ 119.302826] Fault in primary space mode while using user ASCE.
>> [ 119.302828] AS:0000000084bec1c7 R3:00000000824cc007 S:0000000081a28001 P:0000000000000400
>> [ 119.302833] CPU: 0 UID: 0 PID: 5578 Comm: ksm01 Kdump: loaded Not tainted 6.15.0-rc6+ #8 NONE
>> [ 119.302837] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
>> [ 119.302839] User PSW : 0705200180000000 000003ff915b14ae
>> [ 119.302841] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
>> [ 119.302843] User GPRS: cccccccccccccccd 000000000007efff 000003ff91400000 000003ff814ff010
>> [ 119.302845] 0000000007ffffff 0000000000000000 0000000000000000 000003ff00000000
>> [ 119.302847] 0000000000000063 0000000000100000 00000000023db500 0000000008000000
>> [ 119.302848] 0000000000000063 0000000000000080 00000000010066da 000003ffd7777e20
>> [ 119.302855] User Code: 000003ff915b149e: a784ffee brc 8,000003ff915b147a
>> 000003ff915b14a2: e31032000036 pfd 1,512(%r3)
>> #000003ff915b14a8: e31022000036 pfd 1,512(%r2)
>> >000003ff915b14ae: d5ff30002000 clc 0(256,%r3),0(%r2)
>> 000003ff915b14b4: a784ffef brc 8,000003ff915b1492
>> 000003ff915b14b8: b2220020 ipm %r2
>> 000003ff915b14bc: eb220022000d sllg %r2,%r2,34
>> 000003ff915b14c2: eb22003e000a srag %r2,%r2,62
>> [ 119.302867] Last Breaking-Event-Address:
>> [ 119.302868] [<000003ff915b14b4>] libc.so.6[b14b4,3ff91500000+1c9000]
>> """
>>
>> This segfault is triggered by the memcmp() call in verify():
>>
>> """
>> memcmp(memory[start], s, (end - start) * (end2 - start2)
>> """
>>
>> In the default case, this call checks if the memory area starting in
>> memory[0] (since start=0 by default) matches 's' for 128MB. IOW, this
>> assumes that the memory areas in memory[] are contiguous. This is wrong,
>> since create_ksm_child() allocates 128 individual areas of 1MB each. As,
>> in this particular case, memory[0] happens to be the last 1MB area in
>> the VMA created by the kernel, we hit a segault at the first byte beyond
>> memory[0].
>>
>> Now, the question is how this has worked for so long and why it may still
>> work on arm64 and x86 (even on s390 it ocassionaly works).
>>
>> For the s390 case, the reason is upstream kernel commit efa7df3e3bb5
>> ("mm: align larger anonymous mappings on THP boundaries"). Before this
>> commit, the kernel would always map a library right after the memory[0]
>> area in the process address space. This causes memcmp() to return
>> non-zero when reading the first byte beyond memory[0], which in turn
>> causes the nested loop in verify() to execute. The nested loop is correct
>> (ie. it doesn't assume the memory areas in memory[] are contiguous) so
>> the test doesn't fail. The mentioned upstream commit causes the first byte
>> beyond memory[0] not to be mapped most of the time on s390, which may
>> result in a segfault.
>>
>> Now, as it turns out on arm64 and x86 the kernel still maps a library right
>> after memory[0] which causes the test to suceed as explained above (this
>> can be easily verified by printing the return value for memcmp()).
>>
>> This commit fixes verify() to do a byte-by-byte check on each individual
>> memory area. This also simplifies verify() a lot, which is what we want
>> to avoid this kind of issue in the future.
>>
>> Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
>> ---
>> testcases/kernel/mem/ksm/ksm_test.h | 21 +++++++--------------
>> 1 file changed, 7 insertions(+), 14 deletions(-)
>>
>> diff --git a/testcases/kernel/mem/ksm/ksm_test.h b/testcases/kernel/mem/ksm/ksm_test.h
>> index 0db759d5a..cbad147d4 100644
>> --- a/testcases/kernel/mem/ksm/ksm_test.h
>> +++ b/testcases/kernel/mem/ksm/ksm_test.h
>> @@ -74,22 +74,15 @@ static inline void verify(char **memory, char value, int proc,
>> int start, int end, int start2, int end2)
>> {
>> int i, j;
>> - void *s = NULL;
>> -
>> - s = SAFE_MALLOC((end - start) * (end2 - start2));
>>
>> tst_res(TINFO, "child %d verifies memory content.", proc);
>> - memset(s, value, (end - start) * (end2 - start2));
>> - if (memcmp(memory[start], s, (end - start) * (end2 - start2))
>> - != 0)
>> - for (j = start; j < end; j++)
>> - for (i = start2; i < end2; i++)
>> - if (memory[j][i] != value)
>> - tst_res(TFAIL, "child %d has %c at "
>> - "%d,%d,%d.",
>> - proc, memory[j][i], proc,
>> - j, i);
>> - free(s);
>> +
>> + for (j = start; j < end; j++)
>> + for (i = start2; i < end2; i++)
>> + if (memory[j][i] != value)
>> + tst_res(TFAIL, "child %d has %c at "
>> + "%d,%d,%d.",
>> + proc, memory[j][i], proc, j, i);
>> }
>
> Or, can we optimize the verify() function by using memcmp() per memory
> block, rather than falling back to the slow nested loop that checks each
> byte individually?
As I understand it, the nested loop is there to tell us which byte failed
the verification. And that's good information, IMHO.
Now, as for having both as the code is written today, why should we do it?
Meaning, what does the optimization intend to improve? Is it test run-time?
If yes, do we have measurements to justify it?
Also, for the kernels that I tested (with my particular .config) that code
was always falling through the nested loop anyways after failing memcmp()
for 1MB. So, if there's a perceived difference, this patch should make it
faster not slower.
IMO, test code should be simple and direct.
- Luiz
>
> Something like:
> ------------------
>
> ...
> char *expected = SAFE_MALLOC(end2 - start2);
> memset(expected, value, block_size);
>
> for (j = start; j < end; j++) {
> if (memcmp(&memory[j][start2], expected, end2 - start2) != 0)
> ...
>
>
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [LTP] [PATCH] ksm: fix segfault on s390
2025-05-21 17:10 ` Luiz Capitulino via ltp
@ 2025-05-21 23:27 ` Li Wang via ltp
2025-05-22 5:29 ` Li Wang via ltp
0 siblings, 1 reply; 12+ messages in thread
From: Li Wang via ltp @ 2025-05-21 23:27 UTC (permalink / raw)
To: Luiz Capitulino; +Cc: ltp
On Thu, May 22, 2025 at 1:10 AM Luiz Capitulino <luizcap@redhat.com> wrote:
>
> On 2025-05-21 10:08, Li Wang wrote:
> > Hi Luiz,
> >
> > This is a good catch, thank you, comment inline below.
>
> Hi Li, thanks for taking a look (answers below).
>
> >
> > On Wed, May 21, 2025 at 4:29 PM Luiz Capitulino via ltp
> > <ltp@lists.linux.it> wrote:
> >>
> >> Recently, we started seeing the following segfault when running ksm01
> >> and ksm02 tests on an s390 KSM guest:
> >>
> >> """
> >> [ 119.302817] User process fault: interruption code 0011 ilc:3 in libc.so.6[b14ae,3ff91500000+1c9000]
> >> [ 119.302824] Failing address: 000003ff91400000 TEID: 000003ff91400800
> >> [ 119.302826] Fault in primary space mode while using user ASCE.
> >> [ 119.302828] AS:0000000084bec1c7 R3:00000000824cc007 S:0000000081a28001 P:0000000000000400
> >> [ 119.302833] CPU: 0 UID: 0 PID: 5578 Comm: ksm01 Kdump: loaded Not tainted 6.15.0-rc6+ #8 NONE
> >> [ 119.302837] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
> >> [ 119.302839] User PSW : 0705200180000000 000003ff915b14ae
> >> [ 119.302841] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
> >> [ 119.302843] User GPRS: cccccccccccccccd 000000000007efff 000003ff91400000 000003ff814ff010
> >> [ 119.302845] 0000000007ffffff 0000000000000000 0000000000000000 000003ff00000000
> >> [ 119.302847] 0000000000000063 0000000000100000 00000000023db500 0000000008000000
> >> [ 119.302848] 0000000000000063 0000000000000080 00000000010066da 000003ffd7777e20
> >> [ 119.302855] User Code: 000003ff915b149e: a784ffee brc 8,000003ff915b147a
> >> 000003ff915b14a2: e31032000036 pfd 1,512(%r3)
> >> #000003ff915b14a8: e31022000036 pfd 1,512(%r2)
> >> >000003ff915b14ae: d5ff30002000 clc 0(256,%r3),0(%r2)
> >> 000003ff915b14b4: a784ffef brc 8,000003ff915b1492
> >> 000003ff915b14b8: b2220020 ipm %r2
> >> 000003ff915b14bc: eb220022000d sllg %r2,%r2,34
> >> 000003ff915b14c2: eb22003e000a srag %r2,%r2,62
> >> [ 119.302867] Last Breaking-Event-Address:
> >> [ 119.302868] [<000003ff915b14b4>] libc.so.6[b14b4,3ff91500000+1c9000]
> >> """
> >>
> >> This segfault is triggered by the memcmp() call in verify():
> >>
> >> """
> >> memcmp(memory[start], s, (end - start) * (end2 - start2)
> >> """
> >>
> >> In the default case, this call checks if the memory area starting in
> >> memory[0] (since start=0 by default) matches 's' for 128MB. IOW, this
> >> assumes that the memory areas in memory[] are contiguous. This is wrong,
> >> since create_ksm_child() allocates 128 individual areas of 1MB each. As,
> >> in this particular case, memory[0] happens to be the last 1MB area in
> >> the VMA created by the kernel, we hit a segault at the first byte beyond
> >> memory[0].
> >>
> >> Now, the question is how this has worked for so long and why it may still
> >> work on arm64 and x86 (even on s390 it ocassionaly works).
> >>
> >> For the s390 case, the reason is upstream kernel commit efa7df3e3bb5
> >> ("mm: align larger anonymous mappings on THP boundaries"). Before this
> >> commit, the kernel would always map a library right after the memory[0]
> >> area in the process address space. This causes memcmp() to return
> >> non-zero when reading the first byte beyond memory[0], which in turn
> >> causes the nested loop in verify() to execute. The nested loop is correct
> >> (ie. it doesn't assume the memory areas in memory[] are contiguous) so
> >> the test doesn't fail. The mentioned upstream commit causes the first byte
> >> beyond memory[0] not to be mapped most of the time on s390, which may
> >> result in a segfault.
> >>
> >> Now, as it turns out on arm64 and x86 the kernel still maps a library right
> >> after memory[0] which causes the test to suceed as explained above (this
> >> can be easily verified by printing the return value for memcmp()).
> >>
> >> This commit fixes verify() to do a byte-by-byte check on each individual
> >> memory area. This also simplifies verify() a lot, which is what we want
> >> to avoid this kind of issue in the future.
> >>
> >> Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
> >> ---
> >> testcases/kernel/mem/ksm/ksm_test.h | 21 +++++++--------------
> >> 1 file changed, 7 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/testcases/kernel/mem/ksm/ksm_test.h b/testcases/kernel/mem/ksm/ksm_test.h
> >> index 0db759d5a..cbad147d4 100644
> >> --- a/testcases/kernel/mem/ksm/ksm_test.h
> >> +++ b/testcases/kernel/mem/ksm/ksm_test.h
> >> @@ -74,22 +74,15 @@ static inline void verify(char **memory, char value, int proc,
> >> int start, int end, int start2, int end2)
> >> {
> >> int i, j;
> >> - void *s = NULL;
> >> -
> >> - s = SAFE_MALLOC((end - start) * (end2 - start2));
> >>
> >> tst_res(TINFO, "child %d verifies memory content.", proc);
> >> - memset(s, value, (end - start) * (end2 - start2));
> >> - if (memcmp(memory[start], s, (end - start) * (end2 - start2))
> >> - != 0)
> >> - for (j = start; j < end; j++)
> >> - for (i = start2; i < end2; i++)
> >> - if (memory[j][i] != value)
> >> - tst_res(TFAIL, "child %d has %c at "
> >> - "%d,%d,%d.",
> >> - proc, memory[j][i], proc,
> >> - j, i);
> >> - free(s);
> >> +
> >> + for (j = start; j < end; j++)
> >> + for (i = start2; i < end2; i++)
> >> + if (memory[j][i] != value)
> >> + tst_res(TFAIL, "child %d has %c at "
> >> + "%d,%d,%d.",
> >> + proc, memory[j][i], proc, j, i);
> >> }
> >
> > Or, can we optimize the verify() function by using memcmp() per memory
> > block, rather than falling back to the slow nested loop that checks each
> > byte individually?
>
> As I understand it, the nested loop is there to tell us which byte failed
> the verification. And that's good information, IMHO.
>
> Now, as for having both as the code is written today, why should we do it?
> Meaning, what does the optimization intend to improve? Is it test run-time?
> If yes, do we have measurements to justify it?
The original design uses memcmp() for bulk memory comparisons,
which is fast and optimized. When memcmp() fails, it falls back to a
nested loop that performs byte-by-byte comparisons to provide detailed
failure information.
However, as you've pointed out, this approach is flawed because it
assumes that the memory starting at memory[start] is a contiguous
region, which it is not. Each memory[start] points to a separately
allocated memory block (via mmap()), so treating them as a single
contiguous block leads to undefined behavior, and on some architectures
like s390, it even causes a segmentation fault.
That said, we can still retain the performance benefits of bulk comparison
by using memcmp() in a block-by-block manner: checking each memory[][start2]
individually. Since each memory[][start2] points to a contiguous
region (e.g., 1MB),
using memcmp() within each block is both safe and efficient.
This would allow us to:
- Preserve correctness across all platforms and memory layouts
- Avoid unnecessary per-byte comparisons when memory is correct
- Provide detailed diagnostics only when a mismatch is detected
So yes, while the original design had a performance goal in mind,
a refined per-block check using memcmp() can achieve similar speed
benefits without sacrificing correctness or portability.
WDYT?
--- a/testcases/kernel/mem/ksm/ksm_test.h
+++ b/testcases/kernel/mem/ksm/ksm_test.h
@@ -74,15 +74,28 @@ static inline void verify(char **memory, char
value, int proc,
int start, int end, int start2, int end2)
{
int i, j;
+ size_t block_size = end2 - start2;
+ char *expected = SAFE_MALLOC(block_size);
+
+ memset(expected, value, block_size);
tst_res(TINFO, "child %d verifies memory content.", proc);
- for (j = start; j < end; j++)
- for (i = start2; i < end2; i++)
- if (memory[j][i] != value)
- tst_res(TFAIL, "child %d has %c at "
- "%d,%d,%d.",
- proc, memory[j][i], proc, j, i);
+ for (j = start; j < end; j++) {
+ if (memcmp(&memory[j][start2], expected, block_size) != 0) {
+
+ tst_res(TINFO, "====> DEBUG: usually not reach
here <===");
+
+ for (i = start2; i < end2; i++) {
+ if (memory[j][i] != value) {
+ tst_res(TFAIL, "child %d has
%c at %d,%d,%d.",
+ proc, memory[j][i], proc, j, i);
+ }
+ }
+ }
+ }
+
+ free(expected);
}
struct ksm_merge_data {
>
> Also, for the kernels that I tested (with my particular .config) that code
> was always falling through the nested loop anyways after failing memcmp()
> for 1MB. So, if there's a perceived difference, this patch should make it
> faster not slower.
>
> IMO, test code should be simple and direct.
>
> - Luiz
>
>
> >
> > Something like:
> > ------------------
> >
> > ...
> > char *expected = SAFE_MALLOC(end2 - start2);
> > memset(expected, value, block_size);
> >
> > for (j = start; j < end; j++) {
> > if (memcmp(&memory[j][start2], expected, end2 - start2) != 0)
> > ...
> >
> >
>
--
Regards,
Li Wang
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [LTP] [PATCH] ksm: fix segfault on s390
2025-05-21 23:27 ` Li Wang via ltp
@ 2025-05-22 5:29 ` Li Wang via ltp
2025-05-22 14:08 ` Luiz Capitulino via ltp
0 siblings, 1 reply; 12+ messages in thread
From: Li Wang via ltp @ 2025-05-22 5:29 UTC (permalink / raw)
To: Luiz Capitulino; +Cc: ltp
On Thu, May 22, 2025 at 7:27 AM Li Wang <liwang@redhat.com> wrote:
>
> On Thu, May 22, 2025 at 1:10 AM Luiz Capitulino <luizcap@redhat.com> wrote:
> >
> > On 2025-05-21 10:08, Li Wang wrote:
> > > Hi Luiz,
> > >
> > > This is a good catch, thank you, comment inline below.
> >
> > Hi Li, thanks for taking a look (answers below).
> >
> > >
> > > On Wed, May 21, 2025 at 4:29 PM Luiz Capitulino via ltp
> > > <ltp@lists.linux.it> wrote:
> > >>
> > >> Recently, we started seeing the following segfault when running ksm01
> > >> and ksm02 tests on an s390 KSM guest:
> > >>
> > >> """
> > >> [ 119.302817] User process fault: interruption code 0011 ilc:3 in libc.so.6[b14ae,3ff91500000+1c9000]
> > >> [ 119.302824] Failing address: 000003ff91400000 TEID: 000003ff91400800
> > >> [ 119.302826] Fault in primary space mode while using user ASCE.
> > >> [ 119.302828] AS:0000000084bec1c7 R3:00000000824cc007 S:0000000081a28001 P:0000000000000400
> > >> [ 119.302833] CPU: 0 UID: 0 PID: 5578 Comm: ksm01 Kdump: loaded Not tainted 6.15.0-rc6+ #8 NONE
> > >> [ 119.302837] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
> > >> [ 119.302839] User PSW : 0705200180000000 000003ff915b14ae
> > >> [ 119.302841] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
> > >> [ 119.302843] User GPRS: cccccccccccccccd 000000000007efff 000003ff91400000 000003ff814ff010
> > >> [ 119.302845] 0000000007ffffff 0000000000000000 0000000000000000 000003ff00000000
> > >> [ 119.302847] 0000000000000063 0000000000100000 00000000023db500 0000000008000000
> > >> [ 119.302848] 0000000000000063 0000000000000080 00000000010066da 000003ffd7777e20
> > >> [ 119.302855] User Code: 000003ff915b149e: a784ffee brc 8,000003ff915b147a
> > >> 000003ff915b14a2: e31032000036 pfd 1,512(%r3)
> > >> #000003ff915b14a8: e31022000036 pfd 1,512(%r2)
> > >> >000003ff915b14ae: d5ff30002000 clc 0(256,%r3),0(%r2)
> > >> 000003ff915b14b4: a784ffef brc 8,000003ff915b1492
> > >> 000003ff915b14b8: b2220020 ipm %r2
> > >> 000003ff915b14bc: eb220022000d sllg %r2,%r2,34
> > >> 000003ff915b14c2: eb22003e000a srag %r2,%r2,62
> > >> [ 119.302867] Last Breaking-Event-Address:
> > >> [ 119.302868] [<000003ff915b14b4>] libc.so.6[b14b4,3ff91500000+1c9000]
> > >> """
> > >>
> > >> This segfault is triggered by the memcmp() call in verify():
> > >>
> > >> """
> > >> memcmp(memory[start], s, (end - start) * (end2 - start2)
> > >> """
> > >>
> > >> In the default case, this call checks if the memory area starting in
> > >> memory[0] (since start=0 by default) matches 's' for 128MB. IOW, this
> > >> assumes that the memory areas in memory[] are contiguous. This is wrong,
> > >> since create_ksm_child() allocates 128 individual areas of 1MB each. As,
> > >> in this particular case, memory[0] happens to be the last 1MB area in
> > >> the VMA created by the kernel, we hit a segault at the first byte beyond
> > >> memory[0].
> > >>
> > >> Now, the question is how this has worked for so long and why it may still
> > >> work on arm64 and x86 (even on s390 it ocassionaly works).
> > >>
> > >> For the s390 case, the reason is upstream kernel commit efa7df3e3bb5
> > >> ("mm: align larger anonymous mappings on THP boundaries"). Before this
> > >> commit, the kernel would always map a library right after the memory[0]
> > >> area in the process address space. This causes memcmp() to return
> > >> non-zero when reading the first byte beyond memory[0], which in turn
> > >> causes the nested loop in verify() to execute. The nested loop is correct
> > >> (ie. it doesn't assume the memory areas in memory[] are contiguous) so
> > >> the test doesn't fail. The mentioned upstream commit causes the first byte
> > >> beyond memory[0] not to be mapped most of the time on s390, which may
> > >> result in a segfault.
> > >>
> > >> Now, as it turns out on arm64 and x86 the kernel still maps a library right
> > >> after memory[0] which causes the test to suceed as explained above (this
> > >> can be easily verified by printing the return value for memcmp()).
> > >>
> > >> This commit fixes verify() to do a byte-by-byte check on each individual
> > >> memory area. This also simplifies verify() a lot, which is what we want
> > >> to avoid this kind of issue in the future.
> > >>
> > >> Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
> > >> ---
> > >> testcases/kernel/mem/ksm/ksm_test.h | 21 +++++++--------------
> > >> 1 file changed, 7 insertions(+), 14 deletions(-)
> > >>
> > >> diff --git a/testcases/kernel/mem/ksm/ksm_test.h b/testcases/kernel/mem/ksm/ksm_test.h
> > >> index 0db759d5a..cbad147d4 100644
> > >> --- a/testcases/kernel/mem/ksm/ksm_test.h
> > >> +++ b/testcases/kernel/mem/ksm/ksm_test.h
> > >> @@ -74,22 +74,15 @@ static inline void verify(char **memory, char value, int proc,
> > >> int start, int end, int start2, int end2)
> > >> {
> > >> int i, j;
> > >> - void *s = NULL;
> > >> -
> > >> - s = SAFE_MALLOC((end - start) * (end2 - start2));
> > >>
> > >> tst_res(TINFO, "child %d verifies memory content.", proc);
> > >> - memset(s, value, (end - start) * (end2 - start2));
> > >> - if (memcmp(memory[start], s, (end - start) * (end2 - start2))
> > >> - != 0)
> > >> - for (j = start; j < end; j++)
> > >> - for (i = start2; i < end2; i++)
> > >> - if (memory[j][i] != value)
> > >> - tst_res(TFAIL, "child %d has %c at "
> > >> - "%d,%d,%d.",
> > >> - proc, memory[j][i], proc,
> > >> - j, i);
> > >> - free(s);
> > >> +
> > >> + for (j = start; j < end; j++)
> > >> + for (i = start2; i < end2; i++)
> > >> + if (memory[j][i] != value)
> > >> + tst_res(TFAIL, "child %d has %c at "
> > >> + "%d,%d,%d.",
> > >> + proc, memory[j][i], proc, j, i);
> > >> }
> > >
> > > Or, can we optimize the verify() function by using memcmp() per memory
> > > block, rather than falling back to the slow nested loop that checks each
> > > byte individually?
> >
> > As I understand it, the nested loop is there to tell us which byte failed
> > the verification. And that's good information, IMHO.
> >
> > Now, as for having both as the code is written today, why should we do it?
> > Meaning, what does the optimization intend to improve? Is it test run-time?
> > If yes, do we have measurements to justify it?
>
> The original design uses memcmp() for bulk memory comparisons,
> which is fast and optimized. When memcmp() fails, it falls back to a
> nested loop that performs byte-by-byte comparisons to provide detailed
> failure information.
>
> However, as you've pointed out, this approach is flawed because it
> assumes that the memory starting at memory[start] is a contiguous
> region, which it is not. Each memory[start] points to a separately
> allocated memory block (via mmap()), so treating them as a single
> contiguous block leads to undefined behavior, and on some architectures
> like s390, it even causes a segmentation fault.
>
> That said, we can still retain the performance benefits of bulk comparison
> by using memcmp() in a block-by-block manner: checking each memory[][start2]
> individually. Since each memory[][start2] points to a contiguous
> region (e.g., 1MB),
> using memcmp() within each block is both safe and efficient.
>
> This would allow us to:
> - Preserve correctness across all platforms and memory layouts
> - Avoid unnecessary per-byte comparisons when memory is correct
> - Provide detailed diagnostics only when a mismatch is detected
>
> So yes, while the original design had a performance goal in mind,
> a refined per-block check using memcmp() can achieve similar speed
> benefits without sacrificing correctness or portability.
>
> WDYT?
>
> --- a/testcases/kernel/mem/ksm/ksm_test.h
> +++ b/testcases/kernel/mem/ksm/ksm_test.h
> @@ -74,15 +74,28 @@ static inline void verify(char **memory, char
> value, int proc,
> int start, int end, int start2, int end2)
> {
> int i, j;
> + size_t block_size = end2 - start2;
> + char *expected = SAFE_MALLOC(block_size);
> +
> + memset(expected, value, block_size);
>
> tst_res(TINFO, "child %d verifies memory content.", proc);
>
> - for (j = start; j < end; j++)
> - for (i = start2; i < end2; i++)
> - if (memory[j][i] != value)
> - tst_res(TFAIL, "child %d has %c at "
> - "%d,%d,%d.",
> - proc, memory[j][i], proc, j, i);
> + for (j = start; j < end; j++) {
> + if (memcmp(&memory[j][start2], expected, block_size) != 0) {
> +
> + tst_res(TINFO, "====> DEBUG: usually not reach
> here <===");
> +
> + for (i = start2; i < end2; i++) {
> + if (memory[j][i] != value) {
> + tst_res(TFAIL, "child %d has
> %c at %d,%d,%d.",
> + proc, memory[j][i], proc, j, i);
> + }
> + }
> + }
> + }
> +
> + free(expected);
> }
>
> struct ksm_merge_data {
I might be a bit too picky:). So I compared the two approaches on a
2 CPUs, KVM, x86_64 system:
Per-block checking cost time:
real 0m5.862s
user 0m1.098s
sys 0m1.505s
Per-byte checking cost time:
real 0m6.819s
user 0m2.498s
sys 0m1.495s
From the data, block-by-block checking can reduce the total execution
time by about 14% and reduce CPU usage by more than 35%, especially
in user-space calculations. This number may not be large, but considering
that tests are frequently run in CI, I think it would be a good thing if we can
reduce 1 second each time :).
--
Regards,
Li Wang
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [LTP] [PATCH] ksm: fix segfault on s390
2025-05-22 5:29 ` Li Wang via ltp
@ 2025-05-22 14:08 ` Luiz Capitulino via ltp
2025-05-22 14:36 ` Li Wang via ltp
2025-05-22 14:49 ` Li Wang via ltp
0 siblings, 2 replies; 12+ messages in thread
From: Luiz Capitulino via ltp @ 2025-05-22 14:08 UTC (permalink / raw)
To: Li Wang; +Cc: ltp
On 2025-05-22 01:29, Li Wang wrote:
> On Thu, May 22, 2025 at 7:27 AM Li Wang <liwang@redhat.com> wrote:
>>
>> On Thu, May 22, 2025 at 1:10 AM Luiz Capitulino <luizcap@redhat.com> wrote:
>>>
>>> On 2025-05-21 10:08, Li Wang wrote:
>>>> Hi Luiz,
>>>>
>>>> This is a good catch, thank you, comment inline below.
>>>
>>> Hi Li, thanks for taking a look (answers below).
>>>
>>>>
>>>> On Wed, May 21, 2025 at 4:29 PM Luiz Capitulino via ltp
>>>> <ltp@lists.linux.it> wrote:
>>>>>
>>>>> Recently, we started seeing the following segfault when running ksm01
>>>>> and ksm02 tests on an s390 KSM guest:
>>>>>
>>>>> """
>>>>> [ 119.302817] User process fault: interruption code 0011 ilc:3 in libc.so.6[b14ae,3ff91500000+1c9000]
>>>>> [ 119.302824] Failing address: 000003ff91400000 TEID: 000003ff91400800
>>>>> [ 119.302826] Fault in primary space mode while using user ASCE.
>>>>> [ 119.302828] AS:0000000084bec1c7 R3:00000000824cc007 S:0000000081a28001 P:0000000000000400
>>>>> [ 119.302833] CPU: 0 UID: 0 PID: 5578 Comm: ksm01 Kdump: loaded Not tainted 6.15.0-rc6+ #8 NONE
>>>>> [ 119.302837] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
>>>>> [ 119.302839] User PSW : 0705200180000000 000003ff915b14ae
>>>>> [ 119.302841] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
>>>>> [ 119.302843] User GPRS: cccccccccccccccd 000000000007efff 000003ff91400000 000003ff814ff010
>>>>> [ 119.302845] 0000000007ffffff 0000000000000000 0000000000000000 000003ff00000000
>>>>> [ 119.302847] 0000000000000063 0000000000100000 00000000023db500 0000000008000000
>>>>> [ 119.302848] 0000000000000063 0000000000000080 00000000010066da 000003ffd7777e20
>>>>> [ 119.302855] User Code: 000003ff915b149e: a784ffee brc 8,000003ff915b147a
>>>>> 000003ff915b14a2: e31032000036 pfd 1,512(%r3)
>>>>> #000003ff915b14a8: e31022000036 pfd 1,512(%r2)
>>>>> >000003ff915b14ae: d5ff30002000 clc 0(256,%r3),0(%r2)
>>>>> 000003ff915b14b4: a784ffef brc 8,000003ff915b1492
>>>>> 000003ff915b14b8: b2220020 ipm %r2
>>>>> 000003ff915b14bc: eb220022000d sllg %r2,%r2,34
>>>>> 000003ff915b14c2: eb22003e000a srag %r2,%r2,62
>>>>> [ 119.302867] Last Breaking-Event-Address:
>>>>> [ 119.302868] [<000003ff915b14b4>] libc.so.6[b14b4,3ff91500000+1c9000]
>>>>> """
>>>>>
>>>>> This segfault is triggered by the memcmp() call in verify():
>>>>>
>>>>> """
>>>>> memcmp(memory[start], s, (end - start) * (end2 - start2)
>>>>> """
>>>>>
>>>>> In the default case, this call checks if the memory area starting in
>>>>> memory[0] (since start=0 by default) matches 's' for 128MB. IOW, this
>>>>> assumes that the memory areas in memory[] are contiguous. This is wrong,
>>>>> since create_ksm_child() allocates 128 individual areas of 1MB each. As,
>>>>> in this particular case, memory[0] happens to be the last 1MB area in
>>>>> the VMA created by the kernel, we hit a segault at the first byte beyond
>>>>> memory[0].
>>>>>
>>>>> Now, the question is how this has worked for so long and why it may still
>>>>> work on arm64 and x86 (even on s390 it ocassionaly works).
>>>>>
>>>>> For the s390 case, the reason is upstream kernel commit efa7df3e3bb5
>>>>> ("mm: align larger anonymous mappings on THP boundaries"). Before this
>>>>> commit, the kernel would always map a library right after the memory[0]
>>>>> area in the process address space. This causes memcmp() to return
>>>>> non-zero when reading the first byte beyond memory[0], which in turn
>>>>> causes the nested loop in verify() to execute. The nested loop is correct
>>>>> (ie. it doesn't assume the memory areas in memory[] are contiguous) so
>>>>> the test doesn't fail. The mentioned upstream commit causes the first byte
>>>>> beyond memory[0] not to be mapped most of the time on s390, which may
>>>>> result in a segfault.
>>>>>
>>>>> Now, as it turns out on arm64 and x86 the kernel still maps a library right
>>>>> after memory[0] which causes the test to suceed as explained above (this
>>>>> can be easily verified by printing the return value for memcmp()).
>>>>>
>>>>> This commit fixes verify() to do a byte-by-byte check on each individual
>>>>> memory area. This also simplifies verify() a lot, which is what we want
>>>>> to avoid this kind of issue in the future.
>>>>>
>>>>> Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
>>>>> ---
>>>>> testcases/kernel/mem/ksm/ksm_test.h | 21 +++++++--------------
>>>>> 1 file changed, 7 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/testcases/kernel/mem/ksm/ksm_test.h b/testcases/kernel/mem/ksm/ksm_test.h
>>>>> index 0db759d5a..cbad147d4 100644
>>>>> --- a/testcases/kernel/mem/ksm/ksm_test.h
>>>>> +++ b/testcases/kernel/mem/ksm/ksm_test.h
>>>>> @@ -74,22 +74,15 @@ static inline void verify(char **memory, char value, int proc,
>>>>> int start, int end, int start2, int end2)
>>>>> {
>>>>> int i, j;
>>>>> - void *s = NULL;
>>>>> -
>>>>> - s = SAFE_MALLOC((end - start) * (end2 - start2));
>>>>>
>>>>> tst_res(TINFO, "child %d verifies memory content.", proc);
>>>>> - memset(s, value, (end - start) * (end2 - start2));
>>>>> - if (memcmp(memory[start], s, (end - start) * (end2 - start2))
>>>>> - != 0)
>>>>> - for (j = start; j < end; j++)
>>>>> - for (i = start2; i < end2; i++)
>>>>> - if (memory[j][i] != value)
>>>>> - tst_res(TFAIL, "child %d has %c at "
>>>>> - "%d,%d,%d.",
>>>>> - proc, memory[j][i], proc,
>>>>> - j, i);
>>>>> - free(s);
>>>>> +
>>>>> + for (j = start; j < end; j++)
>>>>> + for (i = start2; i < end2; i++)
>>>>> + if (memory[j][i] != value)
>>>>> + tst_res(TFAIL, "child %d has %c at "
>>>>> + "%d,%d,%d.",
>>>>> + proc, memory[j][i], proc, j, i);
>>>>> }
>>>>
>>>> Or, can we optimize the verify() function by using memcmp() per memory
>>>> block, rather than falling back to the slow nested loop that checks each
>>>> byte individually?
>>>
>>> As I understand it, the nested loop is there to tell us which byte failed
>>> the verification. And that's good information, IMHO.
>>>
>>> Now, as for having both as the code is written today, why should we do it?
>>> Meaning, what does the optimization intend to improve? Is it test run-time?
>>> If yes, do we have measurements to justify it?
>>
>> The original design uses memcmp() for bulk memory comparisons,
>> which is fast and optimized. When memcmp() fails, it falls back to a
>> nested loop that performs byte-by-byte comparisons to provide detailed
>> failure information.
>>
>> However, as you've pointed out, this approach is flawed because it
>> assumes that the memory starting at memory[start] is a contiguous
>> region, which it is not. Each memory[start] points to a separately
>> allocated memory block (via mmap()), so treating them as a single
>> contiguous block leads to undefined behavior, and on some architectures
>> like s390, it even causes a segmentation fault.
>>
>> That said, we can still retain the performance benefits of bulk comparison
>> by using memcmp() in a block-by-block manner: checking each memory[][start2]
>> individually. Since each memory[][start2] points to a contiguous
>> region (e.g., 1MB),
>> using memcmp() within each block is both safe and efficient.
>>
>> This would allow us to:
>> - Preserve correctness across all platforms and memory layouts
>> - Avoid unnecessary per-byte comparisons when memory is correct
>> - Provide detailed diagnostics only when a mismatch is detected
>>
>> So yes, while the original design had a performance goal in mind,
>> a refined per-block check using memcmp() can achieve similar speed
>> benefits without sacrificing correctness or portability.
>>
>> WDYT?
>>
>> --- a/testcases/kernel/mem/ksm/ksm_test.h
>> +++ b/testcases/kernel/mem/ksm/ksm_test.h
>> @@ -74,15 +74,28 @@ static inline void verify(char **memory, char
>> value, int proc,
>> int start, int end, int start2, int end2)
>> {
>> int i, j;
>> + size_t block_size = end2 - start2;
>> + char *expected = SAFE_MALLOC(block_size);
>> +
>> + memset(expected, value, block_size);
>>
>> tst_res(TINFO, "child %d verifies memory content.", proc);
>>
>> - for (j = start; j < end; j++)
>> - for (i = start2; i < end2; i++)
>> - if (memory[j][i] != value)
>> - tst_res(TFAIL, "child %d has %c at "
>> - "%d,%d,%d.",
>> - proc, memory[j][i], proc, j, i);
>> + for (j = start; j < end; j++) {
>> + if (memcmp(&memory[j][start2], expected, block_size) != 0) {
>> +
>> + tst_res(TINFO, "====> DEBUG: usually not reach
>> here <===");
>> +
>> + for (i = start2; i < end2; i++) {
>> + if (memory[j][i] != value) {
>> + tst_res(TFAIL, "child %d has
>> %c at %d,%d,%d.",
>> + proc, memory[j][i], proc, j, i);
>> + }
>> + }
>> + }
>> + }
>> +
>> + free(expected);
>> }
>>
>> struct ksm_merge_data {
>
> I might be a bit too picky:). So I compared the two approaches on a
> 2 CPUs, KVM, x86_64 system:
>
> Per-block checking cost time:
> real 0m5.862s
> user 0m1.098s
> sys 0m1.505s
>
> Per-byte checking cost time:
> real 0m6.819s
> user 0m2.498s
> sys 0m1.495s
>
> From the data, block-by-block checking can reduce the total execution
> time by about 14% and reduce CPU usage by more than 35%, especially
> in user-space calculations. This number may not be large, but considering
> that tests are frequently run in CI, I think it would be a good thing if we can
> reduce 1 second each time :).
Just to make sure I understand: you measured total test run-time, correct?
How many times did you run it?
In any case, I'm not sure a 1 second run-time (or even CPU utilization) matters
that much. You're running test code, you shouldn't expect otherwise unless you
hit a very bad case (say something taking several hours to complete).
The trade off is more complex code with bugs that can hide for 10+ years and
take developer time to debug. Also, higher memory utilization: 's' doubles
memory utilization per child only to do that check.
So, I suggest we stick to the simpler code. Or, get it merged now (since it's
fixing a bug and possibly making the code _faster_) and then you can optimize
on top later if you like.
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [LTP] [PATCH] ksm: fix segfault on s390
2025-05-22 14:08 ` Luiz Capitulino via ltp
@ 2025-05-22 14:36 ` Li Wang via ltp
2025-05-22 19:52 ` Petr Vorel
2025-05-23 6:13 ` Jan Stancek via ltp
2025-05-22 14:49 ` Li Wang via ltp
1 sibling, 2 replies; 12+ messages in thread
From: Li Wang via ltp @ 2025-05-22 14:36 UTC (permalink / raw)
To: Luiz Capitulino; +Cc: ltp
Luiz Capitulino <luizcap@redhat.com> wrote:
> > I might be a bit too picky:). So I compared the two approaches on a
> > 2 CPUs, KVM, x86_64 system:
> >
> > Per-block checking cost time:
> > real 0m5.862s
> > user 0m1.098s
> > sys 0m1.505s
> >
> > Per-byte checking cost time:
> > real 0m6.819s
> > user 0m2.498s
> > sys 0m1.495s
> >
> > From the data, block-by-block checking can reduce the total execution
> > time by about 14% and reduce CPU usage by more than 35%, especially
> > in user-space calculations. This number may not be large, but considering
> > that tests are frequently run in CI, I think it would be a good thing if we can
> > reduce 1 second each time :).
>
> Just to make sure I understand: you measured total test run-time, correct?
> How many times did you run it?
>
> In any case, I'm not sure a 1 second run-time (or even CPU utilization) matters
> that much. You're running test code, you shouldn't expect otherwise unless you
> hit a very bad case (say something taking several hours to complete).
>
> The trade off is more complex code with bugs that can hide for 10+ years and
> take developer time to debug. Also, higher memory utilization: 's' doubles
> memory utilization per child only to do that check.
Ture, that's why the problem not been find so many years!
>
> So, I suggest we stick to the simpler code. Or, get it merged now (since it's
> fixing a bug and possibly making the code _faster_) and then you can optimize
> on top later if you like.
Ok, sounds reasonable.
Reviewed-by: Li Wang <liwang@redhat.com>
@Cyril, @Petr, I vote to merge this one (as it is) before our May release.
--
Regards,
Li Wang
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [LTP] [PATCH] ksm: fix segfault on s390
2025-05-22 14:08 ` Luiz Capitulino via ltp
2025-05-22 14:36 ` Li Wang via ltp
@ 2025-05-22 14:49 ` Li Wang via ltp
1 sibling, 0 replies; 12+ messages in thread
From: Li Wang via ltp @ 2025-05-22 14:49 UTC (permalink / raw)
To: Luiz Capitulino; +Cc: ltp
Luiz Capitulino <luizcap@redhat.com> wrote:
> Just to make sure I understand: you measured total test run-time, correct?
> How many times did you run it?
Yes, I measure the ksm01 test total time, more than 10 times, each time
checking in per-block is faster than per-byte method 1 second.
If we test with 10 cycles, we still get similar results. So, I would keep this
as a open question to see if others care about the 1 second (like me).
Per-block:
# time ./ksm01 -i 10
...
real 0m58.723s
user 0m10.530s
sys 0m13.973s
Per-byte:
# time ./ksm01 -i 10
...
real 1m7.958s
user 0m24.990s
sys 0m14.016s
--
Regards,
Li Wang
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [LTP] [PATCH] ksm: fix segfault on s390
2025-05-22 14:36 ` Li Wang via ltp
@ 2025-05-22 19:52 ` Petr Vorel
2025-05-23 6:13 ` Jan Stancek via ltp
1 sibling, 0 replies; 12+ messages in thread
From: Petr Vorel @ 2025-05-22 19:52 UTC (permalink / raw)
To: Li Wang; +Cc: ltp, Luiz Capitulino
Hi Li, Luiz,
> Luiz Capitulino <luizcap@redhat.com> wrote:
> > > I might be a bit too picky:). So I compared the two approaches on a
> > > 2 CPUs, KVM, x86_64 system:
> > > Per-block checking cost time:
> > > real 0m5.862s
> > > user 0m1.098s
> > > sys 0m1.505s
> > > Per-byte checking cost time:
> > > real 0m6.819s
> > > user 0m2.498s
> > > sys 0m1.495s
> > > From the data, block-by-block checking can reduce the total execution
> > > time by about 14% and reduce CPU usage by more than 35%, especially
> > > in user-space calculations. This number may not be large, but considering
> > > that tests are frequently run in CI, I think it would be a good thing if we can
> > > reduce 1 second each time :).
> > Just to make sure I understand: you measured total test run-time, correct?
> > How many times did you run it?
> > In any case, I'm not sure a 1 second run-time (or even CPU utilization) matters
> > that much. You're running test code, you shouldn't expect otherwise unless you
> > hit a very bad case (say something taking several hours to complete).
> > The trade off is more complex code with bugs that can hide for 10+ years and
> > take developer time to debug. Also, higher memory utilization: 's' doubles
> > memory utilization per child only to do that check.
> Ture, that's why the problem not been find so many years!
> > So, I suggest we stick to the simpler code. Or, get it merged now (since it's
> > fixing a bug and possibly making the code _faster_) and then you can optimize
> > on top later if you like.
> Ok, sounds reasonable.
> Reviewed-by: Li Wang <liwang@redhat.com>
> @Cyril, @Petr, I vote to merge this one (as it is) before our May release.
LGTM.
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Kind regards,
Petr
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [LTP] [PATCH] ksm: fix segfault on s390
2025-05-22 14:36 ` Li Wang via ltp
2025-05-22 19:52 ` Petr Vorel
@ 2025-05-23 6:13 ` Jan Stancek via ltp
1 sibling, 0 replies; 12+ messages in thread
From: Jan Stancek via ltp @ 2025-05-23 6:13 UTC (permalink / raw)
To: Li Wang; +Cc: ltp, Luiz Capitulino
On Thu, May 22, 2025 at 4:36 PM Li Wang via ltp <ltp@lists.linux.it> wrote:
>
> Luiz Capitulino <luizcap@redhat.com> wrote:
>
> > > I might be a bit too picky:). So I compared the two approaches on a
> > > 2 CPUs, KVM, x86_64 system:
> > >
> > > Per-block checking cost time:
> > > real 0m5.862s
> > > user 0m1.098s
> > > sys 0m1.505s
> > >
> > > Per-byte checking cost time:
> > > real 0m6.819s
> > > user 0m2.498s
> > > sys 0m1.495s
> > >
> > > From the data, block-by-block checking can reduce the total execution
> > > time by about 14% and reduce CPU usage by more than 35%, especially
> > > in user-space calculations. This number may not be large, but considering
> > > that tests are frequently run in CI, I think it would be a good thing if we can
> > > reduce 1 second each time :).
> >
> > Just to make sure I understand: you measured total test run-time, correct?
> > How many times did you run it?
> >
> > In any case, I'm not sure a 1 second run-time (or even CPU utilization) matters
> > that much. You're running test code, you shouldn't expect otherwise unless you
> > hit a very bad case (say something taking several hours to complete).
> >
> > The trade off is more complex code with bugs that can hide for 10+ years and
> > take developer time to debug. Also, higher memory utilization: 's' doubles
> > memory utilization per child only to do that check.
>
> Ture, that's why the problem not been find so many years!
>
> >
> > So, I suggest we stick to the simpler code. Or, get it merged now (since it's
> > fixing a bug and possibly making the code _faster_) and then you can optimize
> > on top later if you like.
>
> Ok, sounds reasonable.
>
> Reviewed-by: Li Wang <liwang@redhat.com>
>
> @Cyril, @Petr, I vote to merge this one (as it is) before our May release.
Acked-by: Jan Stancek <jstancek@redhat.com>
>
> --
> Regards,
> Li Wang
>
>
> --
> Mailing list info: https://lists.linux.it/listinfo/ltp
>
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [LTP] [PATCH] ksm: fix segfault on s390
2025-05-20 20:24 [LTP] [PATCH] ksm: fix segfault on s390 Luiz Capitulino via ltp
2025-05-21 14:08 ` Li Wang via ltp
@ 2025-05-27 9:40 ` Cyril Hrubis
2025-05-27 9:50 ` Li Wang via ltp
1 sibling, 1 reply; 12+ messages in thread
From: Cyril Hrubis @ 2025-05-27 9:40 UTC (permalink / raw)
To: Luiz Capitulino; +Cc: ltp
Hi!
Reviewed-by: Cyril Hrubis <chrubis@suse.cz>
--
Cyril Hrubis
chrubis@suse.cz
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [LTP] [PATCH] ksm: fix segfault on s390
2025-05-27 9:40 ` Cyril Hrubis
@ 2025-05-27 9:50 ` Li Wang via ltp
0 siblings, 0 replies; 12+ messages in thread
From: Li Wang via ltp @ 2025-05-27 9:50 UTC (permalink / raw)
To: LTP List; +Cc: Luiz Capitulino
Hi Luiz, All,
Patch merged, thanks!
--
Regards,
Li Wang
--
Mailing list info: https://lists.linux.it/listinfo/ltp
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-05-27 9:51 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-20 20:24 [LTP] [PATCH] ksm: fix segfault on s390 Luiz Capitulino via ltp
2025-05-21 14:08 ` Li Wang via ltp
2025-05-21 17:10 ` Luiz Capitulino via ltp
2025-05-21 23:27 ` Li Wang via ltp
2025-05-22 5:29 ` Li Wang via ltp
2025-05-22 14:08 ` Luiz Capitulino via ltp
2025-05-22 14:36 ` Li Wang via ltp
2025-05-22 19:52 ` Petr Vorel
2025-05-23 6:13 ` Jan Stancek via ltp
2025-05-22 14:49 ` Li Wang via ltp
2025-05-27 9:40 ` Cyril Hrubis
2025-05-27 9:50 ` Li Wang via ltp
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.