* [PATCH v2 0/4] Fix building tools/tests/x86_emulator with clang-21
@ 2026-02-27 10:58 Edwin Török
2026-02-27 10:58 ` [PATCH v2 1/4] tools/tests/x86_emulator: avoid duplicate symbol error with clang Edwin Török
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: Edwin Török @ 2026-02-27 10:58 UTC (permalink / raw)
To: xen-devel
Cc: Edwin Török, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Anthony PERARD
Changed since v1:
* filed upstream bug and added reference for integrated-as issue
* use .ifndef directive instead of volatile workaround
* fixed 2 more undefined behaviour warnings from fsanitize=undefined
This makes the tests build (but not yet run) with clang.
I dropped the ifdef __clang__ patch: compiling just x86-emulate.c with
clang and the rest with GCC fails too, so there are probably some
implicit assumptions about x86_emulate not modifying some registers (but
when compiled with clang it does). I don't have a good solution for
these, and I don't understand what most of these tests are trying to
test.
For example I don't see how this tests the result of the emulator,
where rc is entirely computed by the asm block (and with clang the
result here is 0xfffe instead of 0xffff):
```
asm volatile ( "pcmpeqb %%xmm1, %%xmm1\n"
put_insn(movq_from_mem2, "movq 32(%0), %%xmm1")
:: "c" (NULL) );
set_insn(movq_from_mem2);
rc = x86_emulate(&ctxt, &emulops);
if ( rc != X86EMUL_OKAY || !check_eip(movq_from_mem2) )
goto fail;
asm ( "pcmpgtb %%xmm0, %%xmm0\n\t"
"pcmpeqb %%xmm1, %%xmm0\n\t"
"pmovmskb %%xmm0, %0" : "=r" (rc) );
if ( rc != 0xffff )
goto fail;
```
After fixing the bugs reported by UBSAN/MSAN (and disabling the memset
wrapper which causes it to infloop) the tests still fail.
Using -Os doesn't help either.
valgrind also fails here, but that could be a bug in valgrind:
Testing mulx (%eax),%ecx,%ebx... failed!
Edwin Török (4):
tools/tests/x86_emulator: avoid duplicate symbol error with clang
tools/tests/x86_emulator: fix build on clang-21
tools/tests/x86_emulator: fix undefined behaviour in shift
tools/tests/x86_emulator: avoid passing NULL to memcpy
tools/tests/x86_emulator/Makefile | 5 +++++
tools/tests/x86_emulator/test_x86_emulator.c | 14 ++++++++++----
2 files changed, 15 insertions(+), 4 deletions(-)
--
2.47.3
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 1/4] tools/tests/x86_emulator: avoid duplicate symbol error with clang
2026-02-27 10:58 [PATCH v2 0/4] Fix building tools/tests/x86_emulator with clang-21 Edwin Török
@ 2026-02-27 10:58 ` Edwin Török
2026-03-03 13:59 ` Jan Beulich
2026-02-27 10:58 ` [PATCH v2 2/4] tools/tests/x86_emulator: fix build on clang-21 Edwin Török
` (2 subsequent siblings)
3 siblings, 1 reply; 18+ messages in thread
From: Edwin Török @ 2026-02-27 10:58 UTC (permalink / raw)
To: xen-devel
Cc: Edwin Török, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Anthony PERARD
clang would duplicate the loop body and end up with a double definition
of the symbol:
```
/tmp/test_x86_emulator-0f3576.s:27823: Error: symbol `vmovsh_to_mem' is already defined
/tmp/test_x86_emulator-0f3576.s:27825: Error: symbol `.Lvmovsh_to_mem_end' is already defined
```
Avoid this by only emitting the symbols surrounding the instructions if
they are not already defined. (We know that the definitions would be
identical, and we only need one of them as input to the emulator)
Signed-off-by: Edwin Török <edwin.torok@citrix.com>
---
Changed since v1:
* use .ifndef directive instead of volatile workaround
---
tools/tests/x86_emulator/test_x86_emulator.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/tools/tests/x86_emulator/test_x86_emulator.c b/tools/tests/x86_emulator/test_x86_emulator.c
index ea507f9c3a..8f93a8bbcd 100644
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -1882,8 +1882,13 @@ int main(int argc, char **argv)
#define decl_insn(which) extern const unsigned char which[], \
which##_end[] asm ( ".L" #which "_end" )
#define put_insn(which, insn) ".pushsection .test\n" \
- #which ": " insn "\n" \
+ ".ifndef "#which"\n" \
+ #which ": \n" \
+ ".endif\n" \
+ insn "\n" \
+ ".ifndef .L"#which"_end\n" \
".L" #which "_end:\n" \
+ ".endif\n" \
".popsection"
#define set_insn(which) (regs.eip = (unsigned long)(which))
#define valid_eip(which) (regs.eip >= (unsigned long)(which) && \
@@ -5248,7 +5253,7 @@ int main(int argc, char **argv)
memset(res + 3, ~0, 8);
regs.eax = (unsigned long)res;
regs.ecx = ~0;
- for ( i = 0; i < 2; ++i )
+ for (i = 0; i < 2; ++i )
{
decl_insn(vmovsh_to_mem);
--
2.47.3
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 2/4] tools/tests/x86_emulator: fix build on clang-21
2026-02-27 10:58 [PATCH v2 0/4] Fix building tools/tests/x86_emulator with clang-21 Edwin Török
2026-02-27 10:58 ` [PATCH v2 1/4] tools/tests/x86_emulator: avoid duplicate symbol error with clang Edwin Török
@ 2026-02-27 10:58 ` Edwin Török
2026-03-03 14:05 ` Jan Beulich
2026-02-27 10:58 ` [PATCH v2 3/4] tools/tests/x86_emulator: fix undefined behaviour in shift Edwin Török
2026-02-27 10:58 ` [PATCH v2 4/4] tools/tests/x86_emulator: avoid passing NULL to memcpy Edwin Török
3 siblings, 1 reply; 18+ messages in thread
From: Edwin Török @ 2026-02-27 10:58 UTC (permalink / raw)
To: xen-devel
Cc: Edwin Török, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Anthony PERARD
clang-21's built-in assembler is enabled by default, but it doesn't
support some mnemonics:
```
test_x86_emulator.c:2636:36: error: invalid instruction mnemonic 'fsaves'
2636 | "fidivs %1\n\t"
test_x86_emulator.c:2640:24: error: invalid instruction mnemonic 'frstors'
2640 | asm volatile ( "frstors %0" :: "m" (res[25]) : "memory" );
| ^
test_x86_emulator.c:4251:24: error: invalid instruction mnemonic 'vpcmpestriq'
4251 | put_insn(vpcmpestri,
| ^
```
Use the external assembler with Clang for consistency with GCC.
Signed-off-by: Edwin Török <edwin.torok@citrix.com>
---
Changed since v1:
* filed upstream bug and added reference
---
tools/tests/x86_emulator/Makefile | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/tests/x86_emulator/Makefile b/tools/tests/x86_emulator/Makefile
index e18725d0c3..5003c464f3 100644
--- a/tools/tests/x86_emulator/Makefile
+++ b/tools/tests/x86_emulator/Makefile
@@ -297,6 +297,11 @@ x86_emulate:
HOSTCFLAGS-x86_64 := -fno-PIE
$(call cc-option-add,HOSTCFLAGS-x86_64,HOSTCC,-no-pie)
+
+# clang's integrated as does not support some mnemonics:
+# https://github.com/llvm/llvm-project/issues/183635
+$(call cc-option-add,HOSTCFLAGS-x86_64,HOSTCC,-no-integrated-as)
+
HOSTCFLAGS += $(CFLAGS_xeninclude) -I. $(HOSTCFLAGS-$(XEN_COMPILE_ARCH))
x86.h := $(addprefix $(XEN_ROOT)/tools/include/xen/asm/,\
--
2.47.3
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 3/4] tools/tests/x86_emulator: fix undefined behaviour in shift
2026-02-27 10:58 [PATCH v2 0/4] Fix building tools/tests/x86_emulator with clang-21 Edwin Török
2026-02-27 10:58 ` [PATCH v2 1/4] tools/tests/x86_emulator: avoid duplicate symbol error with clang Edwin Török
2026-02-27 10:58 ` [PATCH v2 2/4] tools/tests/x86_emulator: fix build on clang-21 Edwin Török
@ 2026-02-27 10:58 ` Edwin Török
2026-03-03 14:07 ` Jan Beulich
2026-02-27 10:58 ` [PATCH v2 4/4] tools/tests/x86_emulator: avoid passing NULL to memcpy Edwin Török
3 siblings, 1 reply; 18+ messages in thread
From: Edwin Török @ 2026-02-27 10:58 UTC (permalink / raw)
To: xen-devel
Cc: Edwin Török, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Anthony PERARD
Fixes this `-fsanitize=undefined` error:
```
test_x86_emulator.c:1103:32: runtime error: left shift of 573785183 by 2 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior test_x86_emulator.c:1103:32
```
Signed-off-by: Edwin Török <edwin.torok@citrix.com>
---
tools/tests/x86_emulator/test_x86_emulator.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/tests/x86_emulator/test_x86_emulator.c b/tools/tests/x86_emulator/test_x86_emulator.c
index 8f93a8bbcd..3a03ea0352 100644
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -1100,7 +1100,7 @@ int main(int argc, char **argv)
regs.edi = (unsigned long)res;
rc = x86_emulate(&ctxt, &emulops);
if ( (rc != X86EMUL_OKAY) ||
- (*res != ((0x2233445F << 2) | 2)) ||
+ (*res != ((0x2233445FUL << 2) | 2)) ||
((regs.eflags & (EFLAGS_MASK & ~X86_EFLAGS_OF))
!= EFLAGS_ALWAYS_SET) ||
(regs.eip != (unsigned long)&instr[3]) )
--
2.47.3
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 4/4] tools/tests/x86_emulator: avoid passing NULL to memcpy
2026-02-27 10:58 [PATCH v2 0/4] Fix building tools/tests/x86_emulator with clang-21 Edwin Török
` (2 preceding siblings ...)
2026-02-27 10:58 ` [PATCH v2 3/4] tools/tests/x86_emulator: fix undefined behaviour in shift Edwin Török
@ 2026-02-27 10:58 ` Edwin Török
2026-03-03 14:18 ` Jan Beulich
3 siblings, 1 reply; 18+ messages in thread
From: Edwin Török @ 2026-02-27 10:58 UTC (permalink / raw)
To: xen-devel
Cc: Edwin Török, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Anthony PERARD
Fixes this `-fsanitize=undefined` error:
```
test_x86_emulator.c:614:12: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior test_x86_emulator.c:614:12
```
Although this is more of a grey area: I don't see anything in the
standard that'd forbid calling `memset` with NULL and 0, but `glibc`
does specify it as non-null, which allows the compiler to make
optimizations assuming it never is NULL, so this is undefined behaviour
only on glibc.
Best to avoid the potential undefined behaviour though.
Signed-off-by: Edwin Török <edwin.torok@citrix.com>
---
tools/tests/x86_emulator/test_x86_emulator.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/tests/x86_emulator/test_x86_emulator.c b/tools/tests/x86_emulator/test_x86_emulator.c
index 3a03ea0352..87c1289afa 100644
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -611,7 +611,8 @@ static int fetch(
if ( verbose )
printf("** %s(CS:%p,, %u,)\n", __func__, (void *)offset, bytes);
- memcpy(p_data, (void *)offset, bytes);
+ if (bytes)
+ memcpy(p_data, (void *)offset, bytes);
return X86EMUL_OKAY;
}
--
2.47.3
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/4] tools/tests/x86_emulator: avoid duplicate symbol error with clang
2026-02-27 10:58 ` [PATCH v2 1/4] tools/tests/x86_emulator: avoid duplicate symbol error with clang Edwin Török
@ 2026-03-03 13:59 ` Jan Beulich
2026-03-03 15:09 ` Edwin Torok
0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2026-03-03 13:59 UTC (permalink / raw)
To: Edwin Török
Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, xen-devel
On 27.02.2026 11:58, Edwin Török wrote:
> --- a/tools/tests/x86_emulator/test_x86_emulator.c
> +++ b/tools/tests/x86_emulator/test_x86_emulator.c
> @@ -1882,8 +1882,13 @@ int main(int argc, char **argv)
> #define decl_insn(which) extern const unsigned char which[], \
> which##_end[] asm ( ".L" #which "_end" )
> #define put_insn(which, insn) ".pushsection .test\n" \
> - #which ": " insn "\n" \
> + ".ifndef "#which"\n" \
> + #which ": \n" \
> + ".endif\n" \
> + insn "\n" \
> + ".ifndef .L"#which"_end\n" \
> ".L" #which "_end:\n" \
> + ".endif\n" \
> ".popsection"
Nice idea, but why multiple .ifndef, and why emitting the insn even if the
labels are already there (and hence won't be emitted a 2nd time)?
Further, if the compiler unrolls a loop and instantiates such a put_insn()
twice, it could pick different inputs (where flexibility is allowed). Most
present uses (including ones I have pending) meet this requirement (i.e.
permit only a single register per operand), but vmovdqu{32,16}_to_mem,
evex_vcvtph2ps, vpcompress{d,q,w}, vpdpwssd_{vex1,vex2,evex}, and
vmovsh_to_mem don't. {,v}movsd{,_masked}_to_mem, pcmp{e,i}str{i,m} and a
few more could also end up being problematic if different addressing was
used for the memory operand(s).
None of those sit in loops, I think, so we may be safe. But the constraints
need properly writing down in a comment, I think.
> @@ -5248,7 +5253,7 @@ int main(int argc, char **argv)
> memset(res + 3, ~0, 8);
> regs.eax = (unsigned long)res;
> regs.ecx = ~0;
> - for ( i = 0; i < 2; ++i )
> + for (i = 0; i < 2; ++i )
> {
> decl_insn(vmovsh_to_mem);
Excuse me?
Jan
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 2/4] tools/tests/x86_emulator: fix build on clang-21
2026-02-27 10:58 ` [PATCH v2 2/4] tools/tests/x86_emulator: fix build on clang-21 Edwin Török
@ 2026-03-03 14:05 ` Jan Beulich
0 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2026-03-03 14:05 UTC (permalink / raw)
To: Edwin Török
Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, xen-devel
On 27.02.2026 11:58, Edwin Török wrote:
> clang-21's built-in assembler is enabled by default, but it doesn't
> support some mnemonics:
> ```
> test_x86_emulator.c:2636:36: error: invalid instruction mnemonic 'fsaves'
> 2636 | "fidivs %1\n\t"
> test_x86_emulator.c:2640:24: error: invalid instruction mnemonic 'frstors'
> 2640 | asm volatile ( "frstors %0" :: "m" (res[25]) : "memory" );
> | ^
> test_x86_emulator.c:4251:24: error: invalid instruction mnemonic 'vpcmpestriq'
> 4251 | put_insn(vpcmpestri,
> | ^
> ```
Btw, for this last one to work with gcc older than 7.x what I have in a
local "helper" patch is
# if __GNUC__ < 7//temp
put_insn(vpcmpestri,
".byte 0xC4, 0xE3, 0xF9, 0x61, 0x16, 0x7A")
# else
put_insn(vpcmpestri,
"vpcmpestriq $0b01111010, (%1), %%xmm2")
# endif
> Use the external assembler with Clang for consistency with GCC.
>
> Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
> --- a/tools/tests/x86_emulator/Makefile
> +++ b/tools/tests/x86_emulator/Makefile
> @@ -297,6 +297,11 @@ x86_emulate:
>
> HOSTCFLAGS-x86_64 := -fno-PIE
> $(call cc-option-add,HOSTCFLAGS-x86_64,HOSTCC,-no-pie)
> +
> +# clang's integrated as does not support some mnemonics:
> +# https://github.com/llvm/llvm-project/issues/183635
I see they already confirmed at least the {,v}pcmpestri issue.
Jan
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 3/4] tools/tests/x86_emulator: fix undefined behaviour in shift
2026-02-27 10:58 ` [PATCH v2 3/4] tools/tests/x86_emulator: fix undefined behaviour in shift Edwin Török
@ 2026-03-03 14:07 ` Jan Beulich
2026-03-03 14:09 ` Jan Beulich
2026-03-03 14:49 ` Edwin Torok
0 siblings, 2 replies; 18+ messages in thread
From: Jan Beulich @ 2026-03-03 14:07 UTC (permalink / raw)
To: Edwin Török
Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, xen-devel
On 27.02.2026 11:58, Edwin Török wrote:
> --- a/tools/tests/x86_emulator/test_x86_emulator.c
> +++ b/tools/tests/x86_emulator/test_x86_emulator.c
> @@ -1100,7 +1100,7 @@ int main(int argc, char **argv)
> regs.edi = (unsigned long)res;
> rc = x86_emulate(&ctxt, &emulops);
> if ( (rc != X86EMUL_OKAY) ||
> - (*res != ((0x2233445F << 2) | 2)) ||
> + (*res != ((0x2233445FUL << 2) | 2)) ||
Why the L when res is an array of unsigned int? With it dropped (happy
to do so while committing):
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 3/4] tools/tests/x86_emulator: fix undefined behaviour in shift
2026-03-03 14:07 ` Jan Beulich
@ 2026-03-03 14:09 ` Jan Beulich
2026-03-03 14:30 ` Edwin Torok
2026-03-03 14:49 ` Edwin Torok
1 sibling, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2026-03-03 14:09 UTC (permalink / raw)
To: Edwin Török
Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, xen-devel
On 03.03.2026 15:07, Jan Beulich wrote:
> On 27.02.2026 11:58, Edwin Török wrote:
>> --- a/tools/tests/x86_emulator/test_x86_emulator.c
>> +++ b/tools/tests/x86_emulator/test_x86_emulator.c
>> @@ -1100,7 +1100,7 @@ int main(int argc, char **argv)
>> regs.edi = (unsigned long)res;
>> rc = x86_emulate(&ctxt, &emulops);
>> if ( (rc != X86EMUL_OKAY) ||
>> - (*res != ((0x2233445F << 2) | 2)) ||
>> + (*res != ((0x2233445FUL << 2) | 2)) ||
>
> Why the L when res is an array of unsigned int? With it dropped (happy
> to do so while committing):
> Acked-by: Jan Beulich <jbeulich@suse.com>
I should probably add that nevertheless it's not quite clear to me what use it
is to compile the harness source itself with sanitizer options. I can see that
to be useful for the core emulator files compiled into the harness binary.
Jan
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 4/4] tools/tests/x86_emulator: avoid passing NULL to memcpy
2026-02-27 10:58 ` [PATCH v2 4/4] tools/tests/x86_emulator: avoid passing NULL to memcpy Edwin Török
@ 2026-03-03 14:18 ` Jan Beulich
2026-03-03 15:24 ` Andrew Cooper
0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2026-03-03 14:18 UTC (permalink / raw)
To: Edwin Török
Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, xen-devel
On 27.02.2026 11:58, Edwin Török wrote:
> Fixes this `-fsanitize=undefined` error:
> ```
> test_x86_emulator.c:614:12: runtime error: null pointer passed as argument 1, which is declared to never be null
> /usr/include/string.h:44:28: note: nonnull attribute specified here
> SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior test_x86_emulator.c:614:12
> ```
>
> Although this is more of a grey area: I don't see anything in the
> standard that'd forbid calling `memset` with NULL and 0,
There actually is. In the C99 spec clause 2 refers to section 7.1.4, where null
pointer arguments are excluded. Imo for memcpy() etc exceptions should be made
for the case where the count is also zero, but sadly nothing like that is there.
Nit: Why do you talk of memset() though when memcpy() is what you alter?
> but `glibc`
> does specify it as non-null, which allows the compiler to make
> optimizations assuming it never is NULL, so this is undefined behaviour
> only on glibc.
> Best to avoid the potential undefined behaviour though.
>
> Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
with both nits (above and below) addressed. I can again take care of that when
committing. (Same remark applies here as to the usefulness of compiling ...
> --- a/tools/tests/x86_emulator/test_x86_emulator.c
> +++ b/tools/tests/x86_emulator/test_x86_emulator.c
... this file with sanitizer options active.)
> @@ -611,7 +611,8 @@ static int fetch(
> if ( verbose )
> printf("** %s(CS:%p,, %u,)\n", __func__, (void *)offset, bytes);
>
> - memcpy(p_data, (void *)offset, bytes);
> + if (bytes)
Nit: Style. A well formed if() is even visible in context.
Jan
> + memcpy(p_data, (void *)offset, bytes);
> return X86EMUL_OKAY;
> }
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 3/4] tools/tests/x86_emulator: fix undefined behaviour in shift
2026-03-03 14:09 ` Jan Beulich
@ 2026-03-03 14:30 ` Edwin Torok
0 siblings, 0 replies; 18+ messages in thread
From: Edwin Torok @ 2026-03-03 14:30 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, Roger Pau Monne, Anthony PERARD,
xen-devel@lists.xenproject.org
> On 3 Mar 2026, at 14:09, Jan Beulich <jbeulich@suse.com> wrote:
>
> On 03.03.2026 15:07, Jan Beulich wrote:
>> On 27.02.2026 11:58, Edwin Török wrote:
>>> --- a/tools/tests/x86_emulator/test_x86_emulator.c
>>> +++ b/tools/tests/x86_emulator/test_x86_emulator.c
>>> @@ -1100,7 +1100,7 @@ int main(int argc, char **argv)
>>> regs.edi = (unsigned long)res;
>>> rc = x86_emulate(&ctxt, &emulops);
>>> if ( (rc != X86EMUL_OKAY) ||
>>> - (*res != ((0x2233445F << 2) | 2)) ||
>>> + (*res != ((0x2233445FUL << 2) | 2)) ||
>>
>> Why the L when res is an array of unsigned int? With it dropped (happy
>> to do so while committing):
>> Acked-by: Jan Beulich <jbeulich@suse.com>
>
> I should probably add that nevertheless it's not quite clear to me what use it
> is to compile the harness source itself with sanitizer options.
I was trying to see why the tests were failing with clang, but not GCC.
That could also be because clang takes different (optimisation) decisions on how to handle undefined behaviour.
It turned out that undefined behaviour wasn’t the reason the results were different, but I sent some patches to fix
some of the (currently latent) bugs it uncovered.
> I can see that
> to be useful for the core emulator files compiled into the harness binary.
>
For sanitisers like the memory sanitisers (which detects reads from uninitialized values) I think it is best to have the whole program compiled with it
(the uninitialised value could originate in a different file). Or at least as far as reasonable (I’m not rebuilding libc).
But even for the address sanitiser a buffer could be allocated (e.g. on the stack) in a different file than where the bug is.
Best regards,
—Edwin
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 3/4] tools/tests/x86_emulator: fix undefined behaviour in shift
2026-03-03 14:07 ` Jan Beulich
2026-03-03 14:09 ` Jan Beulich
@ 2026-03-03 14:49 ` Edwin Torok
1 sibling, 0 replies; 18+ messages in thread
From: Edwin Torok @ 2026-03-03 14:49 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, Roger Pau Monne, Anthony PERARD,
xen-devel@lists.xenproject.org
> On 3 Mar 2026, at 14:07, Jan Beulich <jbeulich@suse.com> wrote:
>
> On 27.02.2026 11:58, Edwin Török wrote:
>> --- a/tools/tests/x86_emulator/test_x86_emulator.c
>> +++ b/tools/tests/x86_emulator/test_x86_emulator.c
>> @@ -1100,7 +1100,7 @@ int main(int argc, char **argv)
>> regs.edi = (unsigned long)res;
>> rc = x86_emulate(&ctxt, &emulops);
>> if ( (rc != X86EMUL_OKAY) ||
>> - (*res != ((0x2233445F << 2) | 2)) ||
>> + (*res != ((0x2233445FUL << 2) | 2)) ||
>
> Why the L when res is an array of unsigned int? With it dropped (happy
> to do so while committing):
You are right, it isn’t needed, making it unsigned is enough to avoid undefined behaviour.
Best regards,
—Edwin
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/4] tools/tests/x86_emulator: avoid duplicate symbol error with clang
2026-03-03 13:59 ` Jan Beulich
@ 2026-03-03 15:09 ` Edwin Torok
2026-03-03 15:36 ` Jan Beulich
0 siblings, 1 reply; 18+ messages in thread
From: Edwin Torok @ 2026-03-03 15:09 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, Roger Pau Monne, Anthony PERARD,
xen-devel@lists.xenproject.org
> On 3 Mar 2026, at 13:59, Jan Beulich <jbeulich@suse.com> wrote:
>
> On 27.02.2026 11:58, Edwin Török wrote:
>> --- a/tools/tests/x86_emulator/test_x86_emulator.c
>> +++ b/tools/tests/x86_emulator/test_x86_emulator.c
>> @@ -1882,8 +1882,13 @@ int main(int argc, char **argv)
>> #define decl_insn(which) extern const unsigned char which[], \
>> which##_end[] asm ( ".L" #which "_end" )
>> #define put_insn(which, insn) ".pushsection .test\n" \
>> - #which ": " insn "\n" \
>> + ".ifndef "#which"\n" \
>> + #which ": \n" \
>> + ".endif\n" \
>> + insn "\n" \
>> + ".ifndef .L"#which"_end\n" \
>> ".L" #which "_end:\n" \
>> + ".endif\n" \
>> ".popsection"
>
> Nice idea, but why multiple .ifndef, and why emitting the insn even if the
> labels are already there (and hence won't be emitted a 2nd time)?
I think we still need to execute the instructions, so they can be compared against the emulator.
(Especially that on the 2nd loop the instructions might get different inputs,
and thus potentially produce different outputs)
IIUC this will look like this:
// i=0 unrolled
asm volatile( // ... first instance defines the label and instructions, stores result)
x86_emulate(…)
… compare emulation and actual execution results
// i=1 unrolled
asm volatile (// … second instance, doesn’t define the label, just executes instructions, and stores result)
x86_emulate(…)
… compare emulation and actual execution results
If we don’t emit the instructions when the label is already present then the 2nd time around we’ll have (stale) data from i=0.
>
> Further, if the compiler unrolls a loop and instantiates such a put_insn()
> twice, it could pick different inputs (where flexibility is allowed). Most
> present uses (including ones I have pending) meet this requirement (i.e.
> permit only a single register per operand), but vmovdqu{32,16}_to_mem,
> evex_vcvtph2ps, vpcompress{d,q,w}, vpdpwssd_{vex1,vex2,evex}, and
> vmovsh_to_mem don't. {,v}movsd{,_masked}_to_mem, pcmp{e,i}str{i,m} and a
> few more could also end up being problematic if different addressing was
> used for the memory operand(s).
>
> None of those sit in loops, I think, so we may be safe. But the constraints
> need properly writing down in a comment, I think.
Good point, there is an implicit assumption that multiple uses of the same “which” argument contain
exactly the same binary form.
If the forms are equivalent the test would still pass (but perhaps emulator test coverage will be reduced),
and worst case if they are different and not equivalent a test could fail.
If we start running these tests in a CI then we should notice if that happens with (future) patches.
Also if someone happens to typo the name of a ‘which’ in a new test in a way that it reuses an existing one then they’ll no
longer get a build failure (or 2 independent series, both adding tests for the same instruction).
Perhaps this one could be avoided if I extend put_insn (or decl_insn) to contain some C symbols that’d cause a conflict.
Will think about this and send a new version including that and the comment.
>
>> @@ -5248,7 +5253,7 @@ int main(int argc, char **argv)
>> memset(res + 3, ~0, 8);
>> regs.eax = (unsigned long)res;
>> regs.ecx = ~0;
>> - for ( i = 0; i < 2; ++i )
>> + for (i = 0; i < 2; ++i )
>> {
>> decl_insn(vmovsh_to_mem);
>
> Excuse me?
>
Sorry, should’ve noticed before sending (a stray change from undoing the volatile, wasn’t meant to be there).
Best regards,
—Edwin
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 4/4] tools/tests/x86_emulator: avoid passing NULL to memcpy
2026-03-03 14:18 ` Jan Beulich
@ 2026-03-03 15:24 ` Andrew Cooper
0 siblings, 0 replies; 18+ messages in thread
From: Andrew Cooper @ 2026-03-03 15:24 UTC (permalink / raw)
To: Jan Beulich, Edwin Török
Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, xen-devel
On 03/03/2026 2:18 pm, Jan Beulich wrote:
> On 27.02.2026 11:58, Edwin Török wrote:
>> Fixes this `-fsanitize=undefined` error:
>> ```
>> test_x86_emulator.c:614:12: runtime error: null pointer passed as argument 1, which is declared to never be null
>> /usr/include/string.h:44:28: note: nonnull attribute specified here
>> SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior test_x86_emulator.c:614:12
>> ```
>>
>> Although this is more of a grey area: I don't see anything in the
>> standard that'd forbid calling `memset` with NULL and 0,
> There actually is. In the C99 spec clause 2 refers to section 7.1.4, where null
> pointer arguments are excluded. Imo for memcpy() etc exceptions should be made
> for the case where the count is also zero, but sadly nothing like that is there.
C23 does finally make NULL with a zero length be well defined behaviour
for memcpy() and friends, but it's going to be a long time before we can
rely on this properly.
GCC 15 gains __attribute__((nonzero_if_null)) too.
~Andrew
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/4] tools/tests/x86_emulator: avoid duplicate symbol error with clang
2026-03-03 15:09 ` Edwin Torok
@ 2026-03-03 15:36 ` Jan Beulich
2026-03-03 15:55 ` Edwin Torok
0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2026-03-03 15:36 UTC (permalink / raw)
To: Edwin Torok
Cc: Andrew Cooper, Roger Pau Monne, Anthony PERARD,
xen-devel@lists.xenproject.org
On 03.03.2026 16:09, Edwin Torok wrote:
>> On 3 Mar 2026, at 13:59, Jan Beulich <jbeulich@suse.com> wrote:
>> On 27.02.2026 11:58, Edwin Török wrote:
>>> --- a/tools/tests/x86_emulator/test_x86_emulator.c
>>> +++ b/tools/tests/x86_emulator/test_x86_emulator.c
>>> @@ -1882,8 +1882,13 @@ int main(int argc, char **argv)
>>> #define decl_insn(which) extern const unsigned char which[], \
>>> which##_end[] asm ( ".L" #which "_end" )
>>> #define put_insn(which, insn) ".pushsection .test\n" \
>>> - #which ": " insn "\n" \
>>> + ".ifndef "#which"\n" \
>>> + #which ": \n" \
>>> + ".endif\n" \
>>> + insn "\n" \
>>> + ".ifndef .L"#which"_end\n" \
>>> ".L" #which "_end:\n" \
>>> + ".endif\n" \
>>> ".popsection"
>>
>> Nice idea, but why multiple .ifndef, and why emitting the insn even if the
>> labels are already there (and hence won't be emitted a 2nd time)?
>
> I think we still need to execute the instructions, so they can be compared against the emulator.
Of course, but they cannot be executed without having a label. We use the
label to point the emulated IP there, and then we use the end label to
check that after emulation the emulated IP has advanced as expected.
Jan
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/4] tools/tests/x86_emulator: avoid duplicate symbol error with clang
2026-03-03 15:36 ` Jan Beulich
@ 2026-03-03 15:55 ` Edwin Torok
2026-03-03 16:43 ` Jan Beulich
0 siblings, 1 reply; 18+ messages in thread
From: Edwin Torok @ 2026-03-03 15:55 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, Roger Pau Monne, Anthony PERARD,
xen-devel@lists.xenproject.org
> On 3 Mar 2026, at 15:36, Jan Beulich <jbeulich@suse.com> wrote:
>
> On 03.03.2026 16:09, Edwin Torok wrote:
>>> On 3 Mar 2026, at 13:59, Jan Beulich <jbeulich@suse.com> wrote:
>>> On 27.02.2026 11:58, Edwin Török wrote:
>>>> --- a/tools/tests/x86_emulator/test_x86_emulator.c
>>>> +++ b/tools/tests/x86_emulator/test_x86_emulator.c
>>>> @@ -1882,8 +1882,13 @@ int main(int argc, char **argv)
>>>> #define decl_insn(which) extern const unsigned char which[], \
>>>> which##_end[] asm ( ".L" #which "_end" )
>>>> #define put_insn(which, insn) ".pushsection .test\n" \
>>>> - #which ": " insn "\n" \
>>>> + ".ifndef "#which"\n" \
>>>> + #which ": \n" \
>>>> + ".endif\n" \
>>>> + insn "\n" \
>>>> + ".ifndef .L"#which"_end\n" \
>>>> ".L" #which "_end:\n" \
>>>> + ".endif\n" \
>>>> ".popsection"
>>>
>>> Nice idea, but why multiple .ifndef, and why emitting the insn even if the
>>> labels are already there (and hence won't be emitted a 2nd time)?
>>
>> I think we still need to execute the instructions, so they can be compared against the emulator.
>
> Of course, but they cannot be executed without having a label. We use the
> label to point the emulated IP there, and then we use the end label to
> check that after emulation the emulated IP has advanced as expected.
Oh that means that we won’t actually be testing anything useful in iterations>=1
(the test passes, but it runs the same test as it did on iteration 0).
I had another approach in mind (always use a locally unique label with %=, and update a C pointer to point to current one),
but it’d be very invasive. Would have to use extended asm syntax instead of basic, but that isn’t even the biggest problem:
all the %0, %1 would get shifted because I’d need to introduce another output variable.
And if I don’t introduce an output variable then clang completely optimises away the pointer on the C side, thinking it is never written to,
volatile might help, but that’d require even more casts).
Probably not worth going too deep on this particular rabbit hole.
Since this is just test-code maybe I should try using -O0 instead for compiling the emulator *test code only* (i.e. add O0 to the tests/x86_emulator/Makefile)
I tried -Os from https://lists.xen.org/archives/html/xen-devel/2023-04/msg00283.html, but -Os is -O2 with *extra* optimisations to reduce code size for clang, so it doesn’t really turn any optimisations off (it turns more optimisations on…)
https://clang.llvm.org/docs/CommandGuide/clang.html#cmdoption-O0
That isn’t necessarily the correct solution, but if it works it’d at least allow us to begin running both GCC and clang tests in the CI.
Best regards,
—Edwin
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/4] tools/tests/x86_emulator: avoid duplicate symbol error with clang
2026-03-03 15:55 ` Edwin Torok
@ 2026-03-03 16:43 ` Jan Beulich
2026-03-06 16:46 ` Edwin Torok
0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2026-03-03 16:43 UTC (permalink / raw)
To: Edwin Torok
Cc: Andrew Cooper, Roger Pau Monne, Anthony PERARD,
xen-devel@lists.xenproject.org
On 03.03.2026 16:55, Edwin Torok wrote:
>> On 3 Mar 2026, at 15:36, Jan Beulich <jbeulich@suse.com> wrote:
>> On 03.03.2026 16:09, Edwin Torok wrote:
>>>> On 3 Mar 2026, at 13:59, Jan Beulich <jbeulich@suse.com> wrote:
>>>> On 27.02.2026 11:58, Edwin Török wrote:
>>>>> --- a/tools/tests/x86_emulator/test_x86_emulator.c
>>>>> +++ b/tools/tests/x86_emulator/test_x86_emulator.c
>>>>> @@ -1882,8 +1882,13 @@ int main(int argc, char **argv)
>>>>> #define decl_insn(which) extern const unsigned char which[], \
>>>>> which##_end[] asm ( ".L" #which "_end" )
>>>>> #define put_insn(which, insn) ".pushsection .test\n" \
>>>>> - #which ": " insn "\n" \
>>>>> + ".ifndef "#which"\n" \
>>>>> + #which ": \n" \
>>>>> + ".endif\n" \
>>>>> + insn "\n" \
>>>>> + ".ifndef .L"#which"_end\n" \
>>>>> ".L" #which "_end:\n" \
>>>>> + ".endif\n" \
>>>>> ".popsection"
>>>>
>>>> Nice idea, but why multiple .ifndef, and why emitting the insn even if the
>>>> labels are already there (and hence won't be emitted a 2nd time)?
>>>
>>> I think we still need to execute the instructions, so they can be compared against the emulator.
>>
>> Of course, but they cannot be executed without having a label. We use the
>> label to point the emulated IP there, and then we use the end label to
>> check that after emulation the emulated IP has advanced as expected.
>
> Oh that means that we won’t actually be testing anything useful in iterations>=1
> (the test passes, but it runs the same test as it did on iteration 0).
May I ask for a little less bold statements? Of course the 2nd iteration isn't
identical to the 1st. The insn encoding is the same, but the operands (the mask
in particular, i.e. the value %k3 holds) aren't.
Jan
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/4] tools/tests/x86_emulator: avoid duplicate symbol error with clang
2026-03-03 16:43 ` Jan Beulich
@ 2026-03-06 16:46 ` Edwin Torok
0 siblings, 0 replies; 18+ messages in thread
From: Edwin Torok @ 2026-03-06 16:46 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, Roger Pau Monne, Anthony PERARD,
xen-devel@lists.xenproject.org
> On 3 Mar 2026, at 16:43, Jan Beulich <jbeulich@suse.com> wrote:
>
> On 03.03.2026 16:55, Edwin Torok wrote:
>>> On 3 Mar 2026, at 15:36, Jan Beulich <jbeulich@suse.com> wrote:
>>> On 03.03.2026 16:09, Edwin Torok wrote:
>>>>> On 3 Mar 2026, at 13:59, Jan Beulich <jbeulich@suse.com> wrote:
>>>>> On 27.02.2026 11:58, Edwin Török wrote:
>>>>>> --- a/tools/tests/x86_emulator/test_x86_emulator.c
>>>>>> +++ b/tools/tests/x86_emulator/test_x86_emulator.c
>>>>>> @@ -1882,8 +1882,13 @@ int main(int argc, char **argv)
>>>>>> #define decl_insn(which) extern const unsigned char which[], \
>>>>>> which##_end[] asm ( ".L" #which "_end" )
>>>>>> #define put_insn(which, insn) ".pushsection .test\n" \
>>>>>> - #which ": " insn "\n" \
>>>>>> + ".ifndef "#which"\n" \
>>>>>> + #which ": \n" \
>>>>>> + ".endif\n" \
>>>>>> + insn "\n" \
>>>>>> + ".ifndef .L"#which"_end\n" \
>>>>>> ".L" #which "_end:\n" \
>>>>>> + ".endif\n" \
>>>>>> ".popsection"
>>>>>
>>>>> Nice idea, but why multiple .ifndef, and why emitting the insn even if the
>>>>> labels are already there (and hence won't be emitted a 2nd time)?
>>>>
>>>> I think we still need to execute the instructions, so they can be compared against the emulator.
>>>
>>> Of course, but they cannot be executed without having a label. We use the
>>> label to point the emulated IP there, and then we use the end label to
>>> check that after emulation the emulated IP has advanced as expected.
>>
>> Oh that means that we won’t actually be testing anything useful in iterations>=1
>> (the test passes, but it runs the same test as it did on iteration 0).
>
> May I ask for a little less bold statements?
Fair enough, I probably still don’t fully understand how this works, and I tend to jump to conclusions too soon.
> Of course the 2nd iteration isn't
> identical to the 1st. The insn encoding is the same, but the operands (the mask
> in particular, i.e. the value %k3 holds) aren't.
>
If we’re very careful with how we use put_insn() then it might work (e.g. with more comments as you suggested).
I think it could go wrong if the compiler would emit a constant that is different in each loop iteration:
the emulator would be given the same binary sequence to emulate, because it always refers to the label from the 1st iteration
(the only label that exists with my patch), and therefore the value is the constant is the one from the first iteration.
Whereas the actual execution would use the actual values of those constants, which are different in each loop iteration.
This may result in a detectable change in the result between the emulator and actual executions.
If instead of a constant, it emits a reference to a register then the problem wouldn’t exist, because the register would contain the correct value on each iteration.
If we’re careful to use the appropriate constraints then the problem can be avoided (e.g. I assume a register constraint means that the compiler must always put the value into a register first and use that, even when the value would be a compile-time constant). But the macro itself can’t control what constraints are used.
Although it might work, this approach is quite brittle. For now I sent a V3 which uses -O0 for test_x86_emulator.o, which is much simpler.
-Os would work too, if I put it into the correct place (has to go into HOSTCFLAGS, not CFLAGS), but then we rely on the optimiser to always correctly estimate the size and avoid duplicating the assembly block. Since this isn’t performance critical I suggest using -O0.
If a better solution is found then this can be revisited again (e.g. one based on .ifndef as discussed here)
Best regards,
—Edwin
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2026-03-06 16:47 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-27 10:58 [PATCH v2 0/4] Fix building tools/tests/x86_emulator with clang-21 Edwin Török
2026-02-27 10:58 ` [PATCH v2 1/4] tools/tests/x86_emulator: avoid duplicate symbol error with clang Edwin Török
2026-03-03 13:59 ` Jan Beulich
2026-03-03 15:09 ` Edwin Torok
2026-03-03 15:36 ` Jan Beulich
2026-03-03 15:55 ` Edwin Torok
2026-03-03 16:43 ` Jan Beulich
2026-03-06 16:46 ` Edwin Torok
2026-02-27 10:58 ` [PATCH v2 2/4] tools/tests/x86_emulator: fix build on clang-21 Edwin Török
2026-03-03 14:05 ` Jan Beulich
2026-02-27 10:58 ` [PATCH v2 3/4] tools/tests/x86_emulator: fix undefined behaviour in shift Edwin Török
2026-03-03 14:07 ` Jan Beulich
2026-03-03 14:09 ` Jan Beulich
2026-03-03 14:30 ` Edwin Torok
2026-03-03 14:49 ` Edwin Torok
2026-02-27 10:58 ` [PATCH v2 4/4] tools/tests/x86_emulator: avoid passing NULL to memcpy Edwin Török
2026-03-03 14:18 ` Jan Beulich
2026-03-03 15:24 ` Andrew Cooper
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.