Generic way to detect qemu linux-user emulation

qemu-devel.nongnu.org archive mirror

* Generic way to detect qemu linux-user emulation
@ 2025-03-18 10:18 Andreas Schwab
  2025-03-18 10:36 ` Helge Deller
  0 siblings, 1 reply; 16+ messages in thread
From: Andreas Schwab @ 2025-03-18 10:18 UTC (permalink / raw)
  To: qemu-devel

Is there a generic way for a program to detect that it is being run
inside the linux-user emulation?

The purpose for that would be to work around limitations of the
emulation, like CLONE_VFORK being unsupported.  For example, python >=
3.13 needs to avoid using posix_spawn in that case, because the
emulation of CLONE_VFORK as a true fork makes it impossible for it to
report errors back to the parent process.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Generic way to detect qemu linux-user emulation
  2025-03-18 10:18 Generic way to detect qemu linux-user emulation Andreas Schwab
@ 2025-03-18 10:36 ` Helge Deller
  2025-03-18 10:45   ` Helge Deller
                     ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Helge Deller @ 2025-03-18 10:36 UTC (permalink / raw)
  To: Andreas Schwab, qemu-devel

On 3/18/25 11:18, Andreas Schwab wrote:
> Is there a generic way for a program to detect that it is being run
> inside the linux-user emulation?

Yes, having a reliable way to detect it would be good.

My current (unreliable) way to detect it is using uname.
The kernel string and arch name don't match:

(sid_hppa)root@paq:/# uname -a
Linux paq 6.1.0-31-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.128-1 (2025-02-07) parisc GNU/Linux

(sid_hppa)root@paq:/# uname -r
6.1.0-31-amd64

(sid_hppa)root@paq:/# uname -m
parisc

This is a qemu-linux-user parisc(hppa) emulation running on x86-64.
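A minimal sketch of that heuristic in Python, for illustration only. The
function name release_arch_mismatch and the small suffix table are
assumptions made for this example (the table covers only a few Debian-style
kernel flavour suffixes). As said above, the check is unreliable: it returns
None whenever the release string carries no recognised arch suffix.

```python
import platform

# Hypothetical, deliberately tiny table for this sketch:
# Debian-style kernel flavour suffix -> value `uname -m` reports natively.
FLAVOUR_TO_MACHINE = {
    "amd64": "x86_64",
    "arm64": "aarch64",
    "riscv64": "riscv64",
}


def release_arch_mismatch(release, machine):
    """Return True if the kernel release names a different arch than
    `machine`, False if they agree, None if the release gives no hint."""
    suffix = release.rsplit("-", 1)[-1]
    expected = FLAVOUR_TO_MACHINE.get(suffix)
    if expected is None:
        return None
    return expected != machine


if __name__ == "__main__":
    u = platform.uname()
    print(release_arch_mismatch(u.release, u.machine))
```

A True result is only a hint; a 32-bit userland on a 64-bit kernel can also
produce a mismatch without any emulation being involved.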

> The purpose for that would be to work around limitations of the
> emulation, like CLONE_VFORK being unsupported.

yes, and robust futexes aren't supported either.

>  For example, python >=
> 3.13 needs to avoid using posix_spawn in that case, because the
> emulation of CLONE_VFORK as a true fork makes it impossible for it to
> report errors back to the parent process.



* Re: Generic way to detect qemu linux-user emulation
  2025-03-18 10:36 ` Helge Deller
@ 2025-03-18 10:45   ` Helge Deller
  2025-03-18 10:53   ` Peter Maydell
  2025-03-18 11:10   ` Andreas Schwab
  2 siblings, 0 replies; 16+ messages in thread
From: Helge Deller @ 2025-03-18 10:45 UTC (permalink / raw)
  To: Andreas Schwab, qemu-devel

On 3/18/25 11:36, Helge Deller wrote:
> On 3/18/25 11:18, Andreas Schwab wrote:
>> Is there a generic way for a program to detect that it is being run
>> inside the linux-user emulation?
>
> Yes, having a reliable way to detect it would be good.

In qemu-user emulation we could change the return values of
"uname --processor" and/or "uname --hardware-platform".
Currently both always return "unknown", but in qemu we could
return the arch of the host.

Another possibility is to extend prctl(), but I think uname is
easier to handle in scripts and such...


> My current (unreliable) way to detect it is using uname.
> The kernel string and arch name don't match:
>
> (sid_hppa)root@paq:/# uname -a
> Linux paq 6.1.0-31-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.128-1 (2025-02-07) parisc GNU/Linux
>
> (sid_hppa)root@paq:/# uname -r
> 6.1.0-31-amd64
>
> (sid_hppa)root@paq:/# uname -m
> parisc
>
> This is a qemu-linux-user parisc(hppa) emulation running on x86-64.
>
>> The purpose for that would be to work around limitations of the
>> emulation, like CLONE_VFORK being unsupported.
>
> yes, and robust futexes aren't supported either.
>
>>  For example, python >=
>> 3.13 needs to avoid using posix_spawn in that case, because the
>> emulation of CLONE_VFORK as a true fork makes it impossible for it to
>> report errors back to the parent process.




* Re: Generic way to detect qemu linux-user emulation
  2025-03-18 10:36 ` Helge Deller
  2025-03-18 10:45   ` Helge Deller
@ 2025-03-18 10:53   ` Peter Maydell
  2025-03-18 11:58     ` Daniel P. Berrangé
  2025-03-18 11:10   ` Andreas Schwab
  2 siblings, 1 reply; 16+ messages in thread
From: Peter Maydell @ 2025-03-18 10:53 UTC (permalink / raw)
  To: Helge Deller; +Cc: Andreas Schwab, qemu-devel

On Tue, 18 Mar 2025 at 10:36, Helge Deller <deller@gmx.de> wrote:
>
> On 3/18/25 11:18, Andreas Schwab wrote:
> > Is there a generic way for a program to detect that it is being run
> > inside the linux-user emulation?
>
> Yes, having a reliable way to detect it would be good.
>
> My current (unreliable) way to detect it is using uname.

Yes, I don't believe there's currently an "intended"
mechanism for detecting QEMU, only ways of noticing
long-standing deviations from how the real kernel behaves.

> > The purpose for that would be to work around limitations of the
> > emulation, like CLONE_VFORK being unsupported.
>
> yes, and robust futexes aren't supported either.

You don't need to detect QEMU for that one, though -- you can
just try the get_robust_list syscall and if it fails with ENOSYS
then fall back to a codepath that doesn't use them (same as
you would on an ancient kernel that didn't implement the
syscall). Robust futexes are in the "technically extremely
hard to impossible to support" bucket, per the comment in
syscall.c.
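The probe described above can be sketched in Python via ctypes. This is an
illustration under stated assumptions, not code from QEMU or glibc: the
helper names are made up for this example, and the syscall-number table
lists only three 64-bit ABIs, so the probe returns None anywhere else.

```python
import ctypes
import errno
import platform
import sys

# Syscall numbers for get_robust_list on a few 64-bit ABIs. The table is an
# assumption of this sketch and is intentionally not exhaustive.
GET_ROBUST_LIST_NR = {
    "x86_64": 274,
    "aarch64": 100,  # asm-generic syscall table
    "riscv64": 100,  # asm-generic syscall table
}


def robust_list_supported(ret, err):
    """Interpret a raw syscall result: only ENOSYS means 'not implemented'."""
    return not (ret == -1 and err == errno.ENOSYS)


def probe_robust_list():
    """Return True/False for the running system, or None if we cannot tell."""
    nr = GET_ROBUST_LIST_NR.get(platform.machine())
    is_64bit = ctypes.sizeof(ctypes.c_void_p) == 8
    if not sys.platform.startswith("linux") or nr is None or not is_64bit:
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    head = ctypes.c_void_p()      # receives the robust list head pointer
    length = ctypes.c_size_t()    # receives the size of the list head
    ctypes.set_errno(0)
    # pid 0 means "the calling thread".
    ret = libc.syscall(ctypes.c_long(nr), ctypes.c_int(0),
                       ctypes.byref(head), ctypes.byref(length))
    return robust_list_supported(ret, ctypes.get_errno())


if __name__ == "__main__":
    print(probe_robust_list())
```

The point of the design is that the caller asks about the feature itself, so
the same fallback path also covers an old kernel without the syscall.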

> In qemu-user emulation we could change the return values of
> "uname --processor" and/or "uname --hardware-platform".
> Currently both always return "unknown", but in qemu we could
> return the arch of the host.

As a mechanism that feels a bit risky to me -- at some
point somebody may come along and say "my guest program
requires that these return the expected values for
the target CPU", and then you have a conflict between
whether you want them to behave correctly for the
target or to give you the "tell me it's QEMU" behaviour...

-- PMM


* Re: Generic way to detect qemu linux-user emulation
  2025-03-18 10:36 ` Helge Deller
  2025-03-18 10:45   ` Helge Deller
  2025-03-18 10:53   ` Peter Maydell
@ 2025-03-18 11:10   ` Andreas Schwab
  2 siblings, 0 replies; 16+ messages in thread
From: Andreas Schwab @ 2025-03-18 11:10 UTC (permalink / raw)
  To: Helge Deller; +Cc: qemu-devel

On Mär 18 2025, Helge Deller wrote:

> My current (unreliable) way to detect it is using uname.
> The kernel string and arch name don't match:
>
> (sid_hppa)root@paq:/# uname -a
> Linux paq 6.1.0-31-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.128-1 (2025-02-07) parisc GNU/Linux
>
> (sid_hppa)root@paq:/# uname -r
> 6.1.0-31-amd64
>
> (sid_hppa)root@paq:/# uname -m
> parisc
>
> This is a qemu-linux-user parisc(hppa) emulation running on x86-64.

That is highly distribution-specific; by default the release part does
not contain anything arch-specific.

For riscv the most reliable way is to look for "uarch *: qemu" in
/proc/cpuinfo.
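A short Python sketch of that riscv-only check. The function names are
hypothetical, and the sample text used in the usage note below is made up
for illustration rather than copied from a real system.

```python
import re

# Matches a line such as "uarch<TAB><TAB>: qemu" anywhere in the file.
UARCH_QEMU = re.compile(r"^uarch\s*:\s*qemu\s*$", re.MULTILINE)


def cpuinfo_reports_qemu(text):
    """True if the given /proc/cpuinfo contents contain 'uarch : qemu'."""
    return UARCH_QEMU.search(text) is not None


def running_under_qemu_riscv(path="/proc/cpuinfo"):
    """Read cpuinfo and apply the check; an unreadable file counts as 'no'."""
    try:
        with open(path, encoding="ascii", errors="replace") as f:
            return cpuinfo_reports_qemu(f.read())
    except OSError:
        return False
```

For example, cpuinfo_reports_qemu("hart\t\t: 0\nuarch\t\t: qemu\n") is true,
while any other uarch value yields false.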

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."



* Re: Generic way to detect qemu linux-user emulation
  2025-03-18 10:53   ` Peter Maydell
@ 2025-03-18 11:58     ` Daniel P. Berrangé
  2025-03-18 12:34       ` Andreas Schwab
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel P. Berrangé @ 2025-03-18 11:58 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Helge Deller, Andreas Schwab, qemu-devel

On Tue, Mar 18, 2025 at 10:53:27AM +0000, Peter Maydell wrote:
> On Tue, 18 Mar 2025 at 10:36, Helge Deller <deller@gmx.de> wrote:
> >
> > On 3/18/25 11:18, Andreas Schwab wrote:
> > > Is there a generic way for a program to detect that it is being run
> > > inside the linux-user emulation?
> >
> > Yes, having a reliable way to detect it would be good.
> >
> > My current (unreliable) way to detect it is using uname.
> 
> Yes, I don't believe there's currently an "intended"
> mechanism for detecting QEMU, only ways of noticing
> long-standing deviations from how the real kernel behaves.
> 
> > > The purpose for that would be to work around limitations of the
> > > emulation, like CLONE_VFORK being unsupported.
> >
> > yes, and robust futexes aren't supported either.
> 
> You don't need to detect QEMU for that one, though -- you can
> just try the get_robust_list syscall and if it fails with ENOSYS
> then fall back to a codepath that doesn't use them (same as
> you would on an ancient kernel that didn't implement the
> syscall). Robust futexes are in the "technically extremely
> hard to impossible to support" bucket, per the comment in
> syscall.c.
> 
> > In qemu-user emulation we could change the return values of
> > "uname --processor" and/or "uname --hardware-platform".
> > Currently both always return "unknown", but in qemu we could
> > return the arch of the host.
> 
> As a mechanism that feels a bit risky to me -- at some
> point somebody may come along and say "my guest program
> requires that these return the expected values for
> the target CPU", and then you have a conflict between
> whether you want them to behave correctly for the
> target or to give you the "tell me it's QEMU" behaviour...

It also isn't future proof. People will change their program behaviour
based on the limitations of the particular QEMU version they tested
against. QEMU later changes/fixes its impl, and apps are now either
applying a redundant workaround, or worse, applying a workaround that
is now actively harmful.

Wherever practical, it is preferable to check a discrete feature
or behaviour in a functional way, rather than matching on "is it QEMU"

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: Generic way to detect qemu linux-user emulation
  2025-03-18 11:58     ` Daniel P. Berrangé
@ 2025-03-18 12:34       ` Andreas Schwab
  2025-03-18 12:43         ` Daniel P. Berrangé
  0 siblings, 1 reply; 16+ messages in thread
From: Andreas Schwab @ 2025-03-18 12:34 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: Peter Maydell, Helge Deller, qemu-devel

On Mär 18 2025, Daniel P. Berrangé wrote:

> Wherever practical, it is preferable to check a discrete feature
> or behaviour in a functional way, rather than matching on "is it QEMU"

Do you know a way to detect support for CLONE_VFORK that isn't too
expensive?

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."



* Re: Generic way to detect qemu linux-user emulation
  2025-03-18 12:34       ` Andreas Schwab
@ 2025-03-18 12:43         ` Daniel P. Berrangé
  2025-03-18 13:06           ` Peter Maydell
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel P. Berrangé @ 2025-03-18 12:43 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Peter Maydell, Helge Deller, qemu-devel

On Tue, Mar 18, 2025 at 01:34:57PM +0100, Andreas Schwab wrote:
> On Mär 18 2025, Daniel P. Berrangé wrote:
> 
> > Wherever practical, it is preferable to check a discrete feature
> > or behaviour in a functional way, rather than matching on "is it QEMU"
> 
> Do you know a way to detect support for CLONE_VFORK that isn't too
> expensive?

No, but I feel like the right thing in this particular case is to look
at improving our vfork impl. The current impl is incredibly crude, as
acknowledged by the original author:

  commit 436d124b7d538b1fd9cf72edf17770664c309856
  Author: Andrzej Zaborowski <balrogg@gmail.com>
  Date:   Sun Sep 21 02:39:45 2008 +0000

    Band-aid vfork() emulation (Kirill Shutemov).

I can see why they did it that way, but I'm feeling like it ought to
be possible to do a better special-case vfork impl in QEMU instead of
overloading the fork() impl.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: Generic way to detect qemu linux-user emulation
  2025-03-18 12:43         ` Daniel P. Berrangé
@ 2025-03-18 13:06           ` Peter Maydell
  2025-03-18 13:54             ` Daniel P. Berrangé
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Maydell @ 2025-03-18 13:06 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: Andreas Schwab, Helge Deller, qemu-devel

On Tue, 18 Mar 2025 at 12:43, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Tue, Mar 18, 2025 at 01:34:57PM +0100, Andreas Schwab wrote:
> > On Mär 18 2025, Daniel P. Berrangé wrote:
> >
> > > Wherever practical, it is preferable to check a discrete feature
> > > or behaviour in a functional way, rather than matching on "is it QEMU"
> >
> > Do you know a way to detect support for CLONE_VFORK that isn't too
> > expensive?
>
> No, but I feel like the right thing in this particular case is to look
> at improving our vfork impl. The current impl is incredibly crude and
> acknowledged by the original author
>
>   commit 436d124b7d538b1fd9cf72edf17770664c309856
>   Author: Andrzej Zaborowski <balrogg@gmail.com>
>   Date:   Sun Sep 21 02:39:45 2008 +0000
>
>     Band-aid vfork() emulation (Kirill Shutemov).
>
> I can see why they did it that way, but I'm feeling like it ought to
> be possible to do a better special-case vfork impl in QEMU instead of
> overloading the fork() impl.

The difficulty with vfork() (and, more generally, with various of
the clone() syscall flag combinations) is that because we use the
host libc we are restricted to the thread/process creation options
that that libc permits: which is only fork() and pthread_create().
vfork() wants "create a new process like fork with its own file
descriptors, signal handlers, etc, but share all the memory space with
the parent", and the host libc just doesn't provide us with the tools
to do that. (We can't call the host vfork() because we wouldn't be
abiding by the rules it imposes, like "don't return from the function
that called vfork".)

If we were implemented as a usermode emulator that sat on the raw
kernel syscalls, we could directly call the clone syscall and
use that to provide at least a wider range of the possible clone
flag options; but our dependency on libc means we have to avoid
doing things that would confuse it.

For vfork in particular, we could I guess do something like:
 * use real fork() to create child process
 * parent process arranges to wait until child process exits
   (via waitpid or equivalent) or it tells us it's about to exec
 * we make all the guest memory be mapped read-only in the child
   process, so we can trap writes and tell the parent about them
   so it can update its copy of the memory.
   (Sadly, since we are not guaranteed to get control on termination
   events for the child before it really terminates, we can't
   do this memory-transfer in bulk at the end; otherwise we'd
   behave wrongly for the "child process gets SIGKILLed" case.)

Historically we've preferred to go for "assume that guests
will only want the looser POSIX semantics of vfork(), not the
tighter ones of the actual Linux syscall", but unfortunately
glibc has gone for the latter.

thanks
-- PMM


* Re: Generic way to detect qemu linux-user emulation
  2025-03-18 13:06           ` Peter Maydell
@ 2025-03-18 13:54             ` Daniel P. Berrangé
  2025-03-18 14:17               ` Andreas Schwab
  2025-03-18 15:04               ` Peter Maydell
  0 siblings, 2 replies; 16+ messages in thread
From: Daniel P. Berrangé @ 2025-03-18 13:54 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Andreas Schwab, Helge Deller, qemu-devel

On Tue, Mar 18, 2025 at 01:06:17PM +0000, Peter Maydell wrote:
> On Tue, 18 Mar 2025 at 12:43, Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Tue, Mar 18, 2025 at 01:34:57PM +0100, Andreas Schwab wrote:
> > > On Mär 18 2025, Daniel P. Berrangé wrote:
> > >
> > > > Wherever practical, it is preferable to check a discrete feature
> > > > or behaviour in a functional way, rather than matching on "is it QEMU"
> > >
> > > Do you know a way to detect support for CLONE_VFORK that isn't too
> > > expensive?
> >
> > No, but I feel like the right thing in this particular case is to look
> > at improving our vfork impl. The current impl is incredibly crude and
> > acknowledged by the original author
> >
> >   commit 436d124b7d538b1fd9cf72edf17770664c309856
> >   Author: Andrzej Zaborowski <balrogg@gmail.com>
> >   Date:   Sun Sep 21 02:39:45 2008 +0000
> >
> >     Band-aid vfork() emulation (Kirill Shutemov).
> >
> > I can see why they did it that way, but I'm feeling like it ought to
> > be possible to do a better special-case vfork impl in QEMU instead of
> > overloading the fork() impl.
> 
> The difficulty with vfork() (and, more generally, with various of
> the clone() syscall flag combinations) is that because we use the
> host libc we are restricted to the thread/process creation options
> that that libc permits: which is only fork() and pthread_create().
> vfork() wants "create a new process like fork with its own file
> descriptors, signal handlers, etc, but share all the memory space with
> the parent", and the host libc just doesn't provide us with the tools
> to do that. (We can't call the host vfork() because we wouldn't be
> abiding by the rules it imposes, like "don't return from the function
> that called vfork".)
> 
> If we were implemented as a usermode emulator that sat on the raw
> kernel syscalls, we could directly call the clone syscall and
> use that to provide at least a wider range of the possible clone
> flag options; but our dependency on libc means we have to avoid
> doing things that would confuse it.

I guess I'm not seeing how libc is blocking us in this respect ?
The clone() syscall wrapper is exposed by glibc at least, and it
is possible to call it, albeit with some caveats that we might
miss any logic glibc has around its fork() wrapper. The spec
requires that any child must immediately call execve after vfork
so I'm wondering just what risk of confusion we would have in
practice ?

> For vfork in particular, we could I guess do something like:
>  * use real fork() to create child process
>  * parent process arranges to wait until child process exits
>    (via waitpid or equivalent) or it tells us it's about to exec
>  * we make all the guest memory be mapped read-only in the child
>    process, so we can trap writes and tell the parent about them
>    so it can update its copy of the memory.
>    (Sadly since we can't guaranteedly get control on termination
>    events for the child before it really terminates, we can't
>    do this memory-transfer in bulk at the end; otherwise we'd
>    behave wrongly for the "child process gets SIGKILLed" case.)

That would get the synchronization behaviour of Linux vfork,
but I'm not sure it'd get the performance benefits (of avoiding
page table copying) which is what Andreas mentioned as the
desired thing ?

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: Generic way to detect qemu linux-user emulation
  2025-03-18 13:54             ` Daniel P. Berrangé
@ 2025-03-18 14:17               ` Andreas Schwab
  2025-03-18 17:32                 ` Daniel P. Berrangé
  2025-03-18 15:04               ` Peter Maydell
  1 sibling, 1 reply; 16+ messages in thread
From: Andreas Schwab @ 2025-03-18 14:17 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: Peter Maydell, Helge Deller, qemu-devel

On Mär 18 2025, Daniel P. Berrangé wrote:

> That would get the synchronization behaviour of Linux vfork,
> but I'm not sure it'd get the performance benefits (of avoiding
> page table copying) which is what Andreas mentioned as the
> desired thing ?

For an emulation performance isn't a thing, what we need is accuracy.
The current issue I have right now is that the MozillaFirefox package
fails to build because posix_spawn behaves unexpectedly.

https://build.opensuse.org/package/live_build_log/openSUSE:Factory:RISCV/MozillaFirefox/standard/riscv64

[  666s]  4:55.15 Traceback (most recent call last):
[  666s]  4:55.16   File "/home/abuild/rpmbuild/BUILD/MozillaFirefox-136.0.1-build/firefox-136.0.1/security/nss/./coreconf/werror.py", line 80, in <module>
[  666s]  4:55.16     main()
[  666s]  4:55.16     ~~~~^^
[  666s]  4:55.16   File "/home/abuild/rpmbuild/BUILD/MozillaFirefox-136.0.1-build/firefox-136.0.1/security/nss/./coreconf/werror.py", line 10, in main
[  666s]  4:55.16     cc_is_clang = 'clang' in subprocess.check_output(
[  666s]  4:55.16                              ~~~~~~~~~~~~~~~~~~~~~~~^
[  666s]  4:55.16       [cc, '--version'], universal_newlines=True, stderr=sink)
[  666s]  4:55.16       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[  666s]  4:55.16   File "/usr/lib64/python3.13/subprocess.py", line 474, in check_output
[  666s]  4:55.16     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
[  666s]  4:55.16            ~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[  666s]  4:55.17                **kwargs).stdout
[  666s]  4:55.17                ^^^^^^^^^
[  666s]  4:55.17   File "/usr/lib64/python3.13/subprocess.py", line 579, in run
[  666s]  4:55.17     raise CalledProcessError(retcode, process.args,
[  666s]  4:55.17                              output=stdout, stderr=stderr)
[  666s]  4:55.17 subprocess.CalledProcessError: Command '['/usr/bin/ccache /usr/bin/gcc', '--version']' returned non-zero exit status 127.

A real posix_spawn would have set errno to ENOENT.
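One possible functional check for this behaviour, sketched in Python: spawn
a path that cannot exist and see whether the failure is reported as an
error or only as exit status 127. The function name and the probe path are
assumptions of this sketch. It costs one process creation, so it would have
to be done once and cached, and whether that is cheap enough is a judgment
call.

```python
import os


def posix_spawn_reports_errors(missing="/nonexistent/definitely-not-a-program"):
    """True if a failed exec is reported back to the caller as ENOENT,
    False if the spawn 'succeeds' and the child just exits instead."""
    try:
        pid = os.posix_spawn(missing, [missing], {})
    except FileNotFoundError:
        return True
    # The error was not propagated; reap the child so it does not linger.
    os.waitpid(pid, 0)
    return False


if __name__ == "__main__":
    print(posix_spawn_reports_errors())
```

A program could use the result to choose between posix_spawn and a
fork/exec path, without asking whether the host happens to be QEMU.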

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."



* Re: Generic way to detect qemu linux-user emulation
  2025-03-18 13:54             ` Daniel P. Berrangé
  2025-03-18 14:17               ` Andreas Schwab
@ 2025-03-18 15:04               ` Peter Maydell
  2025-03-18 17:08                 ` Peter Maydell
  1 sibling, 1 reply; 16+ messages in thread
From: Peter Maydell @ 2025-03-18 15:04 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: Andreas Schwab, Helge Deller, qemu-devel

On Tue, 18 Mar 2025 at 13:55, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Tue, Mar 18, 2025 at 01:06:17PM +0000, Peter Maydell wrote:
> > The difficulty with vfork() (and, more generally, with various of
> > the clone() syscall flag combinations) is that because we use the
> > host libc we are restricted to the thread/process creation options
> > that that libc permits: which is only fork() and pthread_create().
> > vfork() wants "create a new process like fork with its own file
> > descriptors, signal handlers, etc, but share all the memory space with
> > the parent", and the host libc just doesn't provide us with the tools
> > to do that. (We can't call the host vfork() because we wouldn't be
> > abiding by the rules it imposes, like "don't return from the function
> > that called vfork".)
> >
> > If we were implemented as a usermode emulator that sat on the raw
> > kernel syscalls, we could directly call the clone syscall and
> > use that to provide at least a wider range of the possible clone
> > flag options; but our dependency on libc means we have to avoid
> > doing things that would confuse it.
>
> I guess I'm not seeing how libc is blocking us in this respect ?
> The clone() syscall wrapper is exposed by glibc at least, and it
> is possible to call it, albeit with some caveats that we might
> miss any logic glibc has around its fork() wrapper. The spec
> requires that any child must immediately call execve after vfork
> so I'm wondering just what risk of confusion we would have in
> practice ?

I think my notes about clone are a red herring for vfork
specifically. For vfork in the child, the vfork spec requires
a very minimal amount of stuff to happen in the child, but QEMU's
own TCG data structures and calls and processes mean that we
will be doing a lot more than the guest does. For instance,
we need to return from the function that called vfork, so we
can continue to execute the guest code. And the guest code will
likely call into the translator to generate more code, which will
(a) mess up the TCG data structures for the parent and (b)
probably result in our calling into libc functions that aren't
OK to call.

More generally, AIUI glibc expects that it has control over what's
happening with threads, so it can set up its own data structures
for the new thread (e.g. for TLS variables). This email from the
glibc mailing list is admittedly now two decades old
https://public-inbox.org/libc-alpha/200408042007.i74K7ZOr025380@magilla.sf.frob.com/
but it says:

# Basically, if you want to call libc functions you should do it from a
# thread that was set up by libc or libpthread.  i.e., if you make your own
# threads with clone, only call libc functions from the initial thread.

> > For vfork in particular, we could I guess do something like:
> >  * use real fork() to create child process
> >  * parent process arranges to wait until child process exits
> >    (via waitpid or equivalent) or it tells us it's about to exec
> >  * we make all the guest memory be mapped read-only in the child
> >    process, so we can trap writes and tell the parent about them
> >    so it can update its copy of the memory.
> >    (Sadly since we can't guaranteedly get control on termination
> >    events for the child before it really terminates, we can't
> >    do this memory-transfer in bulk at the end; otherwise we'd
> >    behave wrongly for the "child process gets SIGKILLed" case.)
>
> That would get the synchronization behaviour of Linux vfork,
> but I'm not sure it'd get the performance benefits (of avoiding
> page table copying) which is what Andreas mentioned as the
> desired thing ?

The problem is that the guest glibc is using CLONE_VFORK in
a particular way for performance reasons on real hardware,
which is valid for real kernel CLONE_VFORK but which our
lack of accuracy in emulation means we mishandle, causing the
guest to fall over. The actual performance under QEMU isn't
important.

thanks
-- PMM


* Re: Generic way to detect qemu linux-user emulation
  2025-03-18 15:04               ` Peter Maydell
@ 2025-03-18 17:08                 ` Peter Maydell
  2025-03-18 17:18                   ` Daniel P. Berrangé
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Maydell @ 2025-03-18 17:08 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: Andreas Schwab, Helge Deller, qemu-devel

On Tue, 18 Mar 2025 at 15:04, Peter Maydell <peter.maydell@linaro.org> wrote:
> More generally, AIUI glibc expects that it has control over what's
> happening with threads, so it can set up its own data structures
> for the new thread (e.g. for TLS variables). This email from the
> glibc mailing list is admittedly now two decades old
> https://public-inbox.org/libc-alpha/200408042007.i74K7ZOr025380@magilla.sf.frob.com/
> but it says:
>
> # Basically, if you want to call libc functions you should do it from a
> # thread that was set up by libc or libpthread.  i.e., if you make your own
> # threads with clone, only call libc functions from the initial thread.

I spoke to some glibc devs on IRC and they confirmed that this
remains true for modern glibc: because glibc needs to set up
things like TLS on new threads, you can't mix your own direct
calls to clone() with calls to glibc functions.

-- PMM



* Re: Generic way to detect qemu linux-user emulation
  2025-03-18 17:08                 ` Peter Maydell
@ 2025-03-18 17:18                   ` Daniel P. Berrangé
  2025-03-18 17:48                     ` Peter Maydell
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel P. Berrangé @ 2025-03-18 17:18 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Andreas Schwab, Helge Deller, qemu-devel

On Tue, Mar 18, 2025 at 05:08:52PM +0000, Peter Maydell wrote:
> On Tue, 18 Mar 2025 at 15:04, Peter Maydell <peter.maydell@linaro.org> wrote:
> > More generally, AIUI glibc expects that it has control over what's
> > happening with threads, so it can set up its own data structures
> > for the new thread (e.g. for TLS variables). This email from the
> > glibc mailing list is admittedly now two decades old
> > https://public-inbox.org/libc-alpha/200408042007.i74K7ZOr025380@magilla.sf.frob.com/
> > but it says:
> >
> > # Basically, if you want to call libc functions you should do it from a
> > # thread that was set up by libc or libpthread.  i.e., if you make your own
> > # threads with clone, only call libc functions from the initial thread.
> 
> I spoke to some glibc devs on IRC and they confirmed that this
> remains true for modern glibc: because glibc needs to set up
> things like TLS on new threads, you can't mix your own direct
> calls to clone() with calls to glibc functions.

Using clone() directly is done by a number of projects (systemd, libvirt,
podman/docker/runc, etc) that want to create containers, while freely using
arbitrary glibc calls in the program. You do need to be careful what glibc
functions you run in the child after clone, but before execve though.

For the projects I mention, avoiding the danger areas is probably easier
than for QEMU, since QEMU has to theoretically cope with whatever madness
the guest program chooses to do, while those programs know exactly what
they will run between clone & execve.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: Generic way to detect qemu linux-user emulation
  2025-03-18 14:17               ` Andreas Schwab
@ 2025-03-18 17:32                 ` Daniel P. Berrangé
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel P. Berrangé @ 2025-03-18 17:32 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Peter Maydell, Helge Deller, qemu-devel

On Tue, Mar 18, 2025 at 03:17:33PM +0100, Andreas Schwab wrote:
> On Mär 18 2025, Daniel P. Berrangé wrote:
> 
> > That would get the synchronization behaviour of Linux vfork,
> > but I'm not sure it'd get the performance benefits (of avoiding
> > page table copying) which is what Andreas mentioned as the
> > desired thing ?
> 
> For an emulation performance isn't a thing, what we need is accuracy.
> The current issue I have right now is that the MozillaFirefox package
> fails to build because posix_spawn behaves unexpectedly.
> 
> https://build.opensuse.org/package/live_build_log/openSUSE:Factory:RISCV/MozillaFirefox/standard/riscv64
> 
> [  666s]  4:55.15 Traceback (most recent call last):
> [  666s]  4:55.16   File "/home/abuild/rpmbuild/BUILD/MozillaFirefox-136.0.1-build/firefox-136.0.1/security/nss/./coreconf/werror.py", line 80, in <module>
> [  666s]  4:55.16     main()
> [  666s]  4:55.16     ~~~~^^
> [  666s]  4:55.16   File "/home/abuild/rpmbuild/BUILD/MozillaFirefox-136.0.1-build/firefox-136.0.1/security/nss/./coreconf/werror.py", line 10, in main
> [  666s]  4:55.16     cc_is_clang = 'clang' in subprocess.check_output(
> [  666s]  4:55.16                              ~~~~~~~~~~~~~~~~~~~~~~~^
> [  666s]  4:55.16       [cc, '--version'], universal_newlines=True, stderr=sink)
> [  666s]  4:55.16       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> [  666s]  4:55.16   File "/usr/lib64/python3.13/subprocess.py", line 474, in check_output
> [  666s]  4:55.16     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
> [  666s]  4:55.16            ~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> [  666s]  4:55.17                **kwargs).stdout
> [  666s]  4:55.17                ^^^^^^^^^
> [  666s]  4:55.17   File "/usr/lib64/python3.13/subprocess.py", line 579, in run
> [  666s]  4:55.17     raise CalledProcessError(retcode, process.args,
> [  666s]  4:55.17                              output=stdout, stderr=stderr)
> [  666s]  4:55.17 subprocess.CalledProcessError: Command '['/usr/bin/ccache /usr/bin/gcc', '--version']' returned non-zero exit status 127.
> 
> A real posix_spawn would have set errno to ENOENT.

I looked at how the errno is propagated.

In glibc, they have a struct on the stack of the parent into which the
child will write the errno. This relies on the vfork() semantics of
sharing pages, and thus breaks when we use fork(), which makes the
pages copy-on-write: the child writes the errno, but the parent will
never see it.
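
The copy-on-write effect can be reproduced outside glibc. The following is
only a minimal sketch in Python, not glibc's code: the function name
fork_child_write and the one-element list standing in for the parent's
on-stack struct are assumptions made for illustration.

```python
import errno
import os


def fork_child_write():
    """Hypothetical demo: does a child's write reach the parent after fork()?

    Returns (value_seen_by_parent, child_exit_code).
    """
    # Stand-in for the struct that lives on the parent's stack.
    err_slot = [0]

    pid = os.fork()
    if pid == 0:
        # Child: record the failure in "shared" memory, then exit the way
        # a failed exec in a spawn helper would.
        err_slot[0] = errno.ENOENT
        os._exit(127)

    # Parent: wait for the child and look at the slot.
    _, status = os.waitpid(pid, 0)
    # fork() gave the child its own copy-on-write pages, so the parent's
    # copy was never modified; only the exit status made it back.
    return err_slot[0], os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(fork_child_write())
```

With a true fork() the parent is left with nothing but the exit status,
which matches the "exit status 127" in the build log quoted above.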

In musl, they create a pipe and the child writes the errno into the pipe,
which the parent then reads, so they're seemingly not relying on the
sharing of pages, and this appears to work under QEMU's impl.
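
The pipe technique can be sketched as follows. This is an illustration in
Python, not musl's implementation: the helper name spawn_report_errno, the
native-int wire format, and the exit code 127 are assumptions chosen for
the example.

```python
import errno
import os
import struct
import sys


def spawn_report_errno(path, argv):
    """Hypothetical helper: fork, exec, and report exec failure via a pipe.

    Returns (pid, err). err is 0 if the exec succeeded (the caller must
    still wait for pid), otherwise the errno observed in the child.
    """
    # os.pipe() descriptors are close-on-exec by default (Python >= 3.4),
    # so a successful exec closes the write end and the parent reads EOF.
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            os.execv(path, argv)
        except OSError as e:
            # exec failed: send the errno through the pipe instead of
            # relying on memory shared with the parent.
            os.write(w, struct.pack("i", e.errno))
        finally:
            os._exit(127)

    os.close(w)
    data = os.read(r, struct.calcsize("i"))
    os.close(r)
    if data:
        os.waitpid(pid, 0)  # reap the child that failed to exec
        return pid, struct.unpack("i", data)[0]
    return pid, 0


if __name__ == "__main__":
    _, err = spawn_report_errno("/nonexistent/demo-binary", ["demo"])
    print(errno.errorcode.get(err, err))
    pid, err = spawn_report_errno(sys.executable, [sys.executable, "-c", "pass"])
    os.waitpid(pid, 0)
```

Because the error travels through a file descriptor rather than through
the parent's memory, the result is the same whether the child was created
with vfork() semantics or with a plain fork().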

I don't see an attractive workaround to make glibc's impl compatible
with QEMU, without making QEMU fully use VFORK, with the risk that
entails.  I wonder if it's worth enquiring whether glibc would be
interested in following musl's approach to make it more
emulation-friendly for QEMU?

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: Generic way to detect qemu linux-user emulation
  2025-03-18 17:18                   ` Daniel P. Berrangé
@ 2025-03-18 17:48                     ` Peter Maydell
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Maydell @ 2025-03-18 17:48 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: Andreas Schwab, Helge Deller, qemu-devel

On Tue, 18 Mar 2025 at 17:18, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Tue, Mar 18, 2025 at 05:08:52PM +0000, Peter Maydell wrote:
> > On Tue, 18 Mar 2025 at 15:04, Peter Maydell <peter.maydell@linaro.org> wrote:
> > > More generally, AIUI glibc expects that it has control over what's
> > > happening with threads, so it can set up its own data structures
> > > for the new thread (e.g. for TLS variables). This email from the
> > > glibc mailing list is admittedly now two decades old
> > > https://public-inbox.org/libc-alpha/200408042007.i74K7ZOr025380@magilla.sf.frob.com/
> > > but it says:
> > >
> > > # Basically, if you want to call libc functions you should do it from a
> > > # thread that was set up by libc or libpthread.  i.e., if you make your own
> > > # threads with clone, only call libc functions from the initial thread.
> >
> > I spoke to some glibc devs on IRC and they confirmed that this
> > remains true for modern glibc: because glibc needs to set up
> > things like TLS on new threads, you can't mix your own direct
> > calls to clone() with calls to glibc functions.
>
> Using clone() directly is done by a number of projects (systemd, libvirt,
> podman/docker/runc, etc) that want to create containers, while freely using
> arbitrary glibc calls in the program. You do need to be careful what glibc
> functions you run in the child after clone, but before execve though.

Yes, if you don't call glibc functions in the child that's fine.
If those other projects are calling some glibc functions post
clone() in the child then I think they're relying on undocumented
behaviour that might break on them in future...

> For the projects I mention, avoiding the danger areas is probably easier
> than for QEMU, since QEMU has to theoretically cope with whatever madness
> the guest program chooses to do, while those programs know exactly what
> they will run between clone & execve.

QEMU is also structured on the assumption that we can freely call
glibc functions as a result of TCG operations. So even if the
child in the guest is very carefully doing absolutely no
other library calls between clone and execve, QEMU itself
will be doing them.

> Wonder if its worth enquiring if glibc would be interested
> in following musl's approach to make it more emulation friendly for
> QEMU ?

That would essentially be asking "please can you revert glibc
commit 4b4d4056bb154603f36 ?", so probably not:

https://sourceware.org/git/?p=glibc.git;a=commit;h=4b4d4056bb154603f36

-- PMM



Thread overview: 16+ messages
2025-03-18 10:18 Generic way to detect qemu linux-user emulation Andreas Schwab
2025-03-18 10:36 ` Helge Deller
2025-03-18 10:45   ` Helge Deller
2025-03-18 10:53   ` Peter Maydell
2025-03-18 11:58     ` Daniel P. Berrangé
2025-03-18 12:34       ` Andreas Schwab
2025-03-18 12:43         ` Daniel P. Berrangé
2025-03-18 13:06           ` Peter Maydell
2025-03-18 13:54             ` Daniel P. Berrangé
2025-03-18 14:17               ` Andreas Schwab
2025-03-18 17:32                 ` Daniel P. Berrangé
2025-03-18 15:04               ` Peter Maydell
2025-03-18 17:08                 ` Peter Maydell
2025-03-18 17:18                   ` Daniel P. Berrangé
2025-03-18 17:48                     ` Peter Maydell
2025-03-18 11:10   ` Andreas Schwab
