* Generic way to detect qemu linux-user emulation
@ 2025-03-18 10:18 Andreas Schwab
  2025-03-18 10:36 ` Helge Deller
  2026-03-25 14:51 ` Lawrence Hunter
  0 siblings, 2 replies; 18+ messages in thread
From: Andreas Schwab @ 2025-03-18 10:18 UTC (permalink / raw)
  To: qemu-devel

Is there a generic way for a program to detect that it is being run
inside the linux-user emulation?

The purpose for that would be to work around limitations of the
emulation, like CLONE_VFORK being unsupported. For example, python >=
3.13 needs to avoid using posix_spawn in that case, because the
emulation of CLONE_VFORK as a true fork makes it impossible for it to
report errors back to the parent process.

--
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Generic way to detect qemu linux-user emulation
From: Helge Deller @ 2025-03-18 10:36 UTC
  To: Andreas Schwab, qemu-devel

On 3/18/25 11:18, Andreas Schwab wrote:
> Is there a generic way for a program to detect that it is being run
> inside the linux-user emulation?

Yes, having a reliable way to detect it would be good.

My current (unreliable) way to detect it is using uname.
The kernel string and arch name don't match:

(sid_hppa)root@paq:/# uname -a
Linux paq 6.1.0-31-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.128-1 (2025-02-07) parisc GNU/Linux

(sid_hppa)root@paq:/# uname -r
6.1.0-31-amd64

(sid_hppa)root@paq:/# uname -m
parisc

This is a qemu-linux-user parisc(hppa) emulation running on x86-64.

> The purpose for that would be to work around limitations of the
> emulation, like CLONE_VFORK being unsupported.

Yes, and robust futexes aren't supported either.

> For example, python >=
> 3.13 needs to avoid using posix_spawn in that case, because the
> emulation of CLONE_VFORK as a true fork makes it impossible for it to
> report errors back to the parent process.
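The uname mismatch described above can be sketched in Python. This is only an illustration of the (admittedly unreliable) heuristic, not something QEMU provides: the tag table and the function names `uname_mismatch` and `probably_linux_user_emulation` are assumptions made up for this sketch, and kernels whose release string carries no architecture tag simply yield "cannot tell".

```python
import os

# Assumption: an illustrative, incomplete table of host-architecture tags that
# some distributions embed in the kernel release string, mapped to the values
# "uname -m" could report when running natively on such a host.
RELEASE_TAG_TO_MACHINES = {
    "amd64": {"x86_64", "i686"},
    "x86_64": {"x86_64", "i686"},
    "arm64": {"aarch64", "armv7l", "armv8l"},
    "aarch64": {"aarch64", "armv7l", "armv8l"},
}


def uname_mismatch(release, machine):
    """Return True if the release string names a host arch that excludes `machine`."""
    for tag, machines in RELEASE_TAG_TO_MACHINES.items():
        # Debian style: "6.1.0-31-amd64"; Fedora style: "...fc40.x86_64".
        if tag in release.split("-") or release.endswith("." + tag):
            return machine not in machines
    return False  # no recognisable tag: the heuristic cannot tell


def probably_linux_user_emulation():
    """Apply the heuristic to the running system (hypothetical helper)."""
    u = os.uname()
    return uname_mismatch(u.release, u.machine)
```

With the values from the session above, `uname_mismatch("6.1.0-31-amd64", "parisc")` flags a mismatch, while the same release with `x86_64` does not.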
* Re: Generic way to detect qemu linux-user emulation
From: Helge Deller @ 2025-03-18 10:45 UTC
  To: Andreas Schwab, qemu-devel

On 3/18/25 11:36, Helge Deller wrote:
> On 3/18/25 11:18, Andreas Schwab wrote:
>> Is there a generic way for a program to detect that it is being run
>> inside the linux-user emulation?
>
> Yes, having a reliable way to detect it would be good.

In qemu-user emulation we could change the return values of
"uname --processor" and/or "uname --hardware-platform".
Currently both always return "unknown", but in qemu we could
return the arch of the host.
Another possibility is to extend prctl(), but I think uname is
easier to handle in scripts and such...

> My current (unreliable) way to detect it is using uname.
> The kernel string and arch name don't match:
>
> (sid_hppa)root@paq:/# uname -a
> Linux paq 6.1.0-31-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.128-1 (2025-02-07) parisc GNU/Linux
>
> (sid_hppa)root@paq:/# uname -r
> 6.1.0-31-amd64
>
> (sid_hppa)root@paq:/# uname -m
> parisc
>
> This is a qemu-linux-user parisc(hppa) emulation running on x86-64.
>
>> The purpose for that would be to work around limitations of the
>> emulation, like CLONE_VFORK being unsupported.
>
> Yes, and robust futexes aren't supported either.
>
>> For example, python >=
>> 3.13 needs to avoid using posix_spawn in that case, because the
>> emulation of CLONE_VFORK as a true fork makes it impossible for it to
>> report errors back to the parent process.
* Re: Generic way to detect qemu linux-user emulation
From: Peter Maydell @ 2025-03-18 10:53 UTC
  To: Helge Deller; +Cc: Andreas Schwab, qemu-devel

On Tue, 18 Mar 2025 at 10:36, Helge Deller <deller@gmx.de> wrote:
>
> On 3/18/25 11:18, Andreas Schwab wrote:
> > Is there a generic way for a program to detect that it is being run
> > inside the linux-user emulation?
>
> Yes, having a reliable way to detect it would be good.
>
> My current (unreliable) way to detect it is using uname.

Yes, I don't believe there's currently an "intended"
mechanism for detecting QEMU, only ways of noticing
long-standing deviations from how the real kernel behaves.

> > The purpose for that would be to work around limitations of the
> > emulation, like CLONE_VFORK being unsupported.
>
> Yes, and robust futexes aren't supported either.

You don't need to detect QEMU for that one, though -- you can
just try the get_robust_list syscall and if it fails with ENOSYS
then fall back to a codepath that doesn't use them (same as
you would on an ancient kernel that didn't implement the
syscall). Robust futexes are in the "technically extremely
hard to impossible to support" bucket, per the comment in
syscall.c.

> In qemu-user emulation we could change the return values of
> "uname --processor" and/or "uname --hardware-platform".
> Currently both always return "unknown", but in qemu we could
> return the arch of the host.

As a mechanism that feels a bit risky to me -- at some
point somebody may come along and say "my guest program
requires that these return the expected values for
the target CPU", and then you have a conflict between
whether you want them to behave correctly for the
target or to give you the "tell me it's QEMU" behaviour...

-- PMM
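The "try get_robust_list and fall back on ENOSYS" probe suggested above might look roughly like this in Python. Treat it as a sketch under assumptions: the syscall numbers in `GET_ROBUST_LIST_NR` are taken from the kernel's unistd headers for a few ABIs only, and the helper names `syscall_is_implemented` and `have_robust_futexes` are invented for the example.

```python
import ctypes
import errno
import platform

# Assumption: syscall numbers for a few common ABIs (x86_64 table and the
# asm-generic table used by aarch64/riscv64); extend for other targets.
GET_ROBUST_LIST_NR = {"x86_64": 274, "aarch64": 100, "riscv64": 100}


def syscall_is_implemented(err):
    """Interpret the errno of a failed probe: only ENOSYS means 'not there'."""
    return err != errno.ENOSYS


def have_robust_futexes():
    """Return True/False, or None when this sketch has no syscall number."""
    if platform.system() != "Linux":
        return None
    nr = GET_ROBUST_LIST_NR.get(platform.machine())
    if nr is None:
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    head = ctypes.c_void_p()      # receives the robust list head pointer
    length = ctypes.c_size_t()    # receives the size of the head structure
    ctypes.set_errno(0)
    # pid 0 asks about the calling thread.
    rc = libc.syscall(ctypes.c_long(nr), ctypes.c_long(0),
                      ctypes.byref(head), ctypes.byref(length))
    if rc == 0:
        return True
    return syscall_is_implemented(ctypes.get_errno())
```

A caller would pick the non-robust code path when `have_robust_futexes()` returns `False`, exactly as it would on a kernel too old to implement the syscall.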
* Re: Generic way to detect qemu linux-user emulation
From: Daniel P. Berrangé @ 2025-03-18 11:58 UTC
  To: Peter Maydell; +Cc: Helge Deller, Andreas Schwab, qemu-devel

On Tue, Mar 18, 2025 at 10:53:27AM +0000, Peter Maydell wrote:
> On Tue, 18 Mar 2025 at 10:36, Helge Deller <deller@gmx.de> wrote:
> >
> > On 3/18/25 11:18, Andreas Schwab wrote:
> > > Is there a generic way for a program to detect that it is being run
> > > inside the linux-user emulation?
> >
> > Yes, having a reliable way to detect it would be good.
> >
> > My current (unreliable) way to detect it is using uname.
>
> Yes, I don't believe there's currently an "intended"
> mechanism for detecting QEMU, only ways of noticing
> long-standing deviations from how the real kernel behaves.
>
> > > The purpose for that would be to work around limitations of the
> > > emulation, like CLONE_VFORK being unsupported.
> >
> > Yes, and robust futexes aren't supported either.
>
> You don't need to detect QEMU for that one, though -- you can
> just try the get_robust_list syscall and if it fails with ENOSYS
> then fall back to a codepath that doesn't use them (same as
> you would on an ancient kernel that didn't implement the
> syscall). Robust futexes are in the "technically extremely
> hard to impossible to support" bucket, per the comment in
> syscall.c.
>
> > In qemu-user emulation we could change the return values of
> > "uname --processor" and/or "uname --hardware-platform".
> > Currently both always return "unknown", but in qemu we could
> > return the arch of the host.
>
> As a mechanism that feels a bit risky to me -- at some
> point somebody may come along and say "my guest program
> requires that these return the expected values for
> the target CPU", and then you have a conflict between
> whether you want them to behave correctly for the
> target or to give you the "tell me it's QEMU" behaviour...

It also isn't future proof. People will change their program
behaviour based on the limitations of the particular QEMU version
they tested against. QEMU later changes/fixes its impl, and apps
are now either applying a redundant workaround, or worse, applying
a workaround that is now actively harmful.

Wherever practical, it is preferable to check a discrete feature
or behaviour in a functional way, rather than matching on "is it QEMU".

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: Generic way to detect qemu linux-user emulation
From: Andreas Schwab @ 2025-03-18 12:34 UTC
  To: Daniel P. Berrangé; +Cc: Peter Maydell, Helge Deller, qemu-devel

On Mar 18 2025, Daniel P. Berrangé wrote:

> Wherever practical, it is preferable to check a discrete feature
> or behaviour in a functional way, rather than matching on "is it QEMU"

Do you know a way to detect support for CLONE_VFORK that isn't too
expensive?
* Re: Generic way to detect qemu linux-user emulation
From: Daniel P. Berrangé @ 2025-03-18 12:43 UTC
  To: Andreas Schwab; +Cc: Peter Maydell, Helge Deller, qemu-devel

On Tue, Mar 18, 2025 at 01:34:57PM +0100, Andreas Schwab wrote:
> On Mar 18 2025, Daniel P. Berrangé wrote:
>
> > Wherever practical, it is preferable to check a discrete feature
> > or behaviour in a functional way, rather than matching on "is it QEMU"
>
> Do you know a way to detect support for CLONE_VFORK that isn't too
> expensive?

No, but I feel like the right thing in this particular case is to look
at improving our vfork impl. The current impl is incredibly crude and
acknowledged as such by the original author:

  commit 436d124b7d538b1fd9cf72edf17770664c309856
  Author: Andrzej Zaborowski <balrogg@gmail.com>
  Date:   Sun Sep 21 02:39:45 2008 +0000

    Band-aid vfork() emulation (Kirill Shutemov).

I can see why they did it that way, but I'm feeling like it ought to
be possible to do a better special case vfork impl in QEMU instead of
overloading the fork() impl.

With regards,
Daniel
* Re: Generic way to detect qemu linux-user emulation
From: Peter Maydell @ 2025-03-18 13:06 UTC
  To: Daniel P. Berrangé; +Cc: Andreas Schwab, Helge Deller, qemu-devel

On Tue, 18 Mar 2025 at 12:43, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Tue, Mar 18, 2025 at 01:34:57PM +0100, Andreas Schwab wrote:
> > On Mar 18 2025, Daniel P. Berrangé wrote:
> >
> > > Wherever practical, it is preferable to check a discrete feature
> > > or behaviour in a functional way, rather than matching on "is it QEMU"
> >
> > Do you know a way to detect support for CLONE_VFORK that isn't too
> > expensive?
>
> No, but I feel like the right thing in this particular case is to look
> at improving our vfork impl. The current impl is incredibly crude and
> acknowledged as such by the original author:
>
>   commit 436d124b7d538b1fd9cf72edf17770664c309856
>   Author: Andrzej Zaborowski <balrogg@gmail.com>
>   Date:   Sun Sep 21 02:39:45 2008 +0000
>
>     Band-aid vfork() emulation (Kirill Shutemov).
>
> I can see why they did it that way, but I'm feeling like it ought to
> be possible to do a better special case vfork impl in QEMU instead of
> overloading the fork() impl.

The difficulty with vfork() (and, more generally, with various of
the clone() syscall flag combinations) is that because we use the
host libc we are restricted to the thread/process creation options
that that libc permits: which is only fork() and pthread_create().
vfork() wants "create a new process like fork with its own file
descriptors, signal handlers, etc, but share all the memory space with
the parent", and the host libc just doesn't provide us with the tools
to do that. (We can't call the host vfork() because we wouldn't be
abiding by the rules it imposes, like "don't return from the function
that called vfork".)

If we were implemented as a usermode emulator that sat on the raw
kernel syscalls, we could directly call the clone syscall and
use that to provide at least a wider range of the possible clone
flag options; but our dependency on libc means we have to avoid
doing things that would confuse it.

For vfork in particular, we could I guess do something like:
 * use real fork() to create child process
 * parent process arranges to wait until child process exits
   (via waitpid or equivalent) or it tells us it's about to exec
 * we make all the guest memory be mapped read-only in the child
   process, so we can trap writes and tell the parent about them
   so it can update its copy of the memory.
   (Sadly since we aren't guaranteed to get control on termination
   events for the child before it really terminates, we can't
   do this memory-transfer in bulk at the end; otherwise we'd
   behave wrongly for the "child process gets SIGKILLed" case.)

Historically we've preferred to go for "assume that guests will
only want the looser POSIX semantics of vfork(), not the tighter
ones of the actual Linux syscall", but unfortunately glibc has gone
for the latter.

thanks
-- PMM
* Re: Generic way to detect qemu linux-user emulation
From: Daniel P. Berrangé @ 2025-03-18 13:54 UTC
  To: Peter Maydell; +Cc: Andreas Schwab, Helge Deller, qemu-devel

On Tue, Mar 18, 2025 at 01:06:17PM +0000, Peter Maydell wrote:
> On Tue, 18 Mar 2025 at 12:43, Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Tue, Mar 18, 2025 at 01:34:57PM +0100, Andreas Schwab wrote:
> > > On Mar 18 2025, Daniel P. Berrangé wrote:
> > >
> > > > Wherever practical, it is preferable to check a discrete feature
> > > > or behaviour in a functional way, rather than matching on "is it QEMU"
> > >
> > > Do you know a way to detect support for CLONE_VFORK that isn't too
> > > expensive?
> >
> > No, but I feel like the right thing in this particular case is to look
> > at improving our vfork impl. The current impl is incredibly crude and
> > acknowledged as such by the original author:
> >
> >   commit 436d124b7d538b1fd9cf72edf17770664c309856
> >   Author: Andrzej Zaborowski <balrogg@gmail.com>
> >   Date:   Sun Sep 21 02:39:45 2008 +0000
> >
> >     Band-aid vfork() emulation (Kirill Shutemov).
> >
> > I can see why they did it that way, but I'm feeling like it ought to
> > be possible to do a better special case vfork impl in QEMU instead of
> > overloading the fork() impl.
>
> The difficulty with vfork() (and, more generally, with various of
> the clone() syscall flag combinations) is that because we use the
> host libc we are restricted to the thread/process creation options
> that that libc permits: which is only fork() and pthread_create().
> vfork() wants "create a new process like fork with its own file
> descriptors, signal handlers, etc, but share all the memory space with
> the parent", and the host libc just doesn't provide us with the tools
> to do that. (We can't call the host vfork() because we wouldn't be
> abiding by the rules it imposes, like "don't return from the function
> that called vfork".)
>
> If we were implemented as a usermode emulator that sat on the raw
> kernel syscalls, we could directly call the clone syscall and
> use that to provide at least a wider range of the possible clone
> flag options; but our dependency on libc means we have to avoid
> doing things that would confuse it.

I guess I'm not seeing how libc is blocking us in this respect?
The clone() syscall wrapper is exposed by glibc at least, and it
is possible to call it, albeit with some caveats that we might
miss any logic glibc has around its fork() wrapper. The spec
requires that any child must immediately call execve after vfork
so I'm wondering just what risk of confusion we would have in
practice?

> For vfork in particular, we could I guess do something like:
>  * use real fork() to create child process
>  * parent process arranges to wait until child process exits
>    (via waitpid or equivalent) or it tells us it's about to exec
>  * we make all the guest memory be mapped read-only in the child
>    process, so we can trap writes and tell the parent about them
>    so it can update its copy of the memory.
>    (Sadly since we aren't guaranteed to get control on termination
>    events for the child before it really terminates, we can't
>    do this memory-transfer in bulk at the end; otherwise we'd
>    behave wrongly for the "child process gets SIGKILLed" case.)

That would get the synchronization behaviour of Linux vfork,
but I'm not sure it'd get the performance benefits (of avoiding
page table copying) which is what Andreas mentioned as the
desired thing?

With regards,
Daniel
* Re: Generic way to detect qemu linux-user emulation
From: Andreas Schwab @ 2025-03-18 14:17 UTC
  To: Daniel P. Berrangé; +Cc: Peter Maydell, Helge Deller, qemu-devel

On Mar 18 2025, Daniel P. Berrangé wrote:

> That would get the synchronization behaviour of Linux vfork,
> but I'm not sure it'd get the performance benefits (of avoiding
> page table copying) which is what Andreas mentioned as the
> desired thing?

For an emulation, performance isn't a thing; what we need is accuracy.
The current issue I have right now is that the MozillaFirefox package
fails to build because posix_spawn behaves unexpectedly.

https://build.opensuse.org/package/live_build_log/openSUSE:Factory:RISCV/MozillaFirefox/standard/riscv64

[ 666s] 4:55.15 Traceback (most recent call last):
[ 666s] 4:55.16   File "/home/abuild/rpmbuild/BUILD/MozillaFirefox-136.0.1-build/firefox-136.0.1/security/nss/./coreconf/werror.py", line 80, in <module>
[ 666s] 4:55.16     main()
[ 666s] 4:55.16     ~~~~^^
[ 666s] 4:55.16   File "/home/abuild/rpmbuild/BUILD/MozillaFirefox-136.0.1-build/firefox-136.0.1/security/nss/./coreconf/werror.py", line 10, in main
[ 666s] 4:55.16     cc_is_clang = 'clang' in subprocess.check_output(
[ 666s] 4:55.16                              ~~~~~~~~~~~~~~~~~~~~~~~^
[ 666s] 4:55.16         [cc, '--version'], universal_newlines=True, stderr=sink)
[ 666s] 4:55.16         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[ 666s] 4:55.16   File "/usr/lib64/python3.13/subprocess.py", line 474, in check_output
[ 666s] 4:55.16     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
[ 666s] 4:55.16            ~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[ 666s] 4:55.17                **kwargs).stdout
[ 666s] 4:55.17                ^^^^^^^^^
[ 666s] 4:55.17   File "/usr/lib64/python3.13/subprocess.py", line 579, in run
[ 666s] 4:55.17     raise CalledProcessError(retcode, process.args,
[ 666s] 4:55.17                              output=stdout, stderr=stderr)
[ 666s] 4:55.17 subprocess.CalledProcessError: Command '['/usr/bin/ccache /usr/bin/gcc', '--version']' returned non-zero exit status 127.

A real posix_spawn would have failed with ENOENT (it returns the
error number to the caller).
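One way to check this behaviour functionally, rather than asking "is it QEMU", is to spawn a path that cannot exist and see whether the failure reaches the parent. The sketch below is only an illustration: the probe path and the function name `posix_spawn_reports_exec_errors` are invented here, and the probe costs one process creation, so the result is cached.

```python
import functools
import os

# Hypothetical probe path, assumed not to exist on any real system.
_MISSING = "/nonexistent/posix-spawn-probe"


@functools.lru_cache(maxsize=None)
def posix_spawn_reports_exec_errors():
    """Return True if a failed exec in posix_spawn surfaces in the parent."""
    try:
        pid = os.posix_spawn(_MISSING, [_MISSING], {})
    except OSError:
        # The parent saw the exec failure, as expected from a native kernel.
        return True
    # The spawn "succeeded" even though the exec cannot: reap the child,
    # which exits with a failure status like the 127 in the log above.
    os.waitpid(pid, 0)
    return False
```

A program could consult this once at startup and avoid posix_spawn only when the probe returns `False`, so the workaround disappears by itself if the emulation is fixed later.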
* Re: Generic way to detect qemu linux-user emulation
From: Daniel P. Berrangé @ 2025-03-18 17:32 UTC
  To: Andreas Schwab; +Cc: Peter Maydell, Helge Deller, qemu-devel

On Tue, Mar 18, 2025 at 03:17:33PM +0100, Andreas Schwab wrote:
> On Mar 18 2025, Daniel P. Berrangé wrote:
>
> > That would get the synchronization behaviour of Linux vfork,
> > but I'm not sure it'd get the performance benefits (of avoiding
> > page table copying) which is what Andreas mentioned as the
> > desired thing?
>
> For an emulation, performance isn't a thing; what we need is accuracy.
> The current issue I have right now is that the MozillaFirefox package
> fails to build because posix_spawn behaves unexpectedly.
>
> https://build.opensuse.org/package/live_build_log/openSUSE:Factory:RISCV/MozillaFirefox/standard/riscv64
>
> [ 666s] 4:55.15 Traceback (most recent call last):
> [ 666s] 4:55.16   File "/home/abuild/rpmbuild/BUILD/MozillaFirefox-136.0.1-build/firefox-136.0.1/security/nss/./coreconf/werror.py", line 80, in <module>
> [ 666s] 4:55.16     main()
> [ 666s] 4:55.16     ~~~~^^
> [ 666s] 4:55.16   File "/home/abuild/rpmbuild/BUILD/MozillaFirefox-136.0.1-build/firefox-136.0.1/security/nss/./coreconf/werror.py", line 10, in main
> [ 666s] 4:55.16     cc_is_clang = 'clang' in subprocess.check_output(
> [ 666s] 4:55.16                              ~~~~~~~~~~~~~~~~~~~~~~~^
> [ 666s] 4:55.16         [cc, '--version'], universal_newlines=True, stderr=sink)
> [ 666s] 4:55.16         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> [ 666s] 4:55.16   File "/usr/lib64/python3.13/subprocess.py", line 474, in check_output
> [ 666s] 4:55.16     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
> [ 666s] 4:55.16            ~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> [ 666s] 4:55.17                **kwargs).stdout
> [ 666s] 4:55.17                ^^^^^^^^^
> [ 666s] 4:55.17   File "/usr/lib64/python3.13/subprocess.py", line 579, in run
> [ 666s] 4:55.17     raise CalledProcessError(retcode, process.args,
> [ 666s] 4:55.17                              output=stdout, stderr=stderr)
> [ 666s] 4:55.17 subprocess.CalledProcessError: Command '['/usr/bin/ccache /usr/bin/gcc', '--version']' returned non-zero exit status 127.
>
> A real posix_spawn would have failed with ENOENT (it returns the
> error number to the caller).

I looked at how the errno is propagated.

In glibc, they have a struct on the stack of the parent into which
the child will write the errno. This relies on the vfork()
semantics of sharing of pages, and thus breaks when we use fork()
that makes the pages copy-on-write - the child writes the errno,
but the parent will never see it.

In musl, they create a pipe and the child writes the errno in the
pipe which the parent then reads, so they're seemingly not relying
on the sharing of pages and it appears to work under QEMU's impl.

I don't see an attractive workaround to make glibc's impl compatible
with QEMU, without making QEMU fully use VFORK, with the risk that
entails. Wonder if it's worth enquiring if glibc would be interested
in following musl's approach to make it more emulation friendly for
QEMU?

With regards,
Daniel
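The pipe-based scheme attributed to musl above can be sketched in Python with plain fork(). This is an illustration of the idea only, not musl's or glibc's actual code; the function name `spawn_with_error_pipe` is invented for the example. Because the errno travels through a pipe rather than through shared memory, it still reaches the parent when the child has a copy-on-write address space.

```python
import errno
import os
import struct


def spawn_with_error_pipe(argv):
    """Fork and exec argv; return 0 if the exec succeeded, else the child's errno."""
    # os.pipe() returns close-on-exec descriptors, so a successful exec
    # closes the write end and the parent simply reads end-of-file.
    rfd, wfd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: attempt the exec; on failure report errno through the pipe.
        try:
            os.close(rfd)
            os.execv(argv[0], argv)
        except OSError as exc:
            os.write(wfd, struct.pack("i", exc.errno))
        finally:
            os._exit(127)
    # Parent: zero bytes read means the exec went through.
    os.close(wfd)
    data = os.read(rfd, struct.calcsize("i"))
    os.close(rfd)
    os.waitpid(pid, 0)
    return struct.unpack("i", data)[0] if data else 0
```

For instance, `spawn_with_error_pipe(["/nonexistent/program"])` yields `errno.ENOENT` in the parent, which is the error report the glibc stack-struct scheme loses when CLONE_VFORK is emulated as a true fork.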
* Re: Generic way to detect qemu linux-user emulation
From: Peter Maydell @ 2025-03-18 15:04 UTC
  To: Daniel P. Berrangé; +Cc: Andreas Schwab, Helge Deller, qemu-devel

On Tue, 18 Mar 2025 at 13:55, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Tue, Mar 18, 2025 at 01:06:17PM +0000, Peter Maydell wrote:
> > The difficulty with vfork() (and, more generally, with various of
> > the clone() syscall flag combinations) is that because we use the
> > host libc we are restricted to the thread/process creation options
> > that that libc permits: which is only fork() and pthread_create().
> > vfork() wants "create a new process like fork with its own file
> > descriptors, signal handlers, etc, but share all the memory space with
> > the parent", and the host libc just doesn't provide us with the tools
> > to do that. (We can't call the host vfork() because we wouldn't be
> > abiding by the rules it imposes, like "don't return from the function
> > that called vfork".)
> >
> > If we were implemented as a usermode emulator that sat on the raw
> > kernel syscalls, we could directly call the clone syscall and
> > use that to provide at least a wider range of the possible clone
> > flag options; but our dependency on libc means we have to avoid
> > doing things that would confuse it.
>
> I guess I'm not seeing how libc is blocking us in this respect?
> The clone() syscall wrapper is exposed by glibc at least, and it
> is possible to call it, albeit with some caveats that we might
> miss any logic glibc has around its fork() wrapper. The spec
> requires that any child must immediately call execve after vfork
> so I'm wondering just what risk of confusion we would have in
> practice?

I think my notes about clone are a red herring for vfork
specifically.

For vfork in the child, the vfork spec requires a very minimal
amount of stuff to happen in the child, but QEMU's own TCG data
structures and calls and processes mean that we will be doing a lot
more than the guest does. For instance, we need to return from the
function that called vfork, so we can continue to execute the guest
code. And the guest code will likely call into the translator to
generate more code, which will (a) mess up the TCG data structures
for the parent and (b) probably result in our calling into libc
functions that aren't OK to call.

More generally, AIUI glibc expects that it has control over what's
happening with threads, so it can set up its own data structures
for the new thread (e.g. for TLS variables). This email from the
glibc mailing list is admittedly now two decades old
https://public-inbox.org/libc-alpha/200408042007.i74K7ZOr025380@magilla.sf.frob.com/
but it says:

# Basically, if you want to call libc functions you should do it from a
# thread that was set up by libc or libpthread. i.e., if you make your own
# threads with clone, only call libc functions from the initial thread.

> > For vfork in particular, we could I guess do something like:
> >  * use real fork() to create child process
> >  * parent process arranges to wait until child process exits
> >    (via waitpid or equivalent) or it tells us it's about to exec
> >  * we make all the guest memory be mapped read-only in the child
> >    process, so we can trap writes and tell the parent about them
> >    so it can update its copy of the memory.
> >    (Sadly since we aren't guaranteed to get control on termination
> >    events for the child before it really terminates, we can't
> >    do this memory-transfer in bulk at the end; otherwise we'd
> >    behave wrongly for the "child process gets SIGKILLed" case.)
>
> That would get the synchronization behaviour of Linux vfork,
> but I'm not sure it'd get the performance benefits (of avoiding
> page table copying) which is what Andreas mentioned as the
> desired thing?

The problem is that the guest glibc is using CLONE_VFORK in a
particular way for performance reasons on real hardware, which is
valid for real kernel CLONE_VFORK but which our lack of accuracy
in emulation means we mishandle, causing the guest to fall over.
The actual performance under QEMU isn't important.

thanks
-- PMM
* Re: Generic way to detect qemu linux-user emulation
From: Peter Maydell @ 2025-03-18 17:08 UTC
  To: Daniel P. Berrangé; +Cc: Andreas Schwab, Helge Deller, qemu-devel

On Tue, 18 Mar 2025 at 15:04, Peter Maydell <peter.maydell@linaro.org> wrote:
> More generally, AIUI glibc expects that it has control over what's
> happening with threads, so it can set up its own data structures
> for the new thread (e.g. for TLS variables). This email from the
> glibc mailing list is admittedly now two decades old
> https://public-inbox.org/libc-alpha/200408042007.i74K7ZOr025380@magilla.sf.frob.com/
> but it says:
>
> # Basically, if you want to call libc functions you should do it from a
> # thread that was set up by libc or libpthread. i.e., if you make your own
> # threads with clone, only call libc functions from the initial thread.

I spoke to some glibc devs on IRC and they confirmed that this
remains true for modern glibc: because glibc needs to set up
things like TLS on new threads, you can't mix your own direct
calls to clone() with calls to glibc functions.

-- PMM
* Re: Generic way to detect qemu linux-user emulation
From: Daniel P. Berrangé @ 2025-03-18 17:18 UTC
  To: Peter Maydell; +Cc: Andreas Schwab, Helge Deller, qemu-devel

On Tue, Mar 18, 2025 at 05:08:52PM +0000, Peter Maydell wrote:
> On Tue, 18 Mar 2025 at 15:04, Peter Maydell <peter.maydell@linaro.org> wrote:
> > More generally, AIUI glibc expects that it has control over what's
> > happening with threads, so it can set up its own data structures
> > for the new thread (e.g. for TLS variables). This email from the
> > glibc mailing list is admittedly now two decades old
> > https://public-inbox.org/libc-alpha/200408042007.i74K7ZOr025380@magilla.sf.frob.com/
> > but it says:
> >
> > # Basically, if you want to call libc functions you should do it from a
> > # thread that was set up by libc or libpthread. i.e., if you make your own
> > # threads with clone, only call libc functions from the initial thread.
>
> I spoke to some glibc devs on IRC and they confirmed that this
> remains true for modern glibc: because glibc needs to set up
> things like TLS on new threads, you can't mix your own direct
> calls to clone() with calls to glibc functions.

Using clone() directly is done by a number of projects (systemd, libvirt,
podman/docker/runc, etc) that want to create containers, while freely using
arbitrary glibc calls in the program. You do need to be careful what glibc
functions you run in the child after clone, but before execve though.

For the projects I mention, avoiding the danger areas is probably easier
than for QEMU, since QEMU has to theoretically cope with whatever madness
the guest program chooses to do, while those programs know exactly what
they will run between clone & execve.

With regards,
Daniel
* Re: Generic way to detect qemu linux-user emulation
  2025-03-18 17:18 ` Daniel P. Berrangé
@ 2025-03-18 17:48 ` Peter Maydell
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Maydell @ 2025-03-18 17:48 UTC
To: Daniel P. Berrangé; +Cc: Andreas Schwab, Helge Deller, qemu-devel

On Tue, 18 Mar 2025 at 17:18, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Tue, Mar 18, 2025 at 05:08:52PM +0000, Peter Maydell wrote:
> > On Tue, 18 Mar 2025 at 15:04, Peter Maydell <peter.maydell@linaro.org> wrote:
> > > More generally, AIUI glibc expects that it has control over what's
> > > happening with threads, so it can set up its own data structures
> > > for the new thread (e.g. for TLS variables). This email from the
> > > glibc mailing list is admittedly now two decades old
> > > https://public-inbox.org/libc-alpha/200408042007.i74K7ZOr025380@magilla.sf.frob.com/
> > > but it says:
> > >
> > > # Basically, if you want to call libc functions you should do it from a
> > > # thread that was set up by libc or libpthread. i.e., if you make your own
> > > # threads with clone, only call libc functions from the initial thread.
> >
> > I spoke to some glibc devs on IRC and they confirmed that this
> > remains true for modern glibc: because glibc needs to set up
> > things like TLS on new threads, you can't mix your own direct
> > calls to clone() with calls to glibc functions.
>
> Using clone() directly is done by a number of projects (systemd, libvirt,
> podman/docker/runc, etc) that want to create containers, while freely using
> arbitrary glibc calls in the program. You do need to be careful what glibc
> functions you run in the child after clone, but before execve though.

Yes, if you don't call glibc functions in the child that's fine.
If those other projects are calling some glibc functions post
clone() in the child then I think they're relying on undocumented
behaviour that might break on them in future...

> For the projects I mention, avoiding the danger areas is probably easier
> than for QEMU, since QEMU has to theoretically cope with whatever madness
> the guest program chooses to do, while those programs know exactly what
> they will run between clone & execve.

QEMU's structure also is that we assume we can freely call glibc
functions as a result of TCG operations. So even if the child
in the guest is very carefully doing absolutely no other library
calls between clone and execve, QEMU itself will be doing them.

> Wonder if it's worth enquiring if glibc would be interested
> in following musl's approach to make it more emulation friendly for
> QEMU ?

That would essentially be asking "please can you revert glibc
commit 4b4d4056bb154603f36 ?", so probably not:

https://sourceware.org/git/?p=glibc.git;a=commit;h=4b4d4056bb154603f36

-- PMM
* Re: Generic way to detect qemu linux-user emulation
  2025-03-18 15:04 ` Peter Maydell
  2025-03-18 17:08 ` Peter Maydell
@ 2026-03-25 17:08 ` Florian Weimer
  1 sibling, 0 replies; 18+ messages in thread
From: Florian Weimer @ 2026-03-25 17:08 UTC
To: Peter Maydell
Cc: Daniel P. Berrangé, Andreas Schwab, Helge Deller, qemu-devel

* Peter Maydell:

> On Tue, 18 Mar 2025 at 13:55, Daniel P. Berrangé <berrange@redhat.com> wrote:
>>
>> On Tue, Mar 18, 2025 at 01:06:17PM +0000, Peter Maydell wrote:
>> > The difficulty with vfork() (and, more generally, with various of
>> > the clone() syscall flag combinations) is that because we use the
>> > host libc we are restricted to the thread/process creation options
>> > that that libc permits: which is only fork() and pthread_create().
>> > vfork() wants "create a new process like fork with its own file
>> > descriptors, signal handlers, etc, but share all the memory space with
>> > the parent", and the host libc just doesn't provide us with the tools
>> > to do that. (We can't call the host vfork() because we wouldn't be
>> > abiding by the rules it imposes, like "don't return from the function
>> > that called vfork".)
>> >
>> > If we were implemented as a usermode emulator that sat on the raw
>> > kernel syscalls, we could directly call the clone syscall and
>> > use that to provide at least a wider range of the possible clone
>> > flag options; but our dependency on libc means we have to avoid
>> > doing things that would confuse it.
>>
>> I guess I'm not seeing how libc is blocking us in this respect ?
>> The clone() syscall wrapper is exposed by glibc at least, and it
>> is possible to call it, albeit with some caveats that we might
>> miss any logic glibc has around its fork() wrapper. The spec
>> requires that any child must immediately call execve after vfork
>> so I'm wondering just what risk of confusion we would have in
>> practice ?
>
> I think my notes about clone are a red herring for vfork
> specifically. For vfork in the child, the vfork spec requires
> a very minimal amount of stuff to happen in the child, but QEMU's
> own TCG data structures and calls and processes mean that we
> will be doing a lot more than the guest does. For instance,
> we need to return from the function that called vfork, so we
> can continue to execute the guest code. And the guest code will
> likely call into the translator to generate more code, which will
> (a) mess up the TCG data structures for the parent and (b)
> probably result in our calling into libc functions that aren't
> OK to call.

Yes, the problem with vfork is qemu-user's own state data structures.
It may be okay to do this if the process is single-threaded, but it
won't really work if it is multi-threaded. I think you would need to
use userfaultfd to mimic vfork behavior for emulated code only, and
that seems to be quite a big project.

Maybe it would work to create the new PID off a new thread (created
with pthread_create) via vfork, and proxy emulated system calls through
that process, while still running the emulator in the original process.
The trampoline could be a simple syscall function wrapper that
communicates through shared memory and a process-shared condition
variable or barrier. Shouldn't this give the right semantics? The
emulated code would see the new process identity because system calls
like getpid are executed remotely in that process. It might be easier
to implement than userfaultfd support.

Thanks,
Florian
* Re: Generic way to detect qemu linux-user emulation
  2025-03-18 10:36 ` Helge Deller
  2025-03-18 10:45 ` Helge Deller
  2025-03-18 10:53 ` Peter Maydell
@ 2025-03-18 11:10 ` Andreas Schwab
  2 siblings, 0 replies; 18+ messages in thread
From: Andreas Schwab @ 2025-03-18 11:10 UTC
To: Helge Deller; +Cc: qemu-devel

On Mar 18 2025, Helge Deller wrote:

> My current (unreliable) way to detect it is using uname.
> The kernel string and arch name don't match:
>
> (sid_hppa)root@paq:/# uname -a
> Linux paq 6.1.0-31-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.128-1 (2025-02-07) parisc GNU/Linux
>
> (sid_hppa)root@paq:/# uname -r
> 6.1.0-31-amd64
>
> (sid_hppa)root@paq:/# uname -m
> parisc
>
> This is a qemu-linux-user parisc(hppa) emulation running on x86-64.

That is highly distribution-specific; by default the release part does
not contain anything arch-specific.

For riscv the most reliable way is to look for "uarch *: qemu" in
/proc/cpuinfo.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
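The /proc/cpuinfo check mentioned above for riscv could be sketched in Python roughly as follows. This is only an illustration under assumptions: the function names are hypothetical, the pattern is the "uarch *: qemu" expression from the message read as a regular expression, and nothing in the thread suggests it applies to architectures other than riscv.

```python
import re

# Pattern taken from the message above: a "uarch" field whose value is "qemu".
# Only spaces and tabs are allowed around the colon so the match stays on one line.
_UARCH_QEMU = re.compile(r"^uarch[ \t]*:[ \t]*qemu\b", re.MULTILINE)


def cpuinfo_reports_qemu(cpuinfo_text: str) -> bool:
    """Return True if any 'uarch' line in the given cpuinfo text names qemu."""
    return _UARCH_QEMU.search(cpuinfo_text) is not None


def running_under_qemu_riscv(path: str = "/proc/cpuinfo") -> bool:
    """Hypothetical helper: apply the check to a cpuinfo file on disk."""
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            return cpuinfo_reports_qemu(f.read())
    except OSError:
        # No readable cpuinfo: report "not detected" rather than guessing.
        return False
```

Keeping the text check separate from the file read makes it possible to exercise the pattern against captured cpuinfo output without running on a riscv guest.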
* Re: Generic way to detect qemu linux-user emulation
  2025-03-18 10:18 Generic way to detect qemu linux-user emulation Andreas Schwab
  2025-03-18 10:36 ` Helge Deller
@ 2026-03-25 14:51 ` Lawrence Hunter
  1 sibling, 0 replies; 18+ messages in thread
From: Lawrence Hunter @ 2026-03-25 14:51 UTC
To: Andreas Schwab; +Cc: qemu-devel

Hello,

I don't have anything to offer around cython but I am also noticing
this posix_spawn issue with systemd >= tag v255 running inside a
cross-arch container. systemd specifically is fine under full QEMU or
on an ARM64 host, but when run cross-arch and therefore through
qemu-user I see systemd failing to fork off any processes, i.e.
immediately after posix_spawn the child process dies and systemd
receives SIGCHLD.

```
[ OK ] Reached target rpcbind.target.
Wed 2026-03-25 14:08:10 UTC src/core/service.c:1627: rpcbind.service: Will spawn child (service_enter_start): /usr/sbin/rpcbind
Wed 2026-03-25 14:08:10 UTC src/core/service.c:1658: rpcbind.service: Passing 3 fds to service
Wed 2026-03-25 14:08:11 UTC src/core/execute.c:345: rpcbind.service: About to execute: /usr/sbin/rpcbind "\$RPCBIND_OPTIONS" -w -f
Wed 2026-03-25 14:08:11 UTC src/core/execute.c:472: rpcbind.service: Forked /usr/sbin/rpcbind as 37
Wed 2026-03-25 14:08:11 UTC src/shared/fdset.c:71: Closing set fd 40 (socket:[1181504])
Wed 2026-03-25 14:08:11 UTC src/shared/fdset.c:71: Closing set fd 39 (socket:[1181503])
Wed 2026-03-25 14:08:11 UTC src/shared/fdset.c:71: Closing set fd 38 (socket:[1181501])
Wed 2026-03-25 14:08:11 UTC src/shared/fdset.c:71: Closing set fd 41 (socket:[1193529])
Wed 2026-03-25 14:08:11 UTC src/core/service.c:1269: rpcbind.service: Changed dead -> start
Starting rpcbind.service...
Wed 2026-03-25 14:08:11 UTC src/basic/log.c:1456: Received SIGCHLD from PID 37 (11).
Wed 2026-03-25 14:08:11 UTC src/core/manager.c:2804: Child 37 (11) died (code=exited, status=1/FAILURE)
Wed 2026-03-25 14:08:11 UTC src/core/manager.c:2769: init.scope: Child 37 belongs to init.scope.
Wed 2026-03-25 14:08:11 UTC src/core/manager.c:2769: rpcbind.service: Child 37 belongs to rpcbind.service.
Wed 2026-03-25 14:08:11 UTC src/core/unit.c:6066: rpcbind.service: Main process exited, code=exited, status=1/FAILURE
Wed 2026-03-25 14:08:11 UTC src/core/unit.c:6024: rpcbind.service: Failed with result 'exit-code'.
Wed 2026-03-25 14:08:11 UTC src/core/service.c:1980: rpcbind.service: Service will not restart (restart setting)
Wed 2026-03-25 14:08:11 UTC src/core/service.c:1269: rpcbind.service: Changed start -> failed
Wed 2026-03-25 14:08:11 UTC src/core/job.c:997: rpcbind.service: Job 91 rpcbind.service/start finished, result=failed
[FAILED] Failed to start rpcbind.service.
See 'systemctl status rpcbind.service' for details.
```

With a bisect I have narrowed it down to
https://github.com/systemd/systemd/commit/bb5232b6a3b8af075ee06cc87416e5f49a6170d3
as the change which breaks qemu-user mode, and the key bit seems to be around
https://github.com/systemd/systemd/blob/7e37e01768e2f223750ead2c9e08b4490243b8d1/src/basic/process-util.c#L2110
Other fixes I found online around credentials or protect/restrictions
do not seem to make a difference.

This can be reproduced by:

```
docker run --privileged --platform linux/arm64 -it fedora:40 /bin/bash
$ dnf install -y systemd
$ exec /sbin/init
```

Any help with this issue would be greatly appreciated too.

Best,
Lawrence

On 2025-03-18 10:18, Andreas Schwab wrote:
> Is there a generic way for a program to detect that it is being run
> inside the linux-user emulation?
>
> The purpose for that would be to work around limitations of the
> emulation, like CLONE_VFORK being unsupported. For example, python >=
> 3.13 needs to avoid using posix_spawn in that case, because the
> emulation of CLONE_VFORK as a true fork makes it impossible for it to
> report errors back to the parent process.
>
> --
> Andreas Schwab, SUSE Labs, schwab@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."
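The failure mode from the original question can be probed from Python with a sketch like the one below. It assumes that, on a native system, os.posix_spawn raises OSError when the target program cannot be executed; when CLONE_VFORK is emulated as a true fork, as described in this thread, the error may not reach the parent and only shows up in the child's exit status. The function name and the probe path are hypothetical, and a False result is a hint, not proof, of linux-user emulation.

```python
import os


def posix_spawn_reports_exec_errors() -> bool:
    """Probe whether os.posix_spawn reports an exec failure to the parent.

    Returns True if spawning a nonexistent program raises OSError in the
    parent, False if the call appears to succeed and the failure is only
    visible as the child's exit status.
    """
    # Hypothetical path, assumed not to exist on the system being probed.
    missing = "/nonexistent/hypothetical-probe-binary"
    try:
        pid = os.posix_spawn(missing, [missing], {})
    except OSError:
        # The exec error travelled back to the parent.
        return True
    # A PID came back even though exec cannot have succeeded:
    # reap the child so it does not linger as a zombie.
    os.waitpid(pid, 0)
    return False
```

The probe costs one process spawn, so a caller would likely run it once and cache the result rather than checking before every spawn.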