* Generic way to detect qemu linux-user emulation
From: Andreas Schwab @ 2025-03-18 10:18 UTC
To: qemu-devel

Is there a generic way for a program to detect that it is being run
inside the linux-user emulation?

The purpose for that would be to work around limitations of the
emulation, like CLONE_VFORK being unsupported. For example, python >=
3.13 needs to avoid using posix_spawn in that case, because the
emulation of CLONE_VFORK as a true fork makes it impossible for it to
report errors back to the parent process.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

* Re: Generic way to detect qemu linux-user emulation
From: Helge Deller @ 2025-03-18 10:36 UTC
To: Andreas Schwab, qemu-devel

On 3/18/25 11:18, Andreas Schwab wrote:
> Is there a generic way for a program to detect that it is being run
> inside the linux-user emulation?

Yes, having a reliable way to detect it would be good.

My current (unreliable) way to detect it is using uname.
The kernel string and arch name don't match:

(sid_hppa)root@paq:/# uname -a
Linux paq 6.1.0-31-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.128-1 (2025-02-07) parisc GNU/Linux

(sid_hppa)root@paq:/# uname -r
6.1.0-31-amd64

(sid_hppa)root@paq:/# uname -m
parisc

This is a qemu-linux-user parisc(hppa) emulation running on x86-64.

> The purpose for that would be to work around limitations of the
> emulation, like CLONE_VFORK being unsupported.

yes, and robust futexes aren't supported either.

> For example, python >=
> 3.13 needs to avoid using posix_spawn in that case, because the
> emulation of CLONE_VFORK as a true fork makes it impossible for it to
> report errors back to the parent process.
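
Helge's uname comparison can be sketched in Python as a heuristic. This is a minimal sketch, not a reliable detector: the table of release-string tags below is a hypothetical assumption that only covers distribution kernels (like Debian's "-amd64" suffix) that put the host architecture into the release string, and the function names are illustrative. A result of False means "inconclusive", not "definitely not QEMU".

```python
import os

# Hypothetical mapping from the architecture tags some distribution kernels
# append to their release string to the matching `uname -m` value.
# Not exhaustive; many kernels carry no such tag at all.
RELEASE_TAGS = {
    "amd64": "x86_64",
    "arm64": "aarch64",
    "riscv64": "riscv64",
}


def release_arch_mismatch(release: str, machine: str) -> bool:
    """Return True if the kernel release names a different arch than machine."""
    for tag, arch in RELEASE_TAGS.items():
        if release.endswith("-" + tag):
            return arch != machine
    return False  # no recognisable tag: inconclusive


def looks_like_qemu_user() -> bool:
    """Heuristic only: compare the host kernel's release tag with uname -m."""
    info = os.uname()
    return release_arch_mismatch(info.release, info.machine)
```

With Helge's values, release_arch_mismatch("6.1.0-31-amd64", "parisc") is True, while a kernel release without an architecture tag always yields False.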

* Re: Generic way to detect qemu linux-user emulation
From: Helge Deller @ 2025-03-18 10:45 UTC
To: Andreas Schwab, qemu-devel

On 3/18/25 11:36, Helge Deller wrote:
> On 3/18/25 11:18, Andreas Schwab wrote:
>> Is there a generic way for a program to detect that it is being run
>> inside the linux-user emulation?
>
> Yes, having a reliable way to detect it would be good.

In qemu-user emulation we could change the return values of
"uname --processor" and/or "uname --hardware-platform".
Currently both always return "unknown", but in qemu we could
return the arch of the host.

Another possibility is to extend prctl(), but I think uname is
easier to handle in scripts and such...

> My current (unreliable) way to detect it is using uname.
> The kernel string and arch name don't match:
>
> (sid_hppa)root@paq:/# uname -a
> Linux paq 6.1.0-31-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.128-1 (2025-02-07) parisc GNU/Linux
>
> (sid_hppa)root@paq:/# uname -r
> 6.1.0-31-amd64
>
> (sid_hppa)root@paq:/# uname -m
> parisc
>
> This is a qemu-linux-user parisc(hppa) emulation running on x86-64.
>
>> The purpose for that would be to work around limitations of the
>> emulation, like CLONE_VFORK being unsupported.
>
> yes, and robust futexes aren't supported either.
>
>> For example, python >=
>> 3.13 needs to avoid using posix_spawn in that case, because the
>> emulation of CLONE_VFORK as a true fork makes it impossible for it to
>> report errors back to the parent process.

* Re: Generic way to detect qemu linux-user emulation
From: Peter Maydell @ 2025-03-18 10:53 UTC
To: Helge Deller
Cc: Andreas Schwab, qemu-devel

On Tue, 18 Mar 2025 at 10:36, Helge Deller <deller@gmx.de> wrote:
>
> On 3/18/25 11:18, Andreas Schwab wrote:
> > Is there a generic way for a program to detect that it is being run
> > inside the linux-user emulation?
>
> Yes, having a reliable way to detect it would be good.
>
> My current (unreliable) way to detect it is using uname.

Yes, I don't believe there's currently an "intended"
mechanism for detecting QEMU, only ways of noticing
long-standing deviations from how the real kernel behaves.

> > The purpose for that would be to work around limitations of the
> > emulation, like CLONE_VFORK being unsupported.
>
> yes, and robust futexes aren't supported either.

You don't need to detect QEMU for that one, though -- you can
just try the get_robust_list syscall and if it fails ENOSYS
then fall back to a codepath that doesn't use them (same as
you would on an ancient kernel that didn't implement the
syscall). Robust futexes are in the "technically extremely
hard to impossible to support" bucket, per the comment in
syscall.c.

> In qemu-user emulation we could change the return values of
> "uname --processor" and/or "uname --hardware-platform".
> Currently both always return "unknown", but in qemu we could
> return the arch of the host.

As a mechanism that feels a bit risky to me -- at some
point somebody may come along and say "my guest program
requires that these return the expected values for
the target CPU", and then you have a conflict between
whether you want them to behave correctly for the
target or to give you the "tell me it's QEMU" behaviour...

-- PMM
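
Peter's suggestion of probing the syscall instead of detecting QEMU can be sketched with ctypes. This is an illustration under stated assumptions: it calls the raw get_robust_list syscall through the C library's syscall() wrapper, the syscall numbers are hard-coded for a few architectures only (274 on x86_64, 100 in the asm-generic table used by aarch64 and riscv64), and the function name is illustrative. Unknown architectures return None rather than guess a number.

```python
import ctypes
import errno
import os
import platform

# get_robust_list syscall numbers (assumption: only these arches are covered).
NR_GET_ROBUST_LIST = {"x86_64": 274, "aarch64": 100, "riscv64": 100}


def robust_futexes_available():
    """True if get_robust_list works, False on ENOSYS, None if not probed."""
    nr = NR_GET_ROBUST_LIST.get(platform.machine())
    if nr is None:
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    head = ctypes.c_void_p()
    size = ctypes.c_size_t()
    # pid 0 asks about the calling thread
    ret = libc.syscall(ctypes.c_long(nr), ctypes.c_long(0),
                       ctypes.byref(head), ctypes.byref(size))
    if ret == 0:
        return True
    err = ctypes.get_errno()
    if err == errno.ENOSYS:
        return False  # fall back, as on an ancient kernel
    raise OSError(err, os.strerror(err))
```

A caller would take the non-robust codepath whenever this returns False, regardless of whether the cause is QEMU or an old kernel.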

* Re: Generic way to detect qemu linux-user emulation
From: Daniel P. Berrangé @ 2025-03-18 11:58 UTC
To: Peter Maydell
Cc: Helge Deller, Andreas Schwab, qemu-devel

On Tue, Mar 18, 2025 at 10:53:27AM +0000, Peter Maydell wrote:
> On Tue, 18 Mar 2025 at 10:36, Helge Deller <deller@gmx.de> wrote:
> >
> > On 3/18/25 11:18, Andreas Schwab wrote:
> > > Is there a generic way for a program to detect that it is being run
> > > inside the linux-user emulation?
> >
> > Yes, having a reliable way to detect it would be good.
> >
> > My current (unreliable) way to detect it is using uname.
>
> Yes, I don't believe there's currently an "intended"
> mechanism for detecting QEMU, only ways of noticing
> long-standing deviations from how the real kernel behaves.
>
> > > The purpose for that would be to work around limitations of the
> > > emulation, like CLONE_VFORK being unsupported.
> >
> > yes, and robust futexes aren't supported either.
>
> You don't need to detect QEMU for that one, though -- you can
> just try the get_robust_list syscall and if it fails ENOSYS
> then fall back to a codepath that doesn't use them (same as
> you would on an ancient kernel that didn't implement the
> syscall). Robust futexes are in the "technically extremely
> hard to impossible to support" bucket, per the comment in
> syscall.c.
>
> > In qemu-user emulation we could change the return values of
> > "uname --processor" and/or "uname --hardware-platform".
> > Currently both always return "unknown", but in qemu we could
> > return the arch of the host.
>
> As a mechanism that feels a bit risky to me -- at some
> point somebody may come along and say "my guest program
> requires that these return the expected values for
> the target CPU", and then you have a conflict between
> whether you want them to behave correctly for the
> target or to give you the "tell me it's QEMU" behaviour...

It also isn't future-proof. People will change their program behaviour
based on the limitations of the particular QEMU version they tested
against. QEMU later changes/fixes its impl, and apps are now either
applying a redundant workaround, or worse, applying a workaround that
is now actively harmful.

Wherever practical, it is preferable to check a discrete feature
or behaviour in a functional way, rather than matching on "is it QEMU".

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

* Re: Generic way to detect qemu linux-user emulation
From: Andreas Schwab @ 2025-03-18 12:34 UTC
To: Daniel P. Berrangé
Cc: Peter Maydell, Helge Deller, qemu-devel

On Mar 18 2025, Daniel P. Berrangé wrote:

> Wherever practical, it is preferable to check a discrete feature
> or behaviour in a functional way, rather than matching on "is it QEMU"

Do you know a way to detect support for CLONE_VFORK that isn't too
expensive?

* Re: Generic way to detect qemu linux-user emulation
From: Daniel P. Berrangé @ 2025-03-18 12:43 UTC
To: Andreas Schwab
Cc: Peter Maydell, Helge Deller, qemu-devel

On Tue, Mar 18, 2025 at 01:34:57PM +0100, Andreas Schwab wrote:
> On Mar 18 2025, Daniel P. Berrangé wrote:
>
> > Wherever practical, it is preferable to check a discrete feature
> > or behaviour in a functional way, rather than matching on "is it QEMU"
>
> Do you know a way to detect support for CLONE_VFORK that isn't too
> expensive?

No, but I feel like the right thing in this particular case is to look
at improving our vfork impl. The current impl is incredibly crude, as
acknowledged by the original author:

  commit 436d124b7d538b1fd9cf72edf17770664c309856
  Author: Andrzej Zaborowski <balrogg@gmail.com>
  Date:   Sun Sep 21 02:39:45 2008 +0000

      Band-aid vfork() emulation (Kirill Shutemov).

I can see why they did it that way, but I'm feeling like it ought to
be possible to do a better special-case vfork impl in QEMU instead of
overloading the fork() impl.

With regards,
Daniel

* Re: Generic way to detect qemu linux-user emulation
From: Peter Maydell @ 2025-03-18 13:06 UTC
To: Daniel P. Berrangé
Cc: Andreas Schwab, Helge Deller, qemu-devel

On Tue, 18 Mar 2025 at 12:43, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Tue, Mar 18, 2025 at 01:34:57PM +0100, Andreas Schwab wrote:
> > On Mar 18 2025, Daniel P. Berrangé wrote:
> >
> > > Wherever practical, it is preferable to check a discrete feature
> > > or behaviour in a functional way, rather than matching on "is it QEMU"
> >
> > Do you know a way to detect support for CLONE_VFORK that isn't too
> > expensive?
>
> No, but I feel like the right thing in this particular case is to look
> at improving our vfork impl. The current impl is incredibly crude, as
> acknowledged by the original author:
>
>   commit 436d124b7d538b1fd9cf72edf17770664c309856
>   Author: Andrzej Zaborowski <balrogg@gmail.com>
>   Date:   Sun Sep 21 02:39:45 2008 +0000
>
>       Band-aid vfork() emulation (Kirill Shutemov).
>
> I can see why they did it that way, but I'm feeling like it ought to
> be possible to do a better special-case vfork impl in QEMU instead of
> overloading the fork() impl.

The difficulty with vfork() (and, more generally, with various of
the clone() syscall flag combinations) is that because we use the
host libc we are restricted to the thread/process creation options
that that libc permits: which is only fork() and pthread_create().
vfork() wants "create a new process like fork with its own file
descriptors, signal handlers, etc, but share all the memory space with
the parent", and the host libc just doesn't provide us with the tools
to do that. (We can't call the host vfork() because we wouldn't be
abiding by the rules it imposes, like "don't return from the function
that called vfork".)

If we were implemented as a usermode emulator that sat on the raw
kernel syscalls, we could directly call the clone syscall and
use that to provide at least a wider range of the possible clone
flag options; but our dependency on libc means we have to avoid
doing things that would confuse it.

For vfork in particular, we could I guess do something like:
 * use real fork() to create child process
 * parent process arranges to wait until child process exits
   (via waitpid or equivalent) or it tells us it's about to exec
 * we make all the guest memory be mapped read-only in the child
   process, so we can trap writes and tell the parent about them
   so it can update its copy of the memory.
   (Sadly since we can't guaranteedly get control on termination
   events for the child before it really terminates, we can't
   do this memory-transfer in bulk at the end; otherwise we'd
   behave wrongly for the "child process gets SIGKILLed" case.)

Historically we've preferred to go for "assume that guests will
only want the looser POSIX semantics of vfork(), not the tighter
ones of the actual Linux syscall", but unfortunately glibc has
gone for the latter.

thanks
-- PMM
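
The difference Peter describes, vfork sharing the parent's memory while fork gives the child its own copy, can be observed from Python. This sketch only models the two cases with an anonymous mmap and does not call vfork itself: MAP_SHARED stands in for vfork's shared address space, and MAP_PRIVATE for the copy-on-write pages a child gets after a plain fork(). The function name is illustrative.

```python
import errno
import mmap
import os


def child_write_visible(shared: bool) -> bool:
    """Fork a child that stores ENOENT in a small buffer; report whether the
    parent sees the value after the child has exited."""
    # Anonymous mapping (fileno -1). MAP_SHARED models vfork's shared address
    # space; MAP_PRIVATE pages become copy-on-write in the child after fork().
    flags = mmap.MAP_SHARED if shared else mmap.MAP_PRIVATE
    buf = mmap.mmap(-1, 4, flags=flags)
    buf[0:4] = (0).to_bytes(4, "little")
    pid = os.fork()
    if pid == 0:
        # Child: write an "errno" into memory the parent might share,
        # then exit immediately without running any cleanup.
        buf[0:4] = errno.ENOENT.to_bytes(4, "little")
        os._exit(0)
    os.waitpid(pid, 0)
    value = int.from_bytes(buf[0:4], "little")
    buf.close()
    return value == errno.ENOENT
```

With shared=True the parent sees the child's write; with shared=False it still reads 0, which is why anything relying on the child writing into the parent's memory breaks when vfork is emulated as fork.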

* Re: Generic way to detect qemu linux-user emulation
From: Daniel P. Berrangé @ 2025-03-18 13:54 UTC
To: Peter Maydell
Cc: Andreas Schwab, Helge Deller, qemu-devel

On Tue, Mar 18, 2025 at 01:06:17PM +0000, Peter Maydell wrote:
> On Tue, 18 Mar 2025 at 12:43, Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Tue, Mar 18, 2025 at 01:34:57PM +0100, Andreas Schwab wrote:
> > > On Mar 18 2025, Daniel P. Berrangé wrote:
> > >
> > > > Wherever practical, it is preferable to check a discrete feature
> > > > or behaviour in a functional way, rather than matching on "is it QEMU"
> > >
> > > Do you know a way to detect support for CLONE_VFORK that isn't too
> > > expensive?
> >
> > No, but I feel like the right thing in this particular case is to look
> > at improving our vfork impl. The current impl is incredibly crude, as
> > acknowledged by the original author:
> >
> >   commit 436d124b7d538b1fd9cf72edf17770664c309856
> >   Author: Andrzej Zaborowski <balrogg@gmail.com>
> >   Date:   Sun Sep 21 02:39:45 2008 +0000
> >
> >       Band-aid vfork() emulation (Kirill Shutemov).
> >
> > I can see why they did it that way, but I'm feeling like it ought to
> > be possible to do a better special-case vfork impl in QEMU instead of
> > overloading the fork() impl.
>
> The difficulty with vfork() (and, more generally, with various of
> the clone() syscall flag combinations) is that because we use the
> host libc we are restricted to the thread/process creation options
> that that libc permits: which is only fork() and pthread_create().
> vfork() wants "create a new process like fork with its own file
> descriptors, signal handlers, etc, but share all the memory space with
> the parent", and the host libc just doesn't provide us with the tools
> to do that. (We can't call the host vfork() because we wouldn't be
> abiding by the rules it imposes, like "don't return from the function
> that called vfork".)
>
> If we were implemented as a usermode emulator that sat on the raw
> kernel syscalls, we could directly call the clone syscall and
> use that to provide at least a wider range of the possible clone
> flag options; but our dependency on libc means we have to avoid
> doing things that would confuse it.

I guess I'm not seeing how libc is blocking us in this respect?
The clone() syscall wrapper is exposed by glibc at least, and it
is possible to call it, albeit with the caveat that we might
miss any logic glibc has around its fork() wrapper. The spec
requires that any child must immediately call execve after vfork,
so I'm wondering just what risk of confusion we would have in
practice?

> For vfork in particular, we could I guess do something like:
>  * use real fork() to create child process
>  * parent process arranges to wait until child process exits
>    (via waitpid or equivalent) or it tells us it's about to exec
>  * we make all the guest memory be mapped read-only in the child
>    process, so we can trap writes and tell the parent about them
>    so it can update its copy of the memory.
>    (Sadly since we can't guaranteedly get control on termination
>    events for the child before it really terminates, we can't
>    do this memory-transfer in bulk at the end; otherwise we'd
>    behave wrongly for the "child process gets SIGKILLed" case.)

That would get the synchronization behaviour of Linux vfork,
but I'm not sure it'd get the performance benefits (of avoiding
page table copying) which is what Andreas mentioned as the
desired thing?

With regards,
Daniel

* Re: Generic way to detect qemu linux-user emulation
From: Andreas Schwab @ 2025-03-18 14:17 UTC
To: Daniel P. Berrangé
Cc: Peter Maydell, Helge Deller, qemu-devel

On Mar 18 2025, Daniel P. Berrangé wrote:

> That would get the synchronization behaviour of Linux vfork,
> but I'm not sure it'd get the performance benefits (of avoiding
> page table copying) which is what Andreas mentioned as the
> desired thing?

For an emulation, performance isn't a thing; what we need is accuracy.
The current issue I have right now is that the MozillaFirefox package
fails to build because posix_spawn behaves unexpectedly.

https://build.opensuse.org/package/live_build_log/openSUSE:Factory:RISCV/MozillaFirefox/standard/riscv64

[  666s]  4:55.15 Traceback (most recent call last):
[  666s]  4:55.16   File "/home/abuild/rpmbuild/BUILD/MozillaFirefox-136.0.1-build/firefox-136.0.1/security/nss/./coreconf/werror.py", line 80, in <module>
[  666s]  4:55.16     main()
[  666s]  4:55.16     ~~~~^^
[  666s]  4:55.16   File "/home/abuild/rpmbuild/BUILD/MozillaFirefox-136.0.1-build/firefox-136.0.1/security/nss/./coreconf/werror.py", line 10, in main
[  666s]  4:55.16     cc_is_clang = 'clang' in subprocess.check_output(
[  666s]  4:55.16                              ~~~~~~~~~~~~~~~~~~~~~~~^
[  666s]  4:55.16         [cc, '--version'], universal_newlines=True, stderr=sink)
[  666s]  4:55.16         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[  666s]  4:55.16   File "/usr/lib64/python3.13/subprocess.py", line 474, in check_output
[  666s]  4:55.16     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
[  666s]  4:55.16            ~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[  666s]  4:55.17                **kwargs).stdout
[  666s]  4:55.17                ^^^^^^^^^
[  666s]  4:55.17   File "/usr/lib64/python3.13/subprocess.py", line 579, in run
[  666s]  4:55.17     raise CalledProcessError(retcode, process.args,
[  666s]  4:55.17                              output=stdout, stderr=stderr)
[  666s]  4:55.17 subprocess.CalledProcessError: Command '['/usr/bin/ccache /usr/bin/gcc', '--version']' returned non-zero exit status 127.

A real posix_spawn would have set errno to ENOENT.
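
Taking up Daniel's advice to test behaviour functionally, together with Andreas's observation that a real posix_spawn reports ENOENT, one possible probe is to spawn a path that cannot exist and see whether the failure comes back as an error or only as exit status 127. This is a sketch, not a vetted detector: it assumes os.posix_spawn maps onto the C library's posix_spawn (as CPython does on Linux), the path and function name are made-up placeholders, and each call costs one process creation.

```python
import os

# Hypothetical path that is assumed not to exist on the system being probed.
MISSING_PROGRAM = "/nonexistent/posix-spawn-probe"


def posix_spawn_reports_exec_errors(path=MISSING_PROGRAM):
    """Return True if a failed exec surfaces as an exception in the parent.

    When the C library's posix_spawn relies on CLONE_VFORK memory sharing
    and vfork is emulated as a plain fork, the spawn appears to succeed and
    the child simply exits with status 127.
    """
    try:
        pid = os.posix_spawn(path, [path], dict(os.environ))
    except FileNotFoundError:
        return True
    os.waitpid(pid, 0)  # reap the child whose exec failed
    return False
```

A program could run this once at startup and fall back to fork plus exec when it returns False, without needing to know whether QEMU is the cause.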

* Re: Generic way to detect qemu linux-user emulation
From: Daniel P. Berrangé @ 2025-03-18 17:32 UTC
To: Andreas Schwab
Cc: Peter Maydell, Helge Deller, qemu-devel

On Tue, Mar 18, 2025 at 03:17:33PM +0100, Andreas Schwab wrote:
> On Mar 18 2025, Daniel P. Berrangé wrote:
>
> > That would get the synchronization behaviour of Linux vfork,
> > but I'm not sure it'd get the performance benefits (of avoiding
> > page table copying) which is what Andreas mentioned as the
> > desired thing?
>
> For an emulation, performance isn't a thing; what we need is accuracy.
> The current issue I have right now is that the MozillaFirefox package
> fails to build because posix_spawn behaves unexpectedly.
>
> https://build.opensuse.org/package/live_build_log/openSUSE:Factory:RISCV/MozillaFirefox/standard/riscv64
>
> [...]
>
> A real posix_spawn would have set errno to ENOENT.

I looked at how the errno is propagated.

In glibc, they have a struct on the stack of the parent into which the
child will write the errno. This relies on the vfork() semantics of
sharing pages, and thus breaks when we use fork(), which makes the pages
copy-on-write: the child writes the errno, but the parent never sees it.

In musl, they create a pipe and the child writes the errno into the pipe,
which the parent then reads, so they're seemingly not relying on the
sharing of pages, and this appears to work under QEMU's impl.

I don't see an attractive workaround to make glibc's impl compatible
with QEMU, without making QEMU fully use VFORK, with the risk that
entails.

Wonder if it's worth enquiring if glibc would be interested
in following musl's approach to make it more emulation friendly for
QEMU?

With regards,
Daniel
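
The pipe-based scheme Daniel attributes to musl can be sketched in Python with fork and exec. This illustrates the general technique, not musl's actual code, and the function name is hypothetical: if exec fails, the child writes errno into a close-on-exec pipe, so the parent reads either an error code or, once a successful exec has closed the pipe, end-of-file. Because nothing depends on shared memory, the same logic still works when vfork is emulated as a plain fork.

```python
import os


def spawn_with_errno_pipe(path, argv):
    """fork+exec path; raise OSError in the parent if exec fails.

    Returns the child's pid on success; the caller must waitpid() it.
    """
    # os.pipe() returns non-inheritable (close-on-exec) descriptors, so a
    # successful exec closes the child's write end automatically.
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(r)
        try:
            os.execv(path, argv)
        except OSError as e:
            # exec failed: report errno through the pipe, not shared memory
            os.write(w, e.errno.to_bytes(4, "little"))
        os._exit(127)
    os.close(w)
    data = os.read(r, 4)  # b"" (EOF) means the exec succeeded
    os.close(r)
    if data:
        os.waitpid(pid, 0)  # reap the failed child
        err = int.from_bytes(data, "little")
        raise OSError(err, os.strerror(err), path)
    return pid
```

Spawning a missing program raises FileNotFoundError in the parent, while a successful spawn returns a pid to wait on. The cost is one extra pipe and a blocking read per spawn.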

* Re: Generic way to detect qemu linux-user emulation
From: Peter Maydell @ 2025-03-18 15:04 UTC
To: Daniel P. Berrangé
Cc: Andreas Schwab, Helge Deller, qemu-devel

On Tue, 18 Mar 2025 at 13:55, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Tue, Mar 18, 2025 at 01:06:17PM +0000, Peter Maydell wrote:
> > The difficulty with vfork() (and, more generally, with various of
> > the clone() syscall flag combinations) is that because we use the
> > host libc we are restricted to the thread/process creation options
> > that that libc permits: which is only fork() and pthread_create().
> > vfork() wants "create a new process like fork with its own file
> > descriptors, signal handlers, etc, but share all the memory space with
> > the parent", and the host libc just doesn't provide us with the tools
> > to do that. (We can't call the host vfork() because we wouldn't be
> > abiding by the rules it imposes, like "don't return from the function
> > that called vfork".)
> >
> > If we were implemented as a usermode emulator that sat on the raw
> > kernel syscalls, we could directly call the clone syscall and
> > use that to provide at least a wider range of the possible clone
> > flag options; but our dependency on libc means we have to avoid
> > doing things that would confuse it.
>
> I guess I'm not seeing how libc is blocking us in this respect?
> The clone() syscall wrapper is exposed by glibc at least, and it
> is possible to call it, albeit with the caveat that we might
> miss any logic glibc has around its fork() wrapper. The spec
> requires that any child must immediately call execve after vfork,
> so I'm wondering just what risk of confusion we would have in
> practice?

I think my notes about clone are a red herring for vfork specifically.

For vfork, the spec requires that only a very minimal amount of stuff
happens in the child, but QEMU's own TCG data structures and calls and
processes mean that we will be doing a lot more than the guest does.
For instance, we need to return from the function that called vfork,
so we can continue to execute the guest code. And the guest code will
likely call into the translator to generate more code, which will
(a) mess up the TCG data structures for the parent and (b) probably
result in our calling into libc functions that aren't OK to call.

More generally, AIUI glibc expects that it has control over what's
happening with threads, so it can set up its own data structures
for the new thread (e.g. for TLS variables). This email from the
glibc mailing list is admittedly now two decades old
https://public-inbox.org/libc-alpha/200408042007.i74K7ZOr025380@magilla.sf.frob.com/
but it says:

# Basically, if you want to call libc functions you should do it from a
# thread that was set up by libc or libpthread. i.e., if you make your own
# threads with clone, only call libc functions from the initial thread.

> > For vfork in particular, we could I guess do something like:
> >  * use real fork() to create child process
> >  * parent process arranges to wait until child process exits
> >    (via waitpid or equivalent) or it tells us it's about to exec
> >  * we make all the guest memory be mapped read-only in the child
> >    process, so we can trap writes and tell the parent about them
> >    so it can update its copy of the memory.
> >    (Sadly since we can't guaranteedly get control on termination
> >    events for the child before it really terminates, we can't
> >    do this memory-transfer in bulk at the end; otherwise we'd
> >    behave wrongly for the "child process gets SIGKILLed" case.)
>
> That would get the synchronization behaviour of Linux vfork,
> but I'm not sure it'd get the performance benefits (of avoiding
> page table copying) which is what Andreas mentioned as the
> desired thing?

The problem is that the guest glibc is using CLONE_VFORK in a
particular way for performance reasons on real hardware, which is
valid for real kernel CLONE_VFORK but which our lack of accuracy
in emulation means we mishandle, causing the guest to fall over.
The actual performance under QEMU isn't important.

thanks
-- PMM

* Re: Generic way to detect qemu linux-user emulation
From: Peter Maydell @ 2025-03-18 17:08 UTC
To: Daniel P. Berrangé
Cc: Andreas Schwab, Helge Deller, qemu-devel

On Tue, 18 Mar 2025 at 15:04, Peter Maydell <peter.maydell@linaro.org> wrote:
> More generally, AIUI glibc expects that it has control over what's
> happening with threads, so it can set up its own data structures
> for the new thread (e.g. for TLS variables). This email from the
> glibc mailing list is admittedly now two decades old
> https://public-inbox.org/libc-alpha/200408042007.i74K7ZOr025380@magilla.sf.frob.com/
> but it says:
>
> # Basically, if you want to call libc functions you should do it from a
> # thread that was set up by libc or libpthread. i.e., if you make your own
> # threads with clone, only call libc functions from the initial thread.

I spoke to some glibc devs on IRC and they confirmed that this
remains true for modern glibc: because glibc needs to set up
things like TLS on new threads, you can't mix your own direct
calls to clone() with calls to glibc functions.

-- PMM

* Re: Generic way to detect qemu linux-user emulation
From: Daniel P. Berrangé @ 2025-03-18 17:18 UTC
To: Peter Maydell
Cc: Andreas Schwab, Helge Deller, qemu-devel

On Tue, Mar 18, 2025 at 05:08:52PM +0000, Peter Maydell wrote:
> On Tue, 18 Mar 2025 at 15:04, Peter Maydell <peter.maydell@linaro.org> wrote:
> > More generally, AIUI glibc expects that it has control over what's
> > happening with threads, so it can set up its own data structures
> > for the new thread (e.g. for TLS variables). This email from the
> > glibc mailing list is admittedly now two decades old
> > https://public-inbox.org/libc-alpha/200408042007.i74K7ZOr025380@magilla.sf.frob.com/
> > but it says:
> >
> > # Basically, if you want to call libc functions you should do it from a
> > # thread that was set up by libc or libpthread. i.e., if you make your own
> > # threads with clone, only call libc functions from the initial thread.
>
> I spoke to some glibc devs on IRC and they confirmed that this
> remains true for modern glibc: because glibc needs to set up
> things like TLS on new threads, you can't mix your own direct
> calls to clone() with calls to glibc functions.

Using clone() directly is done by a number of projects (systemd, libvirt,
podman/docker/runc, etc) that want to create containers, while freely using
arbitrary glibc calls in the program. You do need to be careful about which
glibc functions you run in the child after clone() but before execve(), though.

For the projects I mention, avoiding the danger areas is probably easier
than for QEMU, since QEMU has to theoretically cope with whatever madness
the guest program chooses to do, while those programs know exactly what
they will run between clone & execve.

With regards,
Daniel
* Re: Generic way to detect qemu linux-user emulation
  2025-03-18 17:18 ` Daniel P. Berrangé
@ 2025-03-18 17:48 ` Peter Maydell
From: Peter Maydell @ 2025-03-18 17:48 UTC
To: Daniel P. Berrangé; +Cc: Andreas Schwab, Helge Deller, qemu-devel

On Tue, 18 Mar 2025 at 17:18, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Tue, Mar 18, 2025 at 05:08:52PM +0000, Peter Maydell wrote:
> > On Tue, 18 Mar 2025 at 15:04, Peter Maydell <peter.maydell@linaro.org> wrote:
> > > More generally, AIUI glibc expects that it has control over what's
> > > happening with threads, so it can set up its own data structures
> > > for the new thread (e.g. for TLS variables). This email from the
> > > glibc mailing list is admittedly now two decades old
> > > https://public-inbox.org/libc-alpha/200408042007.i74K7ZOr025380@magilla.sf.frob.com/
> > > but it says:
> > >
> > > # Basically, if you want to call libc functions you should do it from a
> > > # thread that was set up by libc or libpthread. i.e., if you make your own
> > > # threads with clone, only call libc functions from the initial thread.
> >
> > I spoke to some glibc devs on IRC and they confirmed that this
> > remains true for modern glibc: because glibc needs to set up
> > things like TLS on new threads, you can't mix your own direct
> > calls to clone() with calls to glibc functions.
>
> Using clone() directly is done by a number of projects (systemd, libvirt,
> podman/docker/runc, etc) that want to create containers, while freely using
> arbitrary glibc calls in the program. You do need to be careful what glibc
> functions you run in the child after clone, but before execve though.

Yes, if you don't call glibc functions in the child that's fine.
If those other projects are calling some glibc functions post clone()
in the child then I think they're relying on undocumented behaviour
that might break on them in future...

> For the projects I mention, avoiding the danger areas is probably easier
> than for QEMU, since QEMU has to theoretically cope with whatever madness
> the guest program chooses to do, while those programs know exactly what
> they will run between clone & execve.

QEMU's structure also is that we assume we can freely call glibc
functions as a result of TCG operations. So even if the child
in the guest is very carefully doing absolutely no other library
calls between clone and execve, QEMU itself will be doing them.

> Wonder if it's worth enquiring if glibc would be interested
> in following musl's approach to make it more emulation friendly for
> QEMU ?

That would essentially be asking "please can you revert glibc
commit 4b4d4056bb154603f36 ?", so probably not:
https://sourceware.org/git/?p=glibc.git;a=commit;h=4b4d4056bb154603f36

-- PMM
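Daniel's question concerns musl's posix_spawn. As far as I know (this is not stated in the thread), musl passes an exec failure from the child back to the parent through a close-on-exec pipe instead of through memory shared with a CLONE_VM|CLONE_VFORK child, so the error still arrives when QEMU linux-user runs the child as a true fork. A minimal Python sketch of that pipe technique, using plain os.fork() and os.execv(), follows. It is an illustration only, not musl's or glibc's code, and the helper name spawn_with_error_pipe is hypothetical.

```python
import os
import sys


def spawn_with_error_pipe(path, argv):
    """Start `path` in a child and return its pid, or raise OSError if exec fails.

    Hypothetical helper illustrating the close-on-exec pipe technique: the
    child's exec error travels through a pipe, not shared memory, so the
    parent still sees it when the child is a real fork (for example when
    QEMU linux-user emulates CLONE_VFORK as a plain fork).
    """
    # Since Python 3.4, os.pipe() fds are non-inheritable (O_CLOEXEC), so a
    # successful exec in the child closes its copy of the write end.
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: do as little as possible between fork and exec.
        try:
            os.close(r)
            os.execv(path, argv)
        except OSError as exc:
            # exec failed: send the errno to the parent as 4 bytes.
            os.write(w, exc.errno.to_bytes(4, "little"))
        finally:
            # Only reached if exec failed; never return into parent code.
            os._exit(127)

    # Parent: close our write end so EOF arrives once the child's copy closes.
    os.close(w)
    with os.fdopen(r, "rb") as pipe:
        data = pipe.read()  # b"" on successful exec, else 4 bytes of errno
    if len(data) == 4:
        err = int.from_bytes(data, "little")
        os.waitpid(pid, 0)  # reap the child that failed to exec
        # OSError(errno, ...) returns the matching subclass, e.g. FileNotFoundError.
        raise OSError(err, os.strerror(err), path)
    return pid


if __name__ == "__main__":
    child = spawn_with_error_pipe(sys.executable, [sys.executable, "-c", "pass"])
    print("started", child, "wait status", os.waitpid(child, 0)[1])
    try:
        spawn_with_error_pipe("/nonexistent/program", ["program"])
    except FileNotFoundError as exc:
        print("exec error reported to parent:", exc)
```

Because the error travels through the pipe, the parent does not depend on sharing an address space with the child: a successful exec simply closes the write end and the parent reads EOF. The cost is an extra pipe and a blocking read in the parent for every spawn.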
* Re: Generic way to detect qemu linux-user emulation
  2025-03-18 10:36 ` Helge Deller
  2025-03-18 10:45 ` Helge Deller
  2025-03-18 10:53 ` Peter Maydell
@ 2025-03-18 11:10 ` Andreas Schwab
From: Andreas Schwab @ 2025-03-18 11:10 UTC
To: Helge Deller; +Cc: qemu-devel

On Mar 18 2025, Helge Deller wrote:

> My current (unreliable) way to detect it is using uname.
> The kernel string and arch name don't match:
>
> (sid_hppa)root@paq:/# uname -a
> Linux paq 6.1.0-31-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.128-1 (2025-02-07) parisc GNU/Linux
>
> (sid_hppa)root@paq:/# uname -r
> 6.1.0-31-amd64
>
> (sid_hppa)root@paq:/# uname -m
> parisc
>
> This is a qemu-linux-user parisc(hppa) emulation running on x86-64.

That is highly distribution-specific; by default, the release part
(uname -r) does not contain anything arch-specific.

For riscv, the most reliable way is to look for "uarch *: qemu" in
/proc/cpuinfo.
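The riscv check Andreas describes could be sketched in Python as follows. It assumes a matching line has the key uarch and a value whose first word is qemu, with any mix of spaces or tabs before the colon; the function names are hypothetical. The heuristic is only meaningful on riscv, as stated in the message, and a False result is not evidence of running natively.

```python
def cpuinfo_reports_qemu(text):
    """Return True if /proc/cpuinfo-style `text` has a `uarch` entry whose
    value begins with the word "qemu" (the riscv heuristic from the thread)."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Keys are padded with spaces or tabs before the colon; strip them.
        # Assumption: QEMU's riscv cpuinfo emulation prints "uarch ...: qemu".
        if sep and key.strip() == "uarch" and value.split()[:1] == ["qemu"]:
            return True
    return False


def running_under_qemu_riscv(path="/proc/cpuinfo"):
    """Hypothetical wrapper: read cpuinfo and apply the heuristic."""
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            return cpuinfo_reports_qemu(f.read())
    except OSError:
        return False  # no /proc or unreadable: cannot tell


if __name__ == "__main__":
    print("qemu riscv uarch line found:", running_under_qemu_riscv())
```

Parsing the text separately from reading the file keeps the matching logic testable on sample strings without needing an emulated riscv environment.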
Thread overview: 16+ messages
2025-03-18 10:18 Generic way to detect qemu linux-user emulation Andreas Schwab
2025-03-18 10:36 ` Helge Deller
2025-03-18 10:45 ` Helge Deller
2025-03-18 10:53 ` Peter Maydell
2025-03-18 11:58 ` Daniel P. Berrangé
2025-03-18 12:34 ` Andreas Schwab
2025-03-18 12:43 ` Daniel P. Berrangé
2025-03-18 13:06 ` Peter Maydell
2025-03-18 13:54 ` Daniel P. Berrangé
2025-03-18 14:17 ` Andreas Schwab
2025-03-18 17:32 ` Daniel P. Berrangé
2025-03-18 15:04 ` Peter Maydell
2025-03-18 17:08 ` Peter Maydell
2025-03-18 17:18 ` Daniel P. Berrangé
2025-03-18 17:48 ` Peter Maydell
2025-03-18 11:10 ` Andreas Schwab