Re: [PULL 22/24] target-arm: ensure all cross vCPUs TLB flushes complete

From: "Alex Bennée" <alex.bennee@linaro.org>
To: Dmitry Osipenko <digetx@gmail.com>
Cc: peter.maydell@linaro.org, "open list\:ARM" <qemu-arm@nongnu.org>,
	qemu-devel@nongnu.org
Subject: Re: [PULL 22/24] target-arm: ensure all cross vCPUs TLB flushes complete
Date: Mon, 18 Sep 2017 15:00:23 +0100
Message-ID: <87d16ods3s.fsf@linaro.org>
In-Reply-To: <70057789-ab76-1150-ab2e-b5a3239a0209@gmail.com>


Dmitry Osipenko <digetx@gmail.com> writes:

> On 18.09.2017 13:10, Alex Bennée wrote:
>>
>> Dmitry Osipenko <digetx@gmail.com> writes:
>>
>>> On 17.09.2017 16:22, Alex Bennée wrote:
>>>>
>>>> Dmitry Osipenko <digetx@gmail.com> writes:
>>>>
>>>>> On 24.02.2017 14:21, Alex Bennée wrote:
>>>>>> Previously flushes on other vCPUs would only get serviced when they
>>>>>> exited their TranslationBlocks. While this isn't overly problematic it
>>>>>> violates the semantics of TLB flush from the point of view of source
>>>>>> vCPU.
>>>>>>
>>>>>> To solve this we call the cputlb *_all_cpus_synced() functions to do
>>>>>> the flushes which ensures all flushes are completed by the time the
>>>>>> vCPU next schedules its own work. As the TLB instructions are modelled
>>>>>> as CP writes the TB ends at this point meaning cpu->exit_request will
>>>>>> be checked before the next instruction is executed.
>>>>>>
>>>>>> Deferring the work until the architectural sync point is a possible
>>>>>> future optimisation.
>>>>>>
>>>>>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>>>>> Reviewed-by: Richard Henderson <rth@twiddle.net>
>>>>>> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>>>>>> ---
>>>>>>  target/arm/helper.c | 165 ++++++++++++++++++++++------------------------------
>>>>>>  1 file changed, 69 insertions(+), 96 deletions(-)
>>>>>>
>>>>>
>>>>> Hello,
>>>>>
>>>>> I have an issue with the Linux kernel hanging during boot on SMP 32-bit ARM
>>>>> (haven't checked 64-bit) in single-threaded TCG mode. The kernel reaches the
>>>>> point where it should mount the rootfs over NFS and the vCPUs stop. The issue
>>>>> is reproducible with any 32-bit ARM machine type. The kernel boots fine with
>>>>> the MTTCG accel; only single-threaded TCG is affected. Git bisection led to
>>>>> this patch, any ideas?
>>>>
>>>> It shouldn't cause a problem but can you obtain a backtrace of the
>>>> system when hung?
>>>>
>>>
>>> Actually, it looks like TCG enters an infinite loop. By 'backtrace of the
>>> system', do you mean a backtrace of QEMU? If so, here it is:
>>>
>>> Thread 4 (Thread 0x7ffa37f10700 (LWP 20716)):
>>> #0  0x00007ffa601888bd in poll () at ../sysdeps/unix/syscall-template.S:84
>>> #1  0x00007ffa5e3aa561 in poll (__timeout=-1, __nfds=2, __fds=0x7ffa30006dc0) at /usr/include/bits/poll2.h:46
>>> #2  poll_func (ufds=0x7ffa30006dc0, nfds=2, timeout=-1, userdata=0x557bd603eae0) at /var/tmp/portage/media-sound/pulseaudio-10.0/work/pulseaudio-10.0/src/pulse/thread-mainloop.c:69
>>> #3  0x00007ffa5e39bbb1 in pa_mainloop_poll (m=m@entry=0x557bd60401f0) at /var/tmp/portage/media-sound/pulseaudio-10.0/work/pulseaudio-10.0/src/pulse/mainloop.c:844
>>> #4  0x00007ffa5e39c24e in pa_mainloop_iterate (m=0x557bd60401f0, block=<optimized out>, retval=0x0) at /var/tmp/portage/media-sound/pulseaudio-10.0/work/pulseaudio-10.0/src/pulse/mainloop.c:926
>>> #5  0x00007ffa5e39c300 in pa_mainloop_run (m=0x557bd60401f0, retval=retval@entry=0x0) at /var/tmp/portage/media-sound/pulseaudio-10.0/work/pulseaudio-10.0/src/pulse/mainloop.c:944
>>> #6  0x00007ffa5e3aa4a9 in thread (userdata=0x557bd60400f0) at /var/tmp/portage/media-sound/pulseaudio-10.0/work/pulseaudio-10.0/src/pulse/thread-mainloop.c:100
>>> #7  0x00007ffa599eea38 in internal_thread_func (userdata=0x557bd603e090) at /var/tmp/portage/media-sound/pulseaudio-10.0/work/pulseaudio-10.0/src/pulsecore/thread-posix.c:81
>>> #8  0x00007ffa60453657 in start_thread (arg=0x7ffa37f10700) at pthread_create.c:456
>>> #9  0x00007ffa60193c5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
>>>
>>> Thread 3 (Thread 0x7ffa4adff700 (LWP 20715)):
>>> #0  0x00007ffa53e51caf in code_gen_buffer ()
>>>
>>>
>>
>> Well it's not locked up in servicing any flush tasks as it's executing
>> code. Maybe the guest code is spinning on something?
>>
>
> Indeed, I should have used 'exec' instead of 'in_asm'.
>
>> In the monitor:
>>
>>   info registers
>>
>> Will show you where things are, see if the ip is moving each time. Also
>> you can do a disassemble dump from there to see what code it is stuck
>> on.
>>
>
> I've attached GDB to QEMU to see where it got stuck. It turned out to be
> caused by CONFIG_STRICT_KERNEL_RWX=y in the Linux kernel. Upon boot completion
> the kernel changes memory permissions, and that change is executed on a
> dedicated CPU while the other CPUs are 'stopped' in a busy loop.
>
> This patch just introduced a noticeable performance regression for
> single-threaded TCG, which is probably fine since MTTCG is the default now.
> Thank you very much for the suggestions and all your work on MTTCG!

Hmm well it would be nice to know the exact mechanism for that failure.
If we just end up with a very long list of tasks in
cpu->queued_work_first then I guess that explains it but it would be
nice to quantify the problem.
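
To make "quantify" a bit more concrete, here is a toy model of the suspicion
above, written in Python for brevity. It is not QEMU code: `VCpu`,
`queue_work`, `drain` and `simulate` are hypothetical names, and
`queued_work` only loosely stands in for cpu->queued_work_first. It assumes
one source vCPU broadcasts `flushes_per_slice` flushes before the other
vCPUs get a round-robin slice. Whether the real source vCPU can issue more
than one flush per slice is exactly the open question, since the patch ends
the TB at each flush.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class VCpu:
    """Toy stand-in for a vCPU (hypothetical, not a QEMU structure)."""
    index: int
    queued_work: deque = field(default_factory=deque)
    max_depth: int = 0

    def queue_work(self, item):
        # Append a work item and remember the deepest the queue ever got.
        self.queued_work.append(item)
        self.max_depth = max(self.max_depth, len(self.queued_work))

    def drain(self):
        # Service everything queued so far; return how many items were run.
        done = len(self.queued_work)
        self.queued_work.clear()
        return done


def simulate(num_vcpus, flushes_per_slice, slices):
    """Round-robin model: vCPU 0 issues flushes, the others drain when scheduled."""
    cpus = [VCpu(i) for i in range(num_vcpus)]
    for _ in range(slices):
        # Slice of vCPU 0: every flush is queued on all other vCPUs.
        for n in range(flushes_per_slice):
            for target in cpus[1:]:
                target.queue_work(("tlb_flush", n))
        # The remaining vCPUs run in turn and service what has piled up.
        for target in cpus[1:]:
            target.drain()
    # Map vCPU index -> deepest queue observed during the run.
    return {cpu.index: cpu.max_depth for cpu in cpus}
```

Calling `simulate(num_vcpus=4, flushes_per_slice=1000, slices=10)` reports the
peak queue depth per vCPU under these assumptions; comparing that against a
real count of queued items would show whether list length is the culprit.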

I had trouble seeing where this loop is in the kernel code, got a pointer?
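
For anyone repeating the monitor check quoted earlier (sampling
"info registers" to see whether the PC moves), a small sketch of how the
samples could be classified. The helper names `extract_pc` and
`classify_pc_samples` are made up for illustration, and the `R15=<hex>`
layout of the 32-bit ARM register dump is an assumption about the monitor
output, so adjust the pattern to whatever your build prints.

```python
import re


def extract_pc(dump, reg="R15"):
    """Pull one register value out of a register dump.

    Assumes the dump contains fields shaped like 'R15=c0108e50'
    (an assumption about the output format, not a documented guarantee).
    """
    match = re.search(rf"\b{re.escape(reg)}=([0-9a-fA-F]+)", dump)
    if match is None:
        raise ValueError(f"register {reg} not found in dump")
    return int(match.group(1), 16)


def classify_pc_samples(samples, loop_span=64):
    """Classify PC samples taken a moment apart.

    'stationary'  : every sample is the same address
    'tight-loop'  : samples differ but stay within loop_span bytes
    'progressing' : samples spread further than loop_span bytes
    """
    if not samples:
        raise ValueError("need at least one PC sample")
    lo, hi = min(samples), max(samples)
    if lo == hi:
        return "stationary"
    if hi - lo <= loop_span:
        return "tight-loop"
    return "progressing"
```

A "tight-loop" result on the secondary vCPUs would be consistent with the
busy loop described above, though the `loop_span` threshold is arbitrary and
a few samples cannot prove anything on their own.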

--
Alex Bennée


