qemu-devel.nongnu.org archive mirror
From: Sven Schnelle <svens@linux.ibm.com>
To: "Alex Bennée" <alex.bennee@linaro.org>
Cc: David Hildenbrand <david@redhat.com>,
	Janosch Frank <frankja@linux.ibm.com>,
	Liam Howlett <liam.howlett@oracle.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Guenter Roeck <linux@roeck-us.net>,
	"maple-tree@lists.infradead.org" <maple-tree@lists.infradead.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Yu Zhao <yuzhao@google.com>, Juergen Gross <jgross@suse.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Andreas Krebbel <krebbel@linux.ibm.com>,
	Ilya Leoshkevich <iii@linux.ibm.com>,
	Thomas Huth <thuth@redhat.com>,
	richard.henderson@linaro.org, qemu-devel@nongnu.org,
	qemu-s390x@nongnu.org
Subject: Re: qemu-system-s390x hang in tcg
Date: Wed, 29 Jun 2022 12:46:01 +0200
Message-ID: <yt9dbkubhhna.fsf@linux.ibm.com>
In-Reply-To: <87pmirj3aq.fsf@linaro.org> ("Alex Bennée"'s message of "Wed, 29 Jun 2022 09:10:57 +0100")

Alex Bennée <alex.bennee@linaro.org> writes:

> Sven Schnelle <svens@linux.ibm.com> writes:
>
>> Hi,
>>
>> David Hildenbrand <david@redhat.com> writes:
>>
>>> On 04.05.22 09:37, Janosch Frank wrote:
>>>> I had a short look yesterday and the boot usually hangs in the raid6 
>>>> code. Disabling vector instructions didn't make a difference but a few 
>>>> interruptions via GDB solve the problem for some reason.
>>>> 
>>>> CCing David and Thomas for TCG
>>>> 
>>>
>>> I somehow recall that KASAN was always disabled under TCG, I might be
>>> wrong (I thought we'd get a message early during boot that the HW
>>> doesn't support KASAN).
>>>
>>> I recall that raid code is a heavy user of vector instructions.
>>>
>>> How can I reproduce? Compile upstream (or -next?) with kasan support and
>>> run it under TCG?
>>
>> I spent some time looking into this. It usually hangs in
>> s390vx8_gen_syndrome(). My first thought was that it was a problem with
>> the VX instructions, but it turned out that it hangs even if I remove all
>> the code from s390vx8_gen_syndrome().
>>
>> Tracing the execution of TBs, I see that the generated code keeps
>> jumping between a few TBs, but never exits them to check for
>> interrupts (i.e. never returns to cpu_tb_exec()). I only see calls to
>> helper_lookup_tb_ptr to look up the pointer for the next TB.
>>
>> The raid6 code waits for some time to expire by reading jiffies,
>> but interrupts are never processed and therefore jiffies doesn't change.
>> So the raid6 code hangs forever.
>>
>> As a test, I made a quick change:
>>
>> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
>> index c997c2e8e0..35819fd5a7 100644
>> --- a/accel/tcg/cpu-exec.c
>> +++ b/accel/tcg/cpu-exec.c
>> @@ -319,7 +319,8 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env)
>>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>>
>>      cflags = curr_cflags(cpu);
>> -    if (check_for_breakpoints(cpu, pc, &cflags)) {
>> +    if (check_for_breakpoints(cpu, pc, &cflags) ||
>> +        unlikely(qatomic_read(&cpu->interrupt_request))) {
>>          cpu_loop_exit(cpu);
>>      }
>>
>> And that makes the problem go away. But I'm not familiar with the TCG
>> internals, so I can't say whether the generated code is incorrect or
>> something else is wrong. I have TCG log files of a failing and a working
>> run if someone wants to take a look. They are rather large, so I would
>> have to upload them somewhere.
>
> Whatever is setting cpu->interrupt_request should be calling
> cpu_exit(cpu) which sets the exit flag which is checked at the start of
> every TB execution (see gen_tb_start).

Thanks, that was very helpful. I added debugging and it turned out
that the TB is left because of a pending irq. The code then calls
s390_cpu_exec_interrupt:

bool s390_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
{
    if (interrupt_request & CPU_INTERRUPT_HARD) {
        S390CPU *cpu = S390_CPU(cs);
        CPUS390XState *env = &cpu->env;

        if (env->ex_value) {
            /* Execution of the target insn is indivisible from
               the parent EXECUTE insn.  */
            return false;
        }
        if (s390_cpu_has_int(cpu)) {
            s390_cpu_do_interrupt(cs);
            return true;
        }
        if (env->psw.mask & PSW_MASK_WAIT) {
            /* Woken up because of a floating interrupt but it has already
             * been delivered. Go back to sleep. */
            cpu_interrupt(CPU(cpu), CPU_INTERRUPT_HALT);
        }
    }
    return false;
}

Note the 'if (env->ex_value) { }' check. It looks like this function
just returns false while TCG is executing an EX instruction. After
that, the information that the TB should be exited because of an
interrupt is gone. So the TBs are never exited again, although the
interrupt wasn't handled. At least that's my assumption for now; if I'm
wrong, please tell me.

So the raid6 code is spinning, waiting for jiffies to reach a
timeout, but as the timer interrupt was lost, jiffies will never change.
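
To make the suspected sequence concrete, here is a toy model in C. It is
only a sketch of my assumption above, not QEMU code: struct toy_cpu, its
fields and the toy_* functions are all made up for illustration, and the
real exit-flag handling in cpu-exec.c may well differ.

```c
#include <stdbool.h>

/* Toy model of the suspected sequence. This is NOT QEMU code: every name
 * here is hypothetical and only loosely mirrors the real fields. */
struct toy_cpu {
    bool interrupt_request; /* an interrupt is pending */
    bool exit_request;      /* exit flag, checked at the start of each TB */
    bool in_execute;        /* stands in for env->ex_value != 0 */
    unsigned long jiffies;  /* only the timer interrupt advances this */
};

/* Stands in for s390_cpu_exec_interrupt(): refuses delivery while the
 * target of an EXECUTE is in flight. */
static bool toy_exec_interrupt(struct toy_cpu *cpu)
{
    if (cpu->in_execute) {
        return false;               /* interrupt stays pending */
    }
    cpu->interrupt_request = false;
    cpu->jiffies++;                 /* timer interrupt delivered */
    return true;
}

/* One pass through the outer loop: the exit flag is consumed whether or
 * not the interrupt could be delivered (assumption of this model). */
static void toy_outer_loop(struct toy_cpu *cpu)
{
    cpu->exit_request = false;
    if (cpu->interrupt_request) {
        toy_exec_interrupt(cpu);
    }
}

/* Run n chained TBs; return how often the outer loop was reached. */
static int toy_run(struct toy_cpu *cpu, int n)
{
    int exits = 0;
    for (int i = 0; i < n; i++) {
        if (cpu->exit_request) {    /* check at the start of the TB */
            toy_outer_loop(cpu);
            exits++;
        }
        cpu->in_execute = false;    /* the EXECUTE target has completed */
    }
    return exits;
}
```

If toy_run() is started with interrupt_request, exit_request and
in_execute all set, the model leaves the TB chain exactly once, fails to
deliver the interrupt, and then never leaves again, so jiffies stays at
its initial value. With in_execute clear, the same run delivers the
interrupt on that single exit.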

So I wonder now how this could be fixed.

