* qemu-system-s390x hang in tcg (was: Re: [PATCH v8 23/70] mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap())
From: Sven Schnelle @ 2022-06-29 7:04 UTC
To: David Hildenbrand
Cc: Janosch Frank, Liam Howlett, Heiko Carstens, Claudio Imbrenda, Andrew Morton, Guenter Roeck, maple-tree@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao, Juergen Gross, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger, Andreas Krebbel, Ilya Leoshkevich, Thomas Huth, richard.henderson, qemu-devel, qemu-s390x

Hi,

David Hildenbrand <david@redhat.com> writes:

> On 04.05.22 09:37, Janosch Frank wrote:
>> I had a short look yesterday and the boot usually hangs in the raid6
>> code. Disabling vector instructions didn't make a difference but a few
>> interruptions via GDB solve the problem for some reason.
>>
>> CCing David and Thomas for TCG
>>
>
> I somehow recall that KASAN was always disabled under TCG, I might be
> wrong (I thought we'd get a message early during boot that the HW
> doesn't support KASAN).
>
> I recall that raid code is a heavy user of vector instructions.
>
> How can I reproduce? Compile upstream (or -next?) with kasan support and
> run it under TCG?

I spent some time looking into this. It's usually hanging in
s390vx8_gen_syndrome(). My first thought was that it is a problem with
the VX instructions, but it turned out that it hangs even if I remove all
the code from s390vx8_gen_syndrome().

Tracing the execution of TBs, I see that the generated code is always
jumping between a few TBs, but never exiting the TBs to check for
interrupts (i.e. it never returns to cpu_tb_exec()). I only see calls to
helper_lookup_tb_ptr to look up the TB pointer for the next TB.
The raid6 code is waiting for some time to expire by reading jiffies,
but interrupts are never processed and therefore jiffies doesn't change.
So the raid6 code hangs forever.

As a test, I made a quick change:

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index c997c2e8e0..35819fd5a7 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -319,7 +319,8 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env)
     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
 
     cflags = curr_cflags(cpu);
-    if (check_for_breakpoints(cpu, pc, &cflags)) {
+    if (check_for_breakpoints(cpu, pc, &cflags) ||
+        unlikely(qatomic_read(&cpu->interrupt_request))) {
         cpu_loop_exit(cpu);
     }
 

And that makes the problem go away. But I'm not familiar with the TCG
internals, so I can't say whether the generated code is incorrect or
something else is wrong. I have TCG log files of a failing and a working
run if someone wants to take a look. They are rather large, so I would
have to upload them somewhere.
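The symptom described above can be reduced to a small toy model. This is a hypothetical sketch, not QEMU code: toy_cpu, toy_run() and the "a timer tick arrives every 4 TBs" rule are assumptions made purely for illustration. The only point it encodes is that jiffies advances solely when a pending interrupt is serviced, and servicing only happens once control leaves the chain of TBs.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (hypothetical, not QEMU code). A guest polls jiffies until a
 * deadline is reached. jiffies only advances when a pending timer
 * interrupt is delivered, and delivery only happens when control leaves
 * the chain of TBs and returns to the outer execution loop.
 */
struct toy_cpu {
    bool irq_pending;      /* stands in for cpu->interrupt_request */
    unsigned long jiffies; /* guest tick counter */
};

/* Outer-loop stand-in: deliver a pending timer interrupt. */
static void toy_handle_interrupt(struct toy_cpu *cpu)
{
    if (cpu->irq_pending) {
        cpu->irq_pending = false;
        cpu->jiffies++;
    }
}

/*
 * Runs up to max_tbs chained TBs. With leave_on_irq == false the
 * TB-to-TB lookup always chains (the observed hang); with true it leaves
 * the chain whenever an interrupt is pending, which is the effect of the
 * experimental lookup_tb_ptr change. Returns the number of TBs executed
 * when the deadline was seen, or -1 if the budget ran out first.
 */
static int toy_run(bool leave_on_irq, unsigned long deadline, int max_tbs)
{
    struct toy_cpu cpu = { .irq_pending = false, .jiffies = 0 };

    assert(max_tbs > 0);
    for (int n = 1; n <= max_tbs; n++) {
        if (n % 4 == 0) {
            cpu.irq_pending = true;   /* a timer tick arrives */
        }
        if (cpu.jiffies >= deadline) {
            return n;                 /* the polling loop sees the timeout */
        }
        if (leave_on_irq && cpu.irq_pending) {
            toy_handle_interrupt(&cpu);
        }
    }
    return -1;
}
```

With leave_on_irq set to false, toy_run() returns -1 however large the budget is, because jiffies never moves; with it set to true, toy_run(true, 3, 1000) reaches the deadline after 13 TBs. The sketch only reproduces the symptom, not the mechanism TCG actually uses to leave a TB chain.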
* Re: qemu-system-s390x hang in tcg (was: Re: [PATCH v8 23/70] mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap())
From: Alex Bennée @ 2022-06-29 8:10 UTC
To: Sven Schnelle
Cc: David Hildenbrand, Janosch Frank, Liam Howlett, Heiko Carstens, Claudio Imbrenda, Andrew Morton, Guenter Roeck, maple-tree@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao, Juergen Gross, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger, Andreas Krebbel, Ilya Leoshkevich, Thomas Huth, richard.henderson, qemu-devel, qemu-s390x

Sven Schnelle <svens@linux.ibm.com> writes:

> [...]
> And that makes the problem go away. But I'm not familiar with the TCG
> internals, so I can't say whether the generated code is incorrect or
> something else is wrong. I have TCG log files of a failing and a working
> run if someone wants to take a look. They are rather large, so I would
> have to upload them somewhere.

Whatever is setting cpu->interrupt_request should be calling
cpu_exit(cpu), which sets the exit flag that is checked at the start of
every TB execution (see gen_tb_start).

--
Alex Bennée
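The rule Alex describes (raising an interrupt must also request an exit, and every TB prologue tests that request) can be sketched as a toy model. All names here are hypothetical and the model is single-threaded for simplicity; C11 atomics are used only to mirror the fact that in QEMU the request may come from another thread. It is not QEMU's implementation of cpu_exit() or gen_tb_start().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Toy model (hypothetical names, not QEMU's API). Raising an interrupt
 * sets the request bits *and* an exit flag; every TB prologue tests the
 * exit flag, which is the role of the check emitted by gen_tb_start().
 */
struct toy_cpu {
    atomic_int interrupt_request;
    atomic_bool exit_request;
};

static void toy_cpu_init(struct toy_cpu *cpu)
{
    atomic_init(&cpu->interrupt_request, 0);
    atomic_init(&cpu->exit_request, false);
}

/* Whoever raises an interrupt also requests an exit (cf. cpu_exit()). */
static void toy_cpu_interrupt(struct toy_cpu *cpu, int mask)
{
    assert(mask != 0);
    atomic_fetch_or(&cpu->interrupt_request, mask);
    atomic_store(&cpu->exit_request, true);
}

/*
 * Executes up to max_tbs chained TBs. The interrupt is raised while TB
 * number raise_at is running (-1: never). Returns the index of the TB
 * whose prologue noticed the exit request, or -1 if the chain was never
 * left.
 */
static int toy_exec_chain(struct toy_cpu *cpu, int raise_at, int max_tbs)
{
    for (int n = 0; n < max_tbs; n++) {
        if (atomic_load(&cpu->exit_request)) {
            return n;                  /* prologue: back to the main loop */
        }
        if (n == raise_at) {
            toy_cpu_interrupt(cpu, 1); /* e.g. a timer firing mid-TB */
        }
    }
    return -1;
}
```

In this sketch the prologue tests only the exit flag, never interrupt_request itself, so an interrupt raised during TB 5 is noticed at the start of TB 6, while a code path that set interrupt_request without requesting an exit would never leave the chain.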
* Re: qemu-system-s390x hang in tcg
From: Sven Schnelle @ 2022-06-29 10:46 UTC
To: Alex Bennée
Cc: David Hildenbrand, Janosch Frank, Liam Howlett, Heiko Carstens, Claudio Imbrenda, Andrew Morton, Guenter Roeck, maple-tree@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao, Juergen Gross, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger, Andreas Krebbel, Ilya Leoshkevich, Thomas Huth, richard.henderson, qemu-devel, qemu-s390x

Alex Bennée <alex.bennee@linaro.org> writes:

> [...]
> Whatever is setting cpu->interrupt_request should be calling
> cpu_exit(cpu), which sets the exit flag that is checked at the start of
> every TB execution (see gen_tb_start).

Thanks, that was very helpful. I added some debugging, and it turned out
that the TB is exited because of a pending IRQ. The code then calls
s390_cpu_exec_interrupt():

bool s390_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
{
    if (interrupt_request & CPU_INTERRUPT_HARD) {
        S390CPU *cpu = S390_CPU(cs);
        CPUS390XState *env = &cpu->env;

        if (env->ex_value) {
            /* Execution of the target insn is indivisible from
               the parent EXECUTE insn.  */
            return false;
        }
        if (s390_cpu_has_int(cpu)) {
            s390_cpu_do_interrupt(cs);
            return true;
        }
        if (env->psw.mask & PSW_MASK_WAIT) {
            /* Woken up because of a floating interrupt but it has already
             * been delivered. Go back to sleep. */
            cpu_interrupt(CPU(cpu), CPU_INTERRUPT_HALT);
        }
    }
    return false;
}

Note the 'if (env->ex_value) { }' check. It looks like this function
just returns false in case TCG is executing an EX instruction. After
that, the information that the TB should be exited because of an
interrupt is gone. So the TBs are never exited again, although the
interrupt wasn't handled. At least that's my assumption now; if I'm
wrong, please tell me.

So the raid6 code spins, waiting for jiffies to reach a timeout value,
but as the timer interrupt was lost, jiffies will never change.

So I wonder now how this could be fixed.
* Re: qemu-system-s390x hang in tcg 2022-06-29 10:46 ` qemu-system-s390x hang in tcg Sven Schnelle @ 2022-06-29 12:18 ` Sven Schnelle 2022-06-29 14:52 ` Alex Bennée 2022-06-30 3:03 ` Richard Henderson 1 sibling, 1 reply; 6+ messages in thread From: Sven Schnelle @ 2022-06-29 12:18 UTC (permalink / raw) To: Alex Bennée Cc: David Hildenbrand, Janosch Frank, Liam Howlett, Heiko Carstens, Claudio Imbrenda, Andrew Morton, Guenter Roeck, maple-tree@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao, Juergen Gross, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger, Andreas Krebbel, Ilya Leoshkevich, Thomas Huth, richard.henderson, qemu-devel, qemu-s390x Sven Schnelle <svens@linux.ibm.com> writes: > Alex Bennée <alex.bennee@linaro.org> writes: > >> Sven Schnelle <svens@linux.ibm.com> writes: >> >>> Hi, >>> >>> David Hildenbrand <david@redhat.com> writes: >>> >>>> On 04.05.22 09:37, Janosch Frank wrote: >>>>> I had a short look yesterday and the boot usually hangs in the raid6 >>>>> code. Disabling vector instructions didn't make a difference but a few >>>>> interruptions via GDB solve the problem for some reason. >>>>> >>>>> CCing David and Thomas for TCG >>>>> >>>> >>>> I somehow recall that KASAN was always disabled under TCG, I might be >>>> wrong (I thought we'd get a message early during boot that the HW >>>> doesn't support KASAN). >>>> >>>> I recall that raid code is a heavy user of vector instructions. >>>> >>>> How can I reproduce? Compile upstream (or -next?) with kasan support and >>>> run it under TCG? >>> >>> I spent some time looking into this. It's usually hanging in >>> s390vx8_gen_syndrome(). My first thought was that it is a problem with >>> the VX instructions, but turned out that it hangs even if i remove all >>> the code from s390vx8_gen_syndrome(). >>> >>> Tracing the execution of TB's, i see that the generated code is always >>> jumping between a few TB's, but never exiting the TB's to check for >>> interrupts (i.e. 
return to cpu_tb_exec(). I only see calls to >>> helper_lookup_tb_ptr to lookup the tb pointer for the next TB. >>> >>> The raid6 code is waiting for some time to expire by reading jiffies, >>> but interrupts are never processed and therefore jiffies doesn't change. >>> So the raid6 code hangs forever. >>> >>> As a test, i made a quick change to test: >>> >>> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c >>> index c997c2e8e0..35819fd5a7 100644 >>> --- a/accel/tcg/cpu-exec.c >>> +++ b/accel/tcg/cpu-exec.c >>> @@ -319,7 +319,8 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env) >>> cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags); >>> >>> cflags = curr_cflags(cpu); >>> - if (check_for_breakpoints(cpu, pc, &cflags)) { >>> + if (check_for_breakpoints(cpu, pc, &cflags) || >>> + unlikely(qatomic_read(&cpu->interrupt_request))) { >>> cpu_loop_exit(cpu); >>> } >>> >>> And that makes the problem go away. But i'm not familiar with the TCG >>> internals, so i can't say whether the generated code is incorrect or >>> something else is wrong. I have tcg log files of a failing + working run >>> if someone wants to take a look. They are rather large so i would have to >>> upload them somewhere. >> >> Whatever is setting cpu->interrupt_request should be calling >> cpu_exit(cpu) which sets the exit flag which is checked at the start of >> every TB execution (see gen_tb_start). > > Thanks, that was very helpful. I added debugging and it turned out > that the TB is left because of a pending irq. The code then calls > s390_cpu_exec_interrupt: > > bool s390_cpu_exec_interrupt(CPUState *cs, int interrupt_request) > { > if (interrupt_request & CPU_INTERRUPT_HARD) { > S390CPU *cpu = S390_CPU(cs); > CPUS390XState *env = &cpu->env; > > if (env->ex_value) { > /* Execution of the target insn is indivisible from > the parent EXECUTE insn. 
*/ > return false; > } > if (s390_cpu_has_int(cpu)) { > s390_cpu_do_interrupt(cs); > return true; > } > if (env->psw.mask & PSW_MASK_WAIT) { > /* Woken up because of a floating interrupt but it has already > * been delivered. Go back to sleep. */ > cpu_interrupt(CPU(cpu), CPU_INTERRUPT_HALT); > } > } > return false; > } > > Note the 'if (env->ex_value) { }' check. It looks like this function > just returns false in case tcg is executing an EX instruction. After > that the information that the TB should be exited because of an > interrupt is gone. So the TB's are never exited again, although the > interrupt wasn't handled. At least that's my assumption now, if i'm > wrong please tell me. Looking at the code i see CF_NOIRQ to prevent TB's from getting interrupted. But i only see that used in the core tcg code. Would that be a possibility, or is there something else/better? Sorry for the dumb questions, i'm not often working on qemu ;-) ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: qemu-system-s390x hang in tcg
From: Alex Bennée @ 2022-06-29 14:52 UTC
To: Sven Schnelle
Cc: David Hildenbrand, Janosch Frank, Liam Howlett, Heiko Carstens, Claudio Imbrenda, Andrew Morton, Guenter Roeck, maple-tree@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao, Juergen Gross, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger, Andreas Krebbel, Ilya Leoshkevich, Thomas Huth, richard.henderson, qemu-devel, qemu-s390x

Sven Schnelle <svens@linux.ibm.com> writes:

> [...]
> Looking at the code, I see CF_NOIRQ, which prevents TBs from being
> interrupted. But I only see it used in the core TCG code. Would that be
> a possibility, or is there something else/better?

Yes, CF_NOIRQ is exactly the compiler flag you would use to prevent a
block from exiting early when you absolutely want to execute the next
block. We currently only use it from core code to deal with
icount-related things, but I can see its use here. I would probably
still wrap it in a common function in cpu-exec-common.c.

I'm unsure of the exact semantics for s390, so I will defer to Richard
and others, but something like (untested):

/*
 * Ensure the next N instructions are not interrupted by IRQ checks.
 */
void cpu_loop_exit_unint(CPUState *cpu, uintptr_t pc, int len)
{
    if (pc) {
        cpu_restore_state(cpu, pc, true);
    }
    cpu->cflags_next_tb = len | CF_LAST_IO | CF_NOIRQ | curr_cflags(cpu);
    cpu_loop_exit(cpu);
}

And then in HELPER(ex) you can end the helper with:

void HELPER(ex)(CPUS390XState *env, uint32_t ilen, uint64_t r1, uint64_t addr)
{
    ...
    /*
     * We must execute the next instruction exclusively, so exit the loop
     * and trigger a NOIRQ TB which won't check for an interrupt until
     * it finishes executing.
     */
    cpu_loop_exit_unint(env_cpu(env), 0, 1);
}

Some notes:

  * Take care to ensure the CPU state is synchronised, which means the
    helper cannot use the flags
    TCG_CALL_NO_(READ_GLOBALS|WRITE_GLOBALS|SIDE_EFFECTS). You will also
    need to make sure you write the current PC in the TCG gen code in
    op_ex().

  * I think env->ex_value can be removed after this.

  * We will actually exit the execution loop (via siglongjmp), but the
    IRQ check in cpu_handle_interrupt() will be skipped due to the
    custom flags. When the next block is looked up (or generated), it
    will be entered but will then exit immediately.

  * I think even a branch to self should work, because the second
    iteration will be interruptible.

> Sorry for the dumb questions, I don't often work on QEMU ;-)

There are no dumb questions, just opportunities for better documentation ;-)

--
Alex Bennée
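The effect of a one-shot "next TB" flag such as CF_NOIRQ can be shown with a toy main loop. This is a hypothetical sketch, not QEMU code: TOY_CF_NOIRQ is an arbitrary stand-in value, and the model assumes that the flag is consumed by exactly one loop iteration, as cflags_next_tb is in Alex's proposal. Each iteration records 'I' (interrupt taken), 'T' (the EXECUTE target insn ran) or 'N' (an ordinary TB).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CF_NOIRQ; the value is arbitrary. */
#define TOY_CF_NOIRQ 0x1u

/*
 * Toy model (not QEMU code). The EXECUTE helper has just run with an
 * interrupt already pending. Without the flag the main loop takes the
 * interrupt before the target insn; with it, the target insn runs first
 * and the very next iteration is interruptible again.
 */
struct toy_cpu {
    unsigned cflags_next_tb; /* consumed by the next loop iteration */
    bool irq_pending;
    bool target_pending;     /* the EXECUTE target has not run yet */
};

/* One main-loop iteration; appends 'I', 'T' or 'N' to trace. */
static void toy_loop_iteration(struct toy_cpu *cpu, char *trace, size_t *len)
{
    unsigned cflags = cpu->cflags_next_tb;

    cpu->cflags_next_tb = 0;  /* one-shot: cleared once consumed */
    if (!(cflags & TOY_CF_NOIRQ) && cpu->irq_pending) {
        cpu->irq_pending = false;
        trace[(*len)++] = 'I';
    } else if (cpu->target_pending) {
        cpu->target_pending = false;
        trace[(*len)++] = 'T';
    } else {
        trace[(*len)++] = 'N';
    }
}

/* trace must have room for iterations + 1 characters. */
static void toy_after_execute(bool use_noirq, int iterations, char *trace)
{
    struct toy_cpu cpu = {
        .cflags_next_tb = use_noirq ? TOY_CF_NOIRQ : 0,
        .irq_pending = true,
        .target_pending = true,
    };
    size_t len = 0;

    assert(trace != NULL && iterations >= 0);
    for (int i = 0; i < iterations; i++) {
        toy_loop_iteration(&cpu, trace, &len);
    }
    trace[len] = '\0';
}
```

Three iterations with the flag give "TIN": the target insn stays glued to its EXECUTE and the pending interrupt is taken immediately afterwards rather than lost. Without the flag the trace is "ITN", i.e. the interrupt splits EXECUTE from its target, which is the situation the ex_value check was written to avoid.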
* Re: qemu-system-s390x hang in tcg 2022-06-29 10:46 ` qemu-system-s390x hang in tcg Sven Schnelle 2022-06-29 12:18 ` Sven Schnelle @ 2022-06-30 3:03 ` Richard Henderson 1 sibling, 0 replies; 6+ messages in thread From: Richard Henderson @ 2022-06-30 3:03 UTC (permalink / raw) To: Sven Schnelle, Alex Bennée Cc: David Hildenbrand, Janosch Frank, Liam Howlett, Heiko Carstens, Claudio Imbrenda, Andrew Morton, Guenter Roeck, maple-tree@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao, Juergen Gross, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger, Andreas Krebbel, Ilya Leoshkevich, Thomas Huth, qemu-devel, qemu-s390x On 6/29/22 16:16, Sven Schnelle wrote: > Thanks, that was very helpful. I added debugging and it turned out > that the TB is left because of a pending irq. The code then calls > s390_cpu_exec_interrupt: > > bool s390_cpu_exec_interrupt(CPUState *cs, int interrupt_request) > { > if (interrupt_request & CPU_INTERRUPT_HARD) { > S390CPU *cpu = S390_CPU(cs); > CPUS390XState *env = &cpu->env; > > if (env->ex_value) { > /* Execution of the target insn is indivisible from > the parent EXECUTE insn. */ > return false; > } > if (s390_cpu_has_int(cpu)) { > s390_cpu_do_interrupt(cs); > return true; > } > if (env->psw.mask & PSW_MASK_WAIT) { > /* Woken up because of a floating interrupt but it has already > * been delivered. Go back to sleep. */ > cpu_interrupt(CPU(cpu), CPU_INTERRUPT_HALT); > } > } > return false; > } > > Note the 'if (env->ex_value) { }' check. It looks like this function > just returns false in case tcg is executing an EX instruction. After > that the information that the TB should be exited because of an > interrupt is gone. So the TB's are never exited again, although the > interrupt wasn't handled. At least that's my assumption now, if i'm > wrong please tell me. Ah, yes, I see. 
We wanted to treat ex_value != 0 as if interrupts are disabled,
because we have no way of stacking that value for re-execution after
the interrupt (which itself could use EXECUTE).

One solution might be to zap ex_value and arrange to re-execute the
EXECUTE instruction after the interrupt.

Another solution is to generate an exit from any TB translating
ex_value, so that interrupts are re-examined. This is probably
cleanest. I'll prepare a patch.

r~