linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Baoquan He <bhe@redhat.com>, <linux-kernel@vger.kernel.org>,
	<mingo@kernel.org>, <tglx@linutronix.de>, <x86@kernel.org>,
	<joro@8bytes.org>, <uobergfe@redhat.com>, <prarit@redhat.com>
Subject: Re: [PATCH v3 2/5] x86/apic: Fix restoring boot irq mode in reboot and kexec/kdump
Date: Tue, 13 Feb 2018 11:40:41 -0600	[thread overview]
Message-ID: <87vaf03i06.fsf@xmission.com> (raw)
In-Reply-To: <bb1a2eb2-335a-b789-4274-6b62dd5ef2ef@cn.fujitsu.com> (Dou Liyang's message of "Tue, 13 Feb 2018 10:43:52 +0800")

Dou Liyang <douly.fnst@cn.fujitsu.com> writes:

> Hi Baoquan,
>
> At 02/12/2018 11:08 AM, Eric W. Biederman wrote:
>> Baoquan He <bhe@redhat.com> writes:
>>
>>> This is a regression fix.
>>>
>>> Before, to fix erratum AVR31, commit 522e66464467 ("x86/apic: Disable
>>> I/O APIC before shutdown of the local APIC") moved lapic_shutdown()
>>> calling after disable_IO_APIC(). This introdued a regression. The
>>> root cause is that disable_IO_APIC() not only clears IO_APIC, also
>>> restore boot irq mode by setting LAPIC/APIC/IMCR, lapic_shutdown()
>>> after disable_IO_APIC() will disable LAPIC and ruin the possible
>>> virtual wire mode setting which the code has been trying to do all
>>> along.
>>>
>>> The consequence is, in KVM guest kernel always prints warning as below
>>> during kexec/kdump kernel boots up. That happened in setup_local_APIC()
>>> since 'do { xxx } while (queued && max_loops > 0)' loop does not function
>
> I am not sure another thing here
>
> AFAIK, according to the order of SMP machine shutdown, other CPUs will
> be stopped firstly, then the last CPU disable its local apic.
>
>  --machine_shutdown
>    |----......
>    |----stop_other_cpus()
>    |----local_shutdown()
>
> So, the pending interrupts exist only in BSP and only be ACKed by
> BSP. is it right?(I validated this by print the value of APIC_IRR/ISR of
> all CPUs, found only BSP had the non-zero value).

We don't know.  In the kexec on panic case we try and shutdown the other
cpus but we have a timeout because we might fail.

Further you have to be careful with the concept of boot cpu.  In the
normal kexec case shutdown on the boot cpu and leave it running.  In the
kexec on panic case we shutdown on an arbitrary cpu.

> If it is right, We will do not need check the pending interrupt for each
> cpus.

It is also cheap if there are no pending interrupts as there is nothing
to do.

> BTW, the pending interrupt check code is mixed with the local
> APIC setup code, that looks messy. How about extracting the code which
> related to the crash interrupt check, put it into a new function and
> only invoked it when the CPU is BSP?

Moving it into it's own function makes sense.  Let's not taint the
concept with ``crash''.  We don't know that the only way this will
ever happen is from kexec on panic.   We only know it was easy to
trigger the condition from kexec on panic.

There are a lot of cases I can think of that interrupts might fire while
interrupts are disabled because the kernel is booting.  A normal kexec
is also possible.

Throw in MSI interrupts and transitioning from one state to another in
non-legacy apic mode and we might be more likely to get some irqs in
a pending state.

Your patch makes me nervous as it is not just code motion but a much
more substantial change.


As much as I agree that we need to fix the regression in the apic
shutdown code that is causing problems.  What we really need to do
is to completely remove the apic shutdown code from the kexec on panic
code path.  Over the long term that will provide us with a much more
robust path to getting crash dumps and the like.


Eric


> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 3fc259b4dd2d..0350528b320d 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1408,6 +1408,57 @@ static void lapic_setup_esr(void)
>                         oldvalue, value);
>  }
>
> +
> +static void clear_crash_pending_intr(void)
> +{
> +       long long max_loops = cpu_khz ? cpu_khz : 1000000;
> +       unsigned long long tsc = 0, ntsc;
> +       unsigned int value, queued;
> +       int i, j, acked = 0;
> +
> +       if (boot_cpu_has(X86_FEATURE_TSC))
> +               tsc = rdtsc();
> +       /*
> +        * After a crash, we no longer service the interrupts and a pending
> +        * interrupt from previous kernel might still have ISR bit set.
> +        *
> +        * Most probably by now CPU has serviced that pending interrupt and
> +        * it might not have done the ack_APIC_irq() because it thought,
> +        * interrupt came from i8259 as ExtInt. LAPIC did not get EOI so it
> +        * does not clear the ISR bit and cpu thinks it has already serivced
> +        * the interrupt. Hence a vector might get locked. It was noticed
> +        * for timer irq (vector 0x31). Issue an extra EOI to clear ISR.
> +        */
> +       do {
> +               queued = 0;
> +               for (i = APIC_ISR_NR - 1; i >= 0; i--)
> +                       queued |= apic_read(APIC_IRR + i*0x10);
> +
> +               for (i = APIC_ISR_NR - 1; i >= 0; i--) {
> +                       value = apic_read(APIC_ISR + i*0x10);
> +                       for (j = 31; j >= 0; j--) {
> +                               if (value & (1<<j)) {
> +                                       ack_APIC_irq();
> +                                       acked++;
> +                               }
> +                       }
> +               }
> +               if (acked > 256) {
> +                       printk(KERN_ERR "LAPIC pending interrupts after %d
> EOI\n",
> +                              acked);
> +                       break;
> +               }
> +               if (queued) {
> +                       if (boot_cpu_has(X86_FEATURE_TSC) && cpu_khz) {
> +                               ntsc = rdtsc();
> +                               max_loops = (cpu_khz << 10) - (ntsc - tsc);
> +                       } else
> +                               max_loops--;
> +               }
> +       } while (queued && max_loops > 0);
> +       WARN_ON(max_loops <= 0);
> +}
> +
>  /**
>   * setup_local_APIC - setup the local APIC
>   *
> @@ -1417,13 +1468,7 @@ static void lapic_setup_esr(void)
>  void setup_local_APIC(void)
>  {
>         int cpu = smp_processor_id();
> -       unsigned int value, queued;
> -       int i, j, acked = 0;
> -       unsigned long long tsc = 0, ntsc;
> -       long long max_loops = cpu_khz ? cpu_khz : 1000000;
> -
> -       if (boot_cpu_has(X86_FEATURE_TSC))
> -               tsc = rdtsc();
> -               tsc = rdtsc();
> +       unsigned int value;
>
>         if (disable_apic) {
>                 disable_ioapic_support();
> @@ -1475,45 +1520,8 @@ void setup_local_APIC(void)
>         value &= ~APIC_TPRI_MASK;
>         apic_write(APIC_TASKPRI, value);
>
> -       /*
> -        * After a crash, we no longer service the interrupts and a pending
> -        * interrupt from previous kernel might still have ISR bit set.
> -        *
> -        * Most probably by now CPU has serviced that pending interrupt and
> -        * it might not have done the ack_APIC_irq() because it thought,
> -        * interrupt came from i8259 as ExtInt. LAPIC did not get EOI so it
> -        * does not clear the ISR bit and cpu thinks it has already serivced
> -        * the interrupt. Hence a vector might get locked. It was noticed
> -        * for timer irq (vector 0x31). Issue an extra EOI to clear ISR.
> -        */
> -       do {
> -               queued = 0;
> -               for (i = APIC_ISR_NR - 1; i >= 0; i--)
> -                       queued |= apic_read(APIC_IRR + i*0x10);
> -
> -               for (i = APIC_ISR_NR - 1; i >= 0; i--) {
> -                       value = apic_read(APIC_ISR + i*0x10);
> -                       for (j = 31; j >= 0; j--) {
> -                               if (value & (1<<j)) {
> -                                       ack_APIC_irq();
> -                                       acked++;
> -                               }
> -                       }
> -               }
> -               if (acked > 256) {
> -                       printk(KERN_ERR "LAPIC pending interrupts after %d
> EOI\n",
> -                              acked);
> -                       break;
> -               }
> -               if (queued) {
> -                       if (boot_cpu_has(X86_FEATURE_TSC) && cpu_khz) {
> -                               ntsc = rdtsc();
> -                               max_loops = (cpu_khz << 10) - (ntsc - tsc);
> -                       } else
> -                               max_loops--;
> -               }
> -       } while (queued && max_loops > 0);
> -       WARN_ON(max_loops <= 0);
> +       if (!cpu)
> +               clear_crash_pending_intr();
>
>         /*
>          * Now that we are all set up, enable the APIC
>
>
> Thanks,
> 	dou.

  parent reply	other threads:[~2018-02-13 17:41 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-09 12:10 [PATCH v3 0/5] x86/apic: Fix restoring boot irq mode in reboot and kexec/kdump Baoquan He
2018-02-09 12:10 ` [PATCH v3 1/5] x86/apic: Split out restore_boot_irq_mode from disable_IO_APIC Baoquan He
2018-02-09 12:10 ` [PATCH v3 2/5] x86/apic: Fix restoring boot irq mode in reboot and kexec/kdump Baoquan He
2018-02-12  3:08   ` Eric W. Biederman
2018-02-12 10:03     ` Baoquan He
2018-02-13  2:43     ` Dou Liyang
2018-02-13  3:24       ` Baoquan He
2018-02-13 17:40       ` Eric W. Biederman [this message]
2018-02-14  3:22         ` Dou Liyang
2018-02-13  7:43     ` Baoquan He
2018-02-13 17:44       ` Eric W. Biederman
2018-02-14  2:44         ` Baoquan He
2018-02-09 12:10 ` [PATCH v3 3/5] x86/apic: Remove useless disable_IO_APIC Baoquan He
2018-02-09 12:10 ` [PATCH v3 4/5] x86/apic: Rename variable/function related to x86_io_apic_ops Baoquan He
2018-02-09 12:10 ` [PATCH v3 5/5] x86/apic: Set up through-local-APIC on boot CPU if 'noapic' specified Baoquan He
2018-02-12  4:06 ` [PATCH v3 0/5] x86/apic: Fix restoring boot irq mode in reboot and kexec/kdump Dou Liyang
2018-02-12  5:11   ` Eric W. Biederman
2018-02-12  8:58     ` Dou Liyang
2018-02-12  9:59     ` Baoquan He

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87vaf03i06.fsf@xmission.com \
    --to=ebiederm@xmission.com \
    --cc=bhe@redhat.com \
    --cc=douly.fnst@cn.fujitsu.com \
    --cc=joro@8bytes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=prarit@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=uobergfe@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).