public inbox for linux-acpi@vger.kernel.org
 help / color / mirror / Atom feed
From: "Zhang, Rui" <rui.zhang@intel.com>
To: "jmattson@google.com" <jmattson@google.com>
Cc: "ajorgens@google.com" <ajorgens@google.com>,
	"myrade@google.com" <myrade@google.com>,
	"bp@alien8.de" <bp@alien8.de>, "x86@kernel.org" <x86@kernel.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"Tang, Feng" <feng.tang@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"Wysocki, Rafael J" <rafael.j.wysocki@intel.com>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	"jay.chen@amd.com" <jay.chen@amd.com>,
	"vladteodor@google.com" <vladteodor@google.com>,
	"jon.grimm@amd.com" <jon.grimm@amd.com>
Subject: Re: [RFC PATCH] x86/acpi: Ignore invalid x2APIC entries
Date: Tue, 15 Oct 2024 03:23:27 +0000	[thread overview]
Message-ID: <ac85fe73b3d7f46ce96d5033f8cf58a1b6b001f3.camel@intel.com> (raw)
In-Reply-To: <CALMp9eQRsQ7hs9vhDGzbyRaqyOuaHDFFgc6VSr9Ui1=J_4s9Nw@mail.gmail.com>

On Mon, 2024-10-14 at 11:00 -0700, Jim Mattson wrote:
> On Mon, Oct 14, 2024 at 6:05 AM Zhang, Rui <rui.zhang@intel.com>
> wrote:
> > 
> > > > > 
> > > > > TBH, I'm not sure that there is actually anything wrong with
> > > > > the
> > > > > new
> > > > > numbering scheme.
> > > > >  The topology is reported correctly (e.g. in
> > > > > /sys/devices/system/cpu/cpu0/topology/thread_siblings_list).
> > > > > Yet,
> > > > > the
> > > > > new enumeration does seem to contradict user expectations.
> > > > > 
> > > > 
> > > > Well, we can say this is a violation of the ACPI spec.
> > > > "OSPM should initialize processors in the order that they
> > > > appear in
> > > > the
> > > > MADT." even for interleaved LAPIC and X2APIC entries.
> > > 
> > > Ah. Thanks. I didn't know that.
> > > 
> > > > Maybe we need two steps for LAPIC/X2APIC parsing.
> > > > 1. check if there is valid LAPIC entry by going through all
> > > > LAPIC
> > > > entries first
> > > > 2. parse LAPIC/X2APIC strictly following the order in MADT.
> > > > (like
> > > > we do
> > > > before)
> > > 
> > > That makes sense to me.
> > > 
> > > Thanks,
> > > 
> > > --jim
> > 
> > Hi, Jim,
> > 
> > Please check if below patch restores the CPU IDs or not.
> > 
> > thanks,
> > rui
> > 
> > From ec786dfe693cad2810b54b0d8afbfc7e4c4b3f8a Mon Sep 17 00:00:00
> > 2001
> > From: Zhang Rui <rui.zhang@intel.com>
> > Date: Mon, 14 Oct 2024 13:26:55 +0800
> > Subject: [PATCH] x86/acpi: Fix LAPIC/x2APIC parsing order
> > 
> > On some systems, the same CPU (with same APIC ID) is assigned with
> > a
> > different logical CPU id after commit ec9aedb2aa1a ("x86/acpi:
> > Ignore
> > invalid x2APIC entries").
> > 
> > This means Linux enumerates the CPUs in a different order and it is
> > a
> > violation of
> > https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#madt-processor-local-apic-sapic-structure-entry-order
> > ,
> > 
> >   "OSPM should initialize processors in the order that they appear
> > in
> >    the MADT"
> > 
> > The offending commit wants to ignore x2APIC entries with APIC ID <
> > 255
> > when valid LAPIC entries exist, so it parses all LAPIC entries
> > before
> > parsing any x2APIC entries. This breaks the CPU enumeration order
> > for
> > systems that have x2APIC entries listed before LAPIC entries in
> > MADT.
> > 
> > Fix the problem by checking the valid LAPIC entries separately,
> > before
> > parsing any LAPIC/x2APIC entries.
> > 
> > Cc: stable@vger.kernel.org
> > Reported-by: Jim Mattson <jmattson@google.com>
> > Closes:
> > https://lore.kernel.org/all/20241010213136.668672-1-jmattson@google.com/
> > Fixes: ec9aedb2aa1a ("x86/acpi: Ignore invalid x2APIC entries")
> > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > ---
> >  arch/x86/kernel/acpi/boot.c | 50
> > +++++++++++++++++++++++++++++++++----
> >  1 file changed, 45 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/acpi/boot.c
> > b/arch/x86/kernel/acpi/boot.c
> > index 4efecac49863..c70b86f1f295 100644
> > --- a/arch/x86/kernel/acpi/boot.c
> > +++ b/arch/x86/kernel/acpi/boot.c
> > @@ -226,6 +226,28 @@ acpi_parse_x2apic(union acpi_subtable_headers
> > *header, const unsigned long end)
> >         return 0;
> >  }
> > 
> > +static int __init
> > +acpi_check_lapic(union acpi_subtable_headers *header, const
> > unsigned long end)
> > +{
> > +       struct acpi_madt_local_apic *processor = NULL;
> > +
> > +       processor = (struct acpi_madt_local_apic *)header;
> > +
> > +       if (BAD_MADT_ENTRY(processor, end))
> > +               return -EINVAL;
> > +
> > +       /* Ignore invalid ID */
> > +       if (processor->id == 0xff)
> > +               return 0;
> > +
> > +       /* Ignore processors that can not be onlined */
> > +       if (!acpi_is_processor_usable(processor->lapic_flags))
> > +               return 0;
> > +
> > +       has_lapic_cpus = true;
> > +       return 0;
> > +}
> > +
> >  static int __init
> >  acpi_parse_lapic(union acpi_subtable_headers * header, const
> > unsigned long end)
> >  {
> > @@ -257,7 +279,6 @@ acpi_parse_lapic(union acpi_subtable_headers *
> > header, const unsigned long end)
> >                                processor->processor_id, /* ACPI ID
> > */
> >                                processor->lapic_flags &
> > ACPI_MADT_ENABLED);
> > 
> > -       has_lapic_cpus = true;
> >         return 0;
> >  }
> > 
> > @@ -1029,6 +1050,8 @@ static int __init
> > early_acpi_parse_madt_lapic_addr_ovr(void)
> >  static int __init acpi_parse_madt_lapic_entries(void)
> >  {
> >         int count, x2count = 0;
> > +       struct acpi_subtable_proc madt_proc[2];
> > +       int ret;
> > 
> >         if (!boot_cpu_has(X86_FEATURE_APIC))
> >                 return -ENODEV;
> > @@ -1037,10 +1060,27 @@ static int __init
> > acpi_parse_madt_lapic_entries(void)
> >                                       acpi_parse_sapic,
> > MAX_LOCAL_APIC);
> > 
> >         if (!count) {
> > -               count =
> > acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC,
> > -                                       acpi_parse_lapic,
> > MAX_LOCAL_APIC);
> > -               x2count =
> > acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_X2APIC,
> > -                                       acpi_parse_x2apic,
> > MAX_LOCAL_APIC);
> 
> The point is moot now, but I don't think the previous code did the
> right thing when acpi_table_parse_madt() returned a negative value
> (for errors).

Previous and current code checks for the negative value later after
parsing both LAPIC and x2APIC.
so what is the problem you're referring to?
Do you mean we should error out immediately when parsing LAPIC fails?

> 
> > +               /* Check if there are valid LAPIC entries */
> > +               acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC,
> > acpi_check_lapic, MAX_LOCAL_APIC);
> 
> Two comments:
> 
> 1) Should we check for a return value < 0 here, or just wait for one
> of the later walks to error out?

I'm okay with both.

> 2) It seems unfortunate to walk the entire table when the first entry
> may give you the answer, but perhaps modern systems have only X2APIC
> entries, so we will typically have to walk the entire table anyway.

yeah. There are systems with invalid LAPIC entries first, and
acpi_parse_entries_array() doesn't support graceful early termination,
so we have to check all the entries.

> 
> Reviewed-and-tested-by: Jim Mattson <jmattson@google.com>

Thanks. I will submit the current version to keep your tags.

-rui

  reply	other threads:[~2024-10-15  3:23 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-02 16:28 [RFC PATCH] x86/acpi: Ignore invalid x2APIC entries Zhang Rui
2023-07-28 12:51 ` Thomas Gleixner
2023-07-28 12:55   ` Thomas Gleixner
2023-07-28 16:47   ` Zhang, Rui
2023-07-29  7:07     ` Thomas Gleixner
2023-07-31 13:04       ` Zhang, Rui
2024-10-10 21:31 ` Jim Mattson
2024-10-11  1:37   ` Zhang, Rui
2024-10-11  3:05     ` Jim Mattson
2024-10-14 13:05       ` Zhang, Rui
2024-10-14 18:00         ` Jim Mattson
2024-10-15  3:23           ` Zhang, Rui [this message]
2024-10-15 13:26             ` Jim Mattson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ac85fe73b3d7f46ce96d5033f8cf58a1b6b001f3.camel@intel.com \
    --to=rui.zhang@intel.com \
    --cc=ajorgens@google.com \
    --cc=bp@alien8.de \
    --cc=feng.tang@intel.com \
    --cc=jay.chen@amd.com \
    --cc=jmattson@google.com \
    --cc=jon.grimm@amd.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=myrade@google.com \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=tglx@linutronix.de \
    --cc=vladteodor@google.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox