public inbox for linux-kernel@vger.kernel.org
* [RESEND] [PATCH] x86/smpboot: avoid warning messages while hot-removing physical cpu
@ 2018-02-08 14:19 Masayoshi Mizuma
  2018-02-13 11:48 ` [PATCH] x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU Ingo Molnar
  2018-02-13 12:13 ` [tip:x86/urgent] " tip-bot for Masayoshi Mizuma
  0 siblings, 2 replies; 4+ messages in thread
From: Masayoshi Mizuma @ 2018-02-08 14:19 UTC
  To: tglx, mingo, hpa, x86, yasu.isimatu, ak; +Cc: linux-kernel

From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>

When a physical CPU is hot-removed, the following warning messages
are shown while the uncore device is being removed in uncore_pci_remove():

WARNING: CPU: 120 PID: 5 at arch/x86/events/intel/uncore.c:988
uncore_pci_remove+0xf1/0x110
...
CPU: 120 PID: 5 Comm: kworker/u1024:0 Not tainted 4.15.0-rc8 #1
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
...
Call Trace:
pci_device_remove+0x36/0xb0
device_release_driver_internal+0x145/0x210
pci_stop_bus_device+0x76/0xa0
pci_stop_root_bus+0x44/0x60
acpi_pci_root_remove+0x1f/0x80
acpi_bus_trim+0x54/0x90
acpi_bus_trim+0x2e/0x90
acpi_device_hotplug+0x2bc/0x4b0
acpi_hotplug_work_fn+0x1a/0x30
process_one_work+0x141/0x340
worker_thread+0x47/0x3e0
kthread+0xf5/0x130

When uncore_pci_remove() runs, it tries to get the package ID to
clear the value of uncore_extra_pci_dev[].dev[] by using
topology_phys_to_logical_pkg(). The warning messages are
shown because topology_phys_to_logical_pkg() returns -1.

  arch/x86/events/intel/uncore.c:
  static void uncore_pci_remove(struct pci_dev *pdev)
  {
  ...
          phys_id = uncore_pcibus_to_physid(pdev->bus);
  ...
                  pkg = topology_phys_to_logical_pkg(phys_id); //returns -1
                  for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
                          if (uncore_extra_pci_dev[pkg].dev[i] == pdev) {
                                  uncore_extra_pci_dev[pkg].dev[i] = NULL;
                                  break;
                          }
                  }
                  WARN_ON_ONCE(i >= UNCORE_EXTRA_PCI_DEV_MAX); //HERE!!

topology_phys_to_logical_pkg() tries to find
cpuinfo_x86->phys_proc_id that matches the phys_pkg argument.

  arch/x86/kernel/smpboot.c:
  int topology_phys_to_logical_pkg(unsigned int phys_pkg)
  {
          int cpu;
  
          for_each_possible_cpu(cpu) {
                  struct cpuinfo_x86 *c = &cpu_data(cpu);
  
                  if (c->initialized && c->phys_proc_id == phys_pkg)
                          return c->logical_proc_id;
          }
          return -1;
  }

However, phys_proc_id was already set to 0 by remove_siblinginfo()
when the CPU was offlined.
So, topology_phys_to_logical_pkg() cannot find the correct
logical_proc_id and always returns -1.
As a result, uncore_pci_remove() triggers WARN_ON_ONCE() and the warning
messages are shown.
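
The failure described above can be reproduced with a small userspace
model. This is only a sketch, not kernel code: struct cpu_model, the
model_* names and the four-CPU layout are invented for illustration,
and it assumes that ->initialized stays set after a CPU goes offline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_MODEL_CPUS 4

/* Simplified stand-in for the struct cpuinfo_x86 fields used by the lookup. */
struct cpu_model {
	bool initialized;
	unsigned int phys_proc_id;
	unsigned int cpu_core_id;
	int logical_proc_id;
};

/* CPUs 0-1: physical package 0 (logical 0); CPUs 2-3: physical package 2 (logical 1). */
static struct cpu_model model_cpus[NR_MODEL_CPUS] = {
	{ true, 0, 0, 0 }, { true, 0, 1, 0 },
	{ true, 2, 0, 1 }, { true, 2, 1, 1 },
};

/* Same search as topology_phys_to_logical_pkg(), run over the model array. */
static int model_phys_to_logical_pkg(unsigned int phys_pkg)
{
	for (size_t cpu = 0; cpu < NR_MODEL_CPUS; cpu++) {
		const struct cpu_model *c = &model_cpus[cpu];

		if (c->initialized && c->phys_proc_id == phys_pkg)
			return c->logical_proc_id;
	}
	return -1;
}

/* Pre-patch behaviour (hypothetical helper): offlining also zeroes phys_proc_id. */
static void model_offline_cpu_old(size_t cpu)
{
	assert(cpu < NR_MODEL_CPUS);
	model_cpus[cpu].phys_proc_id = 0;
	model_cpus[cpu].cpu_core_id = 0;
}
```

In this model, looking up physical package 2 finds logical package 1
while CPUs 2 and 3 still carry their phys_proc_id. Once both have gone
through model_offline_cpu_old(), no entry matches package 2 any more
and the lookup falls through to -1.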

To avoid this, stop clearing phys_proc_id in remove_siblinginfo().
This has no side effects, because phys_proc_id is not used after the
CPU is hot-removed, and it is set again when the CPU is hot-added.

Fixes: 30bb9811856f ("x86/topology: Avoid wasting 128k for package id array")
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
 
---
 arch/x86/kernel/smpboot.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ed556d5..844279c 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1431,7 +1431,6 @@ static void remove_siblinginfo(int cpu)
 	cpumask_clear(cpu_llc_shared_mask(cpu));
 	cpumask_clear(topology_sibling_cpumask(cpu));
 	cpumask_clear(topology_core_cpumask(cpu));
-	c->phys_proc_id = 0;
 	c->cpu_core_id = 0;
 	cpumask_clear_cpu(cpu, cpu_sibling_setup_mask);
 	recompute_smt_state();
-- 
1.8.3.1

- Masayoshi


* [PATCH] x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU
  2018-02-08 14:19 [RESEND] [PATCH] x86/smpboot: avoid warning messages while hot-removing physical cpu Masayoshi Mizuma
@ 2018-02-13 11:48 ` Ingo Molnar
  2018-02-13 16:05   ` Masayoshi Mizuma
  2018-02-13 12:13 ` [tip:x86/urgent] " tip-bot for Masayoshi Mizuma
  1 sibling, 1 reply; 4+ messages in thread
From: Ingo Molnar @ 2018-02-13 11:48 UTC
  To: Masayoshi Mizuma; +Cc: tglx, mingo, hpa, x86, yasu.isimatu, ak, linux-kernel


* Masayoshi Mizuma <msys.mizuma@gmail.com> wrote:

> From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
> 
> When a physical CPU is hot-removed, the following warning messages
> are shown while the uncore device is being removed in uncore_pci_remove():
> 
> WARNING: CPU: 120 PID: 5 at arch/x86/events/intel/uncore.c:988
> uncore_pci_remove+0xf1/0x110
> ...
> CPU: 120 PID: 5 Comm: kworker/u1024:0 Not tainted 4.15.0-rc8 #1
> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> ...
> Call Trace:
> pci_device_remove+0x36/0xb0
> device_release_driver_internal+0x145/0x210
> pci_stop_bus_device+0x76/0xa0
> pci_stop_root_bus+0x44/0x60
> acpi_pci_root_remove+0x1f/0x80
> acpi_bus_trim+0x54/0x90
> acpi_bus_trim+0x2e/0x90
> acpi_device_hotplug+0x2bc/0x4b0
> acpi_hotplug_work_fn+0x1a/0x30
> process_one_work+0x141/0x340
> worker_thread+0x47/0x3e0
> kthread+0xf5/0x130
> 
> When uncore_pci_remove() runs, it tries to get the package ID to
> clear the value of uncore_extra_pci_dev[].dev[] by using
> topology_phys_to_logical_pkg(). The warning messages are
> shown because topology_phys_to_logical_pkg() returns -1.
> 
>   arch/x86/events/intel/uncore.c:
>   static void uncore_pci_remove(struct pci_dev *pdev)
>   {
>   ...
>           phys_id = uncore_pcibus_to_physid(pdev->bus);
>   ...
>                   pkg = topology_phys_to_logical_pkg(phys_id); //returns -1
>                   for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
>                           if (uncore_extra_pci_dev[pkg].dev[i] == pdev) {
>                                   uncore_extra_pci_dev[pkg].dev[i] = NULL;
>                                   break;
>                           }
>                   }
>                   WARN_ON_ONCE(i >= UNCORE_EXTRA_PCI_DEV_MAX); //HERE!!
> 
> topology_phys_to_logical_pkg() tries to find
> cpuinfo_x86->phys_proc_id that matches the phys_pkg argument.
> 
>   arch/x86/kernel/smpboot.c:
>   int topology_phys_to_logical_pkg(unsigned int phys_pkg)
>   {
>           int cpu;
>   
>           for_each_possible_cpu(cpu) {
>                   struct cpuinfo_x86 *c = &cpu_data(cpu);
>   
>                   if (c->initialized && c->phys_proc_id == phys_pkg)
>                           return c->logical_proc_id;
>           }
>           return -1;
>   }
> 
> However, phys_proc_id was already set to 0 by remove_siblinginfo()
> when the CPU was offlined.
> So, topology_phys_to_logical_pkg() cannot find the correct
> logical_proc_id and always returns -1.
> As a result, uncore_pci_remove() triggers WARN_ON_ONCE() and the warning
> messages are shown.
> 
> To avoid this, stop clearing phys_proc_id in remove_siblinginfo().
> This has no side effects, because phys_proc_id is not used after the
> CPU is hot-removed, and it is set again when the CPU is hot-added.

So I think this fix goes beyond fixing a 'warning', if we get -1 for 'pkg':

>                   pkg = topology_phys_to_logical_pkg(phys_id); //returns -1
>                   for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
>                           if (uncore_extra_pci_dev[pkg].dev[i] == pdev) {
>                                   uncore_extra_pci_dev[pkg].dev[i] = NULL;

... then that creates two _real_ bugs AFAICS:

 1) we dereference uncore_extra_pci_dev[] with a negative index

 2) we fail to clean up a stale pointer in uncore_extra_pci_dev[][]
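
Both problems can be seen in a small userspace model of the slot
clearing. This is a sketch only: struct model_extra_dev and the
model_* names are invented stand-ins for uncore_extra_pci_dev[], and
the range check is a hypothetical defensive guard for illustration,
not something the patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PKGS      2
#define MODEL_EXTRA_DEV_MAX 4

/* Stand-in for one per-package entry of uncore_extra_pci_dev[]. */
struct model_extra_dev {
	const void *dev[MODEL_EXTRA_DEV_MAX];
};

static struct model_extra_dev model_extra[MODEL_MAX_PKGS];

/*
 * Clear the slot that holds pdev for logical package pkg.
 * Returns true if a slot was cleared, false if pkg is out of range
 * (for example the -1 from a failed lookup) or pdev is not registered.
 */
static bool model_clear_extra_dev(int pkg, const void *pdev)
{
	assert(pdev != NULL);

	/* Hypothetical guard: never index the array with a bogus pkg. */
	if (pkg < 0 || pkg >= MODEL_MAX_PKGS)
		return false;

	for (size_t i = 0; i < MODEL_EXTRA_DEV_MAX; i++) {
		if (model_extra[pkg].dev[i] == pdev) {
			model_extra[pkg].dev[i] = NULL;
			return true;
		}
	}
	return false;
}
```

With pkg == -1 the guard avoids the out-of-bounds read, but the slot
that actually holds pdev is still left untouched. A guard alone would
therefore only address 1), which is why the lookup itself has to keep
working.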

So I've rewritten your changelog accordingly - see the attached patch.

I have also added a Cc: stable tag.

Thanks,

	Ingo

===================>
From 295cc7eb314eb3321fb6d67ca6f7305f5c50d10f Mon Sep 17 00:00:00 2001
From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Date: Thu, 8 Feb 2018 09:19:08 -0500
Subject: [PATCH] x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU

When a physical CPU is hot-removed, the following warning messages
are shown while the uncore device is removed in uncore_pci_remove():

  WARNING: CPU: 120 PID: 5 at arch/x86/events/intel/uncore.c:988
  uncore_pci_remove+0xf1/0x110
  ...
  CPU: 120 PID: 5 Comm: kworker/u1024:0 Not tainted 4.15.0-rc8 #1
  Workqueue: kacpi_hotplug acpi_hotplug_work_fn
  ...
  Call Trace:
  pci_device_remove+0x36/0xb0
  device_release_driver_internal+0x145/0x210
  pci_stop_bus_device+0x76/0xa0
  pci_stop_root_bus+0x44/0x60
  acpi_pci_root_remove+0x1f/0x80
  acpi_bus_trim+0x54/0x90
  acpi_bus_trim+0x2e/0x90
  acpi_device_hotplug+0x2bc/0x4b0
  acpi_hotplug_work_fn+0x1a/0x30
  process_one_work+0x141/0x340
  worker_thread+0x47/0x3e0
  kthread+0xf5/0x130

When uncore_pci_remove() runs, it tries to get the package ID to
clear the value of uncore_extra_pci_dev[].dev[] by using
topology_phys_to_logical_pkg(). The warning messages are
shown because topology_phys_to_logical_pkg() returns -1.

  arch/x86/events/intel/uncore.c:
  static void uncore_pci_remove(struct pci_dev *pdev)
  {
  ...
          phys_id = uncore_pcibus_to_physid(pdev->bus);
  ...
                  pkg = topology_phys_to_logical_pkg(phys_id); // returns -1
                  for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
                          if (uncore_extra_pci_dev[pkg].dev[i] == pdev) {
                                  uncore_extra_pci_dev[pkg].dev[i] = NULL;
                                  break;
                          }
                  }
                  WARN_ON_ONCE(i >= UNCORE_EXTRA_PCI_DEV_MAX); // <=========== HERE!!

topology_phys_to_logical_pkg() tries to find
cpuinfo_x86->phys_proc_id that matches the phys_pkg argument.

  arch/x86/kernel/smpboot.c:
  int topology_phys_to_logical_pkg(unsigned int phys_pkg)
  {
          int cpu;

          for_each_possible_cpu(cpu) {
                  struct cpuinfo_x86 *c = &cpu_data(cpu);

                  if (c->initialized && c->phys_proc_id == phys_pkg)
                          return c->logical_proc_id;
          }
          return -1;
  }

However, the phys_proc_id was already set to 0 by remove_siblinginfo()
when the CPU was offlined.

So, topology_phys_to_logical_pkg() cannot find the correct
logical_proc_id and always returns -1.

As a result, uncore_pci_remove() triggers WARN_ON_ONCE() and the warning
messages are shown.

What is worse is that the bogus 'pkg' index results in two bugs:

 - We dereference uncore_extra_pci_dev[] with a negative index
 - We fail to clean up a stale pointer in uncore_extra_pci_dev[][]

To fix these bugs, remove the clearing of ->phys_proc_id from remove_siblinginfo().

This should not cause any problems, because ->phys_proc_id is not
used after the CPU is hot-removed, and it is set again during hot-add.
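
The post-patch behaviour can be sketched in userspace as follows. This
is an illustration under assumptions, not the kernel implementation:
struct cpu_model and the model_* helpers are invented names, and
model_hot_add() simply overwrites the IDs, leaving logical_proc_id
alone for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the struct cpuinfo_x86 fields involved. */
struct cpu_model {
	bool initialized;
	unsigned int phys_proc_id;
	unsigned int cpu_core_id;
	int logical_proc_id;
};

/* Post-patch model: only cpu_core_id is cleared, phys_proc_id is kept. */
static void model_remove_siblinginfo(struct cpu_model *c)
{
	assert(c != NULL);
	c->cpu_core_id = 0;
}

/* Hot-add model: the IDs are written again, so an old value cannot leak. */
static void model_hot_add(struct cpu_model *c, unsigned int phys_pkg,
			  unsigned int core_id)
{
	assert(c != NULL);
	c->phys_proc_id = phys_pkg;
	c->cpu_core_id = core_id;
	c->initialized = true;
}

/* Same search as topology_phys_to_logical_pkg(), over a caller-supplied array. */
static int model_lookup(const struct cpu_model *cpus, size_t n,
			unsigned int phys_pkg)
{
	for (size_t cpu = 0; cpu < n; cpu++) {
		if (cpus[cpu].initialized && cpus[cpu].phys_proc_id == phys_pkg)
			return cpus[cpu].logical_proc_id;
	}
	return -1;
}
```

In this model the lookup for an offlined CPU's physical package still
succeeds after model_remove_siblinginfo(), and a later model_hot_add()
replaces the old ID, so keeping the value across the offline window
leaves nothing stale behind.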

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: yasu.isimatu@gmail.com
Cc: <stable@vger.kernel.org>
Fixes: 30bb9811856f ("x86/topology: Avoid wasting 128k for package id array")
Link: http://lkml.kernel.org/r/ed738d54-0f01-b38b-b794-c31dc118c207@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/smpboot.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 6f27facbaa9b..cfc61e1d45e2 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1430,7 +1430,6 @@ static void remove_siblinginfo(int cpu)
 	cpumask_clear(cpu_llc_shared_mask(cpu));
 	cpumask_clear(topology_sibling_cpumask(cpu));
 	cpumask_clear(topology_core_cpumask(cpu));
-	c->phys_proc_id = 0;
 	c->cpu_core_id = 0;
 	cpumask_clear_cpu(cpu, cpu_sibling_setup_mask);
 	recompute_smt_state();


* [tip:x86/urgent] x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU
  2018-02-08 14:19 [RESEND] [PATCH] x86/smpboot: avoid warning messages while hot-removing physical cpu Masayoshi Mizuma
  2018-02-13 11:48 ` [PATCH] x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU Ingo Molnar
@ 2018-02-13 12:13 ` tip-bot for Masayoshi Mizuma
  1 sibling, 0 replies; 4+ messages in thread
From: tip-bot for Masayoshi Mizuma @ 2018-02-13 12:13 UTC
  To: linux-tip-commits
  Cc: linux-kernel, tglx, mingo, hpa, stable, peterz, m.mizuma,
	torvalds

Commit-ID:  295cc7eb314eb3321fb6d67ca6f7305f5c50d10f
Gitweb:     https://git.kernel.org/tip/295cc7eb314eb3321fb6d67ca6f7305f5c50d10f
Author:     Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
AuthorDate: Thu, 8 Feb 2018 09:19:08 -0500
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 13 Feb 2018 12:47:28 +0100

x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU

When a physical CPU is hot-removed, the following warning messages
are shown while the uncore device is removed in uncore_pci_remove():

  WARNING: CPU: 120 PID: 5 at arch/x86/events/intel/uncore.c:988
  uncore_pci_remove+0xf1/0x110
  ...
  CPU: 120 PID: 5 Comm: kworker/u1024:0 Not tainted 4.15.0-rc8 #1
  Workqueue: kacpi_hotplug acpi_hotplug_work_fn
  ...
  Call Trace:
  pci_device_remove+0x36/0xb0
  device_release_driver_internal+0x145/0x210
  pci_stop_bus_device+0x76/0xa0
  pci_stop_root_bus+0x44/0x60
  acpi_pci_root_remove+0x1f/0x80
  acpi_bus_trim+0x54/0x90
  acpi_bus_trim+0x2e/0x90
  acpi_device_hotplug+0x2bc/0x4b0
  acpi_hotplug_work_fn+0x1a/0x30
  process_one_work+0x141/0x340
  worker_thread+0x47/0x3e0
  kthread+0xf5/0x130

When uncore_pci_remove() runs, it tries to get the package ID to
clear the value of uncore_extra_pci_dev[].dev[] by using
topology_phys_to_logical_pkg(). The warning messages are
shown because topology_phys_to_logical_pkg() returns -1.

  arch/x86/events/intel/uncore.c:
  static void uncore_pci_remove(struct pci_dev *pdev)
  {
  ...
          phys_id = uncore_pcibus_to_physid(pdev->bus);
  ...
                  pkg = topology_phys_to_logical_pkg(phys_id); // returns -1
                  for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
                          if (uncore_extra_pci_dev[pkg].dev[i] == pdev) {
                                  uncore_extra_pci_dev[pkg].dev[i] = NULL;
                                  break;
                          }
                  }
                  WARN_ON_ONCE(i >= UNCORE_EXTRA_PCI_DEV_MAX); // <=========== HERE!!

topology_phys_to_logical_pkg() tries to find
cpuinfo_x86->phys_proc_id that matches the phys_pkg argument.

  arch/x86/kernel/smpboot.c:
  int topology_phys_to_logical_pkg(unsigned int phys_pkg)
  {
          int cpu;

          for_each_possible_cpu(cpu) {
                  struct cpuinfo_x86 *c = &cpu_data(cpu);

                  if (c->initialized && c->phys_proc_id == phys_pkg)
                          return c->logical_proc_id;
          }
          return -1;
  }

However, the phys_proc_id was already set to 0 by remove_siblinginfo()
when the CPU was offlined.

So, topology_phys_to_logical_pkg() cannot find the correct
logical_proc_id and always returns -1.

As a result, uncore_pci_remove() triggers WARN_ON_ONCE() and the warning
messages are shown.

What is worse is that the bogus 'pkg' index results in two bugs:

 - We dereference uncore_extra_pci_dev[] with a negative index
 - We fail to clean up a stale pointer in uncore_extra_pci_dev[][]

To fix these bugs, remove the clearing of ->phys_proc_id from remove_siblinginfo().

This should not cause any problems, because ->phys_proc_id is not
used after the CPU is hot-removed, and it is set again during hot-add.

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: yasu.isimatu@gmail.com
Cc: <stable@vger.kernel.org>
Fixes: 30bb9811856f ("x86/topology: Avoid wasting 128k for package id array")
Link: http://lkml.kernel.org/r/ed738d54-0f01-b38b-b794-c31dc118c207@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/smpboot.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 6f27fac..cfc61e1 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1430,7 +1430,6 @@ static void remove_siblinginfo(int cpu)
 	cpumask_clear(cpu_llc_shared_mask(cpu));
 	cpumask_clear(topology_sibling_cpumask(cpu));
 	cpumask_clear(topology_core_cpumask(cpu));
-	c->phys_proc_id = 0;
 	c->cpu_core_id = 0;
 	cpumask_clear_cpu(cpu, cpu_sibling_setup_mask);
 	recompute_smt_state();


* Re: [PATCH] x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU
  2018-02-13 11:48 ` [PATCH] x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU Ingo Molnar
@ 2018-02-13 16:05   ` Masayoshi Mizuma
  0 siblings, 0 replies; 4+ messages in thread
From: Masayoshi Mizuma @ 2018-02-13 16:05 UTC
  To: mingo; +Cc: tglx, mingo, hpa, x86, yasu.isimatu, ak, linux-kernel

Hello Ingo,

> So I've rewritten your changelog accordingly - see the attached patch.
> 
> I have also added a Cc: stable tag.

Thanks a lot!

- Masayoshi

Tue, 13 Feb 2018 12:48:41 +0100 Ingo Molnar wrote:
> 
> * Masayoshi Mizuma <msys.mizuma@gmail.com> wrote:
> 
>> From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
>>
>> When a physical CPU is hot-removed, the following warning messages
>> are shown while the uncore device is being removed in uncore_pci_remove():
>>
>> WARNING: CPU: 120 PID: 5 at arch/x86/events/intel/uncore.c:988
>> uncore_pci_remove+0xf1/0x110
>> ...
>> CPU: 120 PID: 5 Comm: kworker/u1024:0 Not tainted 4.15.0-rc8 #1
>> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
>> ...
>> Call Trace:
>> pci_device_remove+0x36/0xb0
>> device_release_driver_internal+0x145/0x210
>> pci_stop_bus_device+0x76/0xa0
>> pci_stop_root_bus+0x44/0x60
>> acpi_pci_root_remove+0x1f/0x80
>> acpi_bus_trim+0x54/0x90
>> acpi_bus_trim+0x2e/0x90
>> acpi_device_hotplug+0x2bc/0x4b0
>> acpi_hotplug_work_fn+0x1a/0x30
>> process_one_work+0x141/0x340
>> worker_thread+0x47/0x3e0
>> kthread+0xf5/0x130
>>
>> When uncore_pci_remove() runs, it tries to get the package ID to
>> clear the value of uncore_extra_pci_dev[].dev[] by using
>> topology_phys_to_logical_pkg(). The warning messages are
>> shown because topology_phys_to_logical_pkg() returns -1.
>>
>>   arch/x86/events/intel/uncore.c:
>>   static void uncore_pci_remove(struct pci_dev *pdev)
>>   {
>>   ...
>>           phys_id = uncore_pcibus_to_physid(pdev->bus);
>>   ...
>>                   pkg = topology_phys_to_logical_pkg(phys_id); //returns -1
>>                   for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
>>                           if (uncore_extra_pci_dev[pkg].dev[i] == pdev) {
>>                                   uncore_extra_pci_dev[pkg].dev[i] = NULL;
>>                                   break;
>>                           }
>>                   }
>>                   WARN_ON_ONCE(i >= UNCORE_EXTRA_PCI_DEV_MAX); //HERE!!
>>
>> topology_phys_to_logical_pkg() tries to find
>> cpuinfo_x86->phys_proc_id that matches the phys_pkg argument.
>>
>>   arch/x86/kernel/smpboot.c:
>>   int topology_phys_to_logical_pkg(unsigned int phys_pkg)
>>   {
>>           int cpu;
>>   
>>           for_each_possible_cpu(cpu) {
>>                   struct cpuinfo_x86 *c = &cpu_data(cpu);
>>   
>>                   if (c->initialized && c->phys_proc_id == phys_pkg)
>>                           return c->logical_proc_id;
>>           }
>>           return -1;
>>   }
>>
>> However, phys_proc_id was already set to 0 by remove_siblinginfo()
>> when the CPU was offlined.
>> So, topology_phys_to_logical_pkg() cannot find the correct
>> logical_proc_id and always returns -1.
>> As a result, uncore_pci_remove() triggers WARN_ON_ONCE() and the warning
>> messages are shown.
>>
>> To avoid this, stop clearing phys_proc_id in remove_siblinginfo().
>> This has no side effects, because phys_proc_id is not used after the
>> CPU is hot-removed, and it is set again when the CPU is hot-added.
> 
> So I think this fix goes beyond fixing a 'warning', if we get -1 for 'pkg':
> 
>>                   pkg = topology_phys_to_logical_pkg(phys_id); //returns -1
>>                   for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
>>                           if (uncore_extra_pci_dev[pkg].dev[i] == pdev) {
>>                                   uncore_extra_pci_dev[pkg].dev[i] = NULL;
> 
> ... then that creates two _real_ bugs AFAICS:
> 
>  1) we dereference uncore_extra_pci_dev[] with a negative index
> 
>  2) we fail to clean up a stale pointer in uncore_extra_pci_dev[][]
> 
> So I've rewritten your changelog accordingly - see the attached patch.
> 
> I have also added a Cc: stable tag.
> 
> Thanks,
> 
> 	Ingo
> 
> ===================>
> From 295cc7eb314eb3321fb6d67ca6f7305f5c50d10f Mon Sep 17 00:00:00 2001
> From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
> Date: Thu, 8 Feb 2018 09:19:08 -0500
> Subject: [PATCH] x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU
> 
> When a physical CPU is hot-removed, the following warning messages
> are shown while the uncore device is removed in uncore_pci_remove():
> 
>   WARNING: CPU: 120 PID: 5 at arch/x86/events/intel/uncore.c:988
>   uncore_pci_remove+0xf1/0x110
>   ...
>   CPU: 120 PID: 5 Comm: kworker/u1024:0 Not tainted 4.15.0-rc8 #1
>   Workqueue: kacpi_hotplug acpi_hotplug_work_fn
>   ...
>   Call Trace:
>   pci_device_remove+0x36/0xb0
>   device_release_driver_internal+0x145/0x210
>   pci_stop_bus_device+0x76/0xa0
>   pci_stop_root_bus+0x44/0x60
>   acpi_pci_root_remove+0x1f/0x80
>   acpi_bus_trim+0x54/0x90
>   acpi_bus_trim+0x2e/0x90
>   acpi_device_hotplug+0x2bc/0x4b0
>   acpi_hotplug_work_fn+0x1a/0x30
>   process_one_work+0x141/0x340
>   worker_thread+0x47/0x3e0
>   kthread+0xf5/0x130
> 
> When uncore_pci_remove() runs, it tries to get the package ID to
> clear the value of uncore_extra_pci_dev[].dev[] by using
> topology_phys_to_logical_pkg(). The warning messages are
> shown because topology_phys_to_logical_pkg() returns -1.
> 
>   arch/x86/events/intel/uncore.c:
>   static void uncore_pci_remove(struct pci_dev *pdev)
>   {
>   ...
>           phys_id = uncore_pcibus_to_physid(pdev->bus);
>   ...
>                   pkg = topology_phys_to_logical_pkg(phys_id); // returns -1
>                   for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
>                           if (uncore_extra_pci_dev[pkg].dev[i] == pdev) {
>                                   uncore_extra_pci_dev[pkg].dev[i] = NULL;
>                                   break;
>                           }
>                   }
>                   WARN_ON_ONCE(i >= UNCORE_EXTRA_PCI_DEV_MAX); // <=========== HERE!!
> 
> topology_phys_to_logical_pkg() tries to find
> cpuinfo_x86->phys_proc_id that matches the phys_pkg argument.
> 
>   arch/x86/kernel/smpboot.c:
>   int topology_phys_to_logical_pkg(unsigned int phys_pkg)
>   {
>           int cpu;
> 
>           for_each_possible_cpu(cpu) {
>                   struct cpuinfo_x86 *c = &cpu_data(cpu);
> 
>                   if (c->initialized && c->phys_proc_id == phys_pkg)
>                           return c->logical_proc_id;
>           }
>           return -1;
>   }
> 
> However, the phys_proc_id was already set to 0 by remove_siblinginfo()
> when the CPU was offlined.
> 
> So, topology_phys_to_logical_pkg() cannot find the correct
> logical_proc_id and always returns -1.
> 
> As a result, uncore_pci_remove() triggers WARN_ON_ONCE() and the warning
> messages are shown.
> 
> What is worse is that the bogus 'pkg' index results in two bugs:
> 
>  - We dereference uncore_extra_pci_dev[] with a negative index
>  - We fail to clean up a stale pointer in uncore_extra_pci_dev[][]
> 
> To fix these bugs, remove the clearing of ->phys_proc_id from remove_siblinginfo().
> 
> This should not cause any problems, because ->phys_proc_id is not
> used after the CPU is hot-removed, and it is set again during hot-add.
> 
> Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
> Acked-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: yasu.isimatu@gmail.com
> Cc: <stable@vger.kernel.org>
> Fixes: 30bb9811856f ("x86/topology: Avoid wasting 128k for package id array")
> Link: http://lkml.kernel.org/r/ed738d54-0f01-b38b-b794-c31dc118c207@gmail.com
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  arch/x86/kernel/smpboot.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 6f27facbaa9b..cfc61e1d45e2 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1430,7 +1430,6 @@ static void remove_siblinginfo(int cpu)
>  	cpumask_clear(cpu_llc_shared_mask(cpu));
>  	cpumask_clear(topology_sibling_cpumask(cpu));
>  	cpumask_clear(topology_core_cpumask(cpu));
> -	c->phys_proc_id = 0;
>  	c->cpu_core_id = 0;
>  	cpumask_clear_cpu(cpu, cpu_sibling_setup_mask);
>  	recompute_smt_state();
> 

