* Re: [PATCH] : Revert "ACPI: Remove side effect of partly creating a node in acpi_get_node()"
From: Jonathan Cameron @ 2022-05-12 10:15 UTC
To: Rafael J. Wysocki
Cc: Jonathan Lemon, Rafael J. Wysocki, Hanjun Guo, Barry Song, Len Brown, Jakub Kicinski, ACPI Devel Maling List, kernel-team, linux-pci, Bjorn Helgaas

On Wed, 11 May 2022 19:44:14 +0200
"Rafael J. Wysocki" <rafael@kernel.org> wrote:

> On Wed, May 11, 2022 at 7:42 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> >
> > On 11 May 2022, at 10:33, Rafael J. Wysocki wrote:
> >
> > > On Wed, May 11, 2022 at 7:24 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> > >>
> > >> This reverts commit a62d07e0006a3a3ce77041ca07f3c488ec880790.
> > >>
> > >> The change calls pxm_to_node(), which ends up returning -1
> > >> (NUMA_NO_NODE) on some systems for the pci bus, as opposed
> > >> to the prior call to acpi_map_pxm_to_node(), which returns 0.
> > >>
> > >> The default numa node is then inherited by all pci devices, and is
> > >> visible in /sys/bus/pci/devices/*/numa_node
> > >>
> > >> The prior behavior shows:
> > >> # cat /sys/bus/pci/devices/*/numa_node | sort | uniq -c
> > >> 122 0
> > >>
> > >> While the new behavior has:
> > >> # cat /sys/bus/pci/devices/*/numa_node | sort | uniq -c
> > >> 1 0

Curious, which device is turning up in node 0?

> > >> 121 -1
> > >>
> > >> While arguably NUMA_NO_NODE is correct on single-socket systems which
> > >> have only one numa domain, this breaks scripts that attempt to read the
> > >> NIC numa_node and pass that to numactl in order to pin memory allocation
> > >> when running applications (like iperf).  E.g.:
> > >>
> > >> # numactl -p -1 iperf3
> > >> libnuma: Warning: node argument -1 is out of range
> > >> <-1> is invalid
> > >>
> > >> Reverting this change restores the prior behavior.
> > >
> > > Well, that's not a recent commit and it fixed a real and serious issue.
> > >
> > > Isn't there a way to fix this other than reverting it?
> >
> > The userspace behavior changed - is there another way to fix things
> > so that a valid numa_node is returned?
>
> Well, that's my question.

As Rafael noted, we don't want to change the internal kernel representation because
the previous kernel behavior resulted in several paths where you could
get NULL pointer dereferences, but maybe we could special case
it at the userspace boundary.

e.g. override the dev_to_node() return value here
https://elixir.bootlin.com/linux/v5.18-rc6/source/drivers/pci/pci-sysfs.c#L358

What's problematic is that we missed this being an issue until now, and hence
have shipping kernels with both behaviors.

+CC Bjorn and linux-pci

Jonathan
* Re: [PATCH] : Revert "ACPI: Remove side effect of partly creating a node in acpi_get_node()"
From: Jonathan Lemon @ 2022-05-12 13:35 UTC
To: Jonathan Cameron
Cc: Rafael J. Wysocki, Rafael J. Wysocki, Hanjun Guo, Barry Song, Len Brown, Jakub Kicinski, ACPI Devel Maling List, kernel-team, linux-pci, Bjorn Helgaas

On 12 May 2022, at 3:15, Jonathan Cameron wrote:

> On Wed, 11 May 2022 19:44:14 +0200
> "Rafael J. Wysocki" <rafael@kernel.org> wrote:
>
>> On Wed, May 11, 2022 at 7:42 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
>>>
>>> On 11 May 2022, at 10:33, Rafael J. Wysocki wrote:
>>>
>>>> On Wed, May 11, 2022 at 7:24 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
>>>>>
>>>>> This reverts commit a62d07e0006a3a3ce77041ca07f3c488ec880790.
>>>>>
>>>>> The change calls pxm_to_node(), which ends up returning -1
>>>>> (NUMA_NO_NODE) on some systems for the pci bus, as opposed
>>>>> to the prior call to acpi_map_pxm_to_node(), which returns 0.
>>>>>
>>>>> The default numa node is then inherited by all pci devices, and is
>>>>> visible in /sys/bus/pci/devices/*/numa_node
>>>>>
>>>>> The prior behavior shows:
>>>>> # cat /sys/bus/pci/devices/*/numa_node | sort | uniq -c
>>>>> 122 0
>>>>>
>>>>> While the new behavior has:
>>>>> # cat /sys/bus/pci/devices/*/numa_node | sort | uniq -c
>>>>> 1 0
>
> Curious, which device is turning up in node 0?

Oddly enough, the NVMe drive:

01:00.0 Non-Volatile memory controller: SK hynix PC401 NVMe Solid State Drive 256GB (prog-if 02 [NVM Express])
	Subsystem: SK hynix PC401 NVMe Solid State Drive 256GB
	NUMA node: 0

These are single-socket Skylake DE platforms.

>>>>>
>>>>> While arguably NUMA_NO_NODE is correct on single-socket systems which
>>>>> have only one numa domain, this breaks scripts that attempt to read the
>>>>> NIC numa_node and pass that to numactl in order to pin memory allocation
>>>>> when running applications (like iperf).  E.g.:
>>>>>
>>>>> # numactl -p -1 iperf3
>>>>> libnuma: Warning: node argument -1 is out of range
>>>>> <-1> is invalid
>>>>>
>>>>> Reverting this change restores the prior behavior.
>>>>
>>>> Well, that's not a recent commit and it fixed a real and serious issue.
>>>>
>>>> Isn't there a way to fix this other than reverting it?
>>>
>>> The userspace behavior changed - is there another way to fix things
>>> so that a valid numa_node is returned?
>>
>> Well, that's my question.

This also could be a BIOS issue that wasn't noticed until the platforms
were updated to a newer kernel.
--
Jonathan

> As Rafael noted, we don't want to change the internal kernel representation because
> the previous kernel behavior resulted in several paths where you could
> get NULL pointer dereferences, but maybe we could special case
> it at the userspace boundary.
>
> e.g. override the dev_to_node() return value here
> https://elixir.bootlin.com/linux/v5.18-rc6/source/drivers/pci/pci-sysfs.c#L358
>
> What's problematic is that we missed this being an issue until now, and hence
> have shipping kernels with both behaviors.
>
> +CC Bjorn and linux-pci
>
> Jonathan