From: Bjorn Helgaas <bhelgaas@google.com>
To: Prarit Bhargava <prarit@redhat.com>
Cc: linux-pci@vger.kernel.org, Myron Stowe <mstowe@redhat.com>,
Alexander Duyck <alexander.h.duyck@redhat.com>,
Jiang Liu <jiang.liu@linux.intel.com>
Subject: Re: [PATCH V2] pci, add sysfs numa_node write function
Date: Mon, 10 Nov 2014 13:37:33 -0700
Message-ID: <20141110203733.GA20527@google.com>
In-Reply-To: <1414088532-24605-1-git-send-email-prarit@redhat.com>
On Thu, Oct 23, 2014 at 02:22:12PM -0400, Prarit Bhargava wrote:
> Some new drivers, such as the Intel QAT driver, drivers/crypto/qat,
> require that a specific node be assigned to the device in order to
> achieve maximum performance for the device, and will fail to load if the
> device has NUMA_NO_NODE. Users can in some cases, with additional
> information provided by vendor support, determine what the correct numa
> node is supposed to be. In these cases, a quick hack of the driver results
> in a functional QAT device.
>
> In theory, it should be possible to map a PCI device to a PCI root bridge
> to a specific node; in practice, however, it is not possible. Nodes may
> have multiple PCI root bridges, may share multiple PCI root bridges, or
> may not have an active root bridge assigned. Hardware manufacturers may
> specifically have designed systems without numa node to PCI root bridge
> mappings. Without assistance from some hardware reporting mechanism
> (SMBIOS, ACPI, etc.) there is no reliable way to determine the numa node
> for a PCI bridge or device. Typically this numa mapping is done via the
> ACPI _PXM values in the ACPI tables; however, there are many systems out
> there that do not populate the ACPI _PXM entries and therefore do not have
> correct PCI device numa_node values.
>
> Hardware vendors are accepting of reported bugs for the ACPI _PXM entries,
> but production fixes are typically seen in 6 months to a year and in
> some past cases, never.
>
> This patch introduces a mechanism to allow a user that knows the correct
> value of the numa node to set it via sysfs. As suggested by Alexander
> and Bjorn, the setting of the value issues a loud FW_BUG message and
> taints the kernel to notify the user that the issue really is a firmware bug.
>
> To use this, one can do
>
> echo 3 > /sys/devices/pci0000:ff/0000:03:1f.3/numa_node
>
> to set the numa node for PCI device 0000:03:1f.3.
>
> Cc: Myron Stowe <mstowe@redhat.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
> Cc: Jiang Liu <jiang.liu@linux.intel.com>
> Cc: linux-pci@vger.kernel.org
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Applied to pci/numa for v3.19, thanks!
>
> [v2]: add warning about broken BIOS, rework message after attempting
> to determine numa node on a wide number of broken systems, add
> Documentation.
> ---
> Documentation/ABI/testing/sysfs-bus-pci | 13 +++++++++++++
> drivers/pci/pci-sysfs.c | 29 ++++++++++++++++++++++++++++-
> 2 files changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
> index ee6c040..76007b3 100644
> --- a/Documentation/ABI/testing/sysfs-bus-pci
> +++ b/Documentation/ABI/testing/sysfs-bus-pci
> @@ -281,3 +281,16 @@ Description:
> opt-out of driver binding using a driver_override name such as
> "none". Only a single driver may be specified in the override,
> there is no support for parsing delimiters.
> +
> +What: /sys/bus/pci/devices/.../numa_node
> +Date: Oct 2014
> +Contact: Prarit Bhargava <prarit@redhat.com>
> +Description:
> + This file contains the value of the NUMA node that the PCI
> + device is attached to, or -1 if the device is attached to
> + multiple nodes. The file can be written to to override the
> + value if the user determines that the system's firmware has
> + provided an incorrect value. If this file is written to
> + the user should report a firmware bug to the system vendor.
> + Writing to this file will result in kernel taint of
> + TAINT_FIRMWARE_WORKAROUND.
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 92b6d9a..e5a4664 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -221,12 +221,39 @@ static ssize_t enabled_show(struct device *dev, struct device_attribute *attr,
> static DEVICE_ATTR_RW(enabled);
>
> #ifdef CONFIG_NUMA
> +static ssize_t numa_node_store(struct device *dev,
> +			       struct device_attribute *attr,
> +			       const char *buf, size_t count)
> +{
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +	int node, ret;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	ret = kstrtoint(buf, 0, &node);
> +	if (ret)
> +		return ret;
> +
> +	if (!node_online(node))
> +		return -EINVAL;
> +
> +	add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
> +	dev_alert(&pdev->dev,
> +		  FW_BUG "Overriding NUMA node to %d. Contact your vendor for updates.",
> +		  node);
> +
> +	dev->numa_node = node;
> +
> +	return count;
> +}
> +
> static ssize_t numa_node_show(struct device *dev, struct device_attribute *attr,
> char *buf)
> {
> return sprintf(buf, "%d\n", dev->numa_node);
> }
> -static DEVICE_ATTR_RO(numa_node);
> +static DEVICE_ATTR_RW(numa_node);
> #endif
>
> static ssize_t dma_mask_bits_show(struct device *dev,
> --
> 1.7.9.3
>
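
For readers following the parse-and-validate path in numa_node_store()
above, here is a rough userspace sketch of the same steps. It is not part
of the patch. Everything prefixed sketch_ is hypothetical: the online-node
mask is a made-up stand-in for the kernel's node_online(), strtol() stands
in for kstrtoint(), and all failures are collapsed to -EINVAL for brevity
(kstrtoint() can also return -ERANGE). The explicit range check before the
mask lookup is an addition of the sketch; the patch passes the parsed value
straight to node_online().

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's online-node mask: nodes 0-3 online. */
#define SKETCH_MAX_NODES 64
static const unsigned long long sketch_online_mask = 0xfULL;

static bool sketch_node_online(int node)
{
	/* Range check first so the shift below is always well defined. */
	if (node < 0 || node >= SKETCH_MAX_NODES)
		return false;
	return (sketch_online_mask >> node) & 1ULL;
}

/*
 * Mirrors the parse-and-validate steps of numa_node_store().
 * Returns the node number on success or a negative errno value.
 */
static int sketch_parse_numa_node(const char *buf)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(buf, &end, 0);	/* base 0, as in kstrtoint(buf, 0, ...) */
	if (end == buf || errno == ERANGE || val < INT_MIN || val > INT_MAX)
		return -EINVAL;
	if (*end == '\n')		/* a write from echo ends in a newline */
		end++;
	if (*end != '\0')		/* reject any other trailing characters */
		return -EINVAL;
	if (!sketch_node_online((int)val))
		return -EINVAL;
	return (int)val;
}
```

With the assumed mask, sketch_parse_numa_node("3\n") accepts node 3, while
"7", "-1" and "abc" are all rejected with -EINVAL.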