* [PATCH V2] PCI: Extend ACS configurability
@ 2024-05-21 11:09 Vidya Sagar
2024-05-21 15:44 ` kernel test robot
2024-05-23 6:35 ` [PATCH V3] " Vidya Sagar
0 siblings, 2 replies; 28+ messages in thread
From: Vidya Sagar @ 2024-05-21 11:09 UTC (permalink / raw)
To: corbet, bhelgaas, galshalom, leonro, jgg, treding, jonathanh
Cc: mmoshrefjava, shahafs, vsethi, sdonthineni, jan, tdave, linux-doc,
linux-pci, linux-kernel, kthota, mmaddireddy, vidyas, sagar.tv
For iommu_groups to form correctly, the ACS settings in the PCIe fabric
need to be setup early in the boot process, either via the BIOS or via
the kernel disable_acs_redir parameter.
disable_acs_redir allows clearing the RR|CR|EC ACS flags, but the PCIe
spec Rev3.0 already defines 7 different ACS related flags with many more
useful combinations depending on the fabric design.
For backward compatibility, leave the 'disable_acs_redir' as is and add
a new parameter 'config_acs'so that the user can directly specify the ACS
flags to set on a per-device basis. Use a similar syntax to the existing
'resource_alignment' parameter by using the @ character and have the user
specify the ACS flags using a bit encoding. If both 'disable_acs_redir' and
'config_acs' are specified for a particular device, configuration specified
through 'config_acs' takes precedence over the other.
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
---
v2:
* Refactored the code as per Jason's suggestion
.../admin-guide/kernel-parameters.txt | 22 +++
drivers/pci/pci.c | 147 +++++++++++-------
2 files changed, 111 insertions(+), 58 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 41644336e..b4a8207eb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4456,6 +4456,28 @@
bridges without forcing it upstream. Note:
this removes isolation between devices and
may put more devices in an IOMMU group.
+ config_acs=
+ Format:
+ =<ACS flags>@<pci_dev>[; ...]
+ Specify one or more PCI devices (in the format
+ specified above) optionally prepended with flags
+ and separated by semicolons. The respective
+ capabilities will be enabled, disabled or unchanged
+ based on what is specified in flags.
+ ACS Flags is defined as follows
+ bit-0 : ACS Source Validation
+ bit-1 : ACS Translation Blocking
+ bit-2 : ACS P2P Request Redirect
+ bit-3 : ACS P2P Completion Redirect
+ bit-4 : ACS Upstream Forwarding
+ bit-5 : ACS P2P Egress Control
+ bit-6 : ACS Direct Translated P2P
+ Each bit can be marked as
+ ‘0‘ – force disabled
+ ‘1’ – force enabled
+ ‘x’ – unchanged.
+ Note: this may remove isolation between devices
+ and may put more devices in an IOMMU group.
force_floating [S390] Force usage of floating interrupts.
nomio [S390] Do not use MIO instructions.
norid [S390] ignore the RID field and force use of
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index a607f277c..862df97f1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -887,30 +887,67 @@ void pci_request_acs(void)
}
static const char *disable_acs_redir_param;
+static const char *config_acs_param;
-/**
- * pci_disable_acs_redir - disable ACS redirect capabilities
- * @dev: the PCI device
- *
- * For only devices specified in the disable_acs_redir parameter.
- */
-static void pci_disable_acs_redir(struct pci_dev *dev)
+struct pci_acs {
+ u16 cap;
+ u16 ctrl;
+ u16 fw_ctrl;
+};
+
+static void __pci_config_acs(struct pci_dev *dev, struct pci_acs *caps,
+ const char *p, u16 mask, u16 flags)
{
+ char *delimit;
int ret = 0;
- const char *p;
- int pos;
- u16 ctrl;
- if (!disable_acs_redir_param)
+ if (!p)
return;
- p = disable_acs_redir_param;
while (*p) {
+ if (!mask) {
+ /* Check for ACS flags */
+ delimit = strstr(p, "@");
+ if (delimit) {
+ int end;
+ u32 shift = 0;
+
+ end = delimit - p - 1;
+
+ while (end > -1) {
+ if (*(p + end) == '0') {
+ mask |= 1 << shift;
+ shift++;
+ end--;
+ } else if (*(p + end) == '1') {
+ mask |= 1 << shift;
+ flags |= 1 << shift;
+ shift++;
+ end--;
+ } else if ((*(p + end) == 'x') || (*(p + end) == 'X')) {
+ shift++;
+ end--;
+ } else {
+ pci_err(dev, "Invalid ACS flags... Ignoring\n");
+ return;
+ }
+ }
+ p = delimit + 1;
+ } else {
+ pci_err(dev, "ACS Flags missing\n");
+ return;
+ }
+ }
+
+ if (mask & ~(PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR |
+ PCI_ACS_UF | PCI_ACS_EC | PCI_ACS_DT)) {
+ pci_err(dev, "Invalid ACS flags specified\n");
+ return;
+ }
+
ret = pci_dev_str_match(dev, p, &p);
if (ret < 0) {
- pr_info_once("PCI: Can't parse disable_acs_redir parameter: %s\n",
- disable_acs_redir_param);
-
+ pr_info_once("PCI: Can't parse acs command line parameter\n");
break;
} else if (ret == 1) {
/* Found a match */
@@ -930,56 +967,37 @@ static void pci_disable_acs_redir(struct pci_dev *dev)
if (!pci_dev_specific_disable_acs_redir(dev))
return;
- pos = dev->acs_cap;
- if (!pos) {
- pci_warn(dev, "cannot disable ACS redirect for this hardware as it does not have ACS capabilities\n");
- return;
- }
-
- pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
+ pci_dbg(dev, "ACS mask = 0x%X\n", mask);
+ pci_dbg(dev, "ACS flags = 0x%X\n", flags);
- /* P2P Request & Completion Redirect */
- ctrl &= ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC);
+ /* If mask is 0 then we copy the bit from the firmware setting. */
+ caps->ctrl = (caps->ctrl & ~mask) | (caps->fw_ctrl & mask);
+ caps->ctrl |= flags;
- pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
-
- pci_info(dev, "disabled ACS redirect\n");
+ pci_info(dev, "Configured ACS to 0x%x\n", caps->ctrl);
}
/**
* pci_std_enable_acs - enable ACS on devices using standard ACS capabilities
* @dev: the PCI device
*/
-static void pci_std_enable_acs(struct pci_dev *dev)
+static void pci_std_enable_acs(struct pci_dev *dev, struct pci_acs *caps)
{
- int pos;
- u16 cap;
- u16 ctrl;
-
- pos = dev->acs_cap;
- if (!pos)
- return;
-
- pci_read_config_word(dev, pos + PCI_ACS_CAP, &cap);
- pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
-
/* Source Validation */
- ctrl |= (cap & PCI_ACS_SV);
+ caps->ctrl |= (caps->cap & PCI_ACS_SV);
/* P2P Request Redirect */
- ctrl |= (cap & PCI_ACS_RR);
+ caps->ctrl |= (caps->cap & PCI_ACS_RR);
/* P2P Completion Redirect */
- ctrl |= (cap & PCI_ACS_CR);
+ caps->ctrl |= (caps->cap & PCI_ACS_CR);
/* Upstream Forwarding */
- ctrl |= (cap & PCI_ACS_UF);
+ caps->ctrl |= (caps->cap & PCI_ACS_UF);
/* Enable Translation Blocking for external devices and noats */
if (pci_ats_disabled() || dev->external_facing || dev->untrusted)
- ctrl |= (cap & PCI_ACS_TB);
-
- pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
+ caps->ctrl |= (caps->cap & PCI_ACS_TB);
}
/**
@@ -988,23 +1006,33 @@ static void pci_std_enable_acs(struct pci_dev *dev)
*/
static void pci_enable_acs(struct pci_dev *dev)
{
- if (!pci_acs_enable)
- goto disable_acs_redir;
+ struct pci_acs caps;
+ int pos;
+
+ pos = dev->acs_cap;
+ if (!pos)
+ return;
- if (!pci_dev_specific_enable_acs(dev))
- goto disable_acs_redir;
+ pci_read_config_word(dev, pos + PCI_ACS_CAP, &caps.cap);
+ pci_read_config_word(dev, pos + PCI_ACS_CTRL, &caps.ctrl);
+ caps.fw_ctrl = caps.ctrl;
- pci_std_enable_acs(dev);
+ /* If an iommu is present we start with kernel default caps */
+ if (pci_acs_enable) {
+ if (pci_dev_specific_enable_acs(dev))
+ pci_std_enable_acs(dev, &caps);
+ }
-disable_acs_redir:
/*
- * Note: pci_disable_acs_redir() must be called even if ACS was not
- * enabled by the kernel because it may have been enabled by
- * platform firmware. So if we are told to disable it, we should
- * always disable it after setting the kernel's default
- * preferences.
+ * Always apply caps from the command line, even if there is no iommu.
+ * Trust that the admin has a reason to change the ACS settings.
*/
- pci_disable_acs_redir(dev);
+ __pci_config_acs(dev, &caps, disable_acs_redir_param,
+ PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC,
+ ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC));
+ __pci_config_acs(dev, &caps, config_acs_param, 0, 0);
+
+ pci_write_config_word(dev, pos + PCI_ACS_CTRL, caps.ctrl);
}
/**
@@ -7023,6 +7051,8 @@ static int __init pci_setup(char *str)
pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
} else if (!strncmp(str, "disable_acs_redir=", 18)) {
disable_acs_redir_param = str + 18;
+ } else if (!strncmp(str, "config_acs=", 11)) {
+ config_acs_param = str + 11;
} else {
pr_err("PCI: Unknown option `%s'\n", str);
}
@@ -7047,6 +7077,7 @@ static int __init pci_realloc_setup_params(void)
resource_alignment_param = kstrdup(resource_alignment_param,
GFP_KERNEL);
disable_acs_redir_param = kstrdup(disable_acs_redir_param, GFP_KERNEL);
+ config_acs_param = kstrdup(config_acs_param, GFP_KERNEL);
return 0;
}
--
2.25.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH V2] PCI: Extend ACS configurability
2024-05-21 11:09 [PATCH V2] PCI: Extend ACS configurability Vidya Sagar
@ 2024-05-21 15:44 ` kernel test robot
2024-05-23 6:35 ` [PATCH V3] " Vidya Sagar
1 sibling, 0 replies; 28+ messages in thread
From: kernel test robot @ 2024-05-21 15:44 UTC (permalink / raw)
To: Vidya Sagar, corbet, bhelgaas, galshalom, leonro, jgg, treding,
jonathanh
Cc: oe-kbuild-all, mmoshrefjava, shahafs, vsethi, sdonthineni, jan,
tdave, linux-doc, linux-pci, linux-kernel, kthota, mmaddireddy,
vidyas, sagar.tv
Hi Vidya,
kernel test robot noticed the following build warnings:
[auto build test WARNING on pci/next]
[also build test WARNING on pci/for-linus linus/master v6.9 next-20240521]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Vidya-Sagar/PCI-Extend-ACS-configurability/20240521-191317
base: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
patch link: https://lore.kernel.org/r/20240521110925.3876786-1-vidyas%40nvidia.com
patch subject: [PATCH V2] PCI: Extend ACS configurability
config: parisc-defconfig (https://download.01.org/0day-ci/archive/20240521/202405212300.S6fsze09-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240521/202405212300.S6fsze09-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405212300.S6fsze09-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> drivers/pci/pci.c:1044: warning: Function parameter or struct member 'caps' not described in 'pci_std_enable_acs'
vim +1044 drivers/pci/pci.c
cbe420361f92a31 Rajat Jain 2020-07-07 1038
cbe420361f92a31 Rajat Jain 2020-07-07 1039 /**
cbe420361f92a31 Rajat Jain 2020-07-07 1040 * pci_std_enable_acs - enable ACS on devices using standard ACS capabilities
cbe420361f92a31 Rajat Jain 2020-07-07 1041 * @dev: the PCI device
cbe420361f92a31 Rajat Jain 2020-07-07 1042 */
a0bcc944f0e307a Vidya Sagar 2024-05-21 1043 static void pci_std_enable_acs(struct pci_dev *dev, struct pci_acs *caps)
cbe420361f92a31 Rajat Jain 2020-07-07 @1044 {
cbe420361f92a31 Rajat Jain 2020-07-07 1045 /* Source Validation */
a0bcc944f0e307a Vidya Sagar 2024-05-21 1046 caps->ctrl |= (caps->cap & PCI_ACS_SV);
cbe420361f92a31 Rajat Jain 2020-07-07 1047
cbe420361f92a31 Rajat Jain 2020-07-07 1048 /* P2P Request Redirect */
a0bcc944f0e307a Vidya Sagar 2024-05-21 1049 caps->ctrl |= (caps->cap & PCI_ACS_RR);
cbe420361f92a31 Rajat Jain 2020-07-07 1050
cbe420361f92a31 Rajat Jain 2020-07-07 1051 /* P2P Completion Redirect */
a0bcc944f0e307a Vidya Sagar 2024-05-21 1052 caps->ctrl |= (caps->cap & PCI_ACS_CR);
cbe420361f92a31 Rajat Jain 2020-07-07 1053
cbe420361f92a31 Rajat Jain 2020-07-07 1054 /* Upstream Forwarding */
a0bcc944f0e307a Vidya Sagar 2024-05-21 1055 caps->ctrl |= (caps->cap & PCI_ACS_UF);
cbe420361f92a31 Rajat Jain 2020-07-07 1056
7cae7849fccee81 Alex Williamson 2021-06-18 1057 /* Enable Translation Blocking for external devices and noats */
7cae7849fccee81 Alex Williamson 2021-06-18 1058 if (pci_ats_disabled() || dev->external_facing || dev->untrusted)
a0bcc944f0e307a Vidya Sagar 2024-05-21 1059 caps->ctrl |= (caps->cap & PCI_ACS_TB);
cbe420361f92a31 Rajat Jain 2020-07-07 1060 }
cbe420361f92a31 Rajat Jain 2020-07-07 1061
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH V3] PCI: Extend ACS configurability
2024-05-21 11:09 [PATCH V2] PCI: Extend ACS configurability Vidya Sagar
2024-05-21 15:44 ` kernel test robot
@ 2024-05-23 6:35 ` Vidya Sagar
2024-05-23 14:59 ` Bjorn Helgaas
` (2 more replies)
1 sibling, 3 replies; 28+ messages in thread
From: Vidya Sagar @ 2024-05-23 6:35 UTC (permalink / raw)
To: corbet, bhelgaas, galshalom, leonro, jgg, treding, jonathanh
Cc: mmoshrefjava, shahafs, vsethi, sdonthineni, jan, tdave, linux-doc,
linux-pci, linux-kernel, kthota, mmaddireddy, vidyas, sagar.tv
For iommu_groups to form correctly, the ACS settings in the PCIe fabric
need to be setup early in the boot process, either via the BIOS or via
the kernel disable_acs_redir parameter.
disable_acs_redir allows clearing the RR|CR|EC ACS flags, but the PCIe
spec Rev3.0 already defines 7 different ACS related flags with many more
useful combinations depending on the fabric design.
For backward compatibility, leave the 'disable_acs_redir' as is and add
a new parameter 'config_acs'so that the user can directly specify the ACS
flags to set on a per-device basis. Use a similar syntax to the existing
'resource_alignment' parameter by using the @ character and have the user
specify the ACS flags using a bit encoding. If both 'disable_acs_redir' and
'config_acs' are specified for a particular device, configuration specified
through 'config_acs' takes precedence over the other.
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
---
v3:
* Fixed a documentation issue reported by kernel test bot
v2:
* Refactored the code as per Jason's suggestion
.../admin-guide/kernel-parameters.txt | 22 +++
drivers/pci/pci.c | 148 +++++++++++-------
2 files changed, 112 insertions(+), 58 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 41644336e..b4a8207eb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4456,6 +4456,28 @@
bridges without forcing it upstream. Note:
this removes isolation between devices and
may put more devices in an IOMMU group.
+ config_acs=
+ Format:
+ =<ACS flags>@<pci_dev>[; ...]
+ Specify one or more PCI devices (in the format
+ specified above) optionally prepended with flags
+ and separated by semicolons. The respective
+ capabilities will be enabled, disabled or unchanged
+ based on what is specified in flags.
+ ACS Flags is defined as follows
+ bit-0 : ACS Source Validation
+ bit-1 : ACS Translation Blocking
+ bit-2 : ACS P2P Request Redirect
+ bit-3 : ACS P2P Completion Redirect
+ bit-4 : ACS Upstream Forwarding
+ bit-5 : ACS P2P Egress Control
+ bit-6 : ACS Direct Translated P2P
+ Each bit can be marked as
+ ‘0‘ – force disabled
+ ‘1’ – force enabled
+ ‘x’ – unchanged.
+ Note: this may remove isolation between devices
+ and may put more devices in an IOMMU group.
force_floating [S390] Force usage of floating interrupts.
nomio [S390] Do not use MIO instructions.
norid [S390] ignore the RID field and force use of
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index a607f277c..a46264f83 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -887,30 +887,67 @@ void pci_request_acs(void)
}
static const char *disable_acs_redir_param;
+static const char *config_acs_param;
-/**
- * pci_disable_acs_redir - disable ACS redirect capabilities
- * @dev: the PCI device
- *
- * For only devices specified in the disable_acs_redir parameter.
- */
-static void pci_disable_acs_redir(struct pci_dev *dev)
+struct pci_acs {
+ u16 cap;
+ u16 ctrl;
+ u16 fw_ctrl;
+};
+
+static void __pci_config_acs(struct pci_dev *dev, struct pci_acs *caps,
+ const char *p, u16 mask, u16 flags)
{
+ char *delimit;
int ret = 0;
- const char *p;
- int pos;
- u16 ctrl;
- if (!disable_acs_redir_param)
+ if (!p)
return;
- p = disable_acs_redir_param;
while (*p) {
+ if (!mask) {
+ /* Check for ACS flags */
+ delimit = strstr(p, "@");
+ if (delimit) {
+ int end;
+ u32 shift = 0;
+
+ end = delimit - p - 1;
+
+ while (end > -1) {
+ if (*(p + end) == '0') {
+ mask |= 1 << shift;
+ shift++;
+ end--;
+ } else if (*(p + end) == '1') {
+ mask |= 1 << shift;
+ flags |= 1 << shift;
+ shift++;
+ end--;
+ } else if ((*(p + end) == 'x') || (*(p + end) == 'X')) {
+ shift++;
+ end--;
+ } else {
+ pci_err(dev, "Invalid ACS flags... Ignoring\n");
+ return;
+ }
+ }
+ p = delimit + 1;
+ } else {
+ pci_err(dev, "ACS Flags missing\n");
+ return;
+ }
+ }
+
+ if (mask & ~(PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR |
+ PCI_ACS_UF | PCI_ACS_EC | PCI_ACS_DT)) {
+ pci_err(dev, "Invalid ACS flags specified\n");
+ return;
+ }
+
ret = pci_dev_str_match(dev, p, &p);
if (ret < 0) {
- pr_info_once("PCI: Can't parse disable_acs_redir parameter: %s\n",
- disable_acs_redir_param);
-
+ pr_info_once("PCI: Can't parse acs command line parameter\n");
break;
} else if (ret == 1) {
/* Found a match */
@@ -930,56 +967,38 @@ static void pci_disable_acs_redir(struct pci_dev *dev)
if (!pci_dev_specific_disable_acs_redir(dev))
return;
- pos = dev->acs_cap;
- if (!pos) {
- pci_warn(dev, "cannot disable ACS redirect for this hardware as it does not have ACS capabilities\n");
- return;
- }
-
- pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
+ pci_dbg(dev, "ACS mask = 0x%X\n", mask);
+ pci_dbg(dev, "ACS flags = 0x%X\n", flags);
- /* P2P Request & Completion Redirect */
- ctrl &= ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC);
+ /* If mask is 0 then we copy the bit from the firmware setting. */
+ caps->ctrl = (caps->ctrl & ~mask) | (caps->fw_ctrl & mask);
+ caps->ctrl |= flags;
- pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
-
- pci_info(dev, "disabled ACS redirect\n");
+ pci_info(dev, "Configured ACS to 0x%x\n", caps->ctrl);
}
/**
* pci_std_enable_acs - enable ACS on devices using standard ACS capabilities
* @dev: the PCI device
+ * @caps: default ACS controls
*/
-static void pci_std_enable_acs(struct pci_dev *dev)
+static void pci_std_enable_acs(struct pci_dev *dev, struct pci_acs *caps)
{
- int pos;
- u16 cap;
- u16 ctrl;
-
- pos = dev->acs_cap;
- if (!pos)
- return;
-
- pci_read_config_word(dev, pos + PCI_ACS_CAP, &cap);
- pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
-
/* Source Validation */
- ctrl |= (cap & PCI_ACS_SV);
+ caps->ctrl |= (caps->cap & PCI_ACS_SV);
/* P2P Request Redirect */
- ctrl |= (cap & PCI_ACS_RR);
+ caps->ctrl |= (caps->cap & PCI_ACS_RR);
/* P2P Completion Redirect */
- ctrl |= (cap & PCI_ACS_CR);
+ caps->ctrl |= (caps->cap & PCI_ACS_CR);
/* Upstream Forwarding */
- ctrl |= (cap & PCI_ACS_UF);
+ caps->ctrl |= (caps->cap & PCI_ACS_UF);
/* Enable Translation Blocking for external devices and noats */
if (pci_ats_disabled() || dev->external_facing || dev->untrusted)
- ctrl |= (cap & PCI_ACS_TB);
-
- pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
+ caps->ctrl |= (caps->cap & PCI_ACS_TB);
}
/**
@@ -988,23 +1007,33 @@ static void pci_std_enable_acs(struct pci_dev *dev)
*/
static void pci_enable_acs(struct pci_dev *dev)
{
- if (!pci_acs_enable)
- goto disable_acs_redir;
+ struct pci_acs caps;
+ int pos;
+
+ pos = dev->acs_cap;
+ if (!pos)
+ return;
- if (!pci_dev_specific_enable_acs(dev))
- goto disable_acs_redir;
+ pci_read_config_word(dev, pos + PCI_ACS_CAP, &caps.cap);
+ pci_read_config_word(dev, pos + PCI_ACS_CTRL, &caps.ctrl);
+ caps.fw_ctrl = caps.ctrl;
- pci_std_enable_acs(dev);
+ /* If an iommu is present we start with kernel default caps */
+ if (pci_acs_enable) {
+ if (pci_dev_specific_enable_acs(dev))
+ pci_std_enable_acs(dev, &caps);
+ }
-disable_acs_redir:
/*
- * Note: pci_disable_acs_redir() must be called even if ACS was not
- * enabled by the kernel because it may have been enabled by
- * platform firmware. So if we are told to disable it, we should
- * always disable it after setting the kernel's default
- * preferences.
+ * Always apply caps from the command line, even if there is no iommu.
+ * Trust that the admin has a reason to change the ACS settings.
*/
- pci_disable_acs_redir(dev);
+ __pci_config_acs(dev, &caps, disable_acs_redir_param,
+ PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC,
+ ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC));
+ __pci_config_acs(dev, &caps, config_acs_param, 0, 0);
+
+ pci_write_config_word(dev, pos + PCI_ACS_CTRL, caps.ctrl);
}
/**
@@ -7023,6 +7052,8 @@ static int __init pci_setup(char *str)
pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
} else if (!strncmp(str, "disable_acs_redir=", 18)) {
disable_acs_redir_param = str + 18;
+ } else if (!strncmp(str, "config_acs=", 11)) {
+ config_acs_param = str + 11;
} else {
pr_err("PCI: Unknown option `%s'\n", str);
}
@@ -7047,6 +7078,7 @@ static int __init pci_realloc_setup_params(void)
resource_alignment_param = kstrdup(resource_alignment_param,
GFP_KERNEL);
disable_acs_redir_param = kstrdup(disable_acs_redir_param, GFP_KERNEL);
+ config_acs_param = kstrdup(config_acs_param, GFP_KERNEL);
return 0;
}
--
2.25.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH V3] PCI: Extend ACS configurability
2024-05-23 6:35 ` [PATCH V3] " Vidya Sagar
@ 2024-05-23 14:59 ` Bjorn Helgaas
2024-05-23 15:16 ` Jason Gunthorpe
2024-06-12 12:19 ` Jason Gunthorpe
2024-06-25 15:31 ` [PATCH V4] " Vidya Sagar
2 siblings, 1 reply; 28+ messages in thread
From: Bjorn Helgaas @ 2024-05-23 14:59 UTC (permalink / raw)
To: Vidya Sagar
Cc: corbet, bhelgaas, galshalom, leonro, jgg, treding, jonathanh,
mmoshrefjava, shahafs, vsethi, sdonthineni, jan, tdave, linux-doc,
linux-pci, linux-kernel, kthota, mmaddireddy, sagar.tv,
Joerg Roedel, Will Deacon, Robin Murphy, iommu
[+cc iommu folks]
On Thu, May 23, 2024 at 12:05:28PM +0530, Vidya Sagar wrote:
> For iommu_groups to form correctly, the ACS settings in the PCIe fabric
> need to be setup early in the boot process, either via the BIOS or via
> the kernel disable_acs_redir parameter.
Can you point to the iommu code that is involved here? It sounds like
the iommu_groups are built at boot time and are immutable after that?
If we need per-device ACS config that depends on the workload, it
seems kind of problematic to only be able to specify this at boot
time. I guess we would need to reboot if we want to run a workload
that needs a different config?
Is this the iommu usage model we want in the long term?
> disable_acs_redir allows clearing the RR|CR|EC ACS flags, but the PCIe
> spec Rev3.0 already defines 7 different ACS related flags with many more
> useful combinations depending on the fabric design.
If we need a spec citation, I'd rather use r6.x since r3.0 is from
2010.
> For backward compatibility, leave the 'disable_acs_redir' as is and add
> a new parameter 'config_acs'so that the user can directly specify the ACS
> flags to set on a per-device basis. Use a similar syntax to the existing
> 'resource_alignment' parameter by using the @ character and have the user
> specify the ACS flags using a bit encoding. If both 'disable_acs_redir' and
> 'config_acs' are specified for a particular device, configuration specified
> through 'config_acs' takes precedence over the other.
>
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> ---
> v3:
> * Fixed a documentation issue reported by kernel test bot
>
> v2:
> * Refactored the code as per Jason's suggestion
>
> .../admin-guide/kernel-parameters.txt | 22 +++
> drivers/pci/pci.c | 148 +++++++++++-------
> 2 files changed, 112 insertions(+), 58 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 41644336e..b4a8207eb 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4456,6 +4456,28 @@
> bridges without forcing it upstream. Note:
> this removes isolation between devices and
> may put more devices in an IOMMU group.
> + config_acs=
> + Format:
> + =<ACS flags>@<pci_dev>[; ...]
> + Specify one or more PCI devices (in the format
> + specified above) optionally prepended with flags
> + and separated by semicolons. The respective
> + capabilities will be enabled, disabled or unchanged
> + based on what is specified in flags.
> + ACS Flags is defined as follows
> + bit-0 : ACS Source Validation
> + bit-1 : ACS Translation Blocking
> + bit-2 : ACS P2P Request Redirect
> + bit-3 : ACS P2P Completion Redirect
> + bit-4 : ACS Upstream Forwarding
> + bit-5 : ACS P2P Egress Control
> + bit-6 : ACS Direct Translated P2P
> + Each bit can be marked as
> + ‘0‘ – force disabled
> + ‘1’ – force enabled
> + ‘x’ – unchanged.
> + Note: this may remove isolation between devices
> + and may put more devices in an IOMMU group.
> force_floating [S390] Force usage of floating interrupts.
> nomio [S390] Do not use MIO instructions.
> norid [S390] ignore the RID field and force use of
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index a607f277c..a46264f83 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -887,30 +887,67 @@ void pci_request_acs(void)
> }
>
> static const char *disable_acs_redir_param;
> +static const char *config_acs_param;
>
> -/**
> - * pci_disable_acs_redir - disable ACS redirect capabilities
> - * @dev: the PCI device
> - *
> - * For only devices specified in the disable_acs_redir parameter.
> - */
> -static void pci_disable_acs_redir(struct pci_dev *dev)
> +struct pci_acs {
> + u16 cap;
> + u16 ctrl;
> + u16 fw_ctrl;
> +};
> +
> +static void __pci_config_acs(struct pci_dev *dev, struct pci_acs *caps,
> + const char *p, u16 mask, u16 flags)
> {
> + char *delimit;
> int ret = 0;
> - const char *p;
> - int pos;
> - u16 ctrl;
>
> - if (!disable_acs_redir_param)
> + if (!p)
> return;
>
> - p = disable_acs_redir_param;
> while (*p) {
> + if (!mask) {
> + /* Check for ACS flags */
> + delimit = strstr(p, "@");
> + if (delimit) {
> + int end;
> + u32 shift = 0;
> +
> + end = delimit - p - 1;
> +
> + while (end > -1) {
> + if (*(p + end) == '0') {
> + mask |= 1 << shift;
> + shift++;
> + end--;
> + } else if (*(p + end) == '1') {
> + mask |= 1 << shift;
> + flags |= 1 << shift;
> + shift++;
> + end--;
> + } else if ((*(p + end) == 'x') || (*(p + end) == 'X')) {
> + shift++;
> + end--;
> + } else {
> + pci_err(dev, "Invalid ACS flags... Ignoring\n");
> + return;
> + }
> + }
> + p = delimit + 1;
> + } else {
> + pci_err(dev, "ACS Flags missing\n");
> + return;
> + }
> + }
> +
> + if (mask & ~(PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR |
> + PCI_ACS_UF | PCI_ACS_EC | PCI_ACS_DT)) {
> + pci_err(dev, "Invalid ACS flags specified\n");
> + return;
> + }
> +
> ret = pci_dev_str_match(dev, p, &p);
> if (ret < 0) {
> - pr_info_once("PCI: Can't parse disable_acs_redir parameter: %s\n",
> - disable_acs_redir_param);
> -
> + pr_info_once("PCI: Can't parse acs command line parameter\n");
> break;
> } else if (ret == 1) {
> /* Found a match */
> @@ -930,56 +967,38 @@ static void pci_disable_acs_redir(struct pci_dev *dev)
> if (!pci_dev_specific_disable_acs_redir(dev))
> return;
>
> - pos = dev->acs_cap;
> - if (!pos) {
> - pci_warn(dev, "cannot disable ACS redirect for this hardware as it does not have ACS capabilities\n");
> - return;
> - }
> -
> - pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
> + pci_dbg(dev, "ACS mask = 0x%X\n", mask);
> + pci_dbg(dev, "ACS flags = 0x%X\n", flags);
>
> - /* P2P Request & Completion Redirect */
> - ctrl &= ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC);
> + /* If mask is 0 then we copy the bit from the firmware setting. */
> + caps->ctrl = (caps->ctrl & ~mask) | (caps->fw_ctrl & mask);
> + caps->ctrl |= flags;
>
> - pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
> -
> - pci_info(dev, "disabled ACS redirect\n");
> + pci_info(dev, "Configured ACS to 0x%x\n", caps->ctrl);
> }
>
> /**
> * pci_std_enable_acs - enable ACS on devices using standard ACS capabilities
> * @dev: the PCI device
> + * @caps: default ACS controls
> */
> -static void pci_std_enable_acs(struct pci_dev *dev)
> +static void pci_std_enable_acs(struct pci_dev *dev, struct pci_acs *caps)
> {
> - int pos;
> - u16 cap;
> - u16 ctrl;
> -
> - pos = dev->acs_cap;
> - if (!pos)
> - return;
> -
> - pci_read_config_word(dev, pos + PCI_ACS_CAP, &cap);
> - pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
> -
> /* Source Validation */
> - ctrl |= (cap & PCI_ACS_SV);
> + caps->ctrl |= (caps->cap & PCI_ACS_SV);
>
> /* P2P Request Redirect */
> - ctrl |= (cap & PCI_ACS_RR);
> + caps->ctrl |= (caps->cap & PCI_ACS_RR);
>
> /* P2P Completion Redirect */
> - ctrl |= (cap & PCI_ACS_CR);
> + caps->ctrl |= (caps->cap & PCI_ACS_CR);
>
> /* Upstream Forwarding */
> - ctrl |= (cap & PCI_ACS_UF);
> + caps->ctrl |= (caps->cap & PCI_ACS_UF);
>
> /* Enable Translation Blocking for external devices and noats */
> if (pci_ats_disabled() || dev->external_facing || dev->untrusted)
> - ctrl |= (cap & PCI_ACS_TB);
> -
> - pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
> + caps->ctrl |= (caps->cap & PCI_ACS_TB);
> }
>
> /**
> @@ -988,23 +1007,33 @@ static void pci_std_enable_acs(struct pci_dev *dev)
> */
> static void pci_enable_acs(struct pci_dev *dev)
> {
> - if (!pci_acs_enable)
> - goto disable_acs_redir;
> + struct pci_acs caps;
> + int pos;
> +
> + pos = dev->acs_cap;
> + if (!pos)
> + return;
>
> - if (!pci_dev_specific_enable_acs(dev))
> - goto disable_acs_redir;
> + pci_read_config_word(dev, pos + PCI_ACS_CAP, &caps.cap);
> + pci_read_config_word(dev, pos + PCI_ACS_CTRL, &caps.ctrl);
> + caps.fw_ctrl = caps.ctrl;
>
> - pci_std_enable_acs(dev);
> + /* If an iommu is present we start with kernel default caps */
> + if (pci_acs_enable) {
> + if (pci_dev_specific_enable_acs(dev))
> + pci_std_enable_acs(dev, &caps);
> + }
>
> -disable_acs_redir:
> /*
> - * Note: pci_disable_acs_redir() must be called even if ACS was not
> - * enabled by the kernel because it may have been enabled by
> - * platform firmware. So if we are told to disable it, we should
> - * always disable it after setting the kernel's default
> - * preferences.
> + * Always apply caps from the command line, even if there is no iommu.
> + * Trust that the admin has a reason to change the ACS settings.
> */
> - pci_disable_acs_redir(dev);
> + __pci_config_acs(dev, &caps, disable_acs_redir_param,
> + PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC,
> + ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC));
> + __pci_config_acs(dev, &caps, config_acs_param, 0, 0);
> +
> + pci_write_config_word(dev, pos + PCI_ACS_CTRL, caps.ctrl);
> }
>
> /**
> @@ -7023,6 +7052,8 @@ static int __init pci_setup(char *str)
> pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
> } else if (!strncmp(str, "disable_acs_redir=", 18)) {
> disable_acs_redir_param = str + 18;
> + } else if (!strncmp(str, "config_acs=", 11)) {
> + config_acs_param = str + 11;
> } else {
> pr_err("PCI: Unknown option `%s'\n", str);
> }
> @@ -7047,6 +7078,7 @@ static int __init pci_realloc_setup_params(void)
> resource_alignment_param = kstrdup(resource_alignment_param,
> GFP_KERNEL);
> disable_acs_redir_param = kstrdup(disable_acs_redir_param, GFP_KERNEL);
> + config_acs_param = kstrdup(config_acs_param, GFP_KERNEL);
>
> return 0;
> }
> --
> 2.25.1
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH V3] PCI: Extend ACS configurability
2024-05-23 14:59 ` Bjorn Helgaas
@ 2024-05-23 15:16 ` Jason Gunthorpe
2024-06-03 7:50 ` Vidya Sagar
0 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2024-05-23 15:16 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Vidya Sagar, corbet, bhelgaas, galshalom, leonro, treding,
jonathanh, mmoshrefjava, shahafs, vsethi, sdonthineni, jan, tdave,
linux-doc, linux-pci, linux-kernel, kthota, mmaddireddy, sagar.tv,
Joerg Roedel, Will Deacon, Robin Murphy, iommu
On Thu, May 23, 2024 at 09:59:36AM -0500, Bjorn Helgaas wrote:
> [+cc iommu folks]
>
> On Thu, May 23, 2024 at 12:05:28PM +0530, Vidya Sagar wrote:
> > For iommu_groups to form correctly, the ACS settings in the PCIe fabric
> > need to be setup early in the boot process, either via the BIOS or via
> > the kernel disable_acs_redir parameter.
>
> Can you point to the iommu code that is involved here? It sounds like
> the iommu_groups are built at boot time and are immutable after that?
They are created when the struct device is plugged
in. pci_device_group() does the logic.
Notably groups can't/don't change if details like ACS change after the
groups are setup.
There are alot of instructions out there telling people to boot their
servers and then manually change the ACS flags with set_pci or
something, and these are not good instructions since it defeats the
VFIO group based security mechanisms.
> If we need per-device ACS config that depends on the workload, it
> seems kind of problematic to only be able to specify this at boot
> time. I guess we would need to reboot if we want to run a workload
> that needs a different config?
Basically. The main difference I'd see is if the server is a VM host
or running bare metal apps. You can get more efficicenty if you change
things for the bare metal case, and often bare metal will want to turn
the iommu off while a VM host often wants more of it turned on.
> Is this the iommu usage model we want in the long term?
There is some path to more dynamic behavior here, but it would require
separating groups into two components - devices that are together
because they are physically sharing translation (aliases and things)
from devices that are together because they share a security boundary
(ACS).
It is more believable we could dynamically change security group
assigments for VFIO than translation group assignment. I don't know
anyone interested in this right now - Alex and I have only talked
about it as a possibility a while back.
FWIW I don't view patch as excluding more dynamisism in the future,
but it is the best way to work with the current state of affairs, and
definitely better than set_pci instructions.
Thanks,
Jason
^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [PATCH V3] PCI: Extend ACS configurability
2024-05-23 15:16 ` Jason Gunthorpe
@ 2024-06-03 7:50 ` Vidya Sagar
2024-06-07 19:30 ` Bjorn Helgaas
0 siblings, 1 reply; 28+ messages in thread
From: Vidya Sagar @ 2024-06-03 7:50 UTC (permalink / raw)
To: Jason Gunthorpe, Bjorn Helgaas
Cc: corbet@lwn.net, bhelgaas@google.com, Gal Shalom, Leon Romanovsky,
Thierry Reding, Jon Hunter, Masoud Moshref Javadi, Shahaf Shuler,
Vikram Sethi, Shanker Donthineni, Jiandi An, Tushar Dave,
linux-doc@vger.kernel.org, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, Krishna Thota, Manikanta Maddireddy,
sagar.tv@gmail.com, Joerg Roedel, Will Deacon, Robin Murphy,
iommu@lists.linux.dev
Hi Bjorn,
Could you let me know if Jason's reply answers your question?
Please let me know if you are looking for any more information.
Thanks,
Vidya Sagar
> -----Original Message-----
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, May 23, 2024 8:46 PM
> To: Bjorn Helgaas <helgaas@kernel.org>
> Cc: Vidya Sagar <vidyas@nvidia.com>; corbet@lwn.net; bhelgaas@google.com; Gal
> Shalom <galshalom@nvidia.com>; Leon Romanovsky <leonro@nvidia.com>; Thierry
> Reding <treding@nvidia.com>; Jon Hunter <jonathanh@nvidia.com>; Masoud
> Moshref Javadi <mmoshrefjava@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>;
> Vikram Sethi <vsethi@nvidia.com>; Shanker Donthineni <sdonthineni@nvidia.com>;
> Jiandi An <jan@nvidia.com>; Tushar Dave <tdave@nvidia.com>; linux-
> doc@vger.kernel.org; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org;
> Krishna Thota <kthota@nvidia.com>; Manikanta Maddireddy
> <mmaddireddy@nvidia.com>; sagar.tv@gmail.com; Joerg Roedel <joro@8bytes.org>;
> Will Deacon <will@kernel.org>; Robin Murphy <robin.murphy@arm.com>;
> iommu@lists.linux.dev
> Subject: Re: [PATCH V3] PCI: Extend ACS configurability
>
> On Thu, May 23, 2024 at 09:59:36AM -0500, Bjorn Helgaas wrote:
> > [+cc iommu folks]
> >
> > On Thu, May 23, 2024 at 12:05:28PM +0530, Vidya Sagar wrote:
> > > For iommu_groups to form correctly, the ACS settings in the PCIe
> > > fabric need to be setup early in the boot process, either via the
> > > BIOS or via the kernel disable_acs_redir parameter.
> >
> > Can you point to the iommu code that is involved here? It sounds like
> > the iommu_groups are built at boot time and are immutable after that?
>
> They are created when the struct device is plugged in. pci_device_group() does the
> logic.
>
> Notably groups can't/don't change if details like ACS change after the groups are
> setup.
>
> There are alot of instructions out there telling people to boot their servers and then
> manually change the ACS flags with set_pci or something, and these are not good
> instructions since it defeats the VFIO group based security mechanisms.
>
> > If we need per-device ACS config that depends on the workload, it
> > seems kind of problematic to only be able to specify this at boot
> > time. I guess we would need to reboot if we want to run a workload
> > that needs a different config?
>
> Basically. The main difference I'd see is if the server is a VM host or running bare
> metal apps. You can get more efficicenty if you change things for the bare metal case,
> and often bare metal will want to turn the iommu off while a VM host often wants
> more of it turned on.
>
> > Is this the iommu usage model we want in the long term?
>
> There is some path to more dynamic behavior here, but it would require separating
> groups into two components - devices that are together because they are physically
> sharing translation (aliases and things) from devices that are together because they
> share a security boundary (ACS).
>
> It is more believable we could dynamically change security group assigments for VFIO
> than translation group assignment. I don't know anyone interested in this right now -
> Alex and I have only talked about it as a possibility a while back.
>
> FWIW I don't view patch as excluding more dynamisism in the future, but it is the best
> way to work with the current state of affairs, and definitely better than set_pci
> instructions.
>
> Thanks,
> Jason
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH V3] PCI: Extend ACS configurability
2024-06-03 7:50 ` Vidya Sagar
@ 2024-06-07 19:30 ` Bjorn Helgaas
2024-06-10 11:38 ` Jason Gunthorpe
0 siblings, 1 reply; 28+ messages in thread
From: Bjorn Helgaas @ 2024-06-07 19:30 UTC (permalink / raw)
To: Vidya Sagar
Cc: Jason Gunthorpe, corbet@lwn.net, bhelgaas@google.com, Gal Shalom,
Leon Romanovsky, Thierry Reding, Jon Hunter,
Masoud Moshref Javadi, Shahaf Shuler, Vikram Sethi,
Shanker Donthineni, Jiandi An, Tushar Dave,
linux-doc@vger.kernel.org, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, Krishna Thota, Manikanta Maddireddy,
sagar.tv@gmail.com, Joerg Roedel, Will Deacon, Robin Murphy,
iommu@lists.linux.dev
On Mon, Jun 03, 2024 at 07:50:59AM +0000, Vidya Sagar wrote:
> Hi Bjorn,
> Could you let me know if Jason's reply answers your question?
> Please let me know if you are looking for any more information.
I think we should add some of that content to the commit log. It
needs:
- Subject line that advertises some good thing.
- A description of why users want this. I have no idea what the
actual benefit is, but I'm looking for something at the level of
"The default ACS settings put A and B in different IOMMU groups,
preventing P2PDMA between them. If we disable ACS X, A and B will
be put in the same group and P2PDMA will work".
- A primer on how users can affect IOMMU groups by enabling/
disabling ACS settings so they can use this without just blind
trial and error. A note that this is immutable except at boot
time.
- A pointer to the code that determines IOMMU groups based on the
ACS settings. Similar to the above, but more useful for
developers.
If we assert "for iommu_groups to form correctly ...", a hint about
why/where this is so would be helpful.
"Correctly" is not quite the right word here; it's just a fact that
the ACS settings determined at boot time result in certain IOMMU
groups. If the user desires different groups, it's not that something
is "incorrect"; it's just that the user may have to accept less
isolation to get the desired IOMMU groups.
> > -----Original Message-----
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > ...
> >
> > On Thu, May 23, 2024 at 09:59:36AM -0500, Bjorn Helgaas wrote:
> > > [+cc iommu folks]
> > >
> > > On Thu, May 23, 2024 at 12:05:28PM +0530, Vidya Sagar wrote:
> > > > For iommu_groups to form correctly, the ACS settings in the PCIe
> > > > fabric need to be setup early in the boot process, either via the
> > > > BIOS or via the kernel disable_acs_redir parameter.
> > >
> > > Can you point to the iommu code that is involved here? It sounds like
> > > the iommu_groups are built at boot time and are immutable after that?
> >
> > They are created when the struct device is plugged in. pci_device_group() does the
> > logic.
> >
> > Notably groups can't/don't change if details like ACS change after the groups are
> > setup.
> >
> > There are alot of instructions out there telling people to boot their servers and then
> > manually change the ACS flags with set_pci or something, and these are not good
> > instructions since it defeats the VFIO group based security mechanisms.
> >
> > > If we need per-device ACS config that depends on the workload, it
> > > seems kind of problematic to only be able to specify this at boot
> > > time. I guess we would need to reboot if we want to run a workload
> > > that needs a different config?
> >
> > Basically. The main difference I'd see is if the server is a VM host or running bare
> > metal apps. You can get more efficicenty if you change things for the bare metal case,
> > and often bare metal will want to turn the iommu off while a VM host often wants
> > more of it turned on.
> >
> > > Is this the iommu usage model we want in the long term?
> >
> > There is some path to more dynamic behavior here, but it would require separating
> > groups into two components - devices that are together because they are physically
> > sharing translation (aliases and things) from devices that are together because they
> > share a security boundary (ACS).
> >
> > It is more believable we could dynamically change security group assigments for VFIO
> > than translation group assignment. I don't know anyone interested in this right now -
> > Alex and I have only talked about it as a possibility a while back.
> >
> > FWIW I don't view patch as excluding more dynamisism in the future, but it is the best
> > way to work with the current state of affairs, and definitely better than set_pci
> > instructions.
> >
> > Thanks,
> > Jason
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH V3] PCI: Extend ACS configurability
2024-06-07 19:30 ` Bjorn Helgaas
@ 2024-06-10 11:38 ` Jason Gunthorpe
2024-06-12 21:29 ` Bjorn Helgaas
0 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2024-06-10 11:38 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Vidya Sagar, corbet@lwn.net, bhelgaas@google.com, Gal Shalom,
Leon Romanovsky, Thierry Reding, Jon Hunter,
Masoud Moshref Javadi, Shahaf Shuler, Vikram Sethi,
Shanker Donthineni, Jiandi An, Tushar Dave,
linux-doc@vger.kernel.org, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, Krishna Thota, Manikanta Maddireddy,
sagar.tv@gmail.com, Joerg Roedel, Will Deacon, Robin Murphy,
iommu@lists.linux.dev
On Fri, Jun 07, 2024 at 02:30:55PM -0500, Bjorn Helgaas wrote:
> "Correctly" is not quite the right word here; it's just a fact that
> the ACS settings determined at boot time result in certain IOMMU
> groups. If the user desires different groups, it's not that something
> is "incorrect"; it's just that the user may have to accept less
> isolation to get the desired IOMMU groups.
That is not quite accurate.. There are HW configurations where ACS
needs to be a certain way for the HW to work with P2P at all. It isn't
just an optimization or the user accepts something, if they want P2P
at all they must get a ACS configuration appropriate for their system.
Jason
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH V3] PCI: Extend ACS configurability
2024-05-23 6:35 ` [PATCH V3] " Vidya Sagar
2024-05-23 14:59 ` Bjorn Helgaas
@ 2024-06-12 12:19 ` Jason Gunthorpe
2024-06-25 15:31 ` [PATCH V4] " Vidya Sagar
2 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2024-06-12 12:19 UTC (permalink / raw)
To: Vidya Sagar
Cc: corbet, bhelgaas, galshalom, leonro, treding, jonathanh,
mmoshrefjava, shahafs, vsethi, sdonthineni, jan, tdave, linux-doc,
linux-pci, linux-kernel, kthota, mmaddireddy, sagar.tv
On Thu, May 23, 2024 at 12:05:28PM +0530, Vidya Sagar wrote:
> For iommu_groups to form correctly, the ACS settings in the PCIe fabric
> need to be setup early in the boot process, either via the BIOS or via
> the kernel disable_acs_redir parameter.
>
> disable_acs_redir allows clearing the RR|CR|EC ACS flags, but the PCIe
> spec Rev3.0 already defines 7 different ACS related flags with many more
> useful combinations depending on the fabric design.
>
> For backward compatibility, leave the 'disable_acs_redir' as is and add
> a new parameter 'config_acs'so that the user can directly specify the ACS
> flags to set on a per-device basis. Use a similar syntax to the existing
> 'resource_alignment' parameter by using the @ character and have the user
> specify the ACS flags using a bit encoding. If both 'disable_acs_redir' and
> 'config_acs' are specified for a particular device, configuration specified
> through 'config_acs' takes precedence over the other.
>
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> ---
> v3:
> * Fixed a documentation issue reported by kernel test bot
>
> v2:
> * Refactored the code as per Jason's suggestion
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH V3] PCI: Extend ACS configurability
2024-06-10 11:38 ` Jason Gunthorpe
@ 2024-06-12 21:29 ` Bjorn Helgaas
2024-06-12 23:23 ` Jason Gunthorpe
0 siblings, 1 reply; 28+ messages in thread
From: Bjorn Helgaas @ 2024-06-12 21:29 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Vidya Sagar, corbet@lwn.net, bhelgaas@google.com, Gal Shalom,
Leon Romanovsky, Thierry Reding, Jon Hunter,
Masoud Moshref Javadi, Shahaf Shuler, Vikram Sethi,
Shanker Donthineni, Jiandi An, Tushar Dave,
linux-doc@vger.kernel.org, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, Krishna Thota, Manikanta Maddireddy,
sagar.tv@gmail.com, Joerg Roedel, Will Deacon, Robin Murphy,
iommu@lists.linux.dev, Alex Williamson
[+cc Alex since VFIO entered the conversation; thread at
https://lore.kernel.org/r/20240523063528.199908-1-vidyas@nvidia.com]
On Mon, Jun 10, 2024 at 08:38:49AM -0300, Jason Gunthorpe wrote:
> On Fri, Jun 07, 2024 at 02:30:55PM -0500, Bjorn Helgaas wrote:
> > "Correctly" is not quite the right word here; it's just a fact that
> > the ACS settings determined at boot time result in certain IOMMU
> > groups. If the user desires different groups, it's not that something
> > is "incorrect"; it's just that the user may have to accept less
> > isolation to get the desired IOMMU groups.
>
> That is not quite accurate.. There are HW configurations where ACS
> needs to be a certain way for the HW to work with P2P at all. It isn't
> just an optimization or the user accepts something, if they want P2P
> at all they must get a ACS configuration appropriate for their system.
The current wording of "For iommu_groups to form correctly, the ACS
settings in the PCIe fabric need to be setup early" suggests that the
way we currently configure ACS is incorrect in general, regardless of
P2PDMA.
But my impression is that there's a trade-off between isolation and
the ability to do P2PDMA, and users have different requirements, and
the preference for less isolation/more P2PDMA is no more "correct"
than a preference for more isolation/less P2PDMA.
The kernel-parameters doc mentions the reduced isolation idea, but I
think we need a little more guidance for users. It's probably too
much detail for kernel-parameters, but the commit log would be a good
place.
Maybe something like this:
PCIe ACS settings determine how devices are put into iommu_groups.
The iommu_groups in turn determine which devices can be passed
through to VMs and whether P2PDMA between them is possible. The
iommu_groups are built at enumeration-time and are currently static.
Add a kernel command-line option to change ACS settings for specific
devices, which allows more devices to be put in the same
iommu_group, at the cost of reduced isolation between them.
ACS applies to PCIe Downstream Ports and multi-function devices.
The default ACS settings are XXX and cause devices below an
ACS-capable port to be put in an iommu_group isolated from P2PDMA
from outside the group.
Disabling ACS XXX at a port allows ... downstream devices to be
included in the same iommu_group as ...
[I don't know exactly how this works, so please make it make sense].
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH V3] PCI: Extend ACS configurability
2024-06-12 21:29 ` Bjorn Helgaas
@ 2024-06-12 23:23 ` Jason Gunthorpe
2024-06-13 22:05 ` Bjorn Helgaas
2024-06-13 22:38 ` Alex Williamson
0 siblings, 2 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2024-06-12 23:23 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Vidya Sagar, corbet@lwn.net, bhelgaas@google.com, Gal Shalom,
Leon Romanovsky, Thierry Reding, Jon Hunter,
Masoud Moshref Javadi, Shahaf Shuler, Vikram Sethi,
Shanker Donthineni, Jiandi An, Tushar Dave,
linux-doc@vger.kernel.org, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, Krishna Thota, Manikanta Maddireddy,
sagar.tv@gmail.com, Joerg Roedel, Will Deacon, Robin Murphy,
iommu@lists.linux.dev, Alex Williamson
On Wed, Jun 12, 2024 at 04:29:03PM -0500, Bjorn Helgaas wrote:
> [+cc Alex since VFIO entered the conversation; thread at
> https://lore.kernel.org/r/20240523063528.199908-1-vidyas@nvidia.com]
>
> On Mon, Jun 10, 2024 at 08:38:49AM -0300, Jason Gunthorpe wrote:
> > On Fri, Jun 07, 2024 at 02:30:55PM -0500, Bjorn Helgaas wrote:
> > > "Correctly" is not quite the right word here; it's just a fact that
> > > the ACS settings determined at boot time result in certain IOMMU
> > > groups. If the user desires different groups, it's not that something
> > > is "incorrect"; it's just that the user may have to accept less
> > > isolation to get the desired IOMMU groups.
> >
> > That is not quite accurate.. There are HW configurations where ACS
> > needs to be a certain way for the HW to work with P2P at all. It isn't
> > just an optimization or the user accepts something, if they want P2P
> > at all they must get a ACS configuration appropriate for their system.
>
> The current wording of "For iommu_groups to form correctly, the ACS
> settings in the PCIe fabric need to be setup early" suggests that the
> way we currently configure ACS is incorrect in general, regardless of
> P2PDMA.
Yes, I'd agree with this. We don't have enough information to
configurate it properly in the kernel in an automatic way. We don't
know if pairs of devices even have SW enablement to do P2P in the
kernel and we don't accurately know what issues the root complex
has. All of this information goes into choosing the right ACS bits.
> But my impression is that there's a trade-off between isolation and
> the ability to do P2PDMA, and users have different requirements, and
> the preference for less isolation/more P2PDMA is no more "correct"
> than a preference for more isolation/less P2PDMA.
Sure, that makes sense
> Maybe something like this:
>
> PCIe ACS settings determine how devices are put into iommu_groups.
> The iommu_groups in turn determine which devices can be passed
> through to VMs and whether P2PDMA between them is possible. The
> iommu_groups are built at enumeration-time and are currently static.
Not quite, the iommu_groups don't have alot to do with the P2P. Even
devices in the same kernel group can still have non working P2P.
Maybe:
PCIe ACS settings control the level of isolation and the possible P2P
paths between devices. With greater isolation the kernel will create
smaller iommu_groups and with less isolation there is more HW that
can achieve P2P transfers. From a virtualization perspective all
devices in the same iommu_group must be assigned to the same VM as
they lack security isolation.
There is no way for the kernel to automatically know the correct
ACS settings for any given system and workload. Existing command line
options allow only for large scale change, disabling all
isolation, but this is not sufficient for more complex cases.
Add a kernel command-line option to directly control all the ACS bits
for specific devices, which allows the operator to setup the right
level of isolation to achieve the desired P2P configuration. The
definition is future proof, when new ACS bits are added to the spec
the open syntax can be extended.
ACS needs to be setup early in the kernel boot as the ACS settings
effect how iommu_groups are formed. iommu_group formation is a one
time event during initial device discovery, changing ACS bits after
kernel boot can result in an inaccurate view of the iommu_groups
compared to the current isolation configuration.
ACS applies to PCIe Downstream Ports and multi-function devices.
The default ACS settings are strict and deny any direct traffic
between two functions. This results in the smallest iommu_group the
HW can support. Frequently these values result in slow or
non-working P2PDMA.
ACS offers a range of security choices controlling how traffic is
allowed to go directly between two devices. Some popular choices:
- Full prevention
- Translated requests can be direct, with various options
- Asymetric direct traffic, A can reach B but not the reverse
- All traffic can be direct
Along with some other less common ones for special topologies.
The intention is that this option would be used with expert knowledge
of the HW capability and workload to achieve the desired
configuration.
Jason
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH V3] PCI: Extend ACS configurability
2024-06-12 23:23 ` Jason Gunthorpe
@ 2024-06-13 22:05 ` Bjorn Helgaas
2024-06-13 23:36 ` Jason Gunthorpe
2024-06-13 22:38 ` Alex Williamson
1 sibling, 1 reply; 28+ messages in thread
From: Bjorn Helgaas @ 2024-06-13 22:05 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Vidya Sagar, corbet@lwn.net, bhelgaas@google.com, Gal Shalom,
Leon Romanovsky, Thierry Reding, Jon Hunter,
Masoud Moshref Javadi, Shahaf Shuler, Vikram Sethi,
Shanker Donthineni, Jiandi An, Tushar Dave,
linux-doc@vger.kernel.org, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, Krishna Thota, Manikanta Maddireddy,
sagar.tv@gmail.com, Joerg Roedel, Will Deacon, Robin Murphy,
iommu@lists.linux.dev, Alex Williamson
On Wed, Jun 12, 2024 at 08:23:01PM -0300, Jason Gunthorpe wrote:
> On Wed, Jun 12, 2024 at 04:29:03PM -0500, Bjorn Helgaas wrote:
> > [+cc Alex since VFIO entered the conversation; thread at
> > https://lore.kernel.org/r/20240523063528.199908-1-vidyas@nvidia.com]
> >
> > On Mon, Jun 10, 2024 at 08:38:49AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Jun 07, 2024 at 02:30:55PM -0500, Bjorn Helgaas wrote:
> > > > "Correctly" is not quite the right word here; it's just a fact that
> > > > the ACS settings determined at boot time result in certain IOMMU
> > > > groups. If the user desires different groups, it's not that something
> > > > is "incorrect"; it's just that the user may have to accept less
> > > > isolation to get the desired IOMMU groups.
> > >
> > > That is not quite accurate.. There are HW configurations where ACS
> > > needs to be a certain way for the HW to work with P2P at all. It isn't
> > > just an optimization or the user accepts something, if they want P2P
> > > at all they must get a ACS configuration appropriate for their system.
> >
> > The current wording of "For iommu_groups to form correctly, the ACS
> > settings in the PCIe fabric need to be setup early" suggests that the
> > way we currently configure ACS is incorrect in general, regardless of
> > P2PDMA.
>
> Yes, I'd agree with this. We don't have enough information to
> configurate it properly in the kernel in an automatic way. We don't
> know if pairs of devices even have SW enablement to do P2P in the
> kernel and we don't accurately know what issues the root complex
> has. All of this information goes into choosing the right ACS bits.
>
> > But my impression is that there's a trade-off between isolation and
> > the ability to do P2PDMA, and users have different requirements, and
> > the preference for less isolation/more P2PDMA is no more "correct"
> > than a preference for more isolation/less P2PDMA.
>
> Sure, that makes sense
>
> > Maybe something like this:
> >
> > PCIe ACS settings determine how devices are put into iommu_groups.
> > The iommu_groups in turn determine which devices can be passed
> > through to VMs and whether P2PDMA between them is possible. The
> > iommu_groups are built at enumeration-time and are currently static.
>
> Not quite, the iommu_groups don't have alot to do with the P2P. Even
> devices in the same kernel group can still have non working P2P.
>
> Maybe:
>
> PCIe ACS settings control the level of isolation and the possible P2P
> paths between devices. With greater isolation the kernel will create
> smaller iommu_groups and with less isolation there is more HW that
> can achieve P2P transfers. From a virtualization perspective all
> devices in the same iommu_group must be assigned to the same VM as
> they lack security isolation.
>
> There is no way for the kernel to automatically know the correct
> ACS settings for any given system and workload. Existing command line
> options allow only for large scale change, disabling all
> isolation, but this is not sufficient for more complex cases.
>
> Add a kernel command-line option to directly control all the ACS bits
> for specific devices, which allows the operator to setup the right
> level of isolation to achieve the desired P2P configuration. The
> definition is future proof, when new ACS bits are added to the spec
> the open syntax can be extended.
>
> ACS needs to be setup early in the kernel boot as the ACS settings
> effect how iommu_groups are formed. iommu_group formation is a one
> time event during initial device discovery, changing ACS bits after
> kernel boot can result in an inaccurate view of the iommu_groups
> compared to the current isolation configuration.
>
> ACS applies to PCIe Downstream Ports and multi-function devices.
> The default ACS settings are strict and deny any direct traffic
> between two functions. This results in the smallest iommu_group the
> HW can support. Frequently these values result in slow or
> non-working P2PDMA.
>
> ACS offers a range of security choices controlling how traffic is
> allowed to go directly between two devices. Some popular choices:
> - Full prevention
> - Translated requests can be direct, with various options
> - Asymetric direct traffic, A can reach B but not the reverse
> - All traffic can be direct
> Along with some other less common ones for special topologies.
>
> The intention is that this option would be used with expert knowledge
> of the HW capability and workload to achieve the desired
> configuration.
That all sounds good. IIUC the current default is full prevention (I
guess you said that a few paragraphs up).
It's unfortunate that this requires so much expert knowledge to use,
but I guess we don't really have a good alternative. The only way I
can think of to help would be some kind of white paper or examples in
Documentation/PCI/.
Bjorn
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH V3] PCI: Extend ACS configurability
2024-06-12 23:23 ` Jason Gunthorpe
2024-06-13 22:05 ` Bjorn Helgaas
@ 2024-06-13 22:38 ` Alex Williamson
1 sibling, 0 replies; 28+ messages in thread
From: Alex Williamson @ 2024-06-13 22:38 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Bjorn Helgaas, Vidya Sagar, corbet@lwn.net, bhelgaas@google.com,
Gal Shalom, Leon Romanovsky, Thierry Reding, Jon Hunter,
Masoud Moshref Javadi, Shahaf Shuler, Vikram Sethi,
Shanker Donthineni, Jiandi An, Tushar Dave,
linux-doc@vger.kernel.org, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, Krishna Thota, Manikanta Maddireddy,
sagar.tv@gmail.com, Joerg Roedel, Will Deacon, Robin Murphy,
iommu@lists.linux.dev
On Wed, 12 Jun 2024 20:23:01 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:
> On Wed, Jun 12, 2024 at 04:29:03PM -0500, Bjorn Helgaas wrote:
> > [+cc Alex since VFIO entered the conversation; thread at
> > https://lore.kernel.org/r/20240523063528.199908-1-vidyas@nvidia.com]
> >
> > On Mon, Jun 10, 2024 at 08:38:49AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Jun 07, 2024 at 02:30:55PM -0500, Bjorn Helgaas wrote:
> > > > "Correctly" is not quite the right word here; it's just a fact that
> > > > the ACS settings determined at boot time result in certain IOMMU
> > > > groups. If the user desires different groups, it's not that something
> > > > is "incorrect"; it's just that the user may have to accept less
> > > > isolation to get the desired IOMMU groups.
> > >
> > > That is not quite accurate.. There are HW configurations where ACS
> > > needs to be a certain way for the HW to work with P2P at all. It isn't
> > > just an optimization or the user accepts something, if they want P2P
> > > at all they must get a ACS configuration appropriate for their system.
> >
> > The current wording of "For iommu_groups to form correctly, the ACS
> > settings in the PCIe fabric need to be setup early" suggests that the
> > way we currently configure ACS is incorrect in general, regardless of
> > P2PDMA.
>
> Yes, I'd agree with this. We don't have enough information to
> configurate it properly in the kernel in an automatic way. We don't
> know if pairs of devices even have SW enablement to do P2P in the
> kernel and we don't accurately know what issues the root complex
> has. All of this information goes into choosing the right ACS bits.
>
> > But my impression is that there's a trade-off between isolation and
> > the ability to do P2PDMA, and users have different requirements, and
> > the preference for less isolation/more P2PDMA is no more "correct"
> > than a preference for more isolation/less P2PDMA.
>
> Sure, that makes sense
>
> > Maybe something like this:
> >
> > PCIe ACS settings determine how devices are put into iommu_groups.
> > The iommu_groups in turn determine which devices can be passed
> > through to VMs and whether P2PDMA between them is possible. The
> > iommu_groups are built at enumeration-time and are currently static.
>
> Not quite, the iommu_groups don't have alot to do with the P2P. Even
> devices in the same kernel group can still have non working P2P.
>
> Maybe:
>
> PCIe ACS settings control the level of isolation and the possible P2P
> paths between devices. With greater isolation the kernel will create
> smaller iommu_groups and with less isolation there is more HW that
> can achieve P2P transfers. From a virtualization perspective all
> devices in the same iommu_group must be assigned to the same VM as
> they lack security isolation.
>
> There is no way for the kernel to automatically know the correct
> ACS settings for any given system and workload. Existing command line
> options allow only for large scale change, disabling all
> isolation, but this is not sufficient for more complex cases.
>
> Add a kernel command-line option to directly control all the ACS bits
> for specific devices, which allows the operator to setup the right
> level of isolation to achieve the desired P2P configuration. The
> definition is future proof, when new ACS bits are added to the spec
> the open syntax can be extended.
>
> ACS needs to be setup early in the kernel boot as the ACS settings
> effect how iommu_groups are formed. iommu_group formation is a one
> time event during initial device discovery, changing ACS bits after
> kernel boot can result in an inaccurate view of the iommu_groups
> compared to the current isolation configuration.
>
> ACS applies to PCIe Downstream Ports and multi-function devices.
> The default ACS settings are strict and deny any direct traffic
> between two functions. This results in the smallest iommu_group the
> HW can support. Frequently these values result in slow or
> non-working P2PDMA.
>
> ACS offers a range of security choices controlling how traffic is
> allowed to go directly between two devices. Some popular choices:
> - Full prevention
> - Translated requests can be direct, with various options
> - Asymetric direct traffic, A can reach B but not the reverse
> - All traffic can be direct
> Along with some other less common ones for special topologies.
>
> The intention is that this option would be used with expert knowledge
> of the HW capability and workload to achieve the desired
> configuration.
FWIW, this sounds good to me too. There certainly needed to be some
clarification that this controls the isolation of devices and IOMMU
groups are determined by aspects of that isolation rather than this
option directly and exclusively being used to configure grouping. I
think this does that. Thanks,
Alex
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH V3] PCI: Extend ACS configurability
2024-06-13 22:05 ` Bjorn Helgaas
@ 2024-06-13 23:36 ` Jason Gunthorpe
0 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2024-06-13 23:36 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Vidya Sagar, corbet@lwn.net, bhelgaas@google.com, Gal Shalom,
Leon Romanovsky, Thierry Reding, Jon Hunter,
Masoud Moshref Javadi, Shahaf Shuler, Vikram Sethi,
Shanker Donthineni, Jiandi An, Tushar Dave,
linux-doc@vger.kernel.org, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, Krishna Thota, Manikanta Maddireddy,
sagar.tv@gmail.com, Joerg Roedel, Will Deacon, Robin Murphy,
iommu@lists.linux.dev, Alex Williamson
On Thu, Jun 13, 2024 at 05:05:20PM -0500, Bjorn Helgaas wrote:
> It's unfortunate that this requires so much expert knowledge to use,
> but I guess we don't really have a good alternative. The only way I
> can think of to help would be some kind of white paper or examples in
> Documentation/PCI/.
So far I am seeing the system supplier supply the appropriate
instructions.. It is already this way for set_pci, and yes, it is a
huge PITA.
At one point Steven Bates was talking about some ACPI tables to give
the OS more information but I don't think that went anywhere..
Jason
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH V4] PCI: Extend ACS configurability
2024-05-23 6:35 ` [PATCH V3] " Vidya Sagar
2024-05-23 14:59 ` Bjorn Helgaas
2024-06-12 12:19 ` Jason Gunthorpe
@ 2024-06-25 15:31 ` Vidya Sagar
2024-06-25 16:26 ` Lukas Wunner
` (4 more replies)
2 siblings, 5 replies; 28+ messages in thread
From: Vidya Sagar @ 2024-06-25 15:31 UTC (permalink / raw)
To: corbet, bhelgaas, galshalom, leonro, jgg, treding, jonathanh
Cc: mmoshrefjava, shahafs, vsethi, sdonthineni, jan, tdave, linux-doc,
linux-pci, linux-kernel, kthota, mmaddireddy, vidyas, sagar.tv
PCIe ACS settings control the level of isolation and the possible P2P
paths between devices. With greater isolation the kernel will create
smaller iommu_groups and with less isolation there is more HW that
can achieve P2P transfers. From a virtualization perspective all
devices in the same iommu_group must be assigned to the same VM as
they lack security isolation.
There is no way for the kernel to automatically know the correct
ACS settings for any given system and workload. Existing command line
options (ex:- disable_acs_redir) allow only for large scale change,
disabling all isolation, but this is not sufficient for more complex cases.
Add a kernel command-line option 'config_acs' to directly control all the
ACS bits for specific devices, which allows the operator to setup the
right level of isolation to achieve the desired P2P configuration.
The definition is future proof, when new ACS bits are added to the spec
the open syntax can be extended.
ACS needs to be setup early in the kernel boot as the ACS settings
effect how iommu_groups are formed. iommu_group formation is a one
time event during initial device discovery, changing ACS bits after
kernel boot can result in an inaccurate view of the iommu_groups
compared to the current isolation configuration.
ACS applies to PCIe Downstream Ports and multi-function devices.
The default ACS settings are strict and deny any direct traffic
between two functions. This results in the smallest iommu_group the
HW can support. Frequently these values result in slow or
non-working P2PDMA.
ACS offers a range of security choices controlling how traffic is
allowed to go directly between two devices. Some popular choices:
- Full prevention
- Translated requests can be direct, with various options
- Asymmetric direct traffic, A can reach B but not the reverse
- All traffic can be direct
Along with some other less common ones for special topologies.
The intention is that this option would be used with expert knowledge
of the HW capability and workload to achieve the desired
configuration.
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
---
v4:
* Changed commit message (Courtesy: Jason) to provide more details
v3:
* Fixed a documentation issue reported by kernel test bot
v2:
* Refactored the code as per Jason's suggestion
.../admin-guide/kernel-parameters.txt | 22 +++
drivers/pci/pci.c | 148 +++++++++++-------
2 files changed, 112 insertions(+), 58 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 500cfa776225..42d0f6fd40d0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4619,6 +4619,28 @@
bridges without forcing it upstream. Note:
this removes isolation between devices and
may put more devices in an IOMMU group.
+ config_acs=
+ Format:
+ =<ACS flags>@<pci_dev>[; ...]
+ Specify one or more PCI devices (in the format
+ specified above) optionally prepended with flags
+ and separated by semicolons. The respective
+ capabilities will be enabled, disabled or unchanged
+ based on what is specified in flags.
+ ACS Flags is defined as follows
+ bit-0 : ACS Source Validation
+ bit-1 : ACS Translation Blocking
+ bit-2 : ACS P2P Request Redirect
+ bit-3 : ACS P2P Completion Redirect
+ bit-4 : ACS Upstream Forwarding
+ bit-5 : ACS P2P Egress Control
+ bit-6 : ACS Direct Translated P2P
+ Each bit can be marked as
+ ‘0‘ – force disabled
+ ‘1’ – force enabled
+ ‘x’ – unchanged.
+ Note: this may remove isolation between devices
+ and may put more devices in an IOMMU group.
force_floating [S390] Force usage of floating interrupts.
nomio [S390] Do not use MIO instructions.
norid [S390] ignore the RID field and force use of
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 55078c70a05b..6661932afe59 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -946,30 +946,67 @@ void pci_request_acs(void)
}
static const char *disable_acs_redir_param;
+static const char *config_acs_param;
-/**
- * pci_disable_acs_redir - disable ACS redirect capabilities
- * @dev: the PCI device
- *
- * For only devices specified in the disable_acs_redir parameter.
- */
-static void pci_disable_acs_redir(struct pci_dev *dev)
+struct pci_acs {
+ u16 cap;
+ u16 ctrl;
+ u16 fw_ctrl;
+};
+
+static void __pci_config_acs(struct pci_dev *dev, struct pci_acs *caps,
+ const char *p, u16 mask, u16 flags)
{
+ char *delimit;
int ret = 0;
- const char *p;
- int pos;
- u16 ctrl;
- if (!disable_acs_redir_param)
+ if (!p)
return;
- p = disable_acs_redir_param;
while (*p) {
+ if (!mask) {
+ /* Check for ACS flags */
+ delimit = strstr(p, "@");
+ if (delimit) {
+ int end;
+ u32 shift = 0;
+
+ end = delimit - p - 1;
+
+ while (end > -1) {
+ if (*(p + end) == '0') {
+ mask |= 1 << shift;
+ shift++;
+ end--;
+ } else if (*(p + end) == '1') {
+ mask |= 1 << shift;
+ flags |= 1 << shift;
+ shift++;
+ end--;
+ } else if ((*(p + end) == 'x') || (*(p + end) == 'X')) {
+ shift++;
+ end--;
+ } else {
+ pci_err(dev, "Invalid ACS flags... Ignoring\n");
+ return;
+ }
+ }
+ p = delimit + 1;
+ } else {
+ pci_err(dev, "ACS Flags missing\n");
+ return;
+ }
+ }
+
+ if (mask & ~(PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR |
+ PCI_ACS_UF | PCI_ACS_EC | PCI_ACS_DT)) {
+ pci_err(dev, "Invalid ACS flags specified\n");
+ return;
+ }
+
ret = pci_dev_str_match(dev, p, &p);
if (ret < 0) {
- pr_info_once("PCI: Can't parse disable_acs_redir parameter: %s\n",
- disable_acs_redir_param);
-
+ pr_info_once("PCI: Can't parse acs command line parameter\n");
break;
} else if (ret == 1) {
/* Found a match */
@@ -989,56 +1026,38 @@ static void pci_disable_acs_redir(struct pci_dev *dev)
if (!pci_dev_specific_disable_acs_redir(dev))
return;
- pos = dev->acs_cap;
- if (!pos) {
- pci_warn(dev, "cannot disable ACS redirect for this hardware as it does not have ACS capabilities\n");
- return;
- }
-
- pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
+ pci_dbg(dev, "ACS mask = 0x%X\n", mask);
+ pci_dbg(dev, "ACS flags = 0x%X\n", flags);
- /* P2P Request & Completion Redirect */
- ctrl &= ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC);
+ /* If mask is 0 then we copy the bit from the firmware setting. */
+ caps->ctrl = (caps->ctrl & ~mask) | (caps->fw_ctrl & mask);
+ caps->ctrl |= flags;
- pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
-
- pci_info(dev, "disabled ACS redirect\n");
+ pci_info(dev, "Configured ACS to 0x%x\n", caps->ctrl);
}
/**
* pci_std_enable_acs - enable ACS on devices using standard ACS capabilities
* @dev: the PCI device
+ * @caps: default ACS controls
*/
-static void pci_std_enable_acs(struct pci_dev *dev)
+static void pci_std_enable_acs(struct pci_dev *dev, struct pci_acs *caps)
{
- int pos;
- u16 cap;
- u16 ctrl;
-
- pos = dev->acs_cap;
- if (!pos)
- return;
-
- pci_read_config_word(dev, pos + PCI_ACS_CAP, &cap);
- pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
-
/* Source Validation */
- ctrl |= (cap & PCI_ACS_SV);
+ caps->ctrl |= (caps->cap & PCI_ACS_SV);
/* P2P Request Redirect */
- ctrl |= (cap & PCI_ACS_RR);
+ caps->ctrl |= (caps->cap & PCI_ACS_RR);
/* P2P Completion Redirect */
- ctrl |= (cap & PCI_ACS_CR);
+ caps->ctrl |= (caps->cap & PCI_ACS_CR);
/* Upstream Forwarding */
- ctrl |= (cap & PCI_ACS_UF);
+ caps->ctrl |= (caps->cap & PCI_ACS_UF);
/* Enable Translation Blocking for external devices and noats */
if (pci_ats_disabled() || dev->external_facing || dev->untrusted)
- ctrl |= (cap & PCI_ACS_TB);
-
- pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
+ caps->ctrl |= (caps->cap & PCI_ACS_TB);
}
/**
@@ -1047,23 +1066,33 @@ static void pci_std_enable_acs(struct pci_dev *dev)
*/
static void pci_enable_acs(struct pci_dev *dev)
{
- if (!pci_acs_enable)
- goto disable_acs_redir;
+ struct pci_acs caps;
+ int pos;
+
+ pos = dev->acs_cap;
+ if (!pos)
+ return;
- if (!pci_dev_specific_enable_acs(dev))
- goto disable_acs_redir;
+ pci_read_config_word(dev, pos + PCI_ACS_CAP, &caps.cap);
+ pci_read_config_word(dev, pos + PCI_ACS_CTRL, &caps.ctrl);
+ caps.fw_ctrl = caps.ctrl;
- pci_std_enable_acs(dev);
+ /* If an iommu is present we start with kernel default caps */
+ if (pci_acs_enable) {
+ if (pci_dev_specific_enable_acs(dev))
+ pci_std_enable_acs(dev, &caps);
+ }
-disable_acs_redir:
/*
- * Note: pci_disable_acs_redir() must be called even if ACS was not
- * enabled by the kernel because it may have been enabled by
- * platform firmware. So if we are told to disable it, we should
- * always disable it after setting the kernel's default
- * preferences.
+ * Always apply caps from the command line, even if there is no iommu.
+ * Trust that the admin has a reason to change the ACS settings.
*/
- pci_disable_acs_redir(dev);
+ __pci_config_acs(dev, &caps, disable_acs_redir_param,
+ PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC,
+ ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC));
+ __pci_config_acs(dev, &caps, config_acs_param, 0, 0);
+
+ pci_write_config_word(dev, pos + PCI_ACS_CTRL, caps.ctrl);
}
/**
@@ -6740,6 +6769,8 @@ static int __init pci_setup(char *str)
pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
} else if (!strncmp(str, "disable_acs_redir=", 18)) {
disable_acs_redir_param = str + 18;
+ } else if (!strncmp(str, "config_acs=", 11)) {
+ config_acs_param = str + 11;
} else {
pr_err("PCI: Unknown option `%s'\n", str);
}
@@ -6764,6 +6795,7 @@ static int __init pci_realloc_setup_params(void)
resource_alignment_param = kstrdup(resource_alignment_param,
GFP_KERNEL);
disable_acs_redir_param = kstrdup(disable_acs_redir_param, GFP_KERNEL);
+ config_acs_param = kstrdup(config_acs_param, GFP_KERNEL);
return 0;
}
--
2.25.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH V4] PCI: Extend ACS configurability
2024-06-25 15:31 ` [PATCH V4] " Vidya Sagar
@ 2024-06-25 16:26 ` Lukas Wunner
2024-06-25 16:39 ` Jason Gunthorpe
2024-06-26 6:02 ` Leon Romanovsky
2024-06-26 7:40 ` Tian, Kevin
` (3 subsequent siblings)
4 siblings, 2 replies; 28+ messages in thread
From: Lukas Wunner @ 2024-06-25 16:26 UTC (permalink / raw)
To: Vidya Sagar
Cc: corbet, bhelgaas, galshalom, leonro, jgg, treding, jonathanh,
mmoshrefjava, shahafs, vsethi, sdonthineni, jan, tdave, linux-doc,
linux-pci, linux-kernel, kthota, mmaddireddy, sagar.tv
On Tue, Jun 25, 2024 at 09:01:50PM +0530, Vidya Sagar wrote:
> Add a kernel command-line option 'config_acs' to directly control all the
> ACS bits for specific devices, which allows the operator to setup the
> right level of isolation to achieve the desired P2P configuration.
An example wouldn't hurt, here and in kernel-parameters.txt.
> ACS offers a range of security choices controlling how traffic is
> allowed to go directly between two devices. Some popular choices:
> - Full prevention
> - Translated requests can be direct, with various options
> - Asymmetric direct traffic, A can reach B but not the reverse
> - All traffic can be direct
> Along with some other less common ones for special topologies.
I'm wondering whether it would make more sense to let users choose
between those "higher-level" options, instead of giving direct access
to bits (and thus risking users to choose an incorrect setting).
Also, would it be possible to automatically change ACS settings
when enabling or disabling P2PDMA?
The representation chosen here (as a command line option) seems
questionable:
We're going to add more user-controllable options going forward.
E.g., when introducing IDE, we'll have to let user space choose
whether encryption should be enabled for certain PCIe devices.
That's because encryption isn't for free, so can't be enabled
opportunistically. (The number of crypto engines on a CPU is
limited and enabling encryption consumes energy.)
What about exposing such user configurable settings with sysctl?
The networking subsystem has per-interface sysctl settings,
we could have per-PCI-device settings.
So just like this...
net.ipv4.conf.default.arp_accept = 0
net.ipv4.conf.eth0.arp_accept = 0
net.ipv4.conf.eth1.arp_accept = 0
... we could have...
pci.0000:03:00.0.acs = full_prevention
pci.0000:03:00.0.ide = 1
pci.0000:03:01.0.acs = all_traffic
pci.0000:03:01.0.ide = 0
This isn't hard to do, just call register_sysctl() for each device
on enumeration and unregister_sysctl_table() on pci_destroy_dev().
Thanks,
Lukas
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH V4] PCI: Extend ACS configurability
2024-06-25 16:26 ` Lukas Wunner
@ 2024-06-25 16:39 ` Jason Gunthorpe
2024-06-26 6:02 ` Leon Romanovsky
1 sibling, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2024-06-25 16:39 UTC (permalink / raw)
To: Lukas Wunner
Cc: Vidya Sagar, corbet, bhelgaas, galshalom, leonro, treding,
jonathanh, mmoshrefjava, shahafs, vsethi, sdonthineni, jan, tdave,
linux-doc, linux-pci, linux-kernel, kthota, mmaddireddy, sagar.tv
On Tue, Jun 25, 2024 at 06:26:00PM +0200, Lukas Wunner wrote:
> On Tue, Jun 25, 2024 at 09:01:50PM +0530, Vidya Sagar wrote:
> > Add a kernel command-line option 'config_acs' to directly control all the
> > ACS bits for specific devices, which allows the operator to setup the
> > right level of isolation to achieve the desired P2P configuration.
>
> An example wouldn't hurt, here and in kernel-parameters.txt.
>
>
> > ACS offers a range of security choices controlling how traffic is
> > allowed to go directly between two devices. Some popular choices:
> > - Full prevention
> > - Translated requests can be direct, with various options
> > - Asymmetric direct traffic, A can reach B but not the reverse
> > - All traffic can be direct
> > Along with some other less common ones for special topologies.
>
> I'm wondering whether it would make more sense to let users choose
> between those "higher-level" options, instead of giving direct access
> to bits (and thus risking users to choose an incorrect setting).
It doesn't seem like the kernel has enough information to do that, or
at least describing enough information in the command line would be
more complex than this.
> Also, would it be possible to automatically change ACS settings
> when enabling or disabling P2PDMA?
No, as the commit said the ACS settings are required at early boot
before iommu_groups are formed. They cannot be changed dynamically
with today's kernel.
> The representation chosen here (as a command line option) seems
> questionable:
>
> We're going to add more user-controllable options going forward.
> E.g., when introducing IDE, we'll have to let user space choose
> whether encryption should be enabled for certain PCIe devices.
> That's because encryption isn't for free, so can't be enabled
> opportunistically. (The number of crypto engines on a CPU is
> limited and enabling encryption consumes energy.)
But that isn't part of ACS, so what is wrong with having ACS its own
configurable and other PCI functions can do what is appropriate for
them?
I do encourage people to avoid using the kernel command line. ACS is
forced into that because of the iommu_group issue.
> What about exposing such user configurable settings with sysctl?
I think sysctl is mostly deprecated in favour of sysfs.
An ide file in the sysfs to control the IDE stuff makes alot of sense
to me.
Jason
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH V4] PCI: Extend ACS configurability
2024-06-25 16:26 ` Lukas Wunner
2024-06-25 16:39 ` Jason Gunthorpe
@ 2024-06-26 6:02 ` Leon Romanovsky
1 sibling, 0 replies; 28+ messages in thread
From: Leon Romanovsky @ 2024-06-26 6:02 UTC (permalink / raw)
To: Lukas Wunner
Cc: Vidya Sagar, corbet, bhelgaas, galshalom, jgg, treding, jonathanh,
mmoshrefjava, shahafs, vsethi, sdonthineni, jan, tdave, linux-doc,
linux-pci, linux-kernel, kthota, mmaddireddy, sagar.tv
On Tue, Jun 25, 2024 at 06:26:00PM +0200, Lukas Wunner wrote:
> On Tue, Jun 25, 2024 at 09:01:50PM +0530, Vidya Sagar wrote:
> > Add a kernel command-line option 'config_acs' to directly control all the
> > ACS bits for specific devices, which allows the operator to setup the
> > right level of isolation to achieve the desired P2P configuration.
>
> An example wouldn't hurt, here and in kernel-parameters.txt.
>
>
> > ACS offers a range of security choices controlling how traffic is
> > allowed to go directly between two devices. Some popular choices:
> > - Full prevention
> > - Translated requests can be direct, with various options
> > - Asymmetric direct traffic, A can reach B but not the reverse
> > - All traffic can be direct
> > Along with some other less common ones for special topologies.
>
> I'm wondering whether it would make more sense to let users choose
> between those "higher-level" options, instead of giving direct access
> to bits (and thus risking users to choose an incorrect setting).
IMHO, with "higher-level" options will be much more complex to use than
simple ACS bits matrix. In any case, the user who will use this feature
will need to read PCI spec before.
In PCI v6, 13 bits are used for ACS with 8192 possible combinations and
it is unlikely to find small set of "definitions" that will fit all cases.
Thanks
^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [PATCH V4] PCI: Extend ACS configurability
2024-06-25 15:31 ` [PATCH V4] " Vidya Sagar
2024-06-25 16:26 ` Lukas Wunner
@ 2024-06-26 7:40 ` Tian, Kevin
2024-06-26 11:50 ` Jason Gunthorpe
2024-07-08 14:39 ` Jason Gunthorpe
` (2 subsequent siblings)
4 siblings, 1 reply; 28+ messages in thread
From: Tian, Kevin @ 2024-06-26 7:40 UTC (permalink / raw)
To: Vidya Sagar, corbet@lwn.net, bhelgaas@google.com,
galshalom@nvidia.com, leonro@nvidia.com, jgg@nvidia.com,
treding@nvidia.com, jonathanh@nvidia.com
Cc: mmoshrefjava@nvidia.com, shahafs@nvidia.com, vsethi@nvidia.com,
sdonthineni@nvidia.com, jan@nvidia.com, Dave, Tushar,
linux-doc@vger.kernel.org, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, kthota@nvidia.com,
mmaddireddy@nvidia.com, sagar.tv@gmail.com
> From: Vidya Sagar <vidyas@nvidia.com>
> Sent: Tuesday, June 25, 2024 11:32 PM
>
> PCIe ACS settings control the level of isolation and the possible P2P
> paths between devices. With greater isolation the kernel will create
> smaller iommu_groups and with less isolation there is more HW that
> can achieve P2P transfers. From a virtualization perspective all
> devices in the same iommu_group must be assigned to the same VM as
> they lack security isolation.
>
It'll be helpful to also call out the impact of losing other features (e.g. PASID)
with less isolation:
pci_enable_pasid()
{
...
if (!pci_acs_path_enabled(pdev, NULL, PCI_ACS_RR | PCI_ACS_UF))
return -EINVAL;
...
}
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH V4] PCI: Extend ACS configurability
2024-06-26 7:40 ` Tian, Kevin
@ 2024-06-26 11:50 ` Jason Gunthorpe
0 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2024-06-26 11:50 UTC (permalink / raw)
To: Tian, Kevin
Cc: Vidya Sagar, corbet@lwn.net, bhelgaas@google.com,
galshalom@nvidia.com, leonro@nvidia.com, treding@nvidia.com,
jonathanh@nvidia.com, mmoshrefjava@nvidia.com, shahafs@nvidia.com,
vsethi@nvidia.com, sdonthineni@nvidia.com, jan@nvidia.com,
Dave, Tushar, linux-doc@vger.kernel.org,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
kthota@nvidia.com, mmaddireddy@nvidia.com, sagar.tv@gmail.com
On Wed, Jun 26, 2024 at 07:40:31AM +0000, Tian, Kevin wrote:
> > From: Vidya Sagar <vidyas@nvidia.com>
> > Sent: Tuesday, June 25, 2024 11:32 PM
> >
> > PCIe ACS settings control the level of isolation and the possible P2P
> > paths between devices. With greater isolation the kernel will create
> > smaller iommu_groups and with less isolation there is more HW that
> > can achieve P2P transfers. From a virtualization perspective all
> > devices in the same iommu_group must be assigned to the same VM as
> > they lack security isolation.
> >
>
> It'll be helpful to also call out the impact of losing other features (e.g. PASID)
> with less isolation:
>
> pci_enable_pasid()
> {
> ...
> if (!pci_acs_path_enabled(pdev, NULL, PCI_ACS_RR | PCI_ACS_UF))
> return -EINVAL;
> ...
> }
Yeah, that is one of the considerations that might go into using
asymmetric ACS..
Jason
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH V4] PCI: Extend ACS configurability
2024-06-25 15:31 ` [PATCH V4] " Vidya Sagar
2024-06-25 16:26 ` Lukas Wunner
2024-06-26 7:40 ` Tian, Kevin
@ 2024-07-08 14:39 ` Jason Gunthorpe
2024-07-12 21:57 ` Bjorn Helgaas
2024-09-25 5:06 ` Jiri Slaby
4 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2024-07-08 14:39 UTC (permalink / raw)
To: Vidya Sagar, bhelgaas@google.com
Cc: corbet, bhelgaas, galshalom, leonro, treding, jonathanh,
mmoshrefjava, shahafs, vsethi, sdonthineni, jan, tdave, linux-doc,
linux-pci, linux-kernel, kthota, mmaddireddy, sagar.tv
On Tue, Jun 25, 2024 at 09:01:50PM +0530, Vidya Sagar wrote:
> PCIe ACS settings control the level of isolation and the possible P2P
> paths between devices. With greater isolation the kernel will create
> smaller iommu_groups and with less isolation there is more HW that
> can achieve P2P transfers. From a virtualization perspective all
> devices in the same iommu_group must be assigned to the same VM as
> they lack security isolation.
>
> There is no way for the kernel to automatically know the correct
> ACS settings for any given system and workload. Existing command line
> options (ex:- disable_acs_redir) allow only for large scale change,
> disabling all isolation, but this is not sufficient for more complex cases.
>
> Add a kernel command-line option 'config_acs' to directly control all the
> ACS bits for specific devices, which allows the operator to setup the
> right level of isolation to achieve the desired P2P configuration.
> The definition is future proof, when new ACS bits are added to the spec
> the open syntax can be extended.
>
> ACS needs to be setup early in the kernel boot as the ACS settings
> effect how iommu_groups are formed. iommu_group formation is a one
> time event during initial device discovery, changing ACS bits after
> kernel boot can result in an inaccurate view of the iommu_groups
> compared to the current isolation configuration.
>
> ACS applies to PCIe Downstream Ports and multi-function devices.
> The default ACS settings are strict and deny any direct traffic
> between two functions. This results in the smallest iommu_group the
> HW can support. Frequently these values result in slow or
> non-working P2PDMA.
>
> ACS offers a range of security choices controlling how traffic is
> allowed to go directly between two devices. Some popular choices:
> - Full prevention
> - Translated requests can be direct, with various options
> - Asymmetric direct traffic, A can reach B but not the reverse
> - All traffic can be direct
> Along with some other less common ones for special topologies.
>
> The intention is that this option would be used with expert knowledge
> of the HW capability and workload to achieve the desired
> configuration.
>
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> ---
> v4:
> * Changed commit message (Courtesy: Jason) to provide more details
>
> v3:
> * Fixed a documentation issue reported by kernel test bot
>
> v2:
> * Refactored the code as per Jason's suggestion
>
> .../admin-guide/kernel-parameters.txt | 22 +++
> drivers/pci/pci.c | 148 +++++++++++-------
> 2 files changed, 112 insertions(+), 58 deletions(-)
Bjorn?
Jason
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH V4] PCI: Extend ACS configurability
2024-06-25 15:31 ` [PATCH V4] " Vidya Sagar
` (2 preceding siblings ...)
2024-07-08 14:39 ` Jason Gunthorpe
@ 2024-07-12 21:57 ` Bjorn Helgaas
2024-09-25 5:06 ` Jiri Slaby
4 siblings, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2024-07-12 21:57 UTC (permalink / raw)
To: Vidya Sagar
Cc: corbet, bhelgaas, galshalom, leonro, jgg, treding, jonathanh,
mmoshrefjava, shahafs, vsethi, sdonthineni, jan, tdave, linux-doc,
linux-pci, linux-kernel, kthota, mmaddireddy, sagar.tv
On Tue, Jun 25, 2024 at 09:01:50PM +0530, Vidya Sagar wrote:
> PCIe ACS settings control the level of isolation and the possible P2P
> paths between devices. With greater isolation the kernel will create
> smaller iommu_groups and with less isolation there is more HW that
> can achieve P2P transfers. From a virtualization perspective all
> devices in the same iommu_group must be assigned to the same VM as
> they lack security isolation.
>
> There is no way for the kernel to automatically know the correct
> ACS settings for any given system and workload. Existing command line
> options (ex:- disable_acs_redir) allow only for large scale change,
> disabling all isolation, but this is not sufficient for more complex cases.
>
> Add a kernel command-line option 'config_acs' to directly control all the
> ACS bits for specific devices, which allows the operator to setup the
> right level of isolation to achieve the desired P2P configuration.
> The definition is future proof, when new ACS bits are added to the spec
> the open syntax can be extended.
>
> ACS needs to be setup early in the kernel boot as the ACS settings
> effect how iommu_groups are formed. iommu_group formation is a one
> time event during initial device discovery, changing ACS bits after
> kernel boot can result in an inaccurate view of the iommu_groups
> compared to the current isolation configuration.
>
> ACS applies to PCIe Downstream Ports and multi-function devices.
> The default ACS settings are strict and deny any direct traffic
> between two functions. This results in the smallest iommu_group the
> HW can support. Frequently these values result in slow or
> non-working P2PDMA.
>
> ACS offers a range of security choices controlling how traffic is
> allowed to go directly between two devices. Some popular choices:
> - Full prevention
> - Translated requests can be direct, with various options
> - Asymmetric direct traffic, A can reach B but not the reverse
> - All traffic can be direct
> Along with some other less common ones for special topologies.
>
> The intention is that this option would be used with expert knowledge
> of the HW capability and workload to achieve the desired
> configuration.
>
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Applied with the tweaks below to pci/acs for v6.11, thanks!
I added an example to the doc; please check it to see if I interpreted
the doc correctly.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 42d0f6fd40d0..b2057241ea6c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4621,24 +4621,34 @@
may put more devices in an IOMMU group.
config_acs=
Format:
- =<ACS flags>@<pci_dev>[; ...]
+ <ACS flags>@<pci_dev>[; ...]
Specify one or more PCI devices (in the format
specified above) optionally prepended with flags
and separated by semicolons. The respective
- capabilities will be enabled, disabled or unchanged
- based on what is specified in flags.
- ACS Flags is defined as follows
- bit-0 : ACS Source Validation
- bit-1 : ACS Translation Blocking
- bit-2 : ACS P2P Request Redirect
- bit-3 : ACS P2P Completion Redirect
- bit-4 : ACS Upstream Forwarding
- bit-5 : ACS P2P Egress Control
- bit-6 : ACS Direct Translated P2P
- Each bit can be marked as
- ‘0‘ – force disabled
- ‘1’ – force enabled
- ‘x’ – unchanged.
+ capabilities will be enabled, disabled or
+ unchanged based on what is specified in
+ flags.
+
+ ACS Flags is defined as follows:
+ bit-0 : ACS Source Validation
+ bit-1 : ACS Translation Blocking
+ bit-2 : ACS P2P Request Redirect
+ bit-3 : ACS P2P Completion Redirect
+ bit-4 : ACS Upstream Forwarding
+ bit-5 : ACS P2P Egress Control
+ bit-6 : ACS Direct Translated P2P
+ Each bit can be marked as:
+ '0' – force disabled
+ '1' – force enabled
+ 'x' – unchanged
+ For example,
+ pci=config_acs=10x
+ would configure all devices that support
+ ACS to enable P2P Request Redirect, disable
+ Translation Blocking, and leave Source
+ Validation unchanged from whatever power-up
+ or firmware set it to.
+
Note: this may remove isolation between devices
and may put more devices in an IOMMU group.
force_floating [S390] Force usage of floating interrupts.
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 1afe650ce338..45d93101a08b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1006,7 +1006,7 @@ static void __pci_config_acs(struct pci_dev *dev, struct pci_acs *caps,
ret = pci_dev_str_match(dev, p, &p);
if (ret < 0) {
- pr_info_once("PCI: Can't parse acs command line parameter\n");
+ pr_info_once("PCI: Can't parse ACS command line parameter\n");
break;
} else if (ret == 1) {
/* Found a match */
@@ -1026,14 +1026,14 @@ static void __pci_config_acs(struct pci_dev *dev, struct pci_acs *caps,
if (!pci_dev_specific_disable_acs_redir(dev))
return;
- pci_dbg(dev, "ACS mask = 0x%X\n", mask);
- pci_dbg(dev, "ACS flags = 0x%X\n", flags);
+ pci_dbg(dev, "ACS mask = %#06x\n", mask);
+ pci_dbg(dev, "ACS flags = %#06x\n", flags);
/* If mask is 0 then we copy the bit from the firmware setting. */
caps->ctrl = (caps->ctrl & ~mask) | (caps->fw_ctrl & mask);
caps->ctrl |= flags;
- pci_info(dev, "Configured ACS to 0x%x\n", caps->ctrl);
+ pci_info(dev, "Configured ACS to %#06x\n", caps->ctrl);
}
/**
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH V4] PCI: Extend ACS configurability
2024-06-25 15:31 ` [PATCH V4] " Vidya Sagar
` (3 preceding siblings ...)
2024-07-12 21:57 ` Bjorn Helgaas
@ 2024-09-25 5:06 ` Jiri Slaby
2024-09-25 5:29 ` Jiri Slaby
2024-10-07 20:43 ` Bjorn Helgaas
4 siblings, 2 replies; 28+ messages in thread
From: Jiri Slaby @ 2024-09-25 5:06 UTC (permalink / raw)
To: Vidya Sagar, corbet, bhelgaas, galshalom, leonro, jgg, treding,
jonathanh
Cc: mmoshrefjava, shahafs, vsethi, sdonthineni, jan, tdave, linux-doc,
linux-pci, linux-kernel, kthota, mmaddireddy, sagar.tv
On 25. 06. 24, 17:31, Vidya Sagar wrote:
> PCIe ACS settings control the level of isolation and the possible P2P
> paths between devices. With greater isolation the kernel will create
> smaller iommu_groups and with less isolation there is more HW that
> can achieve P2P transfers. From a virtualization perspective all
> devices in the same iommu_group must be assigned to the same VM as
> they lack security isolation.
>
> There is no way for the kernel to automatically know the correct
> ACS settings for any given system and workload. Existing command line
> options (ex:- disable_acs_redir) allow only for large scale change,
> disabling all isolation, but this is not sufficient for more complex cases.
>
> Add a kernel command-line option 'config_acs' to directly control all the
> ACS bits for specific devices, which allows the operator to setup the
> right level of isolation to achieve the desired P2P configuration.
> The definition is future proof, when new ACS bits are added to the spec
> the open syntax can be extended.
>
> ACS needs to be setup early in the kernel boot as the ACS settings
> effect how iommu_groups are formed. iommu_group formation is a one
> time event during initial device discovery, changing ACS bits after
> kernel boot can result in an inaccurate view of the iommu_groups
> compared to the current isolation configuration.
>
> ACS applies to PCIe Downstream Ports and multi-function devices.
> The default ACS settings are strict and deny any direct traffic
> between two functions. This results in the smallest iommu_group the
> HW can support. Frequently these values result in slow or
> non-working P2PDMA.
>
> ACS offers a range of security choices controlling how traffic is
> allowed to go directly between two devices. Some popular choices:
> - Full prevention
> - Translated requests can be direct, with various options
> - Asymmetric direct traffic, A can reach B but not the reverse
> - All traffic can be direct
> Along with some other less common ones for special topologies.
>
> The intention is that this option would be used with expert knowledge
> of the HW capability and workload to achieve the desired
> configuration.
Hi,
this breaks ACS on some platforms (in 6.11). See:
https://bugzilla.suse.com/show_bug.cgi?id=1229019
When starting a KVM:
> "Error starting domain: internal error: QEMU unexpectedly closed the monitor (vm=‘win10’): qxl_send_events: spice-server bug: guest stopped, ignoring
> 2024-08-08T01:45:51.824474Z qemu-system-x86_64: -device {“driver”:“vfio-pci”,“host”:“0000:0b:00.0”,“id”:“hostdev0”,“bus”:“pci.2”,“addr”:“0x0”}: vfio 0000:0b:00.0: group 83 is not viable
> Please ensure all devices within the iommu_group are bound to their vfio bus driver."
We backported the commit to 6.4, there they see:
> The good kernel 6.4.0-150600.23.14 log has this:
>
> [ 0.618918] pci 0000:00:1c.0: Intel PCH root port ACS workaround enabled
> [ 0.619153] pci 0000:00:1c.4: Intel PCH root port ACS workaround enabled
>
> the bad one 6.4.0-150600.23.22 does not.
But the same difference is with 6.10 vs 6.11.
Any ideas? It looks like workarounds are not applied anymore.
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> ---
> v4:
> * Changed commit message (Courtesy: Jason) to provide more details
>
> v3:
> * Fixed a documentation issue reported by kernel test bot
>
> v2:
> * Refactored the code as per Jason's suggestion
>
> .../admin-guide/kernel-parameters.txt | 22 +++
> drivers/pci/pci.c | 148 +++++++++++-------
> 2 files changed, 112 insertions(+), 58 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 500cfa776225..42d0f6fd40d0 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4619,6 +4619,28 @@
> bridges without forcing it upstream. Note:
> this removes isolation between devices and
> may put more devices in an IOMMU group.
> + config_acs=
> + Format:
> + =<ACS flags>@<pci_dev>[; ...]
> + Specify one or more PCI devices (in the format
> + specified above) optionally prepended with flags
> + and separated by semicolons. The respective
> + capabilities will be enabled, disabled or unchanged
> + based on what is specified in flags.
> + ACS Flags is defined as follows
> + bit-0 : ACS Source Validation
> + bit-1 : ACS Translation Blocking
> + bit-2 : ACS P2P Request Redirect
> + bit-3 : ACS P2P Completion Redirect
> + bit-4 : ACS Upstream Forwarding
> + bit-5 : ACS P2P Egress Control
> + bit-6 : ACS Direct Translated P2P
> + Each bit can be marked as
> + ‘0‘ – force disabled
> + ‘1’ – force enabled
> + ‘x’ – unchanged.
> + Note: this may remove isolation between devices
> + and may put more devices in an IOMMU group.
> force_floating [S390] Force usage of floating interrupts.
> nomio [S390] Do not use MIO instructions.
> norid [S390] ignore the RID field and force use of
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 55078c70a05b..6661932afe59 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -946,30 +946,67 @@ void pci_request_acs(void)
> }
>
> static const char *disable_acs_redir_param;
> +static const char *config_acs_param;
>
> -/**
> - * pci_disable_acs_redir - disable ACS redirect capabilities
> - * @dev: the PCI device
> - *
> - * For only devices specified in the disable_acs_redir parameter.
> - */
> -static void pci_disable_acs_redir(struct pci_dev *dev)
> +struct pci_acs {
> + u16 cap;
> + u16 ctrl;
> + u16 fw_ctrl;
> +};
> +
> +static void __pci_config_acs(struct pci_dev *dev, struct pci_acs *caps,
> + const char *p, u16 mask, u16 flags)
> {
> + char *delimit;
> int ret = 0;
> - const char *p;
> - int pos;
> - u16 ctrl;
>
> - if (!disable_acs_redir_param)
> + if (!p)
> return;
>
> - p = disable_acs_redir_param;
> while (*p) {
> + if (!mask) {
> + /* Check for ACS flags */
> + delimit = strstr(p, "@");
> + if (delimit) {
> + int end;
> + u32 shift = 0;
> +
> + end = delimit - p - 1;
> +
> + while (end > -1) {
> + if (*(p + end) == '0') {
> + mask |= 1 << shift;
> + shift++;
> + end--;
> + } else if (*(p + end) == '1') {
> + mask |= 1 << shift;
> + flags |= 1 << shift;
> + shift++;
> + end--;
> + } else if ((*(p + end) == 'x') || (*(p + end) == 'X')) {
> + shift++;
> + end--;
> + } else {
> + pci_err(dev, "Invalid ACS flags... Ignoring\n");
> + return;
> + }
> + }
> + p = delimit + 1;
> + } else {
> + pci_err(dev, "ACS Flags missing\n");
> + return;
> + }
> + }
> +
> + if (mask & ~(PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR |
> + PCI_ACS_UF | PCI_ACS_EC | PCI_ACS_DT)) {
> + pci_err(dev, "Invalid ACS flags specified\n");
> + return;
> + }
> +
> ret = pci_dev_str_match(dev, p, &p);
> if (ret < 0) {
> - pr_info_once("PCI: Can't parse disable_acs_redir parameter: %s\n",
> - disable_acs_redir_param);
> -
> + pr_info_once("PCI: Can't parse acs command line parameter\n");
> break;
> } else if (ret == 1) {
> /* Found a match */
> @@ -989,56 +1026,38 @@ static void pci_disable_acs_redir(struct pci_dev *dev)
> if (!pci_dev_specific_disable_acs_redir(dev))
> return;
>
> - pos = dev->acs_cap;
> - if (!pos) {
> - pci_warn(dev, "cannot disable ACS redirect for this hardware as it does not have ACS capabilities\n");
> - return;
> - }
> -
> - pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
> + pci_dbg(dev, "ACS mask = 0x%X\n", mask);
> + pci_dbg(dev, "ACS flags = 0x%X\n", flags);
>
> - /* P2P Request & Completion Redirect */
> - ctrl &= ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC);
> + /* If mask is 0 then we copy the bit from the firmware setting. */
> + caps->ctrl = (caps->ctrl & ~mask) | (caps->fw_ctrl & mask);
> + caps->ctrl |= flags;
>
> - pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
> -
> - pci_info(dev, "disabled ACS redirect\n");
> + pci_info(dev, "Configured ACS to 0x%x\n", caps->ctrl);
> }
>
> /**
> * pci_std_enable_acs - enable ACS on devices using standard ACS capabilities
> * @dev: the PCI device
> + * @caps: default ACS controls
> */
> -static void pci_std_enable_acs(struct pci_dev *dev)
> +static void pci_std_enable_acs(struct pci_dev *dev, struct pci_acs *caps)
> {
> - int pos;
> - u16 cap;
> - u16 ctrl;
> -
> - pos = dev->acs_cap;
> - if (!pos)
> - return;
> -
> - pci_read_config_word(dev, pos + PCI_ACS_CAP, &cap);
> - pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
> -
> /* Source Validation */
> - ctrl |= (cap & PCI_ACS_SV);
> + caps->ctrl |= (caps->cap & PCI_ACS_SV);
>
> /* P2P Request Redirect */
> - ctrl |= (cap & PCI_ACS_RR);
> + caps->ctrl |= (caps->cap & PCI_ACS_RR);
>
> /* P2P Completion Redirect */
> - ctrl |= (cap & PCI_ACS_CR);
> + caps->ctrl |= (caps->cap & PCI_ACS_CR);
>
> /* Upstream Forwarding */
> - ctrl |= (cap & PCI_ACS_UF);
> + caps->ctrl |= (caps->cap & PCI_ACS_UF);
>
> /* Enable Translation Blocking for external devices and noats */
> if (pci_ats_disabled() || dev->external_facing || dev->untrusted)
> - ctrl |= (cap & PCI_ACS_TB);
> -
> - pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
> + caps->ctrl |= (caps->cap & PCI_ACS_TB);
> }
>
> /**
> @@ -1047,23 +1066,33 @@ static void pci_std_enable_acs(struct pci_dev *dev)
> */
> static void pci_enable_acs(struct pci_dev *dev)
> {
> - if (!pci_acs_enable)
> - goto disable_acs_redir;
> + struct pci_acs caps;
> + int pos;
> +
> + pos = dev->acs_cap;
> + if (!pos)
> + return;
>
> - if (!pci_dev_specific_enable_acs(dev))
> - goto disable_acs_redir;
> + pci_read_config_word(dev, pos + PCI_ACS_CAP, &caps.cap);
> + pci_read_config_word(dev, pos + PCI_ACS_CTRL, &caps.ctrl);
> + caps.fw_ctrl = caps.ctrl;
>
> - pci_std_enable_acs(dev);
> + /* If an iommu is present we start with kernel default caps */
> + if (pci_acs_enable) {
> + if (pci_dev_specific_enable_acs(dev))
> + pci_std_enable_acs(dev, &caps);
> + }
>
> -disable_acs_redir:
> /*
> - * Note: pci_disable_acs_redir() must be called even if ACS was not
> - * enabled by the kernel because it may have been enabled by
> - * platform firmware. So if we are told to disable it, we should
> - * always disable it after setting the kernel's default
> - * preferences.
> + * Always apply caps from the command line, even if there is no iommu.
> + * Trust that the admin has a reason to change the ACS settings.
> */
> - pci_disable_acs_redir(dev);
> + __pci_config_acs(dev, &caps, disable_acs_redir_param,
> + PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC,
> + ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC));
> + __pci_config_acs(dev, &caps, config_acs_param, 0, 0);
> +
> + pci_write_config_word(dev, pos + PCI_ACS_CTRL, caps.ctrl);
> }
>
> /**
> @@ -6740,6 +6769,8 @@ static int __init pci_setup(char *str)
> pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
> } else if (!strncmp(str, "disable_acs_redir=", 18)) {
> disable_acs_redir_param = str + 18;
> + } else if (!strncmp(str, "config_acs=", 11)) {
> + config_acs_param = str + 11;
> } else {
> pr_err("PCI: Unknown option `%s'\n", str);
> }
> @@ -6764,6 +6795,7 @@ static int __init pci_realloc_setup_params(void)
> resource_alignment_param = kstrdup(resource_alignment_param,
> GFP_KERNEL);
> disable_acs_redir_param = kstrdup(disable_acs_redir_param, GFP_KERNEL);
> + config_acs_param = kstrdup(config_acs_param, GFP_KERNEL);
>
> return 0;
> }
--
js
suse labs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH V4] PCI: Extend ACS configurability
2024-09-25 5:06 ` Jiri Slaby
@ 2024-09-25 5:29 ` Jiri Slaby
2024-09-25 5:49 ` Jiri Slaby
2024-10-07 20:43 ` Bjorn Helgaas
1 sibling, 1 reply; 28+ messages in thread
From: Jiri Slaby @ 2024-09-25 5:29 UTC (permalink / raw)
To: Vidya Sagar, corbet, bhelgaas, galshalom, leonro, jgg, treding,
jonathanh
Cc: mmoshrefjava, shahafs, vsethi, sdonthineni, jan, tdave, linux-doc,
linux-pci, linux-kernel, kthota, mmaddireddy, sagar.tv,
vliaskovitis
On 25. 09. 24, 7:06, Jiri Slaby wrote:
>> @@ -1047,23 +1066,33 @@ static void pci_std_enable_acs(struct pci_dev
>> *dev)
>> */
>> static void pci_enable_acs(struct pci_dev *dev)
>> {
>> - if (!pci_acs_enable)
>> - goto disable_acs_redir;
>> + struct pci_acs caps;
>> + int pos;
>> +
>> + pos = dev->acs_cap;
>> + if (!pos)
>> + return;
>> - if (!pci_dev_specific_enable_acs(dev))
>> - goto disable_acs_redir;
>> + pci_read_config_word(dev, pos + PCI_ACS_CAP, &caps.cap);
>> + pci_read_config_word(dev, pos + PCI_ACS_CTRL, &caps.ctrl);
>> + caps.fw_ctrl = caps.ctrl;
>> - pci_std_enable_acs(dev);
>> + /* If an iommu is present we start with kernel default caps */
>> + if (pci_acs_enable) {
AFAIU pci_acs_enable is set from iommus' code via pci_request_acs().
Which is much later than when bridges are initialized here, right?
>> + if (pci_dev_specific_enable_acs(dev))
>> + pci_std_enable_acs(dev, &caps);
So this is never called, IMO.
>> + }
thanks,
--
js
suse labs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH V4] PCI: Extend ACS configurability
2024-09-25 5:29 ` Jiri Slaby
@ 2024-09-25 5:49 ` Jiri Slaby
2024-10-01 19:33 ` Jason Gunthorpe
0 siblings, 1 reply; 28+ messages in thread
From: Jiri Slaby @ 2024-09-25 5:49 UTC (permalink / raw)
To: Vidya Sagar, corbet, bhelgaas, galshalom, leonro, jgg, treding,
jonathanh
Cc: mmoshrefjava, shahafs, vsethi, sdonthineni, jan, tdave, linux-doc,
linux-pci, linux-kernel, kthota, mmaddireddy, sagar.tv,
vliaskovitis
On 25. 09. 24, 7:29, Jiri Slaby wrote:
> On 25. 09. 24, 7:06, Jiri Slaby wrote:
>>> @@ -1047,23 +1066,33 @@ static void pci_std_enable_acs(struct pci_dev
>>> *dev)
>>> */
>>> static void pci_enable_acs(struct pci_dev *dev)
>>> {
>>> - if (!pci_acs_enable)
>>> - goto disable_acs_redir;
>>> + struct pci_acs caps;
>>> + int pos;
>>> +
>>> + pos = dev->acs_cap;
>>> + if (!pos)
>>> + return;
Ignore the previous post.
The bridge has no ACS (see lspci below). So it used to be enabled by
pci_quirk_enable_intel_pch_acs() by another registers. But the "if
(!pos)" does not let it run now.
I am not sure how to fix this as we cannot have "caps" from these
quirks, so that whole idea of __pci_config_acs() is nonworking for these
quirks.
00:1c.0 PCI bridge: Intel Corporation C610/X99 series chipset PCI
Express Root Port #1 (rev d5) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 36
NUMA node: 0
Bus: primary=00, secondary=08, subordinate=08, sec-latency=0
I/O behind bridge: 00001000-00001fff [size=4K]
Memory behind bridge: f3600000-f37fffff [size=2M]
Prefetchable memory behind bridge: 00000000f3800000-00000000f39fffff
[size=2M]
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s
<1us, L1 <4us
ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive-
BWMgmt- ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+
Slot #0, PowerLimit 0.000W; Interlock- NoCompl+
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
Changed: MRL- PresDet- LinkState-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Via
WAKE# ARIFwd-
AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF
Disabled ARIFwd-
AtomicOpsCtl: ReqEn- EgressBlck-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance-
ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-,
EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Address: fee00358 Data: 0000
Capabilities: [90] Subsystem: Dell Device 0618
Capabilities: [a0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Kernel driver in use: pcieport
>>> - if (!pci_dev_specific_enable_acs(dev))
>>> - goto disable_acs_redir;
>>> + pci_read_config_word(dev, pos + PCI_ACS_CAP, &caps.cap);
>>> + pci_read_config_word(dev, pos + PCI_ACS_CTRL, &caps.ctrl);
>>> + caps.fw_ctrl = caps.ctrl;
>>> - pci_std_enable_acs(dev);
>>> + /* If an iommu is present we start with kernel default caps */
>>> + if (pci_acs_enable) {
>
> AFAIU pci_acs_enable is set from iommus' code via pci_request_acs().
> Which is much later than when bridges are initialized here, right?
--
js
suse labs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH V4] PCI: Extend ACS configurability
2024-09-25 5:49 ` Jiri Slaby
@ 2024-10-01 19:33 ` Jason Gunthorpe
2024-10-07 16:36 ` Steffen Dirkwinkel
0 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2024-10-01 19:33 UTC (permalink / raw)
To: Jiri Slaby
Cc: Vidya Sagar, corbet, bhelgaas, galshalom, leonro, treding,
jonathanh, mmoshrefjava, shahafs, vsethi, sdonthineni, jan, tdave,
linux-doc, linux-pci, linux-kernel, kthota, mmaddireddy, sagar.tv,
vliaskovitis
On Wed, Sep 25, 2024 at 07:49:59AM +0200, Jiri Slaby wrote:
> On 25. 09. 24, 7:29, Jiri Slaby wrote:
> > On 25. 09. 24, 7:06, Jiri Slaby wrote:
> > > > @@ -1047,23 +1066,33 @@ static void pci_std_enable_acs(struct
> > > > pci_dev *dev)
> > > > */
> > > > static void pci_enable_acs(struct pci_dev *dev)
> > > > {
> > > > - if (!pci_acs_enable)
> > > > - goto disable_acs_redir;
> > > > + struct pci_acs caps;
> > > > + int pos;
> > > > +
> > > > + pos = dev->acs_cap;
> > > > + if (!pos)
> > > > + return;
>
> Ignore the previous post.
>
> The bridge has no ACS (see lspci below). So it used to be enabled by
> pci_quirk_enable_intel_pch_acs() by another registers.
Er, Ok, so it overrides the whole thing with
pci_dev_specific_acs_enabled() too..
> I am not sure how to fix this as we cannot have "caps" from these quirks, so
> that whole idea of __pci_config_acs() is nonworking for these quirks.
We just need to allow the quirk to run before we try to do anything
with the cap, which has probably always been a NOP anyhow.
Maybe like this?
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 7d85c04fbba2ae..225a6cd2e9ca3b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1067,8 +1067,15 @@ static void pci_std_enable_acs(struct pci_dev *dev, struct pci_acs *caps)
static void pci_enable_acs(struct pci_dev *dev)
{
struct pci_acs caps;
+ bool enable_acs = false;
int pos;
+ /* If an iommu is present we start with kernel default caps */
+ if (pci_acs_enable) {
+ if (pci_dev_specific_enable_acs(dev))
+ enable_acs = true;
+ }
+
pos = dev->acs_cap;
if (!pos)
return;
@@ -1077,11 +1084,8 @@ static void pci_enable_acs(struct pci_dev *dev)
pci_read_config_word(dev, pos + PCI_ACS_CTRL, &caps.ctrl);
caps.fw_ctrl = caps.ctrl;
- /* If an iommu is present we start with kernel default caps */
- if (pci_acs_enable) {
- if (pci_dev_specific_enable_acs(dev))
- pci_std_enable_acs(dev, &caps);
- }
+ if (enable_acs)
+ pci_std_enable_acs(dev, &caps);
/*
* Always apply caps from the command line, even if there is no iommu.
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH V4] PCI: Extend ACS configurability
2024-10-01 19:33 ` Jason Gunthorpe
@ 2024-10-07 16:36 ` Steffen Dirkwinkel
0 siblings, 0 replies; 28+ messages in thread
From: Steffen Dirkwinkel @ 2024-10-07 16:36 UTC (permalink / raw)
To: Jason Gunthorpe, Jiri Slaby
Cc: Vidya Sagar, corbet, bhelgaas, galshalom, leonro, treding,
jonathanh, mmoshrefjava, shahafs, vsethi, sdonthineni, jan, tdave,
linux-doc, linux-pci, linux-kernel, kthota, mmaddireddy, sagar.tv,
vliaskovitis
On Tue, 2024-10-01 at 16:33 -0300, Jason Gunthorpe wrote:
> On Wed, Sep 25, 2024 at 07:49:59AM +0200, Jiri Slaby wrote:
> > On 25. 09. 24, 7:29, Jiri Slaby wrote:
> > > On 25. 09. 24, 7:06, Jiri Slaby wrote:
> > > > > @@ -1047,23 +1066,33 @@ static void pci_std_enable_acs(struct
> > > > > pci_dev *dev)
> > > > > */
> > > > > static void pci_enable_acs(struct pci_dev *dev)
> > > > > {
> > > > > - if (!pci_acs_enable)
> > > > > - goto disable_acs_redir;
> > > > > + struct pci_acs caps;
> > > > > + int pos;
> > > > > +
> > > > > + pos = dev->acs_cap;
> > > > > + if (!pos)
> > > > > + return;
> >
> > Ignore the previous post.
> >
> > The bridge has no ACS (see lspci below). So it used to be enabled
> > by
> > pci_quirk_enable_intel_pch_acs() by another registers.
>
> Er, Ok, so it overrides the whole thing with
> pci_dev_specific_acs_enabled() too..
>
> > I am not sure how to fix this as we cannot have "caps" from these
> > quirks, so
> > that whole idea of __pci_config_acs() is nonworking for these
> > quirks.
>
> We just need to allow the quirk to run before we try to do anything
> with the cap, which has probably always been a NOP anyhow.
>
> Maybe like this?
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 7d85c04fbba2ae..225a6cd2e9ca3b 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1067,8 +1067,15 @@ static void pci_std_enable_acs(struct pci_dev
> *dev, struct pci_acs *caps)
> static void pci_enable_acs(struct pci_dev *dev)
> {
> struct pci_acs caps;
> + bool enable_acs = false;
> int pos;
>
> + /* If an iommu is present we start with kernel default caps
> */
> + if (pci_acs_enable) {
> + if (pci_dev_specific_enable_acs(dev))
> + enable_acs = true;
> + }
> +
> pos = dev->acs_cap;
> if (!pos)
> return;
> @@ -1077,11 +1084,8 @@ static void pci_enable_acs(struct pci_dev
> *dev)
> pci_read_config_word(dev, pos + PCI_ACS_CTRL, &caps.ctrl);
> caps.fw_ctrl = caps.ctrl;
>
> - /* If an iommu is present we start with kernel default caps
> */
> - if (pci_acs_enable) {
> - if (pci_dev_specific_enable_acs(dev))
> - pci_std_enable_acs(dev, &caps);
> - }
> + if (enable_acs)
> + pci_std_enable_acs(dev, &caps);
>
> /*
> * Always apply caps from the command line, even if there is
> no iommu.
Hi,
I just ran into this issue (fewer iommu groups starting with 6.11).
Both reverting the original patch or applying your suggestion worked
for me.
Thanks
Steffen
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH V4] PCI: Extend ACS configurability
2024-09-25 5:06 ` Jiri Slaby
2024-09-25 5:29 ` Jiri Slaby
@ 2024-10-07 20:43 ` Bjorn Helgaas
1 sibling, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2024-10-07 20:43 UTC (permalink / raw)
To: Jiri Slaby
Cc: Vidya Sagar, corbet, bhelgaas, galshalom, leonro, jgg, treding,
jonathanh, mmoshrefjava, shahafs, vsethi, sdonthineni, jan, tdave,
linux-doc, linux-pci, linux-kernel, kthota, mmaddireddy, sagar.tv
On Wed, Sep 25, 2024 at 07:06:41AM +0200, Jiri Slaby wrote:
> On 25. 06. 24, 17:31, Vidya Sagar wrote:
> > PCIe ACS settings control the level of isolation and the possible P2P
> > paths between devices. With greater isolation the kernel will create
> > smaller iommu_groups and with less isolation there is more HW that
> > can achieve P2P transfers. From a virtualization perspective all
> > devices in the same iommu_group must be assigned to the same VM as
> > they lack security isolation.
> >
> > There is no way for the kernel to automatically know the correct
> > ACS settings for any given system and workload. Existing command line
> > options (ex:- disable_acs_redir) allow only for large scale change,
> > disabling all isolation, but this is not sufficient for more complex cases.
> >
> > Add a kernel command-line option 'config_acs' to directly control all the
> > ACS bits for specific devices, which allows the operator to setup the
> > right level of isolation to achieve the desired P2P configuration.
> > The definition is future proof, when new ACS bits are added to the spec
> > the open syntax can be extended.
> >
> > ACS needs to be setup early in the kernel boot as the ACS settings
> > effect how iommu_groups are formed. iommu_group formation is a one
> > time event during initial device discovery, changing ACS bits after
> > kernel boot can result in an inaccurate view of the iommu_groups
> > compared to the current isolation configuration.
> >
> > ACS applies to PCIe Downstream Ports and multi-function devices.
> > The default ACS settings are strict and deny any direct traffic
> > between two functions. This results in the smallest iommu_group the
> > HW can support. Frequently these values result in slow or
> > non-working P2PDMA.
> >
> > ACS offers a range of security choices controlling how traffic is
> > allowed to go directly between two devices. Some popular choices:
> > - Full prevention
> > - Translated requests can be direct, with various options
> > - Asymmetric direct traffic, A can reach B but not the reverse
> > - All traffic can be direct
> > Along with some other less common ones for special topologies.
> >
> > The intention is that this option would be used with expert knowledge
> > of the HW capability and workload to achieve the desired
> > configuration.
>
> Hi,
>
> this breaks ACS on some platforms (in 6.11). See:
> https://bugzilla.suse.com/show_bug.cgi?id=1229019
#regzbot introduced: 47c8846a ("PCI: Extend ACS configurability")
#regzbot link: https://bugzilla.suse.com/show_bug.cgi?id=1229019
^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2024-10-07 20:43 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-21 11:09 [PATCH V2] PCI: Extend ACS configurability Vidya Sagar
2024-05-21 15:44 ` kernel test robot
2024-05-23 6:35 ` [PATCH V3] " Vidya Sagar
2024-05-23 14:59 ` Bjorn Helgaas
2024-05-23 15:16 ` Jason Gunthorpe
2024-06-03 7:50 ` Vidya Sagar
2024-06-07 19:30 ` Bjorn Helgaas
2024-06-10 11:38 ` Jason Gunthorpe
2024-06-12 21:29 ` Bjorn Helgaas
2024-06-12 23:23 ` Jason Gunthorpe
2024-06-13 22:05 ` Bjorn Helgaas
2024-06-13 23:36 ` Jason Gunthorpe
2024-06-13 22:38 ` Alex Williamson
2024-06-12 12:19 ` Jason Gunthorpe
2024-06-25 15:31 ` [PATCH V4] " Vidya Sagar
2024-06-25 16:26 ` Lukas Wunner
2024-06-25 16:39 ` Jason Gunthorpe
2024-06-26 6:02 ` Leon Romanovsky
2024-06-26 7:40 ` Tian, Kevin
2024-06-26 11:50 ` Jason Gunthorpe
2024-07-08 14:39 ` Jason Gunthorpe
2024-07-12 21:57 ` Bjorn Helgaas
2024-09-25 5:06 ` Jiri Slaby
2024-09-25 5:29 ` Jiri Slaby
2024-09-25 5:49 ` Jiri Slaby
2024-10-01 19:33 ` Jason Gunthorpe
2024-10-07 16:36 ` Steffen Dirkwinkel
2024-10-07 20:43 ` Bjorn Helgaas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).