* Re: [PATCH v6 00/11] PCI: Drop duplicated tracking of a pci_dev's bound driver
From: Bjorn Helgaas @ 2021-10-12 23:32 UTC (permalink / raw)
To: Uwe Kleine-König
Cc: Giovanni Cabiddu, Mark Rutland, Sathya Prakash,
Alexander Shishkin, linux-pci, Alexander Duyck, x86, qat-linux,
oss-drivers, Oliver O'Halloran, H. Peter Anvin, Jiri Olsa,
Thomas Gleixner, Marco Chiappero, Stefano Stabellini, Herbert Xu,
linux-scsi, Rafał Miłecki, Jesse Brandeburg,
Peter Zijlstra, Ingo Molnar, linux-wireless, Jakub Kicinski,
Yisen Zhuang, Suganath Prabu Subramani, Fiona Trahe,
Andrew Donnellan, Arnd Bergmann, Konrad Rzeszutek Wilk,
Ido Schimmel, Simon Horman, linuxppc-dev,
Arnaldo Carvalho de Melo, Jack Xu, Borislav Petkov,
Michael Buesch, Jiri Pirko, Bjorn Helgaas, Namhyung Kim,
Boris Ostrovsky, Andy Shevchenko, Juergen Gross, Salil Mehta,
Sreekanth Reddy, xen-devel, Vadym Kochan, MPT-FusionLinux.pdl,
Greg Kroah-Hartman, linux-usb, Wojciech Ziemba, linux-kernel,
Mathias Nyman, Zhou Wang, linux-crypto, kernel, netdev,
Frederic Barrat, Paul Mackerras, Tomaszx Kowalik, Taras Chornyi,
David S. Miller, linux-perf-users
In-Reply-To: <20211004125935.2300113-1-u.kleine-koenig@pengutronix.de>
On Mon, Oct 04, 2021 at 02:59:24PM +0200, Uwe Kleine-König wrote:
> Hello,
>
> this is v6 of the quest to drop the "driver" member from struct pci_dev
> which tracks the same data (apart from a constant offset) as dev.driver.
I like this a lot and applied it to pci/driver for v5.16, thanks!
I split some of the bigger patches apart so they only touched one
driver or subsystem at a time. I also updated to_pci_driver() so it
returns NULL when given NULL, which makes some of the validations
quite a bit simpler, especially in the PM code in pci-driver.c.
Full interdiff from this v6 series:
diff --git a/arch/x86/kernel/probe_roms.c b/arch/x86/kernel/probe_roms.c
index deaaef6efe34..36e84d904260 100644
--- a/arch/x86/kernel/probe_roms.c
+++ b/arch/x86/kernel/probe_roms.c
@@ -80,17 +80,15 @@ static struct resource video_rom_resource = {
*/
static bool match_id(struct pci_dev *pdev, unsigned short vendor, unsigned short device)
{
+ struct pci_driver *drv = to_pci_driver(pdev->dev.driver);
const struct pci_device_id *id;
if (pdev->vendor == vendor && pdev->device == device)
return true;
- if (pdev->dev.driver) {
- struct pci_driver *drv = to_pci_driver(pdev->dev.driver);
- for (id = drv->id_table; id && id->vendor; id++)
- if (id->vendor == vendor && id->device == device)
- break;
- }
+ for (id = drv ? drv->id_table : NULL; id && id->vendor; id++)
+ if (id->vendor == vendor && id->device == device)
+ break;
return id && id->vendor;
}
diff --git a/drivers/misc/cxl/guest.c b/drivers/misc/cxl/guest.c
index d997c9c3ebb5..7eb3706cf42d 100644
--- a/drivers/misc/cxl/guest.c
+++ b/drivers/misc/cxl/guest.c
@@ -20,38 +20,38 @@ static void pci_error_handlers(struct cxl_afu *afu,
pci_channel_state_t state)
{
struct pci_dev *afu_dev;
+ struct pci_driver *afu_drv;
+ struct pci_error_handlers *err_handler;
if (afu->phb == NULL)
return;
list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) {
- struct pci_driver *afu_drv;
-
- if (!afu_dev->dev.driver)
- continue;
-
afu_drv = to_pci_driver(afu_dev->dev.driver);
+ if (!afu_drv)
+ continue;
+ err_handler = afu_drv->err_handler;
switch (bus_error_event) {
case CXL_ERROR_DETECTED_EVENT:
afu_dev->error_state = state;
- if (afu_drv->err_handler &&
- afu_drv->err_handler->error_detected)
- afu_drv->err_handler->error_detected(afu_dev, state);
- break;
+ if (err_handler &&
+ err_handler->error_detected)
+ err_handler->error_detected(afu_dev, state);
+ break;
case CXL_SLOT_RESET_EVENT:
afu_dev->error_state = state;
- if (afu_drv->err_handler &&
- afu_drv->err_handler->slot_reset)
- afu_drv->err_handler->slot_reset(afu_dev);
- break;
+ if (err_handler &&
+ err_handler->slot_reset)
+ err_handler->slot_reset(afu_dev);
+ break;
case CXL_RESUME_EVENT:
- if (afu_drv->err_handler &&
- afu_drv->err_handler->resume)
- afu_drv->err_handler->resume(afu_dev);
- break;
+ if (err_handler &&
+ err_handler->resume)
+ err_handler->resume(afu_dev);
+ break;
}
}
}
diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 7e7545d01e27..08bd81854101 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1795,6 +1795,8 @@ static pci_ers_result_t cxl_vphb_error_detected(struct cxl_afu *afu,
pci_channel_state_t state)
{
struct pci_dev *afu_dev;
+ struct pci_driver *afu_drv;
+ struct pci_error_handlers *err_handler;
pci_ers_result_t result = PCI_ERS_RESULT_NEED_RESET;
pci_ers_result_t afu_result = PCI_ERS_RESULT_NEED_RESET;
@@ -1805,16 +1807,16 @@ static pci_ers_result_t cxl_vphb_error_detected(struct cxl_afu *afu,
return result;
list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) {
- struct pci_driver *afu_drv;
- if (!afu_dev->dev.driver)
- continue;
-
afu_drv = to_pci_driver(afu_dev->dev.driver);
+ if (!afu_drv)
+ continue;
afu_dev->error_state = state;
- if (afu_drv->err_handler)
- afu_result = afu_drv->err_handler->error_detected(afu_dev, state);
+ err_handler = afu_drv->err_handler;
+ if (err_handler)
+ afu_result = err_handler->error_detected(afu_dev,
+ state);
/* Disconnect trumps all, NONE trumps NEED_RESET */
if (afu_result == PCI_ERS_RESULT_DISCONNECT)
result = PCI_ERS_RESULT_DISCONNECT;
@@ -1974,6 +1976,8 @@ static pci_ers_result_t cxl_pci_slot_reset(struct pci_dev *pdev)
struct cxl_afu *afu;
struct cxl_context *ctx;
struct pci_dev *afu_dev;
+ struct pci_driver *afu_drv;
+ struct pci_error_handlers *err_handler;
pci_ers_result_t afu_result = PCI_ERS_RESULT_RECOVERED;
pci_ers_result_t result = PCI_ERS_RESULT_RECOVERED;
int i;
@@ -2005,8 +2009,6 @@ static pci_ers_result_t cxl_pci_slot_reset(struct pci_dev *pdev)
continue;
list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) {
- struct pci_driver *afu_drv;
-
/* Reset the device context.
* TODO: make this less disruptive
*/
@@ -2032,14 +2034,13 @@ static pci_ers_result_t cxl_pci_slot_reset(struct pci_dev *pdev)
* shouldn't start new work until we call
* their resume function.
*/
- if (!afu_dev->dev.driver)
- continue;
-
afu_drv = to_pci_driver(afu_dev->dev.driver);
+ if (!afu_drv)
+ continue;
- if (afu_drv->err_handler &&
- afu_drv->err_handler->slot_reset)
- afu_result = afu_drv->err_handler->slot_reset(afu_dev);
+ err_handler = afu_drv->err_handler;
+ if (err_handler && err_handler->slot_reset)
+ afu_result = err_handler->slot_reset(afu_dev);
if (afu_result == PCI_ERS_RESULT_DISCONNECT)
result = PCI_ERS_RESULT_DISCONNECT;
@@ -2066,6 +2067,8 @@ static void cxl_pci_resume(struct pci_dev *pdev)
struct cxl *adapter = pci_get_drvdata(pdev);
struct cxl_afu *afu;
struct pci_dev *afu_dev;
+ struct pci_driver *afu_drv;
+ struct pci_error_handlers *err_handler;
int i;
/* Everything is back now. Drivers should restart work now.
@@ -2080,11 +2083,13 @@ static void cxl_pci_resume(struct pci_dev *pdev)
continue;
list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) {
- struct pci_driver *afu_drv;
- if (afu_dev->dev.driver &&
- (afu_drv = to_pci_driver(afu_dev->dev.driver))->err_handler &&
- afu_drv->err_handler->resume)
- afu_drv->err_handler->resume(afu_dev);
+ afu_drv = to_pci_driver(afu_dev->dev.driver);
+ if (!afu_drv)
+ continue;
+
+ err_handler = afu_drv->err_handler;
+ if (err_handler && err_handler->resume)
+ err_handler->resume(afu_dev);
}
}
spin_unlock(&adapter->afu_list_lock);
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 0d0a34347868..fa4b52bb1e05 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -168,11 +168,8 @@ static ssize_t sriov_vf_total_msix_show(struct device *dev,
u32 vf_total_msix = 0;
device_lock(dev);
- if (!dev->driver)
- goto unlock;
-
pdrv = to_pci_driver(dev->driver);
- if (!pdrv->sriov_get_vf_total_msix)
+ if (!pdrv || !pdrv->sriov_get_vf_total_msix)
goto unlock;
vf_total_msix = pdrv->sriov_get_vf_total_msix(pdev);
@@ -199,19 +196,14 @@ static ssize_t sriov_vf_msix_count_store(struct device *dev,
return -EINVAL;
device_lock(&pdev->dev);
- if (!pdev->dev.driver) {
- ret = -EOPNOTSUPP;
- goto err_pdev;
- }
-
- pdrv = to_pci_driver(pdev->dev.driver);
- if (!pdrv->sriov_set_msix_vec_count) {
+ pdrv = to_pci_driver(dev->driver);
+ if (!pdrv || !pdrv->sriov_set_msix_vec_count) {
ret = -EOPNOTSUPP;
goto err_pdev;
}
device_lock(&vf_dev->dev);
- if (vf_dev->dev.driver) {
+ if (to_pci_driver(vf_dev->dev.driver)) {
/*
* A driver is already attached to this VF and has configured
* itself based on the current MSI-X vector count. Changing
@@ -405,14 +397,13 @@ static ssize_t sriov_numvfs_store(struct device *dev,
goto exit;
/* is PF driver loaded */
- if (!pdev->dev.driver) {
+ pdrv = to_pci_driver(dev->driver);
+ if (!pdrv) {
pci_info(pdev, "no driver bound to device; cannot configure SR-IOV\n");
ret = -ENOENT;
goto exit;
}
- pdrv = to_pci_driver(pdev->dev.driver);
-
/* is PF driver loaded w/callback */
if (!pdrv->sriov_configure) {
pci_info(pdev, "driver does not support SR-IOV configuration via sysfs\n");
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index e94aa338bab4..3884a1542e86 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -454,7 +454,7 @@ static int pci_device_probe(struct device *dev)
static void pci_device_remove(struct device *dev)
{
struct pci_dev *pci_dev = to_pci_dev(dev);
- struct pci_driver *drv = to_pci_driver(pci_dev->dev.driver);
+ struct pci_driver *drv = to_pci_driver(dev->driver);
if (drv->remove) {
pm_runtime_get_sync(dev);
@@ -489,15 +489,12 @@ static void pci_device_remove(struct device *dev)
static void pci_device_shutdown(struct device *dev)
{
struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = to_pci_driver(dev->driver);
pm_runtime_resume(dev);
- if (pci_dev->dev.driver) {
- struct pci_driver *drv = to_pci_driver(pci_dev->dev.driver);
-
- if (drv->shutdown)
- drv->shutdown(pci_dev);
- }
+ if (drv && drv->shutdown)
+ drv->shutdown(pci_dev);
/*
* If this is a kexec reboot, turn off Bus Master bit on the
@@ -588,25 +585,22 @@ static int pci_pm_reenable_device(struct pci_dev *pci_dev)
static int pci_legacy_suspend(struct device *dev, pm_message_t state)
{
struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = to_pci_driver(dev->driver);
- if (dev->driver) {
- struct pci_driver *drv = to_pci_driver(dev->driver);
+ if (drv && drv->suspend) {
+ pci_power_t prev = pci_dev->current_state;
+ int error;
- if (drv->suspend) {
- pci_power_t prev = pci_dev->current_state;
- int error;
+ error = drv->suspend(pci_dev, state);
+ suspend_report_result(drv->suspend, error);
+ if (error)
+ return error;
- error = drv->suspend(pci_dev, state);
- suspend_report_result(drv->suspend, error);
- if (error)
- return error;
-
- if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
- && pci_dev->current_state != PCI_UNKNOWN) {
- pci_WARN_ONCE(pci_dev, pci_dev->current_state != prev,
- "PCI PM: Device state not saved by %pS\n",
- drv->suspend);
- }
+ if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
+ && pci_dev->current_state != PCI_UNKNOWN) {
+ pci_WARN_ONCE(pci_dev, pci_dev->current_state != prev,
+ "PCI PM: Device state not saved by %pS\n",
+ drv->suspend);
}
}
@@ -632,17 +626,12 @@ static int pci_legacy_suspend_late(struct device *dev, pm_message_t state)
static int pci_legacy_resume(struct device *dev)
{
struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = to_pci_driver(dev->driver);
pci_fixup_device(pci_fixup_resume, pci_dev);
- if (pci_dev->dev.driver) {
- struct pci_driver *drv = to_pci_driver(pci_dev->dev.driver);
-
- if (drv->resume)
- return drv->resume(pci_dev);
- }
-
- return pci_pm_reenable_device(pci_dev);
+ return drv && drv->resume ?
+ drv->resume(pci_dev) : pci_pm_reenable_device(pci_dev);
}
/* Auxiliary functions used by the new power management framework */
@@ -656,14 +645,8 @@ static void pci_pm_default_suspend(struct pci_dev *pci_dev)
static bool pci_has_legacy_pm_support(struct pci_dev *pci_dev)
{
- struct pci_driver *drv;
- bool ret;
-
- if (!pci_dev->dev.driver)
- return false;
-
- drv = to_pci_driver(pci_dev->dev.driver);
- ret = drv && (drv->suspend || drv->resume);
+ struct pci_driver *drv = to_pci_driver(pci_dev->dev.driver);
+ bool ret = drv && (drv->suspend || drv->resume);
/*
* Legacy PM support is used by default, so warn if the new framework is
@@ -1255,11 +1238,11 @@ static int pci_pm_runtime_suspend(struct device *dev)
int error;
/*
- * If pci_dev->dev.driver is not set (unbound), we leave the device in D0,
- * but it may go to D3cold when the bridge above it runtime suspends.
- * Save its config space in case that happens.
+ * If the device has no driver, we leave it in D0, but it may go to
+ * D3cold when the bridge above it runtime suspends. Save its
+ * config space in case that happens.
*/
- if (!pci_dev->dev.driver) {
+ if (!to_pci_driver(dev->driver)) {
pci_save_state(pci_dev);
return 0;
}
@@ -1316,7 +1299,7 @@ static int pci_pm_runtime_resume(struct device *dev)
*/
pci_restore_standard_config(pci_dev);
- if (!dev->driver)
+ if (!to_pci_driver(dev->driver))
return 0;
pci_fixup_device(pci_fixup_resume_early, pci_dev);
@@ -1335,13 +1318,14 @@ static int pci_pm_runtime_resume(struct device *dev)
static int pci_pm_runtime_idle(struct device *dev)
{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
/*
- * If dev->driver is not set (unbound), the device should
- * always remain in D0 regardless of the runtime PM status
+ * If the device has no driver, it should always remain in D0
+ * regardless of the runtime PM status
*/
- if (!dev->driver)
+ if (!to_pci_driver(dev->driver))
return 0;
if (!pm)
@@ -1448,8 +1432,10 @@ static struct pci_driver pci_compat_driver = {
*/
struct pci_driver *pci_dev_driver(const struct pci_dev *dev)
{
- if (dev->dev.driver)
- return to_pci_driver(dev->dev.driver);
+ struct pci_driver *drv = to_pci_driver(dev->dev.driver);
+
+ if (drv)
+ return drv;
else {
int i;
for (i = 0; i <= PCI_ROM_RESOURCE; i++)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index ccecf740de59..5298ce131f8c 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5088,13 +5088,14 @@ EXPORT_SYMBOL_GPL(pci_dev_unlock);
static void pci_dev_save_and_disable(struct pci_dev *dev)
{
+ struct pci_driver *drv = to_pci_driver(dev->dev.driver);
const struct pci_error_handlers *err_handler =
- dev->dev.driver ? to_pci_driver(dev->dev.driver)->err_handler : NULL;
+ drv ? drv->err_handler : NULL;
/*
- * dev->driver->err_handler->reset_prepare() is protected against
- * races with ->remove() by the device lock, which must be held by
- * the caller.
+ * drv->err_handler->reset_prepare() is protected against races
+ * with ->remove() by the device lock, which must be held by the
+ * caller.
*/
if (err_handler && err_handler->reset_prepare)
err_handler->reset_prepare(dev);
@@ -5119,15 +5120,15 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
static void pci_dev_restore(struct pci_dev *dev)
{
+ struct pci_driver *drv = to_pci_driver(dev->dev.driver);
const struct pci_error_handlers *err_handler =
- dev->dev.driver ? to_pci_driver(dev->dev.driver)->err_handler : NULL;
+ drv ? drv->err_handler : NULL;
pci_restore_state(dev);
/*
- * dev->driver->err_handler->reset_done() is protected against
- * races with ->remove() by the device lock, which must be held by
- * the caller.
+ * drv->err_handler->reset_done() is protected against races with
+ * ->remove() by the device lock, which must be held by the caller.
*/
if (err_handler && err_handler->reset_done)
err_handler->reset_done(dev);
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index b314b54f7821..42385fe6b7fa 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -54,9 +54,10 @@ static int report_error_detected(struct pci_dev *dev,
const struct pci_error_handlers *err_handler;
device_lock(&dev->dev);
+ pdrv = to_pci_driver(dev->dev.driver);
if (!pci_dev_set_io_state(dev, state) ||
- !dev->dev.driver ||
- !(pdrv = to_pci_driver(dev->dev.driver))->err_handler ||
+ !pdrv ||
+ !pdrv->err_handler ||
!pdrv->err_handler->error_detected) {
/*
* If any device in the subtree does not have an error_detected
@@ -92,13 +93,14 @@ static int report_normal_detected(struct pci_dev *dev, void *data)
static int report_mmio_enabled(struct pci_dev *dev, void *data)
{
- pci_ers_result_t vote, *result = data;
struct pci_driver *pdrv;
+ pci_ers_result_t vote, *result = data;
const struct pci_error_handlers *err_handler;
device_lock(&dev->dev);
- if (!dev->dev.driver ||
- !(pdrv = to_pci_driver(dev->dev.driver))->err_handler ||
+ pdrv = to_pci_driver(dev->dev.driver);
+ if (!pdrv ||
+ !pdrv->err_handler ||
!pdrv->err_handler->mmio_enabled)
goto out;
@@ -112,13 +114,14 @@ static int report_mmio_enabled(struct pci_dev *dev, void *data)
static int report_slot_reset(struct pci_dev *dev, void *data)
{
+ struct pci_driver *pdrv;
pci_ers_result_t vote, *result = data;
const struct pci_error_handlers *err_handler;
- struct pci_driver *pdrv;
device_lock(&dev->dev);
- if (!dev->dev.driver ||
- !(pdrv = to_pci_driver(dev->dev.driver))->err_handler ||
+ pdrv = to_pci_driver(dev->dev.driver);
+ if (!pdrv ||
+ !pdrv->err_handler ||
!pdrv->err_handler->slot_reset)
goto out;
@@ -132,13 +135,14 @@ static int report_slot_reset(struct pci_dev *dev, void *data)
static int report_resume(struct pci_dev *dev, void *data)
{
- const struct pci_error_handlers *err_handler;
struct pci_driver *pdrv;
+ const struct pci_error_handlers *err_handler;
device_lock(&dev->dev);
+ pdrv = dev->driver;
if (!pci_dev_set_io_state(dev, pci_channel_io_normal) ||
- !dev->dev.driver ||
- !(pdrv = to_pci_driver(dev->dev.driver))->err_handler ||
+ !pdrv ||
+ !pdrv->err_handler ||
!pdrv->err_handler->resume)
goto out;
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index 73831fb87a1e..0ec76b4af16f 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -588,7 +588,6 @@ static pci_ers_result_t pcifront_common_process(int cmd,
struct pcifront_device *pdev,
pci_channel_state_t state)
{
- pci_ers_result_t result;
struct pci_driver *pdrv;
int bus = pdev->sh_info->aer_op.bus;
int devfn = pdev->sh_info->aer_op.devfn;
@@ -598,13 +597,12 @@ static pci_ers_result_t pcifront_common_process(int cmd,
dev_dbg(&pdev->xdev->dev,
"pcifront AER process: cmd %x (bus:%x, devfn%x)",
cmd, bus, devfn);
- result = PCI_ERS_RESULT_NONE;
pcidev = pci_get_domain_bus_and_slot(domain, bus, devfn);
if (!pcidev || !pcidev->dev.driver) {
dev_err(&pdev->xdev->dev, "device or AER driver is NULL\n");
pci_dev_put(pcidev);
- return result;
+ return PCI_ERS_RESULT_NONE;
}
pdrv = to_pci_driver(pcidev->dev.driver);
@@ -612,27 +610,18 @@ static pci_ers_result_t pcifront_common_process(int cmd,
pci_dbg(pcidev, "trying to call AER service\n");
switch (cmd) {
case XEN_PCI_OP_aer_detected:
- result = pdrv->err_handler->
- error_detected(pcidev, state);
- break;
+ return pdrv->err_handler->error_detected(pcidev, state);
case XEN_PCI_OP_aer_mmio:
- result = pdrv->err_handler->
- mmio_enabled(pcidev);
- break;
+ return pdrv->err_handler->mmio_enabled(pcidev);
case XEN_PCI_OP_aer_slotreset:
- result = pdrv->err_handler->
- slot_reset(pcidev);
- break;
+ return pdrv->err_handler->slot_reset(pcidev);
case XEN_PCI_OP_aer_resume:
pdrv->err_handler->resume(pcidev);
- break;
+ return PCI_ERS_RESULT_NONE;
default:
dev_err(&pdev->xdev->dev,
- "bad request in aer recovery "
- "operation!\n");
+ "bad request in AER recovery operation!\n");
}
-
- return result;
}
return PCI_ERS_RESULT_NONE;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 7c1ceb39035c..03bfdb25a55c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -899,7 +899,10 @@ struct pci_driver {
struct pci_dynids dynids;
};
-#define to_pci_driver(drv) container_of(drv, struct pci_driver, driver)
+static inline struct pci_driver *to_pci_driver(struct device_driver *drv)
+{
+ return drv ? container_of(drv, struct pci_driver, driver) : NULL;
+}
/**
* PCI_DEVICE - macro used to describe a specific PCI device
^ permalink raw reply related
* Re: [PATCH] powerpc: don't select KFENCE on platform PPC_FSL_BOOK3E
From: Michael Ellerman @ 2021-10-13 0:48 UTC (permalink / raw)
To: Christophe Leroy, Liu Shixin, Marco Elver, Benjamin Herrenschmidt,
Paul Mackerras
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <9c1ee778-b38b-3d41-37f3-5ea22dca063b@csgroup.eu>
Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> Le 12/10/2021 à 08:24, Michael Ellerman a écrit :
>> Liu Shixin <liushixin2@huawei.com> writes:
>>> kindly ping.
>>
>> I was under the impression you were trying to debug why it wasn't
>> working with Christophe.
>
> The investigation was a bit dormant to be honest since Liu confirmed
> that neither KFENCE not DEBUG_PAGEALLOC works.
No worries. Sorry it fell to you to do the investigation.
> I now looked at the effort to make it work, and it is not trivial.
> At the time being, all linear space is mapped with pinned TLBs and
> everything is setup for space 0, with space 1 being used temporarily
> when doing heavy changes to space 0.
>
> We can't use standard pages for linear space on space 0 because we need
> memory mapped at all time for exceptions (on booke exception run with
> MMU on in space 0).
>
> In order to use standard pages, we'd need to reorganise the kernel to
> have it run mostly in space 1 (for data at least) where we would map
> almost everything with standard pages, and keep pinned TLB to map linear
> space on space 0 for TLB miss exceptions. Then we'd do more or less like
> book3s/32 and switch back into space 1 into other exceptions prolog.
>
> That could be good to do it as we could maybe have more code in common
> with non booke 32 bits, but it is not a trivial job.
>
> So I suggest that for now, we just make KFENCE and DEBUG_PAGEALLOC
> unselectable for booke/32 (e500 and 44x).
Yep seems reasonable.
cheers
^ permalink raw reply
* [PATCH AUTOSEL 5.14 13/17] powerpc/lib: Add helper to check if offset is within conditional branch range
From: Sasha Levin @ 2021-10-13 0:54 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Sasha Levin, Song Liu, daniel, linuxppc-dev, ast, andrii, netdev,
naveen.n.rao, Naveen N. Rao, jniethe5, bpf
In-Reply-To: <20211013005441.699846-1-sashal@kernel.org>
From: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
[ Upstream commit 4549c3ea3160fa8b3f37dfe2f957657bb265eda9 ]
Add a helper to check if a given offset is within the branch range for a
powerpc conditional branch instruction, and update some sites to use the
new helper.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/442b69a34ced32ca346a0d9a855f3f6cfdbbbd41.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/include/asm/code-patching.h | 1 +
arch/powerpc/lib/code-patching.c | 7 ++++++-
arch/powerpc/net/bpf_jit.h | 7 +------
3 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index a95f63788c6b..4ba834599c4d 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -23,6 +23,7 @@
#define BRANCH_ABSOLUTE 0x2
bool is_offset_in_branch_range(long offset);
+bool is_offset_in_cond_branch_range(long offset);
int create_branch(struct ppc_inst *instr, const u32 *addr,
unsigned long target, int flags);
int create_cond_branch(struct ppc_inst *instr, const u32 *addr,
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index f9a3019e37b4..c5ed98823835 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -228,6 +228,11 @@ bool is_offset_in_branch_range(long offset)
return (offset >= -0x2000000 && offset <= 0x1fffffc && !(offset & 0x3));
}
+bool is_offset_in_cond_branch_range(long offset)
+{
+ return offset >= -0x8000 && offset <= 0x7fff && !(offset & 0x3);
+}
+
/*
* Helper to check if a given instruction is a conditional branch
* Derived from the conditional checks in analyse_instr()
@@ -280,7 +285,7 @@ int create_cond_branch(struct ppc_inst *instr, const u32 *addr,
offset = offset - (unsigned long)addr;
/* Check we can represent the target in the instruction format */
- if (offset < -0x8000 || offset > 0x7FFF || offset & 0x3)
+ if (!is_offset_in_cond_branch_range(offset))
return 1;
/* Mask out the flags and target, so they don't step on each other. */
diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 99fad093f43e..935ea95b6635 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -78,11 +78,6 @@
#define PPC_FUNC_ADDR(d,i) do { PPC_LI32(d, i); } while(0)
#endif
-static inline bool is_nearbranch(int offset)
-{
- return (offset < 32768) && (offset >= -32768);
-}
-
/*
* The fly in the ointment of code size changing from pass to pass is
* avoided by padding the short branch case with a NOP. If code size differs
@@ -91,7 +86,7 @@ static inline bool is_nearbranch(int offset)
* state.
*/
#define PPC_BCC(cond, dest) do { \
- if (is_nearbranch((dest) - (ctx->idx * 4))) { \
+ if (is_offset_in_cond_branch_range((long)(dest) - (ctx->idx * 4))) { \
PPC_BCC_SHORT(cond, dest); \
EMIT(PPC_RAW_NOP()); \
} else { \
--
2.33.0
^ permalink raw reply related
* [PATCH AUTOSEL 5.14 14/17] powerpc/bpf: Validate branch ranges
From: Sasha Levin @ 2021-10-13 0:54 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Sasha Levin, Song Liu, daniel, Johan Almbladh, ast, andrii,
netdev, naveen.n.rao, Naveen N. Rao, linuxppc-dev, bpf
In-Reply-To: <20211013005441.699846-1-sashal@kernel.org>
From: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
[ Upstream commit 3832ba4e283d7052b783dab8311df7e3590fed93 ]
Add checks to ensure that we never emit branch instructions with
truncated branch offsets.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/71d33a6b7603ec1013c9734dd8bdd4ff5e929142.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/net/bpf_jit.h | 26 ++++++++++++++++++++------
arch/powerpc/net/bpf_jit_comp.c | 6 +++++-
arch/powerpc/net/bpf_jit_comp32.c | 8 ++++++--
arch/powerpc/net/bpf_jit_comp64.c | 8 ++++++--
4 files changed, 37 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 935ea95b6635..7e9b978b768e 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -24,16 +24,30 @@
#define EMIT(instr) PLANT_INSTR(image, ctx->idx, instr)
/* Long jump; (unconditional 'branch') */
-#define PPC_JMP(dest) EMIT(PPC_INST_BRANCH | \
- (((dest) - (ctx->idx * 4)) & 0x03fffffc))
+#define PPC_JMP(dest) \
+ do { \
+ long offset = (long)(dest) - (ctx->idx * 4); \
+ if (!is_offset_in_branch_range(offset)) { \
+ pr_err_ratelimited("Branch offset 0x%lx (@%u) out of range\n", offset, ctx->idx); \
+ return -ERANGE; \
+ } \
+ EMIT(PPC_INST_BRANCH | (offset & 0x03fffffc)); \
+ } while (0)
+
/* blr; (unconditional 'branch' with link) to absolute address */
#define PPC_BL_ABS(dest) EMIT(PPC_INST_BL | \
(((dest) - (unsigned long)(image + ctx->idx)) & 0x03fffffc))
/* "cond" here covers BO:BI fields. */
-#define PPC_BCC_SHORT(cond, dest) EMIT(PPC_INST_BRANCH_COND | \
- (((cond) & 0x3ff) << 16) | \
- (((dest) - (ctx->idx * 4)) & \
- 0xfffc))
+#define PPC_BCC_SHORT(cond, dest) \
+ do { \
+ long offset = (long)(dest) - (ctx->idx * 4); \
+ if (!is_offset_in_cond_branch_range(offset)) { \
+ pr_err_ratelimited("Conditional branch offset 0x%lx (@%u) out of range\n", offset, ctx->idx); \
+ return -ERANGE; \
+ } \
+ EMIT(PPC_INST_BRANCH_COND | (((cond) & 0x3ff) << 16) | (offset & 0xfffc)); \
+ } while (0)
+
/* Sign-extended 32-bit immediate load */
#define PPC_LI32(d, i) do { \
if ((int)(uintptr_t)(i) >= -32768 && \
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 53aefee3fe70..fcbf7a917c56 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -210,7 +210,11 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
/* Now build the prologue, body code & epilogue for real. */
cgctx.idx = 0;
bpf_jit_build_prologue(code_base, &cgctx);
- bpf_jit_build_body(fp, code_base, &cgctx, addrs, extra_pass);
+ if (bpf_jit_build_body(fp, code_base, &cgctx, addrs, extra_pass)) {
+ bpf_jit_binary_free(bpf_hdr);
+ fp = org_fp;
+ goto out_addrs;
+ }
bpf_jit_build_epilogue(code_base, &cgctx);
if (bpf_jit_enable > 1)
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index beb12cbc8c29..a74d52204f8d 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -200,7 +200,7 @@ void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 fun
}
}
-static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 out)
+static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 out)
{
/*
* By now, the eBPF program has already setup parameters in r3-r6
@@ -261,7 +261,9 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
bpf_jit_emit_common_epilogue(image, ctx);
EMIT(PPC_RAW_BCTR());
+
/* out: */
+ return 0;
}
/* Assemble the body code between the prologue & epilogue */
@@ -1090,7 +1092,9 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
*/
case BPF_JMP | BPF_TAIL_CALL:
ctx->seen |= SEEN_TAILCALL;
- bpf_jit_emit_tail_call(image, ctx, addrs[i + 1]);
+ ret = bpf_jit_emit_tail_call(image, ctx, addrs[i + 1]);
+ if (ret < 0)
+ return ret;
break;
default:
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index b87a63dba9c8..f06c62089b14 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -206,7 +206,7 @@ void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 fun
EMIT(PPC_RAW_BCTRL());
}
-static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 out)
+static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 out)
{
/*
* By now, the eBPF program has already setup parameters in r3, r4 and r5
@@ -267,7 +267,9 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
bpf_jit_emit_common_epilogue(image, ctx);
EMIT(PPC_RAW_BCTR());
+
/* out: */
+ return 0;
}
/* Assemble the body code between the prologue & epilogue */
@@ -993,7 +995,9 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
*/
case BPF_JMP | BPF_TAIL_CALL:
ctx->seen |= SEEN_TAILCALL;
- bpf_jit_emit_tail_call(image, ctx, addrs[i + 1]);
+ ret = bpf_jit_emit_tail_call(image, ctx, addrs[i + 1]);
+ if (ret < 0)
+ return ret;
break;
default:
--
2.33.0
^ permalink raw reply related
* [PATCH AUTOSEL 5.14 15/17] powerpc/security: Add a helper to query stf_barrier type
From: Sasha Levin @ 2021-10-13 0:54 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Sasha Levin, aik, lihuafei1, npiggin, aneesh.kumar, Naveen N. Rao,
linuxppc-dev, dja
In-Reply-To: <20211013005441.699846-1-sashal@kernel.org>
From: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
[ Upstream commit 030905920f32e91a52794937f67434ac0b3ea41a ]
Add a helper to return the stf_barrier type for the current processor.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3bd5d7f96ea1547991ac2ce3137dc2b220bae285.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/include/asm/security_features.h | 5 +++++
arch/powerpc/kernel/security.c | 5 +++++
2 files changed, 10 insertions(+)
diff --git a/arch/powerpc/include/asm/security_features.h b/arch/powerpc/include/asm/security_features.h
index 792eefaf230b..27574f218b37 100644
--- a/arch/powerpc/include/asm/security_features.h
+++ b/arch/powerpc/include/asm/security_features.h
@@ -39,6 +39,11 @@ static inline bool security_ftr_enabled(u64 feature)
return !!(powerpc_security_features & feature);
}
+#ifdef CONFIG_PPC_BOOK3S_64
+enum stf_barrier_type stf_barrier_type_get(void);
+#else
+static inline enum stf_barrier_type stf_barrier_type_get(void) { return STF_BARRIER_NONE; }
+#endif
// Features indicating support for Spectre/Meltdown mitigations
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index cc51fa52e783..e723ff77cc9b 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -263,6 +263,11 @@ static int __init handle_no_stf_barrier(char *p)
early_param("no_stf_barrier", handle_no_stf_barrier);
+enum stf_barrier_type stf_barrier_type_get(void)
+{
+ return stf_enabled_flush_types;
+}
+
/* This is the generic flag used by other architectures */
static int __init handle_ssbd(char *p)
{
--
2.33.0
^ permalink raw reply related
* [PATCH AUTOSEL 5.14 16/17] powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC
From: Sasha Levin @ 2021-10-13 0:54 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Sasha Levin, daniel, ast, andrii, netdev, naveen.n.rao,
Naveen N. Rao, linuxppc-dev, bpf
In-Reply-To: <20211013005441.699846-1-sashal@kernel.org>
From: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
[ Upstream commit b7540d62509453263604a155bf2d5f0ed450cba2 ]
Emit similar instruction sequences to commit a048a07d7f4535
("powerpc/64s: Add support for a store forwarding barrier at kernel
entry/exit") when encountering BPF_NOSPEC.
Mitigations are enabled depending on what the firmware advertises. In
particular, we do not gate these mitigations based on current settings,
just like in x86. Due to this, we don't need to take any action if
mitigations are enabled or disabled at runtime.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/956570cbc191cd41f8274bed48ee757a86dac62a.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/net/bpf_jit64.h | 8 ++---
arch/powerpc/net/bpf_jit_comp64.c | 55 ++++++++++++++++++++++++++++---
2 files changed, 55 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/net/bpf_jit64.h b/arch/powerpc/net/bpf_jit64.h
index 7b713edfa7e2..b63b35e45e55 100644
--- a/arch/powerpc/net/bpf_jit64.h
+++ b/arch/powerpc/net/bpf_jit64.h
@@ -16,18 +16,18 @@
* with our redzone usage.
*
* [ prev sp ] <-------------
- * [ nv gpr save area ] 6*8 |
+ * [ nv gpr save area ] 5*8 |
* [ tail_call_cnt ] 8 |
- * [ local_tmp_var ] 8 |
+ * [ local_tmp_var ] 16 |
* fp (r31) --> [ ebpf stack space ] upto 512 |
* [ frame header ] 32/112 |
* sp (r1) ---> [ stack pointer ] --------------
*/
/* for gpr non volatile registers BPG_REG_6 to 10 */
-#define BPF_PPC_STACK_SAVE (6*8)
+#define BPF_PPC_STACK_SAVE (5*8)
/* for bpf JIT code internal usage */
-#define BPF_PPC_STACK_LOCALS 16
+#define BPF_PPC_STACK_LOCALS 24
/* stack frame excluding BPF stack, ensure this is quadword aligned */
#define BPF_PPC_STACKFRAME (STACK_FRAME_MIN_SIZE + \
BPF_PPC_STACK_LOCALS + BPF_PPC_STACK_SAVE)
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index f06c62089b14..1a567c46730a 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -15,6 +15,7 @@
#include <linux/if_vlan.h>
#include <asm/kprobes.h>
#include <linux/bpf.h>
+#include <asm/security_features.h>
#include "bpf_jit64.h"
@@ -35,9 +36,9 @@ static inline bool bpf_has_stack_frame(struct codegen_context *ctx)
* [ prev sp ] <-------------
* [ ... ] |
* sp (r1) ---> [ stack pointer ] --------------
- * [ nv gpr save area ] 6*8
+ * [ nv gpr save area ] 5*8
* [ tail_call_cnt ] 8
- * [ local_tmp_var ] 8
+ * [ local_tmp_var ] 16
* [ unused red zone ] 208 bytes protected
*/
static int bpf_jit_stack_local(struct codegen_context *ctx)
@@ -45,12 +46,12 @@ static int bpf_jit_stack_local(struct codegen_context *ctx)
if (bpf_has_stack_frame(ctx))
return STACK_FRAME_MIN_SIZE + ctx->stack_size;
else
- return -(BPF_PPC_STACK_SAVE + 16);
+ return -(BPF_PPC_STACK_SAVE + 24);
}
static int bpf_jit_stack_tailcallcnt(struct codegen_context *ctx)
{
- return bpf_jit_stack_local(ctx) + 8;
+ return bpf_jit_stack_local(ctx) + 16;
}
static int bpf_jit_stack_offsetof(struct codegen_context *ctx, int reg)
@@ -272,10 +273,33 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
return 0;
}
+/*
+ * We spill into the redzone always, even if the bpf program has its own stackframe.
+ * Offsets hardcoded based on BPF_PPC_STACK_SAVE -- see bpf_jit_stack_local()
+ */
+void bpf_stf_barrier(void);
+
+asm (
+" .global bpf_stf_barrier ;"
+" bpf_stf_barrier: ;"
+" std 21,-64(1) ;"
+" std 22,-56(1) ;"
+" sync ;"
+" ld 21,-64(1) ;"
+" ld 22,-56(1) ;"
+" ori 31,31,0 ;"
+" .rept 14 ;"
+" b 1f ;"
+" 1: ;"
+" .endr ;"
+" blr ;"
+);
+
/* Assemble the body code between the prologue & epilogue */
int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
u32 *addrs, bool extra_pass)
{
+ enum stf_barrier_type stf_barrier = stf_barrier_type_get();
const struct bpf_insn *insn = fp->insnsi;
int flen = fp->len;
int i, ret;
@@ -633,6 +657,29 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
* BPF_ST NOSPEC (speculation barrier)
*/
case BPF_ST | BPF_NOSPEC:
+ if (!security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) ||
+ !security_ftr_enabled(SEC_FTR_STF_BARRIER))
+ break;
+
+ switch (stf_barrier) {
+ case STF_BARRIER_EIEIO:
+ EMIT(PPC_RAW_EIEIO() | 0x02000000);
+ break;
+ case STF_BARRIER_SYNC_ORI:
+ EMIT(PPC_RAW_SYNC());
+ EMIT(PPC_RAW_LD(b2p[TMP_REG_1], _R13, 0));
+ EMIT(PPC_RAW_ORI(_R31, _R31, 0));
+ break;
+ case STF_BARRIER_FALLBACK:
+ EMIT(PPC_RAW_MFLR(b2p[TMP_REG_1]));
+ PPC_LI64(12, dereference_kernel_function_descriptor(bpf_stf_barrier));
+ EMIT(PPC_RAW_MTCTR(12));
+ EMIT(PPC_RAW_BCTRL());
+ EMIT(PPC_RAW_MTLR(b2p[TMP_REG_1]));
+ break;
+ case STF_BARRIER_NONE:
+ break;
+ }
break;
/*
--
2.33.0
^ permalink raw reply related
* [PATCH AUTOSEL 5.10 10/11] powerpc/security: Add a helper to query stf_barrier type
From: Sasha Levin @ 2021-10-13 0:55 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Sasha Levin, lihuafei1, aik, npiggin, aneesh.kumar, Naveen N. Rao,
linuxppc-dev, dja
In-Reply-To: <20211013005532.700190-1-sashal@kernel.org>
From: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
[ Upstream commit 030905920f32e91a52794937f67434ac0b3ea41a ]
Add a helper to return the stf_barrier type for the current processor.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3bd5d7f96ea1547991ac2ce3137dc2b220bae285.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/include/asm/security_features.h | 5 +++++
arch/powerpc/kernel/security.c | 5 +++++
2 files changed, 10 insertions(+)
diff --git a/arch/powerpc/include/asm/security_features.h b/arch/powerpc/include/asm/security_features.h
index b774a4477d5f..e380acc6e413 100644
--- a/arch/powerpc/include/asm/security_features.h
+++ b/arch/powerpc/include/asm/security_features.h
@@ -39,6 +39,11 @@ static inline bool security_ftr_enabled(u64 feature)
return !!(powerpc_security_features & feature);
}
+#ifdef CONFIG_PPC_BOOK3S_64
+enum stf_barrier_type stf_barrier_type_get(void);
+#else
+static inline enum stf_barrier_type stf_barrier_type_get(void) { return STF_BARRIER_NONE; }
+#endif
// Features indicating support for Spectre/Meltdown mitigations
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index e4e1a94ccf6a..3f510c911b10 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -261,6 +261,11 @@ static int __init handle_no_stf_barrier(char *p)
early_param("no_stf_barrier", handle_no_stf_barrier);
+enum stf_barrier_type stf_barrier_type_get(void)
+{
+ return stf_enabled_flush_types;
+}
+
/* This is the generic flag used by other architectures */
static int __init handle_ssbd(char *p)
{
--
2.33.0
^ permalink raw reply related
* [PATCH AUTOSEL 5.4 6/6] powerpc/security: Add a helper to query stf_barrier type
From: Sasha Levin @ 2021-10-13 0:56 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Sasha Levin, lihuafei1, aik, npiggin, aneesh.kumar, Naveen N. Rao,
linuxppc-dev, dja
In-Reply-To: <20211013005603.700363-1-sashal@kernel.org>
From: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
[ Upstream commit 030905920f32e91a52794937f67434ac0b3ea41a ]
Add a helper to return the stf_barrier type for the current processor.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3bd5d7f96ea1547991ac2ce3137dc2b220bae285.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/include/asm/security_features.h | 5 +++++
arch/powerpc/kernel/security.c | 5 +++++
2 files changed, 10 insertions(+)
diff --git a/arch/powerpc/include/asm/security_features.h b/arch/powerpc/include/asm/security_features.h
index e9e3f85134e5..316a9c8d1929 100644
--- a/arch/powerpc/include/asm/security_features.h
+++ b/arch/powerpc/include/asm/security_features.h
@@ -39,6 +39,11 @@ static inline bool security_ftr_enabled(u64 feature)
return !!(powerpc_security_features & feature);
}
+#ifdef CONFIG_PPC_BOOK3S_64
+enum stf_barrier_type stf_barrier_type_get(void);
+#else
+static inline enum stf_barrier_type stf_barrier_type_get(void) { return STF_BARRIER_NONE; }
+#endif
// Features indicating support for Spectre/Meltdown mitigations
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index 1740a66cea84..ff022e725693 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -256,6 +256,11 @@ static int __init handle_no_stf_barrier(char *p)
early_param("no_stf_barrier", handle_no_stf_barrier);
+enum stf_barrier_type stf_barrier_type_get(void)
+{
+ return stf_enabled_flush_types;
+}
+
/* This is the generic flag used by other architectures */
static int __init handle_ssbd(char *p)
{
--
2.33.0
^ permalink raw reply related
* [PATCH AUTOSEL 4.19 5/5] powerpc/security: Add a helper to query stf_barrier type
From: Sasha Levin @ 2021-10-13 0:56 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Sasha Levin, npiggin, lihuafei1, aneesh.kumar, Naveen N. Rao,
linuxppc-dev, dja
In-Reply-To: <20211013005619.700553-1-sashal@kernel.org>
From: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
[ Upstream commit 030905920f32e91a52794937f67434ac0b3ea41a ]
Add a helper to return the stf_barrier type for the current processor.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3bd5d7f96ea1547991ac2ce3137dc2b220bae285.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/include/asm/security_features.h | 5 +++++
arch/powerpc/kernel/security.c | 5 +++++
2 files changed, 10 insertions(+)
diff --git a/arch/powerpc/include/asm/security_features.h b/arch/powerpc/include/asm/security_features.h
index 3b45a64e491e..a673416da388 100644
--- a/arch/powerpc/include/asm/security_features.h
+++ b/arch/powerpc/include/asm/security_features.h
@@ -39,6 +39,11 @@ static inline bool security_ftr_enabled(unsigned long feature)
return !!(powerpc_security_features & feature);
}
+#ifdef CONFIG_PPC_BOOK3S_64
+enum stf_barrier_type stf_barrier_type_get(void);
+#else
+static inline enum stf_barrier_type stf_barrier_type_get(void) { return STF_BARRIER_NONE; }
+#endif
// Features indicating support for Spectre/Meltdown mitigations
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index 6a3dde9587cc..48985a1fd34d 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -248,6 +248,11 @@ static int __init handle_no_stf_barrier(char *p)
early_param("no_stf_barrier", handle_no_stf_barrier);
+enum stf_barrier_type stf_barrier_type_get(void)
+{
+ return stf_enabled_flush_types;
+}
+
/* This is the generic flag used by other architectures */
static int __init handle_ssbd(char *p)
{
--
2.33.0
^ permalink raw reply related
* [PATCH AUTOSEL 4.14 5/5] powerpc/security: Add a helper to query stf_barrier type
From: Sasha Levin @ 2021-10-13 0:56 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Sasha Levin, lihuafei1, npiggin, aneesh.kumar, Naveen N. Rao,
linuxppc-dev, dja
In-Reply-To: <20211013005631.700631-1-sashal@kernel.org>
From: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
[ Upstream commit 030905920f32e91a52794937f67434ac0b3ea41a ]
Add a helper to return the stf_barrier type for the current processor.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3bd5d7f96ea1547991ac2ce3137dc2b220bae285.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/include/asm/security_features.h | 5 +++++
arch/powerpc/kernel/security.c | 5 +++++
2 files changed, 10 insertions(+)
diff --git a/arch/powerpc/include/asm/security_features.h b/arch/powerpc/include/asm/security_features.h
index 3b45a64e491e..a673416da388 100644
--- a/arch/powerpc/include/asm/security_features.h
+++ b/arch/powerpc/include/asm/security_features.h
@@ -39,6 +39,11 @@ static inline bool security_ftr_enabled(unsigned long feature)
return !!(powerpc_security_features & feature);
}
+#ifdef CONFIG_PPC_BOOK3S_64
+enum stf_barrier_type stf_barrier_type_get(void);
+#else
+static inline enum stf_barrier_type stf_barrier_type_get(void) { return STF_BARRIER_NONE; }
+#endif
// Features indicating support for Spectre/Meltdown mitigations
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index b3f540c9f410..75a5277ef7b8 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -248,6 +248,11 @@ static int __init handle_no_stf_barrier(char *p)
early_param("no_stf_barrier", handle_no_stf_barrier);
+enum stf_barrier_type stf_barrier_type_get(void)
+{
+ return stf_enabled_flush_types;
+}
+
/* This is the generic flag used by other architectures */
static int __init handle_ssbd(char *p)
{
--
2.33.0
^ permalink raw reply related
* [PATCH AUTOSEL 4.9 4/4] powerpc/security: Add a helper to query stf_barrier type
From: Sasha Levin @ 2021-10-13 0:56 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Sasha Levin, lihuafei1, aik, npiggin, aneesh.kumar, Naveen N. Rao,
linuxppc-dev, dja
In-Reply-To: <20211013005644.700705-1-sashal@kernel.org>
From: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
[ Upstream commit 030905920f32e91a52794937f67434ac0b3ea41a ]
Add a helper to return the stf_barrier type for the current processor.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3bd5d7f96ea1547991ac2ce3137dc2b220bae285.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/include/asm/security_features.h | 5 +++++
arch/powerpc/kernel/security.c | 5 +++++
2 files changed, 10 insertions(+)
diff --git a/arch/powerpc/include/asm/security_features.h b/arch/powerpc/include/asm/security_features.h
index 3b45a64e491e..a673416da388 100644
--- a/arch/powerpc/include/asm/security_features.h
+++ b/arch/powerpc/include/asm/security_features.h
@@ -39,6 +39,11 @@ static inline bool security_ftr_enabled(unsigned long feature)
return !!(powerpc_security_features & feature);
}
+#ifdef CONFIG_PPC_BOOK3S_64
+enum stf_barrier_type stf_barrier_type_get(void);
+#else
+static inline enum stf_barrier_type stf_barrier_type_get(void) { return STF_BARRIER_NONE; }
+#endif
// Features indicating support for Spectre/Meltdown mitigations
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index ff85fc800183..df42f020e703 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -249,6 +249,11 @@ static int __init handle_no_stf_barrier(char *p)
early_param("no_stf_barrier", handle_no_stf_barrier);
+enum stf_barrier_type stf_barrier_type_get(void)
+{
+ return stf_enabled_flush_types;
+}
+
/* This is the generic flag used by other architectures */
static int __init handle_ssbd(char *p)
{
--
2.33.0
^ permalink raw reply related
* [PATCH AUTOSEL 4.4 2/2] powerpc/security: Add a helper to query stf_barrier type
From: Sasha Levin @ 2021-10-13 0:56 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Sasha Levin, lihuafei1, aik, npiggin, aneesh.kumar, Naveen N. Rao,
linuxppc-dev, dja
In-Reply-To: <20211013005654.700769-1-sashal@kernel.org>
From: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
[ Upstream commit 030905920f32e91a52794937f67434ac0b3ea41a ]
Add a helper to return the stf_barrier type for the current processor.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3bd5d7f96ea1547991ac2ce3137dc2b220bae285.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/include/asm/security_features.h | 5 +++++
arch/powerpc/kernel/security.c | 5 +++++
2 files changed, 10 insertions(+)
diff --git a/arch/powerpc/include/asm/security_features.h b/arch/powerpc/include/asm/security_features.h
index 3b45a64e491e..a673416da388 100644
--- a/arch/powerpc/include/asm/security_features.h
+++ b/arch/powerpc/include/asm/security_features.h
@@ -39,6 +39,11 @@ static inline bool security_ftr_enabled(unsigned long feature)
return !!(powerpc_security_features & feature);
}
+#ifdef CONFIG_PPC_BOOK3S_64
+enum stf_barrier_type stf_barrier_type_get(void);
+#else
+static inline enum stf_barrier_type stf_barrier_type_get(void) { return STF_BARRIER_NONE; }
+#endif
// Features indicating support for Spectre/Meltdown mitigations
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index 45778c83038f..ac9a2498efe7 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -249,6 +249,11 @@ static int __init handle_no_stf_barrier(char *p)
early_param("no_stf_barrier", handle_no_stf_barrier);
+enum stf_barrier_type stf_barrier_type_get(void)
+{
+ return stf_enabled_flush_types;
+}
+
/* This is the generic flag used by other architectures */
static int __init handle_ssbd(char *p)
{
--
2.33.0
^ permalink raw reply related
* Re: [PATCH 2/2] ftrace: prevent preemption in perf_ftrace_function_call()
From: 王贇 @ 2021-10-13 1:45 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Paul Walmsley, James E.J. Bottomley, Guo Ren, Jisheng Zhang,
H. Peter Anvin, live-patching, linux-riscv, Miroslav Benes,
Paul Mackerras, Joe Lawrence, Helge Deller, x86, linux-csky,
Ingo Molnar, Petr Mladek, Albert Ou, Jiri Kosina, Steven Rostedt,
Borislav Petkov, Nicholas Piggin, Josh Poimboeuf, Thomas Gleixner,
linux-parisc, linux-kernel, Palmer Dabbelt, Masami Hiramatsu,
Colin Ian King, linuxppc-dev
In-Reply-To: <YWVvfBybqjKuifum@hirez.programming.kicks-ass.net>
On 2021/10/12 下午7:20, Peter Zijlstra wrote:
> On Tue, Oct 12, 2021 at 01:40:31PM +0800, 王贇 wrote:
>
>> diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
>> index 6aed10e..33c2f76 100644
>> --- a/kernel/trace/trace_event_perf.c
>> +++ b/kernel/trace/trace_event_perf.c
>> @@ -441,12 +441,19 @@ void perf_trace_buf_update(void *record, u16 type)
>> if (!rcu_is_watching())
>> return;
>>
>> + /*
>> + * Prevent CPU changing from now on. rcu must
>> + * be in watching if the task was migrated and
>> + * scheduled.
>> + */
>> + preempt_disable_notrace();
>> +
>> if ((unsigned long)ops->private != smp_processor_id())
>> - return;
>> + goto out;
>>
>> bit = ftrace_test_recursion_trylock(ip, parent_ip);
>> if (bit < 0)
>> - return;
>> + goto out;
>>
>> event = container_of(ops, struct perf_event, ftrace_ops);
>>
>
> This seems rather daft, wouldn't it be easier to just put that check
> under the recursion thing?
In case if the condition matched, extra lock/unlock will be introduced,
but I guess that's acceptable since this seems unlikely to happen :-P
Will move the check in v2.
Regards,
Michael Wang
>
^ permalink raw reply
* Re: [PATCH 1/2] ftrace: disable preemption on the testing of recursion
From: 王贇 @ 2021-10-13 1:46 UTC (permalink / raw)
To: Steven Rostedt
Cc: Peter Zijlstra (Intel), Paul Walmsley, James E.J. Bottomley,
Guo Ren, Jisheng Zhang, H. Peter Anvin, live-patching,
linux-riscv, Miroslav Benes, Paul Mackerras, Joe Lawrence,
Helge Deller, x86, linux-csky, Ingo Molnar, Petr Mladek,
Albert Ou, Jiri Kosina, Nicholas Piggin, Borislav Petkov,
Josh Poimboeuf, Thomas Gleixner, linux-parisc, linux-kernel,
Palmer Dabbelt, Masami Hiramatsu, Colin Ian King, linuxppc-dev
In-Reply-To: <20211012081728.5d357d6c@gandalf.local.home>
On 2021/10/12 下午8:17, Steven Rostedt wrote:
> On Tue, 12 Oct 2021 13:40:08 +0800
> 王贇 <yun.wang@linux.alibaba.com> wrote:
>
>> @@ -52,11 +52,6 @@ static void notrace klp_ftrace_handler(unsigned long ip,
>> bit = ftrace_test_recursion_trylock(ip, parent_ip);
>> if (WARN_ON_ONCE(bit < 0))
>> return;
>> - /*
>> - * A variant of synchronize_rcu() is used to allow patching functions
>> - * where RCU is not watching, see klp_synchronize_transition().
>> - */
>
> I have to take a deeper look at this patch set, but do not remove this
> comment, as it explains the protection here, that is not obvious with the
> changes you made.
Will keep that in v2.
Regards,
Michael Wang
>
> -- Steve
>
>
>> - preempt_disable_notrace();
>>
>> func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
>> stack_node);
^ permalink raw reply
* Re: [PATCH 1/2] ftrace: disable preemption on the testing of recursion
From: 王贇 @ 2021-10-13 1:50 UTC (permalink / raw)
To: Miroslav Benes
Cc: Peter Zijlstra (Intel), Paul Walmsley, James E.J. Bottomley,
Guo Ren, Jisheng Zhang, H. Peter Anvin, live-patching,
linux-riscv, Paul Mackerras, Joe Lawrence, Helge Deller, x86,
linux-csky, Ingo Molnar, Petr Mladek, Albert Ou, Jiri Kosina,
Steven Rostedt, Borislav Petkov, Nicholas Piggin, Josh Poimboeuf,
Thomas Gleixner, linux-parisc, linux-kernel, Palmer Dabbelt,
Masami Hiramatsu, Colin Ian King, linuxppc-dev
In-Reply-To: <alpine.LSU.2.21.2110121421260.3394@pobox.suse.cz>
On 2021/10/12 下午8:24, Miroslav Benes wrote:
[snip]
>>
>> func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
>> stack_node);
>> @@ -120,7 +115,6 @@ static void notrace klp_ftrace_handler(unsigned long ip,
>> klp_arch_set_pc(fregs, (unsigned long)func->new_func);
>>
>> unlock:
>> - preempt_enable_notrace();
>> ftrace_test_recursion_unlock(bit);
>> }
>
> I don't like this change much. We have preempt_disable there not because
> of ftrace_test_recursion, but because of RCU. ftrace_test_recursion was
> added later. Yes, it would work with the change, but it would also hide
> things which should not be hidden in my opinion.
Not very sure about the backgroup stories, but just found this in
'Documentation/trace/ftrace-uses.rst':
Note, on success,
ftrace_test_recursion_trylock() will disable preemption, and the
ftrace_test_recursion_unlock() will enable it again (if it was previously
enabled).
Seems like this lock pair was supposed to take care the preemtion itself?
Regards,
Michael Wang
>
> Miroslav
>
^ permalink raw reply
* Re: [PATCH 1/2] ftrace: disable preemption on the testing of recursion
From: 王贇 @ 2021-10-13 1:52 UTC (permalink / raw)
To: Steven Rostedt, Miroslav Benes
Cc: Peter Zijlstra (Intel), Paul Walmsley, James E.J. Bottomley,
Guo Ren, Jisheng Zhang, H. Peter Anvin, live-patching,
linux-riscv, Paul Mackerras, Joe Lawrence, Helge Deller, x86,
linux-csky, Ingo Molnar, Petr Mladek, Albert Ou, Jiri Kosina,
Nicholas Piggin, Borislav Petkov, Josh Poimboeuf, Thomas Gleixner,
linux-parisc, linux-kernel, Palmer Dabbelt, Masami Hiramatsu,
Colin Ian King, linuxppc-dev
In-Reply-To: <20211012082920.1f8d6557@gandalf.local.home>
On 2021/10/12 下午8:29, Steven Rostedt wrote:
> On Tue, 12 Oct 2021 14:24:43 +0200 (CEST)
> Miroslav Benes <mbenes@suse.cz> wrote:
>
>>> +++ b/kernel/livepatch/patch.c
>>> @@ -52,11 +52,6 @@ static void notrace klp_ftrace_handler(unsigned long ip,
>>> bit = ftrace_test_recursion_trylock(ip, parent_ip);
>>> if (WARN_ON_ONCE(bit < 0))
>>> return;
>>> - /*
>>> - * A variant of synchronize_rcu() is used to allow patching functions
>>> - * where RCU is not watching, see klp_synchronize_transition().
>>> - */
>>> - preempt_disable_notrace();
>>>
>>> func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
>>> stack_node);
>>> @@ -120,7 +115,6 @@ static void notrace klp_ftrace_handler(unsigned long ip,
>>> klp_arch_set_pc(fregs, (unsigned long)func->new_func);
>>>
>>> unlock:
>>> - preempt_enable_notrace();
>>> ftrace_test_recursion_unlock(bit);
>>> }
>>
>> I don't like this change much. We have preempt_disable there not because
>> of ftrace_test_recursion, but because of RCU. ftrace_test_recursion was
>> added later. Yes, it would work with the change, but it would also hide
>> things which should not be hidden in my opinion.
>
> Agreed, but I believe the change is fine, but requires a nice comment to
> explain what you said above.
>
> Thus, before the "ftrace_test_recursion_trylock()" we need:
>
> /*
> * The ftrace_test_recursion_trylock() will disable preemption,
> * which is required for the variant of synchronize_rcu() that is
> * used to allow patching functions where RCU is not watching.
> * See klp_synchronize_transition() for more details.
> */
Will be in v2 too :-)
Regards,
Michael Wang
>
> -- Steve
>
^ permalink raw reply
* Re: [PATCH 1/2] ftrace: disable preemption on the testing of recursion
From: 王贇 @ 2021-10-13 2:04 UTC (permalink / raw)
To: Steven Rostedt
Cc: Peter Zijlstra (Intel), Paul Walmsley, James E.J. Bottomley,
Guo Ren, Jisheng Zhang, H. Peter Anvin, live-patching,
linux-riscv, Miroslav Benes, Paul Mackerras, Joe Lawrence,
Helge Deller, x86, linux-csky, Ingo Molnar, Petr Mladek,
Albert Ou, Jiri Kosina, Nicholas Piggin, Borislav Petkov,
Josh Poimboeuf, Thomas Gleixner, linux-parisc, linux-kernel,
Palmer Dabbelt, Masami Hiramatsu, Colin Ian King, linuxppc-dev
In-Reply-To: <20211012084331.06b8dd23@gandalf.local.home>
On 2021/10/12 下午8:43, Steven Rostedt wrote:
> On Tue, 12 Oct 2021 13:40:08 +0800
> 王贇 <yun.wang@linux.alibaba.com> wrote:
>
>> --- a/include/linux/trace_recursion.h
>> +++ b/include/linux/trace_recursion.h
>> @@ -214,7 +214,14 @@ static __always_inline void trace_clear_recursion(int bit)
>> static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
>> unsigned long parent_ip)
>> {
>> - return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
>> + int bit;
>> +
>> + preempt_disable_notrace();
>
> The recursion test does not require preemption disabled, it uses the task
> struct, not per_cpu variables, so you should not disable it before the test.
>
> bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
> if (bit >= 0)
> preempt_disable_notrace();
>
> And if the bit is zero, it means a recursion check was already done by
> another caller (ftrace handler does the check, followed by calling perf),
> and you really don't even need to disable preemption in that case.
>
> if (bit > 0)
> preempt_disable_notrace();
>
> And on the unlock, have:
>
> static __always_inline void ftrace_test_recursion_unlock(int bit)
> {
> if (bit)
> preempt_enable_notrace();
> trace_clear_recursion(bit);
> }
>
> But maybe that's over optimizing ;-)
I see, while the user can still check smp_processor_id() after trylock
return bit 0...
I guess Peter's point at very beginning is to prevent such cases, since
kernel for production will not have preemption debug on, and such issue
won't get report but could cause trouble which really hard to trace down
, way to eliminate such issue once for all sounds attractive, isn't it?
Regards,
Michael Wang
>
> -- Steve
>
>
>> + bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
>> + if (bit < 0)
>> + preempt_enable_notrace();
>> +
>> + return bit;
>> }
^ permalink raw reply
* Re: [PATCH 1/2] ftrace: disable preemption on the testing of recursion
From: Steven Rostedt @ 2021-10-13 2:27 UTC (permalink / raw)
To: 王贇
Cc: Peter Zijlstra (Intel), Paul Walmsley, James E.J. Bottomley,
Guo Ren, Jisheng Zhang, H. Peter Anvin, live-patching,
linux-riscv, Miroslav Benes, Paul Mackerras, Joe Lawrence,
Helge Deller, x86, linux-csky, Ingo Molnar, Petr Mladek,
Albert Ou, Jiri Kosina, Nicholas Piggin, Borislav Petkov,
Josh Poimboeuf, Thomas Gleixner, linux-parisc, linux-kernel,
Palmer Dabbelt, Masami Hiramatsu, Colin Ian King, linuxppc-dev
In-Reply-To: <74090798-7d93-0713-982c-6f0247118d20@linux.alibaba.com>
On Wed, 13 Oct 2021 09:50:17 +0800
王贇 <yun.wang@linux.alibaba.com> wrote:
> >> - preempt_enable_notrace();
> >> ftrace_test_recursion_unlock(bit);
> >> }
> >
> > I don't like this change much. We have preempt_disable there not because
> > of ftrace_test_recursion, but because of RCU. ftrace_test_recursion was
> > added later. Yes, it would work with the change, but it would also hide
> > things which should not be hidden in my opinion.
>
> Not very sure about the backgroup stories, but just found this in
> 'Documentation/trace/ftrace-uses.rst':
>
> Note, on success,
> ftrace_test_recursion_trylock() will disable preemption, and the
> ftrace_test_recursion_unlock() will enable it again (if it was previously
> enabled).
Right that part is to be fixed by what you are adding here.
The point that Miroslav is complaining about is that the preemption
disabling is special in this case, and not just from the recursion
point of view, which is why the comment is still required.
-- Steve
>
> Seems like this lock pair was supposed to take care the preemtion itself?
^ permalink raw reply
* Re: [PATCH 1/2] ftrace: disable preemption on the testing of recursion
From: Steven Rostedt @ 2021-10-13 2:30 UTC (permalink / raw)
To: 王贇
Cc: Peter Zijlstra (Intel), Paul Walmsley, James E.J. Bottomley,
Guo Ren, Jisheng Zhang, H. Peter Anvin, live-patching,
linux-riscv, Miroslav Benes, Paul Mackerras, Joe Lawrence,
Helge Deller, x86, linux-csky, Ingo Molnar, Petr Mladek,
Albert Ou, Jiri Kosina, Nicholas Piggin, Borislav Petkov,
Josh Poimboeuf, Thomas Gleixner, linux-parisc, linux-kernel,
Palmer Dabbelt, Masami Hiramatsu, Colin Ian King, linuxppc-dev
In-Reply-To: <1eab20c1-d69b-f94b-92ff-4329d0aff6a2@linux.alibaba.com>
On Wed, 13 Oct 2021 10:04:52 +0800
王贇 <yun.wang@linux.alibaba.com> wrote:
> I see, while the user can still check smp_processor_id() after trylock
> return bit 0...
But preemption would have already been disabled. That's because a bit 0
means that a recursion check has already been made by a previous
caller and this one is nested, thus preemption is already disabled.
If bit is 0, then preemption had better be disabled as well.
-- Steve
^ permalink raw reply
* Re: [PATCH 1/2] ftrace: disable preemption on the testing of recursion
From: 王贇 @ 2021-10-13 2:36 UTC (permalink / raw)
To: Steven Rostedt
Cc: Peter Zijlstra (Intel), Paul Walmsley, James E.J. Bottomley,
Guo Ren, Jisheng Zhang, H. Peter Anvin, live-patching,
linux-riscv, Miroslav Benes, Paul Mackerras, Joe Lawrence,
Helge Deller, x86, linux-csky, Ingo Molnar, Petr Mladek,
Albert Ou, Jiri Kosina, Nicholas Piggin, Borislav Petkov,
Josh Poimboeuf, Thomas Gleixner, linux-parisc, linux-kernel,
Palmer Dabbelt, Masami Hiramatsu, Colin Ian King, linuxppc-dev
In-Reply-To: <20211012222758.1a029157@oasis.local.home>
On 2021/10/13 上午10:27, Steven Rostedt wrote:
> On Wed, 13 Oct 2021 09:50:17 +0800
> 王贇 <yun.wang@linux.alibaba.com> wrote:
>
>>>> - preempt_enable_notrace();
>>>> ftrace_test_recursion_unlock(bit);
>>>> }
>>>
>>> I don't like this change much. We have preempt_disable there not because
>>> of ftrace_test_recursion, but because of RCU. ftrace_test_recursion was
>>> added later. Yes, it would work with the change, but it would also hide
>>> things which should not be hidden in my opinion.
>>
>> Not very sure about the backgroup stories, but just found this in
>> 'Documentation/trace/ftrace-uses.rst':
>>
>> Note, on success,
>> ftrace_test_recursion_trylock() will disable preemption, and the
>> ftrace_test_recursion_unlock() will enable it again (if it was previously
>> enabled).
>
> Right that part is to be fixed by what you are adding here.
>
> The point that Miroslav is complaining about is that the preemption
> disabling is special in this case, and not just from the recursion
> point of view, which is why the comment is still required.
My bad... the title do confusing people, will rewrite it.
Regards,
Michael Wang
>
> -- Steve
>
>
>>
>> Seems like this lock pair was supposed to take care the preemtion itself?
^ permalink raw reply
* Re: [PATCH 1/2] ftrace: disable preemption on the testing of recursion
From: 王贇 @ 2021-10-13 2:38 UTC (permalink / raw)
To: Steven Rostedt
Cc: Peter Zijlstra (Intel), Paul Walmsley, James E.J. Bottomley,
Guo Ren, Jisheng Zhang, H. Peter Anvin, live-patching,
linux-riscv, Miroslav Benes, Paul Mackerras, Joe Lawrence,
Helge Deller, x86, linux-csky, Ingo Molnar, Petr Mladek,
Albert Ou, Jiri Kosina, Nicholas Piggin, Borislav Petkov,
Josh Poimboeuf, Thomas Gleixner, linux-parisc, linux-kernel,
Palmer Dabbelt, Masami Hiramatsu, Colin Ian King, linuxppc-dev
In-Reply-To: <20211012223039.78099c24@oasis.local.home>
On 2021/10/13 上午10:30, Steven Rostedt wrote:
> On Wed, 13 Oct 2021 10:04:52 +0800
> 王贇 <yun.wang@linux.alibaba.com> wrote:
>
>> I see, while the user can still check smp_processor_id() after trylock
>> return bit 0...
>
> But preemption would have already been disabled. That's because a bit 0
> means that a recursion check has already been made by a previous
> caller and this one is nested, thus preemption is already disabled.
> If bit is 0, then preemption had better be disabled as well.
Thanks for the explain, now I get your point :-)
Let's make bit 0 an exemption then.
Regards,
Michael Wang
>
> -- Steve
>
^ permalink raw reply
* [PATCH v2 0/2] fix & prevent the missing preemption disabling
From: 王贇 @ 2021-10-13 3:16 UTC (permalink / raw)
To: Guo Ren, Steven Rostedt, Ingo Molnar, James E.J. Bottomley,
Helge Deller, Michael Ellerman, Benjamin Herrenschmidt,
Paul Mackerras, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Thomas Gleixner, Borislav Petkov, x86, H. Peter Anvin,
Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
Joe Lawrence, Colin Ian King, Masami Hiramatsu,
Peter Zijlstra (Intel), Nicholas Piggin, Jisheng Zhang,
linux-csky, linux-kernel, linux-parisc, linuxppc-dev, linux-riscv,
live-patching
In-Reply-To: <8c7de46d-9869-aa5e-2bb9-5dbc2eda395e@linux.alibaba.com>
The testing show that perf_ftrace_function_call() are using smp_processor_id()
with preemption enabled, all the checking on CPU could be wrong after preemption.
As Peter point out, the section between ftrace_test_recursion_trylock/unlock()
pair require the preemption to be disabled as 'Documentation/trace/ftrace-uses.rst'
explained, but currently the work is done outside of the helpers.
Patch 1/2 will make sure preemption disabled after trylock() succeed,
patch 2/2 will do smp_processor_id() checking after trylock to address the
issue.
Michael Wang (2):
ftrace: disable preemption between ftrace_test_recursion_trylock/unlock()
ftrace: do CPU checking after preemption disabled
arch/csky/kernel/probes/ftrace.c | 2 --
arch/parisc/kernel/ftrace.c | 2 --
arch/powerpc/kernel/kprobes-ftrace.c | 2 --
arch/riscv/kernel/probes/ftrace.c | 2 --
arch/x86/kernel/kprobes/ftrace.c | 2 --
include/linux/trace_recursion.h | 22 +++++++++++++++++++++-
kernel/livepatch/patch.c | 6 ------
kernel/trace/trace_event_perf.c | 6 +++---
kernel/trace/trace_functions.c | 5 -----
9 files changed, 24 insertions(+), 25 deletions(-)
--
1.8.3.1
^ permalink raw reply
* [PATCH v2 1/2] ftrace: disable preemption between ftrace_test_recursion_trylock/unlock()
From: 王贇 @ 2021-10-13 3:17 UTC (permalink / raw)
To: Guo Ren, Steven Rostedt, Ingo Molnar, James E.J. Bottomley,
Helge Deller, Michael Ellerman, Benjamin Herrenschmidt,
Paul Mackerras, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Thomas Gleixner, Borislav Petkov, x86, H. Peter Anvin,
Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
Joe Lawrence, Colin Ian King, Masami Hiramatsu,
Peter Zijlstra (Intel), Nicholas Piggin, Jisheng Zhang,
linux-csky, linux-kernel, linux-parisc, linuxppc-dev, linux-riscv,
live-patching
In-Reply-To: <1a8e8d73-b508-f90b-0d82-eb2da45a720e@linux.alibaba.com>
As the documentation explained, ftrace_test_recursion_trylock()
and ftrace_test_recursion_unlock() were supposed to disable and
enable preemption properly, however currently this work is done
outside of the function, which could be missing by mistake.
This path will make sure the preemption was disabled when trylock()
succeed, and the unlock() will enable the preemption if previously
enabled.
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Miroslav Benes <mbenes@suse.cz>
Reported-by: Abaci <abaci@linux.alibaba.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
---
arch/csky/kernel/probes/ftrace.c | 2 --
arch/parisc/kernel/ftrace.c | 2 --
arch/powerpc/kernel/kprobes-ftrace.c | 2 --
arch/riscv/kernel/probes/ftrace.c | 2 --
arch/x86/kernel/kprobes/ftrace.c | 2 --
include/linux/trace_recursion.h | 22 +++++++++++++++++++++-
kernel/livepatch/patch.c | 6 ------
kernel/trace/trace_functions.c | 5 -----
8 files changed, 21 insertions(+), 22 deletions(-)
diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
index b388228..834cffc 100644
--- a/arch/csky/kernel/probes/ftrace.c
+++ b/arch/csky/kernel/probes/ftrace.c
@@ -17,7 +17,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
return;
regs = ftrace_get_regs(fregs);
- preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)ip);
if (!p) {
p = get_kprobe((kprobe_opcode_t *)(ip - MCOUNT_INSN_SIZE));
@@ -57,7 +56,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
__this_cpu_write(current_kprobe, NULL);
}
out:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 0a1e75a..3543496 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -216,7 +216,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
return;
regs = ftrace_get_regs(fregs);
- preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)ip);
if (unlikely(!p) || kprobe_disabled(p))
goto out;
@@ -245,7 +244,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
}
__this_cpu_write(current_kprobe, NULL);
out:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
index 7154d58..072ebe7 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -26,7 +26,6 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
return;
regs = ftrace_get_regs(fregs);
- preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)nip);
if (unlikely(!p) || kprobe_disabled(p))
goto out;
@@ -61,7 +60,6 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
__this_cpu_write(current_kprobe, NULL);
}
out:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/riscv/kernel/probes/ftrace.c b/arch/riscv/kernel/probes/ftrace.c
index aab85a8..7142ec4 100644
--- a/arch/riscv/kernel/probes/ftrace.c
+++ b/arch/riscv/kernel/probes/ftrace.c
@@ -15,7 +15,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
if (bit < 0)
return;
- preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)ip);
if (unlikely(!p) || kprobe_disabled(p))
goto out;
@@ -52,7 +51,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
__this_cpu_write(current_kprobe, NULL);
}
out:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
index 596de2f..dd2ec14 100644
--- a/arch/x86/kernel/kprobes/ftrace.c
+++ b/arch/x86/kernel/kprobes/ftrace.c
@@ -25,7 +25,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
if (bit < 0)
return;
- preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)ip);
if (unlikely(!p) || kprobe_disabled(p))
goto out;
@@ -59,7 +58,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
__this_cpu_write(current_kprobe, NULL);
}
out:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index a9f9c57..101e1fb 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -208,13 +208,29 @@ static __always_inline void trace_clear_recursion(int bit)
* Use this for ftrace callbacks. This will detect if the function
* tracing recursed in the same context (normal vs interrupt),
*
+ * The ftrace_test_recursion_trylock() will disable preemption,
+ * which is required for the variant of synchronize_rcu() that is
+ * used to allow patching functions where RCU is not watching.
+ * See klp_synchronize_transition() for more details.
+ *
* Returns: -1 if a recursion happened.
* >= 0 if no recursion
*/
static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
unsigned long parent_ip)
{
- return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
+ int bit;
+
+ bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
+ /*
+ * The zero bit indicate we are nested
+ * in another trylock(), which means the
+ * preemption already disabled.
+ */
+ if (bit > 0)
+ preempt_disable_notrace();
+
+ return bit;
}
/**
@@ -222,9 +238,13 @@ static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
* @bit: The return of a successful ftrace_test_recursion_trylock()
*
* This is used at the end of a ftrace callback.
+ *
+ * Preemption will be enabled (if it was previously enabled).
*/
static __always_inline void ftrace_test_recursion_unlock(int bit)
{
+ if (bit)
+ preempt_enable_notrace();
trace_clear_recursion(bit);
}
diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
index e8029ae..6e66ccd 100644
--- a/kernel/livepatch/patch.c
+++ b/kernel/livepatch/patch.c
@@ -52,11 +52,6 @@ static void notrace klp_ftrace_handler(unsigned long ip,
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (WARN_ON_ONCE(bit < 0))
return;
- /*
- * A variant of synchronize_rcu() is used to allow patching functions
- * where RCU is not watching, see klp_synchronize_transition().
- */
- preempt_disable_notrace();
func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
stack_node);
@@ -120,7 +115,6 @@ static void notrace klp_ftrace_handler(unsigned long ip,
klp_arch_set_pc(fregs, (unsigned long)func->new_func);
unlock:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 1f0e63f..9f1bfbe 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -186,7 +186,6 @@ static void function_trace_start(struct trace_array *tr)
return;
trace_ctx = tracing_gen_ctx();
- preempt_disable_notrace();
cpu = smp_processor_id();
data = per_cpu_ptr(tr->array_buffer.data, cpu);
@@ -194,7 +193,6 @@ static void function_trace_start(struct trace_array *tr)
trace_function(tr, ip, parent_ip, trace_ctx);
ftrace_test_recursion_unlock(bit);
- preempt_enable_notrace();
}
#ifdef CONFIG_UNWINDER_ORC
@@ -298,8 +296,6 @@ static inline void process_repeats(struct trace_array *tr,
if (bit < 0)
return;
- preempt_disable_notrace();
-
cpu = smp_processor_id();
data = per_cpu_ptr(tr->array_buffer.data, cpu);
if (atomic_read(&data->disabled))
@@ -324,7 +320,6 @@ static inline void process_repeats(struct trace_array *tr,
out:
ftrace_test_recursion_unlock(bit);
- preempt_enable_notrace();
}
static void
--
1.8.3.1
^ permalink raw reply related
* [PATCH v2 2/2] ftrace: do CPU checking after preemption disabled
From: 王贇 @ 2021-10-13 3:18 UTC (permalink / raw)
To: Guo Ren, Steven Rostedt, Ingo Molnar, James E.J. Bottomley,
Helge Deller, Michael Ellerman, Benjamin Herrenschmidt,
Paul Mackerras, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Thomas Gleixner, Borislav Petkov, x86, H. Peter Anvin,
Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
Joe Lawrence, Colin Ian King, Masami Hiramatsu,
Peter Zijlstra (Intel), Nicholas Piggin, Jisheng Zhang,
linux-csky, linux-kernel, linux-parisc, linuxppc-dev, linux-riscv,
live-patching
In-Reply-To: <1a8e8d73-b508-f90b-0d82-eb2da45a720e@linux.alibaba.com>
With CONFIG_DEBUG_PREEMPT we observed reports like:
BUG: using smp_processor_id() in preemptible
caller is perf_ftrace_function_call+0x6f/0x2e0
CPU: 1 PID: 680 Comm: a.out Not tainted
Call Trace:
<TASK>
dump_stack_lvl+0x8d/0xcf
check_preemption_disabled+0x104/0x110
? optimize_nops.isra.7+0x230/0x230
? text_poke_bp_batch+0x9f/0x310
perf_ftrace_function_call+0x6f/0x2e0
...
__text_poke+0x5/0x620
text_poke_bp_batch+0x9f/0x310
This telling us the CPU could be changed after task is preempted, and
the checking on CPU before preemption will be invalid.
Since now ftrace_test_recursion_trylock() will help to disable the
preemption, this patch just do the checking after trylock() to address
the issue.
CC: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Abaci <abaci@linux.alibaba.com>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
---
kernel/trace/trace_event_perf.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 6aed10e..fba8cb7 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -441,13 +441,13 @@ void perf_trace_buf_update(void *record, u16 type)
if (!rcu_is_watching())
return;
- if ((unsigned long)ops->private != smp_processor_id())
- return;
-
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (bit < 0)
return;
+ if ((unsigned long)ops->private != smp_processor_id())
+ goto out;
+
event = container_of(ops, struct perf_event, ftrace_ops);
/*
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH v2 0/2] fix & prevent the missing preemption disabling
From: Steven Rostedt @ 2021-10-13 3:26 UTC (permalink / raw)
To: 王贇
Cc: Peter Zijlstra (Intel), Paul Walmsley, James E.J. Bottomley,
Guo Ren, Jisheng Zhang, H. Peter Anvin, live-patching,
linux-riscv, Miroslav Benes, Paul Mackerras, Joe Lawrence,
Helge Deller, x86, linux-csky, Ingo Molnar, Petr Mladek,
Albert Ou, Jiri Kosina, Nicholas Piggin, Borislav Petkov,
Josh Poimboeuf, Thomas Gleixner, linux-parisc, linux-kernel,
Palmer Dabbelt, Masami Hiramatsu, Colin Ian King, linuxppc-dev
In-Reply-To: <1a8e8d73-b508-f90b-0d82-eb2da45a720e@linux.alibaba.com>
Please start a new thread when sending new versions. v2 should not be a
reply to v1. If you want to reference v1, just add it to the cover
letter with a link tag:
Link: https://lore.kernel.org/all/8c7de46d-9869-aa5e-2bb9-5dbc2eda395e@linux.alibaba.com/
-- Steve
On Wed, 13 Oct 2021 11:16:56 +0800
王贇 <yun.wang@linux.alibaba.com> wrote:
> The testing show that perf_ftrace_function_call() are using smp_processor_id()
> with preemption enabled, all the checking on CPU could be wrong after preemption.
>
> As Peter point out, the section between ftrace_test_recursion_trylock/unlock()
> pair require the preemption to be disabled as 'Documentation/trace/ftrace-uses.rst'
> explained, but currently the work is done outside of the helpers.
>
> Patch 1/2 will make sure preemption disabled after trylock() succeed,
> patch 2/2 will do smp_processor_id() checking after trylock to address the
> issue.
>
> Michael Wang (2):
> ftrace: disable preemption between ftrace_test_recursion_trylock/unlock()
> ftrace: do CPU checking after preemption disabled
>
> arch/csky/kernel/probes/ftrace.c | 2 --
> arch/parisc/kernel/ftrace.c | 2 --
> arch/powerpc/kernel/kprobes-ftrace.c | 2 --
> arch/riscv/kernel/probes/ftrace.c | 2 --
> arch/x86/kernel/kprobes/ftrace.c | 2 --
> include/linux/trace_recursion.h | 22 +++++++++++++++++++++-
> kernel/livepatch/patch.c | 6 ------
> kernel/trace/trace_event_perf.c | 6 +++---
> kernel/trace/trace_functions.c | 5 -----
> 9 files changed, 24 insertions(+), 25 deletions(-)
>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox