* [PATCH RFC 1/2] driver core: allow EPROBE_DEFER after boot
@ 2017-02-06 19:15 Tadeusz Struk
2017-02-06 19:15 ` [PATCH RFC 2/2] IB/hfi1: Fix port ordering issue in a multiport device Tadeusz Struk
0 siblings, 1 reply; 5+ messages in thread
From: Tadeusz Struk @ 2017-02-06 19:15 UTC (permalink / raw)
To: gregkh
Cc: jgunthorpe, tadeusz.struk, linux-rdma, dennis.dalessandro,
linux-kernel, grant.likely, dledford, ira.weiny
Currently EPROBE_DEFER error code is only honored by the core during
boot time, where the deferred probes are triggered by the late_initcall.
After boot, if a driver returns EPROBE_DEFER for whatever reason, during
manual insmod, it is not handled, as there is nothing to trigger the
driver_deferred_probe_trigger() function.
This change allow drivers to be re-probed after boot.
The deferred_trigger_count counter is not needed as the
driver_deferred_probe_trigger() is safe to call multiple times.
There is also a warning added, which warns if drivers would try to abuse
EPROBE_DEFER causing the core to busy loop.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
---
drivers/base/dd.c | 26 +++++++++-----------------
drivers/base/driver.c | 2 +-
include/linux/device.h | 2 ++
3 files changed, 12 insertions(+), 18 deletions(-)
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index a1fbf55..dfee412 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -51,7 +51,6 @@
static DEFINE_MUTEX(deferred_probe_mutex);
static LIST_HEAD(deferred_probe_pending_list);
static LIST_HEAD(deferred_probe_active_list);
-static atomic_t deferred_trigger_count = ATOMIC_INIT(0);
/*
* In some cases, like suspend to RAM or hibernation, It might be reasonable
@@ -142,17 +141,6 @@ static bool driver_deferred_probe_enable = false;
* This functions moves all devices from the pending list to the active
* list and schedules the deferred probe workqueue to process them. It
* should be called anytime a driver is successfully bound to a device.
- *
- * Note, there is a race condition in multi-threaded probe. In the case where
- * more than one device is probing at the same time, it is possible for one
- * probe to complete successfully while another is about to defer. If the second
- * depends on the first, then it will get put on the pending list after the
- * trigger event has already occurred and will be stuck there.
- *
- * The atomic 'deferred_trigger_count' is used to determine if a successful
- * trigger has occurred in the midst of probing a driver. If the trigger count
- * changes in the midst of a probe, then deferred processing should be triggered
- * again.
*/
static void driver_deferred_probe_trigger(void)
{
@@ -165,7 +153,6 @@ static void driver_deferred_probe_trigger(void)
* into the active list so they can be retried by the workqueue
*/
mutex_lock(&deferred_probe_mutex);
- atomic_inc(&deferred_trigger_count);
list_splice_tail_init(&deferred_probe_pending_list,
&deferred_probe_active_list);
mutex_unlock(&deferred_probe_mutex);
@@ -324,7 +311,6 @@ static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
static int really_probe(struct device *dev, struct device_driver *drv)
{
int ret = -EPROBE_DEFER;
- int local_trigger_count = atomic_read(&deferred_trigger_count);
bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
!drv->suppress_bind_attrs;
@@ -384,6 +370,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
ret = drv->probe(dev);
if (ret)
goto probe_failed;
+ else
+ atomic_set(&drv->deferred_probe_ctr, 0);
}
if (test_remove) {
@@ -435,9 +423,13 @@ static int really_probe(struct device *dev, struct device_driver *drv)
/* Driver requested deferred probing */
dev_dbg(dev, "Driver %s requests probe deferral\n", drv->name);
driver_deferred_probe_add(dev);
- /* Did a trigger occur while probing? Need to re-trigger if yes */
- if (local_trigger_count != atomic_read(&deferred_trigger_count))
- driver_deferred_probe_trigger();
+ /* Need to re-trigger, but prevent busy looping */
+ if (atomic_add_return(1, &drv->deferred_probe_ctr) % 10 == 0) {
+ dev_warn(dev, "Driver %s loops on probe deferred\n",
+ drv->name);
+ msleep(1000);
+ }
+ driver_deferred_probe_trigger();
break;
case -ENODEV:
case -ENXIO:
diff --git a/drivers/base/driver.c b/drivers/base/driver.c
index 4eabfe2..afbd84de 100644
--- a/drivers/base/driver.c
+++ b/drivers/base/driver.c
@@ -174,7 +174,7 @@ int driver_register(struct device_driver *drv)
return ret;
}
kobject_uevent(&drv->p->kobj, KOBJ_ADD);
-
+ atomic_set(&drv->deferred_probe_ctr, 0);
return ret;
}
EXPORT_SYMBOL_GPL(driver_register);
diff --git a/include/linux/device.h b/include/linux/device.h
index 491b4c0c..2992fde 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -284,6 +284,8 @@ struct device_driver {
const struct dev_pm_ops *pm;
struct driver_private *p;
+
+ atomic_t deferred_probe_ctr;
};
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH RFC 2/2] IB/hfi1: Fix port ordering issue in a multiport device
2017-02-06 19:15 [PATCH RFC 1/2] driver core: allow EPROBE_DEFER after boot Tadeusz Struk
@ 2017-02-06 19:15 ` Tadeusz Struk
2017-02-06 19:26 ` Jason Gunthorpe
0 siblings, 1 reply; 5+ messages in thread
From: Tadeusz Struk @ 2017-02-06 19:15 UTC (permalink / raw)
To: gregkh
Cc: jgunthorpe, tadeusz.struk, linux-rdma, dennis.dalessandro,
linux-kernel, grant.likely, dledford, ira.weiny
Some hardware has multiple HFIs within the same ASIC, each one on a
sepatate bus number. In some devices the numbers labeled on the
faceplate of the device don't match the PCI bus order, and the result
is that the devices (ports) are probed in the opposite order of their
port numbers. The result is IB device unit numbers are in reverse order
from the faceplate numbering. This leads to confusion, and errors.
Use EPROBE_DEFER error code to enforce correct port order.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
---
drivers/infiniband/hw/hfi1/chip.c | 95 ++++++++++++++++++++++++++++++++++++-
drivers/infiniband/hw/hfi1/chip.h | 2 -
drivers/infiniband/hw/hfi1/init.c | 1
3 files changed, 94 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
index ef72bc2..45beeae 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -14365,6 +14365,79 @@ static int check_int_registers(struct hfi1_devdata *dd)
return -EINVAL;
}
+struct deferred_dev {
+ struct list_head list;
+ u64 guid;
+ u32 bdf;
+ bool deferred;
+};
+
+static LIST_HEAD(device_list);
+static DEFINE_MUTEX(device_list_lock); /* protects device_list */
+#define HFI_ASIC_GUID(guid) ((guid) & ~(1ULL << GUID_HFI_INDEX_SHIFT))
+
+static int hfi1_need_defer(struct hfi1_devdata *dd)
+{
+ struct deferred_dev *def_dev = NULL;
+ struct pci_dev *pdev = dd->pcidev;
+ static bool deferred;
+ static u64 deferred_guid;
+ bool found_deferred = false;
+
+ mutex_lock(&device_list_lock);
+ /* Try to find an re-probed device first based on the BDF address */
+ list_for_each_entry(def_dev, &device_list, list) {
+ if (def_dev->bdf == (pdev->bus->number << 8 | pdev->devfn)) {
+ found_deferred = true;
+ break;
+ }
+ }
+
+ /* If not found then allocate one */
+ if (!found_deferred) {
+ def_dev = kzalloc(sizeof(*def_dev), GFP_KERNEL);
+
+ if (!def_dev) {
+ mutex_unlock(&device_list_lock);
+ return -ENOMEM;
+ }
+
+ def_dev->guid = dd->base_guid;
+ def_dev->bdf = pdev->bus->number << 8 | pdev->devfn;
+ def_dev->deferred = false;
+ list_add_tail(&def_dev->list, &device_list);
+ }
+
+ /*
+ * If device has been already deferred we need to wait for the other
+ * port and defer other devices until we get to the other port or
+ * back the same device
+ */
+ if (deferred && (HFI_ASIC_GUID(deferred_guid) != dd->base_guid) &&
+ !def_dev->deferred) {
+ def_dev->deferred = true;
+ mutex_unlock(&device_list_lock);
+ return -EPROBE_DEFER;
+ } else if (!deferred && ((dd->base_guid >> GUID_HFI_INDEX_SHIFT) & 1)) {
+ /*
+ * If no device is currently deferred check by guid if we need
+ * to defer this one and if so store the deferred guid to find
+ * the sister port by the guid
+ */
+ deferred = true;
+ def_dev->deferred = true;
+ deferred_guid = dd->base_guid;
+ mutex_unlock(&device_list_lock);
+ return -EPROBE_DEFER;
+ }
+
+ deferred = false;
+ def_dev->deferred = false;
+ mutex_unlock(&device_list_lock);
+
+ return 0;
+}
+
/**
* Allocate and initialize the device structure for the hfi.
* @dev: the pci_dev for hfi1_ib device
@@ -14456,6 +14529,13 @@ struct hfi1_devdata *hfi1_init_dd(struct pci_dev *pdev,
if (ret < 0)
goto bail_free;
+ /* needs to be done before we look for the peer device */
+ read_guid(dd);
+
+ ret = hfi1_need_defer(dd);
+ if (ret)
+ goto bail_cleanup;
+
/* verify that reads actually work, save revision for reset check */
dd->revision = read_csr(dd, CCE_REVISION);
if (dd->revision == ~(u64)0) {
@@ -14539,9 +14619,6 @@ struct hfi1_devdata *hfi1_init_dd(struct pci_dev *pdev,
else if (dd->rcv_intr_timeout_csr == 0 && rcv_intr_timeout)
dd->rcv_intr_timeout_csr = 1;
- /* needs to be done before we look for the peer device */
- read_guid(dd);
-
/* set up shared ASIC data with peer device */
ret = init_asic_data(dd);
if (ret)
@@ -14702,6 +14779,18 @@ struct hfi1_devdata *hfi1_init_dd(struct pci_dev *pdev,
return dd;
}
+void hfi1_device_list_cleanup(void)
+{
+ struct deferred_dev *dev, *tmp;
+
+ mutex_lock(&device_list_lock);
+ list_for_each_entry_safe(dev, tmp, &device_list, list) {
+ list_del(&dev->list);
+ kfree(dev);
+ }
+ mutex_unlock(&device_list_lock);
+}
+
static u16 delay_cycles(struct hfi1_pportdata *ppd, u32 desired_egress_rate,
u32 dw_len)
{
diff --git a/drivers/infiniband/hw/hfi1/chip.h b/drivers/infiniband/hw/hfi1/chip.h
index 043fd21..3971b6d 100644
--- a/drivers/infiniband/hw/hfi1/chip.h
+++ b/drivers/infiniband/hw/hfi1/chip.h
@@ -1358,7 +1358,7 @@ int hfi1_clear_ctxt_jkey(struct hfi1_devdata *dd, unsigned ctxt);
int hfi1_set_ctxt_pkey(struct hfi1_devdata *dd, unsigned ctxt, u16 pkey);
int hfi1_clear_ctxt_pkey(struct hfi1_devdata *dd, unsigned ctxt);
void hfi1_read_link_quality(struct hfi1_devdata *dd, u8 *link_quality);
-
+void hfi1_device_list_cleanup(void);
/*
* Interrupt source table.
*
diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c
index e3b5bc9..ae6160d 100644
--- a/drivers/infiniband/hw/hfi1/init.c
+++ b/drivers/infiniband/hw/hfi1/init.c
@@ -1298,6 +1298,7 @@ static void __exit hfi1_mod_cleanup(void)
idr_destroy(&hfi1_unit_table);
dispose_firmware(); /* asymmetric with obtain_firmware() */
dev_cleanup();
+ hfi1_device_list_cleanup();
}
module_exit(hfi1_mod_cleanup);
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH RFC 2/2] IB/hfi1: Fix port ordering issue in a multiport device
2017-02-06 19:15 ` [PATCH RFC 2/2] IB/hfi1: Fix port ordering issue in a multiport device Tadeusz Struk
@ 2017-02-06 19:26 ` Jason Gunthorpe
2017-02-06 21:04 ` Christoph Hellwig
0 siblings, 1 reply; 5+ messages in thread
From: Jason Gunthorpe @ 2017-02-06 19:26 UTC (permalink / raw)
To: Tadeusz Struk
Cc: gregkh, linux-rdma, dennis.dalessandro, linux-kernel,
grant.likely, dledford, ira.weiny
On Mon, Feb 06, 2017 at 11:15:33AM -0800, Tadeusz Struk wrote:
> Some hardware has multiple HFIs within the same ASIC, each one on a
> sepatate bus number. In some devices the numbers labeled on the
> faceplate of the device don't match the PCI bus order, and the result
> is that the devices (ports) are probed in the opposite order of their
> port numbers. The result is IB device unit numbers are in reverse order
> from the faceplate numbering. This leads to confusion, and errors.
> Use EPROBE_DEFER error code to enforce correct port order.
>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
> drivers/infiniband/hw/hfi1/chip.c | 95 ++++++++++++++++++++++++++++++++++++-
> drivers/infiniband/hw/hfi1/chip.h | 2 -
> drivers/infiniband/hw/hfi1/init.c | 1
> 3 files changed, 94 insertions(+), 4 deletions(-)
Still no on this from me, this is a horrible abuse of the device core.
Jason
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH RFC 2/2] IB/hfi1: Fix port ordering issue in a multiport device
2017-02-06 19:26 ` Jason Gunthorpe
@ 2017-02-06 21:04 ` Christoph Hellwig
2017-02-07 9:13 ` Greg KH
0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2017-02-06 21:04 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Tadeusz Struk, gregkh, linux-rdma, dennis.dalessandro,
linux-kernel, grant.likely, dledford, ira.weiny
On Mon, Feb 06, 2017 at 12:26:42PM -0700, Jason Gunthorpe wrote:
> Still no on this from me, this is a horrible abuse of the device core.
Seconded. Netdevice (and thus RDMA) devices names generated by the
kernel are not stable, period. Use udev and the MAC / port GUID or
whatever the OFI equivalent is to generate stable device names.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH RFC 2/2] IB/hfi1: Fix port ordering issue in a multiport device
2017-02-06 21:04 ` Christoph Hellwig
@ 2017-02-07 9:13 ` Greg KH
0 siblings, 0 replies; 5+ messages in thread
From: Greg KH @ 2017-02-07 9:13 UTC (permalink / raw)
To: Christoph Hellwig, Jason Gunthorpe, Tadeusz Struk, linux-rdma,
dennis.dalessandro, linux-kernel, grant.likely, dledford,
ira.weiny
On Mon, Feb 06, 2017 at 01:04:25PM -0800, Christoph Hellwig wrote:
> On Mon, Feb 06, 2017 at 12:26:42PM -0700, Jason Gunthorpe wrote:
> > Still no on this from me, this is a horrible abuse of the device core.
>
> Seconded. Netdevice (and thus RDMA) devices names generated by the
> kernel are not stable, period. Use udev and the MAC / port GUID or
> whatever the OFI equivalent is to generate stable device names.
Thirded :)
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2017-02-07 9:13 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-02-06 19:15 [PATCH RFC 1/2] driver core: allow EPROBE_DEFER after boot Tadeusz Struk
2017-02-06 19:15 ` [PATCH RFC 2/2] IB/hfi1: Fix port ordering issue in a multiport device Tadeusz Struk
2017-02-06 19:26 ` Jason Gunthorpe
2017-02-06 21:04 ` Christoph Hellwig
2017-02-07 9:13 ` Greg KH
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox