linux-fpga.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v2 0/2] fpga: improve protection against low-level modules unloading
@ 2023-11-24 16:28 Marco Pagani
  2023-11-24 16:28 ` [RFC PATCH v2 1/2] fpga: add a module owner field to fpga_manager and fpga_manager_ops Marco Pagani
  2023-11-24 16:28 ` [RFC PATCH v2 2/2] fpga: set owner of fpga_manager_ops for existing low-level modules Marco Pagani
  0 siblings, 2 replies; 9+ messages in thread
From: Marco Pagani @ 2023-11-24 16:28 UTC (permalink / raw)
  To: Moritz Fischer, Wu Hao, Xu Yilun, Tom Rix, Greg Kroah-Hartman
  Cc: Marco Pagani, linux-kernel, linux-fpga

This RFC proposes a possible solution to keep protecting the fpga
manager against the sudden unloading of low-level control modules while
addressing the limitations of the current implementation. Currently, the
code assumes that the low-level module registers a platform driver for
the parent device, which can later be used by the fpga manager (through
the parent device pointer) to take the low-level module's refcount. This
proposal removes this limitation by adding a module owner field to the
fpga_manager and fpga_manager_ops structs. Low-level control modules can
statically set the owner field to THIS_MODULE. Later, the fpga manager
can use the owner field to take the module's refcount.

For more context, please refer to these threads:
https://lore.kernel.org/linux-fpga/ZS6hhlvjUcqyv8zL@yilunxu-OptiPlex-7050
https://lore.kernel.org/linux-fpga/ZT9qENE9fE3Z0KCW@yilunxu-OptiPlex-7050

v2:
- Fixed protection against races during module removal

Marco Pagani (2):
  fpga: add a module owner field to fpga_manager and fpga_manager_ops
  fpga: set owner of fpga_manager_ops for existing low-level modules

 drivers/fpga/altera-cvp.c             |  1 +
 drivers/fpga/altera-pr-ip-core.c      |  1 +
 drivers/fpga/altera-ps-spi.c          |  1 +
 drivers/fpga/dfl-fme-mgr.c            |  1 +
 drivers/fpga/fpga-mgr.c               | 56 +++++++++++++++++++--------
 drivers/fpga/ice40-spi.c              |  1 +
 drivers/fpga/lattice-sysconfig.c      |  1 +
 drivers/fpga/machxo2-spi.c            |  1 +
 drivers/fpga/microchip-spi.c          |  1 +
 drivers/fpga/socfpga-a10.c            |  1 +
 drivers/fpga/socfpga.c                |  1 +
 drivers/fpga/stratix10-soc.c          |  1 +
 drivers/fpga/tests/fpga-mgr-test.c    |  1 +
 drivers/fpga/tests/fpga-region-test.c |  1 +
 drivers/fpga/ts73xx-fpga.c            |  1 +
 drivers/fpga/versal-fpga.c            |  1 +
 drivers/fpga/xilinx-spi.c             |  1 +
 drivers/fpga/zynq-fpga.c              |  1 +
 drivers/fpga/zynqmp-fpga.c            |  1 +
 include/linux/fpga/fpga-mgr.h         |  4 ++
 20 files changed, 62 insertions(+), 16 deletions(-)


base-commit: 98b1cc82c4affc16f5598d4fa14b1858671b2263
-- 
2.42.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC PATCH v2 1/2] fpga: add a module owner field to fpga_manager and fpga_manager_ops
  2023-11-24 16:28 [RFC PATCH v2 0/2] fpga: improve protection against low-level modules unloading Marco Pagani
@ 2023-11-24 16:28 ` Marco Pagani
  2023-11-25  9:11   ` Xu Yilun
  2023-11-24 16:28 ` [RFC PATCH v2 2/2] fpga: set owner of fpga_manager_ops for existing low-level modules Marco Pagani
  1 sibling, 1 reply; 9+ messages in thread
From: Marco Pagani @ 2023-11-24 16:28 UTC (permalink / raw)
  To: Moritz Fischer, Wu Hao, Xu Yilun, Tom Rix, Greg Kroah-Hartman
  Cc: Marco Pagani, linux-kernel, linux-fpga

Add a module *owner field to the fpga_manager_ops and fpga_manager
structs to protect the fpga manager against the unloading of the
low-level control module while someone is holding a reference to the
manager device. Low-level control modules should statically set the
owner field of the fpga_manager_ops struct to THIS_MODULE. Then, when
the manager is registered using fpga_mgr_register(), the value is copied
into the owner field of the fpga_manager struct (that contains the
device context). In this way, the manager can later use it in
fpga_mgr_get() to take the low-level module's refcount. To prevent races
while unloading the low-level control module, fpga_mgr_get() and part of
the fpga_mgr_unregister() methods are protected with a mutex.

Other changes: move put_device() from __fpga_mgr_get() to fpga_mgr_get()
and of_fpga_mgr_get() to improve code clarity.

Fixes: 654ba4cc0f3e ("fpga manager: ensure lifetime with of_fpga_mgr_get")
Signed-off-by: Marco Pagani <marpagan@redhat.com>
---
 drivers/fpga/fpga-mgr.c       | 56 +++++++++++++++++++++++++----------
 include/linux/fpga/fpga-mgr.h |  4 +++
 2 files changed, 44 insertions(+), 16 deletions(-)

diff --git a/drivers/fpga/fpga-mgr.c b/drivers/fpga/fpga-mgr.c
index 06651389c592..608605d59860 100644
--- a/drivers/fpga/fpga-mgr.c
+++ b/drivers/fpga/fpga-mgr.c
@@ -21,6 +21,8 @@
 static DEFINE_IDA(fpga_mgr_ida);
 static const struct class fpga_mgr_class;
 
+static DEFINE_MUTEX(mgr_lock);
+
 struct fpga_mgr_devres {
 	struct fpga_manager *mgr;
 };
@@ -667,17 +669,15 @@ ATTRIBUTE_GROUPS(fpga_mgr);
 static struct fpga_manager *__fpga_mgr_get(struct device *dev)
 {
 	struct fpga_manager *mgr;
+	struct module *owner;
 
 	mgr = to_fpga_manager(dev);
+	owner = mgr->owner;
 
-	if (!try_module_get(dev->parent->driver->owner))
-		goto err_dev;
+	if (owner && !try_module_get(owner))
+		mgr = ERR_PTR(-ENODEV);
 
 	return mgr;
-
-err_dev:
-	put_device(dev);
-	return ERR_PTR(-ENODEV);
 }
 
 static int fpga_mgr_dev_match(struct device *dev, const void *data)
@@ -693,12 +693,22 @@ static int fpga_mgr_dev_match(struct device *dev, const void *data)
  */
 struct fpga_manager *fpga_mgr_get(struct device *dev)
 {
-	struct device *mgr_dev = class_find_device(&fpga_mgr_class, NULL, dev,
-						   fpga_mgr_dev_match);
+	struct fpga_manager *mgr = ERR_PTR(-ENODEV);
+	struct device *mgr_dev;
+
+	mutex_lock(&mgr_lock);
+
+	mgr_dev = class_find_device(&fpga_mgr_class, NULL, dev, fpga_mgr_dev_match);
 	if (!mgr_dev)
-		return ERR_PTR(-ENODEV);
+		goto out;
+
+	mgr = __fpga_mgr_get(mgr_dev);
+	if (IS_ERR(mgr))
+		put_device(mgr_dev);
 
-	return __fpga_mgr_get(mgr_dev);
+out:
+	mutex_unlock(&mgr_lock);
+	return mgr;
 }
 EXPORT_SYMBOL_GPL(fpga_mgr_get);
 
@@ -711,13 +721,22 @@ EXPORT_SYMBOL_GPL(fpga_mgr_get);
  */
 struct fpga_manager *of_fpga_mgr_get(struct device_node *node)
 {
-	struct device *dev;
+	struct fpga_manager *mgr = ERR_PTR(-ENODEV);
+	struct device *mgr_dev;
+
+	mutex_lock(&mgr_lock);
+
+	mgr_dev = class_find_device_by_of_node(&fpga_mgr_class, node);
+	if (!mgr_dev)
+		goto out;
 
-	dev = class_find_device_by_of_node(&fpga_mgr_class, node);
-	if (!dev)
-		return ERR_PTR(-ENODEV);
+	mgr = __fpga_mgr_get(mgr_dev);
+	if (IS_ERR(mgr))
+		put_device(mgr_dev);
 
-	return __fpga_mgr_get(dev);
+out:
+	mutex_unlock(&mgr_lock);
+	return mgr;
 }
 EXPORT_SYMBOL_GPL(of_fpga_mgr_get);
 
@@ -727,7 +746,7 @@ EXPORT_SYMBOL_GPL(of_fpga_mgr_get);
  */
 void fpga_mgr_put(struct fpga_manager *mgr)
 {
-	module_put(mgr->dev.parent->driver->owner);
+	module_put(mgr->owner);
 	put_device(&mgr->dev);
 }
 EXPORT_SYMBOL_GPL(fpga_mgr_put);
@@ -806,6 +825,7 @@ fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *in
 
 	mgr->name = info->name;
 	mgr->mops = info->mops;
+	mgr->owner = info->mops->owner;
 	mgr->priv = info->priv;
 	mgr->compat_id = info->compat_id;
 
@@ -888,7 +908,11 @@ void fpga_mgr_unregister(struct fpga_manager *mgr)
 	 */
 	fpga_mgr_fpga_remove(mgr);
 
+	mutex_lock(&mgr_lock);
+
 	device_unregister(&mgr->dev);
+
+	mutex_unlock(&mgr_lock);
 }
 EXPORT_SYMBOL_GPL(fpga_mgr_unregister);
 
diff --git a/include/linux/fpga/fpga-mgr.h b/include/linux/fpga/fpga-mgr.h
index 54f63459efd6..eaf6e072dbc0 100644
--- a/include/linux/fpga/fpga-mgr.h
+++ b/include/linux/fpga/fpga-mgr.h
@@ -162,6 +162,7 @@ struct fpga_manager_info {
  * @write_complete: set FPGA to operating state after writing is done
  * @fpga_remove: optional: Set FPGA into a specific state during driver remove
  * @groups: optional attribute groups.
+ * @owner: owner module.
  *
  * fpga_manager_ops are the low level functions implemented by a specific
  * fpga manager driver.  The optional ones are tested for NULL before being
@@ -184,6 +185,7 @@ struct fpga_manager_ops {
 			      struct fpga_image_info *info);
 	void (*fpga_remove)(struct fpga_manager *mgr);
 	const struct attribute_group **groups;
+	struct module *owner;
 };
 
 /* FPGA manager status: Partial/Full Reconfiguration errors */
@@ -201,6 +203,7 @@ struct fpga_manager_ops {
  * @state: state of fpga manager
  * @compat_id: FPGA manager id for compatibility check.
  * @mops: pointer to struct of fpga manager ops
+ * @owner: owner module.
  * @priv: low level driver private date
  */
 struct fpga_manager {
@@ -210,6 +213,7 @@ struct fpga_manager {
 	enum fpga_mgr_states state;
 	struct fpga_compat_id *compat_id;
 	const struct fpga_manager_ops *mops;
+	struct module *owner;
 	void *priv;
 };
 
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH v2 2/2] fpga: set owner of fpga_manager_ops for existing low-level modules
  2023-11-24 16:28 [RFC PATCH v2 0/2] fpga: improve protection against low-level modules unloading Marco Pagani
  2023-11-24 16:28 ` [RFC PATCH v2 1/2] fpga: add a module owner field to fpga_manager and fpga_manager_ops Marco Pagani
@ 2023-11-24 16:28 ` Marco Pagani
  1 sibling, 0 replies; 9+ messages in thread
From: Marco Pagani @ 2023-11-24 16:28 UTC (permalink / raw)
  To: Moritz Fischer, Wu Hao, Xu Yilun, Tom Rix, Greg Kroah-Hartman
  Cc: Marco Pagani, linux-kernel, linux-fpga

This patch tentatively set the owner field of fpga_manager_ops to
THIS_MODULE for existing fpga manager low-level control modules.

Signed-off-by: Marco Pagani <marpagan@redhat.com>
---
 drivers/fpga/altera-cvp.c             | 1 +
 drivers/fpga/altera-pr-ip-core.c      | 1 +
 drivers/fpga/altera-ps-spi.c          | 1 +
 drivers/fpga/dfl-fme-mgr.c            | 1 +
 drivers/fpga/ice40-spi.c              | 1 +
 drivers/fpga/lattice-sysconfig.c      | 1 +
 drivers/fpga/machxo2-spi.c            | 1 +
 drivers/fpga/microchip-spi.c          | 1 +
 drivers/fpga/socfpga-a10.c            | 1 +
 drivers/fpga/socfpga.c                | 1 +
 drivers/fpga/stratix10-soc.c          | 1 +
 drivers/fpga/tests/fpga-mgr-test.c    | 1 +
 drivers/fpga/tests/fpga-region-test.c | 1 +
 drivers/fpga/ts73xx-fpga.c            | 1 +
 drivers/fpga/versal-fpga.c            | 1 +
 drivers/fpga/xilinx-spi.c             | 1 +
 drivers/fpga/zynq-fpga.c              | 1 +
 drivers/fpga/zynqmp-fpga.c            | 1 +
 18 files changed, 18 insertions(+)

diff --git a/drivers/fpga/altera-cvp.c b/drivers/fpga/altera-cvp.c
index 4ffb9da537d8..aeb913547dd8 100644
--- a/drivers/fpga/altera-cvp.c
+++ b/drivers/fpga/altera-cvp.c
@@ -520,6 +520,7 @@ static const struct fpga_manager_ops altera_cvp_ops = {
 	.write_init	= altera_cvp_write_init,
 	.write		= altera_cvp_write,
 	.write_complete	= altera_cvp_write_complete,
+	.owner		= THIS_MODULE,
 };
 
 static const struct cvp_priv cvp_priv_v1 = {
diff --git a/drivers/fpga/altera-pr-ip-core.c b/drivers/fpga/altera-pr-ip-core.c
index df8671af4a92..354221c609e6 100644
--- a/drivers/fpga/altera-pr-ip-core.c
+++ b/drivers/fpga/altera-pr-ip-core.c
@@ -171,6 +171,7 @@ static const struct fpga_manager_ops alt_pr_ops = {
 	.write_init = alt_pr_fpga_write_init,
 	.write = alt_pr_fpga_write,
 	.write_complete = alt_pr_fpga_write_complete,
+	.owner = THIS_MODULE,
 };
 
 int alt_pr_register(struct device *dev, void __iomem *reg_base)
diff --git a/drivers/fpga/altera-ps-spi.c b/drivers/fpga/altera-ps-spi.c
index 740980e7cef8..3be05796a6fc 100644
--- a/drivers/fpga/altera-ps-spi.c
+++ b/drivers/fpga/altera-ps-spi.c
@@ -228,6 +228,7 @@ static const struct fpga_manager_ops altera_ps_ops = {
 	.write_init = altera_ps_write_init,
 	.write = altera_ps_write,
 	.write_complete = altera_ps_write_complete,
+	.owner = THIS_MODULE,
 };
 
 static int altera_ps_probe(struct spi_device *spi)
diff --git a/drivers/fpga/dfl-fme-mgr.c b/drivers/fpga/dfl-fme-mgr.c
index ab228d8837a0..740ce82e3ac9 100644
--- a/drivers/fpga/dfl-fme-mgr.c
+++ b/drivers/fpga/dfl-fme-mgr.c
@@ -264,6 +264,7 @@ static const struct fpga_manager_ops fme_mgr_ops = {
 	.write = fme_mgr_write,
 	.write_complete = fme_mgr_write_complete,
 	.status = fme_mgr_status,
+	.owner = THIS_MODULE,
 };
 
 static void fme_mgr_get_compat_id(void __iomem *fme_pr,
diff --git a/drivers/fpga/ice40-spi.c b/drivers/fpga/ice40-spi.c
index 7cbb3558b844..97afa6dc5d76 100644
--- a/drivers/fpga/ice40-spi.c
+++ b/drivers/fpga/ice40-spi.c
@@ -130,6 +130,7 @@ static const struct fpga_manager_ops ice40_fpga_ops = {
 	.write_init = ice40_fpga_ops_write_init,
 	.write = ice40_fpga_ops_write,
 	.write_complete = ice40_fpga_ops_write_complete,
+	.owner = THIS_MODULE,
 };
 
 static int ice40_fpga_probe(struct spi_device *spi)
diff --git a/drivers/fpga/lattice-sysconfig.c b/drivers/fpga/lattice-sysconfig.c
index ba51a60f672f..1393cdd11e49 100644
--- a/drivers/fpga/lattice-sysconfig.c
+++ b/drivers/fpga/lattice-sysconfig.c
@@ -348,6 +348,7 @@ static const struct fpga_manager_ops sysconfig_fpga_mgr_ops = {
 	.write_init = sysconfig_ops_write_init,
 	.write = sysconfig_ops_write,
 	.write_complete = sysconfig_ops_write_complete,
+	.owner = THIS_MODULE,
 };
 
 int sysconfig_probe(struct sysconfig_priv *priv)
diff --git a/drivers/fpga/machxo2-spi.c b/drivers/fpga/machxo2-spi.c
index 905607992a12..46193a47f863 100644
--- a/drivers/fpga/machxo2-spi.c
+++ b/drivers/fpga/machxo2-spi.c
@@ -358,6 +358,7 @@ static const struct fpga_manager_ops machxo2_ops = {
 	.write_init = machxo2_write_init,
 	.write = machxo2_write,
 	.write_complete = machxo2_write_complete,
+	.owner = THIS_MODULE,
 };
 
 static int machxo2_spi_probe(struct spi_device *spi)
diff --git a/drivers/fpga/microchip-spi.c b/drivers/fpga/microchip-spi.c
index 2a82c726d6e5..023ccdf2d5da 100644
--- a/drivers/fpga/microchip-spi.c
+++ b/drivers/fpga/microchip-spi.c
@@ -362,6 +362,7 @@ static const struct fpga_manager_ops mpf_ops = {
 	.write_init = mpf_ops_write_init,
 	.write = mpf_ops_write,
 	.write_complete = mpf_ops_write_complete,
+	.owner = THIS_MODULE,
 };
 
 static int mpf_probe(struct spi_device *spi)
diff --git a/drivers/fpga/socfpga-a10.c b/drivers/fpga/socfpga-a10.c
index cc4861e345c9..a8ab74b30006 100644
--- a/drivers/fpga/socfpga-a10.c
+++ b/drivers/fpga/socfpga-a10.c
@@ -463,6 +463,7 @@ static const struct fpga_manager_ops socfpga_a10_fpga_mgr_ops = {
 	.write_init = socfpga_a10_fpga_write_init,
 	.write = socfpga_a10_fpga_write,
 	.write_complete = socfpga_a10_fpga_write_complete,
+	.owner = THIS_MODULE,
 };
 
 static int socfpga_a10_fpga_probe(struct platform_device *pdev)
diff --git a/drivers/fpga/socfpga.c b/drivers/fpga/socfpga.c
index 723ea0ad3f09..87f3f4a367d0 100644
--- a/drivers/fpga/socfpga.c
+++ b/drivers/fpga/socfpga.c
@@ -538,6 +538,7 @@ static const struct fpga_manager_ops socfpga_fpga_ops = {
 	.write_init = socfpga_fpga_ops_configure_init,
 	.write = socfpga_fpga_ops_configure_write,
 	.write_complete = socfpga_fpga_ops_configure_complete,
+	.owner = THIS_MODULE,
 };
 
 static int socfpga_fpga_probe(struct platform_device *pdev)
diff --git a/drivers/fpga/stratix10-soc.c b/drivers/fpga/stratix10-soc.c
index cacb9cc5757e..63a5a2fe4911 100644
--- a/drivers/fpga/stratix10-soc.c
+++ b/drivers/fpga/stratix10-soc.c
@@ -393,6 +393,7 @@ static const struct fpga_manager_ops s10_ops = {
 	.write_init = s10_ops_write_init,
 	.write = s10_ops_write,
 	.write_complete = s10_ops_write_complete,
+	.owner = THIS_MODULE,
 };
 
 static int s10_probe(struct platform_device *pdev)
diff --git a/drivers/fpga/tests/fpga-mgr-test.c b/drivers/fpga/tests/fpga-mgr-test.c
index 6acec55b60ce..4c2a3e98f8ad 100644
--- a/drivers/fpga/tests/fpga-mgr-test.c
+++ b/drivers/fpga/tests/fpga-mgr-test.c
@@ -187,6 +187,7 @@ static const struct fpga_manager_ops fake_mgr_ops = {
 	.write = op_write,
 	.write_sg = op_write_sg,
 	.write_complete = op_write_complete,
+	.owner = THIS_MODULE,
 };
 
 static void fpga_mgr_test_get(struct kunit *test)
diff --git a/drivers/fpga/tests/fpga-region-test.c b/drivers/fpga/tests/fpga-region-test.c
index baab07e3fc59..2705c1b33d09 100644
--- a/drivers/fpga/tests/fpga-region-test.c
+++ b/drivers/fpga/tests/fpga-region-test.c
@@ -52,6 +52,7 @@ static int op_write(struct fpga_manager *mgr, const char *buf, size_t count)
  */
 static const struct fpga_manager_ops fake_mgr_ops = {
 	.write = op_write,
+	.owner = THIS_MODULE,
 };
 
 static int op_enable_set(struct fpga_bridge *bridge, bool enable)
diff --git a/drivers/fpga/ts73xx-fpga.c b/drivers/fpga/ts73xx-fpga.c
index 4e1d2a4d3df4..20b8db0d150a 100644
--- a/drivers/fpga/ts73xx-fpga.c
+++ b/drivers/fpga/ts73xx-fpga.c
@@ -96,6 +96,7 @@ static const struct fpga_manager_ops ts73xx_fpga_ops = {
 	.write_init	= ts73xx_fpga_write_init,
 	.write		= ts73xx_fpga_write,
 	.write_complete	= ts73xx_fpga_write_complete,
+	.owner		= THIS_MODULE,
 };
 
 static int ts73xx_fpga_probe(struct platform_device *pdev)
diff --git a/drivers/fpga/versal-fpga.c b/drivers/fpga/versal-fpga.c
index 3710e8f01be2..02fd8ed36ff0 100644
--- a/drivers/fpga/versal-fpga.c
+++ b/drivers/fpga/versal-fpga.c
@@ -40,6 +40,7 @@ static int versal_fpga_ops_write(struct fpga_manager *mgr,
 static const struct fpga_manager_ops versal_fpga_ops = {
 	.write_init = versal_fpga_ops_write_init,
 	.write = versal_fpga_ops_write,
+	.owner = THIS_MODULE,
 };
 
 static int versal_fpga_probe(struct platform_device *pdev)
diff --git a/drivers/fpga/xilinx-spi.c b/drivers/fpga/xilinx-spi.c
index e1a227e7ff2a..d58cf0ccbd41 100644
--- a/drivers/fpga/xilinx-spi.c
+++ b/drivers/fpga/xilinx-spi.c
@@ -218,6 +218,7 @@ static const struct fpga_manager_ops xilinx_spi_ops = {
 	.write_init = xilinx_spi_write_init,
 	.write = xilinx_spi_write,
 	.write_complete = xilinx_spi_write_complete,
+	.owner = THIS_MODULE,
 };
 
 static int xilinx_spi_probe(struct spi_device *spi)
diff --git a/drivers/fpga/zynq-fpga.c b/drivers/fpga/zynq-fpga.c
index 96611d424a10..241e1fe48a13 100644
--- a/drivers/fpga/zynq-fpga.c
+++ b/drivers/fpga/zynq-fpga.c
@@ -548,6 +548,7 @@ static const struct fpga_manager_ops zynq_fpga_ops = {
 	.write_init = zynq_fpga_ops_write_init,
 	.write_sg = zynq_fpga_ops_write,
 	.write_complete = zynq_fpga_ops_write_complete,
+	.owner = THIS_MODULE,
 };
 
 static int zynq_fpga_probe(struct platform_device *pdev)
diff --git a/drivers/fpga/zynqmp-fpga.c b/drivers/fpga/zynqmp-fpga.c
index f3434e2c487b..2f66400d2330 100644
--- a/drivers/fpga/zynqmp-fpga.c
+++ b/drivers/fpga/zynqmp-fpga.c
@@ -101,6 +101,7 @@ static const struct fpga_manager_ops zynqmp_fpga_ops = {
 	.state = zynqmp_fpga_ops_state,
 	.write_init = zynqmp_fpga_ops_write_init,
 	.write = zynqmp_fpga_ops_write,
+	.owner = THIS_MODULE,
 };
 
 static int zynqmp_fpga_probe(struct platform_device *pdev)
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH v2 1/2] fpga: add a module owner field to fpga_manager and fpga_manager_ops
  2023-11-24 16:28 ` [RFC PATCH v2 1/2] fpga: add a module owner field to fpga_manager and fpga_manager_ops Marco Pagani
@ 2023-11-25  9:11   ` Xu Yilun
  2023-11-30 10:42     ` Marco Pagani
  0 siblings, 1 reply; 9+ messages in thread
From: Xu Yilun @ 2023-11-25  9:11 UTC (permalink / raw)
  To: Marco Pagani
  Cc: Moritz Fischer, Wu Hao, Xu Yilun, Tom Rix, Greg Kroah-Hartman,
	linux-kernel, linux-fpga

On Fri, Nov 24, 2023 at 05:28:06PM +0100, Marco Pagani wrote:
> Add a module *owner field to the fpga_manager_ops and fpga_manager
> structs to protect the fpga manager against the unloading of the
> low-level control module while someone is holding a reference to the
> manager device. Low-level control modules should statically set the
> owner field of the fpga_manager_ops struct to THIS_MODULE. Then, when
> the manager is registered using fpga_mgr_register(), the value is copied
> into the owner field of the fpga_manager struct (that contains the
> device context). In this way, the manager can later use it in
> fpga_mgr_get() to take the low-level module's refcount. To prevent races
> while unloading the low-level control module, fpga_mgr_get() and part of
> the fpga_mgr_unregister() methods are protected with a mutex.
> 
> Other changes: move put_device() from __fpga_mgr_get() to fpga_mgr_get()
> and of_fpga_mgr_get() to improve code clarity.
> 
> Fixes: 654ba4cc0f3e ("fpga manager: ensure lifetime with of_fpga_mgr_get")
> Signed-off-by: Marco Pagani <marpagan@redhat.com>
> ---
>  drivers/fpga/fpga-mgr.c       | 56 +++++++++++++++++++++++++----------
>  include/linux/fpga/fpga-mgr.h |  4 +++
>  2 files changed, 44 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/fpga/fpga-mgr.c b/drivers/fpga/fpga-mgr.c
> index 06651389c592..608605d59860 100644
> --- a/drivers/fpga/fpga-mgr.c
> +++ b/drivers/fpga/fpga-mgr.c
> @@ -21,6 +21,8 @@
>  static DEFINE_IDA(fpga_mgr_ida);
>  static const struct class fpga_mgr_class;
>  
> +static DEFINE_MUTEX(mgr_lock);
> +
>  struct fpga_mgr_devres {
>  	struct fpga_manager *mgr;
>  };
> @@ -667,17 +669,15 @@ ATTRIBUTE_GROUPS(fpga_mgr);
>  static struct fpga_manager *__fpga_mgr_get(struct device *dev)
>  {
>  	struct fpga_manager *mgr;
> +	struct module *owner;
>  
>  	mgr = to_fpga_manager(dev);
> +	owner = mgr->owner;
>  
> -	if (!try_module_get(dev->parent->driver->owner))
> -		goto err_dev;
> +	if (owner && !try_module_get(owner))

No need to test owner == NULL, try_module_get() does this.

> +		mgr = ERR_PTR(-ENODEV);
>  
>  	return mgr;
> -
> -err_dev:
> -	put_device(dev);
> -	return ERR_PTR(-ENODEV);
>  }
>  
>  static int fpga_mgr_dev_match(struct device *dev, const void *data)
> @@ -693,12 +693,22 @@ static int fpga_mgr_dev_match(struct device *dev, const void *data)
>   */
>  struct fpga_manager *fpga_mgr_get(struct device *dev)
>  {
> -	struct device *mgr_dev = class_find_device(&fpga_mgr_class, NULL, dev,
> -						   fpga_mgr_dev_match);
> +	struct fpga_manager *mgr = ERR_PTR(-ENODEV);
> +	struct device *mgr_dev;
> +
> +	mutex_lock(&mgr_lock);
> +
> +	mgr_dev = class_find_device(&fpga_mgr_class, NULL, dev, fpga_mgr_dev_match);
>  	if (!mgr_dev)
> -		return ERR_PTR(-ENODEV);
> +		goto out;
> +
> +	mgr = __fpga_mgr_get(mgr_dev);
> +	if (IS_ERR(mgr))
> +		put_device(mgr_dev);
>  
> -	return __fpga_mgr_get(mgr_dev);
> +out:
> +	mutex_unlock(&mgr_lock);
> +	return mgr;
>  }
>  EXPORT_SYMBOL_GPL(fpga_mgr_get);
>  
> @@ -711,13 +721,22 @@ EXPORT_SYMBOL_GPL(fpga_mgr_get);
>   */
>  struct fpga_manager *of_fpga_mgr_get(struct device_node *node)
>  {
> -	struct device *dev;
> +	struct fpga_manager *mgr = ERR_PTR(-ENODEV);
> +	struct device *mgr_dev;
> +
> +	mutex_lock(&mgr_lock);
> +
> +	mgr_dev = class_find_device_by_of_node(&fpga_mgr_class, node);
> +	if (!mgr_dev)
> +		goto out;
>  
> -	dev = class_find_device_by_of_node(&fpga_mgr_class, node);
> -	if (!dev)
> -		return ERR_PTR(-ENODEV);
> +	mgr = __fpga_mgr_get(mgr_dev);
> +	if (IS_ERR(mgr))
> +		put_device(mgr_dev);
>  
> -	return __fpga_mgr_get(dev);
> +out:
> +	mutex_unlock(&mgr_lock);
> +	return mgr;
>  }
>  EXPORT_SYMBOL_GPL(of_fpga_mgr_get);
>  
> @@ -727,7 +746,7 @@ EXPORT_SYMBOL_GPL(of_fpga_mgr_get);
>   */
>  void fpga_mgr_put(struct fpga_manager *mgr)
>  {
> -	module_put(mgr->dev.parent->driver->owner);
> +	module_put(mgr->owner);
>  	put_device(&mgr->dev);
>  }
>  EXPORT_SYMBOL_GPL(fpga_mgr_put);
> @@ -806,6 +825,7 @@ fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *in
>  
>  	mgr->name = info->name;
>  	mgr->mops = info->mops;
> +	mgr->owner = info->mops->owner;
>  	mgr->priv = info->priv;
>  	mgr->compat_id = info->compat_id;
>  
> @@ -888,7 +908,11 @@ void fpga_mgr_unregister(struct fpga_manager *mgr)
>  	 */
>  	fpga_mgr_fpga_remove(mgr);
>  
> +	mutex_lock(&mgr_lock);
> +
>  	device_unregister(&mgr->dev);
> +
> +	mutex_unlock(&mgr_lock);

Why this part should be protected rather than the whole
fpga_mgr_unregister()?

I feel the scope of the protection is unclear to me in this patch. What
data should be protected from concurrent access by this mutex? From the
code seems the racing of mgr dev should be protected but apparently it
doesn't have to.

And with this mutex, the get/put/unregister() for one mgr should be
exclusive with another mgr, but that also seems not necessary.

I think the mgr->owner & mgr->ops should be protected from concurrent
access of delete_module & fpga_mgr_get/put(), so how about:

struct fpga_manager_ops {
	struct module *owner;
	...
};

struct fpga_manager {
	...
	struct mutex mops_lock;
	const struct fpga_manager_ops *mops;
	...
};


static struct fpga_manager *__fpga_mgr_get(struct device *dev)
{
	struct fpga_manager *mgr;

	mgr = to_fpga_manager(dev);

	mutex_lock(&mgr->mops_lock);

	if (!mgr->mops || !try_module_get(mgr->mops->owner))
		mgr = ERR_PTR(-ENODEV);

	mutex_unlock(&mgr->mops_lock);
		
	return mgr;
}

void fpga_mgr_unregister(struct fpga_manager *mgr)
{
	fpga_mgr_fpga_remove(mgr);	

	mutex_lock(&mgr->ops_lock);
	mgr->mops = NULL;
	mutex_unlock(&mgr->ops_lock);

	device_unregister(&mgr->dev);	
}

Not actually tested.

Thanks,
Yilun

>  }
>  EXPORT_SYMBOL_GPL(fpga_mgr_unregister);
>  
> diff --git a/include/linux/fpga/fpga-mgr.h b/include/linux/fpga/fpga-mgr.h
> index 54f63459efd6..eaf6e072dbc0 100644
> --- a/include/linux/fpga/fpga-mgr.h
> +++ b/include/linux/fpga/fpga-mgr.h
> @@ -162,6 +162,7 @@ struct fpga_manager_info {
>   * @write_complete: set FPGA to operating state after writing is done
>   * @fpga_remove: optional: Set FPGA into a specific state during driver remove
>   * @groups: optional attribute groups.
> + * @owner: owner module.
>   *
>   * fpga_manager_ops are the low level functions implemented by a specific
>   * fpga manager driver.  The optional ones are tested for NULL before being
> @@ -184,6 +185,7 @@ struct fpga_manager_ops {
>  			      struct fpga_image_info *info);
>  	void (*fpga_remove)(struct fpga_manager *mgr);
>  	const struct attribute_group **groups;
> +	struct module *owner;
>  };
>  
>  /* FPGA manager status: Partial/Full Reconfiguration errors */
> @@ -201,6 +203,7 @@ struct fpga_manager_ops {
>   * @state: state of fpga manager
>   * @compat_id: FPGA manager id for compatibility check.
>   * @mops: pointer to struct of fpga manager ops
> + * @owner: owner module.
>   * @priv: low level driver private date
>   */
>  struct fpga_manager {
> @@ -210,6 +213,7 @@ struct fpga_manager {
>  	enum fpga_mgr_states state;
>  	struct fpga_compat_id *compat_id;
>  	const struct fpga_manager_ops *mops;
> +	struct module *owner;
>  	void *priv;
>  };
>  
> -- 
> 2.42.0
> 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH v2 1/2] fpga: add a module owner field to fpga_manager and fpga_manager_ops
  2023-11-25  9:11   ` Xu Yilun
@ 2023-11-30 10:42     ` Marco Pagani
  2023-12-02 12:16       ` Xu Yilun
  0 siblings, 1 reply; 9+ messages in thread
From: Marco Pagani @ 2023-11-30 10:42 UTC (permalink / raw)
  To: Xu Yilun
  Cc: Moritz Fischer, Wu Hao, Xu Yilun, Tom Rix, Greg Kroah-Hartman,
	linux-kernel, linux-fpga



On 2023-11-25 10:11, Xu Yilun wrote:
> On Fri, Nov 24, 2023 at 05:28:06PM +0100, Marco Pagani wrote:
>> Add a module *owner field to the fpga_manager_ops and fpga_manager
>> structs to protect the fpga manager against the unloading of the
>> low-level control module while someone is holding a reference to the
>> manager device. Low-level control modules should statically set the
>> owner field of the fpga_manager_ops struct to THIS_MODULE. Then, when
>> the manager is registered using fpga_mgr_register(), the value is copied
>> into the owner field of the fpga_manager struct (that contains the
>> device context). In this way, the manager can later use it in
>> fpga_mgr_get() to take the low-level module's refcount. To prevent races
>> while unloading the low-level control module, fpga_mgr_get() and part of
>> the fpga_mgr_unregister() methods are protected with a mutex.
>>
>> Other changes: move put_device() from __fpga_mgr_get() to fpga_mgr_get()
>> and of_fpga_mgr_get() to improve code clarity.
>>
>> Fixes: 654ba4cc0f3e ("fpga manager: ensure lifetime with of_fpga_mgr_get")
>> Signed-off-by: Marco Pagani <marpagan@redhat.com>
>> ---
>>  drivers/fpga/fpga-mgr.c       | 56 +++++++++++++++++++++++++----------
>>  include/linux/fpga/fpga-mgr.h |  4 +++
>>  2 files changed, 44 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/fpga/fpga-mgr.c b/drivers/fpga/fpga-mgr.c
>> index 06651389c592..608605d59860 100644
>> --- a/drivers/fpga/fpga-mgr.c
>> +++ b/drivers/fpga/fpga-mgr.c
>> @@ -21,6 +21,8 @@
>>  static DEFINE_IDA(fpga_mgr_ida);
>>  static const struct class fpga_mgr_class;
>>  
>> +static DEFINE_MUTEX(mgr_lock);
>> +
>>  struct fpga_mgr_devres {
>>  	struct fpga_manager *mgr;
>>  };
>> @@ -667,17 +669,15 @@ ATTRIBUTE_GROUPS(fpga_mgr);
>>  static struct fpga_manager *__fpga_mgr_get(struct device *dev)
>>  {
>>  	struct fpga_manager *mgr;
>> +	struct module *owner;
>>  
>>  	mgr = to_fpga_manager(dev);
>> +	owner = mgr->owner;
>>  
>> -	if (!try_module_get(dev->parent->driver->owner))
>> -		goto err_dev;
>> +	if (owner && !try_module_get(owner))
> 
> No need to test owner == NULL, try_module_get() does this.

You are right. I'll remove it in the next version.

> 
>> +		mgr = ERR_PTR(-ENODEV);
>>  
>>  	return mgr;
>> -
>> -err_dev:
>> -	put_device(dev);
>> -	return ERR_PTR(-ENODEV);
>>  }
>>  
>>  static int fpga_mgr_dev_match(struct device *dev, const void *data)
>> @@ -693,12 +693,22 @@ static int fpga_mgr_dev_match(struct device *dev, const void *data)
>>   */
>>  struct fpga_manager *fpga_mgr_get(struct device *dev)
>>  {
>> -	struct device *mgr_dev = class_find_device(&fpga_mgr_class, NULL, dev,
>> -						   fpga_mgr_dev_match);
>> +	struct fpga_manager *mgr = ERR_PTR(-ENODEV);
>> +	struct device *mgr_dev;
>> +
>> +	mutex_lock(&mgr_lock);
>> +
>> +	mgr_dev = class_find_device(&fpga_mgr_class, NULL, dev, fpga_mgr_dev_match);
>>  	if (!mgr_dev)
>> -		return ERR_PTR(-ENODEV);
>> +		goto out;
>> +
>> +	mgr = __fpga_mgr_get(mgr_dev);
>> +	if (IS_ERR(mgr))
>> +		put_device(mgr_dev);
>>  
>> -	return __fpga_mgr_get(mgr_dev);
>> +out:
>> +	mutex_unlock(&mgr_lock);
>> +	return mgr;
>>  }
>>  EXPORT_SYMBOL_GPL(fpga_mgr_get);
>>  
>> @@ -711,13 +721,22 @@ EXPORT_SYMBOL_GPL(fpga_mgr_get);
>>   */
>>  struct fpga_manager *of_fpga_mgr_get(struct device_node *node)
>>  {
>> -	struct device *dev;
>> +	struct fpga_manager *mgr = ERR_PTR(-ENODEV);
>> +	struct device *mgr_dev;
>> +
>> +	mutex_lock(&mgr_lock);
>> +
>> +	mgr_dev = class_find_device_by_of_node(&fpga_mgr_class, node);
>> +	if (!mgr_dev)
>> +		goto out;
>>  
>> -	dev = class_find_device_by_of_node(&fpga_mgr_class, node);
>> -	if (!dev)
>> -		return ERR_PTR(-ENODEV);
>> +	mgr = __fpga_mgr_get(mgr_dev);
>> +	if (IS_ERR(mgr))
>> +		put_device(mgr_dev);
>>  
>> -	return __fpga_mgr_get(dev);
>> +out:
>> +	mutex_unlock(&mgr_lock);
>> +	return mgr;
>>  }
>>  EXPORT_SYMBOL_GPL(of_fpga_mgr_get);
>>  
>> @@ -727,7 +746,7 @@ EXPORT_SYMBOL_GPL(of_fpga_mgr_get);
>>   */
>>  void fpga_mgr_put(struct fpga_manager *mgr)
>>  {
>> -	module_put(mgr->dev.parent->driver->owner);
>> +	module_put(mgr->owner);
>>  	put_device(&mgr->dev);
>>  }
>>  EXPORT_SYMBOL_GPL(fpga_mgr_put);
>> @@ -806,6 +825,7 @@ fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *in
>>  
>>  	mgr->name = info->name;
>>  	mgr->mops = info->mops;
>> +	mgr->owner = info->mops->owner;
>>  	mgr->priv = info->priv;
>>  	mgr->compat_id = info->compat_id;
>>  
>> @@ -888,7 +908,11 @@ void fpga_mgr_unregister(struct fpga_manager *mgr)
>>  	 */
>>  	fpga_mgr_fpga_remove(mgr);
>>  
>> +	mutex_lock(&mgr_lock);
>> +
>>  	device_unregister(&mgr->dev);
>> +
>> +	mutex_unlock(&mgr_lock);
> 
> Why this part should be protected rather than the whole
> fpga_mgr_unregister()?
>

Protecting the fpga_remove() op seems unnecessary to me because it
does not affect the manager device's lifetime. Moreover, it may hold
the mutex for a long time since it was intended to interact with the
hardware to put it in a specific state before removing the driver.

> I feel the scope of the protection is unclear to me in this patch. What
> data should be protected from concurrent access by this mutex? From the
> code seems the racing of mgr dev should be protected but apparently it
> doesn't have to.

The mutex is there to ensure the lifetime of the manager device and
its context (struct fpga_manager) if fpga_mgr_get() happens to run
concurrently with the removal of the low-level module.

> 
> And with this mutex, the get/put/unregister() for one mgr should be
> exclusive with another mgr, but that also seems not necessary.
> 

I decided to use a static mutex because I thought putting the mutex
in the manager's context would be unsafe since its life would be tied
to the manager's life. For instance, consider the following sequence:

- A removes the low-level control module, and delete_module progresses
up to the point when it calls the module's exit function, which in turn
calls fpga_mgr_unregister().

- fpga_mgr_unregister() takes the mutex but gets descheduled before
completing the unregistering of the manager device.

- Meanwhile, B wants to get the manager (it is still there) and calls
fpga_mgr_get(), which tries to take the mutex but gets suspended since
it is held by A.

- A resumes and fpga_mgr_unregister() releases the manager device and
its context, including the mutex on which B is suspended.

We could mitigate this specific case using mutex_trylock(). However,
there will still be other problematic cases, like if fpga_mgr_get()
gets suspended right before taking the mutex and then delete_module
proceeds up to when fpga_mgr_unregister() frees the manager device
and its context.

> I think the mgr->owner & mgr->ops should be protected from concurrent
> access of delete_module & fpga_mgr_get/put(), so how about:
> 
> struct fpga_manager_ops {
> 	struct module *owner;
> 	...
> };
> 
> struct fpga_manager {
> 	...
> 	struct mutex mops_lock;
> 	const struct fpga_manager_ops *mops;
> 	...
> };
> 
> 
> static struct fpga_manager *__fpga_mgr_get(struct device *dev)
> {
> 	struct fpga_manager *mgr;
> 
> 	mgr = to_fpga_manager(dev);
> 
> 	mutex_lock(&mgr->mops_lock);
> 
> 	if (!mgr->mops || !try_module_get(mgr->mops->owner))
> 		mgr = ERR_PTR(-ENODEV);
> 
> 	mutex_unlock(&mgr->mops_lock);
> 		
> 	return mgr;
> }
> 
> void fpga_mgr_unregister(struct fpga_manager *mgr)
> {
> 	fpga_mgr_fpga_remove(mgr);	
> 
> 	mutex_lock(&mgr->ops_lock);
> 	mgr->mops = NULL;
> 	mutex_unlock(&mgr->ops_lock);
> 
> 	device_unregister(&mgr->dev);	
> }
> 
> Not actually tested.
> 

I think protecting the only the ops is not enough for the same reasons.
If fpga_mgr_get() gets suspended right after class_find_device(),and
meanwhile the low-level module is removed, it resumes with a reference
to a manager device that no longer exists.

In a certain sense, however, using a mutex seems like a mitigation
that does not solve the problem at its root. I honestly still think
that taking the module's refcount right when registering the manager
is the only way that is both safe and robust against code changes.
However, my proposal was rejected.

So, if you prefer, I can drop the mutex entirely in the next version,
and we leave the responsibility of keeping all kernel pieces to the
user. It will still be an improvement over taking the low-level
module's refcount through the parent device's driver pointer.

Thanks,
Marco


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH v2 1/2] fpga: add a module owner field to fpga_manager and fpga_manager_ops
  2023-11-30 10:42     ` Marco Pagani
@ 2023-12-02 12:16       ` Xu Yilun
  2023-12-08 20:08         ` Marco Pagani
  0 siblings, 1 reply; 9+ messages in thread
From: Xu Yilun @ 2023-12-02 12:16 UTC (permalink / raw)
  To: Marco Pagani
  Cc: Moritz Fischer, Wu Hao, Xu Yilun, Tom Rix, Greg Kroah-Hartman,
	linux-kernel, linux-fpga

On Thu, Nov 30, 2023 at 11:42:36AM +0100, Marco Pagani wrote:
> 
> 
> On 2023-11-25 10:11, Xu Yilun wrote:
> > On Fri, Nov 24, 2023 at 05:28:06PM +0100, Marco Pagani wrote:
> >> Add a module *owner field to the fpga_manager_ops and fpga_manager
> >> structs to protect the fpga manager against the unloading of the
> >> low-level control module while someone is holding a reference to the
> >> manager device. Low-level control modules should statically set the
> >> owner field of the fpga_manager_ops struct to THIS_MODULE. Then, when
> >> the manager is registered using fpga_mgr_register(), the value is copied
> >> into the owner field of the fpga_manager struct (that contains the
> >> device context). In this way, the manager can later use it in
> >> fpga_mgr_get() to take the low-level module's refcount. To prevent races
> >> while unloading the low-level control module, fpga_mgr_get() and part of
> >> the fpga_mgr_unregister() methods are protected with a mutex.
> >>
> >> Other changes: move put_device() from __fpga_mgr_get() to fpga_mgr_get()
> >> and of_fpga_mgr_get() to improve code clarity.
> >>
> >> Fixes: 654ba4cc0f3e ("fpga manager: ensure lifetime with of_fpga_mgr_get")
> >> Signed-off-by: Marco Pagani <marpagan@redhat.com>
> >> ---
> >>  drivers/fpga/fpga-mgr.c       | 56 +++++++++++++++++++++++++----------
> >>  include/linux/fpga/fpga-mgr.h |  4 +++
> >>  2 files changed, 44 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/drivers/fpga/fpga-mgr.c b/drivers/fpga/fpga-mgr.c
> >> index 06651389c592..608605d59860 100644
> >> --- a/drivers/fpga/fpga-mgr.c
> >> +++ b/drivers/fpga/fpga-mgr.c
> >> @@ -21,6 +21,8 @@
> >>  static DEFINE_IDA(fpga_mgr_ida);
> >>  static const struct class fpga_mgr_class;
> >>  
> >> +static DEFINE_MUTEX(mgr_lock);
> >> +
> >>  struct fpga_mgr_devres {
> >>  	struct fpga_manager *mgr;
> >>  };
> >> @@ -667,17 +669,15 @@ ATTRIBUTE_GROUPS(fpga_mgr);
> >>  static struct fpga_manager *__fpga_mgr_get(struct device *dev)
> >>  {
> >>  	struct fpga_manager *mgr;
> >> +	struct module *owner;
> >>  
> >>  	mgr = to_fpga_manager(dev);
> >> +	owner = mgr->owner;
> >>  
> >> -	if (!try_module_get(dev->parent->driver->owner))
> >> -		goto err_dev;
> >> +	if (owner && !try_module_get(owner))
> > 
> > No need to test owner == NULL, try_module_get() does this.
> 
> You are right. I'll remove it in the next version.
> 
> > 
> >> +		mgr = ERR_PTR(-ENODEV);
> >>  
> >>  	return mgr;
> >> -
> >> -err_dev:
> >> -	put_device(dev);
> >> -	return ERR_PTR(-ENODEV);
> >>  }
> >>  
> >>  static int fpga_mgr_dev_match(struct device *dev, const void *data)
> >> @@ -693,12 +693,22 @@ static int fpga_mgr_dev_match(struct device *dev, const void *data)
> >>   */
> >>  struct fpga_manager *fpga_mgr_get(struct device *dev)
> >>  {
> >> -	struct device *mgr_dev = class_find_device(&fpga_mgr_class, NULL, dev,
> >> -						   fpga_mgr_dev_match);
> >> +	struct fpga_manager *mgr = ERR_PTR(-ENODEV);
> >> +	struct device *mgr_dev;
> >> +
> >> +	mutex_lock(&mgr_lock);
> >> +
> >> +	mgr_dev = class_find_device(&fpga_mgr_class, NULL, dev, fpga_mgr_dev_match);
> >>  	if (!mgr_dev)
> >> -		return ERR_PTR(-ENODEV);
> >> +		goto out;
> >> +
> >> +	mgr = __fpga_mgr_get(mgr_dev);
> >> +	if (IS_ERR(mgr))
> >> +		put_device(mgr_dev);
> >>  
> >> -	return __fpga_mgr_get(mgr_dev);
> >> +out:
> >> +	mutex_unlock(&mgr_lock);
> >> +	return mgr;
> >>  }
> >>  EXPORT_SYMBOL_GPL(fpga_mgr_get);
> >>  
> >> @@ -711,13 +721,22 @@ EXPORT_SYMBOL_GPL(fpga_mgr_get);
> >>   */
> >>  struct fpga_manager *of_fpga_mgr_get(struct device_node *node)
> >>  {
> >> -	struct device *dev;
> >> +	struct fpga_manager *mgr = ERR_PTR(-ENODEV);
> >> +	struct device *mgr_dev;
> >> +
> >> +	mutex_lock(&mgr_lock);
> >> +
> >> +	mgr_dev = class_find_device_by_of_node(&fpga_mgr_class, node);
> >> +	if (!mgr_dev)
> >> +		goto out;
> >>  
> >> -	dev = class_find_device_by_of_node(&fpga_mgr_class, node);
> >> -	if (!dev)
> >> -		return ERR_PTR(-ENODEV);
> >> +	mgr = __fpga_mgr_get(mgr_dev);
> >> +	if (IS_ERR(mgr))
> >> +		put_device(mgr_dev);
> >>  
> >> -	return __fpga_mgr_get(dev);
> >> +out:
> >> +	mutex_unlock(&mgr_lock);
> >> +	return mgr;
> >>  }
> >>  EXPORT_SYMBOL_GPL(of_fpga_mgr_get);
> >>  
> >> @@ -727,7 +746,7 @@ EXPORT_SYMBOL_GPL(of_fpga_mgr_get);
> >>   */
> >>  void fpga_mgr_put(struct fpga_manager *mgr)
> >>  {
> >> -	module_put(mgr->dev.parent->driver->owner);
> >> +	module_put(mgr->owner);
> >>  	put_device(&mgr->dev);
> >>  }
> >>  EXPORT_SYMBOL_GPL(fpga_mgr_put);
> >> @@ -806,6 +825,7 @@ fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *in
> >>  
> >>  	mgr->name = info->name;
> >>  	mgr->mops = info->mops;
> >> +	mgr->owner = info->mops->owner;
> >>  	mgr->priv = info->priv;
> >>  	mgr->compat_id = info->compat_id;
> >>  
> >> @@ -888,7 +908,11 @@ void fpga_mgr_unregister(struct fpga_manager *mgr)
> >>  	 */
> >>  	fpga_mgr_fpga_remove(mgr);
> >>  
> >> +	mutex_lock(&mgr_lock);
> >> +
> >>  	device_unregister(&mgr->dev);
> >> +
> >> +	mutex_unlock(&mgr_lock);
> > 
> > Why this part should be protected rather than the whole
> > fpga_mgr_unregister()?
> >
> 
> Protecting the fpga_remove() op seems unnecessary to me because it
> does not affect the manager device's lifetime. Moreover, it may hold
> the mutex for a long time since it was intended to interact with the
> hardware to put it in a specific state before removing the driver.
> 
> > I feel the scope of the protection is unclear to me in this patch. What
> > data should be protected from concurrent access by this mutex? From the
> > code seems the racing of mgr dev should be protected but apparently it
> > doesn't have to.
> 
> The mutex is there to ensure the lifetime of the manager device and
> its context (struct fpga_manager) if fpga_mgr_get() happens to run
> concurrently with the removal of the low-level module.
> 
> > 
> > And with this mutex, the get/put/unregister() for one mgr should be
> > exclusive with another mgr, but that also seems not necessary.
> > 
> 
> I decided to use a static mutex because I thought putting the mutex
> in the manager's context would be unsafe since its life would be tied
> to the manager's life. For instance, consider the following sequence:
> 
> - A removes the low-level control module, and delete_module progresses
> up to the point when it calls the module's exit function, which in turn
> calls fpga_mgr_unregister().
> 
> - fpga_mgr_unregister() takes the mutex but gets descheduled before
> completing the unregistering of the manager device.
> 
> - Meanwhile, B wants to get the manager (it is still there) and calls
> fpga_mgr_get(), which tries to take the mutex but gets suspended since
> it is held by A.
> 
> - A resumes and fpga_mgr_unregister() releases the manager device and

The lifecycle of the manager device is not entirely decided by
fpga_mgr_unregister(), this func just puts/decreases the device
refcount.

Remember fpga_mgr_get() gets the device via
class_find_device()->get_device(). I assume if the valid device pointer
could be returned by class_find_device(), it would never be released by
other nice players. So I have no worry about the per manager mutex.

> its context, including the mutex on which B is suspended.
> 
> We could mitigate this specific case using mutex_trylock(). However,
> there will still be other problematic cases, like if fpga_mgr_get()
> gets suspended right before taking the mutex and then delete_module
> proceeds up to when fpga_mgr_unregister() frees the manager device
> and its context.
> 
> > I think the mgr->owner & mgr->ops should be protected from concurrent
> > access of delete_module & fpga_mgr_get/put(), so how about:
> > 
> > struct fpga_manager_ops {
> > 	struct module *owner;
> > 	...
> > };
> > 
> > struct fpga_manager {
> > 	...
> > 	struct mutex mops_lock;
> > 	const struct fpga_manager_ops *mops;
> > 	...
> > };
> > 
> > 
> > static struct fpga_manager *__fpga_mgr_get(struct device *dev)
> > {
> > 	struct fpga_manager *mgr;
> > 
> > 	mgr = to_fpga_manager(dev);
> > 
> > 	mutex_lock(&mgr->mops_lock);
> > 
> > 	if (!mgr->mops || !try_module_get(mgr->mops->owner))
> > 		mgr = ERR_PTR(-ENODEV);
> > 
> > 	mutex_unlock(&mgr->mops_lock);
> > 		
> > 	return mgr;
> > }
> > 
> > void fpga_mgr_unregister(struct fpga_manager *mgr)
> > {
> > 	fpga_mgr_fpga_remove(mgr);	
> > 
> > 	mutex_lock(&mgr->ops_lock);
> > 	mgr->mops = NULL;
> > 	mutex_unlock(&mgr->ops_lock);
> > 
> > 	device_unregister(&mgr->dev);	
> > }
> > 
> > Not actually tested.
> > 
> 
> I think protecting the only the ops is not enough for the same reasons.
> If fpga_mgr_get() gets suspended right after class_find_device(),and
> meanwhile the low-level module is removed, it resumes with a reference
> to a manager device that no longer exists.
> 
> In a certain sense, however, using a mutex seems like a mitigation
> that does not solve the problem at its root. I honestly still think
> that taking the module's refcount right when registering the manager
> is the only way that is both safe and robust against code changes.

I would nak either. As mentioned above, that makes rmmod vendor module
impossible. Introducing another user interface to release the module's
refcount is also a bad idea. Who decides to take the ref, who releases
it. A user has no knowledge of what is happening inside and should not
enforce.

> However, my proposal was rejected.
> 
> So, if you prefer, I can drop the mutex entirely in the next version,
> and we leave the responsibility of keeping all kernel pieces to the

No, please try to fix it. Could you please reconsider my proposal?

Thanks,
Yilun

> user. It will still be an improvement over taking the low-level
> module's refcount through the parent device's driver pointer.
> 
> Thanks,
> Marco
> 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH v2 1/2] fpga: add a module owner field to fpga_manager and fpga_manager_ops
  2023-12-02 12:16       ` Xu Yilun
@ 2023-12-08 20:08         ` Marco Pagani
  2023-12-09 15:27           ` Xu Yilun
  0 siblings, 1 reply; 9+ messages in thread
From: Marco Pagani @ 2023-12-08 20:08 UTC (permalink / raw)
  To: Xu Yilun
  Cc: Moritz Fischer, Wu Hao, Xu Yilun, Tom Rix, Greg Kroah-Hartman,
	linux-kernel, linux-fpga



On 2023-12-02 13:16, Xu Yilun wrote:
> On Thu, Nov 30, 2023 at 11:42:36AM +0100, Marco Pagani wrote:
>>
>>
>> On 2023-11-25 10:11, Xu Yilun wrote:
>>> On Fri, Nov 24, 2023 at 05:28:06PM +0100, Marco Pagani wrote:
>>>> Add a module *owner field to the fpga_manager_ops and fpga_manager
>>>> structs to protect the fpga manager against the unloading of the
>>>> low-level control module while someone is holding a reference to the
>>>> manager device. Low-level control modules should statically set the
>>>> owner field of the fpga_manager_ops struct to THIS_MODULE. Then, when
>>>> the manager is registered using fpga_mgr_register(), the value is copied
>>>> into the owner field of the fpga_manager struct (that contains the
>>>> device context). In this way, the manager can later use it in
>>>> fpga_mgr_get() to take the low-level module's refcount. To prevent races
>>>> while unloading the low-level control module, fpga_mgr_get() and part of
>>>> the fpga_mgr_unregister() methods are protected with a mutex.
>>>>
>>>> Other changes: move put_device() from __fpga_mgr_get() to fpga_mgr_get()
>>>> and of_fpga_mgr_get() to improve code clarity.
>>>>
>>>> Fixes: 654ba4cc0f3e ("fpga manager: ensure lifetime with of_fpga_mgr_get")
>>>> Signed-off-by: Marco Pagani <marpagan@redhat.com>
>>>> ---
>>>>  drivers/fpga/fpga-mgr.c       | 56 +++++++++++++++++++++++++----------
>>>>  include/linux/fpga/fpga-mgr.h |  4 +++
>>>>  2 files changed, 44 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/drivers/fpga/fpga-mgr.c b/drivers/fpga/fpga-mgr.c
>>>> index 06651389c592..608605d59860 100644
>>>> --- a/drivers/fpga/fpga-mgr.c
>>>> +++ b/drivers/fpga/fpga-mgr.c
>>>> @@ -21,6 +21,8 @@
>>>>  static DEFINE_IDA(fpga_mgr_ida);
>>>>  static const struct class fpga_mgr_class;
>>>>  
>>>> +static DEFINE_MUTEX(mgr_lock);
>>>> +
>>>>  struct fpga_mgr_devres {
>>>>  	struct fpga_manager *mgr;
>>>>  };
>>>> @@ -667,17 +669,15 @@ ATTRIBUTE_GROUPS(fpga_mgr);
>>>>  static struct fpga_manager *__fpga_mgr_get(struct device *dev)
>>>>  {
>>>>  	struct fpga_manager *mgr;
>>>> +	struct module *owner;
>>>>  
>>>>  	mgr = to_fpga_manager(dev);
>>>> +	owner = mgr->owner;
>>>>  
>>>> -	if (!try_module_get(dev->parent->driver->owner))
>>>> -		goto err_dev;
>>>> +	if (owner && !try_module_get(owner))
>>>
>>> No need to test owner == NULL, try_module_get() does this.
>>
>> You are right. I'll remove it in the next version.
>>
>>>
>>>> +		mgr = ERR_PTR(-ENODEV);
>>>>  
>>>>  	return mgr;
>>>> -
>>>> -err_dev:
>>>> -	put_device(dev);
>>>> -	return ERR_PTR(-ENODEV);
>>>>  }
>>>>  
>>>>  static int fpga_mgr_dev_match(struct device *dev, const void *data)
>>>> @@ -693,12 +693,22 @@ static int fpga_mgr_dev_match(struct device *dev, const void *data)
>>>>   */
>>>>  struct fpga_manager *fpga_mgr_get(struct device *dev)
>>>>  {
>>>> -	struct device *mgr_dev = class_find_device(&fpga_mgr_class, NULL, dev,
>>>> -						   fpga_mgr_dev_match);
>>>> +	struct fpga_manager *mgr = ERR_PTR(-ENODEV);
>>>> +	struct device *mgr_dev;
>>>> +
>>>> +	mutex_lock(&mgr_lock);
>>>> +
>>>> +	mgr_dev = class_find_device(&fpga_mgr_class, NULL, dev, fpga_mgr_dev_match);
>>>>  	if (!mgr_dev)
>>>> -		return ERR_PTR(-ENODEV);
>>>> +		goto out;
>>>> +
>>>> +	mgr = __fpga_mgr_get(mgr_dev);
>>>> +	if (IS_ERR(mgr))
>>>> +		put_device(mgr_dev);
>>>>  
>>>> -	return __fpga_mgr_get(mgr_dev);
>>>> +out:
>>>> +	mutex_unlock(&mgr_lock);
>>>> +	return mgr;
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(fpga_mgr_get);
>>>>  
>>>> @@ -711,13 +721,22 @@ EXPORT_SYMBOL_GPL(fpga_mgr_get);
>>>>   */
>>>>  struct fpga_manager *of_fpga_mgr_get(struct device_node *node)
>>>>  {
>>>> -	struct device *dev;
>>>> +	struct fpga_manager *mgr = ERR_PTR(-ENODEV);
>>>> +	struct device *mgr_dev;
>>>> +
>>>> +	mutex_lock(&mgr_lock);
>>>> +
>>>> +	mgr_dev = class_find_device_by_of_node(&fpga_mgr_class, node);
>>>> +	if (!mgr_dev)
>>>> +		goto out;
>>>>  
>>>> -	dev = class_find_device_by_of_node(&fpga_mgr_class, node);
>>>> -	if (!dev)
>>>> -		return ERR_PTR(-ENODEV);
>>>> +	mgr = __fpga_mgr_get(mgr_dev);
>>>> +	if (IS_ERR(mgr))
>>>> +		put_device(mgr_dev);
>>>>  
>>>> -	return __fpga_mgr_get(dev);
>>>> +out:
>>>> +	mutex_unlock(&mgr_lock);
>>>> +	return mgr;
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(of_fpga_mgr_get);
>>>>  
>>>> @@ -727,7 +746,7 @@ EXPORT_SYMBOL_GPL(of_fpga_mgr_get);
>>>>   */
>>>>  void fpga_mgr_put(struct fpga_manager *mgr)
>>>>  {
>>>> -	module_put(mgr->dev.parent->driver->owner);
>>>> +	module_put(mgr->owner);
>>>>  	put_device(&mgr->dev);
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(fpga_mgr_put);
>>>> @@ -806,6 +825,7 @@ fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *in
>>>>  
>>>>  	mgr->name = info->name;
>>>>  	mgr->mops = info->mops;
>>>> +	mgr->owner = info->mops->owner;
>>>>  	mgr->priv = info->priv;
>>>>  	mgr->compat_id = info->compat_id;
>>>>  
>>>> @@ -888,7 +908,11 @@ void fpga_mgr_unregister(struct fpga_manager *mgr)
>>>>  	 */
>>>>  	fpga_mgr_fpga_remove(mgr);
>>>>  
>>>> +	mutex_lock(&mgr_lock);
>>>> +
>>>>  	device_unregister(&mgr->dev);
>>>> +
>>>> +	mutex_unlock(&mgr_lock);
>>>
>>> Why this part should be protected rather than the whole
>>> fpga_mgr_unregister()?
>>>
>>
>> Protecting the fpga_remove() op seems unnecessary to me because it
>> does not affect the manager device's lifetime. Moreover, it may hold
>> the mutex for a long time since it was intended to interact with the
>> hardware to put it in a specific state before removing the driver.
>>
>>> I feel the scope of the protection is unclear to me in this patch. What
>>> data should be protected from concurrent access by this mutex? From the
>>> code seems the racing of mgr dev should be protected but apparently it
>>> doesn't have to.
>>
>> The mutex is there to ensure the lifetime of the manager device and
>> its context (struct fpga_manager) if fpga_mgr_get() happens to run
>> concurrently with the removal of the low-level module.
>>
>>>
>>> And with this mutex, the get/put/unregister() for one mgr should be
>>> exclusive with another mgr, but that also seems not necessary.
>>>
>>
>> I decided to use a static mutex because I thought putting the mutex
>> in the manager's context would be unsafe since its life would be tied
>> to the manager's life. For instance, consider the following sequence:
>>
>> - A removes the low-level control module, and delete_module progresses
>> up to the point when it calls the module's exit function, which in turn
>> calls fpga_mgr_unregister().
>>
>> - fpga_mgr_unregister() takes the mutex but gets descheduled before
>> completing the unregistering of the manager device.
>>
>> - Meanwhile, B wants to get the manager (it is still there) and calls
>> fpga_mgr_get(), which tries to take the mutex but gets suspended since
>> it is held by A.
>>
>> - A resumes and fpga_mgr_unregister() releases the manager device and
> 
> The lifecycle of the manager device is not entirely decided by
> fpga_mgr_unregister(), this func just puts/decreases the device
> refcount.

Right, but here I'm considering the case where no one has previously
taken the manager device. In that specific case, the refcount will be
decremented to zero, and the device (with its context) will be released.

However, you got me thinking about how the mutex is causing the problem
in this case.

> 
> Remember fpga_mgr_get() gets the device via
> class_find_device()->get_device(). I assume if the valid device pointer
> could be returned by class_find_device(), it would never be released by
> other nice players. So I have no worry about the per manager mutex.
> 
>> its context, including the mutex on which B is suspended.
>>
>> We could mitigate this specific case using mutex_trylock(). However,
>> there will still be other problematic cases, like if fpga_mgr_get()
>> gets suspended right before taking the mutex and then delete_module
>> proceeds up to when fpga_mgr_unregister() frees the manager device
>> and its context.
>>
>>> I think the mgr->owner & mgr->ops should be protected from concurrent
>>> access of delete_module & fpga_mgr_get/put(), so how about:
>>>
>>> struct fpga_manager_ops {
>>> 	struct module *owner;
>>> 	...
>>> };
>>>
>>> struct fpga_manager {
>>> 	...
>>> 	struct mutex mops_lock;
>>> 	const struct fpga_manager_ops *mops;
>>> 	...
>>> };
>>>
>>>
>>> static struct fpga_manager *__fpga_mgr_get(struct device *dev)
>>> {
>>> 	struct fpga_manager *mgr;
>>>
>>> 	mgr = to_fpga_manager(dev);
>>>
>>> 	mutex_lock(&mgr->mops_lock);
>>>
>>> 	if (!mgr->mops || !try_module_get(mgr->mops->owner))
>>> 		mgr = ERR_PTR(-ENODEV);
>>>
>>> 	mutex_unlock(&mgr->mops_lock);
>>> 		
>>> 	return mgr;
>>> }
>>>
>>> void fpga_mgr_unregister(struct fpga_manager *mgr)
>>> {
>>> 	fpga_mgr_fpga_remove(mgr);	
>>>
>>> 	mutex_lock(&mgr->ops_lock);
>>> 	mgr->mops = NULL;
>>> 	mutex_unlock(&mgr->ops_lock);
>>>
>>> 	device_unregister(&mgr->dev);	
>>> }
>>>
>>> Not actually tested.
>>>
>>
>> I think protecting the only the ops is not enough for the same reasons.
>> If fpga_mgr_get() gets suspended right after class_find_device(),and
>> meanwhile the low-level module is removed, it resumes with a reference
>> to a manager device that no longer exists.
>>
>> In a certain sense, however, using a mutex seems like a mitigation
>> that does not solve the problem at its root. I honestly still think
>> that taking the module's refcount right when registering the manager
>> is the only way that is both safe and robust against code changes.
> 
> I would nak either. As mentioned above, that makes rmmod vendor module
> impossible. Introducing another user interface to release the module's
> refcount is also a bad idea. Who decides to take the ref, who releases
> it. A user has no knowledge of what is happening inside and should not
> enforce.
> 
>> However, my proposal was rejected.
>>
>> So, if you prefer, I can drop the mutex entirely in the next version,
>> and we leave the responsibility of keeping all kernel pieces to the
> 
> No, please try to fix it. Could you please reconsider my proposal?
> 

I have considered it and thought about it. However, I don't think we need a
mutex to protect mgr->mops. This is because the low-level module's refcount has
already been decremented when fpga_mgr_unregister() is called by the module's
exit function. So, when we get to the point of executing fpga_mgr_unregister(),
any concurrent call to try_module_get() will fail (if no one has taken the
module before) without the need to check mops first.

If we assume (as you pointed out) that class_find_device() can be safely
executed concurrently with device_unregister() (returning either a valid dev
pointer or null) and, consequently, the manager context is guaranteed to exist
after that point. Then, we should be good without the mutex if we call
try_module_get() on a copy of the owner pointer stored in the manager context.

>> user. It will still be an improvement over taking the low-level
>> module's refcount through the parent device's driver pointer.
>>

Thanks,
Marco


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH v2 1/2] fpga: add a module owner field to fpga_manager and fpga_manager_ops
  2023-12-08 20:08         ` Marco Pagani
@ 2023-12-09 15:27           ` Xu Yilun
  2023-12-16  9:35             ` Marco Pagani
  0 siblings, 1 reply; 9+ messages in thread
From: Xu Yilun @ 2023-12-09 15:27 UTC (permalink / raw)
  To: Marco Pagani
  Cc: Moritz Fischer, Wu Hao, Xu Yilun, Tom Rix, Greg Kroah-Hartman,
	linux-kernel, linux-fpga

> >>> I feel the scope of the protection is unclear to me in this patch. What
> >>> data should be protected from concurrent access by this mutex? From the
> >>> code seems the racing of mgr dev should be protected but apparently it
> >>> doesn't have to.
> >>
> >> The mutex is there to ensure the lifetime of the manager device and
> >> its context (struct fpga_manager) if fpga_mgr_get() happens to run
> >> concurrently with the removal of the low-level module.
> >>
> >>>
> >>> And with this mutex, the get/put/unregister() for one mgr should be
> >>> exclusive with another mgr, but that also seems not necessary.
> >>>
> >>
> >> I decided to use a static mutex because I thought putting the mutex
> >> in the manager's context would be unsafe since its life would be tied
> >> to the manager's life. For instance, consider the following sequence:
> >>
> >> - A removes the low-level control module, and delete_module progresses
> >> up to the point when it calls the module's exit function, which in turn
> >> calls fpga_mgr_unregister().
> >>
> >> - fpga_mgr_unregister() takes the mutex but gets descheduled before
> >> completing the unregistering of the manager device.
> >>
> >> - Meanwhile, B wants to get the manager (it is still there) and calls
> >> fpga_mgr_get(), which tries to take the mutex but gets suspended since
> >> it is held by A.
> >>
> >> - A resumes and fpga_mgr_unregister() releases the manager device and
> > 
> > The lifecycle of the manager device is not entirely decided by
> > fpga_mgr_unregister(), this func just puts/decreases the device
> > refcount.
> 
> Right, but here I'm considering the case where no one has previously
> taken the manager device. In that specific case, the refcount will be

I don't think this is valid, anyone should firstly get the manager
device via get_device() then try to access its context.

> decremented to zero, and the device (with its context) will be released.

If no one has taken the manager device, the device & its context are
safe to be released.

> 
> However, you got me thinking about how the mutex is causing the problem
> in this case.
> 
> > 
> > Remember fpga_mgr_get() gets the device via
> > class_find_device()->get_device(). I assume if the valid device pointer
> > could be returned by class_find_device(), it would never be released by
> > other nice players. So I have no worry about the per manager mutex.
> > 
> >> its context, including the mutex on which B is suspended.
> >>
> >> We could mitigate this specific case using mutex_trylock(). However,
> >> there will still be other problematic cases, like if fpga_mgr_get()
> >> gets suspended right before taking the mutex and then delete_module
> >> proceeds up to when fpga_mgr_unregister() frees the manager device
> >> and its context.
> >>
> >>> I think the mgr->owner & mgr->ops should be protected from concurrent
> >>> access of delete_module & fpga_mgr_get/put(), so how about:
> >>>
> >>> struct fpga_manager_ops {
> >>> 	struct module *owner;
> >>> 	...
> >>> };
> >>>
> >>> struct fpga_manager {
> >>> 	...
> >>> 	struct mutex mops_lock;
> >>> 	const struct fpga_manager_ops *mops;
> >>> 	...
> >>> };
> >>>
> >>>
> >>> static struct fpga_manager *__fpga_mgr_get(struct device *dev)
> >>> {
> >>> 	struct fpga_manager *mgr;
> >>>
> >>> 	mgr = to_fpga_manager(dev);
> >>>
> >>> 	mutex_lock(&mgr->mops_lock);
> >>>
> >>> 	if (!mgr->mops || !try_module_get(mgr->mops->owner))
> >>> 		mgr = ERR_PTR(-ENODEV);
> >>>
> >>> 	mutex_unlock(&mgr->mops_lock);
> >>> 		
> >>> 	return mgr;
> >>> }
> >>>
> >>> void fpga_mgr_unregister(struct fpga_manager *mgr)
> >>> {
> >>> 	fpga_mgr_fpga_remove(mgr);	
> >>>
> >>> 	mutex_lock(&mgr->ops_lock);
> >>> 	mgr->mops = NULL;
> >>> 	mutex_unlock(&mgr->ops_lock);
> >>>
> >>> 	device_unregister(&mgr->dev);	
> >>> }
> >>>
> >>> Not actually tested.
> >>>
> >>
> >> I think protecting the only the ops is not enough for the same reasons.
> >> If fpga_mgr_get() gets suspended right after class_find_device(),and
> >> meanwhile the low-level module is removed, it resumes with a reference
> >> to a manager device that no longer exists.
> >>
> >> In a certain sense, however, using a mutex seems like a mitigation
> >> that does not solve the problem at its root. I honestly still think
> >> that taking the module's refcount right when registering the manager
> >> is the only way that is both safe and robust against code changes.
> > 
> > I would nak either. As mentioned above, that makes rmmod vendor module
> > impossible. Introducing another user interface to release the module's
> > refcount is also a bad idea. Who decides to take the ref, who releases
> > it. A user has no knowledge of what is happening inside and should not
> > enforce.
> > 
> >> However, my proposal was rejected.
> >>
> >> So, if you prefer, I can drop the mutex entirely in the next version,
> >> and we leave the responsibility of keeping all kernel pieces to the
> > 
> > No, please try to fix it. Could you please reconsider my proposal?
> > 
> 
> I have considered it and thought about it. However, I don't think we need a
> mutex to protect mgr->mops. This is because the low-level module's refcount has
> already been decremented when fpga_mgr_unregister() is called by the module's
> exit function. So, when we get to the point of executing fpga_mgr_unregister(),
> any concurrent call to try_module_get() will fail (if no one has taken the

Are you still taking care of your previous finding [1]? It says:

  To be clear, you should only use try_module_get() *iff* you are 100% sure
  the module already does exist ...

IIUC, if you do nothing on fpga_mgr_unregister(), the low-level module may
just disappear and any copy of the owner pointer became invalid. Then
try_module_get() would not fail but panic.

[1] 557aafac1153 ("kernel/module: add documentation for try_module_get()")

Thanks,
Yilun

> module before) without the need to check mops first.
> 
> If we assume (as you pointed out) that class_find_device() can be safely
> executed concurrently with device_unregister() (returning either a valid dev
> pointer or null) and, consequently, the manager context is guaranteed to exist
> after that point. Then, we should be good without the mutex if we call
> try_module_get() on a copy of the owner pointer stored in the manager context.
> 
> >> user. It will still be an improvement over taking the low-level
> >> module's refcount through the parent device's driver pointer.
> >>
> 
> Thanks,
> Marco
> 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH v2 1/2] fpga: add a module owner field to fpga_manager and fpga_manager_ops
  2023-12-09 15:27           ` Xu Yilun
@ 2023-12-16  9:35             ` Marco Pagani
  0 siblings, 0 replies; 9+ messages in thread
From: Marco Pagani @ 2023-12-16  9:35 UTC (permalink / raw)
  To: Xu Yilun
  Cc: Moritz Fischer, Wu Hao, Xu Yilun, Tom Rix, Greg Kroah-Hartman,
	linux-kernel, linux-fpga



On 2023-12-09 16:27, Xu Yilun wrote:
>>>>> I feel the scope of the protection is unclear to me in this patch. What
>>>>> data should be protected from concurrent access by this mutex? From the
>>>>> code seems the racing of mgr dev should be protected but apparently it
>>>>> doesn't have to.
>>>>
>>>> The mutex is there to ensure the lifetime of the manager device and
>>>> its context (struct fpga_manager) if fpga_mgr_get() happens to run
>>>> concurrently with the removal of the low-level module.
>>>>
>>>>>
>>>>> And with this mutex, the get/put/unregister() for one mgr should be
>>>>> exclusive with another mgr, but that also seems not necessary.
>>>>>
>>>>
>>>> I decided to use a static mutex because I thought putting the mutex
>>>> in the manager's context would be unsafe since its life would be tied
>>>> to the manager's life. For instance, consider the following sequence:
>>>>
>>>> - A removes the low-level control module, and delete_module progresses
>>>> up to the point when it calls the module's exit function, which in turn
>>>> calls fpga_mgr_unregister().
>>>>
>>>> - fpga_mgr_unregister() takes the mutex but gets descheduled before
>>>> completing the unregistering of the manager device.
>>>>
>>>> - Meanwhile, B wants to get the manager (it is still there) and calls
>>>> fpga_mgr_get(), which tries to take the mutex but gets suspended since
>>>> it is held by A.
>>>>
>>>> - A resumes and fpga_mgr_unregister() releases the manager device and
>>>
>>> The lifecycle of the manager device is not entirely decided by
>>> fpga_mgr_unregister(), this func just puts/decreases the device
>>> refcount.
>>
>> Right, but here I'm considering the case where no one has previously
>> taken the manager device. In that specific case, the refcount will be
> 
> I don't think this is valid, anyone should firstly get the manager
> device via get_device() then try to access its context.
> 
>> decremented to zero, and the device (with its context) will be released.
> 
> If no one has taken the manager device, the device & its context are
> safe to be released.
> 
>>
>> However, you got me thinking about how the mutex is causing the problem
>> in this case.
>>
>>>
>>> Remember fpga_mgr_get() gets the device via
>>> class_find_device()->get_device(). I assume if the valid device pointer
>>> could be returned by class_find_device(), it would never be released by
>>> other nice players. So I have no worry about the per manager mutex.
>>>
>>>> its context, including the mutex on which B is suspended.
>>>>
>>>> We could mitigate this specific case using mutex_trylock(). However,
>>>> there will still be other problematic cases, like if fpga_mgr_get()
>>>> gets suspended right before taking the mutex and then delete_module
>>>> proceeds up to when fpga_mgr_unregister() frees the manager device
>>>> and its context.
>>>>
>>>>> I think the mgr->owner & mgr->ops should be protected from concurrent
>>>>> access of delete_module & fpga_mgr_get/put(), so how about:
>>>>>
>>>>> struct fpga_manager_ops {
>>>>> 	struct module *owner;
>>>>> 	...
>>>>> };
>>>>>
>>>>> struct fpga_manager {
>>>>> 	...
>>>>> 	struct mutex mops_lock;
>>>>> 	const struct fpga_manager_ops *mops;
>>>>> 	...
>>>>> };
>>>>>
>>>>>
>>>>> static struct fpga_manager *__fpga_mgr_get(struct device *dev)
>>>>> {
>>>>> 	struct fpga_manager *mgr;
>>>>>
>>>>> 	mgr = to_fpga_manager(dev);
>>>>>
>>>>> 	mutex_lock(&mgr->mops_lock);
>>>>>
>>>>> 	if (!mgr->mops || !try_module_get(mgr->mops->owner))
>>>>> 		mgr = ERR_PTR(-ENODEV);
>>>>>
>>>>> 	mutex_unlock(&mgr->mops_lock);
>>>>> 		
>>>>> 	return mgr;
>>>>> }
>>>>>
>>>>> void fpga_mgr_unregister(struct fpga_manager *mgr)
>>>>> {
>>>>> 	fpga_mgr_fpga_remove(mgr);	
>>>>>
>>>>> 	mutex_lock(&mgr->ops_lock);
>>>>> 	mgr->mops = NULL;
>>>>> 	mutex_unlock(&mgr->ops_lock);
>>>>>
>>>>> 	device_unregister(&mgr->dev);	
>>>>> }
>>>>>
>>>>> Not actually tested.
>>>>>
>>>>
>>>> I think protecting the only the ops is not enough for the same reasons.
>>>> If fpga_mgr_get() gets suspended right after class_find_device(),and
>>>> meanwhile the low-level module is removed, it resumes with a reference
>>>> to a manager device that no longer exists.
>>>>
>>>> In a certain sense, however, using a mutex seems like a mitigation
>>>> that does not solve the problem at its root. I honestly still think
>>>> that taking the module's refcount right when registering the manager
>>>> is the only way that is both safe and robust against code changes.
>>>
>>> I would nak either. As mentioned above, that makes rmmod vendor module
>>> impossible. Introducing another user interface to release the module's
>>> refcount is also a bad idea. Who decides to take the ref, who releases
>>> it. A user has no knowledge of what is happening inside and should not
>>> enforce.
>>>
>>>> However, my proposal was rejected.
>>>>
>>>> So, if you prefer, I can drop the mutex entirely in the next version,
>>>> and we leave the responsibility of keeping all kernel pieces to the
>>>
>>> No, please try to fix it. Could you please reconsider my proposal?
>>>
>>
>> I have considered it and thought about it. However, I don't think we need a
>> mutex to protect mgr->mops. This is because the low-level module's refcount has
>> already been decremented when fpga_mgr_unregister() is called by the module's
>> exit function. So, when we get to the point of executing fpga_mgr_unregister(),
>> any concurrent call to try_module_get() will fail (if no one has taken the
> 
> Are you still taking care of your previous finding [1]? It says:
> 
>   To be clear, you should only use try_module_get() *iff* you are 100% sure
>   the module already does exist ...
> 
> IIUC, if you do nothing on fpga_mgr_unregister(), the low-level module may
> just disappear and any copy of the owner pointer became invalid. Then
> try_module_get() would not fail but panic.
> 
> [1] 557aafac1153 ("kernel/module: add documentation for try_module_get()")
> 

You are right. I'll follow your proposal for locking. I would have preferred a
solution that kept the module locked until it was safe to remove it, but I
cannot come up with anything reasonable at the moment.

> 
>> module before) without the need to check mops first.
>>
>> If we assume (as you pointed out) that class_find_device() can be safely
>> executed concurrently with device_unregister() (returning either a valid dev
>> pointer or null) and, consequently, the manager context is guaranteed to exist
>> after that point. Then, we should be good without the mutex if we call
>> try_module_get() on a copy of the owner pointer stored in the manager context.
>>
>>>> user. It will still be an improvement over taking the low-level
>>>> module's refcount through the parent device's driver pointer.
>>>>


Thanks,
Marco


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-12-16  9:36 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-24 16:28 [RFC PATCH v2 0/2] fpga: improve protection against low-level modules unloading Marco Pagani
2023-11-24 16:28 ` [RFC PATCH v2 1/2] fpga: add a module owner field to fpga_manager and fpga_manager_ops Marco Pagani
2023-11-25  9:11   ` Xu Yilun
2023-11-30 10:42     ` Marco Pagani
2023-12-02 12:16       ` Xu Yilun
2023-12-08 20:08         ` Marco Pagani
2023-12-09 15:27           ` Xu Yilun
2023-12-16  9:35             ` Marco Pagani
2023-11-24 16:28 ` [RFC PATCH v2 2/2] fpga: set owner of fpga_manager_ops for existing low-level modules Marco Pagani

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).