All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Robin Murphy <robin.murphy@arm.com>
Cc: acpica-devel@lists.linux.dev,
	"Alyssa Rosenzweig" <alyssa@rosenzweig.io>,
	"Albert Ou" <aou@eecs.berkeley.edu>,
	asahi@lists.linux.dev,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Dexuan Cui" <decui@microsoft.com>,
	devicetree@vger.kernel.org,
	"David Woodhouse" <dwmw2@infradead.org>,
	"Frank Rowand" <frowand.list@gmail.com>,
	"Hanjun Guo" <guohanjun@huawei.com>,
	"Haiyang Zhang" <haiyangz@microsoft.com>,
	iommu@lists.linux.dev,
	"Jean-Philippe Brucker" <jean-philippe@linaro.org>,
	"Jonathan Hunter" <jonathanh@nvidia.com>,
	"Joerg Roedel" <joro@8bytes.org>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	"Len Brown" <lenb@kernel.org>,
	linux-acpi@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-hyperv@vger.kernel.org, linux-mips@vger.kernel.org,
	linux-riscv@lists.infradead.org,
	linux-snps-arc@lists.infradead.org, linux-tegra@vger.kernel.org,
	"Russell King" <linux@armlinux.org.uk>,
	"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
	"Marek Szyprowski" <m.szyprowski@samsung.com>,
	"Hector Martin" <marcan@marcan.st>,
	"Palmer Dabbelt" <palmer@dabbelt.com>,
	patches@lists.linux.dev,
	"Paul Walmsley" <paul.walmsley@sifive.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	"Robert Moore" <robert.moore@intel.com>,
	"Rob Herring" <robh+dt@kernel.org>,
	"Sudeep Holla" <sudeep.holla@arm.com>,
	"Suravee Suthikulpanit" <suravee.suthikulpanit@amd.com>,
	"Sven Peter" <sven@svenpeter.dev>,
	"Thierry Reding" <thierry.reding@gmail.com>,
	"Thomas Bogendoerfer" <tsbogend@alpha.franken.de>,
	"Krishna Reddy" <vdumpa@nvidia.com>,
	"Vineet Gupta" <vgupta@kernel.org>,
	virtualization@lists.linux.dev, "Wei Liu" <wei.liu@kernel.org>,
	"Will Deacon" <will@kernel.org>,
	"André Draszik" <andre.draszik@linaro.org>,
	"Lu Baolu" <baolu.lu@linux.intel.com>,
	"Christoph Hellwig" <hch@lst.de>,
	"Jerry Snitselaar" <jsnitsel@redhat.com>,
	"Moritz Fischer" <mdf@kernel.org>,
	"Zhenhua Huang" <quic_zhenhuah@quicinc.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	"Rob Herring" <robh@kernel.org>
Subject: Re: [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec
Date: Tue, 21 Nov 2023 13:55:57 -0400	[thread overview]
Message-ID: <20231121175557.GH6083@nvidia.com> (raw)
In-Reply-To: <e926d2d8-8209-4c0f-a0cb-dcea4edf839e@arm.com>

On Tue, Nov 21, 2023 at 04:06:15PM +0000, Robin Murphy wrote:

> > Obviously. I rejected that right away because of how incredibly
> > wrongly layered and hacky it is to do something like that.
> 
> What, and dressing up the fundamental layering violation by baking it even
> further into the API flow, while still not actually fixing it or any of its
> *other* symptoms, is somehow better?

It puts the locks and the controlled data in the right place, in the
right layer.

> Ultimately, this series is still basically doing the same thing my patch
> does - extending the scope of the existing iommu_probe_device_lock hack to
> cover fwspec creation. A hack is a hack, so frankly I'd rather it be simple
> and obvious and look like one, and being easy to remove again is an obvious
> bonus too.

Not entirely, as I've said this series removes the use of the dev->iommu
during fwspec creation. This is different than just extending the lock.

> > So, I would move toward having the driver's probe invoke a helper like:
> > 
> >     iommu_of_xlate(dev, fwspec, &per_fwnode_function, &args);
> > 
> > Which generates the same list of (fwnode_handle, of_phandle_args) that
> > was passed to of_xlate today, but is ordered sensibly within the
> > sequence of probe for what many drivers seem to want to do.
> 
> Grep for of_xlate. It is a standard and well-understood callback pattern for
> a subsystem to parse a common DT binding and pass a driver-specific
> specifier to a driver to interpret. Or maybe you just have a peculiar
> definition of what you think "micro-optimisation" means? :/

Don't know about other subsystems, here is making a mess. Look at what
the drivers are doing. The two SMMU drivers are sort of sane, but
everything else has coded half their pobe into of_xlate. It doesn't
make alot of sense.

> > So, it is not so much that that the idea of of_xlate goes away, but
> > the specific op->of_xlate does, it gets shifted into a helper that
> > invokes the same function in a more logical spot.
> 
> I'm curious how you imagine an IOMMU driver's ->probe function could be
> called *before* parsing the firmware to work out what, if any, IOMMU, and
> thus driver, a device is associated with.

You've jumped ahead here, I'm talking about removing ops->of_xlate.

With everything kept the same as today this means we'd scan the FW
description twice. Once to find the ops and once inside the driver to
parse it.

When I say micro-optimization this is what I mean - structuring the
code to try to avoid multiple-scans of the FW table.

Once the drivers are self-parsing I see there are two paths to keep it
at one FW parse:

 1) Have the core code parse and cache common cases in the iommu_fwspec.
    Driver then pulls out the common cases from the iommu_fwspec and
    reparsed in uncommon cases.

 2) Accept we have only a single ops in all real systems and just
    invoke the driver to parse it. That parse will cache enough
    information to decide if EPROBE_DEFER is needed.

In either case the drivers would call the same APIs and have the same
logic. The choice is just infrastructure-side stuff.

It is a different view than trying to do everything up front, but I'm
not seeing that it is so differently efficient as to be a performance
problem.

On the plus side it looks to make the drivers alot simpler and more
logical.

> And then every driver has to have
> identical boilerplate to go off and parse the generic "iommus" binding...
> which is the whole damn reason for *not* going down that route and instead
> using an of_xlate mechanism in the first place.

Let's not guess. I've attached below a sketch conversion for
apple-dart.

Diffstat says it *saves* 1 line. But also we fix several bugs!

 - iommu_of_xlate() will check for NULL fwspec and reject the bus
   probe path

 - iommu_of_xlate() can match the iommu's from the iommu list and
   check that the OF actually points only to iommus with this driver's
   ops (missed sanity check)

 - The of parsing machinery already computes the iommu_driver but throws
   it out. This forces all of the drivers to do their own thing. Pass
   it in and thus apple_dart_of_xlate() doesn't need to mess around
   with of_find_device_by_node() and we fix the bug where it leaks the
   reference on the struct device (woops!)

 - We don't leak the cfg allocation that apple_dart_of_xlate() did on
   various error paths. All error paths logically free in probe. We
   don't have to think about what happens if of_xlate is called twice
   for the same args on error/reprobe paths.

Many drivers follow this pattern of apple-dart and would end up like this.

Drivers that just need the u32 array would be simpler, more like:

   struct iommu_driver *instance;
   unsigned int num_ids;

   instance = iommu_of_get_iommu(..., &num_ids);
   if (IS_ERR(instance))
      return ERR_CAST(instance);
   cfg = kzalloc(struct_size(cfg, sids, num_ids), GFP_KERNEL);
   if (!cfg)
      return -ENOMEM;
   iommu_of_read_u32_ids(..., &cfg->sids);
   [..]
   return instance;

I haven't sketched *every* driver yet, but I've sketched enough to be
confident.

Robin, I know it is not what you have in your head, but you should
stop with the insults and be more open to other perspectives.

> > The per-device data can be allocated at the top of probe and passed
> > through args to fix the lifetime bugs.
> > 
> > It is pretty simple to do.
> 
> I believe the kids these days would say "Say you don't understand the code
> without saying you don't understand the code."

I think you are too fixated on what you have in your mind. It will
take me a bit but I will come with a series to move all the FW parsing
into probe along this model. Once done it is trivial to make bus probe
work like it should.

Regarding this series, I don't really see a problem. There is no
"concrete" or anything like that.

> > Not quite. This decouples two unrelated things into seperate
> > concerns. It is not so much about the concurrency but removing the
> > abuse of dev->iommu by code that has no need to touch it at all.
> 
> Sorry, the "abuse" of storing IOMMU-API-specific data in the place we
> intentionally created to consolidate all the IOMMU-API-specific data
> into?

The global state should not be filled until the driver is probed. It
is an abuse to use a global instead of an on-stack variable when
building it. Publishing only fully initialized data to global
visibility is concurrency 101. :(

Jason

diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index ee05f4824bfad1..476938722460b8 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -721,19 +721,29 @@ static struct iommu_domain apple_dart_blocked_domain = {
 
 static struct iommu_device *apple_dart_probe_device(struct device *dev)
 {
-	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
+	struct apple_dart_master_cfg *cfg;
 	struct apple_dart_stream_map *stream_map;
+	int ret;
 	int i;
 
+	cfg = kzalloc(sizeof(*cfg), GFP_KERNEL);
 	if (!cfg)
-		return ERR_PTR(-ENODEV);
+		return ERR_PTR(-ENOMEM);
+	ret = iommu_of_xlate(dev_iommu_fwspec_get(dev), &apple_dart_iommu_ops,
+			     1, &apple_dart_of_xlate, cfg);
+	if (ret)
+		goto err_free;
 
 	for_each_stream_map(i, cfg, stream_map)
 		device_link_add(
 			dev, stream_map->dart->dev,
 			DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER);
 
+	dev_iommu_priv_set(dev, cfg);
 	return &cfg->stream_maps[0].dart->iommu;
+err_free:
+	kfree(cfg);
+	return ret;
 }
 
 static void apple_dart_release_device(struct device *dev)
@@ -777,25 +787,15 @@ static void apple_dart_domain_free(struct iommu_domain *domain)
 	kfree(dart_domain);
 }
 
-static int apple_dart_of_xlate(struct device *dev, struct of_phandle_args *args)
+static int apple_dart_of_xlate(struct iommu_device *iommu,
+			       struct of_phandle_args *args, void *priv)
 {
-	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
-	struct platform_device *iommu_pdev = of_find_device_by_node(args->np);
-	struct apple_dart *dart = platform_get_drvdata(iommu_pdev);
-	struct apple_dart *cfg_dart;
-	int i, sid;
+	struct apple_dart *dart = container_of(iommu, struct apple_dart, iommu);
+	struct apple_dart_master_cfg *cfg = priv;
+	struct apple_dart *cfg_dart = cfg->stream_maps[0].dart;
+	int sid = args->args[0];
+	int i;
 
-	if (args->args_count != 1)
-		return -EINVAL;
-	sid = args->args[0];
-
-	if (!cfg)
-		cfg = kzalloc(sizeof(*cfg), GFP_KERNEL);
-	if (!cfg)
-		return -ENOMEM;
-	dev_iommu_priv_set(dev, cfg);
-
-	cfg_dart = cfg->stream_maps[0].dart;
 	if (cfg_dart) {
 		if (cfg_dart->supports_bypass != dart->supports_bypass)
 			return -EINVAL;
@@ -980,7 +980,6 @@ static const struct iommu_ops apple_dart_iommu_ops = {
 	.probe_device = apple_dart_probe_device,
 	.release_device = apple_dart_release_device,
 	.device_group = apple_dart_device_group,
-	.of_xlate = apple_dart_of_xlate,
 	.def_domain_type = apple_dart_def_domain_type,
 	.get_resv_regions = apple_dart_get_resv_regions,
 	.pgsize_bitmap = -1UL, /* Restricted during dart probe */

WARNING: multiple messages have this Message-ID (diff)
From: Jason Gunthorpe <jgg@nvidia.com>
To: Robin Murphy <robin.murphy@arm.com>
Cc: acpica-devel@lists.linux.dev,
	"Alyssa Rosenzweig" <alyssa@rosenzweig.io>,
	"Albert Ou" <aou@eecs.berkeley.edu>,
	asahi@lists.linux.dev,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Dexuan Cui" <decui@microsoft.com>,
	devicetree@vger.kernel.org,
	"David Woodhouse" <dwmw2@infradead.org>,
	"Frank Rowand" <frowand.list@gmail.com>,
	"Hanjun Guo" <guohanjun@huawei.com>,
	"Haiyang Zhang" <haiyangz@microsoft.com>,
	iommu@lists.linux.dev,
	"Jean-Philippe Brucker" <jean-philippe@linaro.org>,
	"Jonathan Hunter" <jonathanh@nvidia.com>,
	"Joerg Roedel" <joro@8bytes.org>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	"Len Brown" <lenb@kernel.org>,
	linux-acpi@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-hyperv@vger.kernel.org, linux-mips@vger.kernel.org,
	linux-riscv@lists.infradead.org,
	linux-snps-arc@lists.infradead.org, linux-tegra@vger.kernel.org,
	"Russell King" <linux@armlinux.org.uk>,
	"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
	"Marek Szyprowski" <m.szyprowski@samsung.com>,
	"Hector Martin" <marcan@marcan.st>,
	"Palmer Dabbelt" <palmer@dabbelt.com>,
	patches@lists.linux.dev,
	"Paul Walmsley" <paul.walmsley@sifive.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	"Robert Moore" <robert.moore@intel.com>,
	"Rob Herring" <robh+dt@kernel.org>,
	"Sudeep Holla" <sudeep.holla@arm.com>,
	"Suravee Suthikulpanit" <suravee.suthikulpanit@amd.com>,
	"Sven Peter" <sven@svenpeter.dev>,
	"Thierry Reding" <thierry.reding@gmail.com>,
	"Thomas Bogendoerfer" <tsbogend@alpha.franken.de>,
	"Krishna Reddy" <vdumpa@nvidia.com>,
	"Vineet Gupta" <vgupta@kernel.org>,
	virtualization@lists.linux.dev, "Wei Liu" <wei.liu@kernel.org>,
	"Will Deacon" <will@kernel.org>,
	"André Draszik" <andre.draszik@linaro.org>,
	"Lu Baolu" <baolu.lu@linux.intel.com>,
	"Christoph Hellwig" <hch@lst.de>,
	"Jerry Snitselaar" <jsnitsel@redhat.com>,
	"Moritz Fischer" <mdf@kernel.org>,
	"Zhenhua Huang" <quic_zhenhuah@quicinc.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	"Rob Herring" <robh@kernel.org>
Subject: Re: [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec
Date: Tue, 21 Nov 2023 13:55:57 -0400	[thread overview]
Message-ID: <20231121175557.GH6083@nvidia.com> (raw)
In-Reply-To: <e926d2d8-8209-4c0f-a0cb-dcea4edf839e@arm.com>

On Tue, Nov 21, 2023 at 04:06:15PM +0000, Robin Murphy wrote:

> > Obviously. I rejected that right away because of how incredibly
> > wrongly layered and hacky it is to do something like that.
> 
> What, and dressing up the fundamental layering violation by baking it even
> further into the API flow, while still not actually fixing it or any of its
> *other* symptoms, is somehow better?

It puts the locks and the controlled data in the right place, in the
right layer.

> Ultimately, this series is still basically doing the same thing my patch
> does - extending the scope of the existing iommu_probe_device_lock hack to
> cover fwspec creation. A hack is a hack, so frankly I'd rather it be simple
> and obvious and look like one, and being easy to remove again is an obvious
> bonus too.

Not entirely, as I've said this series removes the use of the dev->iommu
during fwspec creation. This is different than just extending the lock.

> > So, I would move toward having the driver's probe invoke a helper like:
> > 
> >     iommu_of_xlate(dev, fwspec, &per_fwnode_function, &args);
> > 
> > Which generates the same list of (fwnode_handle, of_phandle_args) that
> > was passed to of_xlate today, but is ordered sensibly within the
> > sequence of probe for what many drivers seem to want to do.
> 
> Grep for of_xlate. It is a standard and well-understood callback pattern for
> a subsystem to parse a common DT binding and pass a driver-specific
> specifier to a driver to interpret. Or maybe you just have a peculiar
> definition of what you think "micro-optimisation" means? :/

Don't know about other subsystems, here is making a mess. Look at what
the drivers are doing. The two SMMU drivers are sort of sane, but
everything else has coded half their pobe into of_xlate. It doesn't
make alot of sense.

> > So, it is not so much that that the idea of of_xlate goes away, but
> > the specific op->of_xlate does, it gets shifted into a helper that
> > invokes the same function in a more logical spot.
> 
> I'm curious how you imagine an IOMMU driver's ->probe function could be
> called *before* parsing the firmware to work out what, if any, IOMMU, and
> thus driver, a device is associated with.

You've jumped ahead here, I'm talking about removing ops->of_xlate.

With everything kept the same as today this means we'd scan the FW
description twice. Once to find the ops and once inside the driver to
parse it.

When I say micro-optimization this is what I mean - structuring the
code to try to avoid multiple-scans of the FW table.

Once the drivers are self-parsing I see there are two paths to keep it
at one FW parse:

 1) Have the core code parse and cache common cases in the iommu_fwspec.
    Driver then pulls out the common cases from the iommu_fwspec and
    reparsed in uncommon cases.

 2) Accept we have only a single ops in all real systems and just
    invoke the driver to parse it. That parse will cache enough
    information to decide if EPROBE_DEFER is needed.

In either case the drivers would call the same APIs and have the same
logic. The choice is just infrastructure-side stuff.

It is a different view than trying to do everything up front, but I'm
not seeing that it is so differently efficient as to be a performance
problem.

On the plus side it looks to make the drivers alot simpler and more
logical.

> And then every driver has to have
> identical boilerplate to go off and parse the generic "iommus" binding...
> which is the whole damn reason for *not* going down that route and instead
> using an of_xlate mechanism in the first place.

Let's not guess. I've attached below a sketch conversion for
apple-dart.

Diffstat says it *saves* 1 line. But also we fix several bugs!

 - iommu_of_xlate() will check for NULL fwspec and reject the bus
   probe path

 - iommu_of_xlate() can match the iommu's from the iommu list and
   check that the OF actually points only to iommus with this driver's
   ops (missed sanity check)

 - The of parsing machinery already computes the iommu_driver but throws
   it out. This forces all of the drivers to do their own thing. Pass
   it in and thus apple_dart_of_xlate() doesn't need to mess around
   with of_find_device_by_node() and we fix the bug where it leaks the
   reference on the struct device (woops!)

 - We don't leak the cfg allocation that apple_dart_of_xlate() did on
   various error paths. All error paths logically free in probe. We
   don't have to think about what happens if of_xlate is called twice
   for the same args on error/reprobe paths.

Many drivers follow this pattern of apple-dart and would end up like this.

Drivers that just need the u32 array would be simpler, more like:

   struct iommu_driver *instance;
   unsigned int num_ids;

   instance = iommu_of_get_iommu(..., &num_ids);
   if (IS_ERR(instance))
      return ERR_CAST(instance);
   cfg = kzalloc(struct_size(cfg, sids, num_ids), GFP_KERNEL);
   if (!cfg)
      return -ENOMEM;
   iommu_of_read_u32_ids(..., &cfg->sids);
   [..]
   return instance;

I haven't sketched *every* driver yet, but I've sketched enough to be
confident.

Robin, I know it is not what you have in your head, but you should
stop with the insults and be more open to other perspectives.

> > The per-device data can be allocated at the top of probe and passed
> > through args to fix the lifetime bugs.
> > 
> > It is pretty simple to do.
> 
> I believe the kids these days would say "Say you don't understand the code
> without saying you don't understand the code."

I think you are too fixated on what you have in your mind. It will
take me a bit but I will come with a series to move all the FW parsing
into probe along this model. Once done it is trivial to make bus probe
work like it should.

Regarding this series, I don't really see a problem. There is no
"concrete" or anything like that.

> > Not quite. This decouples two unrelated things into seperate
> > concerns. It is not so much about the concurrency but removing the
> > abuse of dev->iommu by code that has no need to touch it at all.
> 
> Sorry, the "abuse" of storing IOMMU-API-specific data in the place we
> intentionally created to consolidate all the IOMMU-API-specific data
> into?

The global state should not be filled until the driver is probed. It
is an abuse to use a global instead of an on-stack variable when
building it. Publishing only fully initialized data to global
visibility is concurrency 101. :(

Jason

diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index ee05f4824bfad1..476938722460b8 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -721,19 +721,29 @@ static struct iommu_domain apple_dart_blocked_domain = {
 
 static struct iommu_device *apple_dart_probe_device(struct device *dev)
 {
-	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
+	struct apple_dart_master_cfg *cfg;
 	struct apple_dart_stream_map *stream_map;
+	int ret;
 	int i;
 
+	cfg = kzalloc(sizeof(*cfg), GFP_KERNEL);
 	if (!cfg)
-		return ERR_PTR(-ENODEV);
+		return ERR_PTR(-ENOMEM);
+	ret = iommu_of_xlate(dev_iommu_fwspec_get(dev), &apple_dart_iommu_ops,
+			     1, &apple_dart_of_xlate, cfg);
+	if (ret)
+		goto err_free;
 
 	for_each_stream_map(i, cfg, stream_map)
 		device_link_add(
 			dev, stream_map->dart->dev,
 			DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER);
 
+	dev_iommu_priv_set(dev, cfg);
 	return &cfg->stream_maps[0].dart->iommu;
+err_free:
+	kfree(cfg);
+	return ret;
 }
 
 static void apple_dart_release_device(struct device *dev)
@@ -777,25 +787,15 @@ static void apple_dart_domain_free(struct iommu_domain *domain)
 	kfree(dart_domain);
 }
 
-static int apple_dart_of_xlate(struct device *dev, struct of_phandle_args *args)
+static int apple_dart_of_xlate(struct iommu_device *iommu,
+			       struct of_phandle_args *args, void *priv)
 {
-	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
-	struct platform_device *iommu_pdev = of_find_device_by_node(args->np);
-	struct apple_dart *dart = platform_get_drvdata(iommu_pdev);
-	struct apple_dart *cfg_dart;
-	int i, sid;
+	struct apple_dart *dart = container_of(iommu, struct apple_dart, iommu);
+	struct apple_dart_master_cfg *cfg = priv;
+	struct apple_dart *cfg_dart = cfg->stream_maps[0].dart;
+	int sid = args->args[0];
+	int i;
 
-	if (args->args_count != 1)
-		return -EINVAL;
-	sid = args->args[0];
-
-	if (!cfg)
-		cfg = kzalloc(sizeof(*cfg), GFP_KERNEL);
-	if (!cfg)
-		return -ENOMEM;
-	dev_iommu_priv_set(dev, cfg);
-
-	cfg_dart = cfg->stream_maps[0].dart;
 	if (cfg_dart) {
 		if (cfg_dart->supports_bypass != dart->supports_bypass)
 			return -EINVAL;
@@ -980,7 +980,6 @@ static const struct iommu_ops apple_dart_iommu_ops = {
 	.probe_device = apple_dart_probe_device,
 	.release_device = apple_dart_release_device,
 	.device_group = apple_dart_device_group,
-	.of_xlate = apple_dart_of_xlate,
 	.def_domain_type = apple_dart_def_domain_type,
 	.get_resv_regions = apple_dart_get_resv_regions,
 	.pgsize_bitmap = -1UL, /* Restricted during dart probe */

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

WARNING: multiple messages have this Message-ID (diff)
From: Jason Gunthorpe <jgg@nvidia.com>
To: Robin Murphy <robin.murphy@arm.com>
Cc: acpica-devel@lists.linux.dev,
	"Alyssa Rosenzweig" <alyssa@rosenzweig.io>,
	"Albert Ou" <aou@eecs.berkeley.edu>,
	asahi@lists.linux.dev,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Dexuan Cui" <decui@microsoft.com>,
	devicetree@vger.kernel.org,
	"David Woodhouse" <dwmw2@infradead.org>,
	"Frank Rowand" <frowand.list@gmail.com>,
	"Hanjun Guo" <guohanjun@huawei.com>,
	"Haiyang Zhang" <haiyangz@microsoft.com>,
	iommu@lists.linux.dev,
	"Jean-Philippe Brucker" <jean-philippe@linaro.org>,
	"Jonathan Hunter" <jonathanh@nvidia.com>,
	"Joerg Roedel" <joro@8bytes.org>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	"Len Brown" <lenb@kernel.org>,
	linux-acpi@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-hyperv@vger.kernel.org, linux-mips@vger.kernel.org,
	linux-riscv@lists.infradead.org,
	linux-snps-arc@lists.infradead.org, linux-tegra@vger.kernel.org,
	"Russell King" <linux@armlinux.org.uk>,
	"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
	"Marek Szyprowski" <m.szyprowski@samsung.com>,
	"Hector Martin" <marcan@marcan.st>,
	"Palmer Dabbelt" <palmer@dabbelt.com>,
	patches@lists.linux.dev,
	"Paul Walmsley" <paul.walmsley@sifive.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	"Robert Moore" <robert.moore@intel.com>,
	"Rob Herring" <robh+dt@kernel.org>,
	"Sudeep Holla" <sudeep.holla@arm.com>,
	"Suravee Suthikulpanit" <suravee.suthikulpanit@amd.com>,
	"Sven Peter" <sven@svenpeter.dev>,
	"Thierry Reding" <thierry.reding@gmail.com>,
	"Thomas Bogendoerfer" <tsbogend@alpha.franken.de>,
	"Krishna Reddy" <vdumpa@nvidia.com>,
	"Vineet Gupta" <vgupta@kernel.org>,
	virtualization@lists.linux.dev, "Wei Liu" <wei.liu@kernel.org>,
	"Will Deacon" <will@kernel.org>,
	"André Draszik" <andre.draszik@linaro.org>,
	"Lu Baolu" <baolu.lu@linux.intel.com>,
	"Christoph Hellwig" <hch@lst.de>,
	"Jerry Snitselaar" <jsnitsel@redhat.com>,
	"Moritz Fischer" <mdf@kernel.org>,
	"Zhenhua Huang" <quic_zhenhuah@quicinc.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	"Rob Herring" <robh@kernel.org>
Subject: Re: [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec
Date: Tue, 21 Nov 2023 13:55:57 -0400	[thread overview]
Message-ID: <20231121175557.GH6083@nvidia.com> (raw)
In-Reply-To: <e926d2d8-8209-4c0f-a0cb-dcea4edf839e@arm.com>

On Tue, Nov 21, 2023 at 04:06:15PM +0000, Robin Murphy wrote:

> > Obviously. I rejected that right away because of how incredibly
> > wrongly layered and hacky it is to do something like that.
> 
> What, and dressing up the fundamental layering violation by baking it even
> further into the API flow, while still not actually fixing it or any of its
> *other* symptoms, is somehow better?

It puts the locks and the controlled data in the right place, in the
right layer.

> Ultimately, this series is still basically doing the same thing my patch
> does - extending the scope of the existing iommu_probe_device_lock hack to
> cover fwspec creation. A hack is a hack, so frankly I'd rather it be simple
> and obvious and look like one, and being easy to remove again is an obvious
> bonus too.

Not entirely, as I've said this series removes the use of the dev->iommu
during fwspec creation. This is different than just extending the lock.

> > So, I would move toward having the driver's probe invoke a helper like:
> > 
> >     iommu_of_xlate(dev, fwspec, &per_fwnode_function, &args);
> > 
> > Which generates the same list of (fwnode_handle, of_phandle_args) that
> > was passed to of_xlate today, but is ordered sensibly within the
> > sequence of probe for what many drivers seem to want to do.
> 
> Grep for of_xlate. It is a standard and well-understood callback pattern for
> a subsystem to parse a common DT binding and pass a driver-specific
> specifier to a driver to interpret. Or maybe you just have a peculiar
> definition of what you think "micro-optimisation" means? :/

Don't know about other subsystems, here is making a mess. Look at what
the drivers are doing. The two SMMU drivers are sort of sane, but
everything else has coded half their pobe into of_xlate. It doesn't
make alot of sense.

> > So, it is not so much that that the idea of of_xlate goes away, but
> > the specific op->of_xlate does, it gets shifted into a helper that
> > invokes the same function in a more logical spot.
> 
> I'm curious how you imagine an IOMMU driver's ->probe function could be
> called *before* parsing the firmware to work out what, if any, IOMMU, and
> thus driver, a device is associated with.

You've jumped ahead here, I'm talking about removing ops->of_xlate.

With everything kept the same as today this means we'd scan the FW
description twice. Once to find the ops and once inside the driver to
parse it.

When I say micro-optimization this is what I mean - structuring the
code to try to avoid multiple-scans of the FW table.

Once the drivers are self-parsing I see there are two paths to keep it
at one FW parse:

 1) Have the core code parse and cache common cases in the iommu_fwspec.
    Driver then pulls out the common cases from the iommu_fwspec and
    reparsed in uncommon cases.

 2) Accept we have only a single ops in all real systems and just
    invoke the driver to parse it. That parse will cache enough
    information to decide if EPROBE_DEFER is needed.

In either case the drivers would call the same APIs and have the same
logic. The choice is just infrastructure-side stuff.

It is a different view than trying to do everything up front, but I'm
not seeing that it is so differently efficient as to be a performance
problem.

On the plus side it looks to make the drivers alot simpler and more
logical.

> And then every driver has to have
> identical boilerplate to go off and parse the generic "iommus" binding...
> which is the whole damn reason for *not* going down that route and instead
> using an of_xlate mechanism in the first place.

Let's not guess. I've attached below a sketch conversion for
apple-dart.

Diffstat says it *saves* 1 line. But also we fix several bugs!

 - iommu_of_xlate() will check for NULL fwspec and reject the bus
   probe path

 - iommu_of_xlate() can match the iommu's from the iommu list and
   check that the OF actually points only to iommus with this driver's
   ops (missed sanity check)

 - The of parsing machinery already computes the iommu_driver but throws
   it out. This forces all of the drivers to do their own thing. Pass
   it in and thus apple_dart_of_xlate() doesn't need to mess around
   with of_find_device_by_node() and we fix the bug where it leaks the
   reference on the struct device (woops!)

 - We don't leak the cfg allocation that apple_dart_of_xlate() did on
   various error paths. All error paths logically free in probe. We
   don't have to think about what happens if of_xlate is called twice
   for the same args on error/reprobe paths.

Many drivers follow this pattern of apple-dart and would end up like this.

Drivers that just need the u32 array would be simpler, more like:

   struct iommu_driver *instance;
   unsigned int num_ids;

   instance = iommu_of_get_iommu(..., &num_ids);
   if (IS_ERR(instance))
      return ERR_CAST(instance);
   cfg = kzalloc(struct_size(cfg, sids, num_ids), GFP_KERNEL);
   if (!cfg)
      return -ENOMEM;
   iommu_of_read_u32_ids(..., &cfg->sids);
   [..]
   return instance;

I haven't sketched *every* driver yet, but I've sketched enough to be
confident.

Robin, I know it is not what you have in your head, but you should
stop with the insults and be more open to other perspectives.

> > The per-device data can be allocated at the top of probe and passed
> > through args to fix the lifetime bugs.
> > 
> > It is pretty simple to do.
> 
> I believe the kids these days would say "Say you don't understand the code
> without saying you don't understand the code."

I think you are too fixated on what you have in your mind. It will
take me a bit but I will come with a series to move all the FW parsing
into probe along this model. Once done it is trivial to make bus probe
work like it should.

Regarding this series, I don't really see a problem. There is no
"concrete" or anything like that.

> > Not quite. This decouples two unrelated things into seperate
> > concerns. It is not so much about the concurrency but removing the
> > abuse of dev->iommu by code that has no need to touch it at all.
> 
> Sorry, the "abuse" of storing IOMMU-API-specific data in the place we
> intentionally created to consolidate all the IOMMU-API-specific data
> into?

The global state should not be filled until the driver is probed. It
is an abuse to use a global instead of an on-stack variable when
building it. Publishing only fully initialized data to global
visibility is concurrency 101. :(

Jason

diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index ee05f4824bfad1..476938722460b8 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -721,19 +721,29 @@ static struct iommu_domain apple_dart_blocked_domain = {
 
 static struct iommu_device *apple_dart_probe_device(struct device *dev)
 {
-	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
+	struct apple_dart_master_cfg *cfg;
 	struct apple_dart_stream_map *stream_map;
+	int ret;
 	int i;
 
+	cfg = kzalloc(sizeof(*cfg), GFP_KERNEL);
 	if (!cfg)
-		return ERR_PTR(-ENODEV);
+		return ERR_PTR(-ENOMEM);
+	ret = iommu_of_xlate(dev_iommu_fwspec_get(dev), &apple_dart_iommu_ops,
+			     1, &apple_dart_of_xlate, cfg);
+	if (ret)
+		goto err_free;
 
 	for_each_stream_map(i, cfg, stream_map)
 		device_link_add(
 			dev, stream_map->dart->dev,
 			DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER);
 
+	dev_iommu_priv_set(dev, cfg);
 	return &cfg->stream_maps[0].dart->iommu;
+err_free:
+	kfree(cfg);
+	return ret;
 }
 
 static void apple_dart_release_device(struct device *dev)
@@ -777,25 +787,15 @@ static void apple_dart_domain_free(struct iommu_domain *domain)
 	kfree(dart_domain);
 }
 
-static int apple_dart_of_xlate(struct device *dev, struct of_phandle_args *args)
+static int apple_dart_of_xlate(struct iommu_device *iommu,
+			       struct of_phandle_args *args, void *priv)
 {
-	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
-	struct platform_device *iommu_pdev = of_find_device_by_node(args->np);
-	struct apple_dart *dart = platform_get_drvdata(iommu_pdev);
-	struct apple_dart *cfg_dart;
-	int i, sid;
+	struct apple_dart *dart = container_of(iommu, struct apple_dart, iommu);
+	struct apple_dart_master_cfg *cfg = priv;
+	struct apple_dart *cfg_dart = cfg->stream_maps[0].dart;
+	int sid = args->args[0];
+	int i;
 
-	if (args->args_count != 1)
-		return -EINVAL;
-	sid = args->args[0];
-
-	if (!cfg)
-		cfg = kzalloc(sizeof(*cfg), GFP_KERNEL);
-	if (!cfg)
-		return -ENOMEM;
-	dev_iommu_priv_set(dev, cfg);
-
-	cfg_dart = cfg->stream_maps[0].dart;
 	if (cfg_dart) {
 		if (cfg_dart->supports_bypass != dart->supports_bypass)
 			return -EINVAL;
@@ -980,7 +980,6 @@ static const struct iommu_ops apple_dart_iommu_ops = {
 	.probe_device = apple_dart_probe_device,
 	.release_device = apple_dart_release_device,
 	.device_group = apple_dart_device_group,
-	.of_xlate = apple_dart_of_xlate,
 	.def_domain_type = apple_dart_def_domain_type,
 	.get_resv_regions = apple_dart_get_resv_regions,
 	.pgsize_bitmap = -1UL, /* Restricted during dart probe */

_______________________________________________
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc

  reply	other threads:[~2023-11-21 17:56 UTC|newest]

Thread overview: 110+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-15 14:05 [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jason Gunthorpe
2023-11-15 14:05 ` Jason Gunthorpe
2023-11-15 14:05 ` Jason Gunthorpe
2023-11-15 14:05 ` [PATCH v2 01/17] iommu: Remove struct iommu_ops *iommu from arch_setup_dma_ops() Jason Gunthorpe
2023-11-15 14:05   ` Jason Gunthorpe
2023-11-15 14:05   ` Jason Gunthorpe
2023-11-15 14:05 ` [PATCH v2 02/17] iommmu/of: Do not return struct iommu_ops from of_iommu_configure() Jason Gunthorpe
2023-11-15 14:05   ` Jason Gunthorpe
2023-11-15 14:05   ` Jason Gunthorpe
2023-11-15 14:05 ` [PATCH v2 03/17] iommu/of: Use -ENODEV consistently in of_iommu_configure() Jason Gunthorpe
2023-11-15 14:05   ` Jason Gunthorpe
2023-11-15 14:05   ` Jason Gunthorpe
2023-11-15 14:42   ` Jerry Snitselaar
2023-11-15 14:42     ` Jerry Snitselaar
2023-11-15 14:42     ` Jerry Snitselaar
2023-11-15 14:05 ` [PATCH v2 04/17] acpi: Do not return struct iommu_ops from acpi_iommu_configure_id() Jason Gunthorpe
2023-11-15 14:05   ` Jason Gunthorpe
2023-11-15 14:05   ` Jason Gunthorpe
2023-11-15 14:45   ` Jerry Snitselaar
2023-11-15 14:45     ` Jerry Snitselaar
2023-11-15 14:45     ` Jerry Snitselaar
2023-11-15 14:05 ` [PATCH v2 05/17] iommu: Make iommu_fwspec->ids a distinct allocation Jason Gunthorpe
2023-11-15 14:05   ` Jason Gunthorpe
2023-11-15 14:05   ` Jason Gunthorpe
2023-11-15 14:05 ` [PATCH v2 06/17] iommu: Add iommu_fwspec_alloc/dealloc() Jason Gunthorpe
2023-11-15 14:05   ` Jason Gunthorpe
2023-11-15 14:05   ` Jason Gunthorpe
2023-11-19  8:10   ` Hector Martin
2023-11-19  8:10     ` Hector Martin
2023-11-19  8:10     ` Hector Martin
2023-11-19  9:19     ` Hector Martin
2023-11-19  9:19       ` Hector Martin
2023-11-19  9:19       ` Hector Martin
2023-11-19  9:19       ` Hector Martin
2023-11-19 14:13       ` Jason Gunthorpe
2023-11-19 14:13         ` Jason Gunthorpe
2023-11-19 14:13         ` Jason Gunthorpe
2023-11-19 14:13         ` Jason Gunthorpe
2023-11-21  6:47         ` Hector Martin
2023-11-21  6:47           ` Hector Martin
2023-11-21  6:47           ` Hector Martin
2023-11-21  6:47           ` Hector Martin
2023-11-21 16:00           ` Jason Gunthorpe
2023-11-21 16:00             ` Jason Gunthorpe
2023-11-21 16:00             ` Jason Gunthorpe
2023-11-21 16:00             ` Jason Gunthorpe
2023-11-23  9:08             ` Hector Martin
2023-11-23  9:08               ` Hector Martin
2023-11-23  9:08               ` Hector Martin
2023-11-23  9:08               ` Hector Martin
2023-11-15 14:05 ` [PATCH v2 07/17] iommu: Add iommu_probe_device_fwspec() Jason Gunthorpe
2023-11-15 14:05   ` Jason Gunthorpe
2023-11-15 14:05   ` Jason Gunthorpe
2023-11-15 14:05 ` [PATCH v2 08/17] iommu/of: Do not use dev->iommu within of_iommu_configure() Jason Gunthorpe
2023-11-15 14:05   ` Jason Gunthorpe
2023-11-15 14:05   ` Jason Gunthorpe
2023-11-15 14:06 ` [PATCH v2 09/17] iommu: Add iommu_fwspec_append_ids() Jason Gunthorpe
2023-11-15 14:06   ` Jason Gunthorpe
2023-11-15 14:06   ` Jason Gunthorpe
2023-11-15 14:06 ` [PATCH v2 10/17] acpi: Do not use dev->iommu within acpi_iommu_configure() Jason Gunthorpe
2023-11-15 14:06   ` Jason Gunthorpe
2023-11-15 14:06   ` Jason Gunthorpe
2023-11-15 14:06 ` [PATCH v2 11/17] iommu: Hold iommu_probe_device_lock while calling ops->of_xlate Jason Gunthorpe
2023-11-15 14:06   ` Jason Gunthorpe
2023-11-15 14:06   ` Jason Gunthorpe
2023-11-15 14:06 ` [PATCH v2 12/17] iommu: Make iommu_ops_from_fwnode() static Jason Gunthorpe
2023-11-15 14:06   ` Jason Gunthorpe
2023-11-15 14:06   ` Jason Gunthorpe
2023-11-15 15:09   ` Jerry Snitselaar
2023-11-15 15:09     ` Jerry Snitselaar
2023-11-15 15:09     ` Jerry Snitselaar
2023-11-16 14:36   ` Moritz Fischer
2023-11-16 14:36     ` Moritz Fischer
2023-11-16 14:36     ` Moritz Fischer
2023-11-15 14:06 ` [PATCH v2 13/17] iommu: Remove dev_iommu_fwspec_set() Jason Gunthorpe
2023-11-15 14:06   ` Jason Gunthorpe
2023-11-15 14:06   ` Jason Gunthorpe
2023-11-15 14:06 ` [PATCH v2 14/17] iommu: Remove pointless iommu_fwspec_free() Jason Gunthorpe
2023-11-15 14:06   ` Jason Gunthorpe
2023-11-15 14:06   ` Jason Gunthorpe
2023-11-15 14:06 ` [PATCH v2 15/17] iommu: Add ops->of_xlate_fwspec() Jason Gunthorpe
2023-11-15 14:06   ` Jason Gunthorpe
2023-11-15 14:06   ` Jason Gunthorpe
2023-11-15 14:06 ` [PATCH v2 16/17] iommu: Mark dev_iommu_get() with lockdep Jason Gunthorpe
2023-11-15 14:06   ` Jason Gunthorpe
2023-11-15 14:06   ` Jason Gunthorpe
2023-11-15 14:06 ` [PATCH v2 17/17] iommu: Mark dev_iommu_priv_set() with a lockdep Jason Gunthorpe
2023-11-15 14:06   ` Jason Gunthorpe
2023-11-15 14:06   ` Jason Gunthorpe
2023-11-15 14:54 ` [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec Jerry Snitselaar
2023-11-15 14:54   ` Jerry Snitselaar
2023-11-15 14:54   ` Jerry Snitselaar
2023-11-15 15:22 ` Robin Murphy
2023-11-15 15:22   ` Robin Murphy
2023-11-15 15:22   ` Robin Murphy
2023-11-15 15:36   ` Jason Gunthorpe
2023-11-15 15:36     ` Jason Gunthorpe
2023-11-15 15:36     ` Jason Gunthorpe
2023-11-15 20:23     ` Robin Murphy
2023-11-15 20:23       ` Robin Murphy
2023-11-15 20:23       ` Robin Murphy
2023-11-16  4:17       ` Jason Gunthorpe
2023-11-16  4:17         ` Jason Gunthorpe
2023-11-16  4:17         ` Jason Gunthorpe
2023-11-21 16:06         ` Robin Murphy
2023-11-21 16:06           ` Robin Murphy
2023-11-21 16:06           ` Robin Murphy
2023-11-21 17:55           ` Jason Gunthorpe [this message]
2023-11-21 17:55             ` Jason Gunthorpe
2023-11-21 17:55             ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231121175557.GH6083@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=acpica-devel@lists.linux.dev \
    --cc=alyssa@rosenzweig.io \
    --cc=andre.draszik@linaro.org \
    --cc=aou@eecs.berkeley.edu \
    --cc=asahi@lists.linux.dev \
    --cc=baolu.lu@linux.intel.com \
    --cc=catalin.marinas@arm.com \
    --cc=decui@microsoft.com \
    --cc=devicetree@vger.kernel.org \
    --cc=dwmw2@infradead.org \
    --cc=frowand.list@gmail.com \
    --cc=guohanjun@huawei.com \
    --cc=haiyangz@microsoft.com \
    --cc=hch@lst.de \
    --cc=iommu@lists.linux.dev \
    --cc=jean-philippe@linaro.org \
    --cc=jonathanh@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=jsnitsel@redhat.com \
    --cc=kys@microsoft.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux-snps-arc@lists.infradead.org \
    --cc=linux-tegra@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=lpieralisi@kernel.org \
    --cc=m.szyprowski@samsung.com \
    --cc=marcan@marcan.st \
    --cc=mdf@kernel.org \
    --cc=palmer@dabbelt.com \
    --cc=patches@lists.linux.dev \
    --cc=paul.walmsley@sifive.com \
    --cc=quic_zhenhuah@quicinc.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=rafael@kernel.org \
    --cc=robert.moore@intel.com \
    --cc=robh+dt@kernel.org \
    --cc=robh@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=sudeep.holla@arm.com \
    --cc=suravee.suthikulpanit@amd.com \
    --cc=sven@svenpeter.dev \
    --cc=thierry.reding@gmail.com \
    --cc=tsbogend@alpha.franken.de \
    --cc=vdumpa@nvidia.com \
    --cc=vgupta@kernel.org \
    --cc=virtualization@lists.linux.dev \
    --cc=wei.liu@kernel.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.