Linux CXL
 help / color / mirror / Atom feed
* [bug report] cxl/port: Use scoped_guard()/guard() to drop device_lock() for cxl_port
@ 2024-10-16 18:59 Dan Carpenter
  2024-10-16 21:52 ` Dan Williams
  2024-10-16 22:54 ` Dan Williams
  0 siblings, 2 replies; 7+ messages in thread
From: Dan Carpenter @ 2024-10-16 18:59 UTC (permalink / raw)
  To: Li Ming; +Cc: linux-cxl

Hello Li Ming,

Commit 7f569e917b78 ("cxl/port: Use scoped_guard()/guard() to drop
device_lock() for cxl_port") from Aug 30, 2024 (linux-next), leads to
the following (unpublished) Smatch static checker warning:

	drivers/cxl/core/port.c:1591 add_port_attach_ep()
	warn: re-assigning __cleanup__ ptr 'port'

drivers/cxl/core/port.c
    1542 static int add_port_attach_ep(struct cxl_memdev *cxlmd,
    1543                               struct device *uport_dev,
    1544                               struct device *dport_dev)
    1545 {
    1546         struct device *dparent = grandparent(dport_dev);
    1547         struct cxl_dport *dport, *parent_dport;
    1548         resource_size_t component_reg_phys;
    1549         int rc;
    1550 
    1551         if (!dparent) {
    1552                 /*
    1553                  * The iteration reached the topology root without finding the
    1554                  * CXL-root 'cxl_port' on a previous iteration, fail for now to
    1555                  * be re-probed after platform driver attaches.
    1556                  */
    1557                 dev_dbg(&cxlmd->dev, "%s is a root dport\n",
    1558                         dev_name(dport_dev));
    1559                 return -ENXIO;
    1560         }
    1561 
    1562         struct cxl_port *parent_port __free(put_cxl_port) =
    1563                 find_cxl_port(dparent, &parent_dport);
    1564         if (!parent_port) {
    1565                 /* iterate to create this parent_port */
    1566                 return -EAGAIN;
    1567         }
    1568 
    1569         /*
    1570          * Definition with __free() here to keep the sequence of
    1571          * dereferencing the device of the port before the parent_port releasing.
    1572          */
    1573         struct cxl_port *port __free(put_cxl_port) = NULL;
                                  ^^^^^^^^^^^^^^^^^^^^^^^^
We free port when we exit the function, fine.

    1574         scoped_guard(device, &parent_port->dev) {
    1575                 if (!parent_port->dev.driver) {
    1576                         dev_warn(&cxlmd->dev,
    1577                                  "port %s:%s disabled, failed to enumerate CXL.mem\n",
    1578                                  dev_name(&parent_port->dev), dev_name(uport_dev));
    1579                         return -ENXIO;
    1580                 }
    1581 
    1582                 port = find_cxl_port_at(parent_port, dport_dev, &dport);
    1583                 if (!port) {
    1584                         component_reg_phys = find_component_registers(uport_dev);
    1585                         port = devm_cxl_add_port(&parent_port->dev, uport_dev,
    1586                                                  component_reg_phys, parent_dport);

This port from devm_cxl_add_port() needs to be undone.

    1587                         if (IS_ERR(port))
    1588                                 return PTR_ERR(port);
    1589 
    1590                         /* retry find to pick up the new dport information */
--> 1591                         port = find_cxl_port_at(parent_port, dport_dev, &dport);

But we re-assign port here so it will only free this port and not the one from
devm_cxl_add_port().

    1592                         if (!port)
    1593                                 return -ENXIO;
    1594                 }
    1595         }
    1596 
    1597         dev_dbg(&cxlmd->dev, "add to new port %s:%s\n",
    1598                 dev_name(&port->dev), dev_name(port->uport_dev));
    1599         rc = cxl_add_ep(dport, &cxlmd->dev);
    1600         if (rc == -EBUSY) {
    1601                 /*
    1602                  * "can't" happen, but this error code means
    1603                  * something to the caller, so translate it.
    1604                  */
    1605                 rc = -ENXIO;
    1606         }
    1607 
    1608         return rc;
    1609 }

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [bug report] cxl/port: Use scoped_guard()/guard() to drop device_lock() for cxl_port
  2024-10-16 18:59 [bug report] cxl/port: Use scoped_guard()/guard() to drop device_lock() for cxl_port Dan Carpenter
@ 2024-10-16 21:52 ` Dan Williams
  2024-10-16 22:54 ` Dan Williams
  1 sibling, 0 replies; 7+ messages in thread
From: Dan Williams @ 2024-10-16 21:52 UTC (permalink / raw)
  To: Dan Carpenter, Li Ming; +Cc: linux-cxl

Dan Carpenter wrote:
> Hello Li Ming,
> 
> Commit 7f569e917b78 ("cxl/port: Use scoped_guard()/guard() to drop
> device_lock() for cxl_port") from Aug 30, 2024 (linux-next), leads to
> the following (unpublished) Smatch static checker warning:
> 
> 	drivers/cxl/core/port.c:1591 add_port_attach_ep()
> 	warn: re-assigning __cleanup__ ptr 'port'
> 
> drivers/cxl/core/port.c
>     1542 static int add_port_attach_ep(struct cxl_memdev *cxlmd,
>     1543                               struct device *uport_dev,
>     1544                               struct device *dport_dev)
>     1545 {
>     1546         struct device *dparent = grandparent(dport_dev);
>     1547         struct cxl_dport *dport, *parent_dport;
>     1548         resource_size_t component_reg_phys;
>     1549         int rc;
>     1550 
>     1551         if (!dparent) {
>     1552                 /*
>     1553                  * The iteration reached the topology root without finding the
>     1554                  * CXL-root 'cxl_port' on a previous iteration, fail for now to
>     1555                  * be re-probed after platform driver attaches.
>     1556                  */
>     1557                 dev_dbg(&cxlmd->dev, "%s is a root dport\n",
>     1558                         dev_name(dport_dev));
>     1559                 return -ENXIO;
>     1560         }
>     1561 
>     1562         struct cxl_port *parent_port __free(put_cxl_port) =
>     1563                 find_cxl_port(dparent, &parent_dport);
>     1564         if (!parent_port) {
>     1565                 /* iterate to create this parent_port */
>     1566                 return -EAGAIN;
>     1567         }
>     1568 
>     1569         /*
>     1570          * Definition with __free() here to keep the sequence of
>     1571          * dereferencing the device of the port before the parent_port releasing.
>     1572          */
>     1573         struct cxl_port *port __free(put_cxl_port) = NULL;
>                                   ^^^^^^^^^^^^^^^^^^^^^^^^
> We free port when we exit the function, fine.
> 
>     1574         scoped_guard(device, &parent_port->dev) {
>     1575                 if (!parent_port->dev.driver) {
>     1576                         dev_warn(&cxlmd->dev,
>     1577                                  "port %s:%s disabled, failed to enumerate CXL.mem\n",
>     1578                                  dev_name(&parent_port->dev), dev_name(uport_dev));
>     1579                         return -ENXIO;
>     1580                 }
>     1581 
>     1582                 port = find_cxl_port_at(parent_port, dport_dev, &dport);
>     1583                 if (!port) {
>     1584                         component_reg_phys = find_component_registers(uport_dev);
>     1585                         port = devm_cxl_add_port(&parent_port->dev, uport_dev,
>     1586                                                  component_reg_phys, parent_dport);
> 
> This port from devm_cxl_add_port() needs to be undone.
> 
>     1587                         if (IS_ERR(port))
>     1588                                 return PTR_ERR(port);
>     1589 
>     1590                         /* retry find to pick up the new dport information */
> --> 1591                         port = find_cxl_port_at(parent_port, dport_dev, &dport);
> 
> But we re-assign port here so it will only free this port and not the one from
> devm_cxl_add_port().

Yup. I must say I really do not like what scoped_guard() does to code
flow. If conversion to cleanup helpers causes this much code indentation
then the value is low, especially when it hides tricky bugs.

Lets revert the scoped_guard() conversions for now.

I would much prefer factoring out code sections that need a lock to
their own helper that uses guard() rather than convert inline code to
scoped_guard().

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [bug report] cxl/port: Use scoped_guard()/guard() to drop device_lock() for cxl_port
  2024-10-16 18:59 [bug report] cxl/port: Use scoped_guard()/guard() to drop device_lock() for cxl_port Dan Carpenter
  2024-10-16 21:52 ` Dan Williams
@ 2024-10-16 22:54 ` Dan Williams
  2024-10-17 15:04   ` Jonathan Cameron
  1 sibling, 1 reply; 7+ messages in thread
From: Dan Williams @ 2024-10-16 22:54 UTC (permalink / raw)
  To: Dan Carpenter, Li Ming; +Cc: linux-cxl

Dan Carpenter wrote:
> Hello Li Ming,
> 
> Commit 7f569e917b78 ("cxl/port: Use scoped_guard()/guard() to drop
> device_lock() for cxl_port") from Aug 30, 2024 (linux-next), leads to
> the following (unpublished) Smatch static checker warning:
> 
> 	drivers/cxl/core/port.c:1591 add_port_attach_ep()
> 	warn: re-assigning __cleanup__ ptr 'port'
> 
> drivers/cxl/core/port.c
>     1542 static int add_port_attach_ep(struct cxl_memdev *cxlmd,
>     1543                               struct device *uport_dev,
>     1544                               struct device *dport_dev)
>     1545 {
>     1546         struct device *dparent = grandparent(dport_dev);
>     1547         struct cxl_dport *dport, *parent_dport;
>     1548         resource_size_t component_reg_phys;
>     1549         int rc;
>     1550 
>     1551         if (!dparent) {
>     1552                 /*
>     1553                  * The iteration reached the topology root without finding the
>     1554                  * CXL-root 'cxl_port' on a previous iteration, fail for now to
>     1555                  * be re-probed after platform driver attaches.
>     1556                  */
>     1557                 dev_dbg(&cxlmd->dev, "%s is a root dport\n",
>     1558                         dev_name(dport_dev));
>     1559                 return -ENXIO;
>     1560         }
>     1561 
>     1562         struct cxl_port *parent_port __free(put_cxl_port) =
>     1563                 find_cxl_port(dparent, &parent_dport);
>     1564         if (!parent_port) {
>     1565                 /* iterate to create this parent_port */
>     1566                 return -EAGAIN;
>     1567         }
>     1568 
>     1569         /*
>     1570          * Definition with __free() here to keep the sequence of
>     1571          * dereferencing the device of the port before the parent_port releasing.
>     1572          */
>     1573         struct cxl_port *port __free(put_cxl_port) = NULL;
>                                   ^^^^^^^^^^^^^^^^^^^^^^^^
> We free port when we exit the function, fine.
> 
>     1574         scoped_guard(device, &parent_port->dev) {
>     1575                 if (!parent_port->dev.driver) {
>     1576                         dev_warn(&cxlmd->dev,
>     1577                                  "port %s:%s disabled, failed to enumerate CXL.mem\n",
>     1578                                  dev_name(&parent_port->dev), dev_name(uport_dev));
>     1579                         return -ENXIO;
>     1580                 }
>     1581 
>     1582                 port = find_cxl_port_at(parent_port, dport_dev, &dport);
>     1583                 if (!port) {
>     1584                         component_reg_phys = find_component_registers(uport_dev);
>     1585                         port = devm_cxl_add_port(&parent_port->dev, uport_dev,
>     1586                                                  component_reg_phys, parent_dport);
> 
> This port from devm_cxl_add_port() needs to be undone.

I also think the bug originates in:

dd2617ebd2a6 cxl/port: Use __free() to drop put_device() for cxl_port

...where the wrong port is cleaned up, but I want to revert the
scoped_guard() conversion first to make that cleanup easier.

In general for CXL I want to say that no function should be converted to
use cleanup helpers unless all gotos are removed at once, and if the
conversion needs to reach for scoped_guard() reconsider even attempting
the conversion. I.e. scoped_guard() is a leading indicator for needing
code refactoring.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [bug report] cxl/port: Use scoped_guard()/guard() to drop device_lock() for cxl_port
  2024-10-16 22:54 ` Dan Williams
@ 2024-10-17 15:04   ` Jonathan Cameron
  2024-10-17 16:32     ` Dan Williams
  2024-10-17 18:56     ` Dan Carpenter
  0 siblings, 2 replies; 7+ messages in thread
From: Jonathan Cameron @ 2024-10-17 15:04 UTC (permalink / raw)
  To: Dan Williams; +Cc: Dan Carpenter, Li Ming, linux-cxl

On Wed, 16 Oct 2024 15:54:00 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Dan Carpenter wrote:
> > Hello Li Ming,
> > 
> > Commit 7f569e917b78 ("cxl/port: Use scoped_guard()/guard() to drop
> > device_lock() for cxl_port") from Aug 30, 2024 (linux-next), leads to
> > the following (unpublished) Smatch static checker warning:
> > 
> > 	drivers/cxl/core/port.c:1591 add_port_attach_ep()
> > 	warn: re-assigning __cleanup__ ptr 'port'
> > 
> > drivers/cxl/core/port.c
> >     1542 static int add_port_attach_ep(struct cxl_memdev *cxlmd,
> >     1543                               struct device *uport_dev,
> >     1544                               struct device *dport_dev)
> >     1545 {
> >     1546         struct device *dparent = grandparent(dport_dev);
> >     1547         struct cxl_dport *dport, *parent_dport;
> >     1548         resource_size_t component_reg_phys;
> >     1549         int rc;
> >     1550 
> >     1551         if (!dparent) {
> >     1552                 /*
> >     1553                  * The iteration reached the topology root without finding the
> >     1554                  * CXL-root 'cxl_port' on a previous iteration, fail for now to
> >     1555                  * be re-probed after platform driver attaches.
> >     1556                  */
> >     1557                 dev_dbg(&cxlmd->dev, "%s is a root dport\n",
> >     1558                         dev_name(dport_dev));
> >     1559                 return -ENXIO;
> >     1560         }
> >     1561 
> >     1562         struct cxl_port *parent_port __free(put_cxl_port) =
> >     1563                 find_cxl_port(dparent, &parent_dport);
> >     1564         if (!parent_port) {
> >     1565                 /* iterate to create this parent_port */
> >     1566                 return -EAGAIN;
> >     1567         }
> >     1568 
> >     1569         /*
> >     1570          * Definition with __free() here to keep the sequence of
> >     1571          * dereferencing the device of the port before the parent_port releasing.
> >     1572          */
> >     1573         struct cxl_port *port __free(put_cxl_port) = NULL;
> >                                   ^^^^^^^^^^^^^^^^^^^^^^^^
> > We free port when we exit the function, fine.
> > 
> >     1574         scoped_guard(device, &parent_port->dev) {
> >     1575                 if (!parent_port->dev.driver) {
> >     1576                         dev_warn(&cxlmd->dev,
> >     1577                                  "port %s:%s disabled, failed to enumerate CXL.mem\n",
> >     1578                                  dev_name(&parent_port->dev), dev_name(uport_dev));
> >     1579                         return -ENXIO;
> >     1580                 }
> >     1581 
> >     1582                 port = find_cxl_port_at(parent_port, dport_dev, &dport);
> >     1583                 if (!port) {
> >     1584                         component_reg_phys = find_component_registers(uport_dev);
> >     1585                         port = devm_cxl_add_port(&parent_port->dev, uport_dev,

> >     1586                                                  component_reg_phys, parent_dport);
> > 
> > This port from devm_cxl_add_port() needs to be undone.  

devm cleanup should sweep that up if we suceed here but fail on one of the remaining calls.

> 
> I also think the bug originates in:
> 
> dd2617ebd2a6 cxl/port: Use __free() to drop put_device() for cxl_port
> 
> ...where the wrong port is cleaned up, but I want to revert the
> scoped_guard() conversion first to make that cleanup easier.
> 
> In general for CXL I want to say that no function should be converted to
> use cleanup helpers unless all gotos are removed at once, and if the
> conversion needs to reach for scoped_guard() reconsider even attempting
> the conversion. I.e. scoped_guard() is a leading indicator for needing
> code refactoring.

I don't think it's a bug and ultimately Dan C didn't say it was.
It's ugly but a simpler path to resolve it logically is to
stop using the variable port for two purposes.


struct cxl_port *port __free(put_cxl_port) = NULL;
	scoped_guard(device, &parent_port->dev) {
		if (!parent_port->dev.driver) {
			dev_warn(&cxlmd->dev,
				 "port %s:%s disabled, failed to enumerate CXL.mem\n",
				 dev_name(&parent_port->dev), dev_name(uport_dev));
			return -ENXIO;
		}

		port = find_cxl_port_at(parent_port, dport_dev, &dport);
		if (!port) {
			struct cxl_dport *yadp;

			component_reg_phys = find_component_registers(uport_dev);
//rename (yet another dport :)

			yadp = devm_cxl_add_port(&parent_port->dev, uport_dev,
						 component_reg_phys, parent_dport);
			if (IS_ERR(yadp))
				return PTR_ERR(yadp);
//port is correctly null. We haven't found one yet, so all the auto cleanup is fine.


			/* retry find to pick up the new dport information */
			port = find_cxl_port_at(parent_port, dport_dev, &dport);
			if (!port)
				return -ENXIO;
		}
	}

Whilst I don't like the code, I'm not sure a revert is the best way out.

Jonathan

> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [bug report] cxl/port: Use scoped_guard()/guard() to drop device_lock() for cxl_port
  2024-10-17 15:04   ` Jonathan Cameron
@ 2024-10-17 16:32     ` Dan Williams
  2024-10-17 17:12       ` Jonathan Cameron
  2024-10-17 18:56     ` Dan Carpenter
  1 sibling, 1 reply; 7+ messages in thread
From: Dan Williams @ 2024-10-17 16:32 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams; +Cc: Dan Carpenter, Li Ming, linux-cxl

Jonathan Cameron wrote:
[..]
> > In general for CXL I want to say that no function should be converted to
> > use cleanup helpers unless all gotos are removed at once, and if the
> > conversion needs to reach for scoped_guard() reconsider even attempting
> > the conversion. I.e. scoped_guard() is a leading indicator for needing
> > code refactoring.
> 
> I don't think it's a bug and ultimately Dan C didn't say it was.
> It's ugly but a simpler path to resolve it logically is to
> stop using the variable port for two purposes.
[..]
> 			/* retry find to pick up the new dport information */
> 			port = find_cxl_port_at(parent_port, dport_dev, &dport);
> 			if (!port)
> 				return -ENXIO;
> 		}
> 	}
> 
> Whilst I don't like the code, I'm not sure a revert is the best way out.

The revert is for the scoped_guard() conversion which was, innocently,
trying to preserve the subtlety of the existing code.

It is true that the subsequent find_cxl_port_at() saves this being an
actual bug by elevating the new port's refcount, but it is subtle beyond
reason. This whole function needs a re-think, not more band-aids. I will
fix up the reverts to drop "Fixes:" since you are right there is no
actual bug, and those can wait for 6.13, but intent is to say "let's not
use scoped_guard() in CXL without an exceedingly good reason".

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [bug report] cxl/port: Use scoped_guard()/guard() to drop device_lock() for cxl_port
  2024-10-17 16:32     ` Dan Williams
@ 2024-10-17 17:12       ` Jonathan Cameron
  0 siblings, 0 replies; 7+ messages in thread
From: Jonathan Cameron @ 2024-10-17 17:12 UTC (permalink / raw)
  To: Dan Williams; +Cc: Dan Carpenter, Li Ming, linux-cxl

On Thu, 17 Oct 2024 09:32:23 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Jonathan Cameron wrote:
> [..]
> > > In general for CXL I want to say that no function should be converted to
> > > use cleanup helpers unless all gotos are removed at once, and if the
> > > conversion needs to reach for scoped_guard() reconsider even attempting
> > > the conversion. I.e. scoped_guard() is a leading indicator for needing
> > > code refactoring.  
> > 
> > I don't think it's a bug and ultimately Dan C didn't say it was.
> > It's ugly but a simpler path to resolve it logically is to
> > stop using the variable port for two purposes.  
> [..]
> > 			/* retry find to pick up the new dport information */
> > 			port = find_cxl_port_at(parent_port, dport_dev, &dport);
> > 			if (!port)
> > 				return -ENXIO;
> > 		}
> > 	}
> > 
> > Whilst I don't like the code, I'm not sure a revert is the best way out.  
> 
> The revert is for the scoped_guard() conversion which was, innocently,
> trying to preserve the subtlety of the existing code.
> 
> It is true that the subsequent find_cxl_port_at() saves this being an
> actual bug by elevating the new port's refcount, but it is subtle beyond
> reason. This whole function needs a re-think, not more band-aids. I will
> fix up the reverts to drop "Fixes:" since you are right there is no
> actual bug, and those can wait for 6.13, but intent is to say "let's not
> use scoped_guard() in CXL without an exceedingly good reason".
> 
That's fair.  Let's also rename that variable so there is less subtle
code involved.

Jonathan


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [bug report] cxl/port: Use scoped_guard()/guard() to drop device_lock() for cxl_port
  2024-10-17 15:04   ` Jonathan Cameron
  2024-10-17 16:32     ` Dan Williams
@ 2024-10-17 18:56     ` Dan Carpenter
  1 sibling, 0 replies; 7+ messages in thread
From: Dan Carpenter @ 2024-10-17 18:56 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: Dan Williams, Li Ming, linux-cxl

On Thu, Oct 17, 2024 at 04:04:45PM +0100, Jonathan Cameron wrote:
> > >     1582                 port = find_cxl_port_at(parent_port, dport_dev, &dport);
> > >     1583                 if (!port) {
> > >     1584                         component_reg_phys = find_component_registers(uport_dev);
> > >     1585                         port = devm_cxl_add_port(&parent_port->dev, uport_dev,
> 
> > >     1586                                                  component_reg_phys, parent_dport);
> > > 
> > > This port from devm_cxl_add_port() needs to be undone.  
> 
> devm cleanup should sweep that up if we suceed here but fail on one of the remaining calls.
> 
> > 
> > I also think the bug originates in:
> > 
> > dd2617ebd2a6 cxl/port: Use __free() to drop put_device() for cxl_port
> > 
> > ...where the wrong port is cleaned up, but I want to revert the
> > scoped_guard() conversion first to make that cleanup easier.
> > 
> > In general for CXL I want to say that no function should be converted to
> > use cleanup helpers unless all gotos are removed at once, and if the
> > conversion needs to reach for scoped_guard() reconsider even attempting
> > the conversion. I.e. scoped_guard() is a leading indicator for needing
> > code refactoring.
> 
> I don't think it's a bug and ultimately Dan C didn't say it was.

To be honest, I thought it was a bug because I didn't notice it was devm_.
It's so ugly.

> It's ugly but a simpler path to resolve it logically is to
> stop using the variable port for two purposes.

Yes.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-10-17 18:56 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-16 18:59 [bug report] cxl/port: Use scoped_guard()/guard() to drop device_lock() for cxl_port Dan Carpenter
2024-10-16 21:52 ` Dan Williams
2024-10-16 22:54 ` Dan Williams
2024-10-17 15:04   ` Jonathan Cameron
2024-10-17 16:32     ` Dan Williams
2024-10-17 17:12       ` Jonathan Cameron
2024-10-17 18:56     ` Dan Carpenter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox