* [PATCH] cxl/acpi: Release device after dev_err
@ 2023-07-07 16:16 Breno Leitao
2023-07-07 16:50 ` Dave Jiang
2023-07-07 22:17 ` Alison Schofield
0 siblings, 2 replies; 8+ messages in thread
From: Breno Leitao @ 2023-07-07 16:16 UTC (permalink / raw)
To: alison.schofield, vishal.l.verma, ira.weiny, bwidawsk,
dan.j.williams
Cc: linux-cxl
Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add()
fails. Kfence drops this message, after the following:
BUG: KFENCE: use-after-free read in resource_string
This is happening in cxl_parse_cfmws(), and here is a simplified flow
that is coming from Kfence.
Use-after-free:
_dev_err
cxl_parse_cfmws
acpi_table_parse_entries_array
acpi_table_parse_cedt
cxl_acpi_probe
Free:
cxl_decoder_release
device_release
kobject_put
cxl_parse_cfmws
acpi_table_parse_entries_array
acpi_table_parse_cedt
cxl_acpi_probe
Alloc:
cxl_decoder_alloc
cxl_parse_cfmws
acpi_table_parse_entries_array
acpi_table_parse_cedt
cxl_acpi_probe
platform_probe
From my reading of the issue, the device struct being used by
dev_err() was removed in the put_device() before.
Put the device just after the message is printed.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
drivers/cxl/acpi.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 658e6b84a769..5179bf4211d8 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
}
rc = cxl_decoder_add(cxld, target_map);
err_xormap:
- if (rc)
- put_device(&cxld->dev);
- else
- rc = cxl_decoder_autoremove(dev, cxld);
if (rc) {
dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
cxld->hpa_range.start, cxld->hpa_range.end);
+ put_device(&cxld->dev);
return 0;
+ } else {
+ rc = cxl_decoder_autoremove(dev, cxld);
}
dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
dev_name(&cxld->dev),
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] cxl/acpi: Release device after dev_err
2023-07-07 16:16 [PATCH] cxl/acpi: Release device after dev_err Breno Leitao
@ 2023-07-07 16:50 ` Dave Jiang
2023-07-10 10:49 ` Breno Leitao
2023-07-07 22:17 ` Alison Schofield
1 sibling, 1 reply; 8+ messages in thread
From: Dave Jiang @ 2023-07-07 16:50 UTC (permalink / raw)
To: Breno Leitao, alison.schofield, vishal.l.verma, ira.weiny,
bwidawsk, dan.j.williams
Cc: linux-cxl
On 7/7/23 09:16, Breno Leitao wrote:
> Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add()
> fails. Kfence drops this message, after the following:
>
> BUG: KFENCE: use-after-free read in resource_string
>
> This is happening in cxl_parse_cfmws(), and here is a simplified flow
> that is coming from Kfence.
>
> Use-after-free:
> _dev_err
> cxl_parse_cfmws
> acpi_table_parse_entries_array
> acpi_table_parse_cedt
> cxl_acpi_probe
>
> Free:
> cxl_decoder_release
> device_release
> kobject_put
> cxl_parse_cfmws
> acpi_table_parse_entries_array
> acpi_table_parse_cedt
> cxl_acpi_probe
>
> Alloc:
> cxl_decoder_alloc
> cxl_parse_cfmws
> acpi_table_parse_entries_array
> acpi_table_parse_cedt
> cxl_acpi_probe
> platform_probe
>
> From my reading of the issue, the device struct being used by
> dev_err() was removed in the put_device() before.
>
> Put the device just after the message is printed.
>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
> drivers/cxl/acpi.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 658e6b84a769..5179bf4211d8 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
> }
> rc = cxl_decoder_add(cxld, target_map);
> err_xormap:
> - if (rc)
> - put_device(&cxld->dev);
> - else
> - rc = cxl_decoder_autoremove(dev, cxld);
> if (rc) {
> dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
> cxld->hpa_range.start, cxld->hpa_range.end);
> + put_device(&cxld->dev);
> return 0;
I think you will want to change this to 'return rc;' in order to reflect
the error.
> + } else {
> + rc = cxl_decoder_autoremove(dev, cxld);
> }
> dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
> dev_name(&cxld->dev),
You also need to change the 'return 0' that follows to 'return rc', to
catch the error from cxl_decoder_autoremove().
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] cxl/acpi: Release device after dev_err
2023-07-07 16:16 [PATCH] cxl/acpi: Release device after dev_err Breno Leitao
2023-07-07 16:50 ` Dave Jiang
@ 2023-07-07 22:17 ` Alison Schofield
2023-07-08 0:33 ` Dave Jiang
2023-07-10 10:41 ` Breno Leitao
1 sibling, 2 replies; 8+ messages in thread
From: Alison Schofield @ 2023-07-07 22:17 UTC (permalink / raw)
To: Breno Leitao
Cc: vishal.l.verma, ira.weiny, bwidawsk, dan.j.williams, linux-cxl
On Fri, Jul 07, 2023 at 09:16:16AM -0700, Breno Leitao wrote:
> Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add()
> fails. Kfence drops this message, after the following:
>
> BUG: KFENCE: use-after-free read in resource_string
>
> This is happening in cxl_parse_cfmws(), and here is a simplified flow
> that is coming from Kfence.
>
> Use-after-free:
> _dev_err
> cxl_parse_cfmws
> acpi_table_parse_entries_array
> acpi_table_parse_cedt
> cxl_acpi_probe
>
> Free:
> cxl_decoder_release
> device_release
> kobject_put
> cxl_parse_cfmws
> acpi_table_parse_entries_array
> acpi_table_parse_cedt
> cxl_acpi_probe
>
> Alloc:
> cxl_decoder_alloc
> cxl_parse_cfmws
> acpi_table_parse_entries_array
> acpi_table_parse_cedt
> cxl_acpi_probe
> platform_probe
>
> From my reading of the issue, the device struct being used by
> dev_err() was removed in the put_device() before.
Hi Breno,
I'm not familiar w kfence, but I don't follow what it finds
suspect here. Does kfence point to exact offensive lines of
code, or ???
The put_device() removed &cxld->dev and the dev_err() that
this patch moves the put after, was using 'dev', which was
assigned from ctx.dev. It is not the same as &cxld->dev. I
wonder if Kfence thinks we can get to the next dev_dbg()
statement and misuse &cxld->dev.
More below...
>
> Put the device just after the message is printed.
>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
> drivers/cxl/acpi.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 658e6b84a769..5179bf4211d8 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
> }
> rc = cxl_decoder_add(cxld, target_map);
> err_xormap:
> - if (rc)
> - put_device(&cxld->dev);
> - else
> - rc = cxl_decoder_autoremove(dev, cxld);
> if (rc) {
> dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
> cxld->hpa_range.start, cxld->hpa_range.end);
> + put_device(&cxld->dev);
> return 0;
> + } else {
> + rc = cxl_decoder_autoremove(dev, cxld);
> }
> dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
> dev_name(&cxld->dev),
> --
> 2.34.1
>
(pulled in fresh code snippet to get the dev_dbg() in view.)
> }
> rc = cxl_decoder_add(cxld, target_map);
>err_xormap:
> if (rc)
> put_device(&cxld->dev);
>
This puts &cxld->dev, not dev.
>
> else
> rc = cxl_decoder_autoremove(dev, cxld);
> if (rc) {
> dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
> cxld->hpa_range.start, cxld->hpa_range.end);
> return 0;
This return avoids getting to the next dev_dbg() statement after
put_device(). We cannot get to the next dev_dbg() statement when
rc is non zero, but it seems kfence thinks we can.
> }
> dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
> dev_name(&cxld->dev),
If we could get here, after the put_device(), that would be bad.
> phys_to_target_node(cxld->hpa_range.start),
> cxld->hpa_range.start, cxld->hpa_range.end);
>
> return 0;
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] cxl/acpi: Release device after dev_err
2023-07-07 22:17 ` Alison Schofield
@ 2023-07-08 0:33 ` Dave Jiang
2023-07-08 1:07 ` Alison Schofield
2023-07-10 10:41 ` Breno Leitao
1 sibling, 1 reply; 8+ messages in thread
From: Dave Jiang @ 2023-07-08 0:33 UTC (permalink / raw)
To: Alison Schofield, Breno Leitao
Cc: vishal.l.verma, ira.weiny, bwidawsk, dan.j.williams, linux-cxl
On 7/7/23 15:17, Alison Schofield wrote:
> On Fri, Jul 07, 2023 at 09:16:16AM -0700, Breno Leitao wrote:
>> Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add()
>> fails. Kfence drops this message, after the following:
>>
>> BUG: KFENCE: use-after-free read in resource_string
>>
>> This is happening in cxl_parse_cfmws(), and here is a simplified flow
>> that is coming from Kfence.
>>
>> Use-after-free:
>> _dev_err
>> cxl_parse_cfmws
>> acpi_table_parse_entries_array
>> acpi_table_parse_cedt
>> cxl_acpi_probe
>>
>> Free:
>> cxl_decoder_release
>> device_release
>> kobject_put
>> cxl_parse_cfmws
>> acpi_table_parse_entries_array
>> acpi_table_parse_cedt
>> cxl_acpi_probe
>>
>> Alloc:
>> cxl_decoder_alloc
>> cxl_parse_cfmws
>> acpi_table_parse_entries_array
>> acpi_table_parse_cedt
>> cxl_acpi_probe
>> platform_probe
>>
>> From my reading of the issue, the device struct being used by
>> dev_err() was removed in the put_device() before.
>
> Hi Breno,
>
> I'm not familiar w kfence, but I don't follow what it finds
> suspect here. Does kfence point to exact offensive lines of
> code, or ???
>
> The put_device() removed &cxld->dev and the dev_err() that
> this patch moves the put after, was using 'dev', which was
> assigned from ctx.dev. It is not the same as &cxld->dev. I
> wonder if Kfence thinks we can get to the next dev_dbg()
> statement and misuse &cxld->dev.
>
> More below...
>
>
>>
>> Put the device just after the message is printed.
>>
>> Signed-off-by: Breno Leitao <leitao@debian.org>
>> ---
>> drivers/cxl/acpi.c | 7 +++----
>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
>> index 658e6b84a769..5179bf4211d8 100644
>> --- a/drivers/cxl/acpi.c
>> +++ b/drivers/cxl/acpi.c
>> @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>> }
>> rc = cxl_decoder_add(cxld, target_map);
>> err_xormap:
>> - if (rc)
>> - put_device(&cxld->dev);
>> - else
>> - rc = cxl_decoder_autoremove(dev, cxld);
>> if (rc) {
>> dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
>> cxld->hpa_range.start, cxld->hpa_range.end);
>> + put_device(&cxld->dev);
>> return 0;
>> + } else {
>> + rc = cxl_decoder_autoremove(dev, cxld);
>> }
>> dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
>> dev_name(&cxld->dev),
>> --
>> 2.34.1
>>
>
> (pulled in fresh code snippet to get the dev_dbg() in view.)
>
>> }
>> rc = cxl_decoder_add(cxld, target_map);
>> err_xormap:
>> if (rc)
>> put_device(&cxld->dev);
>>
>
> This puts &cxld->dev, not dev.
>
>>
>> else
>> rc = cxl_decoder_autoremove(dev, cxld);
>> if (rc) {
>> dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
>> cxld->hpa_range.start, cxld->hpa_range.end);
>> return 0;
>
> This return avoids getting to the next dev_dbg() statement after
> put_device(). We cannot get to the next dev_dbg() statement when
> rc is non zero, but it seems kfence thinks we can.
>
>> }
>> dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
>> dev_name(&cxld->dev),
>
> If we could get here, after the put_device(), that would be bad.
>
>> phys_to_target_node(cxld->hpa_range.start),
>> cxld->hpa_range.start, cxld->hpa_range.end);
>>
>> return 0;
>
Seems like the code can be cleaned up this way. The double check of if
(rc) is kinda weird anyhow.
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 7e1765b09e04..0573b476d29c 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -291,21 +291,21 @@ static int cxl_parse_cfmws(union
acpi_subtable_headers *header, void *arg,
}
rc = cxl_decoder_add(cxld, target_map);
err_xormap:
- if (rc)
- put_device(&cxld->dev);
- else
- rc = cxl_decoder_autoremove(dev, cxld);
if (rc) {
dev_err(dev, "Failed to add decode range [%#llx -
%#llx]\n",
cxld->hpa_range.start, cxld->hpa_range.end);
- return 0;
+ put_device(&cxld->dev);
+ return rc;
}
+
+ rc = cxl_decoder_autoremove(dev, cxld);
+
dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
dev_name(&cxld->dev),
phys_to_target_node(cxld->hpa_range.start),
cxld->hpa_range.start, cxld->hpa_range.end);
- return 0;
+ return rc;
err_insert:
kfree(res->name);
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] cxl/acpi: Release device after dev_err
2023-07-08 0:33 ` Dave Jiang
@ 2023-07-08 1:07 ` Alison Schofield
2023-07-10 15:55 ` Dave Jiang
0 siblings, 1 reply; 8+ messages in thread
From: Alison Schofield @ 2023-07-08 1:07 UTC (permalink / raw)
To: Dave Jiang
Cc: Breno Leitao, vishal.l.verma, ira.weiny, bwidawsk, dan.j.williams,
linux-cxl
On Fri, Jul 07, 2023 at 05:33:02PM -0700, Dave Jiang wrote:
>
>
> On 7/7/23 15:17, Alison Schofield wrote:
> > On Fri, Jul 07, 2023 at 09:16:16AM -0700, Breno Leitao wrote:
> > > Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add()
> > > fails. Kfence drops this message, after the following:
> > >
> > > BUG: KFENCE: use-after-free read in resource_string
> > >
> > > This is happening in cxl_parse_cfmws(), and here is a simplified flow
> > > that is coming from Kfence.
> > >
> > > Use-after-free:
> > > _dev_err
> > > cxl_parse_cfmws
> > > acpi_table_parse_entries_array
> > > acpi_table_parse_cedt
> > > cxl_acpi_probe
> > >
> > > Free:
> > > cxl_decoder_release
> > > device_release
> > > kobject_put
> > > cxl_parse_cfmws
> > > acpi_table_parse_entries_array
> > > acpi_table_parse_cedt
> > > cxl_acpi_probe
> > >
> > > Alloc:
> > > cxl_decoder_alloc
> > > cxl_parse_cfmws
> > > acpi_table_parse_entries_array
> > > acpi_table_parse_cedt
> > > cxl_acpi_probe
> > > platform_probe
> > >
> > > From my reading of the issue, the device struct being used by
> > > dev_err() was removed in the put_device() before.
> >
> > Hi Breno,
> >
> > I'm not familiar w kfence, but I don't follow what it finds
> > suspect here. Does kfence point to exact offensive lines of
> > code, or ???
> >
> > The put_device() removed &cxld->dev and the dev_err() that
> > this patch moves the put after, was using 'dev', which was
> > assigned from ctx.dev. It is not the same as &cxld->dev. I
> > wonder if Kfence thinks we can get to the next dev_dbg()
> > statement and misuse &cxld->dev.
> >
> > More below...
> >
> >
> > >
> > > Put the device just after the message is printed.
> > >
> > > Signed-off-by: Breno Leitao <leitao@debian.org>
> > > ---
> > > drivers/cxl/acpi.c | 7 +++----
> > > 1 file changed, 3 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > > index 658e6b84a769..5179bf4211d8 100644
> > > --- a/drivers/cxl/acpi.c
> > > +++ b/drivers/cxl/acpi.c
> > > @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
> > > }
> > > rc = cxl_decoder_add(cxld, target_map);
> > > err_xormap:
> > > - if (rc)
> > > - put_device(&cxld->dev);
> > > - else
> > > - rc = cxl_decoder_autoremove(dev, cxld);
> > > if (rc) {
> > > dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
> > > cxld->hpa_range.start, cxld->hpa_range.end);
> > > + put_device(&cxld->dev);
> > > return 0;
> > > + } else {
> > > + rc = cxl_decoder_autoremove(dev, cxld);
> > > }
> > > dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
> > > dev_name(&cxld->dev),
> > > --
> > > 2.34.1
> > >
> >
> > (pulled in fresh code snippet to get the dev_dbg() in view.)
> >
> > > }
> > > rc = cxl_decoder_add(cxld, target_map);
> > > err_xormap:
> > > if (rc)
> > > put_device(&cxld->dev);
> > >
> >
> > This puts &cxld->dev, not dev.
> >
> > >
> > > else
> > > rc = cxl_decoder_autoremove(dev, cxld);
> > > if (rc) {
> > > dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
> > > cxld->hpa_range.start, cxld->hpa_range.end);
> > > return 0;
> >
> > This return avoids getting to the next dev_dbg() statement after
> > put_device(). We cannot get to the next dev_dbg() statement when
> > rc is non zero, but it seems kfence thinks we can.
> >
> > > }
> > > dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
> > > dev_name(&cxld->dev),
> >
> > If we could get here, after the put_device(), that would be bad.
> >
> > > phys_to_target_node(cxld->hpa_range.start),
> > > cxld->hpa_range.start, cxld->hpa_range.end);
> > >
> > > return 0;
> >
>
> Seems like the code can be cleaned up this way. The double check of if (rc)
> is kinda weird anyhow.
>
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 7e1765b09e04..0573b476d29c 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -291,21 +291,21 @@ static int cxl_parse_cfmws(union acpi_subtable_headers
> *header, void *arg,
> }
> rc = cxl_decoder_add(cxld, target_map);
> err_xormap:
> - if (rc)
> - put_device(&cxld->dev);
> - else
> - rc = cxl_decoder_autoremove(dev, cxld);
> if (rc) {
> dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
> cxld->hpa_range.start, cxld->hpa_range.end);
> - return 0;
> + put_device(&cxld->dev);
Why do you think the put_device() is making deref of cxld bad here?
That cxld is still present.
Maybe the unrelated bug here is that we are leaking the res.
Rather that returning directly, take the kfree(res) path out.
> + return rc;
> }
> +
> + rc = cxl_decoder_autoremove(dev, cxld);
> +
Now this rc is not examined.
> dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
> dev_name(&cxld->dev),
> phys_to_target_node(cxld->hpa_range.start),
> cxld->hpa_range.start, cxld->hpa_range.end);
>
> - return 0;
> + return rc;
>
> err_insert:
> kfree(res->name);
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] cxl/acpi: Release device after dev_err
2023-07-07 22:17 ` Alison Schofield
2023-07-08 0:33 ` Dave Jiang
@ 2023-07-10 10:41 ` Breno Leitao
1 sibling, 0 replies; 8+ messages in thread
From: Breno Leitao @ 2023-07-10 10:41 UTC (permalink / raw)
To: Alison Schofield
Cc: vishal.l.verma, ira.weiny, bwidawsk, dan.j.williams, linux-cxl
On Fri, Jul 07, 2023 at 03:17:58PM -0700, Alison Schofield wrote:
> On Fri, Jul 07, 2023 at 09:16:16AM -0700, Breno Leitao wrote:
> > Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add()
> > fails. Kfence drops this message, after the following:
> >
> > BUG: KFENCE: use-after-free read in resource_string
> >
> > This is happening in cxl_parse_cfmws(), and here is a simplified flow
> > that is coming from Kfence.
> >
> > Use-after-free:
> > _dev_err
> > cxl_parse_cfmws
> > acpi_table_parse_entries_array
> > acpi_table_parse_cedt
> > cxl_acpi_probe
> >
> > Free:
> > cxl_decoder_release
> > device_release
> > kobject_put
> > cxl_parse_cfmws
> > acpi_table_parse_entries_array
> > acpi_table_parse_cedt
> > cxl_acpi_probe
> >
> > Alloc:
> > cxl_decoder_alloc
> > cxl_parse_cfmws
> > acpi_table_parse_entries_array
> > acpi_table_parse_cedt
> > cxl_acpi_probe
> > platform_probe
> >
> > From my reading of the issue, the device struct being used by
> > dev_err() was removed in the put_device() before.
>
> Hi Breno,
>
> I'm not familiar w kfence, but I don't follow what it finds
> suspect here. Does kfence point to exact offensive lines of
> code, or ???
Unfortunately I do not lines that match anything public. Collecting them
might be hard also, since kfence problems during failure mode are not
easy to reproduce.
> The put_device() removed &cxld->dev and the dev_err() that
> this patch moves the put after, was using 'dev', which was
> assigned from ctx.dev. It is not the same as &cxld->dev. I
> wonder if Kfence thinks we can get to the next dev_dbg()
> statement and misuse &cxld->dev.
Any chance that "struct device_type->release"() might be touching
ctx->dev? This is because "struct device_type->release"() is calling
during the put operation, which seems to be the one de-allocating the
resource.
> More below...
>
>
> >
> > Put the device just after the message is printed.
> >
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> > ---
> > drivers/cxl/acpi.c | 7 +++----
> > 1 file changed, 3 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index 658e6b84a769..5179bf4211d8 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
> > }
> > rc = cxl_decoder_add(cxld, target_map);
> > err_xormap:
> > - if (rc)
> > - put_device(&cxld->dev);
> > - else
> > - rc = cxl_decoder_autoremove(dev, cxld);
> > if (rc) {
> > dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
> > cxld->hpa_range.start, cxld->hpa_range.end);
> > + put_device(&cxld->dev);
> > return 0;
> > + } else {
> > + rc = cxl_decoder_autoremove(dev, cxld);
> > }
> > dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
> > dev_name(&cxld->dev),
> > --
> > 2.34.1
> >
>
> (pulled in fresh code snippet to get the dev_dbg() in view.)
>
> > }
> > rc = cxl_decoder_add(cxld, target_map);
> >err_xormap:
> > if (rc)
> > put_device(&cxld->dev);
> >
>
> This puts &cxld->dev, not dev.
>
> >
> > else
> > rc = cxl_decoder_autoremove(dev, cxld);
> > if (rc) {
> > dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
> > cxld->hpa_range.start, cxld->hpa_range.end);
> > return 0;
>
> This return avoids getting to the next dev_dbg() statement after
> put_device(). We cannot get to the next dev_dbg() statement when
> rc is non zero, but it seems kfence thinks we can.
Oh, the problems is on dev_err() not on dev_dbg(). Basically on the
return path.
Here is what dmesg says:
cxl root0: Failed to populate active decoder targets
cxl_acpi ACPI0017:00: Failed to add decoder for [mem 0x4080000000-0x2baffffffff flags 0x200]
cxl root0: Failed to populate active decoder targets
cxl_acpi ACPI0017:00: Failed to add decoder for [mem 0x2bb00000000-0x5357fffffff flags 0x200]
cxl root0: Failed to populate active decoder targets
==================================================================
BUG: KFENCE: use-after-free read in resource_string+0x80/0x570\x0a
Use-after-free read at 0x(____ptrval____) (in kfence-#111):
resource_string+0x80/0x570
pointer+0x389/0x3c0
vsnprintf+0x214/0x670
pointer+0x1b2/0x3c0
vsnprintf+0x214/0x670
vprintk_store+0x102/0x450
vprintk_emit+0x6f/0x1b0
dev_vprintk_emit+0x117/0x163
dev_printk_emit+0x51/0x6b
_dev_err+0x6e/0x88
cxl_parse_cfmws+0x2a0/0x2d
acpi_table_parse_entries_array+0x1fc/0x330
acpi_table_parse_cedt+0x4f/0x70
cxl_acpi_probe+0xd6/0x150
platform_probe+0x2f/0x60
really_probe+0x1f5/0x340
driver_probe_device+0x1e/0x80
__driver_attach+0xfc/0x190
bus_for_each_dev+0x76/0xb0
bus_add_driver+0x1bb/0x230
driver_register+0x85/0x120
do_one_initcall+0xbe/0x240
kernel_init_freeable+0x1cc/0x2d2
kernel_init+0x16/0x1a0
ret_from_fork+0x1f/0x30
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] cxl/acpi: Release device after dev_err
2023-07-07 16:50 ` Dave Jiang
@ 2023-07-10 10:49 ` Breno Leitao
0 siblings, 0 replies; 8+ messages in thread
From: Breno Leitao @ 2023-07-10 10:49 UTC (permalink / raw)
To: Dave Jiang
Cc: alison.schofield, vishal.l.verma, ira.weiny, bwidawsk,
dan.j.williams, linux-cxl
On Fri, Jul 07, 2023 at 09:50:09AM -0700, Dave Jiang wrote:
>
>
> On 7/7/23 09:16, Breno Leitao wrote:
> > Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add()
> > fails. Kfence drops this message, after the following:
> >
> > BUG: KFENCE: use-after-free read in resource_string
> >
> > This is happening in cxl_parse_cfmws(), and here is a simplified flow
> > that is coming from Kfence.
> >
> > Use-after-free:
> > _dev_err
> > cxl_parse_cfmws
> > acpi_table_parse_entries_array
> > acpi_table_parse_cedt
> > cxl_acpi_probe
> >
> > Free:
> > cxl_decoder_release
> > device_release
> > kobject_put
> > cxl_parse_cfmws
> > acpi_table_parse_entries_array
> > acpi_table_parse_cedt
> > cxl_acpi_probe
> >
> > Alloc:
> > cxl_decoder_alloc
> > cxl_parse_cfmws
> > acpi_table_parse_entries_array
> > acpi_table_parse_cedt
> > cxl_acpi_probe
> > platform_probe
> >
> > From my reading of the issue, the device struct being used by
> > dev_err() was removed in the put_device() before.
> >
> > Put the device just after the message is printed.
> >
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> > ---
> > drivers/cxl/acpi.c | 7 +++----
> > 1 file changed, 3 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index 658e6b84a769..5179bf4211d8 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
> > }
> > rc = cxl_decoder_add(cxld, target_map);
> > err_xormap:
> > - if (rc)
> > - put_device(&cxld->dev);
> > - else
> > - rc = cxl_decoder_autoremove(dev, cxld);
> > if (rc) {
> > dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
> > cxld->hpa_range.start, cxld->hpa_range.end);
> > + put_device(&cxld->dev);
> > return 0;
>
> I think you will want to change this to 'return rc;' in order to reflect the
> error.
This is a good point also, and I can change it in v2, if there is an
agreement in this patch.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] cxl/acpi: Release device after dev_err
2023-07-08 1:07 ` Alison Schofield
@ 2023-07-10 15:55 ` Dave Jiang
0 siblings, 0 replies; 8+ messages in thread
From: Dave Jiang @ 2023-07-10 15:55 UTC (permalink / raw)
To: Alison Schofield
Cc: Breno Leitao, vishal.l.verma, ira.weiny, bwidawsk, dan.j.williams,
linux-cxl
On 7/7/23 18:07, Alison Schofield wrote:
> On Fri, Jul 07, 2023 at 05:33:02PM -0700, Dave Jiang wrote:
>>
>>
>> On 7/7/23 15:17, Alison Schofield wrote:
>>> On Fri, Jul 07, 2023 at 09:16:16AM -0700, Breno Leitao wrote:
>>>> Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add()
>>>> fails. Kfence drops this message, after the following:
>>>>
>>>> BUG: KFENCE: use-after-free read in resource_string
>>>>
>>>> This is happening in cxl_parse_cfmws(), and here is a simplified flow
>>>> that is coming from Kfence.
>>>>
>>>> Use-after-free:
>>>> _dev_err
>>>> cxl_parse_cfmws
>>>> acpi_table_parse_entries_array
>>>> acpi_table_parse_cedt
>>>> cxl_acpi_probe
>>>>
>>>> Free:
>>>> cxl_decoder_release
>>>> device_release
>>>> kobject_put
>>>> cxl_parse_cfmws
>>>> acpi_table_parse_entries_array
>>>> acpi_table_parse_cedt
>>>> cxl_acpi_probe
>>>>
>>>> Alloc:
>>>> cxl_decoder_alloc
>>>> cxl_parse_cfmws
>>>> acpi_table_parse_entries_array
>>>> acpi_table_parse_cedt
>>>> cxl_acpi_probe
>>>> platform_probe
>>>>
>>>> From my reading of the issue, the device struct being used by
>>>> dev_err() was removed in the put_device() before.
>>>
>>> Hi Breno,
>>>
>>> I'm not familiar w kfence, but I don't follow what it finds
>>> suspect here. Does kfence point to exact offensive lines of
>>> code, or ???
>>>
>>> The put_device() removed &cxld->dev and the dev_err() that
>>> this patch moves the put after, was using 'dev', which was
>>> assigned from ctx.dev. It is not the same as &cxld->dev. I
>>> wonder if Kfence thinks we can get to the next dev_dbg()
>>> statement and misuse &cxld->dev.
>>>
>>> More below...
>>>
>>>
>>>>
>>>> Put the device just after the message is printed.
>>>>
>>>> Signed-off-by: Breno Leitao <leitao@debian.org>
>>>> ---
>>>> drivers/cxl/acpi.c | 7 +++----
>>>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
>>>> index 658e6b84a769..5179bf4211d8 100644
>>>> --- a/drivers/cxl/acpi.c
>>>> +++ b/drivers/cxl/acpi.c
>>>> @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>>>> }
>>>> rc = cxl_decoder_add(cxld, target_map);
>>>> err_xormap:
>>>> - if (rc)
>>>> - put_device(&cxld->dev);
>>>> - else
>>>> - rc = cxl_decoder_autoremove(dev, cxld);
>>>> if (rc) {
>>>> dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
>>>> cxld->hpa_range.start, cxld->hpa_range.end);
>>>> + put_device(&cxld->dev);
>>>> return 0;
>>>> + } else {
>>>> + rc = cxl_decoder_autoremove(dev, cxld);
>>>> }
>>>> dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
>>>> dev_name(&cxld->dev),
>>>> --
>>>> 2.34.1
>>>>
>>>
>>> (pulled in fresh code snippet to get the dev_dbg() in view.)
>>>
>>>> }
>>>> rc = cxl_decoder_add(cxld, target_map);
>>>> err_xormap:
>>>> if (rc)
>>>> put_device(&cxld->dev);
>>>>
>>>
>>> This puts &cxld->dev, not dev.
>>>
>>>>
>>>> else
>>>> rc = cxl_decoder_autoremove(dev, cxld);
>>>> if (rc) {
>>>> dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
>>>> cxld->hpa_range.start, cxld->hpa_range.end);
>>>> return 0;
>>>
>>> This return avoids getting to the next dev_dbg() statement after
>>> put_device(). We cannot get to the next dev_dbg() statement when
>>> rc is non zero, but it seems kfence thinks we can.
>>>
>>>> }
>>>> dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
>>>> dev_name(&cxld->dev),
>>>
>>> If we could get here, after the put_device(), that would be bad.
>>>
>>>> phys_to_target_node(cxld->hpa_range.start),
>>>> cxld->hpa_range.start, cxld->hpa_range.end);
>>>>
>>>> return 0;
>>>
>>
>> Seems like the code can be cleaned up this way. The double check of if (rc)
>> is kinda weird anyhow.
>>
>> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
>> index 7e1765b09e04..0573b476d29c 100644
>> --- a/drivers/cxl/acpi.c
>> +++ b/drivers/cxl/acpi.c
>> @@ -291,21 +291,21 @@ static int cxl_parse_cfmws(union acpi_subtable_headers
>> *header, void *arg,
>> }
>> rc = cxl_decoder_add(cxld, target_map);
>> err_xormap:
>> - if (rc)
>> - put_device(&cxld->dev);
>> - else
>> - rc = cxl_decoder_autoremove(dev, cxld);
>> if (rc) {
>> dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
>> cxld->hpa_range.start, cxld->hpa_range.end);
>> - return 0;
>> + put_device(&cxld->dev);
>
> Why do you think the put_device() is making deref of cxld bad here?
> That cxld is still present.
The put_device() can trigger the ->release() of the cxld->dev and
therefore cxld may no longer be present.
>
> Maybe the unrelated bug here is that we are leaking the res.
> Rather that returning directly, take the kfree(res) path out.
>
>
>> + return rc;
>> }
>> +
>> + rc = cxl_decoder_autoremove(dev, cxld);
>> +
>
> Now this rc is not examined.
Yeah, should check here and return rc.
>
>
>> dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
>> dev_name(&cxld->dev),
>> phys_to_target_node(cxld->hpa_range.start),
>> cxld->hpa_range.start, cxld->hpa_range.end);
>>
>> - return 0;
>> + return rc;
>>
>> err_insert:
>> kfree(res->name);
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-07-10 15:56 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-07 16:16 [PATCH] cxl/acpi: Release device after dev_err Breno Leitao
2023-07-07 16:50 ` Dave Jiang
2023-07-10 10:49 ` Breno Leitao
2023-07-07 22:17 ` Alison Schofield
2023-07-08 0:33 ` Dave Jiang
2023-07-08 1:07 ` Alison Schofield
2023-07-10 15:55 ` Dave Jiang
2023-07-10 10:41 ` Breno Leitao
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox