* Re: [PATCH] intel_th: core: fix null pointer dereference in intel_th_irq
2025-08-25 17:45 [PATCH] intel_th: core: fix null pointer dereference in intel_th_irq David Arcari
@ 2025-09-04 17:41 ` David Arcari
2025-09-26 16:19 ` alex
` (2 subsequent siblings)
3 siblings, 0 replies; 12+ messages in thread
From: David Arcari @ 2025-09-04 17:41 UTC (permalink / raw)
To: linux-kernel; +Cc: Alexander Shishkin, Jerry Hoemann
Hi,
On 8/25/25 1:45 PM, David Arcari wrote:
> In certain cases intel_th_irq can reference a null entry in
> the th->thdev array. This results in the splat shown below.
> The problem is that intel_th_output_enable() can modify the
> thdev[] array at the same time intel_th_irq is referencing
> the same array. This can be fixed by disabling interrupts
> during the call to intel_th_output_enable().
>
> BUG: kernel NULL pointer dereference, address: 0000000000000304
> Oops: Oops: 0000 [#1] SMP NOPTI
> RIP: 0010:intel_th_irq+0x26/0x70 [intel_th]
> Call Trace:
> <IRQ>
> ? show_trace_log_lvl+0x1b0/0x2f0
> ? show_trace_log_lvl+0x1b0/0x2f0
> ? __handle_irq_event_percpu+0x4a/0x180
> ? __die_body.cold+0x8/0x12
> ? page_fault_oops+0x148/0x160
> ? exc_page_fault+0x73/0x160
> ? asm_exc_page_fault+0x26/0x30
> ? intel_th_irq+0x26/0x70 [intel_th]
> __handle_irq_event_percpu+0x4a/0x180
> handle_irq_event+0x38/0x80
> handle_fasteoi_irq+0x78/0x200
> __common_interrupt+0x3e/0x90
> common_interrupt+0x80/0xa0
> </IRQ>
>
> Fixes: a753bfcfdb1f ("intel_th: Make the switch allocate its subdevices")
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
> Signed-off-by: David Arcari <darcari@redhat.com>
> ---
> drivers/hwtracing/intel_th/core.c | 17 +++++++++++------
> 1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/hwtracing/intel_th/core.c b/drivers/hwtracing/intel_th/core.c
> index 47d9e6c3bac0..c6f6153fcc88 100644
> --- a/drivers/hwtracing/intel_th/core.c
> +++ b/drivers/hwtracing/intel_th/core.c
> @@ -715,7 +715,9 @@ intel_th_subdevice_alloc(struct intel_th *th,
> int intel_th_output_enable(struct intel_th *th, unsigned int otype)
> {
> struct intel_th_device *thdev;
> - int src = 0, dst = 0;
> + int src = 0, dst = 0, ret = 0;
> +
> + disable_irq(th->irq);
>
> for (src = 0, dst = 0; dst <= th->num_thdevs; src++, dst++) {
> for (; src < ARRAY_SIZE(intel_th_subdevices); src++) {
> @@ -730,7 +732,7 @@ int intel_th_output_enable(struct intel_th *th, unsigned int otype)
>
> /* no unallocated matching subdevices */
> if (src == ARRAY_SIZE(intel_th_subdevices))
> - return -ENODEV;
> + goto nodev;
>
> for (; dst < th->num_thdevs; dst++) {
> if (th->thdev[dst]->type != INTEL_TH_OUTPUT)
> @@ -750,16 +752,19 @@ int intel_th_output_enable(struct intel_th *th, unsigned int otype)
> goto found;
> }
>
> +nodev:
> + enable_irq(th->irq);
> return -ENODEV;
>
> found:
> thdev = intel_th_subdevice_alloc(th, &intel_th_subdevices[src]);
> if (IS_ERR(thdev))
> - return PTR_ERR(thdev);
> -
> - th->thdev[th->num_thdevs++] = thdev;
> + ret = PTR_ERR(thdev);
> + else
> + th->thdev[th->num_thdevs++] = thdev;
>
> - return 0;
> + enable_irq(th->irq);
> + return ret;
> }
> EXPORT_SYMBOL_GPL(intel_th_output_enable);
>
I suspect there may be a better approach to this problem, but I did
want to add that after extensive testing this did resolve the issue.
-DA
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] intel_th: core: fix null pointer dereference in intel_th_irq
2025-08-25 17:45 [PATCH] intel_th: core: fix null pointer dereference in intel_th_irq David Arcari
2025-09-04 17:41 ` David Arcari
@ 2025-09-26 16:19 ` alex
2025-09-26 20:12 ` David Arcari
2025-09-27 14:54 ` Markus Elfring
2025-11-18 21:21 ` [PATCH v2] " David Arcari
3 siblings, 1 reply; 12+ messages in thread
From: alex @ 2025-09-26 16:19 UTC (permalink / raw)
To: David Arcari; +Cc: linux-kernel, Jerry Hoemann, alexander.shishkin, alex
On 2025-08-25 20:45, David Arcari wrote:
> In certain cases intel_th_irq can reference a null entry in
> the th->thdev array. This results in the splat shown below.
> The problem is that intel_th_output_enable() can modify the
> thdev[] array at the same time intel_th_irq is referencing
> the same array. This can be fixed by disabling interrupts
> during the call to intel_th_output_enable().
Hi David,
Thank you for the bug report and rootcausing! Can you please also
detail the sequence of actions by which this is reproduced, so
that I can test my fix and not bother you with a back-and-forth
over-email debugging and also add it to our regression testing?
Doesn't have to be a shell script (although I wouldn't say no
to that), plain english would work in a pinch. If you have the
time, I'm also curious about your use case for intel_th.
This has eluded our testing for about 10 years, so I'm very
interested in the reproducer.
> BUG: kernel NULL pointer dereference, address: 0000000000000304
> Oops: Oops: 0000 [#1] SMP NOPTI
> RIP: 0010:intel_th_irq+0x26/0x70 [intel_th]
Yes, this is absolutely a bug.
> @@ -715,7 +715,9 @@ intel_th_subdevice_alloc(struct intel_th *th,
> int intel_th_output_enable(struct intel_th *th, unsigned int otype)
> {
> struct intel_th_device *thdev;
> - int src = 0, dst = 0;
> + int src = 0, dst = 0, ret = 0;
> +
> + disable_irq(th->irq);
>
> for (src = 0, dst = 0; dst <= th->num_thdevs; src++, dst++) {
> for (; src < ARRAY_SIZE(intel_th_subdevices); src++) {
[...]
> @@ -750,16 +752,19 @@ int intel_th_output_enable(struct intel_th *th,
> unsigned int otype)
> goto found;
> }
>
> +nodev:
> + enable_irq(th->irq);
> return -ENODEV;
>
> found:
> thdev = intel_th_subdevice_alloc(th, &intel_th_subdevices[src]);
> if (IS_ERR(thdev))
> - return PTR_ERR(thdev);
> -
> - th->thdev[th->num_thdevs++] = thdev;
> + ret = PTR_ERR(thdev);
> + else
> + th->thdev[th->num_thdevs++] = thdev;
>
> - return 0;
> + enable_irq(th->irq);
> + return ret;
> }
> EXPORT_SYMBOL_GPL(intel_th_output_enable);
This is indeed a possible fix, but I believe a little bit of
serialization can be employed here.
Lastly, my apologies for tardiness.
Thanks!
--
Alex
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] intel_th: core: fix null pointer dereference in intel_th_irq
2025-09-26 16:19 ` alex
@ 2025-09-26 20:12 ` David Arcari
0 siblings, 0 replies; 12+ messages in thread
From: David Arcari @ 2025-09-26 20:12 UTC (permalink / raw)
To: alex; +Cc: linux-kernel, Jerry Hoemann, alexander.shishkin
Hi Alex,
On 9/26/25 12:19 PM, alex@ash.works wrote:
> On 2025-08-25 20:45, David Arcari wrote:
>> In certain cases intel_th_irq can reference a null entry in
>> the th->thdev array. This results in the splat shown below.
>> The problem is that intel_th_output_enable() can modify the
>> thdev[] array at the same time intel_th_irq is referencing
>> the same array. This can be fixed by disabling interrupts
>> during the call to intel_th_output_enable().
>
> Hi David,
>
> Thank you for the bug report and rootcausing! Can you please also
> detail the sequence of actions by which this is reproduced, so
> that I can test my fix and not bother you with a back-and-forth
> over-email debugging and also add it to our regression testing?
> Doesn't have to be a shell script (although I wouldn't say no
> to that), plain english would work in a pinch. If you have the
> time, I'm also curious about your use case for intel_th.
Unfortunately, I don't have a great reproducer. I have a system which
is afflicted in ~ 1 out of every 100 reboots. Adding debug code made
the problem easier to reproduce.
>
> This has eluded our testing for about 10 years, so I'm very
> interested in the reproducer.
>
>> BUG: kernel NULL pointer dereference, address: 0000000000000304
>> Oops: Oops: 0000 [#1] SMP NOPTI
>> RIP: 0010:intel_th_irq+0x26/0x70 [intel_th]
>
> Yes, this is absolutely a bug.
>
>> @@ -715,7 +715,9 @@ intel_th_subdevice_alloc(struct intel_th *th,
>> int intel_th_output_enable(struct intel_th *th, unsigned int otype)
>> {
>> struct intel_th_device *thdev;
>> - int src = 0, dst = 0;
>> + int src = 0, dst = 0, ret = 0;
>> +
>> + disable_irq(th->irq);
>>
>> for (src = 0, dst = 0; dst <= th->num_thdevs; src++, dst++) {
>> for (; src < ARRAY_SIZE(intel_th_subdevices); src++) {
>
> [...]
>
>> @@ -750,16 +752,19 @@ int intel_th_output_enable(struct intel_th *th,
>> unsigned int otype)
>> goto found;
>> }
>>
>> +nodev:
>> + enable_irq(th->irq);
>> return -ENODEV;
>>
>> found:
>> thdev = intel_th_subdevice_alloc(th, &intel_th_subdevices[src]);
>> if (IS_ERR(thdev))
>> - return PTR_ERR(thdev);
>> -
>> - th->thdev[th->num_thdevs++] = thdev;
>> + ret = PTR_ERR(thdev);
>> + else
>> + th->thdev[th->num_thdevs++] = thdev;
>>
>> - return 0;
>> + enable_irq(th->irq);
>> + return ret;
>> }
>> EXPORT_SYMBOL_GPL(intel_th_output_enable);
>
> This is indeed a possible fix, but I believe a little bit of
> serialization can be employed here.
I was thinking there was a better approach. Given the situation if
you'd like me to test a fix, I can do so,
>
> Lastly, my apologies for tardiness.
No worries.
Best,
-DA
>
> Thanks!
> --
> Alex
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] intel_th: core: fix null pointer dereference in intel_th_irq
2025-08-25 17:45 [PATCH] intel_th: core: fix null pointer dereference in intel_th_irq David Arcari
2025-09-04 17:41 ` David Arcari
2025-09-26 16:19 ` alex
@ 2025-09-27 14:54 ` Markus Elfring
2025-09-29 12:05 ` David Arcari
2025-11-18 21:21 ` [PATCH v2] " David Arcari
3 siblings, 1 reply; 12+ messages in thread
From: Markus Elfring @ 2025-09-27 14:54 UTC (permalink / raw)
To: David Arcari, kernel-janitors; +Cc: LKML, Alexander Shishkin, Jerry Hoemann
…
> +++ b/drivers/hwtracing/intel_th/core.c
> @@ -715,7 +715,9 @@ intel_th_subdevice_alloc(struct intel_th *th,
> int intel_th_output_enable(struct intel_th *th, unsigned int otype)
> {
> struct intel_th_device *thdev;
> - int src = 0, dst = 0;
> + int src = 0, dst = 0, ret = 0;
> +
> + disable_irq(th->irq);
…
> - return 0;
> + enable_irq(th->irq);
> + return ret;
> }
…
How do you think about to increase the application of scope-based resource management?
https://elixir.bootlin.com/linux/v6.17-rc7/source/include/linux/interrupt.h#L239-L240
Regards,
Markus
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] intel_th: core: fix null pointer dereference in intel_th_irq
2025-09-27 14:54 ` Markus Elfring
@ 2025-09-29 12:05 ` David Arcari
0 siblings, 0 replies; 12+ messages in thread
From: David Arcari @ 2025-09-29 12:05 UTC (permalink / raw)
To: Markus Elfring, kernel-janitors; +Cc: LKML, Alexander Shishkin, Jerry Hoemann
Hi Markus,
On 9/27/25 10:54 AM, Markus Elfring wrote:
> …
>> +++ b/drivers/hwtracing/intel_th/core.c
>> @@ -715,7 +715,9 @@ intel_th_subdevice_alloc(struct intel_th *th,
>> int intel_th_output_enable(struct intel_th *th, unsigned int otype)
>> {
>> struct intel_th_device *thdev;
>> - int src = 0, dst = 0;
>> + int src = 0, dst = 0, ret = 0;
>> +
>> + disable_irq(th->irq);
> …
>> - return 0;
>> + enable_irq(th->irq);
>> + return ret;
>> }
> …
>
> How do you think about to increase the application of scope-based resource management?
> https://elixir.bootlin.com/linux/v6.17-rc7/source/include/linux/interrupt.h#L239-L240
At this point, I think that Alex is the best person to handle the
resolution of this issue.
Best,
-DA
>
> Regards,
> Markus
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2] intel_th: core: fix null pointer dereference in intel_th_irq
2025-08-25 17:45 [PATCH] intel_th: core: fix null pointer dereference in intel_th_irq David Arcari
` (2 preceding siblings ...)
2025-09-27 14:54 ` Markus Elfring
@ 2025-11-18 21:21 ` David Arcari
2025-11-19 12:55 ` Markus Elfring
3 siblings, 1 reply; 12+ messages in thread
From: David Arcari @ 2025-11-18 21:21 UTC (permalink / raw)
To: alexander.shishkin; +Cc: linux-kernel, jerry.hoemann, David Arcari
In certain cases intel_th_irq can reference a null entry in
the th->thdev array. This results in the splat shown below.
The problem is that intel_th_output_enable() can modify the
thdev[] array at the same time intel_th_irq is referencing
the same array. This can be fixed by disabling interrupts
during the call to intel_th_output_enable().
BUG: kernel NULL pointer dereference, address: 0000000000000304
Oops: Oops: 0000 [#1] SMP NOPTI
RIP: 0010:intel_th_irq+0x26/0x70 [intel_th]
Call Trace:
<IRQ>
? show_trace_log_lvl+0x1b0/0x2f0
? show_trace_log_lvl+0x1b0/0x2f0
? __handle_irq_event_percpu+0x4a/0x180
? __die_body.cold+0x8/0x12
? page_fault_oops+0x148/0x160
? exc_page_fault+0x73/0x160
? asm_exc_page_fault+0x26/0x30
? intel_th_irq+0x26/0x70 [intel_th]
__handle_irq_event_percpu+0x4a/0x180
handle_irq_event+0x38/0x80
handle_fasteoi_irq+0x78/0x200
__common_interrupt+0x3e/0x90
common_interrupt+0x80/0xa0
</IRQ>
Fixes: a753bfcfdb1f ("intel_th: Make the switch allocate its subdevices")
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: linux-kernel@vger.kernel.org
Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
Signed-off-by: David Arcari <darcari@redhat.com>
---
v2: switched to use scoped based lock guard functionality
drivers/hwtracing/intel_th/core.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/hwtracing/intel_th/core.c b/drivers/hwtracing/intel_th/core.c
index 47d9e6c3bac0..7b7dcc651efe 100644
--- a/drivers/hwtracing/intel_th/core.c
+++ b/drivers/hwtracing/intel_th/core.c
@@ -717,6 +717,8 @@ int intel_th_output_enable(struct intel_th *th, unsigned int otype)
struct intel_th_device *thdev;
int src = 0, dst = 0;
+ guard(disable_irq)(&th->irq);
+
for (src = 0, dst = 0; dst <= th->num_thdevs; src++, dst++) {
for (; src < ARRAY_SIZE(intel_th_subdevices); src++) {
if (intel_th_subdevices[src].type != INTEL_TH_OUTPUT)
--
2.51.0
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [PATCH v2] intel_th: core: fix null pointer dereference in intel_th_irq
2025-11-18 21:21 ` [PATCH v2] " David Arcari
@ 2025-11-19 12:55 ` Markus Elfring
2025-11-20 12:32 ` David Arcari
0 siblings, 1 reply; 12+ messages in thread
From: Markus Elfring @ 2025-11-19 12:55 UTC (permalink / raw)
To: David Arcari, kernel-janitors, Alexander Shishkin; +Cc: LKML, Jerry Hoemann
> In certain cases intel_th_irq can reference a null entry in
> the th->thdev array. This results in the splat shown below.
> The problem is that intel_th_output_enable() can modify the
> thdev[] array at the same time intel_th_irq is referencing
> the same array. This can be fixed by disabling interrupts
> during the call to intel_th_output_enable().
1. Would another imperative wording become helpful for an improved change description?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.18-rc6#n94
2. You may occasionally put more than 60 characters into text lines
of such a change description.
3. Would a summary phrase like “Prevent null pointer dereference
in intel_th_output_enable()” be more appropriate?
Regards,
Markus
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] intel_th: core: fix null pointer dereference in intel_th_irq
2025-11-19 12:55 ` Markus Elfring
@ 2025-11-20 12:32 ` David Arcari
2025-11-20 13:07 ` [v2] " Markus Elfring
0 siblings, 1 reply; 12+ messages in thread
From: David Arcari @ 2025-11-20 12:32 UTC (permalink / raw)
To: Markus Elfring, kernel-janitors, Alexander Shishkin; +Cc: LKML, Jerry Hoemann
Hi,
On 11/19/25 7:55 AM, Markus Elfring wrote:
>> In certain cases intel_th_irq can reference a null entry in
>> the th->thdev array. This results in the splat shown below.
>> The problem is that intel_th_output_enable() can modify the
>> thdev[] array at the same time intel_th_irq is referencing
>> the same array. This can be fixed by disabling interrupts
>> during the call to intel_th_output_enable().
>
> 1. Would another imperative wording become helpful for an improved change description?
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.18-rc6#n94
I feel like the description explains the problem well. However, do you
have alternate wording you would like to suggest?
>
> 2. You may occasionally put more than 60 characters into text lines
> of such a change description.
I can redo the body of the commit and submit a v3 if the maintainer is
interested in applying a patch of this nature.
>
> 3. Would a summary phrase like “Prevent null pointer dereference
> in intel_th_output_enable()” be more appropriate?
The null pointer deference occurs in intel_th_irq. So I could change it
to "Prevent null pointer dererference in intel_th_irq".
Before I do anything else with this patch I'd like to hear back from
Alexander. There's no reason to refactor a patch that won't be committed.
Thanks,
-DA
>
>
> Regards,
> Markus
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [v2] intel_th: core: fix null pointer dereference in intel_th_irq
2025-11-20 12:32 ` David Arcari
@ 2025-11-20 13:07 ` Markus Elfring
2025-11-20 13:22 ` David Arcari
0 siblings, 1 reply; 12+ messages in thread
From: Markus Elfring @ 2025-11-20 13:07 UTC (permalink / raw)
To: David Arcari; +Cc: LKML, kernel-janitors, Alexander Shishkin
…>>> the same array. This can be fixed by disabling interrupts
>>> during the call to intel_th_output_enable().
>>
>> 1. Would another imperative wording become helpful for an improved change description?
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.18-rc6#n94
>
> I feel like the description explains the problem well.
Perhaps?
> However, do you have alternate wording you would like to suggest?
Under which circumstances would you get into the mood to convert the wording approach
“can be fixed by” into an imperative one?
Regards,
Markus
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [v2] intel_th: core: fix null pointer dereference in intel_th_irq
2025-11-20 13:07 ` [v2] " Markus Elfring
@ 2025-11-20 13:22 ` David Arcari
2026-01-16 12:49 ` Alexander Shishkin
0 siblings, 1 reply; 12+ messages in thread
From: David Arcari @ 2025-11-20 13:22 UTC (permalink / raw)
To: Markus Elfring; +Cc: LKML, kernel-janitors, Alexander Shishkin
On 11/20/25 8:07 AM, Markus Elfring wrote:
> …>>> the same array. This can be fixed by disabling interrupts
>>>> during the call to intel_th_output_enable().
>>>
>>> 1. Would another imperative wording become helpful for an improved change description?
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.18-rc6#n94
>>
>> I feel like the description explains the problem well.
>
> Perhaps?
>
>
>> However, do you have alternate wording you would like to suggest?
>
> Under which circumstances would you get into the mood to convert the wording approach
> “can be fixed by” into an imperative one?
I see. I will make that change if a v3 is in order. Naturally, I'd like
to hear back from Alexander before putting more effort into this.
Best,
-DA
>
> Regards,
> Markus
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [v2] intel_th: core: fix null pointer dereference in intel_th_irq
2025-11-20 13:22 ` David Arcari
@ 2026-01-16 12:49 ` Alexander Shishkin
0 siblings, 0 replies; 12+ messages in thread
From: Alexander Shishkin @ 2026-01-16 12:49 UTC (permalink / raw)
To: David Arcari; +Cc: alexander.shishkin, LKML, kernel-janitors
David Arcari <darcari@redhat.com> writes:
> On 11/20/25 8:07 AM, Markus Elfring wrote:
>> …>>> the same array. This can be fixed by disabling interrupts
>>>>> during the call to intel_th_output_enable().
>>>>
>>>> 1. Would another imperative wording become helpful for an improved change description?
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.18-rc6#n94
>>>
>>> I feel like the description explains the problem well.
>>
>> Perhaps?
>>
>>
>>> However, do you have alternate wording you would like to suggest?
>>
>> Under which circumstances would you get into the mood to convert the wording approach
>> “can be fixed by” into an imperative one?
>
> I see. I will make that change if a v3 is in order. Naturally, I'd like
> to hear back from Alexander before putting more effort into this.
Sorry for the delay, I found a working intel_th, so I'm looking into it,
I'll get back to you shortly.
Thanks,
--
Alex
^ permalink raw reply [flat|nested] 12+ messages in thread