[PATCH RFC 0/1] ipmi: Fix double list_add when sender returns an error

public inbox for stable@vger.kernel.org

* [PATCH RFC 0/1] ipmi: Fix double list_add when sender returns an error
@ 2026-02-05 14:47 Kenta Akagi
  2026-02-05 14:47 ` [PATCH RFC 1/1] " Kenta Akagi
  2026-02-05 17:50 ` [PATCH RFC 0/1] " Corey Minyard
  0 siblings, 2 replies; 4+ messages in thread
From: Kenta Akagi @ 2026-02-05 14:47 UTC (permalink / raw)
  To: Corey Minyard; +Cc: openipmi-developer, linux-kernel, stable, Kenta Akagi

In kernel 6.18.7, we encountered the following panic.

    [164050.860241] list_add double add: new=ffff8a5833cd0000, prev=ffff8a5833cd0000, next=ffff8a387b2491b0.
    [164050.869744] ------------[ cut here ]------------
    [164050.874698] kernel BUG at lib/list_debug.c:35!
    [164050.879435] Oops: invalid opcode: 0000 [#1] SMP NOPTI
    [164050.884742] CPU: 5 UID: 0 PID: 99228 Comm: kworker/5:2 Kdump: loaded Tainted: G S          E       6.18.7-20260127.el9.x86_64 #1 PREEMPT(voluntary)
    [164050.899481] Tainted: [S]=CPU_OUT_OF_SPEC, [E]=UNSIGNED_MODULE
    [164050.905470] Hardware name: Dell Inc. PowerEdge R640/0X45NX, BIOS 2.15.1 06/15/2022
    [164050.913285] Workqueue: events smi_work [ipmi_msghandler]
    [164050.918865] RIP: 0010:__list_add_valid_or_report+0xb6/0xc0
    [164050.924609] Code: c7 e8 b1 c3 89 48 8b 16 48 89 f1 4c 89 e6 e8 e1 16 a9 ff 0f 0b 48 89 f2 4c 89 e1 48 89 fe 48 c7 c7 40 b2 c3 89 e8 ca 16 a9 ff <0f> 0b 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90
    [164050.943787] RSP: 0018:ffffceacac91fdc0 EFLAGS: 00010246
    [164050.949271] RAX: 0000000000000058 RBX: ffff8a5833cd0000 RCX: 0000000000000000
    [164050.956665] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8a773f89c1c0
    [164050.964054] RBP: ffff8a5833cd0000 R08: 0000000000000000 R09: ffffceacac91fc78
    [164050.971441] R10: ffffceacac91fc70 R11: ffffffff8a7e10c8 R12: ffff8a387b2491b0
    [164050.978837] R13: 0000000000000000 R14: ffff8a387b249190 R15: ffff8a387b2491b0
    [164050.986229] FS:  0000000000000000(0000) GS:ffff8a77b459d000(0000) knlGS:0000000000000000
    [164050.994581] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [164051.000597] CR2: 00007ff95841be6c CR3: 000000063b022001 CR4: 00000000007726f0
    [164051.007997] PKRU: 55555554
    [164051.010970] Call Trace:
    [164051.013690]  <TASK>
    [164051.016055]  ? mutex_lock+0xe/0x30
    [164051.019724]  deliver_response+0x59/0x100 [ipmi_msghandler]
    [164051.025495]  smi_work+0xa0/0x370 [ipmi_msghandler]
    [164051.030563]  process_one_work+0x19d/0x3d0
    [164051.034844]  worker_thread+0x23e/0x360
    [164051.038873]  ? __pfx_worker_thread+0x10/0x10
    [164051.043423]  kthread+0xfb/0x230
    [164051.046850]  ? __pfx_kthread+0x10/0x10
    [164051.050872]  ? __pfx_kthread+0x10/0x10
    [164051.054894]  ret_from_fork+0xe9/0x100
    [164051.058826]  ? __pfx_kthread+0x10/0x10
    [164051.062852]  ret_from_fork_asm+0x1a/0x30
    [164051.067065]  </TASK>
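The report says new == prev. With CONFIG_DEBUG_LIST, every list insertion is checked before it happens, and one of the checks in lib/list_debug.c rejects an entry that is already a neighbour of the insertion point. Below is a simplified userspace sketch of that check, not the kernel code itself. list_add_valid() and debug_list_add_tail() are hypothetical names, and the sketch assumes the entry is appended at the tail of the list. It shows why adding the same entry twice in a row produces exactly new == prev:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Minimal circular doubly linked list, modeled on <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/*
 * Simplified stand-in for the CONFIG_DEBUG_LIST checks: reject the
 * insertion and print a report if the neighbours are inconsistent or if
 * the new entry is already one of its own neighbours.
 */
static bool list_add_valid(struct list_head *new, struct list_head *prev,
			   struct list_head *next)
{
	assert(prev != NULL && next != NULL);
	if (next->prev != prev || prev->next != next) {
		printf("list_add corruption\n");
		return false;
	}
	if (new == prev || new == next) {
		printf("list_add double add: new=%p, prev=%p, next=%p.\n",
		       (void *)new, (void *)prev, (void *)next);
		return false;
	}
	return true;
}

/* Append 'new' just before 'head' (i.e. at the tail), like list_add_tail(). */
static bool debug_list_add_tail(struct list_head *new, struct list_head *head)
{
	struct list_head *prev = head->prev;

	if (!list_add_valid(new, prev, head))
		return false;	/* the kernel would BUG() here instead */
	new->next = head;
	new->prev = prev;
	prev->next = new;
	head->prev = new;
	return true;
}
```

On the second call with the same node, prev is head->prev, which is the node itself, so the check fails with new == prev and next == head. That is the same pattern as new=prev=ffff8a5833cd0000 in the report above.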

Because kdump was not properly configured, I was unable to inspect the
vmcore, but based on the oops and the current implementation, I infer
that the issue occurred via the following mechanism.

- The BMC becomes unstable.
- A message is queued on (hp_)xmit_msgs and smi_work() runs.
- Because the BMC is unstable, intf->handlers->sender() returns an error.
- deliver_err_response() queues newmsg->recv_msg onto intf->user_msgs.
- smi_work() jumps back to restart, but because intf->curr_msg is still
  non-NULL (nothing cleared it), nothing is dequeued from
  (hp_)xmit_msgs.
- newmsg still points to the same message as before the restart, so it
  goes through the same flow again and deliver_err_response() queues
  the same recv_msg a second time, leading to a double add.
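The sequence above can be modeled in userspace. In this sketch all names (struct smi_intf, queue_msg(), failing_sender(), times_queued, smi_work_buggy()) are hypothetical simplifications: locking, intf->in_shutdown and the high-priority queue are left out, and a counter stands in for the user-message list. It mirrors only the goto-restart control flow described above, with a sender that always fails:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for ipmi_recv_msg, ipmi_smi_msg and ipmi_smi. */
struct recv_msg {
	int times_queued;	/* how many times it was put on the user list */
};

struct smi_msg {
	struct recv_msg *recv_msg;
};

struct smi_intf {
	struct smi_msg *xmit[4];	/* pending messages, FIFO */
	size_t head, tail;
	struct smi_msg *curr_msg;
	bool double_add;		/* set where the kernel would BUG() */
};

/* Append msg to the pending queue. */
static void queue_msg(struct smi_intf *intf, struct smi_msg *msg)
{
	assert(intf->tail < sizeof(intf->xmit) / sizeof(intf->xmit[0]));
	intf->xmit[intf->tail++] = msg;
}

/* The BMC is unstable: every send attempt fails. */
static int failing_sender(struct smi_msg *msg)
{
	(void)msg;
	return -1;
}

/* Stand-in for deliver_err_response(): queue recv_msg for the user. */
static void deliver_err_response(struct smi_intf *intf, struct recv_msg *recv)
{
	if (recv->times_queued > 0)
		intf->double_add = true;	/* "list_add double add" */
	recv->times_queued++;
}

/* Control flow of smi_work() before the fix, heavily simplified. */
static void smi_work_buggy(struct smi_intf *intf)
{
	struct smi_msg *newmsg = NULL;	/* initialised once, not per restart */

restart:
	if (intf->double_add)
		return;			/* the real kernel has already oopsed */
	if (intf->curr_msg == NULL && intf->head != intf->tail) {
		newmsg = intf->xmit[intf->head++];
		intf->curr_msg = newmsg;
	}
	if (newmsg && failing_sender(newmsg)) {
		deliver_err_response(intf, newmsg->recv_msg);
		/* curr_msg and newmsg are left untouched */
		goto restart;
	}
}
```

With one queued message, the first pass dequeues it and reports the error. On the second pass curr_msg is still set, so nothing is dequeued, newmsg still points at the same message, and its recv_msg is queued again (times_queued == 2).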

A quick look at the BMC logs showed a watchdog-triggered BMC reset
around the time of the panic, so I'm fairly sure the BMC was unstable.

I'm not sure whether this is the correct approach, but I am submitting
this RFC patch in the spirit of a bug report, and I would appreciate
your feedback. Feel free to discard it entirely and fix the issue in a
separate patch if you prefer.

Thanks.

 
Kenta Akagi (1):
  ipmi: Fix double list_add when sender returns an error

 drivers/char/ipmi/ipmi_msghandler.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

-- 
2.50.1



* [PATCH RFC 1/1] ipmi: Fix double list_add when sender returns an error
  2026-02-05 14:47 [PATCH RFC 0/1] ipmi: Fix double list_add when sender returns an error Kenta Akagi
@ 2026-02-05 14:47 ` Kenta Akagi
  2026-02-05 17:50 ` [PATCH RFC 0/1] " Corey Minyard
  1 sibling, 0 replies; 4+ messages in thread
From: Kenta Akagi @ 2026-02-05 14:47 UTC (permalink / raw)
  To: Corey Minyard; +Cc: openipmi-developer, linux-kernel, stable, Kenta Akagi

Since commit 9cf93a8fa951 ("ipmi: Allow an SMI sender to return an
error"), the sender can return an error when the BMC does not respond,
in which case smi_work() jumps back to the restart label.

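However, neither intf->curr_msg nor the local newmsg is cleared before
jumping back, so the same message is processed again after the restart.
If the sender fails again, deliver_err_response() adds the same recv_msg
to the list a second time, which results in the following panic: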

[164050.860241] list_add double add: new=ffff8a5833cd0000, prev=ffff8a5833cd0000, next=ffff8a387b2491b0.
[164050.869744] ------------[ cut here ]------------
[164050.874698] kernel BUG at lib/list_debug.c:35!
[164050.879435] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[164050.884742] CPU: 5 UID: 0 PID: 99228 Comm: kworker/5:2 Kdump: loaded Tainted: G S          E       6.18.7-20260127.el9.x86_64 #1 PREEMPT(voluntary)
[164050.899481] Tainted: [S]=CPU_OUT_OF_SPEC, [E]=UNSIGNED_MODULE
[164050.905470] Hardware name: Dell Inc. PowerEdge R640/0X45NX, BIOS 2.15.1 06/15/2022
[164050.913285] Workqueue: events smi_work [ipmi_msghandler]
[164050.918865] RIP: 0010:__list_add_valid_or_report+0xb6/0xc0
[164050.924609] Code: c7 e8 b1 c3 89 48 8b 16 48 89 f1 4c 89 e6 e8 e1 16 a9 ff 0f 0b 48 89 f2 4c 89 e1 48 89 fe 48 c7 c7 40 b2 c3 89 e8 ca 16 a9 ff <0f> 0b 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90
[164050.943787] RSP: 0018:ffffceacac91fdc0 EFLAGS: 00010246
[164050.949271] RAX: 0000000000000058 RBX: ffff8a5833cd0000 RCX: 0000000000000000
[164050.956665] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8a773f89c1c0
[164050.964054] RBP: ffff8a5833cd0000 R08: 0000000000000000 R09: ffffceacac91fc78
[164050.971441] R10: ffffceacac91fc70 R11: ffffffff8a7e10c8 R12: ffff8a387b2491b0
[164050.978837] R13: 0000000000000000 R14: ffff8a387b249190 R15: ffff8a387b2491b0
[164050.986229] FS:  0000000000000000(0000) GS:ffff8a77b459d000(0000) knlGS:0000000000000000
[164050.994581] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[164051.000597] CR2: 00007ff95841be6c CR3: 000000063b022001 CR4: 00000000007726f0
[164051.007997] PKRU: 55555554
[164051.010970] Call Trace:
[164051.013690]  <TASK>
[164051.016055]  ? mutex_lock+0xe/0x30
[164051.019724]  deliver_response+0x59/0x100 [ipmi_msghandler]
[164051.025495]  smi_work+0xa0/0x370 [ipmi_msghandler]
[164051.030563]  process_one_work+0x19d/0x3d0
[164051.034844]  worker_thread+0x23e/0x360
[164051.038873]  ? __pfx_worker_thread+0x10/0x10
[164051.043423]  kthread+0xfb/0x230
[164051.046850]  ? __pfx_kthread+0x10/0x10
[164051.050872]  ? __pfx_kthread+0x10/0x10
[164051.054894]  ret_from_fork+0xe9/0x100
[164051.058826]  ? __pfx_kthread+0x10/0x10
[164051.062852]  ret_from_fork_asm+0x1a/0x30
[164051.067065]  </TASK>

Reset newmsg to NULL at the restart label and clear intf->curr_msg after
a send failure, so that the next message is dequeued from the queue upon
restart.
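A rough userspace model of the restart loop with both changes from this patch applied follows. All names (struct smi_intf, queue_msg(), failing_sender(), times_queued, smi_work_fixed()) are hypothetical simplifications; locking, intf->in_shutdown and hp_xmit_msgs are omitted, a counter stands in for the user-message list, and it sketches only the approach taken in this patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for ipmi_recv_msg, ipmi_smi_msg and ipmi_smi. */
struct recv_msg {
	int times_queued;	/* how many times it was put on the user list */
};

struct smi_msg {
	struct recv_msg *recv_msg;
};

struct smi_intf {
	struct smi_msg *xmit[4];	/* pending messages, FIFO */
	size_t head, tail;
	struct smi_msg *curr_msg;
	bool double_add;		/* set where the kernel would BUG() */
};

/* Append msg to the pending queue. */
static void queue_msg(struct smi_intf *intf, struct smi_msg *msg)
{
	assert(intf->tail < sizeof(intf->xmit) / sizeof(intf->xmit[0]));
	intf->xmit[intf->tail++] = msg;
}

/* The BMC never answers: every send attempt fails. */
static int failing_sender(struct smi_msg *msg)
{
	(void)msg;
	return -1;
}

/* Stand-in for deliver_err_response(): queue recv_msg for the user. */
static void deliver_err_response(struct smi_intf *intf, struct recv_msg *recv)
{
	if (recv->times_queued > 0)
		intf->double_add = true;	/* "list_add double add" */
	recv->times_queued++;
}

/* smi_work() control flow with the two resets from this patch. */
static void smi_work_fixed(struct smi_intf *intf)
{
	struct smi_msg *newmsg;

restart:
	newmsg = NULL;			/* reset on every pass */
	if (intf->curr_msg == NULL && intf->head != intf->tail) {
		newmsg = intf->xmit[intf->head++];
		intf->curr_msg = newmsg;
	}
	if (newmsg && failing_sender(newmsg)) {
		deliver_err_response(intf, newmsg->recv_msg);
		intf->curr_msg = NULL;	/* allow the next dequeue */
		goto restart;
	}
}
```

With two queued messages and a sender that always fails, each recv_msg is reported exactly once, the queue drains, and the loop exits once there is nothing left to dequeue.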

Cc: stable@vger.kernel.org
Fixes: 9cf93a8fa951 ("ipmi: Allow an SMI sender to return an error")
Signed-off-by: Kenta Akagi <k@mgml.me>
---
 drivers/char/ipmi/ipmi_msghandler.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
index 3f48fc6ab596..17242b3cf53d 100644
--- a/drivers/char/ipmi/ipmi_msghandler.c
+++ b/drivers/char/ipmi/ipmi_msghandler.c
@@ -4814,7 +4814,7 @@ static void smi_work(struct work_struct *t)
 	unsigned long flags = 0; /* keep us warning-free. */
 	struct ipmi_smi *intf = from_work(intf, t, smi_work);
 	int run_to_completion = READ_ONCE(intf->run_to_completion);
-	struct ipmi_smi_msg *newmsg = NULL;
+	struct ipmi_smi_msg *newmsg;
 	struct ipmi_recv_msg *msg, *msg2;
 	int cc;
 
@@ -4826,6 +4826,7 @@ static void smi_work(struct work_struct *t)
 	 * message delivery.
 	 */
 restart:
+	newmsg = NULL;
 	if (!run_to_completion)
 		spin_lock_irqsave(&intf->xmit_msgs_lock, flags);
 	if (intf->curr_msg == NULL && !intf->in_shutdown) {
@@ -4854,6 +4855,7 @@ static void smi_work(struct work_struct *t)
 						     newmsg->recv_msg, cc);
 			else
 				ipmi_free_smi_msg(newmsg);
+			intf->curr_msg = NULL;
 			goto restart;
 		}
 	}
-- 
2.50.1



* Re: [PATCH RFC 0/1] ipmi: Fix double list_add when sender returns an error
  2026-02-05 14:47 [PATCH RFC 0/1] ipmi: Fix double list_add when sender returns an error Kenta Akagi
  2026-02-05 14:47 ` [PATCH RFC 1/1] " Kenta Akagi
@ 2026-02-05 17:50 ` Corey Minyard
  2026-02-06  8:23   ` Kenta Akagi
  1 sibling, 1 reply; 4+ messages in thread
From: Corey Minyard @ 2026-02-05 17:50 UTC (permalink / raw)
  To: Kenta Akagi; +Cc: openipmi-developer, linux-kernel, stable

On Thu, Feb 05, 2026 at 11:47:38PM +0900, Kenta Akagi wrote:
> In kernel 6.18.7, we encountered the following panic.
> 
> [oops trace snipped]
> 
> Because kdump was not properly configured, I was unable to inspect the
> vmcore, but based on the oops and the current implementation, I infer
> that the issue occurred via the following mechanism.

A fix for this is already queued in the next tree.  I should have it
out soon.

-corey



* Re: [PATCH RFC 0/1] ipmi: Fix double list_add when sender returns an error
  2026-02-05 17:50 ` [PATCH RFC 0/1] " Corey Minyard
@ 2026-02-06  8:23   ` Kenta Akagi
  0 siblings, 0 replies; 4+ messages in thread
From: Kenta Akagi @ 2026-02-06  8:23 UTC (permalink / raw)
  To: corey; +Cc: k, openipmi-developer, linux-kernel, stable



On 2026/02/06 2:50, Corey Minyard wrote:
> On Thu, Feb 05, 2026 at 11:47:38PM +0900, Kenta Akagi wrote:
>> In kernel 6.18.7, we encountered the following panic.
>>
>> [oops trace snipped]
>>
>> Because kdump was not properly configured, I was unable to inspect the
>> vmcore, but based on the oops and the current implementation, I infer
>> that the issue occurred via the following mechanism.
> 
> A fix for this is already queued in the next tree.  I should have it
> out soon.

Ah, sorry, I didn't notice that.
I'll wait for "ipmi: Fix use-after-free and list corruption on sender error".

Thanks,
Akagi


