linux-serial.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [linus:master] [serial]  43066e3222: BUG:kernel_NULL_pointer_dereference,address
@ 2024-04-03 13:43 kernel test robot
  2024-04-03 14:00 ` Andy Shevchenko
  0 siblings, 1 reply; 7+ messages in thread
From: kernel test robot @ 2024-04-03 13:43 UTC (permalink / raw)
  To: Yicong Yang
  Cc: oe-lkp, lkp, linux-kernel, Greg Kroah-Hartman, stable,
	Tony Lindgren, Andy Shevchenko, linux-serial, oliver.sang



Hello,

kernel test robot noticed "BUG:kernel_NULL_pointer_dereference,address" on:

commit: 43066e32227ecde674e8ae1fcdd4a1ede67680c2 ("serial: port: Don't suspend if the port is still busy")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master 026e680b0a08a62b1d948e5a8ca78700bfac0e6e]
[test failed on linux-next/master c0b832517f627ead3388c6f0c74e8ac10ad5774b]

in testcase: boot

compiler: clang-17
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


in our tests, the issue does not always happen, 15 times out of 200 runs as
below.

e5d6bd25f93d6ae1 43066e32227ecde674e8ae1fcdd
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :200          8%          15:200   dmesg.BUG:kernel_NULL_pointer_dereference,address
           :200          8%          15:200   dmesg.EIP:serial8250_tx_chars
           :200          8%          15:200   dmesg.Kernel_panic-not_syncing:Fatal_exception
           :200          8%          15:200   dmesg.Oops:#[##]



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202404031607.2e92eebe-lkp@intel.com


[   66.918163][   T61] BUG: kernel NULL pointer dereference, address: 00000cf5
[   66.919684][   T61] #PF: supervisor read access in kernel mode
[   66.920890][   T61] #PF: error_code(0x0000) - not-present page
[   66.922103][   T61] *pdpt = 000000002c4f1001 *pde = 0000000000000000
[   66.923402][   T61] Oops: 0000 [#1] PREEMPT SMP PTI
[   66.924417][   T61] CPU: 0 PID: 61 Comm: kworker/0:2 Tainted: G        W       TN 6.8.0-rc6-00003-g43066e32227e #1
[   66.926478][   T61] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[   66.928396][   T61] Workqueue: pm pm_runtime_work
[ 66.929411][ T61] EIP: serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809) 
[ 66.930475][ T61] Code: 10 02 00 00 75 09 89 f1 e8 b7 01 00 00 eb db 8b 9e d0 01 00 00 89 7d f0 90 90 90 90 90 90 90 90 90 90 90 90 8b 8f 0c 02 00 00 <0f> b6 0c 01 8b 7e 30 89 f0 31 d2 e8 8a 49 51 00 f6 86 cc 01 00 00
All code
========
   0:	10 02                	adc    %al,(%rdx)
   2:	00 00                	add    %al,(%rax)
   4:	75 09                	jne    0xf
   6:	89 f1                	mov    %esi,%ecx
   8:	e8 b7 01 00 00       	call   0x1c4
   d:	eb db                	jmp    0xffffffffffffffea
   f:	8b 9e d0 01 00 00    	mov    0x1d0(%rsi),%ebx
  15:	89 7d f0             	mov    %edi,-0x10(%rbp)
  18:	90                   	nop
  19:	90                   	nop
  1a:	90                   	nop
  1b:	90                   	nop
  1c:	90                   	nop
  1d:	90                   	nop
  1e:	90                   	nop
  1f:	90                   	nop
  20:	90                   	nop
  21:	90                   	nop
  22:	90                   	nop
  23:	90                   	nop
  24:	8b 8f 0c 02 00 00    	mov    0x20c(%rdi),%ecx
  2a:*	0f b6 0c 01          	movzbl (%rcx,%rax,1),%ecx		<-- trapping instruction
  2e:	8b 7e 30             	mov    0x30(%rsi),%edi
  31:	89 f0                	mov    %esi,%eax
  33:	31 d2                	xor    %edx,%edx
  35:	e8 8a 49 51 00       	call   0x5149c4
  3a:	f6                   	.byte 0xf6
  3b:	86 cc                	xchg   %cl,%ah
  3d:	01 00                	add    %eax,(%rax)
	...

Code starting with the faulting instruction
===========================================
   0:	0f b6 0c 01          	movzbl (%rcx,%rax,1),%ecx
   4:	8b 7e 30             	mov    0x30(%rsi),%edi
   7:	89 f0                	mov    %esi,%eax
   9:	31 d2                	xor    %edx,%edx
   b:	e8 8a 49 51 00       	call   0x51499a
  10:	f6                   	.byte 0xf6
  11:	86 cc                	xchg   %cl,%ah
  13:	01 00                	add    %eax,(%rax)
	...
[   66.934169][   T61] EAX: 00000cf5 EBX: 00000010 ECX: 00000000 EDX: 00000000
[   66.935518][   T61] ESI: c33128e0 EDI: c35b824c EBP: c37e7e50 ESP: c37e7e40
[   66.936922][   T61] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010087
[   66.938425][   T61] CR0: 80050033 CR2: 00000cf5 CR3: 28938000 CR4: 000406b0
[   66.939773][   T61] Call Trace:
[ 66.940519][ T61] ? __die_body (arch/x86/kernel/dumpstack.c:478 arch/x86/kernel/dumpstack.c:420) 
[ 66.941445][ T61] ? __die (arch/x86/kernel/dumpstack.c:434) 
[ 66.942304][ T61] ? page_fault_oops (arch/x86/mm/fault.c:703) 
[ 66.943233][ T61] ? kernelmode_fixup_or_oops (arch/x86/mm/fault.c:761) 
[ 66.944260][ T61] ? __bad_area_nosemaphore (arch/x86/mm/fault.c:808) 
[ 66.945368][ T61] ? bad_area_nosemaphore (arch/x86/mm/fault.c:857) 
[ 66.946377][ T61] ? do_user_addr_fault (arch/x86/mm/fault.c:?) 
[ 66.947406][ T61] ? exc_page_fault (arch/x86/include/asm/irqflags.h:19 arch/x86/include/asm/irqflags.h:67 arch/x86/include/asm/irqflags.h:127 arch/x86/mm/fault.c:1506 arch/x86/mm/fault.c:1554) 
[ 66.948424][ T61] ? pvclock_clocksource_read_nowd (arch/x86/mm/fault.c:1511) 
[ 66.949709][ T61] ? handle_exception (arch/x86/entry/entry_32.S:1058) 
[ 66.950749][ T61] ? pvclock_clocksource_read_nowd (arch/x86/mm/fault.c:1511) 
[ 66.951963][ T61] ? serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809) 
[ 66.953001][ T61] ? pvclock_clocksource_read_nowd (arch/x86/mm/fault.c:1511) 
[ 66.954188][ T61] ? serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809) 
[ 66.955245][ T61] __start_tx (drivers/tty/serial/8250/8250_port.c:1551) 
[ 66.956129][ T61] serial8250_start_tx (drivers/tty/serial/8250/8250_port.c:1654) 
[ 66.957146][ T61] serial_port_runtime_suspend (include/linux/spinlock.h:406 include/linux/serial_core.h:667 drivers/tty/serial/serial_port.c:63) 
[ 66.958275][ T61] __rpm_callback (drivers/base/power/runtime.c:393) 
[ 66.959234][ T61] ? serial_port_remove (drivers/tty/serial/serial_port.c:50) 
[ 66.960282][ T61] ? serial_port_remove (drivers/tty/serial/serial_port.c:50) 
[ 66.961297][ T61] rpm_suspend (drivers/base/power/runtime.c:447) 
[ 66.962165][ T61] ? process_one_work (kernel/workqueue.c:?) 
[ 66.963182][ T61] pm_runtime_work (include/linux/spinlock.h:401 drivers/base/power/runtime.c:983) 
[ 66.964177][ T61] process_one_work (kernel/workqueue.c:2638) 
[ 66.965223][ T61] worker_thread (kernel/workqueue.c:2700 kernel/workqueue.c:2787) 
[ 66.966246][ T61] kthread (kernel/kthread.c:390) 
[ 66.967118][ T61] ? pr_cont_work (kernel/workqueue.c:2733) 
[ 66.968018][ T61] ? kthread_unuse_mm (kernel/kthread.c:341) 
[ 66.968932][ T61] ? kthread_unuse_mm (kernel/kthread.c:341) 
[ 66.969891][ T61] ret_from_fork (arch/x86/kernel/process.c:153) 
[ 66.970813][ T61] ret_from_fork_asm (arch/x86/entry/entry_32.S:741) 
[ 66.971755][ T61] entry_INT80_32 (arch/x86/entry/entry_32.S:948) 
[   66.972650][   T61] Modules linked in: input_leds aesni_intel crypto_simd evdev button drm fuse drm_panel_orientation_quirks
[   66.974751][   T61] CR2: 0000000000000cf5
[   66.975631][   T61] ---[ end trace 0000000000000000 ]---
[ 66.976733][ T61] EIP: serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809) 
[ 66.977784][ T61] Code: 10 02 00 00 75 09 89 f1 e8 b7 01 00 00 eb db 8b 9e d0 01 00 00 89 7d f0 90 90 90 90 90 90 90 90 90 90 90 90 8b 8f 0c 02 00 00 <0f> b6 0c 01 8b 7e 30 89 f0 31 d2 e8 8a 49 51 00 f6 86 cc 01 00 00
All code
========
   0:	10 02                	adc    %al,(%rdx)
   2:	00 00                	add    %al,(%rax)
   4:	75 09                	jne    0xf
   6:	89 f1                	mov    %esi,%ecx
   8:	e8 b7 01 00 00       	call   0x1c4
   d:	eb db                	jmp    0xffffffffffffffea
   f:	8b 9e d0 01 00 00    	mov    0x1d0(%rsi),%ebx
  15:	89 7d f0             	mov    %edi,-0x10(%rbp)
  18:	90                   	nop
  19:	90                   	nop
  1a:	90                   	nop
  1b:	90                   	nop
  1c:	90                   	nop
  1d:	90                   	nop
  1e:	90                   	nop
  1f:	90                   	nop
  20:	90                   	nop
  21:	90                   	nop
  22:	90                   	nop
  23:	90                   	nop
  24:	8b 8f 0c 02 00 00    	mov    0x20c(%rdi),%ecx
  2a:*	0f b6 0c 01          	movzbl (%rcx,%rax,1),%ecx		<-- trapping instruction
  2e:	8b 7e 30             	mov    0x30(%rsi),%edi
  31:	89 f0                	mov    %esi,%eax
  33:	31 d2                	xor    %edx,%edx
  35:	e8 8a 49 51 00       	call   0x5149c4
  3a:	f6                   	.byte 0xf6
  3b:	86 cc                	xchg   %cl,%ah
  3d:	01 00                	add    %eax,(%rax)
	...

Code starting with the faulting instruction
===========================================
   0:	0f b6 0c 01          	movzbl (%rcx,%rax,1),%ecx
   4:	8b 7e 30             	mov    0x30(%rsi),%edi
   7:	89 f0                	mov    %esi,%eax
   9:	31 d2                	xor    %edx,%edx
   b:	e8 8a 49 51 00       	call   0x51499a
  10:	f6                   	.byte 0xf6
  11:	86 cc                	xchg   %cl,%ah
  13:	01 00                	add    %eax,(%rax)


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240403/202404031607.2e92eebe-lkp@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [linus:master] [serial]  43066e3222: BUG:kernel_NULL_pointer_dereference,address
  2024-04-03 13:43 [linus:master] [serial] 43066e3222: BUG:kernel_NULL_pointer_dereference,address kernel test robot
@ 2024-04-03 14:00 ` Andy Shevchenko
  2024-04-04  6:54   ` Tony Lindgren
  0 siblings, 1 reply; 7+ messages in thread
From: Andy Shevchenko @ 2024-04-03 14:00 UTC (permalink / raw)
  To: kernel test robot
  Cc: Yicong Yang, oe-lkp, lkp, linux-kernel, Greg Kroah-Hartman,
	stable, Tony Lindgren, linux-serial

On Wed, Apr 03, 2024 at 09:43:28PM +0800, kernel test robot wrote:

> kernel test robot noticed "BUG:kernel_NULL_pointer_dereference,address" on:
> 
> commit: 43066e32227ecde674e8ae1fcdd4a1ede67680c2 ("serial: port: Don't suspend if the port is still busy")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

Ja-ja, I am investigating this issue for a while (not that I spend all my time
on it, though) on max3100.

The problem seems that we enable PM runtime on all serial ports
(even if they have no RPM support), but at the same time we enforce
the common serial_port PM callbacks and that change misses a detail
on how to check the port status as the real callback may happen
quite after the port being closed (/dev/ttySx).

Probably we need to check if the port is still open...

P.S. AFAIU the state / xmit is invalid pointer or so.

Culprit line:	serial_out(up, UART_TX, xmit->buf[xmit->tail]);

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [linus:master] [serial]  43066e3222: BUG:kernel_NULL_pointer_dereference,address
  2024-04-03 14:00 ` Andy Shevchenko
@ 2024-04-04  6:54   ` Tony Lindgren
  2024-04-04 11:59     ` Andy Shevchenko
  2024-04-04 14:34     ` Andy Shevchenko
  0 siblings, 2 replies; 7+ messages in thread
From: Tony Lindgren @ 2024-04-04  6:54 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: kernel test robot, Yicong Yang, oe-lkp, lkp, linux-kernel,
	Greg Kroah-Hartman, stable, linux-serial

* Andy Shevchenko <andriy.shevchenko@linux.intel.com> [240403 14:00]:
> Probably we need to check if the port is still open...
> 
> P.S. AFAIU the state / xmit is invalid pointer or so.
> 
> Culprit line:	serial_out(up, UART_TX, xmit->buf[xmit->tail]);

Maybe we can set UPF_DEAD a bit earlier as below?

Regards,

Tony

8< ---------------------------
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -3442,8 +3442,6 @@ void serial_core_unregister_port(struct uart_driver *drv, struct uart_port *port
 
 	mutex_lock(&port_mutex);
 
-	port->flags |= UPF_DEAD;
-
 	serial_core_remove_one_port(drv, port);
 
 	/* Note that struct uart_port *port is no longer valid at this point */
diff --git a/drivers/tty/serial/serial_port.c b/drivers/tty/serial/serial_port.c
index 22b9eeb23e68a..1571ef721ef10 100644
--- a/drivers/tty/serial/serial_port.c
+++ b/drivers/tty/serial/serial_port.c
@@ -104,6 +104,7 @@ EXPORT_SYMBOL(uart_add_one_port);
 
 void uart_remove_one_port(struct uart_driver *drv, struct uart_port *port)
 {
+	port->flags |= UPF_DEAD;
 	serial_ctrl_unregister_port(drv, port);
 }
 EXPORT_SYMBOL(uart_remove_one_port);

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [linus:master] [serial]  43066e3222: BUG:kernel_NULL_pointer_dereference,address
  2024-04-04  6:54   ` Tony Lindgren
@ 2024-04-04 11:59     ` Andy Shevchenko
  2024-04-04 14:42       ` Andy Shevchenko
  2024-04-04 14:34     ` Andy Shevchenko
  1 sibling, 1 reply; 7+ messages in thread
From: Andy Shevchenko @ 2024-04-04 11:59 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: kernel test robot, Yicong Yang, oe-lkp, lkp, linux-kernel,
	Greg Kroah-Hartman, stable, linux-serial

On Thu, Apr 04, 2024 at 09:54:15AM +0300, Tony Lindgren wrote:
> * Andy Shevchenko <andriy.shevchenko@linux.intel.com> [240403 14:00]:
> > Probably we need to check if the port is still open...
> > 
> > P.S. AFAIU the state / xmit is invalid pointer or so.
> > 
> > Culprit line:	serial_out(up, UART_TX, xmit->buf[xmit->tail]);

xmit.buf seems to be NULL, that's why this fails.

> Maybe we can set UPF_DEAD a bit earlier as below?

Lemme perform some tests later today.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [linus:master] [serial]  43066e3222: BUG:kernel_NULL_pointer_dereference,address
  2024-04-04  6:54   ` Tony Lindgren
  2024-04-04 11:59     ` Andy Shevchenko
@ 2024-04-04 14:34     ` Andy Shevchenko
  2024-04-05  5:46       ` Tony Lindgren
  1 sibling, 1 reply; 7+ messages in thread
From: Andy Shevchenko @ 2024-04-04 14:34 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: kernel test robot, Yicong Yang, oe-lkp, lkp, linux-kernel,
	Greg Kroah-Hartman, stable, linux-serial

On Thu, Apr 04, 2024 at 09:54:15AM +0300, Tony Lindgren wrote:
> * Andy Shevchenko <andriy.shevchenko@linux.intel.com> [240403 14:00]:
> > Probably we need to check if the port is still open...
> > 
> > P.S. AFAIU the state / xmit is invalid pointer or so.
> > 
> > Culprit line:	serial_out(up, UART_TX, xmit->buf[xmit->tail]);
> 
> Maybe we can set UPF_DEAD a bit earlier as below?

We should do this under port_mutex. Otherwise we replace one race by another.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [linus:master] [serial]  43066e3222: BUG:kernel_NULL_pointer_dereference,address
  2024-04-04 11:59     ` Andy Shevchenko
@ 2024-04-04 14:42       ` Andy Shevchenko
  0 siblings, 0 replies; 7+ messages in thread
From: Andy Shevchenko @ 2024-04-04 14:42 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: kernel test robot, Yicong Yang, oe-lkp, lkp, linux-kernel,
	Greg Kroah-Hartman, stable, linux-serial

On Thu, Apr 04, 2024 at 02:59:19PM +0300, Andy Shevchenko wrote:
> On Thu, Apr 04, 2024 at 09:54:15AM +0300, Tony Lindgren wrote:
> > * Andy Shevchenko <andriy.shevchenko@linux.intel.com> [240403 14:00]:
> > > Probably we need to check if the port is still open...
> > > 
> > > P.S. AFAIU the state / xmit is invalid pointer or so.
> > > 
> > > Culprit line:	serial_out(up, UART_TX, xmit->buf[xmit->tail]);
> 
> xmit.buf seems to be NULL, that's why this fails.
> 
> > Maybe we can set UPF_DEAD a bit earlier as below?
> 
> Lemme perform some tests later today.

Okay, so far it doesn't fix the issue AFAICT. The only reliable one right now
to me is clearing circ buffer before NULLifying it.

I'll send a formal patch.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [linus:master] [serial]  43066e3222: BUG:kernel_NULL_pointer_dereference,address
  2024-04-04 14:34     ` Andy Shevchenko
@ 2024-04-05  5:46       ` Tony Lindgren
  0 siblings, 0 replies; 7+ messages in thread
From: Tony Lindgren @ 2024-04-05  5:46 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: kernel test robot, Yicong Yang, oe-lkp, lkp, linux-kernel,
	Greg Kroah-Hartman, stable, linux-serial

* Andy Shevchenko <andriy.shevchenko@linux.intel.com> [240404 14:34]:
> On Thu, Apr 04, 2024 at 09:54:15AM +0300, Tony Lindgren wrote:
> > * Andy Shevchenko <andriy.shevchenko@linux.intel.com> [240403 14:00]:
> > > Probably we need to check if the port is still open...
> > > 
> > > P.S. AFAIU the state / xmit is invalid pointer or so.
> > > 
> > > Culprit line:	serial_out(up, UART_TX, xmit->buf[xmit->tail]);
> > 
> > Maybe we can set UPF_DEAD a bit earlier as below?
> 
> We should do this under port_mutex. Otherwise we replace one race by another.

Oh right.. And adding some better check in addition to UPF_DEAD
would be better.

Regards,

Tony

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-04-05  5:46 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-03 13:43 [linus:master] [serial] 43066e3222: BUG:kernel_NULL_pointer_dereference,address kernel test robot
2024-04-03 14:00 ` Andy Shevchenko
2024-04-04  6:54   ` Tony Lindgren
2024-04-04 11:59     ` Andy Shevchenko
2024-04-04 14:42       ` Andy Shevchenko
2024-04-04 14:34     ` Andy Shevchenko
2024-04-05  5:46       ` Tony Lindgren

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).