From: Simona Vetter <simona.vetter@ffwll.ch>
To: Thomas Zimmermann <tzimmermann@suse.de>
Cc: "Dr. David Alan Gilbert" <linux@treblig.org>,
	Hans de Goede <hdegoede@redhat.com>,
	kraxel@redhat.com, virtualization@lists.linux.dev,
	linux-kernel@vger.kernel.org,
	dri-devel <dri-devel@lists.freedesktop.org>
Subject: Re: a bochs-drm (?) oops on head
Date: Wed, 18 Dec 2024 11:41:01 +0100
Message-ID: <Z2KmvZFRWT10fzhr@phenom.ffwll.local>
In-Reply-To: <30beb4da-50be-4e28-a19e-5d7f9680c7ea@suse.de>

On Tue, Dec 17, 2024 at 12:43:11PM +0100, Thomas Zimmermann wrote:
> (cc'ing Hans, who implemented deferred console takeover)
> 
> Hi
> 
> On 16.12.24 at 18:35, Dr. David Alan Gilbert wrote:
> > * Thomas Zimmermann (tzimmermann@suse.de) wrote:
> > > Hi
> > > 
> > > 
> > > On 16.12.24 at 14:46, Dr. David Alan Gilbert wrote:
> > > [...]
> > > > > The attached patch fixes the problem for me. Could you please test and
> > > > > report back the results.
> > > > That gets me a different oops; this was run with:
> > > > qemu-system-x86_64  -M pc -cpu host --enable-kvm -smp 4 -m 2G -kernel /discs/fast/kernel/arch/x86/boot/bzImage -append "console=tty0 console=ttyS0 root=/dev/vdb1 single" -drive if=virtio,file=/discs/more/images/debian12-64scan.qcow2
> > > > 
> > > > It looks to me as if it made the mistake of trying to print something in the middle of being removed:
> > > From the stack trace below, I'd say it is the one at [1]. But I fail to
> > > reproduce the problem.
> > > 
> > > Could you please send me the complete output of dmesg after the system
> > > finished booting?
> > Sure; this is as far as it gets until it hits the vga oops that stops it:
> 
> I was able to reproduce it a single time. My setup is
> 
> - CONFIG_DEBUG_TEST_DRIVER_REMOVE=y
> - CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER=y
> - startup with vgacon
> - 'quiet' on kernel command line (important!)
> 
> and then I got the segfault you report.
> 
> With the provided stack trace and log, I think I have some idea of what
> happened.
> 
> > 
> > [    0.000000][    T0] Linux version 6.13.0-rc2+ (dg@dalek) (gcc (GCC) 14.2.1 20240912 (Red Hat 14.2.1-3), GNU ld version 2.43.1-4.fc41) #373 SMP PREEMPT_DYNAMIC Mon Dec 16 13:25:32 GMT 2024
> > [    0.000000][    T0] Command line: console=tty0 console=ttyS0 root=/dev/vdb1 single
> [...]
> > [   74.077481][    T1] SPI driver abt-y030xx067a has no spi_device_id for abt,y030xx067a
> > [   74.088805][    T1] SPI driver panel-ilitek-ili9322 has no spi_device_id for dlink,dir-685-panel
> > [   74.090492][    T1] SPI driver panel-ilitek-ili9322 has no spi_device_id for ilitek,ili9322
> > [   74.094556][    T1] SPI driver panel-innolux-ej030na has no spi_device_id for innolux,ej030na
> > [   74.106367][    T1] SPI driver nt39016 has no spi_device_id for kingdisplay,kd035g6-54nt
> > [   74.116623][    T1] SPI driver s6d27a1-panel has no spi_device_id for samsung,s6d27a1
> > [   74.120701][    T1] SPI driver panel-samsung-s6e63m0 has no spi_device_id for samsung,s6e63m0
> 
> > [   74.177273][    T1] bochs-drm 0000:00:02.0: vgaarb: deactivate vga console
> 
> This comes from the first iteration in really_probe() [1]. This is the bochs
> instance that will be removed. It should install fbcon, but that gets
> deferred.

I don't think that's correct, because ...

> [1] https://elixir.bootlin.com/linux/v6.12.5/source/drivers/base/dd.c#L631
> 
> > [   74.179388][    T1] [drm] Found bochs VGA, ID 0xb0c5.
> > [   74.180931][    T1] [drm] Framebuffer size 16384 kB @ 0xfd000000, mmio @ 0xfebf0000.
> > [   74.199314][    T1] [drm] Initialized bochs-drm 1.0.0 for 0000:00:02.0 on minor 2
> > [   74.265834][    T1] fbcon: bochs-drmdrmfb (fb1) is primary device
> > [   74.265882][    T1] fbcon: Remapping primary device, fb1, to tty 1-63

... the above line I think is only ever printed if we're not deferred
anymore:
- One such printk is in fbcon_remap_all(), but after checking for
  deferred_takeover and bailing out if that's set.
- The other is in do_fb_registered -> fbcon_select_primary (but only if
  CONFIG_FRAMEBUFFER_DETECT_PRIMARY is set), which is before we check for
  deferred_takeover. But if deferred_takeover is set that path should
  print

  		pr_info("fbcon: Deferring console take-over\n");

  which does not seem to be the case.
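
To make the two paths concrete, here is a rough userspace model in C of
what is described above. It is only a sketch, not the kernel code: the
names model_fb_registered(), model_remap_all() and the fbcon_msg flags are
made up for illustration, and it models nothing except which of the two
messages each path would emit.

```c
#include <stdbool.h>

/* The two log messages of interest, as bit flags (hypothetical names). */
enum fbcon_msg {
	MSG_REMAPPING = 1 << 0,	/* "fbcon: Remapping primary device, ..." */
	MSG_DEFERRING = 1 << 1,	/* "fbcon: Deferring console take-over" */
};

/*
 * Model of the do_fb_registered() path as described above:
 * fbcon_select_primary() runs first (only with
 * CONFIG_FRAMEBUFFER_DETECT_PRIMARY), then deferred_takeover is checked.
 */
static unsigned int model_fb_registered(bool deferred_takeover,
					bool detect_primary)
{
	unsigned int msgs = 0;

	if (detect_primary)
		msgs |= MSG_REMAPPING;

	if (deferred_takeover)
		msgs |= MSG_DEFERRING;

	return msgs;
}

/*
 * Model of the fbcon_remap_all() path as described above: it bails out
 * on deferred_takeover before reaching the printk.
 */
static unsigned int model_remap_all(bool deferred_takeover)
{
	if (deferred_takeover)
		return 0;

	return MSG_REMAPPING;
}
```

In this model, with deferred_takeover set, neither path yields the
remapping message without the deferral message also showing up, which is
not what the log shows.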

But then I have no idea why deferred fbcon could have any impact at all.
Really puzzling.
-Sima

> > [   79.736367][    T1] bochs-drm 0000:00:02.0: [drm] fb1: bochs-drmdrmfb frame buffer device
> 
> End of first bochs instance here.
> 
> > [   79.800872][    T1] bochs-drm 0000:00:02.0: vgaarb: deactivate vga console
> 
> The second instance of bochs starts here and tries to deactivate the console
> a second time. Notice that we didn't have any "Console is " or "Taking over
> console" messages.
> 
> > [   79.802400][    T1] BUG: kernel NULL pointer dereference, address: 000000000000020c
> 
> I've not been able to figure out what is at offset 0x20c (524 decimal). None
> of the structs involved appears to have any fields starting at this offset.
> The nearest case is vc_hi_font_mask, [2] which is at +520. Could be related
> to aligned memory access. get_color() would read that field. [3] vc_num at
> +512 is another candidate.
> 
> [2] https://elixir.bootlin.com/linux/v6.13-rc2/source/include/linux/console_struct.h#L124
> [3] https://elixir.bootlin.com/linux/v6.13-rc2/source/drivers/video/fbdev/core/fbcon.c#L302
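
On the 0x20c question, the faulting instruction can be read straight off
the "Code:" line in the oops below: the bytes marked with <45> are
45 89 bc 24 0c 02 00 00. The following is a minimal sketch in C that
decodes just this one x86-64 encoding form (REX prefix, opcode 0x89,
ModRM with a SIB byte and a 32-bit displacement, no index register). The
struct and function names are hypothetical, and it is in no way a general
disassembler.

```c
#include <stdint.h>

/* Result of decoding a 32-bit "mov reg, disp32(base)" store. */
struct store_insn {
	int valid;		/* 1 if the bytes matched the supported form */
	unsigned int src_reg;	/* register number of the 32-bit source */
	unsigned int base_reg;	/* register number of the base address */
	uint32_t disp;		/* displacement added to the base */
};

static struct store_insn decode_store(const uint8_t *b)
{
	struct store_insn d = { 0 };
	uint8_t rex = b[0], modrm = b[2], sib = b[3];

	/* REX prefix with W=0 (32-bit operand) and X=0, opcode MOV r/m32, r32 */
	if ((rex & 0xf0) != 0x40 || (rex & 0x0a) != 0 || b[1] != 0x89)
		return d;
	/* mod=10 (disp32 follows), rm=100 (SIB byte follows) */
	if ((modrm >> 6) != 2 || (modrm & 7) != 4)
		return d;
	/* index=100 with REX.X=0 means "no index register" */
	if (((sib >> 3) & 7) != 4)
		return d;

	d.src_reg = ((modrm >> 3) & 7) | ((rex & 0x04) ? 8 : 0); /* REX.R */
	d.base_reg = (sib & 7) | ((rex & 0x01) ? 8 : 0);	  /* REX.B */
	d.disp = (uint32_t)b[4] | ((uint32_t)b[5] << 8) |
		 ((uint32_t)b[6] << 16) | ((uint32_t)b[7] << 24);
	d.valid = 1;
	return d;
}
```

For the bytes above this gives source register 15, base register 12 and
displacement 0x20c, i.e. mov %r15d,0x20c(%r12) in AT&T syntax, if I decode
this right. With R12 = 0 in the register dump, that is a 32-bit store to
0 + 0x20c, which matches both CR2 and the "supervisor write access" line.
So the access would be a direct write to a field at +524 through a NULL
base pointer rather than an alignment effect on a neighbouring field.
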
> 
> > [   79.802448][    T1] #PF: supervisor write access in kernel mode
> > [   79.802498][    T1] #PF: error_code(0x0002) - not-present page
> > [   79.802545][    T1] PGD 0 P4D 0
> > [   79.802622][    T1] Oops: Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
> > [   79.802669][    T1] CPU: 2 UID: 0 PID: 1 Comm: swapper/0 Tainted: G        W        N 6.13.0-rc2+ #373 5a5c0ce8f09b0b72067981f01985e201a0118bb6
> > [   79.802669][    T1] Tainted: [W]=WARN, [N]=TEST
> > [   79.802669][    T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
> > [   79.802669][    T1] RIP: 0010:fbcon_cursor+0xa9/0x3c0
> > [   79.802669][    T1] Code: c0 05 00 00 66 89 44 24 06 e8 f3 35 2a fd 0f b7 bb c0 05 00 00 e8 27 b8 e9 fc 49 8d bc 24 0c 02 00 00 49 89 c7 e8 d7 3d 2a fd <45> 89 bc 24 0c 02 00 00 48 8d bd e0 05 00 00 e8 c3 3b 2a fd 44 8b
> > [   79.802669][    T1] RSP: 0018:ffffb70f800136c0 EFLAGS: 00010046
> > [   79.802669][    T1] RAX: 0000000000000000 RBX: ffff9d4fc10a8800 RCX: 0000000000000000
> > [   79.802669][    T1] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > [   79.802669][    T1] RBP: ffff9d4fc54ba800 R08: 0000000000000000 R09: 0000000000000000
> > [   79.802669][    T1] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > [   79.802669][    T1] R13: 0000000000000000 R14: ffff9d4fc54bade8 R15: 0000000000000032
> > [   79.802669][    T1] FS:  0000000000000000(0000) GS:ffff9d503d200000(0000) knlGS:0000000000000000
> > [   79.802669][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   79.802669][    T1] CR2: 000000000000020c CR3: 0000000056520000 CR4: 0000000000350ef0
> > [   79.802669][    T1] Call Trace:
> > [   79.802669][    T1]  <TASK>
> > [   79.802669][    T1]  ? __die+0x23/0x80
> > [   79.802669][    T1]  ? page_fault_oops+0x21c/0x240
> > [   79.802669][    T1]  ? do_user_addr_fault+0x893/0x1180
> > [   79.802669][    T1]  ? srso_return_thunk+0x5/0x7f
> > [   79.802669][    T1]  ? exc_page_fault+0x3f/0x180
> > [   79.802669][    T1]  ? exc_page_fault+0x87/0x180
> > [   79.802669][    T1]  ? asm_exc_page_fault+0x26/0x40
> 
> > [   79.802669][    T1]  ? fbcon_cursor+0xa9/0x3c0
> > [   79.802669][    T1]  hide_cursor+0x66/0x1c0
> > [   79.802669][    T1]  vt_console_print+0x9b1/0xa40
> 
> I think we get here via steps [4] to [8].
> 
> [4]
> https://elixir.bootlin.com/linux/v6.13-rc2/source/drivers/pci/vgaarb.c#L173
> [5]
> https://elixir.bootlin.com/linux/v6.13-rc2/source/drivers/tty/vt/vt.c#L3287
> [6]
> https://elixir.bootlin.com/linux/v6.13-rc2/source/drivers/tty/vt/vt.c#L861
> [7]
> https://elixir.bootlin.com/linux/v6.13-rc2/source/drivers/tty/vt/vt.c#L846
> [8] https://elixir.bootlin.com/linux/v6.13-rc2/source/drivers/video/fbdev/core/fbcon.c#L1322
> 
> When vt_console_print() invokes the call at [9], it apparently replaces the
> deferred console implementation (maybe?) and then the next line [5]
> operates on a NULL pointer somewhere.
> 
> [9]
> https://elixir.bootlin.com/linux/v6.13-rc2/source/drivers/tty/vt/vt.c#L3286
> 
> Best regards
> Thomas
> 
> 
> -- 
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Frankenstrasse 146, 90461 Nuernberg, Germany
> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> HRB 36809 (AG Nuernberg)
> 

-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


