From: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
To: "Jürgen Groß" <jgross@suse.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
	Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: Linux 6.13-rc3 many different panics in Xen PV dom0
Date: Fri, 3 Jan 2025 01:42:45 +0100	[thread overview]
Message-ID: <Z3cyhdKu6M1vdBe_@mail-itl> (raw)
In-Reply-To: <Z3cs1-wG5WJ9FrAR@mail-itl>

On Fri, Jan 03, 2025 at 01:18:31AM +0100, Marek Marczykowski-Górecki wrote:
> On Thu, Jan 02, 2025 at 08:39:16PM +0100, Marek Marczykowski-Górecki wrote:
> > On Thu, Jan 02, 2025 at 08:17:00PM +0100, Jürgen Groß wrote:
> > > On 02.01.25 19:54, Marek Marczykowski-Górecki wrote:
> > > > On Thu, Jan 02, 2025 at 01:24:21PM +0100, Marek Marczykowski-Górecki wrote:
> > > > > On Thu, Jan 02, 2025 at 12:30:10PM +0100, Juergen Gross wrote:
> > > > > > On 02.01.25 11:20, Jürgen Groß wrote:
> > > > > > > On 19.12.24 17:14, Marek Marczykowski-Górecki wrote:
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > It crashes on boot like below, most of the times. But sometimes (rarely)
> > > > > > > > it manages to stay alive. Below I'm pasting few of the crashes that look
> > > > > > > > distinctly different, if you follow the links, you can find more of
> > > > > > > > them. IMHO it looks like some memory corruption bug somewhere. I tested
> > > > > > > > also Linux 6.13-rc2 before, and it had very similar issue.
> > > > > > > 
> > > > > > > ...
> > > > > > > 
> > > > > > > > 
> > > > > > > > Full log:
> > > > > > > > https://openqa.qubes-os.org/tests/122879/logfile?filename=serial0.txt
> > > > > > > 
> > > > > > > I can reproduce a crash with 6.13-rc5 PV dom0.
> > > > > > > 
> > > > > > > What is really interesting in the logs: most crashes seem to happen right
> > > > > > > after a module being loaded (in my reproducer it was right after loading
> > > > > > > the first module).
> > > > > > > 
> > > > > > > I need to go through the 6.13 commits, but I think I remember having seen
> > > > > > > a patch optimizing module loading by using large pages for addressing the
> > > > > > > loaded modules. Maybe the case of no large pages being available isn't
> > > > > > > handled properly.
> > > > > > 
> > > > > > Seems I was right.
> > > > > > 
> > > > > > For me the following diff fixes the issue. Marek, can you please confirm
> > > > > > it fixes your crashes, too?
> > > > > 
> > > > > Thanks for looking into it!
> > > > > Will do, I've pushed it to
> > > > > https://github.com/QubesOS/qubes-linux-kernel/pull/662, CI will build it
> > > > > and then I'll post it to openQA.
> > > > 
> > > > It is much better!
> > > > 
> > > > Tests are still running, but I already see that many are green.
> > > 
> > > So are you fine with me adding your "Tested-by:"?
> > 
> > Yes.
> > 
> > > > There is
> > > > one issue (likely unrelated to this change) - sys-usb (HVM domU with USB
> > > > controllers passed through) crashes on a system with Raptor Lake CPU
> > > > (only, others, including ADL and MTL look fine):
> 
> Correction, it does happen on some others too, just got the crash on the ADL
> system, although looks a bit different ("Corrupted page table at ..."):

I've collected some more of them at https://github.com/QubesOS/qubes-issues/issues/9681

Should I start a new thread for this? On one hand, it's a different domain
type (HVM), but on the other hand, many of these crashes also happen
around module loading.

> > > > [   75.770849] Bluetooth: Core ver 2.22
> > > > [   75.770866] Oops: general protection fault, probably for non-canonical address 0xc9d2315bc82c3bbd: 0000 [#1] PREEMPT SMP NOPTI
> > > > [   75.770880] CPU: 0 UID: 0 PID: 923 Comm: (udev-worker) Not tainted 6.13.0-0.rc5.2.qubes.1.fc41.x86_64 #1
> > > > [   75.770890] Hardware name: Xen HVM domU, BIOS 4.19.0 01/02/2025
> > > > [   75.770897] RIP: 0010:msft_monitor_device_del+0x93/0x170 [bluetooth]
> > > > [   75.770924] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0 65 21 <26> 2b 8b ad 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > 
> > > This code is looking suspicious. Large areas of binary 0 in a normal function?
> > > And the code itself is nonsense, as it is using a memory access via ES:, which
> > > doesn't make any sense in 64-bit kernel.
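
The two observations above (long runs of zero bytes, and a faulting byte
that is a segment-override prefix) can be checked mechanically on any
oops "Code:" line. Below is a minimal Python sketch, not a kernel tool.
The helper names and the shortened SAMPLE line are illustrative
assumptions; only the byte values around the <26> marker are taken from
the log above. It assumes every token after "Code:" is a two-digit hex
byte, with the faulting byte wrapped in <>. For real disassembly,
scripts/decodecode in the kernel tree is the better choice.

```python
import re

# Legacy x86 segment-override prefix bytes and the segment they select.
SEGMENT_PREFIXES = {
    0x26: "ES", 0x2E: "CS", 0x36: "SS",
    0x3E: "DS", 0x64: "FS", 0x65: "GS",
}


def parse_code_line(line):
    """Return (list of byte values, index of the <xx> faulting byte or None)."""
    _, _, payload = line.partition("Code:")
    data, fault_index = [], None
    for token in payload.split():
        marked = re.fullmatch(r"<([0-9a-fA-F]{2})>", token)
        if marked:
            fault_index = len(data)
            token = marked.group(1)
        data.append(int(token, 16))
    return data, fault_index


def longest_zero_run(data):
    """Length of the longest run of consecutive 0x00 bytes."""
    best = run = 0
    for byte in data:
        run = run + 1 if byte == 0 else 0
        best = max(best, run)
    return best


def summarize(line):
    """Collect the facts worth a second look in a 'Code:' line."""
    data, fault_index = parse_code_line(line)
    fault = data[fault_index] if fault_index is not None else None
    return {
        "length": len(data),
        "longest_zero_run": longest_zero_run(data),
        "fault_byte": fault,
        "segment_prefix": SEGMENT_PREFIXES.get(fault),
    }


# Shortened, hypothetical sample: the zero padding is abbreviated, the
# bytes around the <26> marker match the oops quoted above.
SAMPLE = "Code: " + "00 " * 8 + "d0 65 21 <26> 2b 8b ad 03 " + "00 " * 4

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

A long zero run next to the faulting byte, plus a faulting byte that
decodes as an ES override, suggests the module text at RIP is not the
code that was meant to be there, rather than a logic bug in the function
itself.
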
> > 
> > Could it be still something related to modules layout in memory?
> > It seems it's not 100% reliable crash, I see in at least one instance
> > sys-usb remained running (unfortunately I don't have collected full
> > sys-usb console log from successful test...).
> > 
> > I just checked again that this crash didn't happen with any 6.12 or 6.11
> > kernels.
> > 
> > -- 
> > Best Regards,
> > Marek Marczykowski-Górecki
> > Invisible Things Lab
> 
> 
> 
> -- 
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab



-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


Thread overview: 15+ messages
2024-12-19 16:14 Linux 6.13-rc3 many different panics in Xen PV dom0 Marek Marczykowski-Górecki
2024-12-20  1:48 ` Marek Marczykowski-Górecki
2024-12-26 18:48   ` Marek Marczykowski-Górecki
2025-01-02 10:20 ` Jürgen Groß
2025-01-02 11:30   ` Juergen Gross
2025-01-02 12:24     ` Marek Marczykowski-Górecki
2025-01-02 18:54       ` Marek Marczykowski-Górecki
2025-01-02 19:04         ` Andrew Cooper
2025-01-02 19:17         ` Jürgen Groß
2025-01-02 19:39           ` Marek Marczykowski-Górecki
2025-01-03  0:18             ` Marek Marczykowski-Górecki
2025-01-03  0:42               ` Marek Marczykowski-Górecki [this message]
2025-01-03  2:00                 ` Andrew Cooper
2025-01-03 18:09                   ` Linux 6.13-rc5 Xen HVM with PCI passthrough (USB controller) crash Marek Marczykowski-Górecki
2025-01-03 18:32                     ` Geert Uytterhoeven
