From: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: linux-pci@vger.kernel.org, stable@vger.kernel.org,
regressions@lists.kernel.dev,
xen-devel <xen-devel@lists.xenproject.org>,
Thomas Gleixner <tglx@linutronix.de>,
Bjorn Helgaas <bhelgaas@google.com>
Subject: Re: Kernel panic in __pci_enable_msix_range on Xen PV with PCI passthrough
Date: Wed, 25 Aug 2021 17:47:31 +0200
Message-ID: <YSZmFMeVeO4Bupn+@mail-itl>
In-Reply-To: <3e72345b-d0e1-7856-de51-e74714474724@suse.com>
On Wed, Aug 25, 2021 at 05:33:54PM +0200, Jan Beulich wrote:
> On 25.08.2021 17:24, Marek Marczykowski-Górecki wrote:
> > On recent kernels I get a kernel panic when starting a Xen PV domain with
> > PCI devices assigned. This happens on 5.10.60 (worked on .54) and
> > 5.4.142 (worked on .136):
> >
> > [ 13.683009] pcifront pci-0: claiming resource 0000:00:00.0/0
> > [ 13.683042] pcifront pci-0: claiming resource 0000:00:00.0/1
> > [ 13.683049] pcifront pci-0: claiming resource 0000:00:00.0/2
> > [ 13.683055] pcifront pci-0: claiming resource 0000:00:00.0/3
> > [ 13.683061] pcifront pci-0: claiming resource 0000:00:00.0/6
> > [ 14.036142] e1000e: Intel(R) PRO/1000 Network Driver
> > [ 14.036179] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
> > [ 14.036982] e1000e 0000:00:00.0: Xen PCI mapped GSI11 to IRQ13
> > [ 14.044561] e1000e 0000:00:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
> > [ 14.045188] BUG: unable to handle page fault for address: ffffc9004069100c
> > [ 14.045197] #PF: supervisor write access in kernel mode
> > [ 14.045202] #PF: error_code(0x0003) - permissions violation
> > [ 14.045211] PGD 18f1c067 P4D 18f1c067 PUD 4dbd067 PMD 4fba067 PTE 80100000febd4075
>
> I'm curious what lives at physical address FEBD4000.
This is BAR 3 of this device, the one that holds the MSI-X table:
00:04.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
    Subsystem: Intel Corporation Device 0000
    Physical Slot: 4
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 11
    Region 0: Memory at feb80000 (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at feba0000 (32-bit, non-prefetchable) [size=128K]
    Region 2: I/O ports at c080 [size=32]
    Region 3: Memory at febd4000 (32-bit, non-prefetchable) [size=16K]
    Expansion ROM at feb40000 [disabled] [size=256K]
    Capabilities: [c8] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Capabilities: [e0] Express (v1) Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
        DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 128 bytes
        DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
            TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
    Capabilities: [a0] MSI-X: Enable- Count=5 Masked-
        Vector table: BAR=3 offset=00000000
        PBA: BAR=3 offset=00002000
    Kernel driver in use: pciback
    Kernel modules: e1000e
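
For reference, the PTE value and the page-fault error code from the oops can be decoded by hand. The sketch below is only an illustration: it uses the architectural x86-64 PTE bit layout, the helper names (decode_pte, decode_pf_error) are made up for this example, and bit 52 of the PTE (a software-available bit) is deliberately left uninterpreted.

```python
# Architectural x86-64 PTE flag bits; names are illustrative labels only.
PTE_FLAGS = {
    0: "present",
    1: "writable",
    2: "user",
    3: "pwt",
    4: "pcd",
    5: "accessed",
    6: "dirty",
    7: "pat",
    63: "nx",
}
# Bits 12..51 of a PTE hold the page frame address.
PTE_ADDR_MASK = 0x000F_FFFF_FFFF_F000


def decode_pte(pte):
    """Split a raw PTE into (frame address, {flag name: bool})."""
    flags = {name: bool((pte >> bit) & 1) for bit, name in PTE_FLAGS.items()}
    return pte & PTE_ADDR_MASK, flags


def decode_pf_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "protection_violation": bool(code & 1),  # 0 would mean not-present
        "write": bool(code & 2),
        "user_mode": bool(code & 4),
    }


# Values copied from the oops above.
frame, flags = decode_pte(0x80100000FEBD4075)
err = decode_pf_error(0x0003)

# Region 3 from the lspci output: febd4000, size 16K.
bar3_start, bar3_size = 0xFEBD4000, 16 * 1024
in_bar3 = bar3_start <= frame < bar3_start + bar3_size

print(hex(frame), flags["present"], flags["writable"], in_bar3, err)
```

Under these assumptions the mapping is present but not writable and points at the first page of Region 3, and the error code describes a supervisor-mode write hitting a protection violation, which matches the "#PF" lines in the log.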
> The maximum verbosity
> hypervisor log may also have a hint as to why this is a read-only PTE.
I'll try, if that still makes sense.
> > [ 14.045227] Oops: 0003 [#1] SMP NOPTI
> > [ 14.045234] CPU: 0 PID: 234 Comm: kworker/0:2 Tainted: G W 5.14.0-rc7-1.fc32.qubes.x86_64 #15
> > [ 14.045245] Workqueue: events work_for_cpu_fn
> > [ 14.045259] RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
> > [ 14.045271] Code: 2f 96 ff 48 89 44 24 28 48 89 c7 48 85 c0 0f 84 f6 01 00 00 45 0f b7 f6 48 8d 40 0c ba 01 00 00 00 49 c1 e6 04 4a 8d 4c 37 1c <89> 10 48 83 c0 10 48 39 c1 75 f5 41 0f b6 44 24 6a 84 c0 0f 84 48
> > [ 14.045284] RSP: e02b:ffffc9004018bd50 EFLAGS: 00010212
> > [ 14.045290] RAX: ffffc9004069100c RBX: ffff88800ed412f8 RCX: ffffc9004069105c
> > [ 14.045296] RDX: 0000000000000001 RSI: 00000000000febd4 RDI: ffffc90040691000
> > [ 14.045302] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000febd404f
> > [ 14.045308] R10: 0000000000007ff0 R11: ffff88800ee8ae40 R12: ffff88800ed41000
> > [ 14.045313] R13: 0000000000000000 R14: 0000000000000040 R15: 00000000feba0000
> > [ 14.045393] FS: 0000000000000000(0000) GS:ffff888018400000(0000) knlGS:0000000000000000
> > [ 14.045401] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 14.045407] CR2: ffff8000007f5ea0 CR3: 0000000012f6a000 CR4: 0000000000000660
> > [ 14.045420] Call Trace:
> > [ 14.045431] e1000e_set_interrupt_capability+0xbf/0xd0 [e1000e]
> > [ 14.045479] e1000_probe+0x41f/0xdb0 [e1000e]
>
> Otoh, from this it's pretty clear it's not a device Xen may have found
> a need to access for its own purposes. If aforementioned address covers
> (or is adjacent to) the MSI-X table of a device driven by this driver,
> then it would also be helpful to know how many MSI-X entries the device
> reports its table can have.
See above.
Does PCI passthrough on PV support MSI-X at all?
If so, I guess the issue is that the kernel tries to write to the table
directly instead of going through some hypercall, right?
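
The register dump is consistent with a loop that writes the mask bit into the Vector Control word of each MSI-X table entry. The sketch below redoes that address arithmetic. The three constants mirror the values in the Linux header include/uapi/linux/pci_regs.h; treating RDI as the mapped table base, and the helper name vector_ctrl_addresses, are assumptions made for this illustration.

```python
# MSI-X table layout constants (same values as in Linux's pci_regs.h).
PCI_MSIX_ENTRY_SIZE = 16          # bytes per table entry
PCI_MSIX_ENTRY_VECTOR_CTRL = 12   # offset of the Vector Control dword
PCI_MSIX_ENTRY_CTRL_MASKBIT = 1   # per-vector mask bit


def vector_ctrl_addresses(table_base, nr_entries):
    """Addresses of the Vector Control word of each MSI-X table entry."""
    return [
        table_base + i * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL
        for i in range(nr_entries)
    ]


# Assumption: RDI in the oops is the virtual address of the mapped table.
table_base = 0xFFFFC90040691000
# Count=5 comes from the MSI-X capability in the lspci output.
addrs = vector_ctrl_addresses(table_base, 5)

# Where a pointer advancing by one entry per write would stop.
loop_end = addrs[-1] + PCI_MSIX_ENTRY_SIZE

for addr in addrs:
    print(hex(addr), "<-", PCI_MSIX_ENTRY_CTRL_MASKBIT)
print("loop end:", hex(loop_end))
```

With those inputs the first address equals the faulting address ffffc9004069100c (also RAX), the end pointer equals RCX (ffffc9004069105c), and the value written equals RDX (1), so the fault appears to happen on the very first write, to entry 0.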
> > [ 14.045506] local_pci_probe+0x42/0x80
> > [ 14.045515] work_for_cpu_fn+0x16/0x20
> > [ 14.045522] process_one_work+0x1ec/0x390
> > [ 14.045529] worker_thread+0x53/0x3e0
> > [ 14.045534] ? process_one_work+0x390/0x390
> > [ 14.045540] kthread+0x127/0x150
> > [ 14.045548] ? set_kthread_struct+0x40/0x40
> > [ 14.045554] ret_from_fork+0x22/0x30
> > [ 14.045565] Modules linked in: e1000e(+) edac_mce_amd rfkill xen_pcifront pcspkr xt_REDIRECT ip6table_filter ip6table_mangle ip6table_raw ip6_tables ipt_REJECT nf_reject_ipv4 xt_state xt_conntrack iptable_filter iptable_mangle iptable_raw xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xen_scsiback target_core_mod xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn fuse drm bpf_preload ip_tables overlay xen_blkfront
> > [ 14.045620] CR2: ffffc9004069100c
> > [ 14.045627] ---[ end trace 307f5bb3bd9f30b4 ]---
> > [ 14.045632] RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
> > [ 14.045640] Code: 2f 96 ff 48 89 44 24 28 48 89 c7 48 85 c0 0f 84 f6 01 00 00 45 0f b7 f6 48 8d 40 0c ba 01 00 00 00 49 c1 e6 04 4a 8d 4c 37 1c <89> 10 48 83 c0 10 48 39 c1 75 f5 41 0f b6 44 24 6a 84 c0 0f 84 48
> > [ 14.045652] RSP: e02b:ffffc9004018bd50 EFLAGS: 00010212
> > [ 14.045657] RAX: ffffc9004069100c RBX: ffff88800ed412f8 RCX: ffffc9004069105c
> > [ 14.045663] RDX: 0000000000000001 RSI: 00000000000febd4 RDI: ffffc90040691000
> > [ 14.045668] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000febd404f
> > [ 14.045674] R10: 0000000000007ff0 R11: ffff88800ee8ae40 R12: ffff88800ed41000
> > [ 14.045679] R13: 0000000000000000 R14: 0000000000000040 R15: 00000000feba0000
> > [ 14.045698] FS: 0000000000000000(0000) GS:ffff888018400000(0000) knlGS:0000000000000000
> > [ 14.045706] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 14.045711] CR2: ffff8000007f5ea0 CR3: 0000000012f6a000 CR4: 0000000000000660
> > [ 14.045718] Kernel panic - not syncing: Fatal exception
> > [ 14.045726] Kernel Offset: disabled
> >
> > I've bisected it down to this commit:
> >
> > commit 7d5ec3d3612396dc6d4b76366d20ab9fc06f399f
> > Author: Thomas Gleixner <tglx@linutronix.de>
> > Date: Thu Jul 29 23:51:41 2021 +0200
> >
> > PCI/MSI: Mask all unused MSI-X entries
> >
> > I can reliably reproduce it on Xen 4.14 and Xen 4.8, so I don't think
> > Xen version matters here.
> >
> > Any idea how to fix it?
> >
>
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab