From: Bjorn Helgaas <helgaas@kernel.org>
To: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
Cc: "linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
Alex Williamson <alex.williamson@redhat.com>,
Leon Romanovsky <leon@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>
Subject: Re: Write to srvio_numvfs triggers kernel panic
Date: Fri, 6 May 2022 15:17:22 -0500 [thread overview]
Message-ID: <20220506201722.GA555374@bhelgaas> (raw)
In-Reply-To: <87a6bxm5vm.fsf@epam.com>
[+cc Alex, Leon, Jason]
On Wed, May 04, 2022 at 07:56:01PM +0000, Volodymyr Babchuk wrote:
>
> Hello,
>
> I have encountered issue when PCI code tries to use both fields in
>
> union {
> struct pci_sriov *sriov; /* PF: SR-IOV info */
> struct pci_dev *physfn; /* VF: related PF */
> };
>
> (which are part of struct pci_dev) at the same time.
>
> Symptoms are following:
>
> # echo 1 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
>
> pci 0000:01:00.2: reg 0x20c: [mem 0x30018000-0x3001ffff 64bit]
> pci 0000:01:00.2: VF(n) BAR0 space: [mem 0x30018000-0x30117fff 64bit] (contains BAR0 for 32 VFs)
> Unable to handle kernel paging request at virtual address 0001000200000010
> Mem abort info:
> ESR = 0x96000004
> EC = 0x25: DABT (current EL), IL = 32 bits
> SET = 0, FnV = 0
> EA = 0, S1PTW = 0
> Data abort info:
> ISV = 0, ISS = 0x00000004
> CM = 0, WnR = 0
> [0001000200000010] address between user and kernel address ranges
> Internal error: Oops: 96000004 [#1] PREEMPT SMP
> Modules linked in: xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6
> nf_defrag_ipv4 libcrc32c iptable_filter crct10dif_ce nvme nvme_core at24
> pci_endpoint_test bridge pdrv_genirq ip_tables x_tables ipv6
> CPU: 3 PID: 287 Comm: sh Not tainted 5.10.41-lorc+ #233
> Hardware name: XENVM-4.17 (DT)
> pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
> pc : pcie_aspm_get_link+0x90/0xcc
> lr : pcie_aspm_get_link+0x8c/0xcc
> sp : ffff8000130d39c0
> x29: ffff8000130d39c0 x28: 00000000000001a4
> x27: 00000000ffffee4b x26: ffff80001164f560
> x25: 0000000000000000 x24: 0000000000000000
> x23: ffff80001164f660 x22: 0000000000000000
> x23: ffff80001164f660 x22: 0000000000000000
> x21: ffff000003f08000 x20: ffff800010db37d8
> x19: ffff000004b8e780 x18: ffffffffffffffff
> x17: 0000000000000000 x16: 00000000deadbeef
> x15: ffff8000930d36c7 x14: 0000000000000006
> x13: ffff8000115c2710 x12: 000000000000081c
> x11: 00000000000002b4 x10: ffff8000115c2710
> x9 : ffff8000115c2710 x8 : 00000000ffffefff
> x7 : ffff80001161a710 x6 : ffff80001161a710
> x5 : ffff00003fdad900 x4 : 0000000000000000
> x3 : 0000000000000000 x2 : 0000000000000000
> x1 : ffff000003c51c80 x0 : 0001000200000000
> Call trace:
> pcie_aspm_get_link+0x90/0xcc
> aspm_ctrl_attrs_are_visible+0x30/0xc0
> internal_create_group+0xd0/0x3cc
> internal_create_groups.part.0+0x4c/0xc0
> sysfs_create_groups+0x20/0x34
> device_add+0x2b4/0x760
> pci_device_add+0x814/0x854
> pci_iov_add_virtfn+0x240/0x2f0
> sriov_enable+0x1f8/0x474
> pci_sriov_configure_simple+0x38/0x90
> sriov_numvfs_store+0xa4/0x1a0
> dev_attr_store+0x1c/0x30
> sysfs_kf_write+0x48/0x60
> kernfs_fop_write_iter+0x118/0x1ac
> new_sync_write+0xe8/0x184
> vfs_write+0x23c/0x2a0
> ksys_write+0x68/0xf4
> __arm64_sys_write+0x20/0x2c
> el0_svc_common.constprop.0+0x78/0x1a0
> do_el0_svc+0x28/0x94
> el0_svc+0x14/0x20
> el0_sync_handler+0xa4/0x130
> el0_sync+0x180/0x1c0
> Code: d0002120 9133e000 97ffef8e f9400a60 (f9400813)
>
>
> Debugging showed the following:
>
> pci_iov_add_virtfn() allocates new struct pci_dev:
>
> virtfn = pci_alloc_dev(bus);
> and sets physfn:
> virtfn->is_virtfn = 1;
> virtfn->physfn = pci_dev_get(dev);
>
> then we will get into sriov_init() via the following call path:
>
> pci_device_add(virtfn, virtfn->bus);
> pci_init_capabilities(dev);
> pci_iov_init(dev);
> sriov_init(dev, pos);
We called pci_device_add() with the VF. pci_iov_init() only calls
sriov_init() if it finds an SR-IOV capability on the device:
pci_iov_init(struct pci_dev *dev)
pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
if (pos)
return sriov_init(dev, pos);
So this means the VF must have an SR-IOV capability, which sounds a
little dubious. From PCIe r6.0:
9.3.3 SR-IOV Extended Capability
The SR-IOV Extended Capability defined here is a PCIe extended
capability that must be implemented in each PF that supports SR-IOV.
This Capability is used to describe and control a PF’s SR-IOV
Capabilities.
For Multi-Function Devices, each PF that supports SR-IOV shall
provide the Capability structure defined in this section. This
Capability structure may be present in any Function with a Type 0
Configuration Space Header. This Capability must not be present in
Functions with a Type 1 Configuration Space Header.
Can you supply the output of "sudo lspci -vv" for your system?
It could be that the device has an SR-IOV capability when it
shouldn't. But even if it does, Linux could tolerate that better
than it does today.
> sriov_init() overwrites value in the union:
> dev->sriov = iov; <<<<<---- There
> dev->is_physfn = 1;
>
> Next, we will get into function that causes panic:
>
> pci_device_add(virtfn, virtfn->bus);
> device_add(&dev->dev)
> sysfs_create_groups()
> internal_create_group()
> aspm_ctrl_attrs_are_visible()
> pcie_aspm_get_link()
> pci_upstream_bridge()
> pci_physfn()
>
> which is
>
> static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
> {
> #ifdef CONFIG_PCI_IOV
> if (dev->is_virtfn)
> dev = dev->physfn;
> #endif
> return dev;
> }
>
>
> as is_virtfn == 1 and dev->physfn was overwritten via write to
> dev->sriov, pci_physfn() will return pointer to struct pci_sriov
> allocated by sriov_init(). And then pci_upstream_bridge() will
> cause panic by acessing it as if it were pointer to struct pci_dev
>
> I encountered this issue on ARM64, Linux 5.10.41. Tried to test
> on master branch, but it is quite difficult to rebase platform
> code on the master. But I compared all relevant parts of PCI code
> and didn't found any differences.
>
> Looks like I am missing following, because how SR-IOV can be so broken?
> But what exactly?
>
> --
> Volodymyr Babchuk at EPAM
next prev parent reply other threads:[~2022-05-06 20:17 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-04 19:56 Write to srvio_numvfs triggers kernel panic Volodymyr Babchuk
2022-05-06 20:17 ` Bjorn Helgaas [this message]
2022-05-07 1:34 ` Jason Gunthorpe
2022-05-07 10:25 ` Volodymyr Babchuk
2022-05-08 11:19 ` Leon Romanovsky
2022-05-09 18:22 ` Keith Busch
2022-05-07 10:22 ` Volodymyr Babchuk
2022-05-07 15:41 ` Bjorn Helgaas
2022-05-08 11:07 ` Volodymyr Babchuk
2022-05-09 16:49 ` Bjorn Helgaas
2022-05-09 16:58 ` Alex Williamson
2022-05-10 6:39 ` Christoph Hellwig
2022-05-10 17:37 ` Bjorn Helgaas
2022-05-12 7:18 ` Volodymyr Babchuk
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220506201722.GA555374@bhelgaas \
--to=helgaas@kernel.org \
--cc=Volodymyr_Babchuk@epam.com \
--cc=alex.williamson@redhat.com \
--cc=jgg@ziepe.ca \
--cc=leon@kernel.org \
--cc=linux-pci@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.