* [PATCH v2 0/3] Miscellaneous fixes for pci subsystem
@ 2025-12-24 9:27 Ziming Du
2025-12-24 9:27 ` [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug Ziming Du
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Ziming Du @ 2025-12-24 9:27 UTC (permalink / raw)
To: bhelgaas, jbarnes, chrisw, alex.williamson
Cc: linux-pci, linux-kernel, liuyongqiang13, duziming2
Miscellaneous fixes for pci subsystem
Ilpo Järvinen warned me of potential issues in my previous patch,
so I have made the necessary adjustments.
Changes in v2:
- Correct grammer and indentation.
- Remove unrelated stack traces from the commit message.
- Modify the handling of pos by adding a non-negative check to ensure
that the input value is valid.
- Use the existing IS_ALIGNED macro and ensure that after modification,
other cases still retuen -EINVAL as before.
- Link to v1: https://lore.kernel.org/linux-pci/20251216083912.758219-1-duziming2@huawei.com/
Thanx, Du
Yongqiang Liu (2):
PCI/sysfs: Prohibit unaligned access to I/O port on non-x86
PCI: Prevent overflow in proc_bus_pci_write()
Ziming Du (1):
PCI/sysfs: Fix null pointer dereference during hotplug
drivers/pci/pci-sysfs.c | 10 ++++++++++
drivers/pci/proc.c | 2 +-
2 files changed, 11 insertions(+), 1 deletion(-)
--
2.43.0
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug 2025-12-24 9:27 [PATCH v2 0/3] Miscellaneous fixes for pci subsystem Ziming Du @ 2025-12-24 9:27 ` Ziming Du 2025-12-29 17:31 ` Bjorn Helgaas 2025-12-24 9:27 ` [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write() Ziming Du 2025-12-24 9:27 ` [PATCH v2 3/3] PCI/sysfs: Prohibit unaligned access to I/O port on non-x86 Ziming Du 2 siblings, 1 reply; 12+ messages in thread From: Ziming Du @ 2025-12-24 9:27 UTC (permalink / raw) To: bhelgaas, jbarnes, chrisw, alex.williamson Cc: linux-pci, linux-kernel, liuyongqiang13, duziming2 During the concurrent process of creating and rescanning in VF, the resource files for the same pci_dev may be created twice. The second creation attempt fails, resulting the res_attr in pci_dev to kfree(), but the pointer is not set to NULL. This will subsequently lead to dereferencing a null pointer when removing the device. When we perform the following operation: echo $vfcount > /sys/class/net/"$pfname"/device/sriov_numvfs & sleep 0.5 echo 1 > /sys/bus/pci/rescan pci_remove "$pfname" system will crash as follows: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Call trace: __pi_strlen+0x14/0x150 kernfs_find_ns+0x54/0x120 kernfs_remove_by_name_ns+0x58/0xf0 sysfs_remove_bin_file+0x24/0x38 pci_remove_resource_files+0x44/0x90 pci_remove_sysfs_dev_files+0x28/0x40 pci_stop_bus_device+0xb8/0x118 pci_stop_and_remove_bus_device+0x20/0x40 pci_iov_remove_virtfn+0xb8/0x138 sriov_disable+0xbc/0x190 pci_disable_sriov+0x30/0x48 hinic_pci_sriov_disable+0x54/0x138 [hinic] hinic_remove+0x140/0x290 [hinic] pci_device_remove+0x4c/0xf8 device_remove+0x54/0x90 device_release_driver_internal+0x1d4/0x238 device_release_driver+0x20/0x38 pci_stop_bus_device+0xa8/0x118 pci_stop_and_remove_bus_device_locked+0x28/0x50 remove_store+0x128/0x208 Fix this by set the pointer to NULL after releasing res_attr immediately. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Ziming Du <duziming2@huawei.com> --- drivers/pci/pci-sysfs.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index c2df915ad2d2..7e697b82c5e1 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -1222,12 +1222,14 @@ static void pci_remove_resource_files(struct pci_dev *pdev) if (res_attr) { sysfs_remove_bin_file(&pdev->dev.kobj, res_attr); kfree(res_attr); + pdev->res_attr[i] = NULL; } res_attr = pdev->res_attr_wc[i]; if (res_attr) { sysfs_remove_bin_file(&pdev->dev.kobj, res_attr); kfree(res_attr); + pdev->res_attr_wc[i] = NULL; } } } -- 2.43.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug 2025-12-24 9:27 ` [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug Ziming Du @ 2025-12-29 17:31 ` Bjorn Helgaas 2025-12-30 3:40 ` duziming 0 siblings, 1 reply; 12+ messages in thread From: Bjorn Helgaas @ 2025-12-29 17:31 UTC (permalink / raw) To: Ziming Du Cc: bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci, linux-kernel, liuyongqiang13 On Wed, Dec 24, 2025 at 05:27:17PM +0800, Ziming Du wrote: > During the concurrent process of creating and rescanning in VF, the > resource files for the same pci_dev may be created twice. The second > creation attempt fails, resulting the res_attr in pci_dev to kfree(), > but the pointer is not set to NULL. This will subsequently lead to > dereferencing a null pointer when removing the device. > > When we perform the following operation: > echo $vfcount > /sys/class/net/"$pfname"/device/sriov_numvfs & Is the value of $vfcount relevant here? Can you use the actual values here instead of the variables so this is more useful to others? > sleep 0.5 > echo 1 > /sys/bus/pci/rescan > pci_remove "$pfname" > system will crash as follows: ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug 2025-12-29 17:31 ` Bjorn Helgaas @ 2025-12-30 3:40 ` duziming 0 siblings, 0 replies; 12+ messages in thread From: duziming @ 2025-12-30 3:40 UTC (permalink / raw) To: Bjorn Helgaas Cc: bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci, linux-kernel, liuyongqiang13 在 2025/12/30 1:31, Bjorn Helgaas 写道: > On Wed, Dec 24, 2025 at 05:27:17PM +0800, Ziming Du wrote: >> During the concurrent process of creating and rescanning in VF, the >> resource files for the same pci_dev may be created twice. The second >> creation attempt fails, resulting the res_attr in pci_dev to kfree(), >> but the pointer is not set to NULL. This will subsequently lead to >> dereferencing a null pointer when removing the device. >> >> When we perform the following operation: >> echo $vfcount > /sys/class/net/"$pfname"/device/sriov_numvfs & > Is the value of $vfcount relevant here? Can you use the actual values > here instead of the variables so this is more useful to others? In fact, we directly use sriov_totalvfs here. In my opinion, the larger this value is, the more likely it is to cause the issue. >> sleep 0.5 >> echo 1 > /sys/bus/pci/rescan >> pci_remove "$pfname" >> system will crash as follows: ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write() 2025-12-24 9:27 [PATCH v2 0/3] Miscellaneous fixes for pci subsystem Ziming Du 2025-12-24 9:27 ` [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug Ziming Du @ 2025-12-24 9:27 ` Ziming Du 2025-12-29 18:07 ` Bjorn Helgaas 2025-12-24 9:27 ` [PATCH v2 3/3] PCI/sysfs: Prohibit unaligned access to I/O port on non-x86 Ziming Du 2 siblings, 1 reply; 12+ messages in thread From: Ziming Du @ 2025-12-24 9:27 UTC (permalink / raw) To: bhelgaas, jbarnes, chrisw, alex.williamson Cc: linux-pci, linux-kernel, liuyongqiang13, duziming2 From: Yongqiang Liu <liuyongqiang13@huawei.com> When the value of ppos over the INT_MAX, the pos is over set to a negtive value which will be passed to get_user() or pci_user_write_config_dword(). Unexpected behavior such as a softlock will happen as follows: watchdog: BUG: soft lockup - CPU#0 stuck for 130s! [syz.3.109:3444] RIP: 0010:_raw_spin_unlock_irq+0x17/0x30 Call Trace: <TASK> pci_user_write_config_dword+0x126/0x1f0 proc_bus_pci_write+0x273/0x470 proc_reg_write+0x1b6/0x280 do_iter_write+0x48e/0x790 vfs_writev+0x125/0x4a0 __x64_sys_pwritev+0x1e2/0x2a0 do_syscall_64+0x59/0x110 entry_SYSCALL_64_after_hwframe+0x78/0xe2 Fix this by add check for the pos. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com> Signed-off-by: Ziming Du <duziming2@huawei.com> --- drivers/pci/proc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c index 9348a0fb8084..200d42feafd8 100644 --- a/drivers/pci/proc.c +++ b/drivers/pci/proc.c @@ -121,7 +121,7 @@ static ssize_t proc_bus_pci_write(struct file *file, const char __user *buf, if (ret) return ret; - if (pos >= size) + if (pos >= size || pos < 0) return 0; if (nbytes >= size) nbytes = size; -- 2.43.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write() 2025-12-24 9:27 ` [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write() Ziming Du @ 2025-12-29 18:07 ` Bjorn Helgaas 2025-12-30 8:20 ` duziming 0 siblings, 1 reply; 12+ messages in thread From: Bjorn Helgaas @ 2025-12-29 18:07 UTC (permalink / raw) To: Ziming Du Cc: bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci, linux-kernel, liuyongqiang13, Krzysztof Wilczyński [+cc Krzysztof; I thought we looked at this long ago?] On Wed, Dec 24, 2025 at 05:27:18PM +0800, Ziming Du wrote: > From: Yongqiang Liu <liuyongqiang13@huawei.com> > > When the value of ppos over the INT_MAX, the pos is over set to a negtive > value which will be passed to get_user() or pci_user_write_config_dword(). > Unexpected behavior such as a softlock will happen as follows: s/negtive/negative/ s/softlock/soft lockup/ to match message below s/ppos/pos/ (or fix this to refer to "*ppos", which I think is what you're referring to) I guess the point is that proc_bus_pci_write() takes a "loff_t *ppos", loff_t is a signed type, and negative read/write offsets are invalid. If this is easily reproducible with "dd" or similar, could maybe include a sample command line? > watchdog: BUG: soft lockup - CPU#0 stuck for 130s! [syz.3.109:3444] > RIP: 0010:_raw_spin_unlock_irq+0x17/0x30 > Call Trace: > <TASK> > pci_user_write_config_dword+0x126/0x1f0 > proc_bus_pci_write+0x273/0x470 > proc_reg_write+0x1b6/0x280 > do_iter_write+0x48e/0x790 > vfs_writev+0x125/0x4a0 > __x64_sys_pwritev+0x1e2/0x2a0 > do_syscall_64+0x59/0x110 > entry_SYSCALL_64_after_hwframe+0x78/0xe2 > > Fix this by add check for the pos. > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com> > Signed-off-by: Ziming Du <duziming2@huawei.com> > --- > drivers/pci/proc.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c > index 9348a0fb8084..200d42feafd8 100644 > --- a/drivers/pci/proc.c > +++ b/drivers/pci/proc.c > @@ -121,7 +121,7 @@ static ssize_t proc_bus_pci_write(struct file *file, const char __user *buf, > if (ret) > return ret; > > - if (pos >= size) > + if (pos >= size || pos < 0) > return 0; I see a few similar cases that look like this; maybe we should do the same? if (pos < 0) return -EINVAL; Looks like proc_bus_pci_read() has the same issue? What about pci_read_config(), pci_write_config(), pci_llseek_resource(), pci_read_legacy_io(), pci_write_legacy_io(), pci_read_resource_io(), pci_write_resource_io(), pci_read_rom()? These are all sysfs things; does the sysfs infrastructure take care of negative offsets before we get to these? > if (nbytes >= size) > nbytes = size; > -- > 2.43.0 > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write() 2025-12-29 18:07 ` Bjorn Helgaas @ 2025-12-30 8:20 ` duziming 2025-12-31 9:31 ` Ilpo Järvinen 0 siblings, 1 reply; 12+ messages in thread From: duziming @ 2025-12-30 8:20 UTC (permalink / raw) To: Bjorn Helgaas Cc: bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci, linux-kernel, liuyongqiang13, Krzysztof Wilczyński 在 2025/12/30 2:07, Bjorn Helgaas 写道: > [+cc Krzysztof; I thought we looked at this long ago?] > > On Wed, Dec 24, 2025 at 05:27:18PM +0800, Ziming Du wrote: >> From: Yongqiang Liu <liuyongqiang13@huawei.com> >> >> When the value of ppos over the INT_MAX, the pos is over set to a negtive >> value which will be passed to get_user() or pci_user_write_config_dword(). >> Unexpected behavior such as a softlock will happen as follows: > s/negtive/negative/ > s/softlock/soft lockup/ to match message below Thanks for pointing out the ambiguous parts. > s/ppos/pos/ (or fix this to refer to "*ppos", which I think is what > you're referring to) > > I guess the point is that proc_bus_pci_write() takes a "loff_t *ppos", > loff_t is a signed type, and negative read/write offsets are invalid. Actually, the *loff_t *ppos *passed in is not a negative value. The root cause of the issue lies in the cast *int* *pos = *ppos*. When the value of **ppos* over the INT_MAX, the pos is over set to a negative value. This negative *pos* then propagates through subsequent logic, leading to the observed errors. > If this is easily reproducible with "dd" or similar, could maybe > include a sample command line? We reproduced the issue using the following POC: #include <stdio.h> #include <string.h> #include <unistd.h> #include <fcntl.h> #include <sys/uio.h> int main() { int fd = open("/proc/bus/pci/00/02.0", O_RDWR); if (fd < 0) { perror("open failed"); return 1; } char data[] = "926b7719201054f37a1d9d391e862c"; off_t offset = 0x80800001; struct iovec iov = { .iov_base = data, .iov_len = 0xf }; pwritev(fd, &iov, 1, offset); return 0; } >> watchdog: BUG: soft lockup - CPU#0 stuck for 130s! [syz.3.109:3444] >> RIP: 0010:_raw_spin_unlock_irq+0x17/0x30 >> Call Trace: >> <TASK> >> pci_user_write_config_dword+0x126/0x1f0 >> proc_bus_pci_write+0x273/0x470 >> proc_reg_write+0x1b6/0x280 >> do_iter_write+0x48e/0x790 >> vfs_writev+0x125/0x4a0 >> __x64_sys_pwritev+0x1e2/0x2a0 >> do_syscall_64+0x59/0x110 >> entry_SYSCALL_64_after_hwframe+0x78/0xe2 >> >> Fix this by add check for the pos. >> >> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") >> Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com> >> Signed-off-by: Ziming Du <duziming2@huawei.com> >> --- >> drivers/pci/proc.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c >> index 9348a0fb8084..200d42feafd8 100644 >> --- a/drivers/pci/proc.c >> +++ b/drivers/pci/proc.c >> @@ -121,7 +121,7 @@ static ssize_t proc_bus_pci_write(struct file *file, const char __user *buf, >> if (ret) >> return ret; >> >> - if (pos >= size) >> + if (pos >= size || pos < 0) >> return 0; > I see a few similar cases that look like this; maybe we should do the > same? > > if (pos < 0) > return -EINVAL; > > Looks like proc_bus_pci_read() has the same issue? proc_bus_pci_read() may also trigger similar issue as mentioned by Ilpo Järvinen in https://lore.kernel.org/linux-pci/e5a91378-4a41-32fb-00c6-2810084581bd@linux.intel.com/ However, it does not result in an overflow to a negative number. > > What about pci_read_config(), pci_write_config(), > pci_llseek_resource(), pci_read_legacy_io(), pci_write_legacy_io(), > pci_read_resource_io(), pci_write_resource_io(), pci_read_rom()? > These are all sysfs things; does the sysfs infrastructure take care of > negative offsets before we get to these? In do_pwritev(), the following check has been performed: if (pos < 0) return -EINVAL; Theoretically, a negative offset should not occur. >> if (nbytes >= size) >> nbytes = size; >> -- >> 2.43.0 >> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write() 2025-12-30 8:20 ` duziming @ 2025-12-31 9:31 ` Ilpo Järvinen 2025-12-31 17:04 ` Bjorn Helgaas 0 siblings, 1 reply; 12+ messages in thread From: Ilpo Järvinen @ 2025-12-31 9:31 UTC (permalink / raw) To: duziming, Bjorn Helgaas Cc: bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci, LKML, liuyongqiang13, Krzysztof Wilczyński [-- Attachment #1: Type: text/plain, Size: 4662 bytes --] On Tue, 30 Dec 2025, duziming wrote: > 在 2025/12/30 2:07, Bjorn Helgaas 写道: > > [+cc Krzysztof; I thought we looked at this long ago?] > > > > On Wed, Dec 24, 2025 at 05:27:18PM +0800, Ziming Du wrote: > > > From: Yongqiang Liu <liuyongqiang13@huawei.com> > > > > > > When the value of ppos over the INT_MAX, the pos is over set to a negtive > > > value which will be passed to get_user() or pci_user_write_config_dword(). > > > Unexpected behavior such as a softlock will happen as follows: > > s/negtive/negative/ > > s/softlock/soft lockup/ to match message below > Thanks for pointing out the ambiguous parts. > > s/ppos/pos/ (or fix this to refer to "*ppos", which I think is what > > you're referring to) > > > > I guess the point is that proc_bus_pci_write() takes a "loff_t *ppos", > > loff_t is a signed type, and negative read/write offsets are invalid. > > Actually, the *loff_t *ppos *passed in is not a negative value. The root cause > of the issue > > lies in the cast *int* *pos = *ppos*. When the value of **ppos* over the > INT_MAX, the pos is over set > > to a negative value. This negative *pos* then propagates through subsequent > logic, leading to the observed errors. > > > If this is easily reproducible with "dd" or similar, could maybe > > include a sample command line? > > We reproduced the issue using the following POC: > > #include <stdio.h> > > #include <string.h> > #include <unistd.h> > #include <fcntl.h> > #include <sys/uio.h> > > int main() { > int fd = open("/proc/bus/pci/00/02.0", O_RDWR); > if (fd < 0) { > perror("open failed"); > return 1; > } > char data[] = "926b7719201054f37a1d9d391e862c"; > off_t offset = 0x80800001; > struct iovec iov = { > .iov_base = data, > .iov_len = 0xf > }; > pwritev(fd, &iov, 1, offset); > return 0; > } > > > > watchdog: BUG: soft lockup - CPU#0 stuck for 130s! [syz.3.109:3444] > > > RIP: 0010:_raw_spin_unlock_irq+0x17/0x30 > > > Call Trace: > > > <TASK> > > > pci_user_write_config_dword+0x126/0x1f0 > > > proc_bus_pci_write+0x273/0x470 > > > proc_reg_write+0x1b6/0x280 > > > do_iter_write+0x48e/0x790 > > > vfs_writev+0x125/0x4a0 > > > __x64_sys_pwritev+0x1e2/0x2a0 > > > do_syscall_64+0x59/0x110 > > > entry_SYSCALL_64_after_hwframe+0x78/0xe2 > > > > > > Fix this by add check for the pos. > > > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > > > Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com> > > > Signed-off-by: Ziming Du <duziming2@huawei.com> > > > --- > > > drivers/pci/proc.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c > > > index 9348a0fb8084..200d42feafd8 100644 > > > --- a/drivers/pci/proc.c > > > +++ b/drivers/pci/proc.c > > > @@ -121,7 +121,7 @@ static ssize_t proc_bus_pci_write(struct file *file, > > > const char __user *buf, > > > if (ret) > > > return ret; > > > - if (pos >= size) > > > + if (pos >= size || pos < 0) > > > return 0; > > I see a few similar cases that look like this; maybe we should do the > > same? > > > > if (pos < 0) > > return -EINVAL; > > > > Looks like proc_bus_pci_read() has the same issue? > > proc_bus_pci_read() may also trigger similar issue as mentioned by Ilpo > Järvinen in > > https://lore.kernel.org/linux-pci/e5a91378-4a41-32fb-00c6-2810084581bd@linux.intel.com/ > > However, it does not result in an overflow to a negative number. Why does the cast has to happen first here? This would ensure _correctness_ without any false alignment issues for large numbers: int pos; int size = dev->cfg_size; ... if (*ppos > INT_MAX) return -EINVAL; pos = *ppos; (I'm not sure though if this should return 0 or -EINVAL when *ppos >= size as it currently returns 0 for non-overflowing values when pos >= size.) -- i. > > What about pci_read_config(), pci_write_config(), > > pci_llseek_resource(), pci_read_legacy_io(), pci_write_legacy_io(), > > pci_read_resource_io(), pci_write_resource_io(), pci_read_rom()? > > These are all sysfs things; does the sysfs infrastructure take care of > > negative offsets before we get to these? > > In do_pwritev(), the following check has been performed: > > if (pos < 0) > return -EINVAL; > > Theoretically, a negative offset should not occur. > > > > if (nbytes >= size) > > > nbytes = size; ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write() 2025-12-31 9:31 ` Ilpo Järvinen @ 2025-12-31 17:04 ` Bjorn Helgaas 2026-01-04 7:17 ` duziming 0 siblings, 1 reply; 12+ messages in thread From: Bjorn Helgaas @ 2025-12-31 17:04 UTC (permalink / raw) To: Ilpo Järvinen Cc: duziming, bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci, LKML, liuyongqiang13, Krzysztof Wilczyński On Wed, Dec 31, 2025 at 11:31:47AM +0200, Ilpo Järvinen wrote: > On Tue, 30 Dec 2025, duziming wrote: > > 在 2025/12/30 2:07, Bjorn Helgaas 写道: > > > [+cc Krzysztof; I thought we looked at this long ago?] > > > > > > On Wed, Dec 24, 2025 at 05:27:18PM +0800, Ziming Du wrote: > > > > From: Yongqiang Liu <liuyongqiang13@huawei.com> > > > > > > > > When the value of ppos over the INT_MAX, the pos is over set to a negtive > > > > value which will be passed to get_user() or pci_user_write_config_dword(). > > > > Unexpected behavior such as a softlock will happen as follows: > > > s/negtive/negative/ > > > s/softlock/soft lockup/ to match message below > > Thanks for pointing out the ambiguous parts. > > > s/ppos/pos/ (or fix this to refer to "*ppos", which I think is what > > > you're referring to) > > > > > > I guess the point is that proc_bus_pci_write() takes a "loff_t *ppos", > > > loff_t is a signed type, and negative read/write offsets are invalid. > > > > Actually, the *loff_t *ppos *passed in is not a negative value. The root cause > > of the issue > > > > lies in the cast *int* *pos = *ppos*. When the value of **ppos* over the > > INT_MAX, the pos is over set > > > > to a negative value. This negative *pos* then propagates through subsequent > > logic, leading to the observed errors. > > > > > If this is easily reproducible with "dd" or similar, could maybe > > > include a sample command line? > > > > We reproduced the issue using the following POC: > > > > #include <stdio.h> > > > > #include <string.h> > > #include <unistd.h> > > #include <fcntl.h> > > #include <sys/uio.h> > > > > int main() { > > int fd = open("/proc/bus/pci/00/02.0", O_RDWR); > > if (fd < 0) { > > perror("open failed"); > > return 1; > > } > > char data[] = "926b7719201054f37a1d9d391e862c"; > > off_t offset = 0x80800001; > > struct iovec iov = { > > .iov_base = data, > > .iov_len = 0xf > > }; > > pwritev(fd, &iov, 1, offset); > > return 0; > > } > > > > > > watchdog: BUG: soft lockup - CPU#0 stuck for 130s! [syz.3.109:3444] > > > > RIP: 0010:_raw_spin_unlock_irq+0x17/0x30 > > > > Call Trace: > > > > <TASK> > > > > pci_user_write_config_dword+0x126/0x1f0 > > > > proc_bus_pci_write+0x273/0x470 > > > > proc_reg_write+0x1b6/0x280 > > > > do_iter_write+0x48e/0x790 > > > > vfs_writev+0x125/0x4a0 > > > > __x64_sys_pwritev+0x1e2/0x2a0 > > > > do_syscall_64+0x59/0x110 > > > > entry_SYSCALL_64_after_hwframe+0x78/0xe2 > > > > > > > > Fix this by add check for the pos. > > > > > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > > > > Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com> > > > > Signed-off-by: Ziming Du <duziming2@huawei.com> > > > > --- > > > > drivers/pci/proc.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c > > > > index 9348a0fb8084..200d42feafd8 100644 > > > > --- a/drivers/pci/proc.c > > > > +++ b/drivers/pci/proc.c > > > > @@ -121,7 +121,7 @@ static ssize_t proc_bus_pci_write(struct file *file, > > > > const char __user *buf, > > > > if (ret) > > > > return ret; > > > > - if (pos >= size) > > > > + if (pos >= size || pos < 0) > > > > return 0; > > > I see a few similar cases that look like this; maybe we should do the > > > same? > > > > > > if (pos < 0) > > > return -EINVAL; > > > > > > Looks like proc_bus_pci_read() has the same issue? > > > > proc_bus_pci_read() may also trigger similar issue as mentioned by Ilpo > > Järvinen in > > > > https://lore.kernel.org/linux-pci/e5a91378-4a41-32fb-00c6-2810084581bd@linux.intel.com/ > > > > However, it does not result in an overflow to a negative number. > > Why does the cast has to happen first here? > > This would ensure _correctness_ without any false alignment issues for > large numbers: > > int pos; > int size = dev->cfg_size; > > ... > if (*ppos > INT_MAX) Isn't *ppos a signed quantity? If so, wouldn't we want to check for "*ppos < 0"? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write() 2025-12-31 17:04 ` Bjorn Helgaas @ 2026-01-04 7:17 ` duziming 0 siblings, 0 replies; 12+ messages in thread From: duziming @ 2026-01-04 7:17 UTC (permalink / raw) To: Bjorn Helgaas, Ilpo Järvinen Cc: bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci, LKML, liuyongqiang13, Krzysztof Wilczyński 在 2026/1/1 1:04, Bjorn Helgaas 写道: > On Wed, Dec 31, 2025 at 11:31:47AM +0200, Ilpo Järvinen wrote: >> On Tue, 30 Dec 2025, duziming wrote: >>> 在 2025/12/30 2:07, Bjorn Helgaas 写道: >>>> [+cc Krzysztof; I thought we looked at this long ago?] >>>> >>>> On Wed, Dec 24, 2025 at 05:27:18PM +0800, Ziming Du wrote: >>>>> From: Yongqiang Liu <liuyongqiang13@huawei.com> >>>>> >>>>> When the value of ppos over the INT_MAX, the pos is over set to a negtive >>>>> value which will be passed to get_user() or pci_user_write_config_dword(). >>>>> Unexpected behavior such as a softlock will happen as follows: >>>> s/negtive/negative/ >>>> s/softlock/soft lockup/ to match message below >>> Thanks for pointing out the ambiguous parts. >>>> s/ppos/pos/ (or fix this to refer to "*ppos", which I think is what >>>> you're referring to) >>>> >>>> I guess the point is that proc_bus_pci_write() takes a "loff_t *ppos", >>>> loff_t is a signed type, and negative read/write offsets are invalid. >>> Actually, the *loff_t *ppos *passed in is not a negative value. The root cause >>> of the issue >>> >>> lies in the cast *int* *pos = *ppos*. When the value of **ppos* over the >>> INT_MAX, the pos is over set >>> >>> to a negative value. This negative *pos* then propagates through subsequent >>> logic, leading to the observed errors. >>> >>>> If this is easily reproducible with "dd" or similar, could maybe >>>> include a sample command line? >>> We reproduced the issue using the following POC: >>> >>> #include <stdio.h> >>> >>> #include <string.h> >>> #include <unistd.h> >>> #include <fcntl.h> >>> #include <sys/uio.h> >>> >>> int main() { >>> int fd = open("/proc/bus/pci/00/02.0", O_RDWR); >>> if (fd < 0) { >>> perror("open failed"); >>> return 1; >>> } >>> char data[] = "926b7719201054f37a1d9d391e862c"; >>> off_t offset = 0x80800001; >>> struct iovec iov = { >>> .iov_base = data, >>> .iov_len = 0xf >>> }; >>> pwritev(fd, &iov, 1, offset); >>> return 0; >>> } >>> >>>>> watchdog: BUG: soft lockup - CPU#0 stuck for 130s! [syz.3.109:3444] >>>>> RIP: 0010:_raw_spin_unlock_irq+0x17/0x30 >>>>> Call Trace: >>>>> <TASK> >>>>> pci_user_write_config_dword+0x126/0x1f0 >>>>> proc_bus_pci_write+0x273/0x470 >>>>> proc_reg_write+0x1b6/0x280 >>>>> do_iter_write+0x48e/0x790 >>>>> vfs_writev+0x125/0x4a0 >>>>> __x64_sys_pwritev+0x1e2/0x2a0 >>>>> do_syscall_64+0x59/0x110 >>>>> entry_SYSCALL_64_after_hwframe+0x78/0xe2 >>>>> >>>>> Fix this by add check for the pos. >>>>> >>>>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") >>>>> Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com> >>>>> Signed-off-by: Ziming Du <duziming2@huawei.com> >>>>> --- >>>>> drivers/pci/proc.c | 2 +- >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>> >>>>> diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c >>>>> index 9348a0fb8084..200d42feafd8 100644 >>>>> --- a/drivers/pci/proc.c >>>>> +++ b/drivers/pci/proc.c >>>>> @@ -121,7 +121,7 @@ static ssize_t proc_bus_pci_write(struct file *file, >>>>> const char __user *buf, >>>>> if (ret) >>>>> return ret; >>>>> - if (pos >= size) >>>>> + if (pos >= size || pos < 0) >>>>> return 0; >>>> I see a few similar cases that look like this; maybe we should do the >>>> same? >>>> >>>> if (pos < 0) >>>> return -EINVAL; >>>> >>>> Looks like proc_bus_pci_read() has the same issue? >>> proc_bus_pci_read() may also trigger similar issue as mentioned by Ilpo >>> Järvinen in >>> >>> https://lore.kernel.org/linux-pci/e5a91378-4a41-32fb-00c6-2810084581bd@linux.intel.com/ >>> >>> However, it does not result in an overflow to a negative number. >> Why does the cast has to happen first here? >> >> This would ensure _correctness_ without any false alignment issues for >> large numbers: >> >> int pos; >> int size = dev->cfg_size; >> >> ... >> if (*ppos > INT_MAX) > Isn't *ppos a signed quantity? If so, wouldn't we want to check for > "*ppos < 0"? If *ppos < 0, it will be discarded in the previous process, just like in do_pwritev(), where it returns -EINVAL when pos is negative. So we think that here using "*ppos > INT_MAX" might be more reasonable. > ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2 3/3] PCI/sysfs: Prohibit unaligned access to I/O port on non-x86 2025-12-24 9:27 [PATCH v2 0/3] Miscellaneous fixes for pci subsystem Ziming Du 2025-12-24 9:27 ` [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug Ziming Du 2025-12-24 9:27 ` [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write() Ziming Du @ 2025-12-24 9:27 ` Ziming Du 2025-12-29 9:36 ` Ilpo Järvinen 2 siblings, 1 reply; 12+ messages in thread From: Ziming Du @ 2025-12-24 9:27 UTC (permalink / raw) To: bhelgaas, jbarnes, chrisw, alex.williamson Cc: linux-pci, linux-kernel, liuyongqiang13, duziming2 From: Yongqiang Liu <liuyongqiang13@huawei.com> Unaligned access is harmful for non-x86 archs such as arm64. When we use pwrite or pread to access the I/O port resources with unaligned offset, system will crash as follows: Unable to handle kernel paging request at virtual address fffffbfffe8010c1 Internal error: Oops: 0000000096000061 [#1] SMP Call trace: _outw include/asm-generic/io.h:594 [inline] logic_outw+0x54/0x218 lib/logic_pio.c:305 pci_resource_io drivers/pci/pci-sysfs.c:1157 [inline] pci_write_resource_io drivers/pci/pci-sysfs.c:1191 [inline] pci_write_resource_io+0x208/0x260 drivers/pci/pci-sysfs.c:1181 sysfs_kf_bin_write+0x188/0x210 fs/sysfs/file.c:158 kernfs_fop_write_iter+0x2e8/0x4b0 fs/kernfs/file.c:338 vfs_write+0x7bc/0xac8 fs/read_write.c:586 ksys_write+0x12c/0x270 fs/read_write.c:639 __arm64_sys_write+0x78/0xb8 fs/read_write.c:648 Powerpc seems affected as well, so prohibit the unaligned access on non-x86 archs. Fixes: 8633328be242 ("PCI: Allow read/write access to sysfs I/O port resources") Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com> Signed-off-by: Ziming Du <duziming2@huawei.com> --- drivers/pci/pci-sysfs.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index 7e697b82c5e1..c44a9c4a91ab 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -1166,12 +1166,20 @@ static ssize_t pci_resource_io(struct file *filp, struct kobject *kobj, *(u8 *)buf = inb(port); return 1; case 2: +#if !defined(CONFIG_X86) + if (!IS_ALIGNED(port, count)) + return -EFAULT; +#endif if (write) outw(*(u16 *)buf, port); else *(u16 *)buf = inw(port); return 2; case 4: +#if !defined(CONFIG_X86) + if (!IS_ALIGNED(port, count)) + return -EFAULT; +#endif if (write) outl(*(u32 *)buf, port); else -- 2.43.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2 3/3] PCI/sysfs: Prohibit unaligned access to I/O port on non-x86 2025-12-24 9:27 ` [PATCH v2 3/3] PCI/sysfs: Prohibit unaligned access to I/O port on non-x86 Ziming Du @ 2025-12-29 9:36 ` Ilpo Järvinen 0 siblings, 0 replies; 12+ messages in thread From: Ilpo Järvinen @ 2025-12-29 9:36 UTC (permalink / raw) To: Ziming Du Cc: bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci, LKML, liuyongqiang13 On Wed, 24 Dec 2025, Ziming Du wrote: > From: Yongqiang Liu <liuyongqiang13@huawei.com> > > Unaligned access is harmful for non-x86 archs such as arm64. When we > use pwrite or pread to access the I/O port resources with unaligned > offset, system will crash as follows: > > Unable to handle kernel paging request at virtual address fffffbfffe8010c1 > Internal error: Oops: 0000000096000061 [#1] SMP > Call trace: > _outw include/asm-generic/io.h:594 [inline] > logic_outw+0x54/0x218 lib/logic_pio.c:305 > pci_resource_io drivers/pci/pci-sysfs.c:1157 [inline] > pci_write_resource_io drivers/pci/pci-sysfs.c:1191 [inline] > pci_write_resource_io+0x208/0x260 drivers/pci/pci-sysfs.c:1181 > sysfs_kf_bin_write+0x188/0x210 fs/sysfs/file.c:158 > kernfs_fop_write_iter+0x2e8/0x4b0 fs/kernfs/file.c:338 > vfs_write+0x7bc/0xac8 fs/read_write.c:586 > ksys_write+0x12c/0x270 fs/read_write.c:639 > __arm64_sys_write+0x78/0xb8 fs/read_write.c:648 > > Powerpc seems affected as well, so prohibit the unaligned access > on non-x86 archs. > > Fixes: 8633328be242 ("PCI: Allow read/write access to sysfs I/O port resources") > Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com> > Signed-off-by: Ziming Du <duziming2@huawei.com> > --- > drivers/pci/pci-sysfs.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > index 7e697b82c5e1..c44a9c4a91ab 100644 > --- a/drivers/pci/pci-sysfs.c > +++ b/drivers/pci/pci-sysfs.c > @@ -1166,12 +1166,20 @@ static ssize_t pci_resource_io(struct file *filp, struct kobject *kobj, > *(u8 *)buf = inb(port); > return 1; > case 2: > +#if !defined(CONFIG_X86) > + if (!IS_ALIGNED(port, count)) > + return -EFAULT; > +#endif > if (write) > outw(*(u16 *)buf, port); > else > *(u16 *)buf = inw(port); > return 2; > case 4: > +#if !defined(CONFIG_X86) > + if (!IS_ALIGNED(port, count)) > + return -EFAULT; > +#endif > if (write) > outl(*(u32 *)buf, port); > else > To use IS_ALIGNED(), you need to add: #include <linux/align.h> -- i. ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2026-01-04 7:17 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-12-24 9:27 [PATCH v2 0/3] Miscellaneous fixes for pci subsystem Ziming Du 2025-12-24 9:27 ` [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug Ziming Du 2025-12-29 17:31 ` Bjorn Helgaas 2025-12-30 3:40 ` duziming 2025-12-24 9:27 ` [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write() Ziming Du 2025-12-29 18:07 ` Bjorn Helgaas 2025-12-30 8:20 ` duziming 2025-12-31 9:31 ` Ilpo Järvinen 2025-12-31 17:04 ` Bjorn Helgaas 2026-01-04 7:17 ` duziming 2025-12-24 9:27 ` [PATCH v2 3/3] PCI/sysfs: Prohibit unaligned access to I/O port on non-x86 Ziming Du 2025-12-29 9:36 ` Ilpo Järvinen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox