* [PATCH v2 0/3] Miscellaneous fixes for pci subsystem
@ 2025-12-24 9:27 Ziming Du
2025-12-24 9:27 ` [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug Ziming Du
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Ziming Du @ 2025-12-24 9:27 UTC (permalink / raw)
To: bhelgaas, jbarnes, chrisw, alex.williamson
Cc: linux-pci, linux-kernel, liuyongqiang13, duziming2
Miscellaneous fixes for pci subsystem
Ilpo Järvinen warned me of potential issues in my previous patch,
so I have made the necessary adjustments.
Changes in v2:
- Correct grammer and indentation.
- Remove unrelated stack traces from the commit message.
- Modify the handling of pos by adding a non-negative check to ensure
that the input value is valid.
- Use the existing IS_ALIGNED macro and ensure that after modification,
other cases still retuen -EINVAL as before.
- Link to v1: https://lore.kernel.org/linux-pci/20251216083912.758219-1-duziming2@huawei.com/
Thanx, Du
Yongqiang Liu (2):
PCI/sysfs: Prohibit unaligned access to I/O port on non-x86
PCI: Prevent overflow in proc_bus_pci_write()
Ziming Du (1):
PCI/sysfs: Fix null pointer dereference during hotplug
drivers/pci/pci-sysfs.c | 10 ++++++++++
drivers/pci/proc.c | 2 +-
2 files changed, 11 insertions(+), 1 deletion(-)
--
2.43.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug
2025-12-24 9:27 [PATCH v2 0/3] Miscellaneous fixes for pci subsystem Ziming Du
@ 2025-12-24 9:27 ` Ziming Du
2025-12-29 17:31 ` Bjorn Helgaas
2025-12-24 9:27 ` [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write() Ziming Du
2025-12-24 9:27 ` [PATCH v2 3/3] PCI/sysfs: Prohibit unaligned access to I/O port on non-x86 Ziming Du
2 siblings, 1 reply; 12+ messages in thread
From: Ziming Du @ 2025-12-24 9:27 UTC (permalink / raw)
To: bhelgaas, jbarnes, chrisw, alex.williamson
Cc: linux-pci, linux-kernel, liuyongqiang13, duziming2
During the concurrent process of creating and rescanning in VF, the
resource files for the same pci_dev may be created twice. The second
creation attempt fails, resulting the res_attr in pci_dev to kfree(),
but the pointer is not set to NULL. This will subsequently lead to
dereferencing a null pointer when removing the device.
When we perform the following operation:
echo $vfcount > /sys/class/net/"$pfname"/device/sriov_numvfs &
sleep 0.5
echo 1 > /sys/bus/pci/rescan
pci_remove "$pfname"
system will crash as follows:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Call trace:
__pi_strlen+0x14/0x150
kernfs_find_ns+0x54/0x120
kernfs_remove_by_name_ns+0x58/0xf0
sysfs_remove_bin_file+0x24/0x38
pci_remove_resource_files+0x44/0x90
pci_remove_sysfs_dev_files+0x28/0x40
pci_stop_bus_device+0xb8/0x118
pci_stop_and_remove_bus_device+0x20/0x40
pci_iov_remove_virtfn+0xb8/0x138
sriov_disable+0xbc/0x190
pci_disable_sriov+0x30/0x48
hinic_pci_sriov_disable+0x54/0x138 [hinic]
hinic_remove+0x140/0x290 [hinic]
pci_device_remove+0x4c/0xf8
device_remove+0x54/0x90
device_release_driver_internal+0x1d4/0x238
device_release_driver+0x20/0x38
pci_stop_bus_device+0xa8/0x118
pci_stop_and_remove_bus_device_locked+0x28/0x50
remove_store+0x128/0x208
Fix this by set the pointer to NULL after releasing res_attr immediately.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ziming Du <duziming2@huawei.com>
---
drivers/pci/pci-sysfs.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index c2df915ad2d2..7e697b82c5e1 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1222,12 +1222,14 @@ static void pci_remove_resource_files(struct pci_dev *pdev)
if (res_attr) {
sysfs_remove_bin_file(&pdev->dev.kobj, res_attr);
kfree(res_attr);
+ pdev->res_attr[i] = NULL;
}
res_attr = pdev->res_attr_wc[i];
if (res_attr) {
sysfs_remove_bin_file(&pdev->dev.kobj, res_attr);
kfree(res_attr);
+ pdev->res_attr_wc[i] = NULL;
}
}
}
--
2.43.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write()
2025-12-24 9:27 [PATCH v2 0/3] Miscellaneous fixes for pci subsystem Ziming Du
2025-12-24 9:27 ` [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug Ziming Du
@ 2025-12-24 9:27 ` Ziming Du
2025-12-29 18:07 ` Bjorn Helgaas
2025-12-24 9:27 ` [PATCH v2 3/3] PCI/sysfs: Prohibit unaligned access to I/O port on non-x86 Ziming Du
2 siblings, 1 reply; 12+ messages in thread
From: Ziming Du @ 2025-12-24 9:27 UTC (permalink / raw)
To: bhelgaas, jbarnes, chrisw, alex.williamson
Cc: linux-pci, linux-kernel, liuyongqiang13, duziming2
From: Yongqiang Liu <liuyongqiang13@huawei.com>
When the value of ppos over the INT_MAX, the pos is over set to a negtive
value which will be passed to get_user() or pci_user_write_config_dword().
Unexpected behavior such as a softlock will happen as follows:
watchdog: BUG: soft lockup - CPU#0 stuck for 130s! [syz.3.109:3444]
RIP: 0010:_raw_spin_unlock_irq+0x17/0x30
Call Trace:
<TASK>
pci_user_write_config_dword+0x126/0x1f0
proc_bus_pci_write+0x273/0x470
proc_reg_write+0x1b6/0x280
do_iter_write+0x48e/0x790
vfs_writev+0x125/0x4a0
__x64_sys_pwritev+0x1e2/0x2a0
do_syscall_64+0x59/0x110
entry_SYSCALL_64_after_hwframe+0x78/0xe2
Fix this by add check for the pos.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Ziming Du <duziming2@huawei.com>
---
drivers/pci/proc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index 9348a0fb8084..200d42feafd8 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -121,7 +121,7 @@ static ssize_t proc_bus_pci_write(struct file *file, const char __user *buf,
if (ret)
return ret;
- if (pos >= size)
+ if (pos >= size || pos < 0)
return 0;
if (nbytes >= size)
nbytes = size;
--
2.43.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2 3/3] PCI/sysfs: Prohibit unaligned access to I/O port on non-x86
2025-12-24 9:27 [PATCH v2 0/3] Miscellaneous fixes for pci subsystem Ziming Du
2025-12-24 9:27 ` [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug Ziming Du
2025-12-24 9:27 ` [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write() Ziming Du
@ 2025-12-24 9:27 ` Ziming Du
2025-12-29 9:36 ` Ilpo Järvinen
2 siblings, 1 reply; 12+ messages in thread
From: Ziming Du @ 2025-12-24 9:27 UTC (permalink / raw)
To: bhelgaas, jbarnes, chrisw, alex.williamson
Cc: linux-pci, linux-kernel, liuyongqiang13, duziming2
From: Yongqiang Liu <liuyongqiang13@huawei.com>
Unaligned access is harmful for non-x86 archs such as arm64. When we
use pwrite or pread to access the I/O port resources with unaligned
offset, system will crash as follows:
Unable to handle kernel paging request at virtual address fffffbfffe8010c1
Internal error: Oops: 0000000096000061 [#1] SMP
Call trace:
_outw include/asm-generic/io.h:594 [inline]
logic_outw+0x54/0x218 lib/logic_pio.c:305
pci_resource_io drivers/pci/pci-sysfs.c:1157 [inline]
pci_write_resource_io drivers/pci/pci-sysfs.c:1191 [inline]
pci_write_resource_io+0x208/0x260 drivers/pci/pci-sysfs.c:1181
sysfs_kf_bin_write+0x188/0x210 fs/sysfs/file.c:158
kernfs_fop_write_iter+0x2e8/0x4b0 fs/kernfs/file.c:338
vfs_write+0x7bc/0xac8 fs/read_write.c:586
ksys_write+0x12c/0x270 fs/read_write.c:639
__arm64_sys_write+0x78/0xb8 fs/read_write.c:648
Powerpc seems affected as well, so prohibit the unaligned access
on non-x86 archs.
Fixes: 8633328be242 ("PCI: Allow read/write access to sysfs I/O port resources")
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Ziming Du <duziming2@huawei.com>
---
drivers/pci/pci-sysfs.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 7e697b82c5e1..c44a9c4a91ab 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1166,12 +1166,20 @@ static ssize_t pci_resource_io(struct file *filp, struct kobject *kobj,
*(u8 *)buf = inb(port);
return 1;
case 2:
+#if !defined(CONFIG_X86)
+ if (!IS_ALIGNED(port, count))
+ return -EFAULT;
+#endif
if (write)
outw(*(u16 *)buf, port);
else
*(u16 *)buf = inw(port);
return 2;
case 4:
+#if !defined(CONFIG_X86)
+ if (!IS_ALIGNED(port, count))
+ return -EFAULT;
+#endif
if (write)
outl(*(u32 *)buf, port);
else
--
2.43.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2 3/3] PCI/sysfs: Prohibit unaligned access to I/O port on non-x86
2025-12-24 9:27 ` [PATCH v2 3/3] PCI/sysfs: Prohibit unaligned access to I/O port on non-x86 Ziming Du
@ 2025-12-29 9:36 ` Ilpo Järvinen
0 siblings, 0 replies; 12+ messages in thread
From: Ilpo Järvinen @ 2025-12-29 9:36 UTC (permalink / raw)
To: Ziming Du
Cc: bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci, LKML,
liuyongqiang13
On Wed, 24 Dec 2025, Ziming Du wrote:
> From: Yongqiang Liu <liuyongqiang13@huawei.com>
>
> Unaligned access is harmful for non-x86 archs such as arm64. When we
> use pwrite or pread to access the I/O port resources with unaligned
> offset, system will crash as follows:
>
> Unable to handle kernel paging request at virtual address fffffbfffe8010c1
> Internal error: Oops: 0000000096000061 [#1] SMP
> Call trace:
> _outw include/asm-generic/io.h:594 [inline]
> logic_outw+0x54/0x218 lib/logic_pio.c:305
> pci_resource_io drivers/pci/pci-sysfs.c:1157 [inline]
> pci_write_resource_io drivers/pci/pci-sysfs.c:1191 [inline]
> pci_write_resource_io+0x208/0x260 drivers/pci/pci-sysfs.c:1181
> sysfs_kf_bin_write+0x188/0x210 fs/sysfs/file.c:158
> kernfs_fop_write_iter+0x2e8/0x4b0 fs/kernfs/file.c:338
> vfs_write+0x7bc/0xac8 fs/read_write.c:586
> ksys_write+0x12c/0x270 fs/read_write.c:639
> __arm64_sys_write+0x78/0xb8 fs/read_write.c:648
>
> Powerpc seems affected as well, so prohibit the unaligned access
> on non-x86 archs.
>
> Fixes: 8633328be242 ("PCI: Allow read/write access to sysfs I/O port resources")
> Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
> Signed-off-by: Ziming Du <duziming2@huawei.com>
> ---
> drivers/pci/pci-sysfs.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 7e697b82c5e1..c44a9c4a91ab 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -1166,12 +1166,20 @@ static ssize_t pci_resource_io(struct file *filp, struct kobject *kobj,
> *(u8 *)buf = inb(port);
> return 1;
> case 2:
> +#if !defined(CONFIG_X86)
> + if (!IS_ALIGNED(port, count))
> + return -EFAULT;
> +#endif
> if (write)
> outw(*(u16 *)buf, port);
> else
> *(u16 *)buf = inw(port);
> return 2;
> case 4:
> +#if !defined(CONFIG_X86)
> + if (!IS_ALIGNED(port, count))
> + return -EFAULT;
> +#endif
> if (write)
> outl(*(u32 *)buf, port);
> else
>
To use IS_ALIGNED(), you need to add:
#include <linux/align.h>
--
i.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug
2025-12-24 9:27 ` [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug Ziming Du
@ 2025-12-29 17:31 ` Bjorn Helgaas
2025-12-30 3:40 ` duziming
0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2025-12-29 17:31 UTC (permalink / raw)
To: Ziming Du
Cc: bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci,
linux-kernel, liuyongqiang13
On Wed, Dec 24, 2025 at 05:27:17PM +0800, Ziming Du wrote:
> During the concurrent process of creating and rescanning in VF, the
> resource files for the same pci_dev may be created twice. The second
> creation attempt fails, resulting the res_attr in pci_dev to kfree(),
> but the pointer is not set to NULL. This will subsequently lead to
> dereferencing a null pointer when removing the device.
>
> When we perform the following operation:
> echo $vfcount > /sys/class/net/"$pfname"/device/sriov_numvfs &
Is the value of $vfcount relevant here? Can you use the actual values
here instead of the variables so this is more useful to others?
> sleep 0.5
> echo 1 > /sys/bus/pci/rescan
> pci_remove "$pfname"
> system will crash as follows:
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write()
2025-12-24 9:27 ` [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write() Ziming Du
@ 2025-12-29 18:07 ` Bjorn Helgaas
2025-12-30 8:20 ` duziming
0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2025-12-29 18:07 UTC (permalink / raw)
To: Ziming Du
Cc: bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci,
linux-kernel, liuyongqiang13, Krzysztof Wilczyński
[+cc Krzysztof; I thought we looked at this long ago?]
On Wed, Dec 24, 2025 at 05:27:18PM +0800, Ziming Du wrote:
> From: Yongqiang Liu <liuyongqiang13@huawei.com>
>
> When the value of ppos over the INT_MAX, the pos is over set to a negtive
> value which will be passed to get_user() or pci_user_write_config_dword().
> Unexpected behavior such as a softlock will happen as follows:
s/negtive/negative/
s/softlock/soft lockup/ to match message below
s/ppos/pos/ (or fix this to refer to "*ppos", which I think is what
you're referring to)
I guess the point is that proc_bus_pci_write() takes a "loff_t *ppos",
loff_t is a signed type, and negative read/write offsets are invalid.
If this is easily reproducible with "dd" or similar, could maybe
include a sample command line?
> watchdog: BUG: soft lockup - CPU#0 stuck for 130s! [syz.3.109:3444]
> RIP: 0010:_raw_spin_unlock_irq+0x17/0x30
> Call Trace:
> <TASK>
> pci_user_write_config_dword+0x126/0x1f0
> proc_bus_pci_write+0x273/0x470
> proc_reg_write+0x1b6/0x280
> do_iter_write+0x48e/0x790
> vfs_writev+0x125/0x4a0
> __x64_sys_pwritev+0x1e2/0x2a0
> do_syscall_64+0x59/0x110
> entry_SYSCALL_64_after_hwframe+0x78/0xe2
>
> Fix this by add check for the pos.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
> Signed-off-by: Ziming Du <duziming2@huawei.com>
> ---
> drivers/pci/proc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
> index 9348a0fb8084..200d42feafd8 100644
> --- a/drivers/pci/proc.c
> +++ b/drivers/pci/proc.c
> @@ -121,7 +121,7 @@ static ssize_t proc_bus_pci_write(struct file *file, const char __user *buf,
> if (ret)
> return ret;
>
> - if (pos >= size)
> + if (pos >= size || pos < 0)
> return 0;
I see a few similar cases that look like this; maybe we should do the
same?
if (pos < 0)
return -EINVAL;
Looks like proc_bus_pci_read() has the same issue?
What about pci_read_config(), pci_write_config(),
pci_llseek_resource(), pci_read_legacy_io(), pci_write_legacy_io(),
pci_read_resource_io(), pci_write_resource_io(), pci_read_rom()?
These are all sysfs things; does the sysfs infrastructure take care of
negative offsets before we get to these?
> if (nbytes >= size)
> nbytes = size;
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug
2025-12-29 17:31 ` Bjorn Helgaas
@ 2025-12-30 3:40 ` duziming
0 siblings, 0 replies; 12+ messages in thread
From: duziming @ 2025-12-30 3:40 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci,
linux-kernel, liuyongqiang13
在 2025/12/30 1:31, Bjorn Helgaas 写道:
> On Wed, Dec 24, 2025 at 05:27:17PM +0800, Ziming Du wrote:
>> During the concurrent process of creating and rescanning in VF, the
>> resource files for the same pci_dev may be created twice. The second
>> creation attempt fails, resulting the res_attr in pci_dev to kfree(),
>> but the pointer is not set to NULL. This will subsequently lead to
>> dereferencing a null pointer when removing the device.
>>
>> When we perform the following operation:
>> echo $vfcount > /sys/class/net/"$pfname"/device/sriov_numvfs &
> Is the value of $vfcount relevant here? Can you use the actual values
> here instead of the variables so this is more useful to others?
In fact, we directly use sriov_totalvfs here. In my opinion, the larger
this value is,
the more likely it is to cause the issue.
>> sleep 0.5
>> echo 1 > /sys/bus/pci/rescan
>> pci_remove "$pfname"
>> system will crash as follows:
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write()
2025-12-29 18:07 ` Bjorn Helgaas
@ 2025-12-30 8:20 ` duziming
2025-12-31 9:31 ` Ilpo Järvinen
0 siblings, 1 reply; 12+ messages in thread
From: duziming @ 2025-12-30 8:20 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci,
linux-kernel, liuyongqiang13, Krzysztof Wilczyński
在 2025/12/30 2:07, Bjorn Helgaas 写道:
> [+cc Krzysztof; I thought we looked at this long ago?]
>
> On Wed, Dec 24, 2025 at 05:27:18PM +0800, Ziming Du wrote:
>> From: Yongqiang Liu <liuyongqiang13@huawei.com>
>>
>> When the value of ppos over the INT_MAX, the pos is over set to a negtive
>> value which will be passed to get_user() or pci_user_write_config_dword().
>> Unexpected behavior such as a softlock will happen as follows:
> s/negtive/negative/
> s/softlock/soft lockup/ to match message below
Thanks for pointing out the ambiguous parts.
> s/ppos/pos/ (or fix this to refer to "*ppos", which I think is what
> you're referring to)
>
> I guess the point is that proc_bus_pci_write() takes a "loff_t *ppos",
> loff_t is a signed type, and negative read/write offsets are invalid.
Actually, the *loff_t *ppos *passed in is not a negative value. The root
cause of the issue
lies in the cast *int* *pos = *ppos*. When the value of **ppos* over the
INT_MAX, the pos is over set
to a negative value. This negative *pos* then propagates through
subsequent logic, leading to the observed errors.
> If this is easily reproducible with "dd" or similar, could maybe
> include a sample command line?
We reproduced the issue using the following POC:
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/uio.h>
int main() {
int fd = open("/proc/bus/pci/00/02.0", O_RDWR);
if (fd < 0) {
perror("open failed");
return 1;
}
char data[] = "926b7719201054f37a1d9d391e862c";
off_t offset = 0x80800001;
struct iovec iov = {
.iov_base = data,
.iov_len = 0xf
};
pwritev(fd, &iov, 1, offset);
return 0;
}
>> watchdog: BUG: soft lockup - CPU#0 stuck for 130s! [syz.3.109:3444]
>> RIP: 0010:_raw_spin_unlock_irq+0x17/0x30
>> Call Trace:
>> <TASK>
>> pci_user_write_config_dword+0x126/0x1f0
>> proc_bus_pci_write+0x273/0x470
>> proc_reg_write+0x1b6/0x280
>> do_iter_write+0x48e/0x790
>> vfs_writev+0x125/0x4a0
>> __x64_sys_pwritev+0x1e2/0x2a0
>> do_syscall_64+0x59/0x110
>> entry_SYSCALL_64_after_hwframe+0x78/0xe2
>>
>> Fix this by add check for the pos.
>>
>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>> Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
>> Signed-off-by: Ziming Du <duziming2@huawei.com>
>> ---
>> drivers/pci/proc.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
>> index 9348a0fb8084..200d42feafd8 100644
>> --- a/drivers/pci/proc.c
>> +++ b/drivers/pci/proc.c
>> @@ -121,7 +121,7 @@ static ssize_t proc_bus_pci_write(struct file *file, const char __user *buf,
>> if (ret)
>> return ret;
>>
>> - if (pos >= size)
>> + if (pos >= size || pos < 0)
>> return 0;
> I see a few similar cases that look like this; maybe we should do the
> same?
>
> if (pos < 0)
> return -EINVAL;
>
> Looks like proc_bus_pci_read() has the same issue?
proc_bus_pci_read() may also trigger similar issue as mentioned by Ilpo
Järvinen in
https://lore.kernel.org/linux-pci/e5a91378-4a41-32fb-00c6-2810084581bd@linux.intel.com/
However, it does not result in an overflow to a negative number.
>
> What about pci_read_config(), pci_write_config(),
> pci_llseek_resource(), pci_read_legacy_io(), pci_write_legacy_io(),
> pci_read_resource_io(), pci_write_resource_io(), pci_read_rom()?
> These are all sysfs things; does the sysfs infrastructure take care of
> negative offsets before we get to these?
In do_pwritev(), the following check has been performed:
if (pos < 0)
return -EINVAL;
Theoretically, a negative offset should not occur.
>> if (nbytes >= size)
>> nbytes = size;
>> --
>> 2.43.0
>>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write()
2025-12-30 8:20 ` duziming
@ 2025-12-31 9:31 ` Ilpo Järvinen
2025-12-31 17:04 ` Bjorn Helgaas
0 siblings, 1 reply; 12+ messages in thread
From: Ilpo Järvinen @ 2025-12-31 9:31 UTC (permalink / raw)
To: duziming, Bjorn Helgaas
Cc: bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci, LKML,
liuyongqiang13, Krzysztof Wilczyński
[-- Attachment #1: Type: text/plain, Size: 4662 bytes --]
On Tue, 30 Dec 2025, duziming wrote:
> 在 2025/12/30 2:07, Bjorn Helgaas 写道:
> > [+cc Krzysztof; I thought we looked at this long ago?]
> >
> > On Wed, Dec 24, 2025 at 05:27:18PM +0800, Ziming Du wrote:
> > > From: Yongqiang Liu <liuyongqiang13@huawei.com>
> > >
> > > When the value of ppos over the INT_MAX, the pos is over set to a negtive
> > > value which will be passed to get_user() or pci_user_write_config_dword().
> > > Unexpected behavior such as a softlock will happen as follows:
> > s/negtive/negative/
> > s/softlock/soft lockup/ to match message below
> Thanks for pointing out the ambiguous parts.
> > s/ppos/pos/ (or fix this to refer to "*ppos", which I think is what
> > you're referring to)
> >
> > I guess the point is that proc_bus_pci_write() takes a "loff_t *ppos",
> > loff_t is a signed type, and negative read/write offsets are invalid.
>
> Actually, the *loff_t *ppos *passed in is not a negative value. The root cause
> of the issue
>
> lies in the cast *int* *pos = *ppos*. When the value of **ppos* over the
> INT_MAX, the pos is over set
>
> to a negative value. This negative *pos* then propagates through subsequent
> logic, leading to the observed errors.
>
> > If this is easily reproducible with "dd" or similar, could maybe
> > include a sample command line?
>
> We reproduced the issue using the following POC:
>
> #include <stdio.h>
>
> #include <string.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <sys/uio.h>
>
> int main() {
> int fd = open("/proc/bus/pci/00/02.0", O_RDWR);
> if (fd < 0) {
> perror("open failed");
> return 1;
> }
> char data[] = "926b7719201054f37a1d9d391e862c";
> off_t offset = 0x80800001;
> struct iovec iov = {
> .iov_base = data,
> .iov_len = 0xf
> };
> pwritev(fd, &iov, 1, offset);
> return 0;
> }
>
> > > watchdog: BUG: soft lockup - CPU#0 stuck for 130s! [syz.3.109:3444]
> > > RIP: 0010:_raw_spin_unlock_irq+0x17/0x30
> > > Call Trace:
> > > <TASK>
> > > pci_user_write_config_dword+0x126/0x1f0
> > > proc_bus_pci_write+0x273/0x470
> > > proc_reg_write+0x1b6/0x280
> > > do_iter_write+0x48e/0x790
> > > vfs_writev+0x125/0x4a0
> > > __x64_sys_pwritev+0x1e2/0x2a0
> > > do_syscall_64+0x59/0x110
> > > entry_SYSCALL_64_after_hwframe+0x78/0xe2
> > >
> > > Fix this by add check for the pos.
> > >
> > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
> > > Signed-off-by: Ziming Du <duziming2@huawei.com>
> > > ---
> > > drivers/pci/proc.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
> > > index 9348a0fb8084..200d42feafd8 100644
> > > --- a/drivers/pci/proc.c
> > > +++ b/drivers/pci/proc.c
> > > @@ -121,7 +121,7 @@ static ssize_t proc_bus_pci_write(struct file *file,
> > > const char __user *buf,
> > > if (ret)
> > > return ret;
> > > - if (pos >= size)
> > > + if (pos >= size || pos < 0)
> > > return 0;
> > I see a few similar cases that look like this; maybe we should do the
> > same?
> >
> > if (pos < 0)
> > return -EINVAL;
> >
> > Looks like proc_bus_pci_read() has the same issue?
>
> proc_bus_pci_read() may also trigger similar issue as mentioned by Ilpo
> Järvinen in
>
> https://lore.kernel.org/linux-pci/e5a91378-4a41-32fb-00c6-2810084581bd@linux.intel.com/
>
> However, it does not result in an overflow to a negative number.
Why does the cast has to happen first here?
This would ensure _correctness_ without any false alignment issues for
large numbers:
int pos;
int size = dev->cfg_size;
...
if (*ppos > INT_MAX)
return -EINVAL;
pos = *ppos;
(I'm not sure though if this should return 0 or -EINVAL when *ppos >=
size as it currently returns 0 for non-overflowing values when pos >=
size.)
--
i.
> > What about pci_read_config(), pci_write_config(),
> > pci_llseek_resource(), pci_read_legacy_io(), pci_write_legacy_io(),
> > pci_read_resource_io(), pci_write_resource_io(), pci_read_rom()?
> > These are all sysfs things; does the sysfs infrastructure take care of
> > negative offsets before we get to these?
>
> In do_pwritev(), the following check has been performed:
>
> if (pos < 0)
> return -EINVAL;
>
> Theoretically, a negative offset should not occur.
>
> > > if (nbytes >= size)
> > > nbytes = size;
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write()
2025-12-31 9:31 ` Ilpo Järvinen
@ 2025-12-31 17:04 ` Bjorn Helgaas
2026-01-04 7:17 ` duziming
0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2025-12-31 17:04 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: duziming, bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci,
LKML, liuyongqiang13, Krzysztof Wilczyński
On Wed, Dec 31, 2025 at 11:31:47AM +0200, Ilpo Järvinen wrote:
> On Tue, 30 Dec 2025, duziming wrote:
> > 在 2025/12/30 2:07, Bjorn Helgaas 写道:
> > > [+cc Krzysztof; I thought we looked at this long ago?]
> > >
> > > On Wed, Dec 24, 2025 at 05:27:18PM +0800, Ziming Du wrote:
> > > > From: Yongqiang Liu <liuyongqiang13@huawei.com>
> > > >
> > > > When the value of ppos over the INT_MAX, the pos is over set to a negtive
> > > > value which will be passed to get_user() or pci_user_write_config_dword().
> > > > Unexpected behavior such as a softlock will happen as follows:
> > > s/negtive/negative/
> > > s/softlock/soft lockup/ to match message below
> > Thanks for pointing out the ambiguous parts.
> > > s/ppos/pos/ (or fix this to refer to "*ppos", which I think is what
> > > you're referring to)
> > >
> > > I guess the point is that proc_bus_pci_write() takes a "loff_t *ppos",
> > > loff_t is a signed type, and negative read/write offsets are invalid.
> >
> > Actually, the *loff_t *ppos *passed in is not a negative value. The root cause
> > of the issue
> >
> > lies in the cast *int* *pos = *ppos*. When the value of **ppos* over the
> > INT_MAX, the pos is over set
> >
> > to a negative value. This negative *pos* then propagates through subsequent
> > logic, leading to the observed errors.
> >
> > > If this is easily reproducible with "dd" or similar, could maybe
> > > include a sample command line?
> >
> > We reproduced the issue using the following POC:
> >
> > #include <stdio.h>
> >
> > #include <string.h>
> > #include <unistd.h>
> > #include <fcntl.h>
> > #include <sys/uio.h>
> >
> > int main() {
> > int fd = open("/proc/bus/pci/00/02.0", O_RDWR);
> > if (fd < 0) {
> > perror("open failed");
> > return 1;
> > }
> > char data[] = "926b7719201054f37a1d9d391e862c";
> > off_t offset = 0x80800001;
> > struct iovec iov = {
> > .iov_base = data,
> > .iov_len = 0xf
> > };
> > pwritev(fd, &iov, 1, offset);
> > return 0;
> > }
> >
> > > > watchdog: BUG: soft lockup - CPU#0 stuck for 130s! [syz.3.109:3444]
> > > > RIP: 0010:_raw_spin_unlock_irq+0x17/0x30
> > > > Call Trace:
> > > > <TASK>
> > > > pci_user_write_config_dword+0x126/0x1f0
> > > > proc_bus_pci_write+0x273/0x470
> > > > proc_reg_write+0x1b6/0x280
> > > > do_iter_write+0x48e/0x790
> > > > vfs_writev+0x125/0x4a0
> > > > __x64_sys_pwritev+0x1e2/0x2a0
> > > > do_syscall_64+0x59/0x110
> > > > entry_SYSCALL_64_after_hwframe+0x78/0xe2
> > > >
> > > > Fix this by add check for the pos.
> > > >
> > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > > Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
> > > > Signed-off-by: Ziming Du <duziming2@huawei.com>
> > > > ---
> > > > drivers/pci/proc.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
> > > > index 9348a0fb8084..200d42feafd8 100644
> > > > --- a/drivers/pci/proc.c
> > > > +++ b/drivers/pci/proc.c
> > > > @@ -121,7 +121,7 @@ static ssize_t proc_bus_pci_write(struct file *file,
> > > > const char __user *buf,
> > > > if (ret)
> > > > return ret;
> > > > - if (pos >= size)
> > > > + if (pos >= size || pos < 0)
> > > > return 0;
> > > I see a few similar cases that look like this; maybe we should do the
> > > same?
> > >
> > > if (pos < 0)
> > > return -EINVAL;
> > >
> > > Looks like proc_bus_pci_read() has the same issue?
> >
> > proc_bus_pci_read() may also trigger similar issue as mentioned by Ilpo
> > Järvinen in
> >
> > https://lore.kernel.org/linux-pci/e5a91378-4a41-32fb-00c6-2810084581bd@linux.intel.com/
> >
> > However, it does not result in an overflow to a negative number.
>
> Why does the cast has to happen first here?
>
> This would ensure _correctness_ without any false alignment issues for
> large numbers:
>
> int pos;
> int size = dev->cfg_size;
>
> ...
> if (*ppos > INT_MAX)
Isn't *ppos a signed quantity? If so, wouldn't we want to check for
"*ppos < 0"?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write()
2025-12-31 17:04 ` Bjorn Helgaas
@ 2026-01-04 7:17 ` duziming
0 siblings, 0 replies; 12+ messages in thread
From: duziming @ 2026-01-04 7:17 UTC (permalink / raw)
To: Bjorn Helgaas, Ilpo Järvinen
Cc: bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci, LKML,
liuyongqiang13, Krzysztof Wilczyński
在 2026/1/1 1:04, Bjorn Helgaas 写道:
> On Wed, Dec 31, 2025 at 11:31:47AM +0200, Ilpo Järvinen wrote:
>> On Tue, 30 Dec 2025, duziming wrote:
>>> 在 2025/12/30 2:07, Bjorn Helgaas 写道:
>>>> [+cc Krzysztof; I thought we looked at this long ago?]
>>>>
>>>> On Wed, Dec 24, 2025 at 05:27:18PM +0800, Ziming Du wrote:
>>>>> From: Yongqiang Liu <liuyongqiang13@huawei.com>
>>>>>
>>>>> When the value of ppos over the INT_MAX, the pos is over set to a negtive
>>>>> value which will be passed to get_user() or pci_user_write_config_dword().
>>>>> Unexpected behavior such as a softlock will happen as follows:
>>>> s/negtive/negative/
>>>> s/softlock/soft lockup/ to match message below
>>> Thanks for pointing out the ambiguous parts.
>>>> s/ppos/pos/ (or fix this to refer to "*ppos", which I think is what
>>>> you're referring to)
>>>>
>>>> I guess the point is that proc_bus_pci_write() takes a "loff_t *ppos",
>>>> loff_t is a signed type, and negative read/write offsets are invalid.
>>> Actually, the *loff_t *ppos *passed in is not a negative value. The root cause
>>> of the issue
>>>
>>> lies in the cast *int* *pos = *ppos*. When the value of **ppos* over the
>>> INT_MAX, the pos is over set
>>>
>>> to a negative value. This negative *pos* then propagates through subsequent
>>> logic, leading to the observed errors.
>>>
>>>> If this is easily reproducible with "dd" or similar, could maybe
>>>> include a sample command line?
>>> We reproduced the issue using the following POC:
>>>
>>> #include <stdio.h>
>>>
>>> #include <string.h>
>>> #include <unistd.h>
>>> #include <fcntl.h>
>>> #include <sys/uio.h>
>>>
>>> int main() {
>>> int fd = open("/proc/bus/pci/00/02.0", O_RDWR);
>>> if (fd < 0) {
>>> perror("open failed");
>>> return 1;
>>> }
>>> char data[] = "926b7719201054f37a1d9d391e862c";
>>> off_t offset = 0x80800001;
>>> struct iovec iov = {
>>> .iov_base = data,
>>> .iov_len = 0xf
>>> };
>>> pwritev(fd, &iov, 1, offset);
>>> return 0;
>>> }
>>>
>>>>> watchdog: BUG: soft lockup - CPU#0 stuck for 130s! [syz.3.109:3444]
>>>>> RIP: 0010:_raw_spin_unlock_irq+0x17/0x30
>>>>> Call Trace:
>>>>> <TASK>
>>>>> pci_user_write_config_dword+0x126/0x1f0
>>>>> proc_bus_pci_write+0x273/0x470
>>>>> proc_reg_write+0x1b6/0x280
>>>>> do_iter_write+0x48e/0x790
>>>>> vfs_writev+0x125/0x4a0
>>>>> __x64_sys_pwritev+0x1e2/0x2a0
>>>>> do_syscall_64+0x59/0x110
>>>>> entry_SYSCALL_64_after_hwframe+0x78/0xe2
>>>>>
>>>>> Fix this by add check for the pos.
>>>>>
>>>>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>>>>> Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
>>>>> Signed-off-by: Ziming Du <duziming2@huawei.com>
>>>>> ---
>>>>> drivers/pci/proc.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
>>>>> index 9348a0fb8084..200d42feafd8 100644
>>>>> --- a/drivers/pci/proc.c
>>>>> +++ b/drivers/pci/proc.c
>>>>> @@ -121,7 +121,7 @@ static ssize_t proc_bus_pci_write(struct file *file,
>>>>> const char __user *buf,
>>>>> if (ret)
>>>>> return ret;
>>>>> - if (pos >= size)
>>>>> + if (pos >= size || pos < 0)
>>>>> return 0;
>>>> I see a few similar cases that look like this; maybe we should do the
>>>> same?
>>>>
>>>> if (pos < 0)
>>>> return -EINVAL;
>>>>
>>>> Looks like proc_bus_pci_read() has the same issue?
>>> proc_bus_pci_read() may also trigger similar issue as mentioned by Ilpo
>>> Järvinen in
>>>
>>> https://lore.kernel.org/linux-pci/e5a91378-4a41-32fb-00c6-2810084581bd@linux.intel.com/
>>>
>>> However, it does not result in an overflow to a negative number.
>> Why does the cast has to happen first here?
>>
>> This would ensure _correctness_ without any false alignment issues for
>> large numbers:
>>
>> int pos;
>> int size = dev->cfg_size;
>>
>> ...
>> if (*ppos > INT_MAX)
> Isn't *ppos a signed quantity? If so, wouldn't we want to check for
> "*ppos < 0"?
If *ppos < 0, it will be discarded in the previous process, just like in
do_pwritev(), where it returns -EINVAL
when pos is negative. So we think that here using "*ppos > INT_MAX"
might be more reasonable.
>
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2026-01-04 7:17 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-24 9:27 [PATCH v2 0/3] Miscellaneous fixes for pci subsystem Ziming Du
2025-12-24 9:27 ` [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug Ziming Du
2025-12-29 17:31 ` Bjorn Helgaas
2025-12-30 3:40 ` duziming
2025-12-24 9:27 ` [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write() Ziming Du
2025-12-29 18:07 ` Bjorn Helgaas
2025-12-30 8:20 ` duziming
2025-12-31 9:31 ` Ilpo Järvinen
2025-12-31 17:04 ` Bjorn Helgaas
2026-01-04 7:17 ` duziming
2025-12-24 9:27 ` [PATCH v2 3/3] PCI/sysfs: Prohibit unaligned access to I/O port on non-x86 Ziming Du
2025-12-29 9:36 ` Ilpo Järvinen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox