[PATCH v2 0/3] Miscellaneous fixes for pci subsystem

Linux PCI subsystem development

* [PATCH v2 0/3] Miscellaneous fixes for pci subsystem
@ 2025-12-24  9:27 Ziming Du
  2025-12-24  9:27 ` [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug Ziming Du
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Ziming Du @ 2025-12-24  9:27 UTC (permalink / raw)
  To: bhelgaas, jbarnes, chrisw, alex.williamson
  Cc: linux-pci, linux-kernel, liuyongqiang13, duziming2

Miscellaneous fixes for pci subsystem


Ilpo Järvinen warned me of potential issues in my previous patch,
so I have made the necessary adjustments. 

Changes in v2:
- Correct grammar and indentation.
- Remove unrelated stack traces from the commit message.
- Modify the handling of pos by adding a non-negative check to ensure
  that the input value is valid.
- Use the existing IS_ALIGNED macro and ensure that after modification,
  other cases still return -EINVAL as before.
- Link to v1: https://lore.kernel.org/linux-pci/20251216083912.758219-1-duziming2@huawei.com/
								Thanx, Du

Yongqiang Liu (2):
  PCI/sysfs: Prohibit unaligned access to I/O port on non-x86
  PCI: Prevent overflow in proc_bus_pci_write()

Ziming Du (1):
  PCI/sysfs: Fix null pointer dereference during hotplug

 drivers/pci/pci-sysfs.c | 10 ++++++++++
 drivers/pci/proc.c      |  2 +-
 2 files changed, 11 insertions(+), 1 deletion(-)

-- 
2.43.0



* [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug
  2025-12-24  9:27 [PATCH v2 0/3] Miscellaneous fixes for pci subsystem Ziming Du
@ 2025-12-24  9:27 ` Ziming Du
  2025-12-29 17:31   ` Bjorn Helgaas
  2025-12-24  9:27 ` [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write() Ziming Du
  2025-12-24  9:27 ` [PATCH v2 3/3] PCI/sysfs: Prohibit unaligned access to I/O port on non-x86 Ziming Du
  2 siblings, 1 reply; 12+ messages in thread
From: Ziming Du @ 2025-12-24  9:27 UTC (permalink / raw)
  To: bhelgaas, jbarnes, chrisw, alex.williamson
  Cc: linux-pci, linux-kernel, liuyongqiang13, duziming2

During concurrent VF creation and bus rescan, the resource files for
the same pci_dev may be created twice. The second creation attempt
fails and the res_attr in pci_dev is released with kfree(), but the
pointer is not set to NULL. This subsequently leads to a NULL pointer
dereference when the device is removed.

When we perform the following operations:
  echo $vfcount > /sys/class/net/"$pfname"/device/sriov_numvfs &
  sleep 0.5
  echo 1 > /sys/bus/pci/rescan
  pci_remove "$pfname"
the system crashes as follows:

  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
  Call trace:
   __pi_strlen+0x14/0x150
   kernfs_find_ns+0x54/0x120
   kernfs_remove_by_name_ns+0x58/0xf0
   sysfs_remove_bin_file+0x24/0x38
   pci_remove_resource_files+0x44/0x90
   pci_remove_sysfs_dev_files+0x28/0x40
   pci_stop_bus_device+0xb8/0x118
   pci_stop_and_remove_bus_device+0x20/0x40
   pci_iov_remove_virtfn+0xb8/0x138
   sriov_disable+0xbc/0x190
   pci_disable_sriov+0x30/0x48
   hinic_pci_sriov_disable+0x54/0x138 [hinic]
   hinic_remove+0x140/0x290 [hinic]
   pci_device_remove+0x4c/0xf8
   device_remove+0x54/0x90
   device_release_driver_internal+0x1d4/0x238
   device_release_driver+0x20/0x38
   pci_stop_bus_device+0xa8/0x118
   pci_stop_and_remove_bus_device_locked+0x28/0x50
   remove_store+0x128/0x208

Fix this by setting the pointer to NULL immediately after releasing res_attr.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ziming Du <duziming2@huawei.com>
---
 drivers/pci/pci-sysfs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index c2df915ad2d2..7e697b82c5e1 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1222,12 +1222,14 @@ static void pci_remove_resource_files(struct pci_dev *pdev)
 		if (res_attr) {
 			sysfs_remove_bin_file(&pdev->dev.kobj, res_attr);
 			kfree(res_attr);
+			pdev->res_attr[i] = NULL;
 		}
 
 		res_attr = pdev->res_attr_wc[i];
 		if (res_attr) {
 			sysfs_remove_bin_file(&pdev->dev.kobj, res_attr);
 			kfree(res_attr);
+			pdev->res_attr_wc[i] = NULL;
 		}
 	}
 }
-- 
2.43.0



* Re: [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug
  2025-12-24  9:27 ` [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug Ziming Du
@ 2025-12-29 17:31   ` Bjorn Helgaas
  2025-12-30  3:40     ` duziming
  0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2025-12-29 17:31 UTC (permalink / raw)
  To: Ziming Du
  Cc: bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci,
	linux-kernel, liuyongqiang13

On Wed, Dec 24, 2025 at 05:27:17PM +0800, Ziming Du wrote:
> During the concurrent process of creating and rescanning in VF, the
> resource files for the same pci_dev may be created twice. The second
> creation attempt fails, resulting the res_attr in pci_dev to kfree(),
> but the pointer is not set to NULL. This will subsequently lead to
> dereferencing a null pointer when removing the device.
> 
> When we perform the following operation:
>   echo $vfcount > /sys/class/net/"$pfname"/device/sriov_numvfs &

Is the value of $vfcount relevant here?  Can you use the actual values
here instead of the variables so this is more useful to others?

>   sleep 0.5
>   echo 1 > /sys/bus/pci/rescan
>   pci_remove "$pfname"
> system will crash as follows:


* Re: [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug
  2025-12-29 17:31   ` Bjorn Helgaas
@ 2025-12-30  3:40     ` duziming
  0 siblings, 0 replies; 12+ messages in thread
From: duziming @ 2025-12-30  3:40 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci,
	linux-kernel, liuyongqiang13


On 2025/12/30 1:31, Bjorn Helgaas wrote:
> On Wed, Dec 24, 2025 at 05:27:17PM +0800, Ziming Du wrote:
>> During the concurrent process of creating and rescanning in VF, the
>> resource files for the same pci_dev may be created twice. The second
>> creation attempt fails, resulting the res_attr in pci_dev to kfree(),
>> but the pointer is not set to NULL. This will subsequently lead to
>> dereferencing a null pointer when removing the device.
>>
>> When we perform the following operation:
>>    echo $vfcount > /sys/class/net/"$pfname"/device/sriov_numvfs &
> Is the value of $vfcount relevant here?  Can you use the actual values
> here instead of the variables so this is more useful to others?

In fact, we directly use the value of sriov_totalvfs here. In my opinion,
the larger this value is, the more likely it is to trigger the issue.

>>    sleep 0.5
>>    echo 1 > /sys/bus/pci/rescan
>>    pci_remove "$pfname"
>> system will crash as follows:


* [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write()
  2025-12-24  9:27 [PATCH v2 0/3] Miscellaneous fixes for pci subsystem Ziming Du
  2025-12-24  9:27 ` [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug Ziming Du
@ 2025-12-24  9:27 ` Ziming Du
  2025-12-29 18:07   ` Bjorn Helgaas
  2025-12-24  9:27 ` [PATCH v2 3/3] PCI/sysfs: Prohibit unaligned access to I/O port on non-x86 Ziming Du
  2 siblings, 1 reply; 12+ messages in thread
From: Ziming Du @ 2025-12-24  9:27 UTC (permalink / raw)
  To: bhelgaas, jbarnes, chrisw, alex.williamson
  Cc: linux-pci, linux-kernel, liuyongqiang13, duziming2

From: Yongqiang Liu <liuyongqiang13@huawei.com>

When the value of ppos over the INT_MAX, the pos is over set to a negtive
value which will be passed to get_user() or pci_user_write_config_dword().
Unexpected behavior such as a softlock will happen as follows:

 watchdog: BUG: soft lockup - CPU#0 stuck for 130s! [syz.3.109:3444]
 RIP: 0010:_raw_spin_unlock_irq+0x17/0x30
 Call Trace:
  <TASK>
  pci_user_write_config_dword+0x126/0x1f0
  proc_bus_pci_write+0x273/0x470
  proc_reg_write+0x1b6/0x280
  do_iter_write+0x48e/0x790
  vfs_writev+0x125/0x4a0
  __x64_sys_pwritev+0x1e2/0x2a0
  do_syscall_64+0x59/0x110
  entry_SYSCALL_64_after_hwframe+0x78/0xe2

Fix this by adding a check for pos.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Ziming Du <duziming2@huawei.com>
---
 drivers/pci/proc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index 9348a0fb8084..200d42feafd8 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -121,7 +121,7 @@ static ssize_t proc_bus_pci_write(struct file *file, const char __user *buf,
 	if (ret)
 		return ret;

-	if (pos >= size)
+	if (pos >= size || pos < 0)
 		return 0;
 	if (nbytes >= size)
 		nbytes = size;
-- 
2.43.0


* Re: [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write()
  2025-12-24  9:27 ` [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write() Ziming Du
@ 2025-12-29 18:07   ` Bjorn Helgaas
  2025-12-30  8:20     ` duziming
  0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2025-12-29 18:07 UTC (permalink / raw)
  To: Ziming Du
  Cc: bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci,
	linux-kernel, liuyongqiang13, Krzysztof Wilczyński

[+cc Krzysztof; I thought we looked at this long ago?]

On Wed, Dec 24, 2025 at 05:27:18PM +0800, Ziming Du wrote:
> From: Yongqiang Liu <liuyongqiang13@huawei.com>
> 
> When the value of ppos over the INT_MAX, the pos is over set to a negtive
> value which will be passed to get_user() or pci_user_write_config_dword().
> Unexpected behavior such as a softlock will happen as follows:

s/negtive/negative/
s/softlock/soft lockup/ to match message below

s/ppos/pos/ (or fix this to refer to "*ppos", which I think is what
you're referring to)

I guess the point is that proc_bus_pci_write() takes a "loff_t *ppos",
loff_t is a signed type, and negative read/write offsets are invalid.

If this is easily reproducible with "dd" or similar, could maybe
include a sample command line?

>  watchdog: BUG: soft lockup - CPU#0 stuck for 130s! [syz.3.109:3444]
>  RIP: 0010:_raw_spin_unlock_irq+0x17/0x30
>  Call Trace:
>   <TASK>
>   pci_user_write_config_dword+0x126/0x1f0
>   proc_bus_pci_write+0x273/0x470
>   proc_reg_write+0x1b6/0x280
>   do_iter_write+0x48e/0x790
>   vfs_writev+0x125/0x4a0
>   __x64_sys_pwritev+0x1e2/0x2a0
>   do_syscall_64+0x59/0x110
>   entry_SYSCALL_64_after_hwframe+0x78/0xe2
> 
> Fix this by add check for the pos.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
> Signed-off-by: Ziming Du <duziming2@huawei.com>
> ---
>  drivers/pci/proc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
> index 9348a0fb8084..200d42feafd8 100644
> --- a/drivers/pci/proc.c
> +++ b/drivers/pci/proc.c
> @@ -121,7 +121,7 @@ static ssize_t proc_bus_pci_write(struct file *file, const char __user *buf,
>  	if (ret)
>  		return ret;
>  
> -	if (pos >= size)
> +	if (pos >= size || pos < 0)
>  		return 0;

I see a few similar cases that look like this; maybe we should do the
same?

  if (pos < 0)
    return -EINVAL;

Looks like proc_bus_pci_read() has the same issue?

What about pci_read_config(), pci_write_config(),
pci_llseek_resource(), pci_read_legacy_io(), pci_write_legacy_io(),
pci_read_resource_io(), pci_write_resource_io(), pci_read_rom()?
These are all sysfs things; does the sysfs infrastructure take care of
negative offsets before we get to these?

>  	if (nbytes >= size)
>  		nbytes = size;
> -- 
> 2.43.0
> 


* Re: [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write()
  2025-12-29 18:07   ` Bjorn Helgaas
@ 2025-12-30  8:20     ` duziming
  2025-12-31  9:31       ` Ilpo Järvinen
  0 siblings, 1 reply; 12+ messages in thread
From: duziming @ 2025-12-30  8:20 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci,
	linux-kernel, liuyongqiang13, Krzysztof Wilczyński


On 2025/12/30 2:07, Bjorn Helgaas wrote:
> [+cc Krzysztof; I thought we looked at this long ago?]
>
> On Wed, Dec 24, 2025 at 05:27:18PM +0800, Ziming Du wrote:
>> From: Yongqiang Liu <liuyongqiang13@huawei.com>
>>
>> When the value of ppos over the INT_MAX, the pos is over set to a negtive
>> value which will be passed to get_user() or pci_user_write_config_dword().
>> Unexpected behavior such as a softlock will happen as follows:
> s/negtive/negative/
> s/softlock/soft lockup/ to match message below
Thanks for pointing out the ambiguous parts.
> s/ppos/pos/ (or fix this to refer to "*ppos", which I think is what
> you're referring to)
>
> I guess the point is that proc_bus_pci_write() takes a "loff_t *ppos",
> loff_t is a signed type, and negative read/write offsets are invalid.

Actually, the "loff_t *ppos" passed in is not a negative value. The root
cause of the issue lies in the conversion "int pos = *ppos". When the
value of *ppos exceeds INT_MAX, pos wraps to a negative value. This
negative pos then propagates through the subsequent logic, leading to
the observed errors.

> If this is easily reproducible with "dd" or similar, could maybe
> include a sample command line?

We reproduced the issue using the following POC:

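     #define _GNU_SOURCE
     #include <stdio.h>
     #include <fcntl.h>
     #include <unistd.h>
     #include <sys/types.h>
     #include <sys/uio.h>

     int main(void)
     {
         /* Config space file of a PCI device; opening it needs root */
         int fd = open("/proc/bus/pci/00/02.0", O_RDWR);
         if (fd < 0) {
             perror("open failed");
             return 1;
         }

         char data[] = "926b7719201054f37a1d9d391e862c";
         /* Larger than INT_MAX, so "int pos = *ppos" turns negative */
         off_t offset = 0x80800001;
         struct iovec iov = {
             .iov_base = data,
             .iov_len = 0xf
         };

         if (pwritev(fd, &iov, 1, offset) < 0)
             perror("pwritev failed");
         close(fd);
         return 0;
     }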

>>   watchdog: BUG: soft lockup - CPU#0 stuck for 130s! [syz.3.109:3444]
>>   RIP: 0010:_raw_spin_unlock_irq+0x17/0x30
>>   Call Trace:
>>    <TASK>
>>    pci_user_write_config_dword+0x126/0x1f0
>>    proc_bus_pci_write+0x273/0x470
>>    proc_reg_write+0x1b6/0x280
>>    do_iter_write+0x48e/0x790
>>    vfs_writev+0x125/0x4a0
>>    __x64_sys_pwritev+0x1e2/0x2a0
>>    do_syscall_64+0x59/0x110
>>    entry_SYSCALL_64_after_hwframe+0x78/0xe2
>>
>> Fix this by add check for the pos.
>>
>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>> Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
>> Signed-off-by: Ziming Du <duziming2@huawei.com>
>> ---
>>   drivers/pci/proc.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
>> index 9348a0fb8084..200d42feafd8 100644
>> --- a/drivers/pci/proc.c
>> +++ b/drivers/pci/proc.c
>> @@ -121,7 +121,7 @@ static ssize_t proc_bus_pci_write(struct file *file, const char __user *buf,
>>   	if (ret)
>>   		return ret;
>>   
>> -	if (pos >= size)
>> +	if (pos >= size || pos < 0)
>>   		return 0;
> I see a few similar cases that look like this; maybe we should do the
> same?
>
>    if (pos < 0)
>      return -EINVAL;
>
> Looks like proc_bus_pci_read() has the same issue?

proc_bus_pci_read() may also trigger a similar issue, as mentioned by
Ilpo Järvinen in

https://lore.kernel.org/linux-pci/e5a91378-4a41-32fb-00c6-2810084581bd@linux.intel.com/

However, it does not result in an overflow to a negative number.

>
> What about pci_read_config(), pci_write_config(),
> pci_llseek_resource(), pci_read_legacy_io(), pci_write_legacy_io(),
> pci_read_resource_io(), pci_write_resource_io(), pci_read_rom()?
> These are all sysfs things; does the sysfs infrastructure take care of
> negative offsets before we get to these?

do_pwritev() already performs the following check:

    if (pos < 0)
          return -EINVAL;

So in theory a negative offset should not reach these handlers.

>>   	if (nbytes >= size)
>>   		nbytes = size;
>> -- 
>> 2.43.0
>>


* Re: [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write()
  2025-12-30  8:20     ` duziming
@ 2025-12-31  9:31       ` Ilpo Järvinen
  2025-12-31 17:04         ` Bjorn Helgaas
  0 siblings, 1 reply; 12+ messages in thread
From: Ilpo Järvinen @ 2025-12-31  9:31 UTC (permalink / raw)
  To: duziming, Bjorn Helgaas
  Cc: bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci, LKML,
	liuyongqiang13, Krzysztof Wilczyński


On Tue, 30 Dec 2025, duziming wrote:
> On 2025/12/30 2:07, Bjorn Helgaas wrote:
> > [+cc Krzysztof; I thought we looked at this long ago?]
> > 
> > On Wed, Dec 24, 2025 at 05:27:18PM +0800, Ziming Du wrote:
> > > From: Yongqiang Liu <liuyongqiang13@huawei.com>
> > > 
> > > When the value of ppos over the INT_MAX, the pos is over set to a negtive
> > > value which will be passed to get_user() or pci_user_write_config_dword().
> > > Unexpected behavior such as a softlock will happen as follows:
> > s/negtive/negative/
> > s/softlock/soft lockup/ to match message below
> Thanks for pointing out the ambiguous parts.
> > s/ppos/pos/ (or fix this to refer to "*ppos", which I think is what
> > you're referring to)
> > 
> > I guess the point is that proc_bus_pci_write() takes a "loff_t *ppos",
> > loff_t is a signed type, and negative read/write offsets are invalid.
> 
> Actually, the *loff_t *ppos *passed in is not a negative value. The root cause
> of the issue
> 
> lies in the cast *int* *pos = *ppos*. When the value of **ppos* over the
> INT_MAX, the pos is over set
> 
> to a negative value. This negative *pos* then propagates through subsequent
> logic, leading to the observed errors.
> 
> > If this is easily reproducible with "dd" or similar, could maybe
> > include a sample command line?
> 
> We reproduced the issue using the following POC:
> 
>     #include <stdio.h>
> 
>     #include <string.h>
>     #include <unistd.h>
>     #include <fcntl.h>
>     #include <sys/uio.h>
> 
>     int main() {
>     int fd = open("/proc/bus/pci/00/02.0", O_RDWR);
>     if (fd < 0) {
>         perror("open failed");
>         return 1;
>     }
>     char data[] = "926b7719201054f37a1d9d391e862c";
>     off_t offset = 0x80800001;
>     struct iovec iov = {
>         .iov_base = data,
>         .iov_len = 0xf
>     };
>     pwritev(fd, &iov, 1, offset);
>     return 0;
> }
> 
> > >   watchdog: BUG: soft lockup - CPU#0 stuck for 130s! [syz.3.109:3444]
> > >   RIP: 0010:_raw_spin_unlock_irq+0x17/0x30
> > >   Call Trace:
> > >    <TASK>
> > >    pci_user_write_config_dword+0x126/0x1f0
> > >    proc_bus_pci_write+0x273/0x470
> > >    proc_reg_write+0x1b6/0x280
> > >    do_iter_write+0x48e/0x790
> > >    vfs_writev+0x125/0x4a0
> > >    __x64_sys_pwritev+0x1e2/0x2a0
> > >    do_syscall_64+0x59/0x110
> > >    entry_SYSCALL_64_after_hwframe+0x78/0xe2
> > > 
> > > Fix this by add check for the pos.
> > > 
> > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
> > > Signed-off-by: Ziming Du <duziming2@huawei.com>
> > > ---
> > >   drivers/pci/proc.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
> > > index 9348a0fb8084..200d42feafd8 100644
> > > --- a/drivers/pci/proc.c
> > > +++ b/drivers/pci/proc.c
> > > @@ -121,7 +121,7 @@ static ssize_t proc_bus_pci_write(struct file *file,
> > > const char __user *buf,
> > >   	if (ret)
> > >   		return ret;
> > >   -	if (pos >= size)
> > > +	if (pos >= size || pos < 0)
> > >   		return 0;
> > I see a few similar cases that look like this; maybe we should do the
> > same?
> > 
> >    if (pos < 0)
> >      return -EINVAL;
> > 
> > Looks like proc_bus_pci_read() has the same issue?
> 
> proc_bus_pci_read() may also trigger similar issue as mentioned by Ilpo
> Järvinen in
> 
> https://lore.kernel.org/linux-pci/e5a91378-4a41-32fb-00c6-2810084581bd@linux.intel.com/
> 
> However, it does not result in an overflow to a negative number.

Why does the cast have to happen first here?

This would ensure _correctness_ without any false alignment issues for 
large numbers:

	int pos;
	int size = dev->cfg_size;

	...
	if (*ppos > INT_MAX)
		return -EINVAL;
	pos = *ppos;

(I'm not sure though if this should return 0 or -EINVAL when *ppos >= 
size as it currently returns 0 for non-overflowing values when pos >= 
size.)

-- 
 i.


> > What about pci_read_config(), pci_write_config(),
> > pci_llseek_resource(), pci_read_legacy_io(), pci_write_legacy_io(),
> > pci_read_resource_io(), pci_write_resource_io(), pci_read_rom()?
> > These are all sysfs things; does the sysfs infrastructure take care of
> > negative offsets before we get to these?
> 
> In do_pwritev(), the following check has been performed:
> 
>    if (pos < 0)
>          return -EINVAL;
> 
> Theoretically, a negative offset should not occur.
> 
> > >   	if (nbytes >= size)
> > >   		nbytes = size;


* Re: [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write()
  2025-12-31  9:31       ` Ilpo Järvinen
@ 2025-12-31 17:04         ` Bjorn Helgaas
  2026-01-04  7:17           ` duziming
  0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2025-12-31 17:04 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: duziming, bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci,
	LKML, liuyongqiang13, Krzysztof Wilczyński

On Wed, Dec 31, 2025 at 11:31:47AM +0200, Ilpo Järvinen wrote:
> On Tue, 30 Dec 2025, duziming wrote:
> > On 2025/12/30 2:07, Bjorn Helgaas wrote:
> > > [+cc Krzysztof; I thought we looked at this long ago?]
> > > 
> > > On Wed, Dec 24, 2025 at 05:27:18PM +0800, Ziming Du wrote:
> > > > From: Yongqiang Liu <liuyongqiang13@huawei.com>
> > > > 
> > > > When the value of ppos over the INT_MAX, the pos is over set to a negtive
> > > > value which will be passed to get_user() or pci_user_write_config_dword().
> > > > Unexpected behavior such as a softlock will happen as follows:
> > > s/negtive/negative/
> > > s/softlock/soft lockup/ to match message below
> > Thanks for pointing out the ambiguous parts.
> > > s/ppos/pos/ (or fix this to refer to "*ppos", which I think is what
> > > you're referring to)
> > > 
> > > I guess the point is that proc_bus_pci_write() takes a "loff_t *ppos",
> > > loff_t is a signed type, and negative read/write offsets are invalid.
> > 
> > Actually, the *loff_t *ppos *passed in is not a negative value. The root cause
> > of the issue
> > 
> > lies in the cast *int* *pos = *ppos*. When the value of **ppos* over the
> > INT_MAX, the pos is over set
> > 
> > to a negative value. This negative *pos* then propagates through subsequent
> > logic, leading to the observed errors.
> > 
> > > If this is easily reproducible with "dd" or similar, could maybe
> > > include a sample command line?
> > 
> > We reproduced the issue using the following POC:
> > 
> >     #include <stdio.h>
> > 
> >     #include <string.h>
> >     #include <unistd.h>
> >     #include <fcntl.h>
> >     #include <sys/uio.h>
> > 
> >     int main() {
> >     int fd = open("/proc/bus/pci/00/02.0", O_RDWR);
> >     if (fd < 0) {
> >         perror("open failed");
> >         return 1;
> >     }
> >     char data[] = "926b7719201054f37a1d9d391e862c";
> >     off_t offset = 0x80800001;
> >     struct iovec iov = {
> >         .iov_base = data,
> >         .iov_len = 0xf
> >     };
> >     pwritev(fd, &iov, 1, offset);
> >     return 0;
> > }
> > 
> > > >   watchdog: BUG: soft lockup - CPU#0 stuck for 130s! [syz.3.109:3444]
> > > >   RIP: 0010:_raw_spin_unlock_irq+0x17/0x30
> > > >   Call Trace:
> > > >    <TASK>
> > > >    pci_user_write_config_dword+0x126/0x1f0
> > > >    proc_bus_pci_write+0x273/0x470
> > > >    proc_reg_write+0x1b6/0x280
> > > >    do_iter_write+0x48e/0x790
> > > >    vfs_writev+0x125/0x4a0
> > > >    __x64_sys_pwritev+0x1e2/0x2a0
> > > >    do_syscall_64+0x59/0x110
> > > >    entry_SYSCALL_64_after_hwframe+0x78/0xe2
> > > > 
> > > > Fix this by add check for the pos.
> > > > 
> > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > > Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
> > > > Signed-off-by: Ziming Du <duziming2@huawei.com>
> > > > ---
> > > >   drivers/pci/proc.c | 2 +-
> > > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
> > > > index 9348a0fb8084..200d42feafd8 100644
> > > > --- a/drivers/pci/proc.c
> > > > +++ b/drivers/pci/proc.c
> > > > @@ -121,7 +121,7 @@ static ssize_t proc_bus_pci_write(struct file *file,
> > > > const char __user *buf,
> > > >   	if (ret)
> > > >   		return ret;
> > > >   -	if (pos >= size)
> > > > +	if (pos >= size || pos < 0)
> > > >   		return 0;
> > > I see a few similar cases that look like this; maybe we should do the
> > > same?
> > > 
> > >    if (pos < 0)
> > >      return -EINVAL;
> > > 
> > > Looks like proc_bus_pci_read() has the same issue?
> > 
> > proc_bus_pci_read() may also trigger similar issue as mentioned by Ilpo
> > Järvinen in
> > 
> > https://lore.kernel.org/linux-pci/e5a91378-4a41-32fb-00c6-2810084581bd@linux.intel.com/
> > 
> > However, it does not result in an overflow to a negative number.
> 
> Why does the cast has to happen first here?
> 
> This would ensure _correctness_ without any false alignment issues for 
> large numbers:
> 
> 	int pos;
> 	int size = dev->cfg_size;
> 
> 	...
> 	if (*ppos > INT_MAX)

Isn't *ppos a signed quantity?  If so, wouldn't we want to check for
"*ppos < 0"?


* Re: [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write()
  2025-12-31 17:04         ` Bjorn Helgaas
@ 2026-01-04  7:17           ` duziming
  0 siblings, 0 replies; 12+ messages in thread
From: duziming @ 2026-01-04  7:17 UTC (permalink / raw)
  To: Bjorn Helgaas, Ilpo Järvinen
  Cc: bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci, LKML,
	liuyongqiang13, Krzysztof Wilczyński


On 2026/1/1 1:04, Bjorn Helgaas wrote:
> On Wed, Dec 31, 2025 at 11:31:47AM +0200, Ilpo Järvinen wrote:
>> On Tue, 30 Dec 2025, duziming wrote:
>>> On 2025/12/30 2:07, Bjorn Helgaas wrote:
>>>> [+cc Krzysztof; I thought we looked at this long ago?]
>>>>
>>>> On Wed, Dec 24, 2025 at 05:27:18PM +0800, Ziming Du wrote:
>>>>> From: Yongqiang Liu <liuyongqiang13@huawei.com>
>>>>>
>>>>> When the value of ppos over the INT_MAX, the pos is over set to a negtive
>>>>> value which will be passed to get_user() or pci_user_write_config_dword().
>>>>> Unexpected behavior such as a softlock will happen as follows:
>>>> s/negtive/negative/
>>>> s/softlock/soft lockup/ to match message below
>>> Thanks for pointing out the ambiguous parts.
>>>> s/ppos/pos/ (or fix this to refer to "*ppos", which I think is what
>>>> you're referring to)
>>>>
>>>> I guess the point is that proc_bus_pci_write() takes a "loff_t *ppos",
>>>> loff_t is a signed type, and negative read/write offsets are invalid.
>>> Actually, the *loff_t *ppos *passed in is not a negative value. The root cause
>>> of the issue
>>>
>>> lies in the cast *int* *pos = *ppos*. When the value of **ppos* over the
>>> INT_MAX, the pos is over set
>>>
>>> to a negative value. This negative *pos* then propagates through subsequent
>>> logic, leading to the observed errors.
>>>
>>>> If this is easily reproducible with "dd" or similar, could maybe
>>>> include a sample command line?
>>> We reproduced the issue using the following POC:
>>>
>>>      #include <stdio.h>
>>>
>>>      #include <string.h>
>>>      #include <unistd.h>
>>>      #include <fcntl.h>
>>>      #include <sys/uio.h>
>>>
>>>      int main() {
>>>      int fd = open("/proc/bus/pci/00/02.0", O_RDWR);
>>>      if (fd < 0) {
>>>          perror("open failed");
>>>          return 1;
>>>      }
>>>      char data[] = "926b7719201054f37a1d9d391e862c";
>>>      off_t offset = 0x80800001;
>>>      struct iovec iov = {
>>>          .iov_base = data,
>>>          .iov_len = 0xf
>>>      };
>>>      pwritev(fd, &iov, 1, offset);
>>>      return 0;
>>> }
>>>
>>>>>    watchdog: BUG: soft lockup - CPU#0 stuck for 130s! [syz.3.109:3444]
>>>>>    RIP: 0010:_raw_spin_unlock_irq+0x17/0x30
>>>>>    Call Trace:
>>>>>     <TASK>
>>>>>     pci_user_write_config_dword+0x126/0x1f0
>>>>>     proc_bus_pci_write+0x273/0x470
>>>>>     proc_reg_write+0x1b6/0x280
>>>>>     do_iter_write+0x48e/0x790
>>>>>     vfs_writev+0x125/0x4a0
>>>>>     __x64_sys_pwritev+0x1e2/0x2a0
>>>>>     do_syscall_64+0x59/0x110
>>>>>     entry_SYSCALL_64_after_hwframe+0x78/0xe2
>>>>>
>>>>> Fix this by add check for the pos.
>>>>>
>>>>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>>>>> Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
>>>>> Signed-off-by: Ziming Du <duziming2@huawei.com>
>>>>> ---
>>>>>    drivers/pci/proc.c | 2 +-
>>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
>>>>> index 9348a0fb8084..200d42feafd8 100644
>>>>> --- a/drivers/pci/proc.c
>>>>> +++ b/drivers/pci/proc.c
>>>>> @@ -121,7 +121,7 @@ static ssize_t proc_bus_pci_write(struct file *file,
>>>>> const char __user *buf,
>>>>>    	if (ret)
>>>>>    		return ret;
>>>>>    -	if (pos >= size)
>>>>> +	if (pos >= size || pos < 0)
>>>>>    		return 0;
>>>> I see a few similar cases that look like this; maybe we should do the
>>>> same?
>>>>
>>>>     if (pos < 0)
>>>>       return -EINVAL;
>>>>
>>>> Looks like proc_bus_pci_read() has the same issue?
>>> proc_bus_pci_read() may also trigger similar issue as mentioned by Ilpo
>>> Järvinen in
>>>
>>> https://lore.kernel.org/linux-pci/e5a91378-4a41-32fb-00c6-2810084581bd@linux.intel.com/
>>>
>>> However, it does not result in an overflow to a negative number.
>> Why does the cast has to happen first here?
>>
>> This would ensure _correctness_ without any false alignment issues for
>> large numbers:
>>
>> 	int pos;
>> 	int size = dev->cfg_size;
>>
>> 	...
>> 	if (*ppos > INT_MAX)
> Isn't *ppos a signed quantity?  If so, wouldn't we want to check for
> "*ppos < 0"?

If *ppos < 0, the request is already rejected earlier in the call path,
for example in do_pwritev(), which returns -EINVAL when pos is negative.
So we think that checking "*ppos > INT_MAX" here is more reasonable.

>


* [PATCH v2 3/3] PCI/sysfs: Prohibit unaligned access to I/O port on non-x86
  2025-12-24  9:27 [PATCH v2 0/3] Miscellaneous fixes for pci subsystem Ziming Du
  2025-12-24  9:27 ` [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug Ziming Du
  2025-12-24  9:27 ` [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write() Ziming Du
@ 2025-12-24  9:27 ` Ziming Du
  2025-12-29  9:36   ` Ilpo Järvinen
  2 siblings, 1 reply; 12+ messages in thread
From: Ziming Du @ 2025-12-24  9:27 UTC (permalink / raw)
  To: bhelgaas, jbarnes, chrisw, alex.williamson
  Cc: linux-pci, linux-kernel, liuyongqiang13, duziming2

From: Yongqiang Liu <liuyongqiang13@huawei.com>

Unaligned access is harmful on non-x86 architectures such as arm64. When
we use pwrite or pread to access the I/O port resources with an unaligned
offset, the system crashes as follows:

Unable to handle kernel paging request at virtual address fffffbfffe8010c1
Internal error: Oops: 0000000096000061 [#1] SMP
Call trace:
 _outw include/asm-generic/io.h:594 [inline]
 logic_outw+0x54/0x218 lib/logic_pio.c:305
 pci_resource_io drivers/pci/pci-sysfs.c:1157 [inline]
 pci_write_resource_io drivers/pci/pci-sysfs.c:1191 [inline]
 pci_write_resource_io+0x208/0x260 drivers/pci/pci-sysfs.c:1181
 sysfs_kf_bin_write+0x188/0x210 fs/sysfs/file.c:158
 kernfs_fop_write_iter+0x2e8/0x4b0 fs/kernfs/file.c:338
 vfs_write+0x7bc/0xac8 fs/read_write.c:586
 ksys_write+0x12c/0x270 fs/read_write.c:639
 __arm64_sys_write+0x78/0xb8 fs/read_write.c:648

PowerPC appears to be affected as well, so prohibit unaligned access
on non-x86 architectures.

Fixes: 8633328be242 ("PCI: Allow read/write access to sysfs I/O port resources")
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Ziming Du <duziming2@huawei.com>
---
 drivers/pci/pci-sysfs.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 7e697b82c5e1..c44a9c4a91ab 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1166,12 +1166,20 @@ static ssize_t pci_resource_io(struct file *filp, struct kobject *kobj,
 			*(u8 *)buf = inb(port);
 		return 1;
 	case 2:
+#if !defined(CONFIG_X86)
+		if (!IS_ALIGNED(port, count))
+			return -EFAULT;
+#endif
 		if (write)
 			outw(*(u16 *)buf, port);
 		else
 			*(u16 *)buf = inw(port);
 		return 2;
 	case 4:
+#if !defined(CONFIG_X86)
+		if (!IS_ALIGNED(port, count))
+			return -EFAULT;
+#endif
 		if (write)
 			outl(*(u32 *)buf, port);
 		else
-- 
2.43.0


* Re: [PATCH v2 3/3] PCI/sysfs: Prohibit unaligned access to I/O port on non-x86
  2025-12-24  9:27 ` [PATCH v2 3/3] PCI/sysfs: Prohibit unaligned access to I/O port on non-x86 Ziming Du
@ 2025-12-29  9:36   ` Ilpo Järvinen
  0 siblings, 0 replies; 12+ messages in thread
From: Ilpo Järvinen @ 2025-12-29  9:36 UTC (permalink / raw)
  To: Ziming Du
  Cc: bhelgaas, jbarnes, chrisw, alex.williamson, linux-pci, LKML,
	liuyongqiang13

On Wed, 24 Dec 2025, Ziming Du wrote:

> From: Yongqiang Liu <liuyongqiang13@huawei.com>
> 
> Unaligned access is harmful on non-x86 architectures such as arm64. When
> pwrite() or pread() is used to access an I/O port resource at an
> unaligned offset, the system crashes as follows:
> 
> Unable to handle kernel paging request at virtual address fffffbfffe8010c1
> Internal error: Oops: 0000000096000061 [#1] SMP
> Call trace:
>  _outw include/asm-generic/io.h:594 [inline]
>  logic_outw+0x54/0x218 lib/logic_pio.c:305
>  pci_resource_io drivers/pci/pci-sysfs.c:1157 [inline]
>  pci_write_resource_io drivers/pci/pci-sysfs.c:1191 [inline]
>  pci_write_resource_io+0x208/0x260 drivers/pci/pci-sysfs.c:1181
>  sysfs_kf_bin_write+0x188/0x210 fs/sysfs/file.c:158
>  kernfs_fop_write_iter+0x2e8/0x4b0 fs/kernfs/file.c:338
>  vfs_write+0x7bc/0xac8 fs/read_write.c:586
>  ksys_write+0x12c/0x270 fs/read_write.c:639
>  __arm64_sys_write+0x78/0xb8 fs/read_write.c:648
> 
> PowerPC appears to be affected as well, so prohibit unaligned access
> on non-x86 architectures.
> 
> Fixes: 8633328be242 ("PCI: Allow read/write access to sysfs I/O port resources")
> Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
> Signed-off-by: Ziming Du <duziming2@huawei.com>
> ---
>  drivers/pci/pci-sysfs.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 7e697b82c5e1..c44a9c4a91ab 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -1166,12 +1166,20 @@ static ssize_t pci_resource_io(struct file *filp, struct kobject *kobj,
>  			*(u8 *)buf = inb(port);
>  		return 1;
>  	case 2:
> +#if !defined(CONFIG_X86)
> +		if (!IS_ALIGNED(port, count))
> +			return -EFAULT;
> +#endif
>  		if (write)
>  			outw(*(u16 *)buf, port);
>  		else
>  			*(u16 *)buf = inw(port);
>  		return 2;
>  	case 4:
> +#if !defined(CONFIG_X86)
> +		if (!IS_ALIGNED(port, count))
> +			return -EFAULT;
> +#endif
>  		if (write)
>  			outl(*(u32 *)buf, port);
>  		else
> 

To use IS_ALIGNED(), you need to add:

#include <linux/align.h>
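
As a rough userspace illustration of what the check does (not the kernel
implementation itself): is_aligned() below is a stand-in for the kernel's
IS_ALIGNED(), and port_access_ok() is a hypothetical helper mirroring the
condition added in the patch. Note that the faulting address in the oops
above ends in ...c1, which is odd and therefore unaligned for a 2-byte
outw().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's IS_ALIGNED(x, a): true when x is
 * a multiple of a. Only valid when a is a power of two.
 */
static bool is_aligned(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x & (a - 1)) == 0;
}

/*
 * Hypothetical mirror of the check in the patch: an access of `count`
 * bytes (1, 2 or 4) is allowed only if the port is aligned to that width.
 */
static bool port_access_ok(uint64_t port, uint64_t count)
{
	return is_aligned(port, count);
}
```

Under these assumptions, a 2-byte access at port 0x10c0 passes, the same
access at 0x10c1 is rejected, and a 1-byte access is always allowed.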

-- 
 i.


Thread overview: 12+ messages
2025-12-24  9:27 [PATCH v2 0/3] Miscellaneous fixes for pci subsystem Ziming Du
2025-12-24  9:27 ` [PATCH v2 1/3] PCI/sysfs: Fix null pointer dereference during hotplug Ziming Du
2025-12-29 17:31   ` Bjorn Helgaas
2025-12-30  3:40     ` duziming
2025-12-24  9:27 ` [PATCH v2 2/3] PCI: Prevent overflow in proc_bus_pci_write() Ziming Du
2025-12-29 18:07   ` Bjorn Helgaas
2025-12-30  8:20     ` duziming
2025-12-31  9:31       ` Ilpo Järvinen
2025-12-31 17:04         ` Bjorn Helgaas
2026-01-04  7:17           ` duziming
2025-12-24  9:27 ` [PATCH v2 3/3] PCI/sysfs: Prohibit unaligned access to I/O port on non-x86 Ziming Du
2025-12-29  9:36   ` Ilpo Järvinen
