linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] PCI: pci_call_probe: call local_pci_probe() when selected cpu is offline
@ 2024-06-13  7:42 Hongchen Zhang
  2024-07-22  2:11 ` Huacai Chen
  2024-07-22  7:39 ` Ethan Zhao
  0 siblings, 2 replies; 9+ messages in thread
From: Hongchen Zhang @ 2024-06-13  7:42 UTC (permalink / raw)
  To: Markus Elfring, Bjorn Helgaas
  Cc: Alex Belits, Peter Zijlstra (Intel), Nitesh Narayan Lal,
	Frederic Weisbecker, linux-pci, linux-kernel, loongarch,
	Hongchen Zhang, stable, Huacai Chen

Call work_on_cpu(cpu, fn, arg) in pci_call_probe() while the argument
@cpu is a offline cpu would cause system stuck forever.

This can be happen if a node is online while all its CPUs are
offline (We can use "maxcpus=1" without "nr_cpus=1" to reproduce it).

So, in the above case, let pci_call_probe() call local_pci_probe()
instead of work_on_cpu() when the best selected cpu is offline.

Fixes: 69a18b18699b ("PCI: Restrict probe functions to housekeeping CPUs")
Cc: <stable@vger.kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
---
v2 -> v3: Modify commit message according to Markus's suggestion
v1 -> v2: Add a method to reproduce the problem
---
 drivers/pci/pci-driver.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index af2996d0d17f..32a99828e6a3 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -386,7 +386,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
 		free_cpumask_var(wq_domain_mask);
 	}
 
-	if (cpu < nr_cpu_ids)
+	if ((cpu < nr_cpu_ids) && cpu_online(cpu))
 		error = work_on_cpu(cpu, local_pci_probe, &ddi);
 	else
 		error = local_pci_probe(&ddi);
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] PCI: pci_call_probe: call local_pci_probe() when selected cpu is offline
  2024-06-13  7:42 [PATCH v3] PCI: pci_call_probe: call local_pci_probe() when selected cpu is offline Hongchen Zhang
@ 2024-07-22  2:11 ` Huacai Chen
  2024-07-22  7:39 ` Ethan Zhao
  1 sibling, 0 replies; 9+ messages in thread
From: Huacai Chen @ 2024-07-22  2:11 UTC (permalink / raw)
  To: Hongchen Zhang
  Cc: Markus Elfring, Bjorn Helgaas, Alex Belits,
	Peter Zijlstra (Intel), Nitesh Narayan Lal, Frederic Weisbecker,
	linux-pci, linux-kernel, loongarch, stable, Huacai Chen

gentle ping?

Huacai

On Thu, Jun 13, 2024 at 3:43 PM Hongchen Zhang
<zhanghongchen@loongson.cn> wrote:
>
> Call work_on_cpu(cpu, fn, arg) in pci_call_probe() while the argument
> @cpu is a offline cpu would cause system stuck forever.
>
> This can be happen if a node is online while all its CPUs are
> offline (We can use "maxcpus=1" without "nr_cpus=1" to reproduce it).
>
> So, in the above case, let pci_call_probe() call local_pci_probe()
> instead of work_on_cpu() when the best selected cpu is offline.
>
> Fixes: 69a18b18699b ("PCI: Restrict probe functions to housekeeping CPUs")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
> ---
> v2 -> v3: Modify commit message according to Markus's suggestion
> v1 -> v2: Add a method to reproduce the problem
> ---
>  drivers/pci/pci-driver.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index af2996d0d17f..32a99828e6a3 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -386,7 +386,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
>                 free_cpumask_var(wq_domain_mask);
>         }
>
> -       if (cpu < nr_cpu_ids)
> +       if ((cpu < nr_cpu_ids) && cpu_online(cpu))
>                 error = work_on_cpu(cpu, local_pci_probe, &ddi);
>         else
>                 error = local_pci_probe(&ddi);
> --
> 2.33.0
>
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] PCI: pci_call_probe: call local_pci_probe() when selected cpu is offline
  2024-06-13  7:42 [PATCH v3] PCI: pci_call_probe: call local_pci_probe() when selected cpu is offline Hongchen Zhang
  2024-07-22  2:11 ` Huacai Chen
@ 2024-07-22  7:39 ` Ethan Zhao
  2024-07-24  1:58   ` Hongchen Zhang
  1 sibling, 1 reply; 9+ messages in thread
From: Ethan Zhao @ 2024-07-22  7:39 UTC (permalink / raw)
  To: Hongchen Zhang, Markus Elfring, Bjorn Helgaas
  Cc: Alex Belits, Peter Zijlstra (Intel), Nitesh Narayan Lal,
	Frederic Weisbecker, linux-pci, linux-kernel, loongarch, stable,
	Huacai Chen


On 6/13/2024 3:42 PM, Hongchen Zhang wrote:
> Call work_on_cpu(cpu, fn, arg) in pci_call_probe() while the argument
> @cpu is a offline cpu would cause system stuck forever.
>
> This can be happen if a node is online while all its CPUs are
> offline (We can use "maxcpus=1" without "nr_cpus=1" to reproduce it).
>
> So, in the above case, let pci_call_probe() call local_pci_probe()
> instead of work_on_cpu() when the best selected cpu is offline.
>
> Fixes: 69a18b18699b ("PCI: Restrict probe functions to housekeeping CPUs")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
> ---
> v2 -> v3: Modify commit message according to Markus's suggestion
> v1 -> v2: Add a method to reproduce the problem
> ---
>   drivers/pci/pci-driver.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index af2996d0d17f..32a99828e6a3 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -386,7 +386,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
>   		free_cpumask_var(wq_domain_mask);
>   	}
>   
> -	if (cpu < nr_cpu_ids)

Why not choose the right cpu to callwork_on_cpu() ? the one that is online. Thanks, Ethan

> +	if ((cpu < nr_cpu_ids) && cpu_online(cpu))
>   		error = work_on_cpu(cpu, local_pci_probe, &ddi);
>   	else
>   		error = local_pci_probe(&ddi);

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] PCI: pci_call_probe: call local_pci_probe() when selected cpu is offline
  2024-07-22  7:39 ` Ethan Zhao
@ 2024-07-24  1:58   ` Hongchen Zhang
  2024-07-24  2:47     ` Ethan Zhao
  0 siblings, 1 reply; 9+ messages in thread
From: Hongchen Zhang @ 2024-07-24  1:58 UTC (permalink / raw)
  To: Ethan Zhao, Markus Elfring, Bjorn Helgaas
  Cc: Alex Belits, Peter Zijlstra (Intel), Nitesh Narayan Lal,
	Frederic Weisbecker, linux-pci, linux-kernel, loongarch, stable,
	Huacai Chen

Hi Ethan,
On 2024/7/22 PM 3:39, Ethan Zhao wrote:
> 
> On 6/13/2024 3:42 PM, Hongchen Zhang wrote:
>> Call work_on_cpu(cpu, fn, arg) in pci_call_probe() while the argument
>> @cpu is a offline cpu would cause system stuck forever.
>>
>> This can be happen if a node is online while all its CPUs are
>> offline (We can use "maxcpus=1" without "nr_cpus=1" to reproduce it).
>>
>> So, in the above case, let pci_call_probe() call local_pci_probe()
>> instead of work_on_cpu() when the best selected cpu is offline.
>>
>> Fixes: 69a18b18699b ("PCI: Restrict probe functions to housekeeping 
>> CPUs")
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
>> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
>> ---
>> v2 -> v3: Modify commit message according to Markus's suggestion
>> v1 -> v2: Add a method to reproduce the problem
>> ---
>>   drivers/pci/pci-driver.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>> index af2996d0d17f..32a99828e6a3 100644
>> --- a/drivers/pci/pci-driver.c
>> +++ b/drivers/pci/pci-driver.c
>> @@ -386,7 +386,7 @@ static int pci_call_probe(struct pci_driver *drv, 
>> struct pci_dev *dev,
>>           free_cpumask_var(wq_domain_mask);
>>       }
>> -    if (cpu < nr_cpu_ids)
> 
> Why not choose the right cpu to callwork_on_cpu() ? the one that is 
> online. Thanks, Ethan
Yes, let housekeeping_cpumask() return online cpu is a good idea, but it 
may be changed by command line. so the simplest way is to call 
local_pci_probe when the best selected cpu is offline.
> 
>> +    if ((cpu < nr_cpu_ids) && cpu_online(cpu))
>>           error = work_on_cpu(cpu, local_pci_probe, &ddi);
>>       else
>>           error = local_pci_probe(&ddi);


-- 
Best Regards
Hongchen Zhang


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] PCI: pci_call_probe: call local_pci_probe() when selected cpu is offline
  2024-07-24  1:58   ` Hongchen Zhang
@ 2024-07-24  2:47     ` Ethan Zhao
  2024-07-24  3:09       ` Hongchen Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Ethan Zhao @ 2024-07-24  2:47 UTC (permalink / raw)
  To: Hongchen Zhang, Markus Elfring, Bjorn Helgaas
  Cc: Alex Belits, Peter Zijlstra (Intel), Nitesh Narayan Lal,
	Frederic Weisbecker, linux-pci, linux-kernel, loongarch, stable,
	Huacai Chen

On 7/24/2024 9:58 AM, Hongchen Zhang wrote:
> Hi Ethan,
> On 2024/7/22 PM 3:39, Ethan Zhao wrote:
>>
>> On 6/13/2024 3:42 PM, Hongchen Zhang wrote:
>>> Call work_on_cpu(cpu, fn, arg) in pci_call_probe() while the argument
>>> @cpu is a offline cpu would cause system stuck forever.
>>>
>>> This can be happen if a node is online while all its CPUs are
>>> offline (We can use "maxcpus=1" without "nr_cpus=1" to reproduce it).
>>>
>>> So, in the above case, let pci_call_probe() call local_pci_probe()
>>> instead of work_on_cpu() when the best selected cpu is offline.
>>>
>>> Fixes: 69a18b18699b ("PCI: Restrict probe functions to housekeeping 
>>> CPUs")
>>> Cc: <stable@vger.kernel.org>
>>> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
>>> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
>>> ---
>>> v2 -> v3: Modify commit message according to Markus's suggestion
>>> v1 -> v2: Add a method to reproduce the problem
>>> ---
>>>   drivers/pci/pci-driver.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>>> index af2996d0d17f..32a99828e6a3 100644
>>> --- a/drivers/pci/pci-driver.c
>>> +++ b/drivers/pci/pci-driver.c
>>> @@ -386,7 +386,7 @@ static int pci_call_probe(struct pci_driver 
>>> *drv, struct pci_dev *dev,
>>>           free_cpumask_var(wq_domain_mask);
>>>       }
>>> -    if (cpu < nr_cpu_ids)
>>
>> Why not choose the right cpu to callwork_on_cpu() ? the one that is 
>> online. Thanks, Ethan
> Yes, let housekeeping_cpumask() return online cpu is a good idea, but 
> it may be changed by command line. so the simplest way is to call 
> local_pci_probe when the best selected cpu is offline.

Hmm..... housekeeping_cpumask() should never return offline CPU, so
I guess you didn't hit issue with the CPU isolation, but the following
code seems not good.

...

if (node < 0 || node >= MAX_NUMNODES || !node_online(node) ||
         pci_physfn_is_probed(dev)) {
         cpu = nr_cpu_ids;
     } else {

....

perhaps you could change the logic there and fix it  ?

Thanks
Ethan



>>
>>> +    if ((cpu < nr_cpu_ids) && cpu_online(cpu))
>>>           error = work_on_cpu(cpu, local_pci_probe, &ddi);
>>>       else
>>>           error = local_pci_probe(&ddi);
>
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] PCI: pci_call_probe: call local_pci_probe() when selected cpu is offline
  2024-07-24  2:47     ` Ethan Zhao
@ 2024-07-24  3:09       ` Hongchen Zhang
  2024-07-24  6:40         ` Ethan Zhao
  0 siblings, 1 reply; 9+ messages in thread
From: Hongchen Zhang @ 2024-07-24  3:09 UTC (permalink / raw)
  To: Ethan Zhao, Markus Elfring, Bjorn Helgaas
  Cc: Alex Belits, Peter Zijlstra (Intel), Nitesh Narayan Lal,
	Frederic Weisbecker, linux-pci, linux-kernel, loongarch, stable,
	Huacai Chen

Hi Ethan,

On 2024/7/24 上午10:47, Ethan Zhao wrote:
> On 7/24/2024 9:58 AM, Hongchen Zhang wrote:
>> Hi Ethan,
>> On 2024/7/22 PM 3:39, Ethan Zhao wrote:
>>>
>>> On 6/13/2024 3:42 PM, Hongchen Zhang wrote:
>>>> Call work_on_cpu(cpu, fn, arg) in pci_call_probe() while the argument
>>>> @cpu is a offline cpu would cause system stuck forever.
>>>>
>>>> This can be happen if a node is online while all its CPUs are
>>>> offline (We can use "maxcpus=1" without "nr_cpus=1" to reproduce it).
>>>>
>>>> So, in the above case, let pci_call_probe() call local_pci_probe()
>>>> instead of work_on_cpu() when the best selected cpu is offline.
>>>>
>>>> Fixes: 69a18b18699b ("PCI: Restrict probe functions to housekeeping 
>>>> CPUs")
>>>> Cc: <stable@vger.kernel.org>
>>>> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
>>>> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
>>>> ---
>>>> v2 -> v3: Modify commit message according to Markus's suggestion
>>>> v1 -> v2: Add a method to reproduce the problem
>>>> ---
>>>>   drivers/pci/pci-driver.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>>>> index af2996d0d17f..32a99828e6a3 100644
>>>> --- a/drivers/pci/pci-driver.c
>>>> +++ b/drivers/pci/pci-driver.c
>>>> @@ -386,7 +386,7 @@ static int pci_call_probe(struct pci_driver 
>>>> *drv, struct pci_dev *dev,
>>>>           free_cpumask_var(wq_domain_mask);
>>>>       }
>>>> -    if (cpu < nr_cpu_ids)
>>>
>>> Why not choose the right cpu to callwork_on_cpu() ? the one that is 
>>> online. Thanks, Ethan
>> Yes, let housekeeping_cpumask() return online cpu is a good idea, but 
>> it may be changed by command line. so the simplest way is to call 
>> local_pci_probe when the best selected cpu is offline.
> 
> Hmm..... housekeeping_cpumask() should never return offline CPU, so
> I guess you didn't hit issue with the CPU isolation, but the following
> code seems not good.
The issue is the dev node is online but the best selected cpu is 
offline, so it seems that there is no better way to directly set the cpu 
to nr_cpu_ids.
> 
> ...
> 
> if (node < 0 || node >= MAX_NUMNODES || !node_online(node) ||
>          pci_physfn_is_probed(dev)) {
>          cpu = nr_cpu_ids;
>      } else {
> 
> ....
> 
> perhaps you could change the logic there and fix it  ?
> 
> Thanks
> Ethan
> 
> 
> 
>>>
>>>> +    if ((cpu < nr_cpu_ids) && cpu_online(cpu))
>>>>           error = work_on_cpu(cpu, local_pci_probe, &ddi);
>>>>       else
>>>>           error = local_pci_probe(&ddi);
>>
>>


-- 
Best Regards
Hongchen Zhang


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] PCI: pci_call_probe: call local_pci_probe() when selected cpu is offline
  2024-07-24  3:09       ` Hongchen Zhang
@ 2024-07-24  6:40         ` Ethan Zhao
  2024-07-24  7:05           ` Hongchen Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Ethan Zhao @ 2024-07-24  6:40 UTC (permalink / raw)
  To: Hongchen Zhang, Markus Elfring, Bjorn Helgaas
  Cc: Alex Belits, Peter Zijlstra (Intel), Nitesh Narayan Lal,
	Frederic Weisbecker, linux-pci, linux-kernel, loongarch, stable,
	Huacai Chen

On 7/24/2024 11:09 AM, Hongchen Zhang wrote:
> Hi Ethan,
>
> On 2024/7/24 上午10:47, Ethan Zhao wrote:
>> On 7/24/2024 9:58 AM, Hongchen Zhang wrote:
>>> Hi Ethan,
>>> On 2024/7/22 PM 3:39, Ethan Zhao wrote:
>>>>
>>>> On 6/13/2024 3:42 PM, Hongchen Zhang wrote:
>>>>> Call work_on_cpu(cpu, fn, arg) in pci_call_probe() while the argument
>>>>> @cpu is a offline cpu would cause system stuck forever.
>>>>>
>>>>> This can be happen if a node is online while all its CPUs are
>>>>> offline (We can use "maxcpus=1" without "nr_cpus=1" to reproduce it).
>>>>>
>>>>> So, in the above case, let pci_call_probe() call local_pci_probe()
>>>>> instead of work_on_cpu() when the best selected cpu is offline.
>>>>>
>>>>> Fixes: 69a18b18699b ("PCI: Restrict probe functions to 
>>>>> housekeeping CPUs")
>>>>> Cc: <stable@vger.kernel.org>
>>>>> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
>>>>> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
>>>>> ---
>>>>> v2 -> v3: Modify commit message according to Markus's suggestion
>>>>> v1 -> v2: Add a method to reproduce the problem
>>>>> ---
>>>>>   drivers/pci/pci-driver.c | 2 +-
>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>>>>> index af2996d0d17f..32a99828e6a3 100644
>>>>> --- a/drivers/pci/pci-driver.c
>>>>> +++ b/drivers/pci/pci-driver.c
>>>>> @@ -386,7 +386,7 @@ static int pci_call_probe(struct pci_driver 
>>>>> *drv, struct pci_dev *dev,
>>>>>           free_cpumask_var(wq_domain_mask);
>>>>>       }
>>>>> -    if (cpu < nr_cpu_ids)
>>>>
>>>> Why not choose the right cpu to callwork_on_cpu() ? the one that is 
>>>> online. Thanks, Ethan
>>> Yes, let housekeeping_cpumask() return online cpu is a good idea, 
>>> but it may be changed by command line. so the simplest way is to 
>>> call local_pci_probe when the best selected cpu is offline.
>>
>> Hmm..... housekeeping_cpumask() should never return offline CPU, so
>> I guess you didn't hit issue with the CPU isolation, but the following
>> code seems not good.
> The issue is the dev node is online but the best selected cpu is 
> offline, so it seems that there is no better way to directly set the 
> cpu to nr_cpu_ids.

I mean where the bug is ? you should debug more about that.
just add one cpu_online(cpu) check there might suggest there
is bug in the cpu selection stage.


if (node < 0 || node >= MAX_NUMNODES || !node_online(node) ||
	    pci_physfn_is_probed(dev)) {
		cpu = nr_cpu_ids; // <----- if you hit here, then local_pci_probe() should be called.
             
	} else {
		cpumask_var_t wq_domain_mask;

		if (!zalloc_cpumask_var(&wq_domain_mask, GFP_KERNEL)) {
			error = -ENOMEM;
			goto out;
		}
		cpumask_and(wq_domain_mask,
			    housekeeping_cpumask(HK_TYPE_WQ),
			    housekeeping_cpumask(HK_TYPE_DOMAIN));

		cpu = cpumask_any_and(cpumask_of_node(node),
				      wq_domain_mask);
		free_cpumask_var(wq_domain_mask);
                 // <----- if you hit here, then work_on_cpu(cpu, local_pci_probe, &ddi) should be called.
                 // do you mean there one offline cpu is selected ?

	}

	if (cpu < nr_cpu_ids)
		error = work_on_cpu(cpu, local_pci_probe, &ddi);
	else
		error = local_pci_probe(&ddi);


Thanks,
Ethan

>>
>> ...
>>
>> if (node < 0 || node >= MAX_NUMNODES || !node_online(node) ||
>>          pci_physfn_is_probed(dev)) {
>>          cpu = nr_cpu_ids;
>>      } else {
>>
>> ....
>>
>> perhaps you could change the logic there and fix it  ?
>>
>> Thanks
>> Ethan
>>
>>
>>
>>>>
>>>>> +    if ((cpu < nr_cpu_ids) && cpu_online(cpu))
>>>>>           error = work_on_cpu(cpu, local_pci_probe, &ddi);
>>>>>       else
>>>>>           error = local_pci_probe(&ddi);
>>>
>>>
>
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] PCI: pci_call_probe: call local_pci_probe() when selected cpu is offline
  2024-07-24  6:40         ` Ethan Zhao
@ 2024-07-24  7:05           ` Hongchen Zhang
  2024-09-13  8:06             ` Huacai Chen
  0 siblings, 1 reply; 9+ messages in thread
From: Hongchen Zhang @ 2024-07-24  7:05 UTC (permalink / raw)
  To: Ethan Zhao, Markus Elfring, Bjorn Helgaas
  Cc: Alex Belits, Peter Zijlstra (Intel), Nitesh Narayan Lal,
	Frederic Weisbecker, linux-pci, linux-kernel, loongarch, stable,
	Huacai Chen

On 2024/7/24 下午2:40, Ethan Zhao wrote:
> On 7/24/2024 11:09 AM, Hongchen Zhang wrote:
>> Hi Ethan,
>>
>> On 2024/7/24 上午10:47, Ethan Zhao wrote:
>>> On 7/24/2024 9:58 AM, Hongchen Zhang wrote:
>>>> Hi Ethan,
>>>> On 2024/7/22 PM 3:39, Ethan Zhao wrote:
>>>>>
>>>>> On 6/13/2024 3:42 PM, Hongchen Zhang wrote:
>>>>>> Call work_on_cpu(cpu, fn, arg) in pci_call_probe() while the argument
>>>>>> @cpu is a offline cpu would cause system stuck forever.
>>>>>>
>>>>>> This can be happen if a node is online while all its CPUs are
>>>>>> offline (We can use "maxcpus=1" without "nr_cpus=1" to reproduce it).
>>>>>>
>>>>>> So, in the above case, let pci_call_probe() call local_pci_probe()
>>>>>> instead of work_on_cpu() when the best selected cpu is offline.
>>>>>>
>>>>>> Fixes: 69a18b18699b ("PCI: Restrict probe functions to 
>>>>>> housekeeping CPUs")
>>>>>> Cc: <stable@vger.kernel.org>
>>>>>> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
>>>>>> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
>>>>>> ---
>>>>>> v2 -> v3: Modify commit message according to Markus's suggestion
>>>>>> v1 -> v2: Add a method to reproduce the problem
>>>>>> ---
>>>>>>   drivers/pci/pci-driver.c | 2 +-
>>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>>>>>> index af2996d0d17f..32a99828e6a3 100644
>>>>>> --- a/drivers/pci/pci-driver.c
>>>>>> +++ b/drivers/pci/pci-driver.c
>>>>>> @@ -386,7 +386,7 @@ static int pci_call_probe(struct pci_driver 
>>>>>> *drv, struct pci_dev *dev,
>>>>>>           free_cpumask_var(wq_domain_mask);
>>>>>>       }
>>>>>> -    if (cpu < nr_cpu_ids)
>>>>>
>>>>> Why not choose the right cpu to callwork_on_cpu() ? the one that is 
>>>>> online. Thanks, Ethan
>>>> Yes, let housekeeping_cpumask() return online cpu is a good idea, 
>>>> but it may be changed by command line. so the simplest way is to 
>>>> call local_pci_probe when the best selected cpu is offline.
>>>
>>> Hmm..... housekeeping_cpumask() should never return offline CPU, so
>>> I guess you didn't hit issue with the CPU isolation, but the following
>>> code seems not good.
>> The issue is the dev node is online but the best selected cpu is 
>> offline, so it seems that there is no better way to directly set the 
>> cpu to nr_cpu_ids.
> 
> I mean where the bug is ? you should debug more about that.
> just add one cpu_online(cpu) check there might suggest there
> is bug in the cpu selection stage.
> 
> 
> if (node < 0 || node >= MAX_NUMNODES || !node_online(node) ||
>          pci_physfn_is_probed(dev)) {
>          cpu = nr_cpu_ids; // <----- if you hit here, then 
> local_pci_probe() should be called.
>      } else {
>          cpumask_var_t wq_domain_mask;
> 
>          if (!zalloc_cpumask_var(&wq_domain_mask, GFP_KERNEL)) {
>              error = -ENOMEM;
>              goto out;
>          }
>          cpumask_and(wq_domain_mask,
>                  housekeeping_cpumask(HK_TYPE_WQ),
>                  housekeeping_cpumask(HK_TYPE_DOMAIN));
> 
>          cpu = cpumask_any_and(cpumask_of_node(node),
>                        wq_domain_mask);
>          free_cpumask_var(wq_domain_mask);
>                  // <----- if you hit here, then work_on_cpu(cpu, 
> local_pci_probe, &ddi) should be called.
Yes, but if the offline cpu is selected, local_pci_probe should be called.
>                  // do you mean there one offline cpu is selected ?
Yes, the offline cpu is selected.
> 
>      }
> 
>      if (cpu < nr_cpu_ids)
>          error = work_on_cpu(cpu, local_pci_probe, &ddi);
>      else
>          error = local_pci_probe(&ddi);
> 
> 
> Thanks,
> Ethan
> 
>>>
>>> ...
>>>
>>> if (node < 0 || node >= MAX_NUMNODES || !node_online(node) ||
>>>          pci_physfn_is_probed(dev)) {
>>>          cpu = nr_cpu_ids;
>>>      } else {
>>>
>>> ....
>>>
>>> perhaps you could change the logic there and fix it  ?
>>>
>>> Thanks
>>> Ethan
>>>
>>>
>>>
>>>>>
>>>>>> +    if ((cpu < nr_cpu_ids) && cpu_online(cpu))
>>>>>>           error = work_on_cpu(cpu, local_pci_probe, &ddi);
>>>>>>       else
>>>>>>           error = local_pci_probe(&ddi);
>>>>
>>>>
>>
>>


-- 
Best Regards
Hongchen Zhang


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] PCI: pci_call_probe: call local_pci_probe() when selected cpu is offline
  2024-07-24  7:05           ` Hongchen Zhang
@ 2024-09-13  8:06             ` Huacai Chen
  0 siblings, 0 replies; 9+ messages in thread
From: Huacai Chen @ 2024-09-13  8:06 UTC (permalink / raw)
  To: Hongchen Zhang
  Cc: Ethan Zhao, Markus Elfring, Bjorn Helgaas, Alex Belits,
	Peter Zijlstra (Intel), Nitesh Narayan Lal, Frederic Weisbecker,
	linux-pci, linux-kernel, loongarch, stable, Huacai Chen

Ping?

On Wed, Jul 24, 2024 at 3:05 PM Hongchen Zhang
<zhanghongchen@loongson.cn> wrote:
>
> On 2024/7/24 下午2:40, Ethan Zhao wrote:
> > On 7/24/2024 11:09 AM, Hongchen Zhang wrote:
> >> Hi Ethan,
> >>
> >> On 2024/7/24 上午10:47, Ethan Zhao wrote:
> >>> On 7/24/2024 9:58 AM, Hongchen Zhang wrote:
> >>>> Hi Ethan,
> >>>> On 2024/7/22 PM 3:39, Ethan Zhao wrote:
> >>>>>
> >>>>> On 6/13/2024 3:42 PM, Hongchen Zhang wrote:
> >>>>>> Call work_on_cpu(cpu, fn, arg) in pci_call_probe() while the argument
> >>>>>> @cpu is a offline cpu would cause system stuck forever.
> >>>>>>
> >>>>>> This can be happen if a node is online while all its CPUs are
> >>>>>> offline (We can use "maxcpus=1" without "nr_cpus=1" to reproduce it).
> >>>>>>
> >>>>>> So, in the above case, let pci_call_probe() call local_pci_probe()
> >>>>>> instead of work_on_cpu() when the best selected cpu is offline.
> >>>>>>
> >>>>>> Fixes: 69a18b18699b ("PCI: Restrict probe functions to
> >>>>>> housekeeping CPUs")
> >>>>>> Cc: <stable@vger.kernel.org>
> >>>>>> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> >>>>>> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
> >>>>>> ---
> >>>>>> v2 -> v3: Modify commit message according to Markus's suggestion
> >>>>>> v1 -> v2: Add a method to reproduce the problem
> >>>>>> ---
> >>>>>>   drivers/pci/pci-driver.c | 2 +-
> >>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> >>>>>> index af2996d0d17f..32a99828e6a3 100644
> >>>>>> --- a/drivers/pci/pci-driver.c
> >>>>>> +++ b/drivers/pci/pci-driver.c
> >>>>>> @@ -386,7 +386,7 @@ static int pci_call_probe(struct pci_driver
> >>>>>> *drv, struct pci_dev *dev,
> >>>>>>           free_cpumask_var(wq_domain_mask);
> >>>>>>       }
> >>>>>> -    if (cpu < nr_cpu_ids)
> >>>>>
> >>>>> Why not choose the right cpu to callwork_on_cpu() ? the one that is
> >>>>> online. Thanks, Ethan
> >>>> Yes, let housekeeping_cpumask() return online cpu is a good idea,
> >>>> but it may be changed by command line. so the simplest way is to
> >>>> call local_pci_probe when the best selected cpu is offline.
> >>>
> >>> Hmm..... housekeeping_cpumask() should never return offline CPU, so
> >>> I guess you didn't hit issue with the CPU isolation, but the following
> >>> code seems not good.
> >> The issue is the dev node is online but the best selected cpu is
> >> offline, so it seems that there is no better way to directly set the
> >> cpu to nr_cpu_ids.
> >
> > I mean where the bug is ? you should debug more about that.
> > just add one cpu_online(cpu) check there might suggest there
> > is bug in the cpu selection stage.
> >
> >
> > if (node < 0 || node >= MAX_NUMNODES || !node_online(node) ||
> >          pci_physfn_is_probed(dev)) {
> >          cpu = nr_cpu_ids; // <----- if you hit here, then
> > local_pci_probe() should be called.
> >      } else {
> >          cpumask_var_t wq_domain_mask;
> >
> >          if (!zalloc_cpumask_var(&wq_domain_mask, GFP_KERNEL)) {
> >              error = -ENOMEM;
> >              goto out;
> >          }
> >          cpumask_and(wq_domain_mask,
> >                  housekeeping_cpumask(HK_TYPE_WQ),
> >                  housekeeping_cpumask(HK_TYPE_DOMAIN));
> >
> >          cpu = cpumask_any_and(cpumask_of_node(node),
> >                        wq_domain_mask);
> >          free_cpumask_var(wq_domain_mask);
> >                  // <----- if you hit here, then work_on_cpu(cpu,
> > local_pci_probe, &ddi) should be called.
> Yes, but if the offline cpu is selected, local_pci_probe should be called.
> >                  // do you mean there one offline cpu is selected ?
> Yes, the offline cpu is selected.
> >
> >      }
> >
> >      if (cpu < nr_cpu_ids)
> >          error = work_on_cpu(cpu, local_pci_probe, &ddi);
> >      else
> >          error = local_pci_probe(&ddi);
> >
> >
> > Thanks,
> > Ethan
> >
> >>>
> >>> ...
> >>>
> >>> if (node < 0 || node >= MAX_NUMNODES || !node_online(node) ||
> >>>          pci_physfn_is_probed(dev)) {
> >>>          cpu = nr_cpu_ids;
> >>>      } else {
> >>>
> >>> ....
> >>>
> >>> perhaps you could change the logic there and fix it  ?
> >>>
> >>> Thanks
> >>> Ethan
> >>>
> >>>
> >>>
> >>>>>
> >>>>>> +    if ((cpu < nr_cpu_ids) && cpu_online(cpu))
> >>>>>>           error = work_on_cpu(cpu, local_pci_probe, &ddi);
> >>>>>>       else
> >>>>>>           error = local_pci_probe(&ddi);
> >>>>
> >>>>
> >>
> >>
>
>
> --
> Best Regards
> Hongchen Zhang
>
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-09-13  8:06 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-13  7:42 [PATCH v3] PCI: pci_call_probe: call local_pci_probe() when selected cpu is offline Hongchen Zhang
2024-07-22  2:11 ` Huacai Chen
2024-07-22  7:39 ` Ethan Zhao
2024-07-24  1:58   ` Hongchen Zhang
2024-07-24  2:47     ` Ethan Zhao
2024-07-24  3:09       ` Hongchen Zhang
2024-07-24  6:40         ` Ethan Zhao
2024-07-24  7:05           ` Hongchen Zhang
2024-09-13  8:06             ` Huacai Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).