* [PATCH] PCI: pciehp: Use appropriate conditions to check the hotplug controller status
From: Bitao Hu @ 2024-05-24 6:30 UTC
To: bhelgaas, lukas, weirongguang
Cc: yaoma, kanie, linux-pci, linux-kernel

The values of 'present' and 'link_active' have similar meanings:
the value is %1 if the status is ready, and %0 if it is not. If the
hotplug controller itself is not available, the value should be
%-ENODEV. However, both %1 and %-ENODEV are considered true, which
obviously does not meet expectations. 'Slot(xx): Card present' and
'Slot(xx): Link Up' should only be output when the value is %1.

Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
---
 drivers/pci/hotplug/pciehp_ctrl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index dcdbfcf404dd..6adfdbb70150 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -276,10 +276,10 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
 	case OFF_STATE:
 		ctrl->state = POWERON_STATE;
 		mutex_unlock(&ctrl->state_lock);
-		if (present)
+		if (present > 0)
 			ctrl_info(ctrl, "Slot(%s): Card present\n",
 				  slot_name(ctrl));
-		if (link_active)
+		if (link_active > 0)
 			ctrl_info(ctrl, "Slot(%s): Link Up\n",
 				  slot_name(ctrl));
 		ctrl->request_result = pciehp_enable_slot(ctrl);
-- 
2.37.1 (Apple Git-137.1)
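Note on the patch above: the commit message relies on the fact that C treats
every non-zero integer as true, so a negative errno such as -ENODEV passes a
bare "if (x)" test. The following is a minimal standalone sketch of that
point, not driver code. The helper names should_log_old() and
should_log_new() are hypothetical and exist only for this illustration; the
two conditions are the ones from the diff.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Condition before the patch: equivalent to "if (status)".
 * Any non-zero value passes, including a negative errno.
 * (Hypothetical helper, not part of pciehp.)
 */
bool should_log_old(int status)
{
	return status != 0;
}

/*
 * Condition after the patch: "if (status > 0)".
 * Only a positive status passes; 0 and -ENODEV are both rejected.
 * (Hypothetical helper, not part of pciehp.)
 */
bool should_log_new(int status)
{
	return status > 0;
}
```

With these helpers, a status of 1 passes both checks and 0 fails both, while
-ENODEV passes only the old one, which is the case the patch is about.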
* Re: [PATCH] PCI: pciehp: Use appropriate conditions to check the hotplug controller status
From: Lukas Wunner @ 2024-05-24 7:53 UTC
To: Bitao Hu
Cc: bhelgaas, weirongguang, kanie, linux-pci, linux-kernel

On Fri, May 24, 2024 at 02:30:23PM +0800, Bitao Hu wrote:
> The values of 'present' and 'link_active' have similar meanings:
> the value is %1 if the status is ready, and %0 if it is not. If the
> hotplug controller itself is not available, the value should be
> %-ENODEV. However, both %1 and %-ENODEV are considered true, which
> obviously does not meet expectations. 'Slot(xx): Card present' and
> 'Slot(xx): Link Up' should only be output when the value is %1.
[...]
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -276,10 +276,10 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>  	case OFF_STATE:
>  		ctrl->state = POWERON_STATE;
>  		mutex_unlock(&ctrl->state_lock);
> -		if (present)
> +		if (present > 0)
>  			ctrl_info(ctrl, "Slot(%s): Card present\n",
>  				  slot_name(ctrl));
> -		if (link_active)
> +		if (link_active > 0)
>  			ctrl_info(ctrl, "Slot(%s): Link Up\n",
>  				  slot_name(ctrl));
>  		ctrl->request_result = pciehp_enable_slot(ctrl);

We already handle the "<= 0" case immediately above this code excerpt:

	if (present <= 0 && link_active <= 0) {
		...
	}

So neither "present" nor "link_active" can be < 0 at this point.

Hence I don't quite understand what motivates the proposed code change?

Thanks,

Lukas
* Re: [PATCH] PCI: pciehp: Use appropriate conditions to check the hotplug controller status
From: yaoma @ 2024-05-26 14:45 UTC
To: Lukas Wunner
Cc: yaoma, bhelgaas, weirongguang, kanie, linux-pci, linux-kernel

Hi,

> On May 24, 2024, at 15:53, Lukas Wunner <lukas@wunner.de> wrote:
>
> On Fri, May 24, 2024 at 02:30:23PM +0800, Bitao Hu wrote:
>> The values of 'present' and 'link_active' have similar meanings:
>> the value is %1 if the status is ready, and %0 if it is not. If the
>> hotplug controller itself is not available, the value should be
>> %-ENODEV. However, both %1 and %-ENODEV are considered true, which
>> obviously does not meet expectations. 'Slot(xx): Card present' and
>> 'Slot(xx): Link Up' should only be output when the value is %1.
> [...]
>> --- a/drivers/pci/hotplug/pciehp_ctrl.c
>> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
>> @@ -276,10 +276,10 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>>  	case OFF_STATE:
>>  		ctrl->state = POWERON_STATE;
>>  		mutex_unlock(&ctrl->state_lock);
>> -		if (present)
>> +		if (present > 0)
>>  			ctrl_info(ctrl, "Slot(%s): Card present\n",
>>  				  slot_name(ctrl));
>> -		if (link_active)
>> +		if (link_active > 0)
>>  			ctrl_info(ctrl, "Slot(%s): Link Up\n",
>>  				  slot_name(ctrl));
>>  		ctrl->request_result = pciehp_enable_slot(ctrl);
>
> We already handle the "<= 0" case immediately above this code excerpt:
>
> 	if (present <= 0 && link_active <= 0) {
> 		...
> 	}

I'm not sure whether the following scenarios would occur in an actual
production environment, but at the code level there is the possibility of
“present <= 0 && link_active > 0” or “present > 0 && link_active <= 0”.
In these cases, the “<= 0” conditions will not be properly handled,
and “ctrl_info” will output incorrect messages.

> So neither "present" nor "link_active" can be < 0 at this point.
>

Best Regards,
	Bitao Hu
* Re: [PATCH] PCI: pciehp: Use appropriate conditions to check the hotplug controller status
From: Lukas Wunner @ 2024-05-27 8:50 UTC
To: yaoma
Cc: bhelgaas, weirongguang, kanie, linux-pci, linux-kernel

On Sun, May 26, 2024 at 10:45:36PM +0800, yaoma wrote:
> > On May 24, 2024, at 15:53, Lukas Wunner <lukas@wunner.de> wrote:
> > On Fri, May 24, 2024 at 02:30:23PM +0800, Bitao Hu wrote:
> > > The values of 'present' and 'link_active' have similar meanings:
> > > the value is %1 if the status is ready, and %0 if it is not. If the
> > > hotplug controller itself is not available, the value should be
> > > %-ENODEV. However, both %1 and %-ENODEV are considered true, which
> > > obviously does not meet expectations. 'Slot(xx): Card present' and
> > > 'Slot(xx): Link Up' should only be output when the value is %1.
> > [...]
> > > --- a/drivers/pci/hotplug/pciehp_ctrl.c
> > > +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> > > @@ -276,10 +276,10 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
> > >  	case OFF_STATE:
> > >  		ctrl->state = POWERON_STATE;
> > >  		mutex_unlock(&ctrl->state_lock);
> > > -		if (present)
> > > +		if (present > 0)
> > >  			ctrl_info(ctrl, "Slot(%s): Card present\n",
> > >  				  slot_name(ctrl));
> > > -		if (link_active)
> > > +		if (link_active > 0)
> > >  			ctrl_info(ctrl, "Slot(%s): Link Up\n",
> > >  				  slot_name(ctrl));
> > >  		ctrl->request_result = pciehp_enable_slot(ctrl);
> >
> > We already handle the "<= 0" case immediately above this code excerpt:
> >
> > 	if (present <= 0 && link_active <= 0) {
> > 		...
> > 	}
>
> I'm not sure if the following scenarios would occur in actual production
> environment, but from the code level, there is the possibility of
> "present <= 0 && link_active > 0" or "present > 0 && link_active <= 0".
> In these cases, the "<= 0" conditions will not be properly handled,
> and "ctrl_info" will output incorrect prompt messages.

I see, that makes sense.

"present" and "link_active" can be -ENODEV if reading the config space
of the hotplug port failed. That's typically the case if the hotplug
port itself was hot-removed, which happens all the time with
Thunderbolt/USB4.

E.g. pciehp_card_present() may return 1 and pciehp_check_link_active()
may return -ENODEV because the hotplug port was hot-removed in-between
the two function calls. In that case we'll emit both "Card present"
*and* "Link Up". The latter is uncalled for and is suppressed by your
patch.

So your code change is
Reviewed-by: Lukas Wunner <lukas@wunner.de>

...but it would be good if you could respin the patch and explain the
rationale of the code change in the commit message more clearly.
Basically summarize what you and I have explained above.

Also, the percent sign % in front of 0, 1, -ENODEV is unnecessary in
commit messages. It only has special meaning in kernel-doc.

Thanks,

Lukas
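Note on the message above: the race described there can be modeled in a few
lines of standalone C. This is a simplified sketch under stated assumptions,
not the driver code. It models only the guard quoted in the thread and the
two ctrl_info() calls from the diff, and it ignores the controller state
machine entirely. guard_bails_out(), messages_before_patch(),
messages_after_patch() and the MSG_* flags are hypothetical names introduced
for this illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flags standing in for the two log messages. */
#define MSG_CARD_PRESENT 0x1u
#define MSG_LINK_UP      0x2u

/* Guard quoted in the thread: bail out only if *both* values are <= 0. */
bool guard_bails_out(int present, int link_active)
{
	return present <= 0 && link_active <= 0;
}

/* Messages logged with the pre-patch conditions "if (x)". */
unsigned int messages_before_patch(int present, int link_active)
{
	unsigned int msgs = 0;

	if (guard_bails_out(present, link_active))
		return 0;
	if (present)
		msgs |= MSG_CARD_PRESENT;
	if (link_active)
		msgs |= MSG_LINK_UP;
	return msgs;
}

/* Messages logged with the patched conditions "if (x > 0)". */
unsigned int messages_after_patch(int present, int link_active)
{
	unsigned int msgs = 0;

	if (guard_bails_out(present, link_active))
		return 0;
	if (present > 0)
		msgs |= MSG_CARD_PRESENT;
	if (link_active > 0)
		msgs |= MSG_LINK_UP;
	return msgs;
}
```

For present = 1 and link_active = -ENODEV, the guard does not bail out
because only one of the two values is <= 0. The pre-patch model then reports
both messages, while the patched model reports only "Card present".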
* Re: [PATCH] PCI: pciehp: Use appropriate conditions to check the hotplug controller status
From: yaoma @ 2024-05-27 9:43 UTC
To: Lukas Wunner
Cc: yaoma, bhelgaas, weirongguang, kanie, linux-pci, linux-kernel

Hi,

> On May 27, 2024, at 16:50, Lukas Wunner <lukas@wunner.de> wrote:
>
> On Sun, May 26, 2024 at 10:45:36PM +0800, yaoma wrote:
>>> On May 24, 2024, at 15:53, Lukas Wunner <lukas@wunner.de> wrote:
>>> On Fri, May 24, 2024 at 02:30:23PM +0800, Bitao Hu wrote:
>>>> The values of 'present' and 'link_active' have similar meanings:
>>>> the value is %1 if the status is ready, and %0 if it is not. If the
>>>> hotplug controller itself is not available, the value should be
>>>> %-ENODEV. However, both %1 and %-ENODEV are considered true, which
>>>> obviously does not meet expectations. 'Slot(xx): Card present' and
>>>> 'Slot(xx): Link Up' should only be output when the value is %1.
>>> [...]
>>>> --- a/drivers/pci/hotplug/pciehp_ctrl.c
>>>> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
>>>> @@ -276,10 +276,10 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>>>>  	case OFF_STATE:
>>>>  		ctrl->state = POWERON_STATE;
>>>>  		mutex_unlock(&ctrl->state_lock);
>>>> -		if (present)
>>>> +		if (present > 0)
>>>>  			ctrl_info(ctrl, "Slot(%s): Card present\n",
>>>>  				  slot_name(ctrl));
>>>> -		if (link_active)
>>>> +		if (link_active > 0)
>>>>  			ctrl_info(ctrl, "Slot(%s): Link Up\n",
>>>>  				  slot_name(ctrl));
>>>>  		ctrl->request_result = pciehp_enable_slot(ctrl);
>>>
>>> We already handle the "<= 0" case immediately above this code excerpt:
>>>
>>> 	if (present <= 0 && link_active <= 0) {
>>> 		...
>>> 	}
>>
>> I'm not sure if the following scenarios would occur in actual production
>> environment, but from the code level, there is the possibility of
>> "present <= 0 && link_active > 0" or "present > 0 && link_active <= 0".
>> In these cases, the "<= 0" conditions will not be properly handled,
>> and "ctrl_info" will output incorrect prompt messages.
>
> I see, that makes sense.
>
> "present" and "link_active" can be -ENODEV if reading the config space
> of the hotplug port failed. That's typically the case if the hotplug
> port itself was hot-removed, which happens all the time with
> Thunderbolt/USB4.
>
> E.g. pciehp_card_present() may return 1 and pciehp_check_link_active()
> may return -ENODEV because the hotplug port was hot-removed in-between
> the two function calls. In that case we'll emit both "Card present"
> *and* "Link Up". The latter is uncalled for and is suppressed by your
> patch.
>
> So your code change is
> Reviewed-by: Lukas Wunner <lukas@wunner.de>
>
> ...but it would be good if you could respin the patch and explain the
> rationale of the code change in the commit message more clearly.
> Basically summarize what you and I have explained above.
>
> Also, the percent sign % in front of 0, 1, -ENODEV is unnecessary in
> commit messages. It only has special meaning in kernel-doc.
>

Thanks for your analysis. I will explain the rationale of the code
change more clearly in the next patch.

Best Regards,
	Bitao Hu
* [PATCHv2] PCI: pciehp: Use appropriate conditions to check the hotplug controller status
From: Bitao Hu @ 2024-05-28 6:42 UTC
To: lukas, bhelgaas, weirongguang
Cc: linux-pci, linux-kernel, kanie, yaoma

"present" and "link_active" can be 1 if the status is ready, and 0 if
it is not. Both of them can be -ENODEV if reading the config space
of the hotplug port failed. That's typically the case if the hotplug
port itself was hot-removed. Therefore, this situation can occur:
pciehp_card_present() may return 1 and pciehp_check_link_active()
may return -ENODEV because the hotplug port was hot-removed in-between
the two function calls. In that case we'll emit both "Card present"
*and* "Link Up" since both 1 and -ENODEV are considered "true". This
is not the expected behavior. Those messages should be emitted when
"present" and "link_active" are positive.

Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
---
v1 -> v2:
1. Explain the rationale of the code change in the commit message
more clearly.
2. Add the "Reviewed-by" tag of Lukas.
---
 drivers/pci/hotplug/pciehp_ctrl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index dcdbfcf404dd..6adfdbb70150 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -276,10 +276,10 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
 	case OFF_STATE:
 		ctrl->state = POWERON_STATE;
 		mutex_unlock(&ctrl->state_lock);
-		if (present)
+		if (present > 0)
 			ctrl_info(ctrl, "Slot(%s): Card present\n",
 				  slot_name(ctrl));
-		if (link_active)
+		if (link_active > 0)
 			ctrl_info(ctrl, "Slot(%s): Link Up\n",
 				  slot_name(ctrl));
 		ctrl->request_result = pciehp_enable_slot(ctrl);
-- 
2.37.1 (Apple Git-137.1)
* Re: [PATCHv2] PCI: pciehp: Use appropriate conditions to check the hotplug controller status
From: Ilpo Järvinen @ 2024-05-28 10:54 UTC
To: Bitao Hu
Cc: Lukas Wunner, bhelgaas, weirongguang, linux-pci, LKML, kanie

On Tue, 28 May 2024, Bitao Hu wrote:

> "present" and "link_active" can be 1 if the status is ready, and 0 if
> it is not. Both of them can be -ENODEV if reading the config space
> of the hotplug port failed. That's typically the case if the hotplug
> port itself was hot-removed. Therefore, this situation can occur:
> pciehp_card_present() may return 1 and pciehp_check_link_active()
> may return -ENODEV because the hotplug port was hot-removed in-between
> the two function calls. In that case we'll emit both "Card present"
> *and* "Link Up" since both 1 and -ENODEV are considered "true". This
> is not the expected behavior. Those messages should be emitted when
> "present" and "link_active" are positive.
>
> Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
> Reviewed-by: Lukas Wunner <lukas@wunner.de>

Thanks for updating the description.

Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

-- 
 i.

> ---
> v1 -> v2:
> 1. Explain the rationale of the code change in the commit message
> more clearly.
> 2. Add the "Reviewed-by" tag of Lukas.
> ---
>  drivers/pci/hotplug/pciehp_ctrl.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index dcdbfcf404dd..6adfdbb70150 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -276,10 +276,10 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>  	case OFF_STATE:
>  		ctrl->state = POWERON_STATE;
>  		mutex_unlock(&ctrl->state_lock);
> -		if (present)
> +		if (present > 0)
>  			ctrl_info(ctrl, "Slot(%s): Card present\n",
>  				  slot_name(ctrl));
> -		if (link_active)
> +		if (link_active > 0)
>  			ctrl_info(ctrl, "Slot(%s): Link Up\n",
>  				  slot_name(ctrl));
>  		ctrl->request_result = pciehp_enable_slot(ctrl);
>
* Re: [PATCHv2] PCI: pciehp: Use appropriate conditions to check the hotplug controller status
From: Bjorn Helgaas @ 2024-06-14 18:41 UTC
To: Bitao Hu
Cc: lukas, bhelgaas, weirongguang, linux-pci, linux-kernel, kanie, Ilpo Järvinen

[+cc Ilpo]

On Tue, May 28, 2024 at 02:42:00PM +0800, Bitao Hu wrote:
> "present" and "link_active" can be 1 if the status is ready, and 0 if
> it is not. Both of them can be -ENODEV if reading the config space
> of the hotplug port failed. That's typically the case if the hotplug
> port itself was hot-removed. Therefore, this situation can occur:
> pciehp_card_present() may return 1 and pciehp_check_link_active()
> may return -ENODEV because the hotplug port was hot-removed in-between
> the two function calls. In that case we'll emit both "Card present"
> *and* "Link Up" since both 1 and -ENODEV are considered "true". This
> is not the expected behavior. Those messages should be emitted when
> "present" and "link_active" are positive.
>
> Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
> Reviewed-by: Lukas Wunner <lukas@wunner.de>
> ---
> v1 -> v2:
> 1. Explain the rationale of the code change in the commit message
> more clearly.
> 2. Add the "Reviewed-by" tag of Lukas.
> ---
>  drivers/pci/hotplug/pciehp_ctrl.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index dcdbfcf404dd..6adfdbb70150 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -276,10 +276,10 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>  	case OFF_STATE:
>  		ctrl->state = POWERON_STATE;
>  		mutex_unlock(&ctrl->state_lock);
> -		if (present)
> +		if (present > 0)

I completely agree that this is a problem and this patch addresses it.
But ...

It seems a little bit weird to me that we even get to this switch
statement if we got -ENODEV from either pciehp_card_present() or
pciehp_check_link_active(). If that happens, a config read failed,
but we're going to go ahead and call pciehp_enable_slot(), which is
going to do a bunch more config accesses, potentially try to power up
the slot, etc.

If a config read failed, it seems like we might want to avoid doing
some of this stuff.

>  			ctrl_info(ctrl, "Slot(%s): Card present\n",
>  				  slot_name(ctrl));
> -		if (link_active)
> +		if (link_active > 0)
>  			ctrl_info(ctrl, "Slot(%s): Link Up\n",
>  				  slot_name(ctrl));

These are cases where we misinterpreted -ENODEV as "device is present"
or "link is active".

pciehp_ignore_dpc_link_change() and pciehp_slot_reset() also call
pciehp_check_link_active(), and I think they also interpret -ENODEV as
"link is active".

Do we need similar changes there?

>  		ctrl->request_result = pciehp_enable_slot(ctrl);
> -- 
> 2.37.1 (Apple Git-137.1)
>
* Re: [PATCHv2] PCI: pciehp: Use appropriate conditions to check the hotplug controller status
From: Lukas Wunner @ 2024-06-14 19:36 UTC
To: Bjorn Helgaas
Cc: Bitao Hu, bhelgaas, weirongguang, linux-pci, linux-kernel, kanie, Ilpo Järvinen

On Fri, Jun 14, 2024 at 01:41:20PM -0500, Bjorn Helgaas wrote:
> On Tue, May 28, 2024 at 02:42:00PM +0800, Bitao Hu wrote:
> > "present" and "link_active" can be 1 if the status is ready, and 0 if
> > it is not. Both of them can be -ENODEV if reading the config space
> > of the hotplug port failed. That's typically the case if the hotplug
> > port itself was hot-removed. Therefore, this situation can occur:
> > pciehp_card_present() may return 1 and pciehp_check_link_active()
> > may return -ENODEV because the hotplug port was hot-removed in-between
> > the two function calls. In that case we'll emit both "Card present"
> > *and* "Link Up" since both 1 and -ENODEV are considered "true". This
> > is not the expected behavior. Those messages should be emitted when
> > "present" and "link_active" are positive.
[...]
> > --- a/drivers/pci/hotplug/pciehp_ctrl.c
> > +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> > @@ -276,10 +276,10 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
> >  	case OFF_STATE:
> >  		ctrl->state = POWERON_STATE;
> >  		mutex_unlock(&ctrl->state_lock);
> > -		if (present)
> > +		if (present > 0)
>
> I completely agree that this is a problem and this patch addresses it.
> But ...
>
> It seems a little bit weird to me that we even get to this switch
> statement if we got -ENODEV from either pciehp_card_present() or
> pciehp_check_link_active(). If that happens, a config read failed,
> but we're going to go ahead and call pciehp_enable_slot(), which is
> going to do a bunch more config accesses, potentially try to power up
> the slot, etc.
>
> If a config read failed, it seems like we might want to avoid doing
> some of this stuff.

Hm, good point. I guess we should change the logical expression instead:

-	if (present <= 0 && link_active <= 0) {
+	if (present < 0 || link_active < 0 || (!present && !link_active)) {

> > -		if (link_active)
> > +		if (link_active > 0)
> >  			ctrl_info(ctrl, "Slot(%s): Link Up\n",
> >  				  slot_name(ctrl));
>
> These are cases where we misinterpreted -ENODEV as "device is present"
> or "link is active".
>
> pciehp_ignore_dpc_link_change() and pciehp_slot_reset() also call
> pciehp_check_link_active(), and I think they also interpret -ENODEV as
> "link is active".
>
> Do we need similar changes there?

Another good observation, both need to check for <= 0 instead of == 0.
Do you want to fix that yourself or would you prefer me (or someone else)
to submit a patch?

Thanks,

Lukas
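Note on the message above: the difference between the existing guard and the
proposed expression can be checked exhaustively over the three values the
thread mentions (-ENODEV, 0 and 1). The sketch below is standalone C written
for this illustration. old_guard(), proposed_guard() and
count_guard_differences() are hypothetical names; only the two boolean
expressions are copied from the message.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Existing guard quoted in the thread. */
bool old_guard(int present, int link_active)
{
	return present <= 0 && link_active <= 0;
}

/* Expression proposed in the message above. */
bool proposed_guard(int present, int link_active)
{
	return present < 0 || link_active < 0 || (!present && !link_active);
}

/*
 * Count the (present, link_active) pairs drawn from {-ENODEV, 0, 1}
 * for which the two guards disagree.
 */
int count_guard_differences(void)
{
	const int values[] = { -ENODEV, 0, 1 };
	const size_t n = sizeof(values) / sizeof(values[0]);
	int differences = 0;

	for (size_t i = 0; i < n; i++) {
		for (size_t j = 0; j < n; j++) {
			if (old_guard(values[i], values[j]) !=
			    proposed_guard(values[i], values[j]))
				differences++;
		}
	}
	return differences;
}
```

Under these assumptions the two guards disagree for exactly two of the nine
pairs, (1, -ENODEV) and (-ENODEV, 1): the mixed cases where one read failed
and the other returned 1. The proposed expression bails out for them and the
existing guard does not.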
* Re: [PATCHv2] PCI: pciehp: Use appropriate conditions to check the hotplug controller status
From: Bjorn Helgaas @ 2024-06-14 22:03 UTC
To: Lukas Wunner
Cc: Bitao Hu, bhelgaas, weirongguang, linux-pci, linux-kernel, kanie, Ilpo Järvinen

On Fri, Jun 14, 2024 at 09:36:57PM +0200, Lukas Wunner wrote:
> On Fri, Jun 14, 2024 at 01:41:20PM -0500, Bjorn Helgaas wrote:
> > On Tue, May 28, 2024 at 02:42:00PM +0800, Bitao Hu wrote:
> > > "present" and "link_active" can be 1 if the status is ready, and 0 if
> > > it is not. Both of them can be -ENODEV if reading the config space
> > > of the hotplug port failed. That's typically the case if the hotplug
> > > port itself was hot-removed. Therefore, this situation can occur:
> > > pciehp_card_present() may return 1 and pciehp_check_link_active()
> > > may return -ENODEV because the hotplug port was hot-removed in-between
> > > the two function calls. In that case we'll emit both "Card present"
> > > *and* "Link Up" since both 1 and -ENODEV are considered "true". This
> > > is not the expected behavior. Those messages should be emitted when
> > > "present" and "link_active" are positive.
> [...]
> > > --- a/drivers/pci/hotplug/pciehp_ctrl.c
> > > +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> > > @@ -276,10 +276,10 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
> > >  	case OFF_STATE:
> > >  		ctrl->state = POWERON_STATE;
> > >  		mutex_unlock(&ctrl->state_lock);
> > > -		if (present)
> > > +		if (present > 0)
> >
> > I completely agree that this is a problem and this patch addresses it.
> > But ...
> >
> > It seems a little bit weird to me that we even get to this switch
> > statement if we got -ENODEV from either pciehp_card_present() or
> > pciehp_check_link_active(). If that happens, a config read failed,
> > but we're going to go ahead and call pciehp_enable_slot(), which is
> > going to do a bunch more config accesses, potentially try to power up
> > the slot, etc.
> >
> > If a config read failed, it seems like we might want to avoid doing
> > some of this stuff.
>
> Hm, good point. I guess we should change the logical expression instead:
>
> -	if (present <= 0 && link_active <= 0) {
> +	if (present < 0 || link_active < 0 || (!present && !link_active)) {

It gets to be a fairly complicated expression, and I'm not 100% sure
we should handle the config read failure the same as the "!present &&
!link_active" case. The config read failure probably means the
Downstream Port is gone, the other case means the device *below* that
port is gone.

We likely want to cancel the delayed work in both cases, but what
about the indicators? If the Downstream Port is gone, we're not going
to be able to change them. Do we want the same message for both?

Maybe we should handle the config failures separately first? These
error conditions make everything so ugly.

> > > -		if (link_active)
> > > +		if (link_active > 0)
> > >  			ctrl_info(ctrl, "Slot(%s): Link Up\n",
> > >  				  slot_name(ctrl));
> >
> > These are cases where we misinterpreted -ENODEV as "device is present"
> > or "link is active".
> >
> > pciehp_ignore_dpc_link_change() and pciehp_slot_reset() also call
> > pciehp_check_link_active(), and I think they also interpret -ENODEV as
> > "link is active".
> >
> > Do we need similar changes there?
>
> Another good observation, both need to check for <= 0 instead of == 0.
> Do you want to fix that yourself or would you prefer me (or someone else)
> to submit a patch?

It'd be great if you or somebody else could do that.

Bjorn
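Note on the message above: one way to read the suggestion to "handle the
config failures separately first" is a three-way classification that tests
for a failed config read before looking at the empty-slot case. The
following is only an illustrative sketch in standalone C. Nothing in the
thread proposes this exact code, and the enum slot_check type, its
enumerators and classify_slot() are hypothetical names that do not exist in
pciehp.

```c
#include <errno.h>

/* Hypothetical outcome of the two status reads; not a pciehp type. */
enum slot_check {
	SLOT_PORT_INACCESSIBLE,	/* a config read failed: port likely gone */
	SLOT_EMPTY,		/* port readable, no card and no link */
	SLOT_OCCUPIED		/* card present and/or link active */
};

/*
 * Classify the results of the presence and link checks.
 * Read failures (negative values) are tested first, so they are
 * never mistaken for "present" or "link active" further down.
 */
enum slot_check classify_slot(int present, int link_active)
{
	if (present < 0 || link_active < 0)
		return SLOT_PORT_INACCESSIBLE;
	if (!present && !link_active)
		return SLOT_EMPTY;
	return SLOT_OCCUPIED;
}
```

A caller could then pick a different message, or skip the indicator update,
for SLOT_PORT_INACCESSIBLE than for SLOT_EMPTY, which are the open questions
raised in the message. Whether that distinction is worth the extra code is
exactly what the thread leaves undecided.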
* Re: [PATCHv2] PCI: pciehp: Use appropriate conditions to check the hotplug controller status
From: Lukas Wunner @ 2024-06-15 10:06 UTC
To: Bjorn Helgaas
Cc: Bitao Hu, bhelgaas, weirongguang, linux-pci, linux-kernel, kanie, Ilpo Järvinen

On Fri, Jun 14, 2024 at 05:03:27PM -0500, Bjorn Helgaas wrote:
> On Fri, Jun 14, 2024 at 09:36:57PM +0200, Lukas Wunner wrote:
> > Hm, good point. I guess we should change the logical expression instead:
> >
> > -	if (present <= 0 && link_active <= 0) {
> > +	if (present < 0 || link_active < 0 || (!present && !link_active)) {
>
> It gets to be a fairly complicated expression, and I'm not 100% sure
> we should handle the config read failure the same as the "!present &&
> !link_active" case. The config read failure probably means the
> Downstream Port is gone, the other case means the device *below* that
> port is gone.
>
> We likely want to cancel the delayed work in both cases, but what
> about the indicators? If the Downstream Port is gone, we're not going
> to be able to change them. Do we want the same message for both?
>
> Maybe we should handle the config failures separately first? These
> error conditions make everything so ugly.

To keep the code simple, I'm leaning towards not making the call to
pciehp_set_indicators() conditional. The worst thing that can happen
is that pciehp waits 1 sec for a previous write to the Slot Control
register to time out.

> > > These are cases where we misinterpreted -ENODEV as "device is present"
> > > or "link is active".
> > >
> > > pciehp_ignore_dpc_link_change() and pciehp_slot_reset() also call
> > > pciehp_check_link_active(), and I think they also interpret -ENODEV as
> > > "link is active".
> > >
> > > Do we need similar changes there?
> >
> > Another good observation, both need to check for <= 0 instead of == 0.
> > Do you want to fix that yourself or would you prefer me (or someone else)
> > to submit a patch?
>
> It'd be great if you or somebody else could do that.

After looking at this with a fresh pair of eyeballs, I'm thinking now
that the code is actually fine the way it is:

- pciehp_ignore_dpc_link_change():
  If pciehp_check_link_active() returns -ENODEV, it means we recovered
  from DPC but immediately afterwards the hotplug port became
  inaccessible, perhaps because it was hot-removed or because a DPC
  event occurred further up in the hierarchy.

  In neither case would it be called for to synthesize a Data Link
  Layer State Changed event: If the hotplug port was hot-removed,
  it's better to let the hotplug port in its ancestry handle the
  de-enumeration of its sub-hierarchy and not interfere with that
  by trying to concurrently remove a portion of that sub-hierarchy.
  If a DPC event occurred further up, it's better to let the
  DPC-capable port in the ancestry handle the recovery and not
  interfere with that.

- pciehp_slot_reset():
  If pciehp_check_link_active() returns -ENODEV, it means a Hot Reset
  was propagated down the hierarchy after which the hotplug port is
  no longer accessible. Perhaps the hotplug port was hot-removed by
  the user, in which case we should let the hotplug port in the
  ancestry handle de-enumeration.

  Another possibility is that reset recovery failed. I don't think
  we should try to de-enumerate devices below the hotplug port in
  that case. Maybe another error occurred which triggered another
  reset and things will be fine after we've recovered from that.

Thanks,

Lukas
Thread overview: 11+ messages

2024-05-24  6:30 [PATCH] PCI: pciehp: Use appropriate conditions to check the hotplug controller status Bitao Hu
2024-05-24  7:53 ` Lukas Wunner
2024-05-26 14:45   ` yaoma
2024-05-27  8:50     ` Lukas Wunner
2024-05-27  9:43       ` yaoma
2024-05-28  6:42 ` [PATCHv2] " Bitao Hu
2024-05-28 10:54   ` Ilpo Järvinen
2024-06-14 18:41   ` Bjorn Helgaas
2024-06-14 19:36     ` Lukas Wunner
2024-06-14 22:03       ` Bjorn Helgaas
2024-06-15 10:06         ` Lukas Wunner