All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: Jijie Shao <shaojijie@huawei.com>,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	andrew+netdev@lunn.ch, horms@kernel.org, shenjian15@huawei.com,
	wangpeiyang1@huawei.com, liuyonglong@huawei.com,
	chenhao418@huawei.com, jonathan.cameron@huawei.com,
	shameerali.kolothum.thodi@huawei.com, salil.mehta@huawei.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RESEND V2 net 1/7] net: hns3: fixed reset failure issues caused by the incorrect reset type
Date: Thu, 19 Dec 2024 11:11:15 +0100	[thread overview]
Message-ID: <Z2PxQ8A5DObivci8@mev-dev.igk.intel.com> (raw)
In-Reply-To: <8a789f23-a17a-456d-ba2a-de8207d65503@redhat.com>

On Thu, Dec 19, 2024 at 10:41:53AM +0100, Paolo Abeni wrote:
> On 12/18/24 10:02, Michal Swiatkowski wrote:
> > On Tue, Dec 17, 2024 at 09:08:33AM +0800, Jijie Shao wrote:
> >> From: Hao Lan <lanhao@huawei.com>
> >>
> >> When a reset type that is not supported by the driver is input, a reset
> >> pending flag bit of the HNAE3_NONE_RESET type is generated in
> >> reset_pending. The driver does not have a mechanism to clear this type
> >> of error. As a result, the driver considers that the reset is not
> >> complete. This patch provides a mechanism to clear the
> >> HNAE3_NONE_RESET flag and the parameter of
> >> hnae3_ae_ops.set_default_reset_request is verified.
> >>
> >> The error message:
> >> hns3 0000:39:01.0: cmd failed -16
> >> hns3 0000:39:01.0: hclge device re-init failed, VF is disabled!
> >> hns3 0000:39:01.0: failed to reset VF stack
> >> hns3 0000:39:01.0: failed to reset VF(4)
> >> hns3 0000:39:01.0: prepare reset(2) wait done
> >> hns3 0000:39:01.0 eth4: already uninitialized
> >>
> >> Use the crash tool to view struct hclgevf_dev:
> >> struct hclgevf_dev {
> >> ...
> >> 	default_reset_request = 0x20,
> >> 	reset_level = HNAE3_NONE_RESET,
> >> 	reset_pending = 0x100,
> >> 	reset_type = HNAE3_NONE_RESET,
> >> ...
> >> };
> >>
> >> Fixes: 720bd5837e37 ("net: hns3: add set_default_reset_request in the hnae3_ae_ops")
> >> Signed-off-by: Hao Lan <lanhao@huawei.com>
> >> Signed-off-by: Jijie Shao <shaojijie@huawei.com>
> >> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> 
> I haven't signed-off this patch.
> 
> Still no need to repost (yet) for this if the following points are
> solved rapidly (as I may end-up merging the series and really adding my
> SoB), but please avoid this kind of issue in the future.
> 
> >> @@ -4227,7 +4240,7 @@ static bool hclge_reset_err_handle(struct hclge_dev *hdev)
> >>  		return false;
> >>  	} else if (hdev->rst_stats.reset_fail_cnt < MAX_RESET_FAIL_CNT) {
> >>  		hdev->rst_stats.reset_fail_cnt++;
> >> -		set_bit(hdev->reset_type, &hdev->reset_pending);
> >> +		hclge_set_reset_pending(hdev, hdev->reset_type);
> > Sth is unclear for me here. Doesn't HNAE3_NONE_RESET mean that there is
> > no reset? If yes, why in this case reset_fail_cnt++ is increasing?
> > 
> > Maybe the check for NONE_RESET should be done in this else if check to
> > prevent reset_fail_cnt from increasing (and also solve the problem with
> > pending bit set)
> 
> @Michal: I don't understand your comment above. hclge_reset_err_handle()
> handles attempted reset failures. I don't see it triggered when
> reset_type == HNAE3_NONE_RESET.
> 

Maybe I missed sth. The hclge_set_reset_pending() is added to check if
reset type isn't HNAE3_NONE_RESET. If it is the set_bit isn't called. It
is the only place where hclge_set_reset_pending() is called with a
variable, so I assumed the fix is for this place.

This means that code can be reach here with HNAE3_NONE_RESET which is
unclear for me why to increment resets if rest_type in NONE. If it is
true that hclge_reset_err_handle() is never called with reset_type
HNAE3_NONE_RESET it shouldn't be needed to have the
hclge_set_reset_pending() function.

Following that I suggested to check if reset_type isn't NONE before
checking other things.

> >> @@ -4470,8 +4483,20 @@ static void hclge_reset_event(struct pci_dev *pdev, struct hnae3_handle *handle)
> >>  static void hclge_set_def_reset_request(struct hnae3_ae_dev *ae_dev,
> >>  					enum hnae3_reset_type rst_type)
> >>  {
> >> +#define HCLGE_SUPPORT_RESET_TYPE \
> >> +	(BIT(HNAE3_FLR_RESET) | BIT(HNAE3_FUNC_RESET) | \
> >> +	BIT(HNAE3_GLOBAL_RESET) | BIT(HNAE3_IMP_RESET))
> >> +
> >>  	struct hclge_dev *hdev = ae_dev->priv;
> >>  
> >> +	if (!(BIT(rst_type) & HCLGE_SUPPORT_RESET_TYPE)) {
> >> +		/* To prevent reset triggered by hclge_reset_event */
> >> +		set_bit(HNAE3_NONE_RESET, &hdev->default_reset_request);
> >> +		dev_warn(&hdev->pdev->dev, "unsupported reset type %d\n",
> >> +			 rst_type);
> >> +		return;
> >> +	}
> > Maybe (nit):
> > if (...) {
> > 	rst_type = 
> > 	dev_warn();
> > }
> > 
> > set_bit(rst_type, );
> > It is a little hard to follow with return in the if.
> 
> @Michal: I personally find the patch code quite readable, do you have
> strong opinions here?
> 
> >>  	set_bit(rst_type, &hdev->default_reset_request);
> >>  }
> >>  
> >> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
> >> index 2f6ffb88e700..fd0abe37fdd7 100644
> >> --- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
> >> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
> >> @@ -1393,6 +1393,17 @@ static int hclgevf_notify_roce_client(struct hclgevf_dev *hdev,
> >>  	return ret;
> >>  }
> >>  
> >> +static void hclgevf_set_reset_pending(struct hclgevf_dev *hdev,
> >> +				      enum hnae3_reset_type reset_type)
> >> +{
> >> +	/* When an incorrect reset type is executed, the get_reset_level
> >> +	 * function generates the HNAE3_NONE_RESET flag. As a result, this
> >> +	 * type do not need to pending.
> >> +	 */
> >> +	if (reset_type != HNAE3_NONE_RESET)
> >> +		set_bit(reset_type, &hdev->reset_pending);
> >> +}
> > You already have a way to share the code between PF and VF, so please
> > move the same functions to common file in one direction up.
> 
> AFAICS this can't be shared short of a large refactor not suitable for
> net as the functions eligible for sharing operate on different structs
> with different layout (hclgevf_dev vs hclge_dev). Currently all the
> shared code operates on shared structs.
> 
> Cheers,
> 
> Paolo
> 

  reply	other threads:[~2024-12-19 10:14 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-17  1:08 [PATCH RESEND V2 net 0/7] There are some bugfix for the HNS3 ethernet driver Jijie Shao
2024-12-17  1:08 ` [PATCH RESEND V2 net 1/7] net: hns3: fixed reset failure issues caused by the incorrect reset type Jijie Shao
2024-12-18  9:02   ` Michal Swiatkowski
2024-12-19  9:41     ` Paolo Abeni
2024-12-19 10:11       ` Michal Swiatkowski [this message]
2024-12-19 10:43         ` Paolo Abeni
2024-12-19 12:26           ` Jijie Shao
2025-01-06 14:41             ` Jijie Shao
2024-12-19 10:13       ` Michal Swiatkowski
2024-12-19 12:18       ` Jijie Shao
2024-12-17  1:08 ` [PATCH RESEND V2 net 2/7] net: hns3: fix missing features due to dev->features configuration too early Jijie Shao
2024-12-18  9:16   ` Michal Swiatkowski
2024-12-17  1:08 ` [PATCH RESEND V2 net 3/7] net: hns3: Resolved the issue that the debugfs query result is inconsistent Jijie Shao
2024-12-17  1:08 ` [PATCH RESEND V2 net 4/7] net: hns3: don't auto enable misc vector Jijie Shao
2024-12-17  1:08 ` [PATCH RESEND V2 net 5/7] net: hns3: initialize reset_timer before hclgevf_misc_irq_init() Jijie Shao
2024-12-18  9:20   ` Michal Swiatkowski
2024-12-19 11:48     ` Jijie Shao
2024-12-17  1:08 ` [PATCH RESEND V2 net 6/7] net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue Jijie Shao
2024-12-18  9:29   ` Michal Swiatkowski
2024-12-19  9:51     ` Paolo Abeni
2024-12-19 10:23       ` Michal Swiatkowski
2024-12-19 11:52     ` Jijie Shao
2024-12-17  1:08 ` [PATCH RESEND V2 net 7/7] net: hns3: fix kernel crash when 1588 is sent on HIP08 devices Jijie Shao
2024-12-18  9:30   ` Michal Swiatkowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z2PxQ8A5DObivci8@mev-dev.igk.intel.com \
    --to=michal.swiatkowski@linux.intel.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=chenhao418@huawei.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=jonathan.cameron@huawei.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=liuyonglong@huawei.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=salil.mehta@huawei.com \
    --cc=shameerali.kolothum.thodi@huawei.com \
    --cc=shaojijie@huawei.com \
    --cc=shenjian15@huawei.com \
    --cc=wangpeiyang1@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.