Subject: Re: [PATCH net-next 2/2] net: hns3: add vf fault process in hns3 ras
From: "wangjie (L)"
To: Leon Romanovsky
CC: Hao Lan
Date: Thu, 2 Feb 2023 21:08:37 +0800
References: <20230113020829.48451-1-lanhao@huawei.com> <20230113020829.48451-3-lanhao@huawei.com> <3ce018d9-e005-f988-37ed-016c559973ec@huawei.com> <06188aca-7080-2506-1155-a739d84a420f@huawei.com>
X-Mailing-List: netdev@vger.kernel.org

On 2023/1/31 21:24, Leon Romanovsky wrote:
> On Tue, Jan 31, 2023 at 08:04:14PM +0800, wangjie (L) wrote:
>>
>>
>> On 2023/1/21 1:12, Leon Romanovsky wrote:
>>> On Wed, Jan 18, 2023 at 08:34:03PM +0800, wangjie (L) wrote:
>>>>
>>>>
>>>> On 2023/1/17 19:21, Leon Romanovsky wrote:
>>>>> On Tue, Jan 17, 2023 at 03:04:15PM +0800, wangjie (L) wrote:
>>>>>>
>>>>>>
>>>>>> On 2023/1/13 14:51, Leon Romanovsky wrote:
>>>>>>> On Fri, Jan 13, 2023 at 10:08:29AM +0800, Hao Lan wrote:
>>>>>>>> From: Jie Wang
>>>>>>>>
>>>>>>>> Currently hns3 driver supports vf fault detect feature. Several ras caused
>>>>>>>> by VF resources don't need to do PF function reset for recovery. The driver
>>>>>>>> only needs to reset the specified VF.
>>>>>>>>
>>>>>>>> So this patch adds process in ras module. New process will get detailed
>>>>>>>> information about ras and do the most correct measures based on these
>>>>>>>> accurate information.
>>>>>>>>
>>>>>>>> Signed-off-by: Jie Wang
>>>>>>>> Signed-off-by: Hao Lan
>>>>>>>> ---
>>>>>>>>  drivers/net/ethernet/hisilicon/hns3/hnae3.h   |   1 +
>>>>>>>>  .../hns3/hns3_common/hclge_comm_cmd.h         |   1 +
>>>>>>>>  .../hisilicon/hns3/hns3pf/hclge_err.c         | 113 +++++++++++++++++-
>>>>>>>>  .../hisilicon/hns3/hns3pf/hclge_err.h         |   2 +
>>>>>>>>  .../hisilicon/hns3/hns3pf/hclge_main.c        |   3 +-
>>>>>>>>  .../hisilicon/hns3/hns3pf/hclge_main.h        |   1 +
>>>>>>>>  6 files changed, 115 insertions(+), 6 deletions(-)
>>>>>>>
>>>>>>> Why is it good idea to reset VF from PF?
>>>>>>> What will happen with driver bound to this VF?
>>>>>>> Shouldn't PCI recovery handle it?
>>>>>>>
>>>>>>> Thanks
>>>>>>> .
>>>>>> PF doesn't reset VF directly. These VF faults are detected by hardware,
>>>>>> and only reported to PF. PF get the VF id from firmware, then notify the VF
>>>>>> that it needs reset. VF will do reset after receive the request.
>>>>>
>>>>> This description isn't aligned with the code. You are issuing
>>>>> hclge_func_reset_cmd() command which will reset VF, while notification
>>>>> are handled by hclge_func_reset_notify_vf().
>>>>>
>>>>> It also doesn't make any sense to send notification event to VF through
>>>>> FW while the goal is to recover from stuck FW in that VF.
>>>>>
>>>> Yes, I misunderstand the hclge_func_reset_notify_vf and
>>>> hclge_func_reset_cmd. It should use hclge_func_reset_notify_vf to inform
>>>> the VF for recovery. I will fix and retest it in V2.
>>>>
>>>> This patch is used to recover specific vf hardware errors, for example the
>>>> tx queue configuration exceptions. It make sense in these cases for the
>>>> firmware is still working properly and can do the recovery rightly.
>>>
>>> If FW is operational and knows about failure, why can't FW do recovery
>>> internally to that VF without PF involvement?
>> I'm sorry to reply so late because I took a vacation. If firmware reset VF
>> hardware directly without notify the running VF driver, it will cause VF
>> driver works abnormal.
>
> mlx5 health recovery code proves that it is possible to do.
> Even in your case, FW can notify VF without PF in the middle.
>
> Thanks
>
In hns3 devices these faults are reported only to the PF. Even if devlink
health were used in the hns3 driver, the faults would still have to be
reported to the PF.

Thanks
>>
>> Thanks
>>>
>>> Thanks
>>> .
>>>
> .
>
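
For illustration only, here is a minimal C sketch of the flow described in
this thread: a VF-scoped fault is reported to the PF, the PF learns the VF id,
marks that VF as needing a reset, and the VF driver performs the reset itself.
Every name below (pf_ctx, vf_ctx, fault_report, pf_handle_fault, vf_service,
MAX_VFS) is hypothetical and is not the hns3 driver's API. The firmware query
and the PF-to-VF notification are modeled as plain struct fields.

```c
#include <stdbool.h>

#define MAX_VFS 8	/* hypothetical limit, for the sketch only */

/* Per-VF state as seen by the (hypothetical) PF. */
struct vf_ctx {
	bool reset_pending;		/* set by the PF, cleared by the VF */
	unsigned int reset_count;	/* resets the VF has performed */
};

struct pf_ctx {
	struct vf_ctx vfs[MAX_VFS];
};

/* Stand-in for the fault details the PF would obtain from firmware. */
struct fault_report {
	bool vf_scoped;		/* true if the fault is caused by VF resources */
	unsigned int vf_id;	/* valid only when vf_scoped is true */
};

/*
 * PF side: notify the affected VF instead of resetting it directly.
 * Returns the VF id that was notified, or -1 if the fault cannot be
 * handled per VF and the caller has to fall back to a wider recovery.
 */
int pf_handle_fault(struct pf_ctx *pf, const struct fault_report *rep)
{
	if (!rep->vf_scoped || rep->vf_id >= MAX_VFS)
		return -1;

	pf->vfs[rep->vf_id].reset_pending = true;
	return (int)rep->vf_id;
}

/*
 * VF side: perform the reset only after the PF has requested it.
 * Returns true if a reset was carried out on this call.
 */
bool vf_service(struct vf_ctx *vf)
{
	if (!vf->reset_pending)
		return false;

	vf->reset_pending = false;
	vf->reset_count++;
	return true;
}
```

In this model the PF never touches VF hardware state. It only records the
request, and the reset happens when the VF side services it, so the running VF
driver always knows a reset is coming.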